
Evaluating LLMs for Arabizi Translation

Apr 22, 2025

Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin?

Overview

  • Authors: Perla Al Almaoui, Pierrette Bouillon, Simon Hengchen
  • Submitted: 28 Feb 2025, Revised 17 Apr 2025
  • Conference: Accepted to MT Summit 2025 (Track: Implementation and Case Studies)
  • Field: Computer Science > Computation and Language

Introduction

  • Linguistic phenomenon: the emergence of Arabizi
    • Arabic written with Latin characters and digits (e.g., "3" for ع, "7" for ح)
    • Used to write the spoken dialects of Arab communities
    • Predominantly used on social media
  • Challenges for machine translation:
    • Informal structure
    • Cultural nuances
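To make the digit convention concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical mapping of a few widely used digit-for-letter substitutions; real usage varies by region and writer (for example, "9" may stand for ق or ص). The helper `arabizi_letters` is illustrative only and not part of the study: it flags digits that appear inside Latin-letter words and skips standalone numbers, which hints at why Arabizi's informal structure is hard to process automatically.

```python
import re

# Hypothetical, simplified mapping of common Arabizi digit conventions.
# Real usage varies by region and writer; this is not an exhaustive table.
DIGIT_TO_LETTER = {
    "2": "ء",  # hamza
    "3": "ع",  # ain
    "5": "خ",  # kha
    "6": "ط",  # emphatic ta
    "7": "ح",  # pharyngeal ha
}

# Tokens are runs of Latin letters and/or digits.
_TOKEN = re.compile(r"[A-Za-z0-9]+")


def arabizi_letters(text: str) -> list[tuple[str, str, str]]:
    """Return (token, digit, Arabic letter) for each digit used as a letter.

    Digits inside tokens that also contain Latin letters are treated as
    letters; purely numeric tokens such as "3" are treated as numbers.
    """
    hits = []
    for token in _TOKEN.findall(text):
        if token.isdigit():
            continue  # a standalone number, not a letter substitute
        for ch in token:
            if ch in DIGIT_TO_LETTER:
                hits.append((token, ch, DIGIT_TO_LETTER[ch]))
    return hits


print(arabizi_letters("mar7aba, 3andi 3 as2ile"))
```

On `"mar7aba, 3andi 3 as2ile"` ("hello, I have 3 questions"), the standalone `3` is kept as a number while the `3` in `3andi` is read as ع, so the same digit plays two different roles in one short sentence.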

Research Objective

  • Purpose: Evaluate how well large language models (LLMs) translate Arabizi
  • Focus:
    • Translation from Arabizi into Modern Standard Arabic and into English
    • Dialects that have rarely been studied before

Methodology

  • Evaluation Criteria:
    • Combination of human evaluators and automatic metrics
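The specific automatic metrics are not named here. As a hedged illustration of what such a metric computes, the Python sketch below implements a simplified sentence-level chrF (character n-gram F-score). It is a swapped-in example, not necessarily one of the metrics used in the study, and it omits details handled by reference implementations. Character n-grams are a common choice for morphologically rich or inconsistently spelled text because near-miss spellings still earn partial credit.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing spaces (simplified chrF setup)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale.

    Averages character n-gram precision and recall over n = 1..max_n,
    then combines them with an F-beta score (beta = 2 weights recall more).
    Orders for which either string is too short are skipped.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


print(chrf("the cat sat", "the cat sits"))
```

For reported scores, a maintained implementation (for example the `CHRF` class in sacrebleu) is preferable to a hand-rolled version, since tokenization and edge-case handling affect comparability.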

Key Questions

  • Which Arabic dialects are translated most effectively by LLMs?
  • Are translations into English more accurate than those into Arabic?

Conclusion

  • The study sheds light on the capabilities and limitations of current LLMs when handling informal, culturally nuanced language forms such as Arabizi.
