Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin?
Overview
Authors: Perla Al Almaoui, Pierrette Bouillon, Simon Hengchen
Submitted: 28 Feb 2025; revised 17 Apr 2025
Conference: Accepted to MT Summit 2025 (Track: Implementation and Case Studies)
Field: Computer Science > Computation and Language
Introduction
Linguistic Phenomenon: Emergence of Arabizi
Hybrid form of Arabic using Latin characters and numbers
Represents spoken dialects of Arab communities
Predominantly used on social media
Challenges for machine translation:
Informal structure
Cultural nuances
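To make the "Latin characters and numbers" point concrete, here is a minimal sketch of how digits in Arabizi commonly stand in for Arabic letters that have no obvious Latin equivalent. The mapping below reflects widely seen conventions, not anything specified in the paper; usage varies by writer and dialect, and the names DIGIT_TO_ARABIC and expand_digits are hypothetical.

```python
# Common Arabizi digit conventions. These are widely used but NOT
# standardized, and they are an assumption here, not taken from the paper.
DIGIT_TO_ARABIC = {
    "2": "ء",  # hamza (glottal stop)
    "3": "ع",  # ayn
    "5": "خ",  # kha
    "7": "ح",  # ha
}


def expand_digits(text: str) -> str:
    """Replace Arabizi digits with the Arabic letters they commonly represent.

    Latin letters are left untouched, so this is only a partial
    normalization step, not a transliteration or a translation.
    """
    return "".join(DIGIT_TO_ARABIC.get(ch, ch) for ch in text)


def uses_arabizi_digits(text: str) -> bool:
    """Return True if the text contains any digit from the mapping above."""
    return any(ch in DIGIT_TO_ARABIC for ch in text)
```

For example, expand_digits("mar7aba") swaps only the "7" for ح and leaves the Latin letters in place. Because the same digit can also be a plain number and spellings are inconsistent, a lookup table like this cannot resolve Arabizi on its own, which is part of why the task is difficult.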
Research Objective
Purpose: Evaluate the translation of Arabizi using LLMs (Large Language Models)
Focus:
Translation of Arabizi into Modern Standard Arabic and English
Arabic dialects that have rarely been studied before
Methodology
Evaluation Approach:
Combination of human evaluation and automatic metrics
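This summary does not name the automatic metrics used. As an illustration of what such a metric can look like, the sketch below computes a simplified character n-gram F1 score between a system translation and a reference. It is in the spirit of character-based metrics such as chrF but is not the official chrF formula, and the function names and the max_n default are assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after collapsing runs of whitespace."""
    s = " ".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def char_ngram_f1(hypothesis: str, reference: str, max_n: int = 3) -> float:
    """Average F1 over character n-gram orders 1..max_n, in [0.0, 1.0].

    Simplified illustration only: orders for which either string is too
    short to yield n-grams are skipped.
    """
    scores = []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        # Clipped overlap: each n-gram counts at most as often as in the reference.
        overlap = sum((hyp & ref).values())
        if overlap == 0:
            scores.append(0.0)
            continue
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores) if scores else 0.0
```

Character-level overlap is a common choice for morphologically rich languages such as Arabic because it gives partial credit for near-matching word forms. It still cannot judge whether meaning or cultural nuance was preserved, which is one reason to pair automatic scores with human evaluators.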
Key Questions
Which Arabic dialects are translated most effectively by LLMs?
Are translations into English more accurate than those into Arabic?
Conclusion
The study helps clarify the capabilities and limitations of current LLMs in handling informal, culturally nuanced language forms such as Arabizi.