Writing Arabizi: Orthographic Variation in Romanized Lebanese Arabic on Twitter
Abstract
Over the past few decades, a new form of writing has emerged across the Arab world. Known as Arabizi, it is
a type of Romanized Arabic that uses Latin characters instead of Arabic script. It is mainly used by youth in
technology-related contexts such as social media and texting, and its spread has led many older Arabic
speakers to fear that more standard forms of Arabic may be endangered.
Prior work on Arabizi suggests that although it is used frequently on social media, its orthography is not yet
standardized (Palfreyman and Khalil, 2003; Abdel-Ghaffar et al., 2011). This thesis therefore examines
orthographic variation in Romanized Lebanese Arabic, a dialect that has rarely been studied in its
Romanized form. It asks how often Arabizi is used on Twitter in Lebanon and how much its orthography
varies. Tweets collected from Beirut were analyzed to identify the most common Arabizi variants for each
Arabic letter, as well as the overall rate of Arabizi use.
Results show that Arabizi was used less frequently on Twitter than hypothesized, probably because of its low
prestige and increased globalization. Its orthography is nonetheless partly standardized: consonants are
relatively standardized, while vowels show more variation.
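The two measurements described above, the most common variant per Arabic letter and the overall rate of Arabizi use, can be illustrated with a minimal sketch. It is not the procedure used in the thesis: it assumes that each Latin-script variant has already been aligned to its Arabic letter and that each tweet already carries a language label. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def variant_counts(aligned_pairs):
    """Tally Latin-script variants for each Arabic letter.

    aligned_pairs: iterable of (arabic_letter, latin_variant) tuples.
    Assumption: the alignment itself has been done beforehand.
    """
    counts = defaultdict(Counter)
    for arabic_letter, latin_variant in aligned_pairs:
        counts[arabic_letter][latin_variant.lower()] += 1
    return counts


def most_common_variant(counts):
    """Return {arabic_letter: (variant, share)} for each letter's top variant."""
    result = {}
    for letter, counter in counts.items():
        variant, n = counter.most_common(1)[0]
        result[letter] = (variant, n / sum(counter.values()))
    return result


def arabizi_rate(tweet_labels):
    """Share of tweets labeled 'arabizi' among all labeled tweets."""
    labels = list(tweet_labels)
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == "arabizi") / len(labels)


# Toy, made-up data for illustration only (not drawn from the thesis corpus).
toy_pairs = [("ح", "7"), ("ح", "7"), ("ح", "h"), ("ع", "3"), ("ع", "3")]
toy_labels = ["arabizi", "english", "arabic", "english"]

top = most_common_variant(variant_counts(toy_pairs))
rate = arabizi_rate(toy_labels)
```

Reporting each top variant together with its share gives a rough measure of how standardized a letter's spelling is: a share near 1.0 means writers largely agree on one variant, while a lower share indicates competing spellings.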
This thesis adds to the existing conversation about Romanized Arabic by presenting a detailed study of
orthographic variation in Lebanese Arabic. The results may have implications for Arabic language
ideology and for technological applications such as natural language processing or translation programs.