Defining a Matrix Language in Language Mixing




Sharath, Vivek

Researchers of bilingual code-switching often assume that one of the participating languages serves as the ‘base’ or ‘matrix’ into which elements of the other language are embedded. However, the means by which the matrix language of a clause or extended discourse is determined remains much debated: it has been variously associated with the numerical frequency of lemmas, with the predominant closed-class or functional morphemes, or with the first language in a left-to-right parsing, often with contradictory results. By lemma frequency, the matrix language of “Being bilingüe is más sexy” would be either Spanish or English, depending on the language annotation of sexy; by the other two criteria it would be unambiguously English, as established by the gerund and copula or by the initial ordering in the surface string. Accurate identification of the matrix language for bilingual text or speech is important for linguists because it is proposed to be predictive of the grammatical constraints that are observed in code-switching. In natural language processing, detection of the matrix language can inform the selection of tools as researchers seek to analyze the ever-increasing volume of mixed-language data. This poster presentation demonstrates several metrics for easily quantifying and visualizing the matrix language, at various levels of analysis, in ways that are valid and replicable. The metrics were developed by the Bilingual Annotations Tasks (BATs) research group, an interdisciplinary cohort directed by Professors Bullock and Toribio and MA candidate Gualberto Guzmán.
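
The disagreement among the three criteria can be illustrated with a minimal Python sketch applied to the example sentence. This is not the BATs group's implementation: the function names, the hand-made token annotations, and the choice to flag only the gerund and copula as functional are all assumptions made for illustration.

```python
from collections import Counter

def annotate(sexy_lang):
    """Hypothetical hand annotation of the example sentence.

    Each token is (form, language, is_functional). Only the gerund and
    copula named in the abstract are flagged as functional; that is a
    simplifying assumption, not a full morphological analysis.
    """
    return [
        ("Being", "en", True),
        ("bilingüe", "es", False),
        ("is", "en", True),
        ("más", "es", False),
        ("sexy", sexy_lang, False),  # annotation is debatable: "en" or "es"
    ]

def _most_frequent(langs):
    """Return the sorted list of languages tied for the highest count."""
    counts = Counter(langs)
    if not counts:
        return []
    best = max(counts.values())
    return sorted(lang for lang, n in counts.items() if n == best)

def by_frequency(tokens):
    """Criterion 1: the language contributing the most tokens."""
    return _most_frequent(lang for _, lang, _ in tokens)

def by_functional_morphemes(tokens):
    """Criterion 2: the language of the predominant functional items."""
    return _most_frequent(lang for _, lang, func in tokens if func)

def by_first_token(tokens):
    """Criterion 3: the language of the first token, left to right."""
    return [tokens[0][1]] if tokens else []

for sexy_lang in ("en", "es"):
    tokens = annotate(sexy_lang)
    print(
        f"sexy={sexy_lang}:",
        "frequency", by_frequency(tokens),
        "| functional", by_functional_morphemes(tokens),
        "| first", by_first_token(tokens),
    )
```

Under these assumed annotations, the frequency criterion changes with the tag given to sexy, while the other two criteria select English either way. Each function returns a list so that a tie between languages can be reported rather than silently resolved.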
