Defining a Matrix Language in Language Mixing

dc.contributor: Toribio, Almeida
dc.contributor: Bullock, Barbara
dc.creator: Sharath, Vivek
dc.date.accessioned: 2018-05-03T13:39:38Z
dc.date.available: 2018-05-03T13:39:38Z
dc.date.issued: 2018
dc.description: Researchers of bilingual code-switching often assume that one of the participating languages serves as the ‘base’ or ‘matrix’ language into which elements of the other language are embedded. However, how the matrix language of a clause or extended discourse should be determined remains much debated. It has been variously associated with the numerical frequency of lemmas, with the language supplying the predominant closed-class or functional morphemes, or with the first language encountered in a left-to-right parse, often with contradictory results. By a count of lemmas, the matrix language of “Being bilingüe is más sexy” would be either Spanish or English, depending on whether sexy is annotated as Spanish or English; but it is unambiguously English when established by the gerund and copula, or by the language that comes first in the surface string. Accurate identification of the matrix language in bilingual text or speech matters to linguists because the matrix language is proposed to predict the grammatical constraints observed in code-switching. In natural language processing, detecting the matrix language can inform the choice of tools for analyzing mixed-language data, the volume of which is ever increasing. This poster presentation demonstrates several metrics for easily quantifying and visualizing the matrix language, at various levels of analysis, in ways that are valid and replicable. The metrics were developed by the Bilingual Annotations Tasks (BATs) research group, an interdisciplinary cohort directed by Professors Bullock and Toribio and MA candidate Gualberto Guzmán. [en_US]
dc.description.department: Linguistics [en_US]
dc.identifier: doi:10.15781/T29S1M32X
dc.identifier.uri: http://hdl.handle.net/2152/65027
dc.language.iso: eng [en_US]
dc.relation.ispartof: Research Week [en_US]
dc.rights.restriction: Open [en_US]
dc.subject: linguistics [en_US]
dc.subject: code-switching [en_US]
dc.subject: language [en_US]
dc.subject: switching [en_US]
dc.subject: Spanish [en_US]
dc.subject: English [en_US]
dc.subject: bilingual [en_US]
dc.subject: communication [en_US]
dc.subject: matrix [en_US]
dc.subject: natural language processing [en_US]
dc.subject: data science [en_US]
dc.subject: data analytics [en_US]
dc.subject: data [en_US]
dc.title: Defining a Matrix Language in Language Mixing [en_US]
dc.type: Poster [en_US]
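The three criteria named in the description can be contrasted on its example sentence. The following is a minimal Python sketch, not the BATs group's metrics: it assumes each token has already been annotated with a language tag ("en" or "es") and a hypothetical boolean `functional` flag for closed-class or functional items. Following the description's reasoning, only the gerund and copula are flagged. The names `Token`, `majority`, `by_lemma_frequency`, `by_functional_morphemes`, `by_first_language` and `example` are illustrative.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Token(NamedTuple):
    form: str
    lang: str         # language tag assigned by an annotator, e.g. "en" or "es"
    functional: bool  # hypothetical flag: True for closed-class/functional items


def majority(langs: list[str]) -> Optional[str]:
    """Return the most frequent tag, or None for an empty list or a tie."""
    counts = Counter(langs).most_common()
    if not counts or (len(counts) > 1 and counts[0][1] == counts[1][1]):
        return None
    return counts[0][0]


def by_lemma_frequency(tokens: list[Token]) -> Optional[str]:
    # Matrix language = language contributing the most tokens.
    return majority([t.lang for t in tokens])


def by_functional_morphemes(tokens: list[Token]) -> Optional[str]:
    # Matrix language = language supplying most of the flagged functional items.
    return majority([t.lang for t in tokens if t.functional])


def by_first_language(tokens: list[Token]) -> Optional[str]:
    # Matrix language = language of the first token in left-to-right order.
    return tokens[0].lang if tokens else None


def example(sexy_lang: str) -> list[Token]:
    # "Being bilingüe is más sexy"; only the tag on "sexy" varies.
    return [
        Token("Being", "en", True),   # gerund (assumed functional)
        Token("bilingüe", "es", False),
        Token("is", "en", True),      # copula (assumed functional)
        Token("más", "es", False),
        Token("sexy", sexy_lang, False),
    ]


for tag in ("en", "es"):
    toks = example(tag)
    print(tag, by_lemma_frequency(toks),
          by_functional_morphemes(toks), by_first_language(toks))
```

Each printed line shows the tag given to sexy followed by the three verdicts. The frequency criterion follows that tag (three tokens against two either way), while the functional-morpheme and first-language criteria return English in both cases, as the description states.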

Original bundle

Name: Vivek_Sharath_UG_Poster.pdf
Size: 207.63 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.66 KB
Format: Item-specific license agreed upon to submission