The language-tagging & orthographic normalization of spoken mixed-language data, with a focus on Texas German

Date

2022-08-11

Authors

Blevins, Margaret Marie

Abstract

Spoken data from language-contact situations is extremely varied. This heterogeneity makes it difficult to compare corpora and to apply corpus-linguistic tools to the data. Standardized systems have been proposed for other kinds of linguistic annotation: phonetic transcription (e.g., IPA; International Phonetic Association 1999), orthographic transcription (e.g., GAT; Selting 1998; Selting et al. 2009; Schmidt et al. 2015), and POS-tagging (e.g., STTS; Schiller et al. 1999; Westpfahl et al. 2017). However, no standardized system exists for normalization or language-tagging. I propose such a system by constructing and implementing guidelines for the systematic normalization and language-tagging of German-English data. The system is based on the annotation systems of ten existing corpora of German varieties. As a case study for implementing these guidelines, I also construct a corpus of Texas German from transcriptions of conversational interviews from the Texas German Dialect Project (Boas et al. 2010).