1 August 2011 - 31 July 2014 (actual work began in April 2011)
Using modern and historical texts, newspapers, and recordings from Uyghur, a Turkic language of Chinese Turkestan, this project creates an annotated corpus. Its aim is to understand the typological development of complex verb constructions. Light verbs (LVs; also termed “auxiliaries”) express semantic nuances such as speaker intention, irony, and agency. LVs modulate the meaning of the main verb and differ from other complex predicates in quantity, semantic range, and structural properties; they have been described as both V+N and V-V sequences. These two structures may or may not be related, and their semantics are poorly understood cross-linguistically, regardless of the term used (e.g. “LVs”, coverbs, restructuring predicates). Uyghur [ISO 639-3: uig] has both V+N and V-V constructions. This project analyzes co-occurrence properties of V-V and V+N sequences, investigating the formal and semantic properties of these complex predicates. Here is an example of a V-V LV sequence:
men | tünügün | shundaq | hérip | kettim. | ‘I was so (totally) exhausted yesterday.’ |
pn1sg | yesterday | so.much | be.tired-cnv | leave-Pst.Dir1sg |
The project focuses on V-V sequences, since these are unusually elaborated in Uyghur, expressing aspectual and actional nuances of the preceding lexical V, as well as telicity, agency, and directionality. Dwyer hypothesizes that LVs form a verb class distinct from lexical verbs and auxiliaries; are defined by a range of formal and functional criteria; and can be diachronically unstable (contra Butt 2003). Dwyer’s pilot study also suggests potential universal principles in relations between LVs, argument structure, and subcategorization. The project evaluates the diachronic development of monoclausal V-V and V+N sequences cross-linguistically. The quantitative and qualitative corpus work entails complex annotation schemes and data mining. It involves building corpus databases, linguistically annotating texts, querying the databases, and eliciting grammaticality judgments from native speakers. We will annotate and query an XML-structured multi-genre corpus of modern Uyghur (ca. 200,000 words) and early modern Uyghur (1891-1935, ca. 70,000 words), and also prepare a smaller database of V-V and V+N complex predicates in antecedent and related languages (Chagatay, Karakhanid, and Uzbek). Diachronic comparisons allow the evaluation of LV stability and grammaticalization clines; analyzing co-occurrences (and controlling for variation) within Turkic is a prerequisite for cross-linguistic comparisons.
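To make the idea of querying an XML-structured corpus for V-V co-occurrences concrete, here is a minimal sketch in Python. It is an illustration only and does not reproduce the project's annotation scheme: the element names (`corpus`, `s`, `w`), the attributes (`pos`, `form`, `gloss`), and the function `vv_pairs` are all hypothetical. The sample encodes the example sentence above, and the query counts each converb that is immediately followed by another verb.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical annotation of the example sentence. The element and attribute
# names are assumptions made for this sketch, not the project's actual schema.
SAMPLE = """
<corpus>
  <s id="ex1">
    <w pos="PRON" gloss="pn1sg">men</w>
    <w pos="ADV" gloss="yesterday">tünügün</w>
    <w pos="ADV" gloss="so.much">shundaq</w>
    <w pos="V" form="cnv" gloss="be.tired-cnv">hérip</w>
    <w pos="V" form="fin" gloss="leave-Pst.Dir1sg">kettim</w>
  </s>
</corpus>
"""


def vv_pairs(xml_text):
    """Count converb + verb pairs that are adjacent within a sentence."""
    root = ET.fromstring(xml_text.strip())
    pairs = Counter()
    for sentence in root.iter("s"):
        words = list(sentence.iter("w"))
        # Walk over adjacent word pairs inside one sentence only.
        for first, second in zip(words, words[1:]):
            is_converb = first.get("pos") == "V" and first.get("form") == "cnv"
            if is_converb and second.get("pos") == "V":
                # Use the lexical part of each gloss (before the first hyphen) as the key.
                key = (
                    first.get("gloss").split("-")[0],
                    second.get("gloss").split("-")[0],
                )
                pairs[key] += 1
    return pairs


if __name__ == "__main__":
    for (lexical_verb, light_verb), count in vv_pairs(SAMPLE).items():
        print(lexical_verb, "+", light_verb, count)
```

A real query would also need to separate light-verb uses from ordinary sequences of two lexical verbs, which adjacency alone cannot do; that distinction is where the semantic annotation and native speaker judgments described here come in.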
The project tests hypotheses on 16 syntactic, semantic, prosodic, and pragmatic properties in two periods of Uyghur. Properties that are not computationally tractable (primarily semantic ones) will be assessed through native speaker evaluations from student assistants and project consultants. Methods are both quantitative and qualitative, informed and assisted by a corpus consultant. Deliverables include (1) evidence-based cross-linguistic criteria for the complex predicates that we provisionally call “LVs”; (2) the first annotated corpus for a major Central Asian Turkic language; (3) three publications; and (4) web dissemination of corpus samples.
Research on complex predicates demands tests that can extend our understanding of these constructions cross-linguistically. This corpus-based study advances linguistic science by integrating formal and semantic approaches and by providing explicit, testable criteria for complex predicates, thereby revealing fundamental organizational properties of language. The project extends the syntactic-semantic approach by also considering agency and telicity. Our corpus-driven approach to elicitation (using altered corpus examples to explore grammaticality and gaps) is also methodologically innovative. Furthermore, the pilot study shows that we can indeed test pattern stability, clines, and syntactic properties even with a shallow diachronic design, thus contributing to grammaticalization theory. Investigating the origins and diverse realizations of complex predicates allows us to refine our definition and understanding of the understudied category of LVs cross-linguistically. Finally, the project is transformative in its multilayered empirical basis: the use of a semi-automated corpus for a Less Commonly Taught Language in combination with elicited input; the comparison of two time periods and areal/antecedent languages; and the investigation of different levels within linguistic theory, from phonetics to pragmatics. These methods can serve as a model for how other projects approach similar problems in linguistics.
Making available the first tagged corpus of a Turkic language other than Turkish benefits the academic and Uyghur-learning public, as well as two graduate students. In applied linguistics, the teaching of Turkic languages will benefit directly from this work: the Uyghur language program at the University of Kansas has already benefited from the pilot work reported here; both the corpus and its analysis will be immediately applied to the second-year Uyghur textbook (in preparation) and a new Uyghur-language summer school planned for 2013. Finally, linguistics students will be trained in corpus creation and analysis (a rarity in the United States, especially for Less Commonly Taught Languages). Such training allows linguistics students to be agile researchers and widely employable in both basic and applied areas.
Credit: Painting of Mahmud al-Kashgari © 1981 by Ghazi Emet.