Lecturer(s)
|
-
Radimský Jan, prof. PhDr. Ph.D.
-
Šmídová Markéta, Mgr. Ph.D.
|
Course content
|
1. Corpus linguistics: the corpus, types of corpora, technical and methodological issues. Corpus-driven and corpus-based approaches. Methodology of linguistic research: validity and reliability.
2. History of corpus linguistics; corpus typology according to different criteria.
3. Representativeness in corpus design.
4. Corpus statistics: absolute and relative frequency, comparing frequencies. Co-occurrence measures (MI-score, T-score). The term "collocation"; the so-called "statistical" and "functional" conceptions of collocation.
5. The Czech National Corpus: its composition, research capabilities, and use in translation practice. The Intercorp parallel corpus.
6. Selected national corpora (French: Frantext, SketchEngine, Le Monde; Italian: La Repubblica, CORIS / CODIS, ITWAC; Spanish: Crae, Ancora, Coser, Cluvi).
7. Corpus annotation and its types. Searching with regular expressions and CQL queries; processing outputs and working with frequency lists.
8. Issues in applied computational linguistics: information retrieval, information extraction, question answering, text summarization, term extraction.
9. Machine translation: rule-based systems, statistical machine translation, hybrid systems.
10. Solving specific problems using corpora and machine translation tools.
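For orientation, the frequency and co-occurrence measures named in topic 4 can be sketched in a few lines of Python. This is a minimal illustration, not course material: it assumes the commonly used definitions MI = log2(f_xy · N / (f_x · f_y)) and T = (f_xy − f_x · f_y / N) / sqrt(f_xy), the function names are hypothetical, and the counts in the usage lines are made up.

```python
import math


def relative_frequency(count, corpus_size, per=1_000_000):
    """Frequency normalised to `per` tokens (instances per million by default)."""
    return count * per / corpus_size


def mi_score(f_xy, f_x, f_y, n):
    """Mutual information: log2 of observed over expected co-occurrence."""
    return math.log2(f_xy * n / (f_x * f_y))


def t_score(f_xy, f_x, f_y, n):
    """T-score: observed minus expected co-occurrence, scaled by sqrt(observed)."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)


# Made-up counts: the pair occurs 100 times, each word 1,000 times,
# in a corpus of 1,000,000 tokens.
f_xy, f_x, f_y, n = 100, 1_000, 1_000, 1_000_000
print(relative_frequency(f_xy, n))
print(round(mi_score(f_xy, f_x, f_y, n), 2))
print(round(t_score(f_xy, f_x, f_y, n), 2))
```

MI-score tends to favour rare but exclusive word pairs, while T-score favours frequent ones, which is one reason the two are usually read side by side.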
|
Learning activities and teaching methods
|
Monologic (reading, lecture, briefing), Demonstration, Activating (simulations, games, drama)
|
Learning outcomes
|
The course introduces students to the basic concepts, methods and problems of corpus and computational linguistics, and to the tools these disciplines provide for solving applied linguistic problems, particularly the translation of natural-language text.
Students will be able to work with text corpora, search them using regular expressions, and make use of a corpus's linguistic annotation. They will also be able to find equivalents of foreign-language expressions in parallel corpora. They will understand the basic principles of machine translation and computer-aided translation, which will allow them to use these tools efficiently in their own practice.
|
Prerequisites
|
No special prerequisites.
|
Assessment methods and criteria
|
Oral examination, Student performance assessment
Class participation, individual work, oral examination.
|
Recommended literature
|
-
ČERMÁK - KLÍMOVÁ - PETKEVIČ (2000). Studie z korpusové lingvistiky. Praha.
-
ČERMÁK, F. - BLATNÁ, R. (eds.) (2005). Jak využívat Český národní korpus. Praha.
-
ČERMÁK, F. - BLATNÁ, R. (2006). Korpusová lingvistika: Stav a modelové přístupy. Praha.
-
ČERMÁK, F. (1995). Jazykový korpus: Prostředek a zdroj poznání. Slovo a slovesnost, č. 56, s. 119-140.
-
KOLEKTIV AUTORŮ (2000). Český národní korpus. Úvod a příručka uživatele. Praha.
-
RADIMSKÝ, JAN (2005). Des méthodes de vérification en linguistique. In: Čermák Petr, Tláskal Jaromír (editores): Las lenguas románicas: su unidad y diversidad, Praha, Univerzita Karlova v Praze, Filozofická fakulta, s. 178-184.
-
RADIMSKÝ, JAN (2007). Projet et construction d'un corpus des textes européens (CORTE). Etudes romanes de Brno, Sborník prací FF MU, L 28, Brno, s. 207-216.
-
ŠTÍCHA, FRANTIŠEK (1994). Čas korpusové lingvistiky. Slovo a slovesnost, 55, s. 141-145.
-
ŠULC, MICHAL (1999). Korpusová lingvistika (první vstup). Praha.
-
TEUBERT, WOLFGANG (ed.) (2007). Text Corpora and Multilingual Lexicography. John Benjamins.
-
WILLIAMS, G. (2005). La linguistique de corpus. Rennes, Presses universitaires de Rennes.
|