Course: Corpus and Computational Linguistics for translators

Course title Corpus and Computational Linguistics for translators
Course code URO/0KKLP
Organizational form of instruction Lecture + Seminar
Level of course Master
Year of study not specified
Semester Winter
Number of ECTS credits 4
Language of instruction Czech
Status of course Compulsory
Form of instruction Face-to-face
Work placements This is not an internship
Recommended optional programme components None
Course availability The course is available to visiting students
Lecturer(s)
  • Radimský Jan, prof. PhDr. Ph.D.
  • Šmídová Markéta, Mgr. Ph.D.
Course content
1. Corpus linguistics, the corpus, types of corpora, technical questions and methodological issues. Corpus-driven and corpus-based approaches. Methodology of linguistic research, validity, reliability.
2. History of corpus linguistics, corpus typology according to different criteria.
3. Representativeness in corpus design.
4. Corpus statistics: frequency (absolute and relative), comparing frequencies. Co-occurrence measures (MI-score, T-score). The term "collocation". The so-called "statistical" and "functional" conceptions of collocation.
5. The Czech National Corpus, its composition, research capabilities, use in translation practice. The InterCorp parallel corpus.
6. Selected national corpora (French: Frantext, SketchEngine, Le Monde; Italian: La Repubblica, CORIS / CODIS, ITWAC; Spanish: Crae, Ancora, Coser, Cluvi).
7. Corpus annotation and its types. Searching with regular expressions and CQL queries, processing outputs, working with frequency lists.
8. Issues in applied computational linguistics: information retrieval, information extraction, question answering, text summarization, term extraction.
9. Machine translation: rule-based systems, statistical machine translation, hybrid systems.
10. Solving specific problems using corpora and machine translation tools.
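The statistics named in item 4 can be illustrated with a short Python sketch. It is not part of the official course materials. It assumes the commonly used definitions MI = log2(f_xy · N / (f_x · f_y)) and T = (f_xy − f_x · f_y / N) / √f_xy, where f_x and f_y are the frequencies of the two words, f_xy is the frequency of their co-occurrence and N is the corpus size in tokens. The function names and all counts below are hypothetical.

```python
import math


def relative_frequency(count, corpus_size, per=1_000_000):
    """Relative frequency as instances per `per` tokens (per million by default)."""
    return count * per / corpus_size


def mi_score(f_xy, f_x, f_y, n):
    """MI-score: log2 of observed co-occurrence over the co-occurrence expected by chance."""
    return math.log2(f_xy * n / (f_x * f_y))


def t_score(f_xy, f_x, f_y, n):
    """T-score: observed minus expected co-occurrence, scaled by sqrt(observed)."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
N = 1_000_000   # corpus size in tokens
F_X = 1_000     # frequency of word x
F_Y = 500       # frequency of word y
F_XY = 64       # frequency of x and y occurring together

print(relative_frequency(F_XY, N))   # co-occurrences per million tokens
print(mi_score(F_XY, F_X, F_Y, N))   # MI-score
print(t_score(F_XY, F_X, F_Y, N))    # T-score
```

With these counts the expected co-occurrence is 0.5, so the pair occurs 128 times more often than chance would predict. MI-score tends to favour rare but exclusive combinations, while T-score favours frequent ones, which is why the two measures are usually read together.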

Learning activities and teaching methods
Monologic (reading, lecture, briefing), Demonstration, Activating (simulations, games, drama)
Learning outcomes
The course introduces students to the basic concepts, methods and problems of corpus and computational linguistics and to the tools these disciplines provide for solving applied linguistic problems, particularly in the translation of natural-language texts.
Students will be able to work with text corpora, search them using regular expressions, and use a corpus's linguistic annotation. They will also be able to find equivalents of foreign-language expressions using parallel corpora. They will understand the basic principles of machine translation and computer-aided translation, which will allow them to use these tools efficiently in their own practice.
Prerequisites
No special prerequisites.

Assessment methods and criteria
Oral examination, Student performance assessment

Class participation, individual work, oral examination.
Recommended literature
  • ČERMÁK - KLÍMOVÁ - PETKEVIČ (2000). Studie z korpusové lingvistiky. Praha.
  • ČERMÁK, F. - BLATNÁ, R. (eds.) (2005). Jak využívat Český národní korpus. Praha.
  • ČERMÁK, F. - BLATNÁ, R. (2006). Korpusová lingvistika: Stav a modelové přístupy. Praha.
  • ČERMÁK, F. (1995). Jazykový korpus: Prostředek a zdroj poznání. Slovo a slovesnost, č. 56, s. 119-140.
  • KOLEKTIV AUTORŮ (2000). Český národní korpus. Úvod a příručka uživatele. Praha.
  • RADIMSKÝ, JAN (2005). Des méthodes de vérification en linguistique. In: Čermák Petr, Tláskal Jaromír (editores): Las lenguas románicas: su unidad y diversidad, Praha, Univerzita Karlova v Praze, Filozofická fakulta, s. 178-184.
  • RADIMSKÝ, JAN (2007). Projet et construction d'un corpus des textes européens (CORTE). Etudes romanes de Brno, Sborník prací FF MU, L 28, Brno, s. 207-216.
  • ŠTÍCHA, FRANTIŠEK (1994). Čas korpusové lingvistiky. Slovo a slovesnost, 55, s. 141-145.
  • ŠULC, MICHAL (1999). Korpusová lingvistika (první vstup). Praha.
  • TEUBERT, WOLFGANG (ed.) (2007). Text Corpora and Multilingual Lexicography. John Benjamins.
  • WILLIAMS, G. (2005). La linguistique de corpus. Rennes, Presses universitaires de Rennes.

