Information package & Course catalogue

UNIVERSITY OF SOUTH BOHEMIA IN ČESKÉ BUDĚJOVICE

for academic year 2026/2027

Course: Corpus and Computational Linguistics for translators

Course title	Corpus and Computational Linguistics for translators
Course code	URO/0KKLP
Organizational form of instruction	Lecture + Seminar
Level of course	Master
Year of study	not specified
Semester	Winter
Number of ECTS credits	4
Language of instruction	Czech
Status of course	Compulsory
Form of instruction	Face-to-face
Work placements	This is not an internship
Recommended optional programme components	None
Course availability	The course is available to visiting students

Lecturer(s)
Radimský Jan, prof. PhDr. Ph.D.
Šmídová Markéta, Mgr. Ph.D.
Course content
1. Corpus linguistics: the corpus, types of corpora, technical questions and methodological issues. Corpus-driven and corpus-based approaches. Methodology of linguistic research: validity and reliability.
2. History of corpus linguistics; corpus typology according to different criteria.
3. Representativeness in corpus design.
4. Corpus statistics: frequency (absolute and relative), comparing frequencies. Co-occurrence measures (MI-score, T-score). The term "collocation"; the so-called "statistical" and "functional" conceptions of collocation.
5. The Czech National Corpus: its composition, research capabilities and use in translation practice. The InterCorp parallel corpus.
6. Selected national corpora (French: Frantext, Sketch Engine, Le Monde; Italian: La Repubblica, CORIS/CODIS, itWaC; Spanish: CREA, AnCora, COSER, CLUVI).
7. Corpus annotation and its types. Searching with regular expressions and CQL queries; processing outputs and working with frequency lists.
8. Issues in applied computational linguistics: information retrieval, information extraction, question answering, text summarization, term extraction.
9. Machine translation: rule-based systems, statistical machine translation and hybrid systems.
10. Solving specific problems using corpora and machine translation tools.
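As an illustration of the corpus statistics named in item 4, the following is a minimal Python sketch of relative frequency and of the two co-occurrence measures in their commonly used forms: MI-score as log2(O/E) and T-score as (O - E)/sqrt(O), where O is the observed co-occurrence count and E = f(x)·f(y)/N is the count expected by chance. The function names and all counts are invented for the example and are not taken from the course materials or from any real corpus; corpus tools may also adjust E for the size of the collocation window, which this sketch ignores.

```python
import math


def relative_frequency(count, corpus_size, per=1_000_000):
    """Frequency normalised to occurrences per `per` tokens (default: per million)."""
    return count * per / corpus_size


def expected_cooccurrence(f_x, f_y, corpus_size):
    """Co-occurrence count expected if x and y were distributed independently."""
    return f_x * f_y / corpus_size


def mi_score(f_xy, f_x, f_y, corpus_size):
    """MI-score: log2 of observed over expected co-occurrence frequency."""
    return math.log2(f_xy / expected_cooccurrence(f_x, f_y, corpus_size))


def t_score(f_xy, f_x, f_y, corpus_size):
    """T-score: (observed - expected) divided by the square root of observed."""
    return (f_xy - expected_cooccurrence(f_x, f_y, corpus_size)) / math.sqrt(f_xy)


if __name__ == "__main__":
    # Hypothetical counts: a corpus of 1,000,000 tokens, word x occurring
    # 1,000 times, word y 500 times, and the pair x + y 64 times.
    N, f_x, f_y, f_xy = 1_000_000, 1_000, 500, 64
    print("relative frequency of x (per million):", relative_frequency(f_x, N))
    print("MI-score:", mi_score(f_xy, f_x, f_y, N))
    print("T-score:", t_score(f_xy, f_x, f_y, N))
```

With these invented counts the expected co-occurrence is 0.5, so the MI-score is log2(128) = 7.0 and the T-score is 63.5 / 8 = 7.9375. MI-score tends to favour rare but exclusive pairs, while T-score favours frequent ones, which is why the two measures are usually read together.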
Learning activities and teaching methods
Monologic (reading, lecture, briefing), Demonstration, Activating (simulations, games, drama)
Learning outcomes
The course introduces students to the basic concepts, methods and problems of corpus and computational linguistics and to the tools this discipline provides for solving applied linguistic problems, particularly in the translation of natural-language texts. Students will be able to work with text corpora, search them using regular expressions and make use of a corpus's linguistic annotation. Using parallel corpora, they will be able to find equivalents of expressions in a foreign language. They will understand the basic principles of machine translation and computer-aided translation, which will allow them to use these tools efficiently in their own practice.
Prerequisites
No special prerequisites.
Assessment methods and criteria
Oral examination, Student performance assessment
Class participation, individual work, oral examination.
Recommended literature
ČERMÁK - KLÍMOVÁ - PETKEVIČ (2000). Studie z korpusové lingvistiky. Praha.
ČERMÁK, F. - BLATNÁ, R. (eds.) (2005). Jak využívat Český národní korpus. Praha.
ČERMÁK, F. - BLATNÁ, R. (2006). Korpusová lingvistika: Stav a modelové přístupy. Praha.
ČERMÁK, F. (1995). Jazykový korpus: Prostředek a zdroj poznání. Slovo a slovesnost, č. 56, s. 119-140.
KOLEKTIV AUTORŮ (2000). Český národní korpus. Úvod a příručka uživatele. Praha.
RADIMSKÝ, JAN (2005). Des méthodes de vérification en linguistique. In: Čermák Petr, Tláskal Jaromír (editores): Las lenguas románicas: su unidad y diversidad, Praha, Univerzita Karlova v Praze, Filozofická fakulta, s. 178-184.
RADIMSKÝ, JAN (2007). Projet et construction d'un corpus des textes européens (CORTE). Etudes romanes de Brno, Sborník prací FF MU, L 28, Brno, s. 207-216.
ŠTÍCHA, FRANTIŠEK (1994). Čas korpusové lingvistiky. Slovo a slovesnost, 55, s. 141-145.
ŠULC, MICHAL (1999). Korpusová lingvistika (první vstup). Praha.
TEUBERT, WOLFGANG (ed.) (2007). Text Corpora and Multilingual Lexicography. John Benjamins.
WILLIAMS, G. (2005). La linguistique de corpus. Rennes, Presses universitaires de Rennes.

Study plans that include the course

Faculty	Study plan (Version)	Category of Branch/Specialization	Recommended year of study	Recommended semester

UNIVERSITY OF SOUTH BOHEMIA IN ČESKÉ BUDĚJOVICE, date of update: 09.07.2026 23:50. Data created for academic year 2026/2027