Lecturer(s)
|
- Prokýšek Miloš, PhDr. Ph.D.
- Budík Ondřej, Ing.
- Geyer Jakub, Mgr. Ph.D.
- Bukovský Ivo, doc. Ing. Ph.D.
|
Course content
|
1. Relational and NoSQL data stores
2. Data warehouse
   a. Star, Snowflake and Data Vault patterns
   b. ETL, OLAP, OLTP
3. Distributed database systems
   a. CAP theorem
   b. Master-slave, mirroring, sharding
4. NoSQL database systems
   a. Key-value
   b. Column-oriented
   c. Document databases
   d. Graph databases
   e. Time-series databases
5. Large datasets
   a. Velocity, variability, volume
   b. Unstructured data
   c. ELT processing, curated data
6. Stream data processing
   a. Buffering
   b. Distribution
   c. Storing
   d. Real-time processing
7. Data mining
   a. Data sources and data types
   b. Data matrix
   c. Data stores
8. Similarity measurement, methods of cluster analysis
9. Basic data models
   a. Linear and log-linear regression
10. Data modelling
   a. Decision trees, association rules
11. Classification
   a. k-NN
   b. Naive Bayes classifier
12. Data lakes
   a. Distributed filesystems
   b. Hadoop-family solutions
|
Learning activities and teaching methods
|
Monologic (reading, lecture, briefing)
- Class attendance: 56 hours per semester
- Preparation for classes: 56 hours per semester
- Semester paper: 20 hours per semester
- Preparation for exam: 20 hours per semester
|
Learning outcomes
|
The aim of the course is to deepen students' knowledge of data storage and data processing techniques. The course focuses on big data processing techniques, data storage in non-relational databases, and data analysis and mining.
Knowledge of advanced architectures and methods for data processing.
|
Prerequisites
|
Knowledge of relational databases and basic knowledge of query and programming languages.
|
Assessment methods and criteria
|
Oral examination
Semester test: a practical test (data processing and analysis) at the end of the semester (credit week). Exam: an oral examination on two theoretical topics. The student must answer each question at least satisfactorily.
|
Recommended literature
|
- A. GORELIK. The Enterprise Big Data Lake: Delivering the Promise of Big Data and Data Science, 1st Edition, O'Reilly Media 2019, ISBN: 978-1491931554.
- C. CHURCHER. Beginning Database Design: From Novice to Professional. 1st Corrected ed., Apress 2007. ISBN: 978-1590597699.
- J. GRUS. Data Science from Scratch: First Principles with Python, 2nd Edition, O'Reilly Media 2019, ISBN: 978-1492041139.
- P.-N. TAN, M. STEINBACH, A. KARPATNE, V. KUMAR. Introduction to Data Mining (2nd edition), 2018. ISBN 978-0133128901.
|