Lecturer(s)


Skrbek Miroslav, Ing. Ph.D.

Beránek Ladislav, doc. Ing. CSc.

Course content

1. Introduction, goals of data mining, knowledge extraction process
2. Data sources, data types, methods and formats for data storage
3. Statistics: mean, variance, median, correlation, normal distribution
4. RapidMiner, basic principles, simple project creation
5. Preprocessing: data normalizing, feature extraction from data, text documents, web pages and images
6. Dimension reduction of data, Principal Component Analysis, feature ranking and feature selection
7. Metrics and cluster analysis
8. Simple models of data: linear and logistic regression
9. Data modeling: decision trees, association rules
10. Classifiers: kNN, Naive Bayes classifier
11. Model evaluation and testing
12. Advanced modeling methods
13. Interpreting results and report creation

Learning activities and teaching methods

Monologic (reading, lecture, briefing), Laboratory
 Preparation for classes: 18 hours per semester
 Class attendance: 52 hours per semester
 Semestral paper: 40 hours per semester
 Preparation for exam: 40 hours per semester

Learning outcomes

The aim of the course is to teach students the basics of data mining with a focus on bioinformatics. The course covers the complete data mining process: data acquisition, data preprocessing, data analysis, knowledge extraction, data visualization and reporting. Students will learn the most commonly used principles and algorithms. In exercises, students will acquire practical data mining skills using simple spreadsheet-type tools and a sophisticated data mining tool.
The student acquires basic knowledge of data mining and practical experience with a data mining tool.

Prerequisites

Programming in Java, Excel, and basic knowledge of statistics and operating systems.

Assessment methods and criteria

Written examination, Seminar work, Interim evaluation
Each student may earn up to 100 points (55 points for the examination, 45 points for the tutorial). The assessment requirement is 25 points per semester. To pass the examination, the total number of points (examination and tutorial) must be greater than or equal to 50, and the examination test must be awarded at least half of its points. If any of these conditions is not satisfied, the student fails.
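The rules above can be sketched as a small check. This is only an illustration under stated assumptions, not an official grading tool: the function name `passes_course` is hypothetical, "one half" of the examination is taken to mean 27.5 of 55 points, and the 25-point assessment requirement is assumed to apply to the tutorial points, which the text does not say explicitly.

```python
EXAM_MAX = 55        # maximum examination points (from the text)
TUTORIAL_MAX = 45    # maximum tutorial points (from the text)
TOTAL_MIN = 50       # minimum total points to pass (from the text)
ASSESSMENT_MIN = 25  # ASSUMPTION: counted against tutorial points only


def passes_course(exam_points, tutorial_points):
    """Return True if all stated conditions are met (hypothetical helper)."""
    if not 0 <= exam_points <= EXAM_MAX:
        raise ValueError("exam_points out of range")
    if not 0 <= tutorial_points <= TUTORIAL_MAX:
        raise ValueError("tutorial_points out of range")

    # ASSUMPTION: the assessment is granted for tutorial points alone.
    has_assessment = tutorial_points >= ASSESSMENT_MIN
    # ASSUMPTION: "one half" means half of the 55 exam points, i.e. 27.5.
    exam_ok = exam_points >= EXAM_MAX / 2
    total_ok = exam_points + tutorial_points >= TOTAL_MIN

    # Failing any single condition means failing the course.
    return has_assessment and exam_ok and total_ok
```

For example, under these assumptions a student with 28 examination points and 25 tutorial points passes, while a student with 27 examination points and 40 tutorial points fails because the examination score is below half, even though the total is 67.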

Recommended literature


Berka, P. Dobývání znalostí z databází. Academia, 2003. ISBN 8020010629.

Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar. Introduction to Data Mining (2nd edition). 2018. ISBN 9780133128901.
