Lecturer(s)
|
- Skrbek Miroslav, Ing. Ph.D.
|
Course content
|
1. Introduction, goals of data mining, knowledge extraction process
2. Data sources, data types, methods and formats for data storage
3. Statistics: mean, variance, median, correlation, normal distribution
4. RapidMiner, basic principles, simple project creation
5. Preprocessing: data normalization, feature extraction from data, text documents, web pages and images
6. Dimensionality reduction, Principal Component Analysis, feature ranking and feature selection
7. Metrics and cluster analysis
8. Simple models of data: linear and logistic regression
9. Data modeling: decision trees, association rules
10. Classifiers: k-NN, Naive Bayes classifier
11. Model evaluation and testing
12. Advanced modeling methods
13. Interpreting results and report creation
|
Learning activities and teaching methods
|
Monologic (reading, lecture, briefing), Laboratory
- Preparation for classes: 18 hours per semester
- Class attendance: 52 hours per semester
- Semestral paper: 40 hours per semester
- Preparation for exam: 40 hours per semester
|
Learning outcomes
|
The aim of the course is to teach students the basics of data mining with a focus on bioinformatics. The course covers the complete data mining process: data acquisition, data preprocessing, data analysis, knowledge extraction, data visualization, and reporting. Students will learn the most commonly used principles and algorithms. In the exercises, students will acquire practical data mining skills using simple spreadsheet tools and a sophisticated data mining tool.
The student acquires basic knowledge of data mining and practical experience with a data mining tool.
|
Prerequisites
|
Programming in Java, Excel, and basic knowledge of statistics and operating systems.
|
Assessment methods and criteria
|
Written examination, Seminar work, Interim evaluation
Each student can earn up to 100 points (55 points for the examination, 45 points for the tutorials). The assessment requires at least 25 points earned during the semester. To pass the examination, the total number of points (examination and tutorials) must be greater than or equal to 50, and the examination test must score at least half of its points. If any of these conditions is not satisfied, the student fails.
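The pass conditions can be checked mechanically. The following is a minimal sketch, not an official grading tool. It assumes that the 25-point assessment threshold applies to the tutorial points and that "one half" of the examination means 27.5 of 55 points. The function name `passes_course` and the constants are hypothetical and chosen only for illustration.

```python
EXAM_MAX = 55        # maximum examination points (from the syllabus)
TUTORIAL_MAX = 45    # maximum tutorial points (from the syllabus)
ASSESSMENT_MIN = 25  # assumption: threshold applies to tutorial points
TOTAL_MIN = 50       # minimum total of examination and tutorial points


def passes_course(exam_points: float, tutorial_points: float) -> bool:
    """Return True if all pass conditions, as interpreted above, hold."""
    if not 0 <= exam_points <= EXAM_MAX:
        raise ValueError("exam_points out of range")
    if not 0 <= tutorial_points <= TUTORIAL_MAX:
        raise ValueError("tutorial_points out of range")

    has_assessment = tutorial_points >= ASSESSMENT_MIN
    # Assumption: "one half points or more" means half of the exam maximum.
    exam_ok = exam_points >= EXAM_MAX / 2
    total_ok = exam_points + tutorial_points >= TOTAL_MIN
    return has_assessment and exam_ok and total_ok


if __name__ == "__main__":
    print(passes_course(30, 25))  # meets all three conditions
    print(passes_course(27, 45))  # exam below half of its points
    print(passes_course(55, 20))  # assessment threshold not met
```

Under these assumptions the 50-point total follows from the other two conditions (25 + 27.5 = 52.5), so the total check is kept only to mirror the wording of the rule.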
|
Recommended literature
|
- Berka, P. Dobývání znalostí z databází. Academia, 2003. ISBN 80-200-1062-9.
- Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar. Introduction to Data Mining (2nd edition). 2018. ISBN 978-0133128901.
|