Course: The New Statistics for Experimental Biologists

Course title The New Statistics for Experimental Biologists
Course code KMB/929
Organizational form of instruction Lecture + Lesson
Level of course Master
Year of study not specified
Frequency of the course In each academic year, in the winter semester
Semester Winter
Number of ECTS credits 5
Language of instruction English
Status of course unspecified
Form of instruction Face-to-face
Work placements This is not an internship
Recommended optional programme components None
Lecturer(s)
  • Tolar Nikolas, Mgr.
  • Kulish Vladimír, doc. Ing. PhD., DSc.
Course content
Lecture content:
1. Introduction: what statistics is & why study it; why the new statistics; scientific method & where statistics fits in; difference between probability theory and statistics; inductive versus deductive reasoning; some historical notes (e.g., Sir Ronald Fisher)
2. Randomness in a nutshell: concept of randomness; probability as a quantitative measure of randomness; ergodic principle; unexpectedness & risk; randomness & information; redundancy; complexity & organisation; phenomenon of life from a probabilistic viewpoint: life as organised complexity
3. Randomness harnessed: some paradoxes of randomness (e.g., survival of leaders); determinism versus predictability; predictable outcomes of random processes (e.g., Monte Carlo method, hill climbing method, Brownian motion, etc.)
4. Distributions: Bernoulli trials; concept of probability distribution (discrete & continuous); simple distributions (from binomial to Poisson to normal); some other simple distributions (e.g., gamma, beta, etc.); entropy of a distribution (information revisited); information & complexity
5. Statistical modelling: statistical modelling versus probabilistic modelling; simple examples of statistical modelling; Chargaff's Rule; Hardy-Weinberg equilibrium; modelling sequential dependencies (Markov chains); Bayesian thinking; modelling in the case of dependencies; mixture modelling: finite & infinite mixture models
6. Clustering: why cluster data; quantitative measures of similarity; nonparametric mixture detection; some examples of clustering; hierarchical clustering; clustering as a means of de-noising
7. Hypothesis testing: testing versus classification; the five steps (algorithm) of hypothesis testing; types of error & test; the family-wise error rate; Bonferroni method; the false discovery rate; Benjamini-Hochberg algorithm; independent hypothesis weighting
8. Multivariate analysis: matrices and their motivation; data summaries & preparation; preprocessing the data; principal component analysis (PCA): dimension reduction; regressing one variable on the other; new linear combinations; singular value decomposition, etc.
9. High-throughput count data: some core concepts; count data & its challenges; modelling count data: dispersion & normalisation; basic analysis with examples; default choices and possible modifications; multi-factor designs and linear models; generalised linear models; further statistical concepts (e.g., sharing of dispersion information, count data transformations, log2-tests)
10. Some methods for new analysis: generalised entropy of a probability distribution (Rényi entropy); the Kullback-Leibler divergence as a quantitative measure of statistical distance between distributions; some applications (e.g., DNA sequences seen as texts, signals seen as fractal (or multifractal) time series, choosing data from databases/libraries)
11 & 12. Design of high-throughput experiments and their analyses: types of experiments; bias and noise; basic principles in the design of experiments; mean-variance relationships and variance-stabilising transformations; data quality assessment and quality control; longitudinal data; data integration; reproducible research; data representation; statistical sufficiency; efficient computing
13. Conclusion: course overview & summary; Q & A session with students

Content of tutorials: Students learn the proper use of statistical methods in R (and RStudio) through practical sessions (tutorials).
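As an illustration of the multiple-testing topics listed under hypothesis testing, the following minimal sketch contrasts the Bonferroni method (family-wise error rate control) with the Benjamini-Hochberg step-up procedure (false discovery rate control). It is not course material: the tutorials use R, Python is used here only for illustration, and the function names and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure: find the largest rank k with p_(k) <= (k / m) * alpha
    and reject the k hypotheses with the smallest p-values."""
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, chosen only to show the two methods disagreeing.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]

print(sum(bonferroni(p_values)))          # number of rejections under Bonferroni
print(sum(benjamini_hochberg(p_values)))  # number of rejections under Benjamini-Hochberg
```

With these values Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects two, which reflects the general pattern that false discovery rate control is less conservative than family-wise error rate control. In R, the built-in `p.adjust` function with `method = "bonferroni"` or `method = "BH"` provides the corresponding adjusted p-values.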

Learning activities and teaching methods
unspecified
Learning outcomes
This introductory course covers the fundamentals of statistical reasoning and the application of data analysis in the biological sciences, including some relatively new statistical methods (e.g., information criteria and multi-model inference). Starting from the most basic concepts of probability and statistics, students gradually learn how to apply correctly the statistical techniques commonly used in experimental biology. This includes sound experimental design and sampling as well as correct interpretation of statistical analyses. The course is structured to meet the needs of students who are analysing data for their own projects, particularly those writing theses. Practical data analysis skills are developed using the statistical software R.
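To give a flavour of the information-criteria approach mentioned above, the sketch below converts AIC values of competing models into Akaike weights, w_i = exp(-Δ_i / 2) / Σ_j exp(-Δ_j / 2), where Δ_i is the difference between model i's AIC and the smallest AIC. It is an illustrative assumption rather than course material: the tutorials use R, and the model names and AIC values here are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Return Akaike weights for a list of AIC values (weights sum to 1)."""
    best = min(aic_values)
    # Delta_i: AIC difference between model i and the best (lowest-AIC) model.
    deltas = [aic - best for aic in aic_values]
    relative_likelihoods = [math.exp(-0.5 * d) for d in deltas]
    total = sum(relative_likelihoods)
    return [r / total for r in relative_likelihoods]


# Hypothetical AIC values for three candidate models of the same data set.
models = ["null", "linear", "quadratic"]
aic = [210.4, 204.1, 205.0]

for name, weight in zip(models, akaike_weights(aic)):
    print(f"{name}: {weight:.3f}")
```

The weights can be read as the relative support for each candidate model within the set considered, which is the basis of multi-model inference.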

Prerequisites
Knowledge of mathematics at secondary-school level (linear algebra, probability theory).

Assessment methods and criteria
unspecified
Attendance at and participation in the practical sessions (tutorials) are compulsory and monitored (at most two absences without a valid reason are allowed). Skills acquired during the tutorials are assessed in a written test (0-5 points) held in the middle of the semester; the result of this mid-term test is combined with the result of a written final test (0-45 points) held at the end of the semester. The minimum total number of points for pre-exam credit is 31. In addition, before the final exam, students must submit an essay, written in the form of a research paper, focused on data analysis. The essay is graded (0-15 points) and the grade counts towards the total grade of the written final exam, which consists of two questions (0-15 points each). The minimum total number of points required to pass the exam is 31.
Recommended literature
  • Andy Hector, The New Statistics with R: an Introduction for Biologists, Oxford University Press, 2015.
  • Susan Holmes & Wolfgang Huber, Modern Statistics for Modern Biology, Cambridge University Press, 2019 (also available in an online form at https://web.stanford.edu/class/bios221/book/).
  • Wim P. Krijnen, Applied Statistics for Bioinformatics using R, Hanze University, Groningen, NL, 2009.

