Course: Artificial intelligence (AI) in biological data acquisition and analysis

Course title Artificial intelligence (AI) in biological data acquisition and analysis
Course code KMB/024
Organizational form of instruction Lecture + Practice + Seminar
Level of course Doctoral
Year of study not specified
Frequency of the course In each academic year, in the summer semester.
Semester Summer
Number of ECTS credits 2
Language of instruction English
Status of course Compulsory-optional
Form of instruction Face-to-face
Work placements This is not an internship
Recommended optional programme components None
Lecturer(s)
  • Bondar Alexey, Mgr. Ph.D.
  • Kulik Natallia, Ing. Ph.D.
  • Kulich Ivan, Mgr. Ph.D.
  • Opelka Jakub, Mgr.
Course content
Combined lecture-practical blocks:
1. Introduction to Machine Learning (ML), Deep Learning (DL) and AI in Biology. Introductory session. Basic terminology and general principles behind AI, ML, and DL. Introduction to general-use AI tools in biology (ChatGPT, Perplexity, Copilot, Grok, Gemini, NightCafe). Introduction to Python, Jupyter, and Colab for ML/DL in biology.
2. Data in Biology: Types, Challenges and Preprocessing. Principles of handling different types of biological data (DNA/RNA sequence data, protein sequence data, microscopy images, large pools of text). Data normalization and cleaning, handling of outliers and missing data. Data anonymization. Data preparation for model training and execution.
3. Supervised Learning Algorithms. Supervised learning principles, data annotation, feature selection. Common supervised algorithms (logistic regression, random forest). Annotating a dataset and training a supervised model on the annotated data. Applying the model to a test dataset.
4. Unsupervised Learning Algorithms. Principles and areas of application of unsupervised learning. Common unsupervised algorithms (k-means, principal component analysis). Noise2Void. Application of unsupervised learning algorithms to pattern discovery.
5. Model Evaluation and Validation. Evaluation of ML/DL models and validation of obtained results. Metrics for assessing ML/DL models. Statistical testing of results, indicators of overfitting, measures of reliable generalization.
6. Generative Models in Biology. Variational Autoencoders and Generative Adversarial Networks in biology. Large language models (LLMs) and their use in biological data processing. Seminar 1: sharing personal experience of using AI in studies and research.
7. Convolutional Neural Networks in Bioimaging. AI and DL in microscopy data analysis. Convolution filters and local patterns. U-Net neural network, YOLO object detection. Adaptive optics. Cell detection and classification with Cellpose and QuPath.
8. Recurrent Neural Networks and Sequence Models. DNA/RNA sequence classification. Network memory. Long short-term memory (LSTM), Gated Recurrent Unit (GRU). Practical exercise in human splice site prediction (SpliceAI) and transcription factor binding prediction (DeepBind).
9. AI in Protein Structure Prediction and Drug Discovery. Prediction of protein structures, binding interfaces, and protein complexes. Synthetic ligands and drug discovery. AlphaFold and RoseTTAFold for protein structure analysis.
10. AI in Proteomics and Protein Function Prediction. AI applications for determining protein functional activity, interactions, and localization. Practical exercise in predicting protein subcellular localization and creating synthetic images.
11. Natural Language Processing and Text Mining in Biology. Natural language processing vs. text mining in a biological context. Available large text databases (PubMed, UniProt). Classical text mining vs. deep learning-based approaches. Text mining with BioBERT.
12. Ethical Concerns and the Future of AI in Biology. Data ownership, sensitive data, energy consumption. Hallucinating AI. Future directions, consciousness, and technological singularity. Seminar 2: showcase of AI use in the students' own research.

Learning activities and teaching methods
  • Class attendance - 30 hours per semester
  • Preparation for exam - 10 hours per semester
  • Preparation for classes - 10 hours per semester
Learning outcomes
The course aims to teach students the essential concepts of artificial intelligence, machine learning, and deep learning, and to make them familiar and comfortable with the diverse tools available for biological applications, including their own research.
The course is primarily intended for PhD students, but interested MSc students can sign up as well.
Prerequisites
A basic understanding of Python, Google Colab, and the Unix shell is required. Further programming experience is an advantage but not essential. Suggested training courses:
  • Google Colab: https://colab.research.google.com/
  • Unix Shell: https://swcarpentry.github.io/shell-novice/
  • Python: https://swcarpentry.github.io/python-novice-inflammation/ and https://swcarpentry.github.io/python-novice-gapminder/

Assessment methods and criteria
unspecified
The student must successfully complete all practical exercises and both seminars during the course (60%) and pass a written test (40%).
Recommended literature
  • AI in Biological Sciences 2022: https://www.mdpi.com/2075-1729/12/9/1430
  • AI/ML in Space Biology: https://canvas.instructure.com/courses/9595041/assignments/syllabus
  • Exemplary generative AI course: https://www.coursera.org/learn/generative-ai-for-everyone
  • Google AI training centre: https://www.ai.google/get-started/learn-ai-skills/
  • Python programming essential for AI applications: https://www.deeplearning.ai/short-courses/ai-python-for-beginners/

