PROGRAMMING AND MACHINE LEARNING FOR BIOLOGICAL DATA

Course objectives

After completing the course, learners will be able to:
  • Run Python programs
  • Store data in programs
  • Use built-in functions
  • Detect syntax errors in programs
  • Read tabular data
  • Visualise and statistically analyse tabular data
  • Plot biological data
  • Create functions
  • Repeat actions with loops
  • Make choices
  • Determine where errors occurred
  • Manage errors and exceptions
  • Make programs readable
  • Use software that other people have written
  • Recognize the various data formats used to represent DNA/RNA sequence data
  • Independently write Python scripts to read in sequence data using Python or Biopython modules, parse data files, run external programs and read input from the command line
  • Describe a wide range of machine learning techniques
  • Recognize which machine learning method is most applicable to a given data analysis problem
  • Transform biological data for ML applications; in particular, transform sequence data into a machine-readable format for input into a machine learning pipeline
  • Preprocess biological sequence data for natural language processing
  • Build a Random Forest (RF) model to classify a set of sequences
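One common way to transform sequence data into a machine-readable format is k-mer counting. The course does not say which encoding it teaches, so treat this as an assumption. The minimal sketch below uses plain Python and a hypothetical function name, `kmer_vector`. It turns each DNA sequence into a fixed-length vector of overlapping k-mer counts, so sequences of different lengths become rows of the same width.

```python
from itertools import product


def kmer_vector(seq, k=3, alphabet="ACGT"):
    """Encode a DNA sequence as a vector of overlapping k-mer counts.

    The vector has one slot per possible k-mer over `alphabet` (4**k slots
    for DNA) in lexicographic order, so every sequence maps to a vector of
    the same length whatever its own length.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in index:  # skip k-mers with ambiguous bases such as 'N'
            counts[index[kmer]] += 1
    return counts


# Hypothetical toy sequences, one feature vector (row) per sequence.
sequences = ["ACGTAC", "ttgna"]
X = [kmer_vector(s, k=2) for s in sequences]
```

The rows of `X`, together with one class label per sequence, could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`. The value of k is a tunable choice, and the vector length grows as 4**k. For RNA, `alphabet="ACGU"` could be passed instead.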

Channel 1
Lecturer: Allegra Via


Course program
Modern methods in biology, such as high-throughput sequencing, generate enormous amounts of data. To discover information in these data and to ask the right questions, new methods from data mining, artificial intelligence and deep learning need to be developed and applied. The course is structured into three modules plus a final 4-hour session in which a simulation of the exam will be conducted.

MODULE 1 - INTRODUCTION
  • Introduction to the course: understand how the class is composed; presentation of the syllabus; work environment
  • Python programming language: introduction to the Python language and the Python interpreter; the many ways to run Python, with examples of running Python programs/scripts; first Python commands and how they work in different environments (the shell terminal, including the interactive interpreter; text editors in which to write code to run in a shell terminal; code editors; IDEs; and notebooks)
  • Work environment: Anaconda > JupyterLab, Jupyter notebooks; Google Colab
  • Machine Learning: introduction to the very first concepts of ML; ML for biological data: usefulness and challenges; the relationship between AI and machine learning; why ML can be very helpful in biology, and the differences between "traditional" programming and ML
  • Introduction to data science: how ML works; a brief history of data

MODULE 2 - FUNDAMENTALS
  • Python fundamentals: variables and assignment; data types (numbers, strings, lists, tuples, dictionaries, sets); built-in functions vs methods; importing modules and libraries; for loops; conditionals; Anaconda developer environment (Jupyter notebook); plotting (matplotlib); simple debugging
  • Python libraries for data handling: numpy, pandas, matplotlib; manipulating and analysing tabular data with the pandas DataFrame
  • ML fundamentals: types of machine learning. Key concepts: supervised and unsupervised learning; classification, regression and clustering problems; classes and labels; training, validation and testing
  • The six steps of ML:
      #1 - Import the data
      #2 - Clean the data
      #3 - Split the data (training set/test set)
      #4 - Create a model
      #5 - Check the output
      #6 - Improve

MODULE 3 - ADVANCED PROGRAMMING
  • Advanced Python: file handling (the different ways to read input from files and write output to a file); writing functions; classes
  • Machine Learning: building good training datasets (data preprocessing); training and test datasets; KNN; creating a model; model validation (k-fold cross validation); hyperparameter tuning; performance evaluation

EXAM SIMULATION (4h)
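The six steps of ML from Module 2 can be sketched end to end in plain Python, together with the Module 3 topics of KNN, k-fold cross validation and hyperparameter tuning. This is a minimal illustration under stated assumptions, not the course's own material. The toy dataset `RAW` is invented, and all function names are hypothetical. Step #6 (Improve) is interpreted here as choosing the number of neighbours k by 3-fold cross validation on the training set.

```python
import math
import random
from collections import Counter

# Step 1 - Import the data. A hypothetical toy dataset stands in for a real
# file: each row is (feature_1, feature_2, label); None marks a missing value.
RAW = [
    (1.0, 1.1, "A"), (1.2, 0.9, "A"), (0.8, 1.0, "A"), (1.1, 1.3, "A"),
    (0.9, 0.7, "A"), (1.3, 1.2, None),
    (3.0, 3.2, "B"), (3.1, 2.9, "B"), (2.8, 3.0, "B"), (3.3, 3.1, "B"),
    (2.9, 3.4, "B"), (None, 3.0, "B"),
]


def clean(rows):
    """Step 2 - Clean the data: drop every row that contains a missing value."""
    return [r for r in rows if None not in r]


def split_data(rows, test_fraction=0.3, seed=0):
    """Step 3 - Shuffle a copy of the rows and split it into (train, test)."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def knn_predict(train, point, k_neighbours):
    """Step 4 - KNN model: majority label of the k nearest training rows
    (Euclidean distance on the features, i.e. all columns except the last)."""
    nearest = sorted(train, key=lambda r: math.dist(r[:-1], point))[:k_neighbours]
    votes = Counter(r[-1] for r in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k_neighbours):
    """Step 5 - Check the output: fraction of test rows predicted correctly."""
    correct = sum(knn_predict(train, r[:-1], k_neighbours) == r[-1] for r in test)
    return correct / len(test)


def cross_val_accuracy(rows, k_neighbours, n_folds=3):
    """Step 6 - Improve: mean accuracy over n_folds folds, where each fold is
    held out once while the remaining folds serve as training data."""
    folds = [rows[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        fit = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(accuracy(fit, held_out, k_neighbours))
    return sum(scores) / n_folds


data = clean(RAW)
train, test = split_data(data)
# Tune the hyperparameter k on the training set only; the test set stays unseen.
best_k = max([1, 3, 5], key=lambda k: cross_val_accuracy(train, k))
print("best k:", best_k, "test accuracy:", accuracy(train, test, best_k))
```

The test set is used only once, after k has been chosen, so the tuning does not bias the reported accuracy. Libraries such as scikit-learn provide ready-made equivalents, for example `train_test_split`, `KNeighborsClassifier` and `cross_val_score`. The sketch avoids them only to keep each step visible.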
Prerequisites
A biological or biomedical background and knowledge of a variety of biological data and questions. No previous programming experience is necessary for the Programming module. Familiarity with at least one of the main operating systems (Linux, Mac OS X, Windows 10) is required. To be able to complete Module 4 (Python and Machine learning), achieving the learning outcomes of Modules 1-3 is a prerequisite.
Books
  • Sebastian Raschka - Introduction to Machine Learning: https://sebastianraschka.com/resources/ml-lectures-1/
  • Sebastian Raschka, Yuxi Liu, Vahid Mirjalili - Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python (ISBN-10: 1801819319; ISBN-13: 978-1801819312)
Further learning materials (including slides, tutorials, videos, notes, extracts from textbooks, examples and scripts) will be provided by the teacher before and during the course.
Frequency
The course is fully practical. It is strongly recommended to attend class in order to acquire the skills required to successfully achieve learning outcomes and pass the exam.
Exam mode
PYTHON PROGRAMMING
  • Remember/Comprehend: students are given a piece of code (a programme including one example for each topic of the course) on paper, with numbered lines. They have to write, line by line, what it does, explaining and describing each Python instruction.
  • Apply/Analyse: students are given a piece of pseudocode and have to write the corresponding code.
  • Evaluate/Create: when done, they copy the programme onto their computer (using Jupyter Notebook locally, with no Internet connection), then run and debug it. They have to write what the errors are (the type of each error and why it occurs). Finally, they have to describe what the programme does, i.e. what the input is, how it is transformed/filtered and what the output is.
MACHINE LEARNING
  • Project discussion: students choose a dataset (e.g. from Kaggle) and build a complete ML pipeline in a Google Colab or Jupyter Notebook. The pipeline is described and discussed in the oral part of the exam.
Lesson mode
Learning outcomes (LOs) will guide the design of learning experiences (LEs). For each LO, the most appropriate LE(s) will be identified and planned. Depending on the Bologna descriptors involved, learning experiences will include:
  • very brief interactive lectures on the main concepts of programming and machine learning;
  • participatory live coding sessions;
  • hands-on coding sessions (individual or in pairs/groups) in which students independently use Python programming to solve data handling problems and implement ML algorithms.
  • Lesson code: 10611803
  • Academic year: 2025/2026
  • Course: Genetics and Molecular Biology
  • Curriculum: Genetica e Biologia Molecolare (Genetics and Molecular Biology; track also valid for obtaining the Italian-French double degree)
  • Year: 1st year
  • Semester: 2nd semester
  • SSD: BIO/10
  • CFU: 6