Syllabus
Modern methods in biology, such as high-throughput sequencing, generate enormous amounts of data. To extract information from these data and to ask the right questions, new methods from data mining, artificial intelligence and deep learning need to be developed and applied.
The course is structured into three modules plus a final 4-hour session devoted to an exam simulation.
MODULE 1 - INTRODUCTION
Introduction to the course
Understand how the class is composed
Presentation of the syllabus
Work environment
Python programming language:
Python introduction
Introduce Python language and the Python interpreter
Understand the many ways to run Python programmes and scripts, with examples.
First Python commands and how they work in different environments: the shell terminal (including the interactive interpreter), text editors for writing code to be run from a shell terminal, code editors, IDEs, and notebooks
Work environment: Anaconda > Jupyter Lab, Jupyter notebooks.
Google Colab
Machine Learning:
Introduction to the very first concepts of ML
ML for biological data: usefulness and challenges
The relationship between AI and Machine Learning
Discuss why ML can be very helpful in biology and the differences between "traditional" programming and ML
Introduction to data science
How ML works.
A brief history of data.
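The difference between "traditional" programming and ML listed above can be illustrated with a minimal sketch. The data, labels and function names below are made up for illustration: in the first function the programmer writes the rule by hand, while in the second a (very simplified) rule is derived from labelled examples.

```python
# Hypothetical toy data: GC content (as a fraction) of short sequences,
# with an invented label: 1 = "GC-rich", 0 = "not GC-rich".
training_data = [(0.30, 0), (0.35, 0), (0.42, 0), (0.58, 1), (0.63, 1), (0.70, 1)]


def classify_with_fixed_rule(gc_content):
    """Traditional programming: the threshold is chosen by the programmer."""
    return 1 if gc_content > 0.5 else 0


def learn_threshold(examples):
    """Simplified learning: the threshold is computed from labelled examples.

    It is the midpoint between the mean GC content of the two classes.
    """
    class0 = [x for x, label in examples if label == 0]
    class1 = [x for x, label in examples if label == 1]
    mean0 = sum(class0) / len(class0)
    mean1 = sum(class1) / len(class1)
    return (mean0 + mean1) / 2


learned_threshold = learn_threshold(training_data)


def classify_with_learned_rule(gc_content):
    """Apply the threshold that was derived from the data."""
    return 1 if gc_content > learned_threshold else 0


print(classify_with_fixed_rule(0.55), classify_with_learned_rule(0.55))
```

If the training examples change, the learned threshold changes with them, whereas the hand-written rule stays fixed until someone edits the code.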
MODULE 2 - FUNDAMENTALS
Python fundamentals:
Variables and assignment
Data types (numbers, strings, lists, tuples, dictionaries, sets)
Built-in functions vs methods
Importing modules and libraries
For loops
Conditionals
Anaconda developer environment (Jupyter notebook)
Plotting (matplotlib)
Simple debugging
Python Libraries for data handling: numpy, pandas, matplotlib
Manipulate and analyse tabular data with pandas DataFrame
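As a rough illustration of how several of the fundamentals above fit together, the following sketch uses only the Python standard library. The DNA string is invented for the example, and the variable names are arbitrary.

```python
from collections import Counter  # importing from a standard-library module

# Variables and assignment
sequence = "ATGCGCGATTAGC"  # a made-up DNA string, for illustration only

# Built-in function (len) versus string method (.count)
length = len(sequence)
gc_count = sequence.count("G") + sequence.count("C")
gc_fraction = gc_count / length

# A dictionary filled with a for loop and a conditional
base_counts = {}
for base in sequence:
    if base in base_counts:
        base_counts[base] += 1
    else:
        base_counts[base] = 1

# The same counts obtained from an imported helper class
counter_counts = Counter(sequence)

# Sets, lists and tuples
bases_seen = sorted(set(sequence))  # unique bases as a sorted list
first_codon = tuple(sequence[0:3])  # immutable triple of characters

print(f"Length: {length}, GC fraction: {gc_fraction:.2f}")
print(base_counts)
```

The manual loop and `Counter` produce the same counts; writing the loop by hand is mainly useful for practising loops and conditionals.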
ML fundamentals:
Types of Machine Learning.
Key concepts: supervised and unsupervised learning, classification, regression and clustering problems, classes and labels, training, validation and testing.
The six steps of ML:
#1 - Import the data
#2 - Clean the data
#3 - Split the data into a training set and a test set
#4 - Create a Model
#5 - Check the output
#6 - Improve
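The six steps above can be sketched in a few lines of plain Python. This is only an illustrative outline under stated assumptions: the inline CSV is invented, and the "model" is a simple nearest-centroid classifier written by hand rather than the method used in the course. In practice a library such as scikit-learn would typically handle steps 3 to 5.

```python
import csv
import io
import random

# 1 - Import the data (an inline, made-up CSV stands in for a real file)
raw_csv = """gene_length,gc_content,label
1200,0.41,A
1350,0.44,A
,0.43,A
1280,0.40,A
3100,0.61,B
2950,0.63,B
3300,0.59,B
3050,0.64,B
"""
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# 2 - Clean the data: drop rows with missing values, convert text to numbers
clean = [
    ([float(r["gene_length"]), float(r["gc_content"])], r["label"])
    for r in rows
    if all(value != "" for value in r.values())
]

# 3 - Split the data into a training set and a test set
random.Random(0).shuffle(clean)  # fixed seed so the split is reproducible
train, test = clean[:5], clean[5:]


# 4 - Create a model: here, the mean feature vector (centroid) of each class
def fit_centroids(examples):
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: [sum(column) / len(column) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Return the label of the closest centroid."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


model = fit_centroids(train)


# 5 - Check the output: fraction of test examples predicted correctly
def accuracy(centroids, examples):
    correct = sum(predict(centroids, f) == label for f, label in examples)
    return correct / len(examples)


test_accuracy = accuracy(model, test)
print(f"Test accuracy: {test_accuracy:.2f}")

# 6 - Improve: for example, rescale the features so that gene_length does not
#     dominate the distance, or try another model, then repeat steps 4 and 5.
```

The toy classes are deliberately well separated, so the score says nothing about real data; the point is only the order of the steps.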
MODULE 3 - ADVANCED PROGRAMMING
Advanced Python:
File handling. Introduce the different ways to read input from files and write output to a file.
Writing functions
Classes
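A small sketch of how file handling, functions and classes can work together is shown below. The class name, the function names and the tab-separated file layout are all hypothetical choices made for this example; temporary files are used so that it does not depend on any existing data.

```python
import os
import tempfile


class SequenceRecord:
    """A minimal container for one named sequence (not a full FASTA parser)."""

    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

    def gc_fraction(self):
        """Return the fraction of G and C bases, or 0.0 for an empty sequence."""
        gc = sum(1 for base in self.sequence if base in "GC")
        return gc / len(self.sequence) if self.sequence else 0.0


def read_records(path):
    """Read 'name<TAB>sequence' lines from a text file into SequenceRecord objects."""
    records = []
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            name, sequence = line.split("\t")
            records.append(SequenceRecord(name, sequence.upper()))
    return records


def write_report(records, path):
    """Write one 'name<TAB>GC fraction' line per record."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(f"{record.name}\t{record.gc_fraction():.2f}\n")


with tempfile.TemporaryDirectory() as folder:
    input_path = os.path.join(folder, "sequences.tsv")
    output_path = os.path.join(folder, "report.tsv")

    # Create a small input file with two invented sequences
    with open(input_path, "w", encoding="utf-8") as handle:
        handle.write("seq1\tATGC\nseq2\tggcc\n")

    records = read_records(input_path)
    write_report(records, output_path)

    with open(output_path, encoding="utf-8") as handle:
        report = handle.read()

print(report)
```

Using `with open(...)` closes each file automatically, even if an error occurs while reading or writing.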
Machine Learning:
Building good training datasets - Data preprocessing
Training and test datasets
KNN
Creating a model
Model validation (k-fold cross validation)
Hyperparameter tuning
Performance evaluation
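The following sketch shows, under simplifying assumptions, how KNN, k-fold cross-validation and hyperparameter tuning relate to each other. It is written in plain Python with an invented, well-separated dataset; the function names are hypothetical, and in practice a library such as scikit-learn would usually be used instead.

```python
from collections import Counter

# Made-up dataset: two features on comparable scales, two classes
data = [
    ([0.10, 0.20], "A"), ([0.15, 0.25], "A"), ([0.20, 0.10], "A"),
    ([0.25, 0.30], "A"), ([0.30, 0.20], "A"), ([0.12, 0.15], "A"),
    ([0.80, 0.90], "B"), ([0.85, 0.75], "B"), ([0.90, 0.80], "B"),
    ([0.75, 0.85], "B"), ([0.70, 0.95], "B"), ([0.88, 0.92], "B"),
]


def knn_predict(train, features, k):
    """Predict a label by majority vote among the k nearest training points."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    neighbours = sorted(train, key=lambda item: squared_distance(item[0], features))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def k_fold_accuracy(examples, k_neighbours, n_folds):
    """Mean validation accuracy over n_folds train/validation splits.

    Folds are built by taking every n_folds-th example; real data would
    normally be shuffled (and often stratified) first.
    """
    scores = []
    for fold in range(n_folds):
        validation = examples[fold::n_folds]
        training = [ex for i, ex in enumerate(examples) if i % n_folds != fold]
        correct = sum(
            knn_predict(training, features, k_neighbours) == label
            for features, label in validation
        )
        scores.append(correct / len(validation))
    return sum(scores) / len(scores)


# Hyperparameter tuning: compare several values of k by cross-validated accuracy
candidate_ks = [1, 3, 5]
cv_scores = {k: k_fold_accuracy(data, k, n_folds=3) for k in candidate_ks}
best_k = max(cv_scores, key=cv_scores.get)  # on ties, the first candidate wins

print(cv_scores, best_k)
```

Each example is used for validation exactly once, so the averaged score depends less on one particular split than a single train/test division would.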
EXAM SIMULATION (4h)
Prerequisites
A biological or biomedical background. Knowledge of a variety of biological data and questions.
For the Programming module, no previous experience is necessary. Familiarity with at least one of the main operating systems (Linux, macOS, Windows 10) is required.
Familiarity with the Google Suite is not required but is recommended.
Reference texts
Sebastian Raschka - Introduction to Machine Learning
https://sebastianraschka.com/resources/ml-lectures-1/
Sebastian Raschka, Yuxi Liu, Vahid Mirjalili
Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python (ISBN-10: 1801819319 ISBN-13: 978-1801819312)
Further learning materials (including slides, tutorials, videos, notes, extracts from textbooks, examples and scripts) will be provided by the teacher before and during the course.
Attendance
The course is fully practical. It is strongly recommended to attend class in order to acquire the skills required to successfully achieve learning outcomes and pass the exam.
Examination format
PYTHON PROGRAMMING
Remember/Comprehend
Students are given a piece of code on paper with numbered lines (a programme including one example for each topic of the course) and have to write, line by line, what it does, explaining and describing each Python instruction.
Apply/Analyse
Students are given a piece of pseudocode and have to write the corresponding Python code.
Evaluate/Create
When done, they copy the programme onto their computer (using Jupyter Notebook locally, with no Internet connection), then run and debug it. They have to report the errors they find (the type of error and why it occurs). Finally, they have to describe what the programme does, i.e. what the input is, how it is transformed or filtered, and what the output is.
MACHINE LEARNING
Discussion of a project: students will choose a dataset (e.g. from Kaggle) and build a complete ML pipeline in a Google Colab or Jupyter Notebook. The pipeline will be described and discussed in the oral part of the exam.
Teaching methods
Learning outcomes (LOs) will guide the design of learning experiences (LEs). For the achievement of each LO, the most appropriate LE(s) will be identified and planned.
Learning experiences will include, depending on the Bologna descriptors involved:
Very brief interactive lectures on the main concepts of programming and machine learning;
Participatory live coding sessions;
Hands-on coding sessions (individual or in pairs/groups) where students will have to independently use Python programming to solve data handling problems and implement ML algorithms;