Programme
Modern methods in biology, such as high-throughput sequencing, generate enormous amounts of data. To extract information from these data and to ask the right questions, new methods from data mining, artificial intelligence and deep learning need to be developed and applied.
The course is structured into four modules, plus a final 4-hour session in which an exam simulation is held:
MODULE 1
Python introduction:
- Introduction to Python programming
- Use replit.com to write first Python code: variable assignment, print(), input()
- Write and run simple Python scripts
- Understand the many ways to run Python
- Get a first look at the development environment
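As an illustration of the first topics listed above, a first script might look like the minimal sketch below. The DNA string and variable names are made up for this example; they are not taken from the course materials.

```python
# Variable assignment: a name on the left, a value on the right.
sequence = "ATGCGTAA"        # made-up DNA string, for illustration only
length = len(sequence)       # len() is a built-in function

# print() writes text to the screen.
message = f"The sequence {sequence} has {length} bases."
print(message)

# input() pauses and waits for the user to type something.
# It is left commented out here so the script runs without interaction:
# name = input("What is your name? ")
# print("Hello,", name)
```

Removing the comment markers from the last two lines makes the script interactive, which is a convenient first experiment on replit.com.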
Introduction to ML
- Introduction to the very first concepts of ML
- ML for biological data: usefulness and challenges
- The relationship between AI and Machine Learning
- Introduction to data science
- How ML works
- A brief history of data
MODULE 2
Python fundamentals
- Variables and assignment
- Data types (numbers, strings, lists, tuples, dictionaries, sets)
- Built-in functions vs methods
- Importing modules and libraries
- For loops
- Conditionals
- Anaconda development environment (Jupyter Notebook)
- Plotting (matplotlib)
- Simple debugging
- Python Libraries for data handling: numpy, pandas, matplotlib
- Manipulate and analyse tabular data with pandas DataFrame
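To give an idea of what handling tabular data with a pandas DataFrame involves, here is a small sketch. The table contents and column names are invented for illustration and assume nothing about the datasets used in class.

```python
import pandas as pd

# Made-up measurements, for illustration only.
df = pd.DataFrame({
    "sample": ["s1", "s2", "s3", "s4"],
    "condition": ["control", "control", "treated", "treated"],
    "expression": [2.0, 4.0, 6.0, 10.0],
})

# Select the rows that satisfy a condition.
treated = df[df["condition"] == "treated"]

# Summarise one column per group.
mean_by_condition = df.groupby("condition")["expression"].mean()
print(mean_by_condition)
```

Filtering rows with a boolean condition and summarising with `groupby` cover a large share of everyday data handling tasks.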
ML fundamentals:
- Types of Machine Learning
- Key concepts: supervised and unsupervised learning, classification, regression and clustering problems, classes and labels, training, validation and testing.
- The six steps of ML:
#1 - Import the data
#2 - Clean the data
#3 - Split the data into a training set and a test set
#4 - Create a Model
#5 - Check the output
#6 - Improve
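The six steps above can be sketched with scikit-learn as follows. This is only one possible reading of the steps: the iris dataset bundled with scikit-learn, the k-nearest-neighbours classifier and the specific parameter values are assumptions chosen for illustration, not the course's prescribed choices.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1 - Import the data (a small dataset bundled with scikit-learn).
data = load_iris(as_frame=True).frame

# 2 - Clean the data (here: drop rows with missing values, if any).
data = data.dropna()
X = data.drop(columns="target")
y = data["target"]

# 3 - Split the data into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 4 - Create a model and train it on the training set only.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# 5 - Check the output on data the model has not seen.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")

# 6 - Improve: compare hyperparameter values using cross-validation
# on the training set, so the test set stays untouched.
for k in (1, 5, 15):
    scores = cross_val_score(
        KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5
    )
    print(f"k={k}: mean cross-validation accuracy {scores.mean():.2f}")
```

Keeping the test set out of step 6 is a design choice: if hyperparameters are tuned against the test set, the accuracy reported in step 5 stops being an honest estimate.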
MODULE 3
Advanced Python:
- Writing functions
- Debugging and error handling
- Introduction to classes
- Reading and writing pseudocode
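The topics in this list can be seen together in a short sketch: pseudocode turned into a function, an error raised and handled, and a small class. The names `gc_content` and `Sample` are hypothetical, chosen for this example only.

```python
# Pseudocode:
#   if the sequence is empty, report an error
#   count the G and C bases
#   return the count divided by the sequence length

def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    return gc / len(sequence)


class Sample:
    """A named sequence that can report its own GC content."""

    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

    def gc(self):
        return gc_content(self.sequence)


# Error handling: catch the error instead of letting the script stop.
try:
    gc_content("")
except ValueError as error:
    print("Caught an error:", error)

sample = Sample("s1", "ATGC")
print(sample.name, sample.gc())
```

Raising a `ValueError` for an empty sequence, rather than letting the division fail with a `ZeroDivisionError`, gives the caller a message that names the actual problem.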
Traditional ML:
Basic principles of:
- classification and regression models
- clustering models
- dimensionality reduction
- loss or cost functions
- parameters and hyperparameters
MODULE 4
Python programming and Machine Learning
- Scikit-Learn to implement classification and regression models
- Feature selection
- Data encoding
- Training and testing
- Performance assessment
EXAM SIMULATION (4h)
Prerequisites
A biological or biomedical background. Knowledge of a variety of biological data and questions.
For the programming modules, no previous experience is necessary. Familiarity with at least one of the main operating systems (Linux, macOS, Windows 10) is required.
Completing Module 4 (Python programming and Machine Learning) requires having achieved the learning outcomes of Modules 1-3.
Reference texts
Learning materials (including slides, tutorials, videos, notes, extracts from textbooks, examples and scripts) will be provided by the teacher before and during the course.
Attendance
The course is fully practical. Attending class is strongly recommended in order to acquire the skills needed to achieve the learning outcomes and pass the exam.
Exam format
Remember/Comprehend
Students are given a piece of code on paper with numbered lines (a programme including one example for each topic of the course) and have to write, line by line, what it does, explaining and describing each Python instruction.
Apply/Analyse
Students are given a piece of pseudocode and have to write the corresponding code.
Evaluate/Create
When done, they copy the programme onto their computer (using Jupyter Notebook locally, with no Internet connection), then run and debug it. They have to write down what the errors are (the type of error and why it occurs). Finally, they have to describe what the programme does, i.e. what the input is, how it is transformed or filtered, and what the output is. They also have to explain why the ML algorithm was chosen.
Teaching methods
Learning outcomes (LOs) will guide the design of learning experiences (LEs). For the achievement of each LO, the most appropriate LE(s) will be identified and planned.
Learning experiences will include, depending on the Bologna descriptors involved:
Very brief interactive lectures on the main concepts of programming and machine learning;
Participatory live coding sessions;
Hands-on coding sessions (individual or in pairs/groups) where students will have to independently use Python programming to solve data handling problems and implement ML algorithms;