PROGRAMMING AND MACHINE LEARNING FOR BIOLOGICAL DATA

Learning objectives

By the end of the course, students will be able to:
  • Run Python programmes
  • Store data in programmes
  • Use built-in functions
  • Detect syntax errors in programmes
  • Read tabular data
  • Visualise and statistically analyse tabular data
  • Plot biological data
  • Create functions
  • Repeat actions with loops
  • Make choices (conditionals)
  • Determine where errors occurred
  • Handle errors and exceptions
  • Make programmes readable
  • Use software written by other people
  • Recognise the various data formats used to represent DNA/RNA sequence data
  • Independently write Python scripts to:
    - Read sequence data using Python modules or BioPython
    - Analyse data files
    - Run external programmes
    - Read input from the command line
  • Describe a wide range of machine learning techniques
  • Recognise which machine learning method is applicable to a given data analysis problem
  • Transform biological data for ML applications, in particular converting sequence data into a computer-readable format for input into a machine learning pipeline
  • Pre-process biological sequence data for natural language processing
  • Build a Random Forest (RF) model to classify a set of sequences
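One of the objectives above is converting sequence data into a computer-readable format for an ML pipeline. A minimal sketch of one common way to do this, counting overlapping k-mers, is shown below. The helper names (`parse_fasta`, `kmer_counts`), the choice of k = 2 and the toy FASTA text are illustrative assumptions, not course material; for real files, a library such as BioPython would normally handle the parsing.

```python
from itertools import product


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict {record_id: sequence}."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the first word after '>'
            current_id = line[1:].split()[0]
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so join them together
            records[current_id] += line.upper()
    return records


def kmer_counts(sequence, k=2, alphabet="ACGT"):
    """Return a fixed-length list of overlapping k-mer counts."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # skip k-mers with ambiguous bases such as N
            counts[kmer] += 1
    return [counts[kmer] for kmer in kmers]


# Hypothetical toy input, invented for illustration
toy_fasta = """>seq1 toy example
ACGTAC
GT
>seq2
AAAAN
"""

records = parse_fasta(toy_fasta)
features = {name: kmer_counts(seq, k=2) for name, seq in records.items()}
for name, vector in features.items():
    print(name, vector)
```

Every sequence becomes a vector of the same length (16 values for k = 2), whatever its original length, which is the shape a classifier such as a Random Forest expects as input.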

Channel 1
ALLEGRA VIA

Syllabus - Attendance - Exams

Syllabus
Modern methods in biology, such as high-throughput sequencing, generate enormous amounts of data. To discover information in these data and to ask the right questions, new methods from data mining, artificial intelligence and deep learning need to be developed and applied.

The course is structured into three modules plus a final 4-hour session where a simulation of the exam will be conducted.

MODULE 1 - INTRODUCTION

Introduction to the course:
  • Understand how the class is composed
  • Presentation of the syllabus
  • Work environment

Python programming language: Python introduction
  • Introduce the Python language and the Python interpreter
  • Understand the many ways to run Python, with examples of running Python programmes/scripts
  • First Python commands and how they work in different environments (shell terminal, including the interactive interpreter, text editors for writing code to run in a shell terminal, code editors, IDEs and notebooks)
  • Work environment: Anaconda > Jupyter Lab, Jupyter notebooks. Google Colab

Machine Learning: Introduction to the very first concepts of ML
  • ML for biological data: usefulness and challenges
  • The relationship between AI and Machine Learning
  • Discuss why ML can be very helpful in biology and the differences between "traditional" programming and ML
  • Introduction to data science
  • How ML works
  • A brief history of data

MODULE 2 - FUNDAMENTALS

Python fundamentals:
  • Variables and assignment
  • Data types (numbers, strings, lists, tuples, dictionaries, sets)
  • Built-in functions vs methods
  • Importing modules and libraries
  • For loops
  • Conditionals
  • Anaconda developer environment (Jupyter notebook)
  • Plotting (matplotlib)
  • Simple debugging

Python libraries for data handling: numpy, pandas, matplotlib
  • Manipulate and analyse tabular data with pandas DataFrame

ML fundamentals:
  • Types of Machine Learning
  • Key concepts: supervised and unsupervised learning, classification, regression and clustering problems, classes and labels, training, validation and testing
The six steps of ML:
  #1 - Import the data
  #2 - Clean the data
  #3 - Split the data: training set/test set
  #4 - Create a model
  #5 - Check the output
  #6 - Improve

MODULE 3 - ADVANCED PROGRAMMING

Advanced Python:
  • File handling: introduce the different ways to read input from files and write output to a file
  • Writing functions
  • Classes

Machine Learning:
  • Building good training datasets - Data preprocessing
  • Training and test datasets
  • KNN
  • Creating a model
  • Model validation (k-fold cross validation)
  • Hyperparameter tuning
  • Performance evaluation

EXAM SIMULATION (4h)
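The six steps of ML listed in Module 2 can be sketched in a few lines of plain Python, using the KNN classifier named in Module 3. This is only an illustrative sketch: the toy dataset (GC fraction, length in kb, class label), the function names `knn_predict` and `accuracy`, and the candidate values of k are all assumptions made for the example. In practice a library such as scikit-learn (e.g. `train_test_split`, `KNeighborsClassifier`) would replace the hand-written parts.

```python
import random
from collections import Counter

# Step 1 - Import the data
# Hypothetical toy rows: (GC fraction, length in kb, class label)
raw_rows = [
    (0.30, 1.0, "A"), (0.32, 1.2, "A"), (0.28, 0.9, "A"),
    (0.35, 1.1, "A"), (0.31, 1.3, "A"), (0.29, 1.0, "A"),
    (0.60, 2.0, "B"), (0.62, 2.2, "B"), (0.58, 1.9, "B"),
    (0.65, 2.1, "B"), (0.61, 2.3, "B"), (0.59, 2.0, "B"),
    (None, 1.5, "A"),  # a row with a missing value
]

# Step 2 - Clean the data: drop rows that contain missing values
clean_rows = [row for row in raw_rows if None not in row]

# Step 3 - Split the data into a training set and a test set (75% / 25%)
rng = random.Random(0)  # fixed seed so the split is reproducible
shuffled = clean_rows[:]
rng.shuffle(shuffled)
n_test = len(shuffled) // 4
test_rows, train_rows = shuffled[:n_test], shuffled[n_test:]


# Step 4 - Create a model: k-nearest neighbours with majority voting
def knn_predict(train, point, k):
    """Predict the label of `point` from its k nearest training rows."""
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[:-1], point))

    neighbours = sorted(train, key=squared_distance)[:k]
    votes = Counter(row[-1] for row in neighbours)
    return votes.most_common(1)[0][0]


# Step 5 - Check the output: fraction of test rows predicted correctly
def accuracy(train, test, k):
    correct = sum(knn_predict(train, row[:-1], k) == row[-1] for row in test)
    return correct / len(test)


# Step 6 - Improve: compare a few values of the hyperparameter k
scores = {k: accuracy(train_rows, test_rows, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print("accuracy per k:", scores)
print("selected k:", best_k)
```

The sketch tunes k on the test set only to keep it short. The k-fold cross validation listed in Module 3 is the more reliable way to choose hyperparameters, and real features would usually be scaled before computing distances.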
Prerequisites
A biological or biomedical background. Knowledge of a variety of biological data and questions. For the Programming module, no previous experience is necessary. Familiarity with at least one of the main operating systems (Linux, macOS, Windows 10) is required. Familiarity with the Google Suite is not a prerequisite, but it is advised.
Reference texts
  • Sebastian Raschka - Introduction to Machine Learning. https://sebastianraschka.com/resources/ml-lectures-1/
  • Sebastian Raschka, Yuxi Liu, Vahid Mirjalili - Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python (ISBN-10: 1801819319, ISBN-13: 978-1801819312)
Further learning materials (including slides, tutorials, videos, notes, textbook extracts, examples and scripts) will be provided by the teacher before and during the course.
Attendance
The course is fully practical. Attending class is strongly recommended in order to acquire the skills needed to achieve the learning outcomes and pass the exam.
Exam format
PYTHON PROGRAMMING
  • Remember/Comprehend: Students are given a piece of code on paper with numbered lines (a programme including one example for each topic of the course) and have to write, line by line, what it does, explaining and describing each Python instruction.
  • Apply/Analyse: Students are given a piece of pseudocode and have to write the corresponding code.
  • Evaluate/Create: When done, they copy the programme onto their computer (using Jupyter Notebook locally, with no Internet connection), then run and debug it. They have to write what the errors are (type of error and why it occurs). Finally, they have to describe what the programme does, i.e. what the input is, how it gets transformed/filtered and what the output is.

MACHINE LEARNING
Discussion of a project: students will choose a dataset, e.g. from Kaggle, and build a whole ML pipeline in a Google Colab or Jupyter Notebook. The pipeline will be described and discussed in the oral part of the test.
Teaching methods
Learning outcomes (LOs) will guide the design of learning experiences (LEs). For the achievement of each LO, the most appropriate LE(s) will be identified and planned. Depending on the Bologna descriptors involved, learning experiences will include:
  • Very brief interactive lectures on the main concepts of programming and machine learning
  • Participatory live coding sessions
  • Hands-on coding sessions (individual or in pairs/groups) where students will have to independently use Python programming to solve data handling problems and implement ML algorithms
  • Course code: 10611803
  • Academic year: 2025/2026
  • Degree programme: Biochemistry – Biochimica
  • Curriculum: Single curriculum
  • Year: 2nd year
  • Semester: 1st semester
  • SSD (scientific-disciplinary sector): BIO/10
  • CFU (credits): 6