PROGRAMMING AND MACHINE LEARNING FOR BIOLOGICAL DATA

Learning objectives

By the end of the course, students will be able to:
- Run Python programs
- Store data in programs
- Use built-in functions
- Detect syntax errors in programs
- Read tabular data
- Visualise and statistically analyse tabular data
- Plot biological data
- Create functions
- Repeat actions with loops
- Make choices with conditionals
- Determine where errors occurred
- Handle errors and exceptions
- Make programs readable
- Use software written by other people
- Recognise the various data formats used to represent DNA/RNA sequence data
- Independently write Python scripts to:
  - Read sequence data using Python modules or BioPython
  - Parse data files
  - Run external programs
  - Read input from the command line
- Describe a wide range of machine learning techniques
- Recognise which machine learning method is applicable to a given data analysis problem
- Transform biological data for ML applications; in particular, transform sequence data into a machine-readable format for input into a machine learning pipeline
- Pre-process biological sequence data for natural language processing
- Create a Random Forest (RF) model to classify a set of sequences
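As an illustration of the objective on turning sequence data into a machine-readable format, the sketch below counts k-mers (overlapping substrings of length k) so that every DNA sequence becomes a numeric vector of the same length. This is only one possible encoding and is not taken from the course materials; the function name `kmer_counts`, the choice `k=2`, and the example sequence are assumptions made for illustration.

```python
from itertools import product


def kmer_counts(sequence, k=2, alphabet="ACGT"):
    """Return a fixed-length list of k-mer counts for one DNA sequence.

    Hypothetical helper for illustration; not part of any library.
    """
    # Enumerate every possible k-mer in a fixed order so that all
    # sequences map to vectors with the same layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    sequence = sequence.upper()
    # Slide a window of width k along the sequence.
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    return [counts[kmer] for kmer in kmers]


features = kmer_counts("ACGTAC", k=2)
print(len(features))  # number of possible 2-mers over A, C, G, T
print(sum(features))  # number of windows counted in the sequence
```

A vector like `features` can then be used as one row of the input table for a classifier such as a Random Forest.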

Channel 1
Instructor: ALLEGRA VIA

Syllabus - Attendance - Exams

Syllabus
Modern methods in biology, such as high-throughput sequencing, generate enormous amounts of data. To discover information in these data and to ask the right questions, new methods from data mining, artificial intelligence and deep learning need to be developed and applied.

The course is structured into four modules plus a final 4-hour session in which an exam simulation is conducted:

MODULE 1
Python introduction:
- Introduction to Python programming
- Use replit.com to write first Python code: variable assignment, print(), input()
- Write and run simple Python scripts
- Understand the many ways to run Python
- Get a first look at the development environment
Introduction to ML:
- Introduction to the very first concepts of ML
- ML for biological data: usefulness and challenges
- The relationship between AI and Machine Learning
- Introduction to data science
- How ML works
- A brief history of data

MODULE 2
Python fundamentals:
- Variables and assignment
- Data types (numbers, strings, lists, tuples, dictionaries, sets)
- Built-in functions vs methods
- Importing modules and libraries
- For loops
- Conditionals
- Anaconda development environment (Jupyter Notebook)
- Plotting (matplotlib)
- Simple debugging
- Python libraries for data handling: numpy, pandas, matplotlib
- Manipulate and analyse tabular data with pandas DataFrame
ML fundamentals:
- Types of Machine Learning
- Key concepts: supervised and unsupervised learning; classification, regression and clustering problems; classes and labels; training, validation and testing
- The six steps of ML:
  #1 - Import the data
  #2 - Clean the data
  #3 - Split the data (training set / test set)
  #4 - Create a model
  #5 - Check the output
  #6 - Improve

MODULE 3
Advanced Python:
- Writing functions
- Debugging and error handling
- Introduction to classes
- Reading and writing pseudocode
Traditional ML, basic principles of:
- Classification and regression models
- Clustering models
- Dimensionality reduction
- Loss or cost functions
- Parameters and hyperparameters

MODULE 4
Python programming and Machine Learning:
- Scikit-Learn to implement classification and regression models
- Feature selection
- Data encoding
- Training and testing
- Performance assessment

EXAM SIMULATION (4h)
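The six steps of ML listed in Module 2 can be sketched end to end with the Python standard library alone. This is a minimal illustration under stated assumptions, not the course's own code: the CSV data are invented, the names `RAW_CSV`, `fit_centroids` and `predict` are hypothetical, and a simple nearest-centroid classifier stands in for the Scikit-Learn models covered in Module 4.

```python
import csv
import io
import math
import random

# Invented toy data: two numeric features and a class label per row.
RAW_CSV = """gc_content,length,label
0.62,1200,coding
0.58,1500,coding
0.60,,coding
0.65,1100,coding
0.61,1300,coding
0.35,400,noncoding
0.40,350,noncoding
0.38,500,noncoding
0.33,450,noncoding
0.36,420,noncoding
"""

# Step 1 - Import the data (an in-memory string stands in for a file).
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Step 2 - Clean the data: drop rows with missing values, convert to numbers.
samples = [
    ([float(r["gc_content"]), float(r["length"])], r["label"])
    for r in rows
    if all(r.values())
]

# Step 3 - Split the data: hold out one randomly chosen sample per class.
rng = random.Random(0)  # fixed seed so the split is reproducible
train, test = [], []
for label in sorted({lab for _, lab in samples}):
    group = [s for s in samples if s[1] == label]
    rng.shuffle(group)
    test.append(group[0])
    train.extend(group[1:])


# Step 4 - Create a model: one centroid (mean feature vector) per class.
def fit_centroids(data):
    centroids = {}
    for label in {lab for _, lab in data}:
        vectors = [x for x, lab in data if lab == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*vectors)]
    return centroids


def predict(centroids, x):
    # Assign the class whose centroid is closest in Euclidean distance.
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))


model = fit_centroids(train)

# Step 5 - Check the output: accuracy on the held-out test set.
correct = sum(predict(model, x) == lab for x, lab in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")

# Step 6 - Improve: for example, rescale the features so that 'length'
# does not dominate the distance, or collect more data and re-evaluate.
```

With Scikit-Learn, steps 3 to 5 would typically be replaced by the library's own splitting, model and scoring utilities; the overall sequence of steps stays the same.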
Prerequisites
A biological or biomedical background and knowledge of a variety of biological data and questions. No previous experience is necessary for the programming modules. Familiarity with at least one of the main operating systems (Linux, macOS, Windows 10) is required. Completing Module 4 (Python programming and Machine Learning) requires having achieved the learning outcomes of Modules 1-3.
Reference texts
Learning materials (including slides, tutorials, videos, notes, textbook extracts, examples and scripts) will be provided by the teacher before and during the course.
Attendance
The course is fully practical. Attending class is strongly recommended in order to acquire the skills required to achieve the learning outcomes and pass the exam.
Exam format
- Remember/Comprehend: Students are given a piece of code on paper with numbered lines (a programme that includes one example for each topic of the course) and have to write, line by line, what it does, explaining and describing each Python instruction.
- Apply/Analyse: Students are given a piece of pseudocode and have to write the corresponding code.
- Evaluate/Create: When done, students copy the programme onto their computer (using Jupyter Notebook locally, with no Internet connection), then run and debug it. They have to write what the errors are (the type of error and why it occurs). Finally, they have to describe what the programme does, i.e. what the input is, how it gets transformed or filtered, and what the output is, and explain why the ML algorithm was chosen.
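To give a sense of what reporting "the type of error and why it occurs" can look like in practice, here is a small hedged sketch. It is not an actual exam exercise: the functions `gc_fraction` and `describe_failure` and the inputs are invented for illustration.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    gc = sum(base in "GC" for base in sequence.upper())
    return gc / len(sequence)


def describe_failure(func, argument):
    """Call func(argument) and report the type of error, if any."""
    try:
        return f"ok: {func(argument)}"
    except Exception as error:  # broad on purpose: we only want to name the error
        return f"{type(error).__name__}: {error}"


print(describe_failure(gc_fraction, "GGCC"))  # valid input
print(describe_failure(gc_fraction, ""))      # empty string: division by zero
print(describe_failure(gc_fraction, 42))      # an int has no .upper() method
```

The second call fails with a ZeroDivisionError because the sequence length is 0, and the third with an AttributeError because the argument is not a string. Both are runtime exceptions; a syntax error, by contrast, is reported before any line of the programme runs.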
Teaching methods
Learning outcomes (LOs) will guide the design of learning experiences (LEs). For each LO, the most appropriate LE(s) will be identified and planned. Depending on the Bologna descriptors involved, learning experiences will include:
- Very brief interactive lectures on the main concepts of programming and machine learning
- Participatory live coding sessions
- Hands-on coding sessions (individual or in pairs/groups) in which students independently use Python to solve data handling problems and implement ML algorithms
  • Course code: 10611803
  • Academic year: 2024/2025
  • Degree programme: Neurobiologia - Neurobiology
  • Curriculum: Single curriculum
  • Year: 2nd year
  • Semester: 2nd semester
  • SSD (scientific-disciplinary sector): BIO/10
  • CFU (credits): 6