FOUNDATIONS OF DATA SCIENCE

Learning objectives

General objectives: Acquire the foundations of data science and machine learning.
Specific objectives: Make students aware of the theoretical and practical tools of data science and machine learning, as well as their intrinsic limits; enable students to tackle real-world problems with the most appropriate tools.
Knowledge and understanding: The course provides the basic notions, techniques, and methodologies used in data science and machine learning. It also provides the programming basics needed to apply the theory to real cases.
Applying knowledge and understanding: By the end of the course, students will be able to tackle concrete data science problems, from their formalization through to data manipulation with appropriate software tools.
Critical and judgment skills: Students will be able to choose the techniques to apply to a specific case and to evaluate their performance.
Communication skills: Students will be able to represent and communicate the information extracted from data through the sensible use of charts and indicators.
Learning skills: Students will be equipped to learn both theoretical and practical notions of the field independently.

Channel 1
Instructor: MATTEO CINELLI

Syllabus - Attendance - Exams

Syllabus
The course covers three main themes.
Machine Learning Foundations: Datasets and their representation (6h), Linear Regression with bias-variance trade-off and regularization (7h), Classification, Calibration, and Performance Evaluation (6h), Non-Parametric models: K-NN, Decision Trees, Random Forest, and XGBoost (5h), Neural Networks and Backpropagation (4h), Image Representation and Convolution (3h), CNNs and other Network Components (5h), Autoencoders and Variational Inference (5h), Text Representation, Self-Attention, and Transformers (3h), Multimodal Machine Learning (2h).
Complex Networks and Network Science: Introduction to Network Data and Structural Properties of Networks (10h), Generative Models of Network Formation (7h), Mechanistic Models of Network Formation (5h), Community Detection and Graph Clustering Methods (8h).
Programming and Practice: Each objective will be addressed theoretically and through practical programming exercises with Python.
Prerequisites
Calculus and Linear Algebra, including taking derivatives and understanding matrix-vector operations and their notation. Basic Probability and Statistics, including the basics of probability, Gaussian distributions, mean, and standard deviation.
Reference texts
Data Science:
  • Bertsimas, O'Hair, Pulleyblank. The Analytics Edge.
Machine Learning:
  • Christopher M. Bishop, 2006. Pattern Recognition and Machine Learning.
  • Deisenroth, Faisal, Ong, 2020. Mathematics for Machine Learning (available at: https://mml-book.github.io/)
Deep Learning:
  • S. Prince, 2023. Understanding Deep Learning. MIT Press.
  • Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2017. Deep Learning (available at: https://www.deeplearningbook.org/)
  • Zhang, Lipton, Li, Smola, 2019. Dive into Deep Learning (interactive book and code at: http://d2l.ai/index.html)
Image Analysis and Recognition, Computer Vision:
  • Richard Szeliski, 2010. Computer Vision: Algorithms and Applications (available at: http://szeliski.org/Book)
Network Science:
  • Mark Newman. Networks, 2nd edn (Oxford, 2018; online edn, Oxford Academic, 18 Oct. 2018), https://doi.org/10.1093/oso/9780198805090.001.0001
  • A.-L. Barabási, M. Pósfai, 2016. Network Science. Cambridge: Cambridge University Press. ISBN: 9781107076266, 1107076269
Python:
  • Allen B. Downey, 2015. Think Python: How to Think Like a Computer Scientist (available at: https://www.greenteapress.com/thinkpython/thinkpython.html)
  • Jake VanderPlas, 2016. Python Data Science Handbook: Essential Tools for Working with Data (book and notebooks available at: https://github.com/jakevdp/PythonDataScienceHandbook)
Attendance
In-person lectures
Exam format
The course assessment is structured into three distinct components, each designed to assess different skills:
1) Theory (34%): A 30-minute multiple-choice exam assessing conceptual understanding of the fundamental principles of data science and machine learning.
2) Practice (33%): Focused on practical application, in groups of 3-5 students:
a) Programming assignments (16.5%): Two Python assignments focused on technical implementation.
b) Final project and presentation (16.5%): Application to a real-world problem, from model design to stakeholder communication.
3) Network Science Lab (33%): A 30-minute multiple-choice exam assessing mastery of network science concepts.
Delivery mode
In-person lectures
Instructor: INDRO SPINELLI

Syllabus - Attendance - Exams

Syllabus
Machine Learning Foundations: Datasets and their representation (6h), Linear Regression with bias-variance trade-off and regularization (7h), Classification, Calibration, and Performance Evaluation (6h), Non-Parametric models: K-NN, Decision Trees, Random Forest, and XGBoost (5h), Neural Networks and Backpropagation (4h), Image Representation and Convolution (3h), CNNs and other Network Components (5h), Autoencoders and Variational Inference (5h), Text Representation, Self-Attention, and Transformers (3h), Multimodal Machine Learning (2h). Programming and Practice: Each objective will be addressed theoretically and through practical programming exercises with Python.
Prerequisites
Calculus and Linear Algebra, including taking derivatives and understanding matrix-vector operations and their notation. Basic Probability and Statistics, including the basics of probability, Gaussian distributions, mean, and standard deviation.
Reference texts
Data Science:
  • Bertsimas, O'Hair, Pulleyblank. The Analytics Edge.
Machine Learning:
  • Christopher M. Bishop, 2006. Pattern Recognition and Machine Learning.
  • Deisenroth, Faisal, Ong, 2020. Mathematics for Machine Learning (available at: https://mml-book.github.io/)
Deep Learning:
  • S. Prince, 2023. Understanding Deep Learning. MIT Press.
  • Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2017. Deep Learning (available at: https://www.deeplearningbook.org/)
  • Zhang, Lipton, Li, Smola, 2019. Dive into Deep Learning (interactive book and code at: http://d2l.ai/index.html)
Image Analysis and Recognition, Computer Vision:
  • Richard Szeliski, 2010. Computer Vision: Algorithms and Applications (available at: http://szeliski.org/Book)
Python:
  • Allen B. Downey, 2015. Think Python: How to Think Like a Computer Scientist (available at: https://www.greenteapress.com/thinkpython/thinkpython.html)
  • Jake VanderPlas, 2016. Python Data Science Handbook: Essential Tools for Working with Data (book and notebooks available at: https://github.com/jakevdp/PythonDataScienceHandbook)
Attendance
Please refer to the degree course regulations.
Exam format
The course assessment is structured into two components, each designed to assess different skills:
1) Theory (50%): A 30-minute multiple-choice exam assessing conceptual understanding of the fundamental principles of data science and machine learning.
2) Practice (50%): Focused on hands-on application, in groups of 3-5 students:
a) Coding assignments (25%): Python-based tasks emphasizing technical implementation.
b) Final project and presentation (25%): A real-world problem, from model design to stakeholder communication.
Delivery mode
Traditional in-person lectures
  • Course code: 1047627
  • Academic year: 2025/2026
  • Degree programme: Computer Science - Informatica
  • Curriculum: Single curriculum
  • Year: 2nd year
  • Semester: 1st semester
  • SSD: INF/01
  • CFU: 6