Fundamentals of Data Science

Learning objectives

General objectives: This course introduces the foundational tools of data science, combining machine learning, statistical modeling, and network science to explore real-world data in its structural and dynamic complexity. It equips students to treat data as a strategic asset by bringing together Python programming, data analysis, machine learning, and approaches from complex systems, leading to a more interpretive and systemic understanding of data. Through industry-standard methods, participants will learn to analyze datasets, uncover meaningful patterns, and produce accurate predictions. The curriculum provides the skills to design discriminative models for classification and regression, and generative models for tasks such as data synthesis and significance evaluation.

Specific objectives: The course is built around three core dimensions.
Machine Learning Foundations:
  • Datasets and their representation (6h)
  • Linear Regression with bias-variance trade-off and regularization (7h)
  • Classification, Calibration, and Performance Evaluation (6h)
  • Non-Parametric models: K-NN, Decision Trees, Random Forest, and XGBoost (5h)
  • Neural Networks and Backpropagation (4h)
  • Image Representation and Convolution (3h)
  • CNNs and other Network Components (5h)
  • Autoencoders and Variational Inference (5h)
  • Text Representation, Self-Attention, and Transformers (3h)
  • Multimodal Machine Learning (2h)
Complex Networks and Network Science:
  • Introduction to Network Data and Structural Properties of Networks (10h)
  • Generative Models of Network Formation (7h)
  • Mechanistic Models of Network Formation (5h)
  • Community Detection and Graph Clustering Methods (8h)
Programming and Practice: Each objective is addressed both theoretically and through practical programming exercises in Python.

Knowledge and understanding: This course provides a comprehensive introduction to the foundational concepts, theories, techniques, and methodologies of data science. It explains the core principles behind the discipline and critically examines their inherent limitations. The course also highlights practical applications through focused case studies in computer vision and network science, giving students a well-rounded understanding of both theory and practice.

Applying knowledge and understanding: By the end of the course, students will be able to tackle real-world data science challenges by translating complex phenomena into formal analytical and machine learning frameworks. They will be able to select and apply appropriate algorithms, refine models, and extract actionable insights from data across domains. The curriculum covers the full data science workflow: data acquisition, representation, preprocessing, and exploratory analysis, followed by model training, tuning, evaluation, and deployment. Throughout, the course systematically develops the advanced programming and modeling skills that are indispensable for today's data scientist.

Critical and judgment skills: Students will develop the ability to analyze real-world challenges and select the most suitable data science techniques by weighing data characteristics, computational constraints, and domain-specific objectives. They will evaluate their solutions and models using quantitative metrics in order to make informed, context-driven decisions that balance technical excellence with broader societal impact.

Communication skills: Students will learn to present and communicate data-driven insights effectively, using well-designed visualizations and key performance indicators. They will learn to articulate their analytical solutions rigorously and to explain the structure of their code systematically. This emphasis on communication is reinforced through a final project presentation and an interactive discussion session, so that students can convey complex technical concepts clearly to both technical and non-technical audiences.

Learning ability: Students will be able to continue learning both the theory and the practice of the field autonomously, in order to tackle new problems in data analysis, machine learning, computer vision, and network science.

Channel 1
MATTEO CINELLI

Syllabus
The course is built around three core dimensions.
Machine Learning Foundations:
  • Datasets and their representation (6h)
  • Linear Regression with bias-variance trade-off and regularization (7h)
  • Classification, Calibration, and Performance Evaluation (6h)
  • Non-Parametric models: K-NN, Decision Trees, Random Forest, and XGBoost (5h)
  • Neural Networks and Backpropagation (4h)
  • Image Representation and Convolution (3h)
  • CNNs and other Network Components (5h)
  • Autoencoders and Variational Inference (5h)
  • Text Representation, Self-Attention, and Transformers (3h)
  • Multimodal Machine Learning (2h)
Complex Networks and Network Science:
  • Introduction to Network Data and Structural Properties of Networks (10h)
  • Generative Models of Network Formation (7h)
  • Mechanistic Models of Network Formation (5h)
  • Community Detection and Graph Clustering Methods (8h)
Programming and Practice: Each objective is addressed both theoretically and through practical programming exercises in Python.
Prerequisites
Calculus and linear algebra, including computing derivatives and understanding matrix-vector operations and their notation. Fundamentals of probability and statistics, including basic probability concepts, Gaussian distributions, mean, and standard deviation.
Reference texts
Data Science:
  • Bertsimas, O'Hair, Pulleyblank. The Analytics Edge.
Machine Learning:
  • Christopher M. Bishop, 2006. Pattern Recognition and Machine Learning.
  • Deisenroth, Faisal, Ong, 2020. Mathematics for Machine Learning (available at: https://mml-book.github.io/)
Deep Learning:
  • S. Prince, 2023. Understanding Deep Learning. MIT Press.
  • Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2017. Deep Learning (available at: https://www.deeplearningbook.org/)
  • Zhang, Lipton, Li, Smola, 2019. Dive into Deep Learning (interactive book and code at: http://d2l.ai/index.html)
Image Analysis and Recognition, Computer Vision:
  • Richard Szeliski, 2010. Computer Vision: Algorithms and Applications (available at: http://szeliski.org/Book)
Network Science:
  • Mark Newman, 2018. Networks, 2nd edn. Oxford (online edn, Oxford Academic, 18 Oct. 2018), https://doi.org/10.1093/oso/9780198805090.001.0001
  • A.-L. Barabási, M. Pósfai, 2016. Network Science. Cambridge: Cambridge University Press. ISBN: 9781107076266 (ISBN-10: 1107076269)
Python:
  • Allen B. Downey, 2015. Think Python: How to Think Like a Computer Scientist (available at: https://www.greenteapress.com/thinkpython/thinkpython.html)
  • Jake VanderPlas, 2016. Python Data Science Handbook: Tools and Techniques for Developers: Essential Tools for Working with Data (book and notebooks available at: https://github.com/jakevdp/PythonDataScienceHandbook)
Attendance
Traditional in-person lectures
Exam format
The course assessment is carefully structured into three distinct components, each designed to evaluate different skills:
1) Theory (34%): A 30-minute multiple-choice exam assesses conceptual understanding of the fundamental principles of data science and machine learning.
2) Practice (33%): Focused on hands-on application, carried out in groups of 3-5 students:
   a) Programming assignments (16.5%): Two Python assignments focused on technical implementation.
   b) Final project and presentation (16.5%): Application to a real-world problem, from model design to communication with stakeholders.
3) Network Science Laboratory (33%): A 30-minute multiple-choice exam assesses mastery of network science concepts.
Teaching mode
Traditional in-person lectures
INDRO SPINELLI

Syllabus
The course is built around three core dimensions.
Machine Learning Foundations:
  • Datasets and their representation (6h)
  • Linear Regression with bias-variance trade-off and regularization (7h)
  • Classification, Calibration, and Performance Evaluation (6h)
  • Non-Parametric models: K-NN, Decision Trees, Random Forest, and XGBoost (5h)
  • Neural Networks and Backpropagation (4h)
  • Image Representation and Convolution (3h)
  • CNNs and other Network Components (5h)
  • Autoencoders and Variational Inference (5h)
  • Text Representation, Self-Attention, and Transformers (3h)
  • Multimodal Machine Learning (2h)
Complex Networks and Network Science:
  • Introduction to Network Data and Structural Properties of Networks (10h)
  • Generative Models of Network Formation (7h)
  • Mechanistic Models of Network Formation (5h)
  • Community Detection and Graph Clustering Methods (8h)
Programming and Practice: Each objective is addressed both theoretically and through practical programming exercises in Python.
Prerequisites
Calculus and linear algebra, including computing derivatives and understanding matrix-vector operations and their notation. Basic probability and statistics, including probability fundamentals, Gaussian distributions, mean, and standard deviation.
Reference texts
Data Science:
  • Bertsimas, O'Hair, Pulleyblank. The Analytics Edge.
Machine Learning:
  • Christopher M. Bishop, 2006. Pattern Recognition and Machine Learning.
  • Deisenroth, Faisal, Ong, 2020. Mathematics for Machine Learning (available at: https://mml-book.github.io/)
Deep Learning:
  • S. Prince, 2023. Understanding Deep Learning. MIT Press.
  • Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2017. Deep Learning (available at: https://www.deeplearningbook.org/)
  • Zhang, Lipton, Li, Smola, 2019. Dive into Deep Learning (interactive book and code at: http://d2l.ai/index.html)
Image Analysis and Recognition, Computer Vision:
  • Richard Szeliski, 2010. Computer Vision: Algorithms and Applications (available at: http://szeliski.org/Book)
Network Science:
  • Mark Newman, 2018. Networks, 2nd edn. Oxford (online edn, Oxford Academic, 18 Oct. 2018), https://doi.org/10.1093/oso/9780198805090.001.0001
  • A.-L. Barabási, M. Pósfai, 2016. Network Science. Cambridge: Cambridge University Press. ISBN: 9781107076266 (ISBN-10: 1107076269)
Python:
  • Allen B. Downey, 2015. Think Python: How to Think Like a Computer Scientist (available at: https://www.greenteapress.com/thinkpython/thinkpython.html)
  • Jake VanderPlas, 2016. Python Data Science Handbook: Tools and Techniques for Developers: Essential Tools for Working with Data (book and notebooks available at: https://github.com/jakevdp/PythonDataScienceHandbook)
Attendance
Please refer to the degree course regulations.
Exam format
The evaluation for the course is structured into three distinct components, each designed to assess different skill sets and areas of competence:
1) Theory (34%): A 30-minute multiple-choice exam evaluates core conceptual understanding, testing fluency in foundational data science and machine learning principles.
2) Practice (33%): Focused on hands-on application, carried out in groups of 3-5 students and combining:
   a) Coding assignments (16.5%): Two Python-based tasks emphasizing technical implementation.
   b) Final project and presentation (16.5%): A real-world problem, from model design to stakeholder communication.
3) Network Science Laboratory (33%): A specialized 30-minute multiple-choice exam assesses mastery of network science concepts.
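The weighting above can be illustrated with a small sketch that combines component scores into one final grade. It assumes, hypothetically, that every component is scored on the same 0-30 scale (the usual Italian university scale, which the course description does not state) and that the final grade is a plain weighted average; the names `WEIGHTS` and `final_grade` are illustrative only and do not come from any official course tool.

```python
# Hypothetical sketch: combine component scores into a final grade using the
# weights stated in the exam description. The common 0-30 scale and all names
# here are illustrative assumptions, not the course's official procedure.

# Weights in per-mille so they sum exactly to 1000 (i.e. 100%).
WEIGHTS = {
    "theory": 340,              # 34%: 30-minute multiple-choice exam
    "coding_assignments": 165,  # 16.5%: two Python assignments
    "final_project": 165,       # 16.5%: final project and presentation
    "network_lab": 330,         # 33%: network science multiple-choice exam
}
assert sum(WEIGHTS.values()) == 1000


def final_grade(scores: dict[str, float]) -> float:
    """Return the weighted average of the component scores (assumed 0-30 scale)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total / 1000


if __name__ == "__main__":
    example = {"theory": 28, "coding_assignments": 30,
               "final_project": 26, "network_lab": 24}
    print(round(final_grade(example), 2))  # 26.68
```

Integer per-mille weights are used so that the check that the weights add up to 100% is exact; with floating-point weights such as 0.34 + 0.165 + 0.165 + 0.33 the sum may differ from 1.0 by a rounding error.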
Teaching mode
Traditional in-person lectures
  • Course code: 1047224
  • Academic year: 2025/2026
  • Degree programme: Data Science
  • Curriculum: Single curriculum
  • Year: 1st year
  • Semester: 1st semester
  • Scientific-disciplinary sector (SSD): INF/01
  • Credits (CFU): 9