STATISTICAL METHODS IN DATA SCIENCE AND LABORATORY (single section)
Course coordinator and examiner of record: PIERPAOLO BRUTTI
Module 1: STATISTICAL METHODS IN DATA SCIENCE AND LABORATORY II
- Type: Mathematical and statistical training
- SSD (scientific-disciplinary sector): SECS-S/01
- Year: 1st year
- Semester: 2nd semester
- CFU (credits): 3
- Hours: 18 classroom hours, 12 laboratory hours
- Instructor: LUCA TARDELLA
Module 2: STATISTICAL METHODS IN DATA SCIENCE AND LABORATORY I
- Type: Mathematical and statistical training
- SSD (scientific-disciplinary sector): SECS-S/01
- Year: 1st year
- Semester: 1st semester
- CFU (credits): 9
- Hours: 56 classroom hours, 36 laboratory hours
- Instructor: PIERPAOLO BRUTTI
Educational objectives
Learning goals
Statistical Methods in Data Science is a two-semester course aimed at providing the fundamental tools for:
- setting up probabilistic models;
- understanding the basic principles of the main inferential problems: estimation, hypothesis testing, model checking and forecasting;
- understanding and contrasting the two main inferential paradigms, namely frequentist and Bayesian statistics;
- implementing inference on observed data through both optimization and simulation-based (approximation) techniques such as:
  - Bootstrap
  - Monte Carlo
  - Markov chain Monte Carlo (MCMC)
- understanding the comparative merits of alternative strategies;
- developing statistical computations within a suitable software environment such as R (www.r-project.org), OpenBUGS (http://openbugs.net/w/FrontPage) and Stan (http://mc-stan.org/).
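As a rough illustration of the simulation-based inference listed above, here is a minimal sketch of a random-walk Metropolis sampler (one simple MCMC scheme). It is written in Python purely for illustration, not in the R, OpenBUGS or Stan environments the course names. The data (7 successes in 20 trials), the uniform prior, the step size and all function names are assumptions made for this example.

```python
import math
import random

def log_posterior(theta, successes, trials):
    """Unnormalized log posterior of a Bernoulli rate under a uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior mass outside (0, 1)
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)

def metropolis(successes, trials, n_draws=20000, step=0.2, seed=0):
    """Random-walk Metropolis sampler; returns the full list of draws."""
    rng = random.Random(seed)
    theta = 0.5  # hypothetical starting value
    current = log_posterior(theta, successes, trials)
    draws = []
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio), on the log scale.
        # 1.0 - rng.random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        draws.append(theta)
    return draws

draws = metropolis(successes=7, trials=20)
kept = draws[2000:]                      # discard an initial burn-in segment
mcmc_mean = sum(kept) / len(kept)
exact_mean = (7 + 1) / (20 + 2)          # mean of the exact Beta(8, 14) posterior
print(round(mcmc_mean, 3), round(exact_mean, 3))
```

With a uniform prior the exact posterior is a Beta distribution, so the simulated posterior mean can be checked against the closed-form value. That check is not available for most realistic models, which is where simulation becomes necessary.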
Knowledge and understanding
On successful completion of this course, students will:
- know the main statistical principles, inferential problems, paradigms and algorithms;
- be able to assess the empirical and theoretical performance of different modeling approaches;
- know the main platforms and programming languages for developing effective implementations.
Applying knowledge and understanding
Beyond understanding the theory, students will be constantly challenged, through applied homework and a dedicated second-semester laboratory focused on Bayesian modeling, to use and evaluate the techniques they have learned and to propose new models suited to the specific task at hand.
Making judgements
On successful completion of this course, students will have developed a constructive critical attitude towards the empirical and theoretical evaluation of statistical methodologies and results.
Communication skills
In preparing the report and oral presentation for the final project of the second-semester laboratory, students will learn how to communicate information, ideas, problems and solutions effectively to both specialists and a general audience.
Learning skills
In this course, students will develop the skills needed to understand and apply new statistical methodologies and to implement them effectively. The goal is to foster an active attitude towards continued learning throughout a professional career.
Prerequisites
Basic Probability, Linear Algebra, Multivariable Calculus
Course program
Part I Probability
---------------------
Outcomes, Events & Probability.
Conditional Probability & Independence.
Bayes' Theorem: Interpretation & Use.
Random Variables and Random Vectors.
Directed Graphical Models.
Expected Value, Variance and Covariance.
Univariate and Multivariate Distributions.
Generating Functions and Convergence Theorems.
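To make the "Bayes' Theorem: Interpretation & Use" topic above concrete, the following minimal Python sketch computes a posterior probability for a diagnostic test. The prevalence, sensitivity and false-positive rate are made-up numbers chosen for illustration, and the function name is hypothetical.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # Law of total probability: P(positive) over both states of the condition.
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence

# Assumed, illustrative inputs: 1% prevalence, 95% sensitivity, 5% false positives.
p = posterior_probability(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p, 3))
```

Even with an accurate test, the posterior stays well below one half here because the condition is rare. Correcting for the base rate in this way is the step Bayes' theorem formalizes.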
Part II Statistics
---------------------
The Empirical Distribution and the Bootstrap.
Statistical Modelling: the Likelihood Function.
Parameter Estimation: Point and Interval Estimation.
Hypothesis Testing.
Alternative Inferential Frameworks: Frequentist vs Bayesian Inference.
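The bootstrap topic listed in Part II can be sketched as follows: resample the observed data with replacement, recompute the statistic on each resample, and read an interval off the percentiles of the replicates. This is a minimal Python sketch of the percentile bootstrap only. The synthetic exponential sample, the choice of the median as the statistic and the function name are assumptions made for illustration.

```python
import random
import statistics

def bootstrap_percentile_ci(data, stat=statistics.median, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate is the statistic evaluated on a resample drawn with replacement.
    replicates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic data for illustration: 200 draws from an Exponential(1) distribution.
data_rng = random.Random(1)
sample = [data_rng.expovariate(1.0) for _ in range(200)]

low, high = bootstrap_percentile_ci(sample)
sample_median = statistics.median(sample)
print(round(low, 3), round(sample_median, 3), round(high, 3))
```

The percentile interval is the simplest bootstrap interval. Other constructions exist and may behave better in small samples.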
Reference texts
Main References:
F.M. Dekking, C. Kraaikamp, H.P. Lopuhaa, L.E. Meester (2007). A Modern Introduction to Probability and Statistics. Springer.
L. Wasserman (2004). All of Statistics: A Concise Course in Statistical Inference. Springer.
Bibliography
Module: STATISTICAL METHODS IN DATA SCIENCE AND LABORATORY II
• Jean-Michel Marin and Christian P. Robert. Bayesian Core: A Practical Approach to Computational Bayesian Statistics. Springer, 2007.
• Christian P. Robert and George Casella. Monte Carlo Statistical Methods (2nd ed.). Springer-Verlag, 2004.
• Ioannis Ntzoufras. Bayesian Modeling Using WinBUGS. Wiley, 2009.
• Peter Congdon. Bayesian Statistical Modelling (2nd ed.). Wiley, 2006.
Module: STATISTICAL METHODS IN DATA SCIENCE AND LABORATORY I
Main references:
F.M. Dekking, C. Kraaikamp, H.P. Lopuhaa, L.E. Meester (2007). A Modern Introduction to Probability and Statistics. Springer.
L. Wasserman (2004). All of Statistics: A Concise Course in Statistical Inference. Springer.
Teaching mode
In-person lectures
Attendance
Optional
Examination method
Homework + Written Exam + Oral Check
Sample questions
Assorted questions on how the different inferential approaches (frequentist and Bayesian) apply to real data-analysis problems, what criteria guide the choice of appropriate probabilistic models, and how optimization and simulation techniques such as the Bootstrap, Monte Carlo and MCMC can be implemented in statistical computing environments such as R.
Schedule of teaching activities
Module: STATISTICAL METHODS IN DATA SCIENCE AND LABORATORY II
- second semester
Module: STATISTICAL METHODS IN DATA SCIENCE AND LABORATORY I
Sustainable Development Goals - UN 2030 Agenda
- Academic year: 2024/2025
- Degree program: Data Science
- Language: English
- CFU: 12, distributed across 2 integrated teaching modules
- Total duration: 122 hours