STATISTICAL METHODS IN DATA SCIENCE AND LABORATORY (single section)

Course coordinator and grade-recording instructor: PIERPAOLO BRUTTI

Module 1: STATISTICAL METHODS IN DATA SCIENCE AND LABORATORY II

Type: Mathematical and statistical training
SSD: SECS-S/01
Year: 1st year
Semester: 2nd semester
CFU: 3
Hour distribution: 18 classroom hours, 12 laboratory hours
Instructor: LUCA TARDELLA

Module 2: STATISTICAL METHODS IN DATA SCIENCE AND LABORATORY I

Type: Mathematical and statistical training
SSD: SECS-S/01
Year: 1st year
Semester: 1st semester
CFU: 9
Hour distribution: 56 classroom hours, 36 laboratory hours
Instructor: PIERPAOLO BRUTTI

Educational objectives

Learning goals

Statistical Methods in Data Science is a two-semester course aimed at providing the fundamental tools for:

• setting up probabilistic models;
• understanding the basic principles of the main inferential problems: estimation, hypothesis testing, model checking and forecasting;
• understanding and contrasting the two main inferential paradigms, namely frequentist and Bayesian statistics;
• implementing inference on observed data through both optimization and simulation-based (approximation) techniques such as:
  • Bootstrap
  • Monte Carlo
  • Markov Chain Monte Carlo (MCMC)
• understanding the comparative merits of alternative strategies;
• developing statistical computations within a suitable software environment such as R (www.r-project.org), OpenBUGS (http://openbugs.net/w/FrontPage) or Stan (http://mc-stan.org/).
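
As a rough illustration of the simulation-based techniques listed above, the following minimal sketch estimates the standard error of a sample median by nonparametric bootstrap. It is written in Python purely for illustration (the course itself works in R), and the function name `bootstrap_se`, the toy data and the number of resamples are assumptions made for this example, not course material.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.median, n_resamples=2000, seed=0):
    """Approximate the standard error of `stat` by nonparametric bootstrap.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the empirical distribution.
        resample = rng.choices(data, k=n)
        replicates.append(stat(resample))
    # The spread of the bootstrap replicates approximates the sampling
    # variability of the statistic.
    return statistics.stdev(replicates)


# Toy data, invented for this example.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 3.0]
se_median = bootstrap_se(sample)
print(f"bootstrap standard error of the median: {se_median:.3f}")
```

The same resampling loop works for any statistic passed as `stat`, which is the main practical appeal of the bootstrap when no closed-form standard error is available.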

Knowledge and understanding

On successful completion of this course, students will:
• know the main statistical principles, inferential problems, paradigms and algorithms;
• be able to assess the empirical and theoretical performance of different modeling approaches;
• know the main platforms and programming languages for developing effective implementations.

Applying knowledge and understanding

Beyond understanding the theoretical aspects, students will be constantly challenged to use and evaluate the techniques they have learned, and to propose new models suited to the specific task at hand. This happens through applied homework assignments and a dedicated second-semester laboratory focused on Bayesian modeling.

Making judgements

On successful completion of this course, students will develop a positive critical attitude towards the empirical and theoretical evaluation of statistical methodologies and results.

Communication skills

In preparing the report and oral presentation for the final project of the second-semester laboratory, students will learn how to communicate information, ideas, problems and solutions effectively to both specialists and a general audience.

Learning skills

In this course, students will develop the skills needed to understand and apply new statistical methodologies and to implement them effectively. The goal is to foster an active attitude towards continued learning throughout a professional career.


Prerequisites

Basic Probability, Linear Algebra, Multivariable Calculus

Course syllabus

Part I Probability
---------------------
Outcomes, Events & Probability.
Conditional Probability & Independence.
Bayes' Theorem: Interpretation & Use.
Random Variables and Random Vectors.
Directed Graphical Models.
Expected Value, Variance and Covariance.
Univariate and Multivariate Distributions.
Generating Functions and Convergence Theorems.

Part II Statistics
---------------------
The Empirical Distribution and the Bootstrap.
Statistical Modelling: the Likelihood Function.
Parameter Estimation: Point and Interval Estimation.
Hypothesis Testing.
Alternative Inferential Frameworks: Frequentist vs Bayesian Inference.
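
To make the contrast between the two inferential frameworks concrete, here is a minimal sketch for estimating a binomial proportion. It compares the frequentist maximum-likelihood estimate with the Bayesian posterior mean under a conjugate Beta prior. The example is written in Python for illustration only (the course itself works in R); the function names, the uniform Beta(1, 1) prior and the counts are assumptions chosen for this example.

```python
def mle_proportion(successes, trials):
    """Frequentist point estimate: the maximum-likelihood estimate s / n."""
    return successes / trials


def posterior_mean(successes, trials, a=1.0, b=1.0):
    """Bayesian point estimate under a Beta(a, b) prior.

    With a binomial likelihood the posterior is Beta(a + s, b + n - s),
    whose mean is (a + s) / (a + b + n).
    """
    return (a + successes) / (a + b + trials)


# Invented data: 7 successes in 10 trials.
s, n = 7, 10
freq = mle_proportion(s, n)
bayes = posterior_mean(s, n)  # uniform prior, assumed for illustration
print(f"MLE: {freq:.3f}, posterior mean: {bayes:.3f}")
```

With the uniform prior the posterior mean is pulled from the sample proportion towards the prior mean of 0.5, and the pull shrinks as the number of trials grows.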

Textbooks

Main References:
F.M. Dekking, C. Kraaikamp, H.P. Lopuhaa, L.E. Meester (2007). A Modern Introduction to Probability and Statistics. Springer.
L. Wasserman (2004). All of Statistics: A Concise Course in Statistical Inference. Springer.

Bibliography

Module: STATISTICAL METHODS IN DATA SCIENCE AND LABORATORY II
• Jean-Michel Marin and Christian P. Robert. Bayesian Core: A Practical Approach to Computational Bayesian Statistics. Springer, 2007.
• Christian P. Robert and George Casella. Monte Carlo Statistical Methods (2nd ed.). Springer-Verlag, 2004.
• Ioannis Ntzoufras. Bayesian Modeling Using WinBUGS. Wiley, 2009.
• Peter Congdon. Bayesian Statistical Modelling (2nd ed.). Wiley, 2006.

Module: STATISTICAL METHODS IN DATA SCIENCE AND LABORATORY I
Main references:
F.M. Dekking, C. Kraaikamp, H.P. Lopuhaa, L.E. Meester (2007). A Modern Introduction to Probability and Statistics. Springer.
L. Wasserman (2004). All of Statistics: A Concise Course in Statistical Inference. Springer.

Teaching mode

Classroom lectures

Attendance

Optional

Exam format

Homework + Written Exam + Oral Check

Sample questions

Assorted questions on how the different inferential approaches (frequentist and Bayesian) apply to real data-analysis problems, what criteria guide the choice of appropriate probabilistic models, and how optimization and simulation techniques such as the Bootstrap, Monte Carlo and MCMC can be implemented in statistical computing environments such as R.

Teaching schedule

Module: STATISTICAL METHODS IN DATA SCIENCE AND LABORATORY II

  • second semester
    • Textbooks: OK

Module: STATISTICAL METHODS IN DATA SCIENCE AND LABORATORY I

Sustainable Development Goals - UN 2030 Agenda

  • Goal 1
  • Goal 2
  • Goal 3

  • Academic year: 2024/2025
  • Degree programme: Data Science
  • Language: English
  • CFU: 12 CFU across 2 integrated teaching modules
  • Total duration: 122 hours