BIG DATA FOR OFFICIAL STATISTICS

Learning objectives

Which subsets of Big Data can be used in Official Statistics, and which domains of Official Statistics can be enriched by the availability of new data sources. How new data sources can be used in Official Statistics, taking into account the challenges, needs and risks involved. Definition of the role of Big Data in the context of Official Statistics. How to frame the measurement of social, demographic and economic phenomena through Big Data, considering challenges, needs and risks.

Channel 1
FILOMENA MAGGINO

Syllabus - Attendance - Exams

Syllabus
Dealing with Big Data in the context of Official Statistics requires considering the complexity of using Big Data from a governance perspective. The kind of knowledge involved is twofold:
- knowledge able to enrich traditional statistics, making them timelier and more differentiated from the territorial and/or social point of view (e.g., nowcasting)
- knowledge to be added to traditional statistics by observing phenomena from different angles through an ex-post perspective
Both approaches may have deep consequences for Official Statistics, in terms of challenges, opportunities, needs and risks.

Part I: Key concepts to be known
1. Some crucial definitions:
   a. complexity
   b. data
   c. data quality
   d. indicators
   e. forecasts
   f. classification
   g. data representation and visualization
2. Official Statistics: definition, role and new challenges
3. Big Data in the context of Official Statistics
   a. Opportunities and criticisms: sources, extracting veins, opportunities, challenges, from information to data
   b. Using Big Data sources for the production of Official Statistics: scenarios, impacts and perspectives

Part II: Analytical methods and techniques in the context of Official Statistics
1. Methods for Machine Learning
   a. Supervised learning
   b. Unsupervised learning
   c. Semi-supervised learning
2. Traditional Algorithms for Machine Learning
   a. Linear and logistic regression
   b. Clustering
   c. Decision Making processes
   d. Random forest
3. Machine Learning with Neural Networks
   a. Convolutional Neural Networks
   b. Recurrent Neural Networks
   c. Autoencoders and Generative Adversarial Networks
   d. Transformers and Large Language Models

Part III: Some applications
1. Land cover
2. Sentiment Analysis for social media texts
3. Topic modelling for social media texts
4. Semantic Search and RAG for automatic enterprise classification

The course will be integrated with lessons by:
- Dr. Francesco Pugliese (Italian National Institute of Statistics - Istat)
- Dr. Angela Pappagallo (Italian National Institute of Statistics - Istat)
- Dr. Francesco Ortame (Italian National Institute of Statistics - Istat)
Prerequisites
No prerequisites
Reference texts
https://www.egeaeditore.it/ita/prodotti/matematica-statistica-demografia/big-data-and-official-statistics_.aspx
Teaching mode
Lessons will be held in person and/or remotely.
Attendance
Not mandatory but recommended.
Exam format
Oral exam with evaluation of an individual project.
Bibliography
Since the course will be delivered in English, all information will be provided here in that language.
Delivery mode
Lessons will be held in person.
  • Course code: 1056085
  • Academic year: 2025/2026
  • Degree programme: Data Science
  • Curriculum: Single curriculum
  • Year: 2nd year
  • Semester: 1st semester
  • SSD: SECS-S/05
  • CFU: 6