Programme
Dealing with Big Data in the context of Official Statistics requires considering the complexity of using Big Data from a governance perspective. The knowledge that Big Data can provide is twofold:
- knowledge that enriches traditional statistics, making them timelier and more finely differentiated by territory and/or social group (e.g., nowcasting)
- knowledge that adds to traditional statistics by observing phenomena from different angles, from an ex-post perspective
Both approaches may have deep consequences for Official Statistics, in terms of challenges, opportunities, needs and risks.
Part I: Key concepts
1. Some crucial definitions:
a. complexity,
b. data,
c. data quality,
d. indicators,
e. forecasts,
f. classification,
g. data representation and visualization.
2. Official Statistics: definition, role and new challenges.
3. Big Data in the context of Official Statistics
a. Opportunities and critical issues: sources, extracting veins, challenges, from information to data
b. Using Big Data sources for the production of Official Statistics: scenarios, impacts and perspectives
Part II: Analytical methods and techniques in the context of Official Statistics
1. Methods for Machine Learning
a. Supervised learning
b. Unsupervised learning
c. Semi-supervised learning
2. Traditional Algorithms for Machine Learning
a. Linear and logistic regression
b. Clustering
c. Decision Making processes
d. Random forest
3. Machine Learning with Neural Networks
a. Convolutional Neural Networks
b. Recurrent Neural Networks
c. Autoencoders and Generative Adversarial Networks
d. Transformers and Large Language Models
Part III: Some applications
1. Land cover
2. Sentiment Analysis for social media texts
3. Topic modelling for social media texts
4. Semantic Search and RAG for automatic enterprise classification
The course will be supplemented by lectures from
- Dr. Francesco Pugliese (Italian National Institute of Statistics - Istat)
- Dr. Angela Pappagallo (Italian National Institute of Statistics - Istat)
- Dr. Francesco Ortame (Italian National Institute of Statistics - Istat)
Prerequisites
None
Reference texts
https://www.egeaeditore.it/ita/prodotti/matematica-statistica-demografia/big-data-and-official-statistics_.aspx
Teaching mode
Lectures will be held in person and/or remotely.
Attendance
Not mandatory but recommended.
Examination
Oral exam with evaluation of an individual project.
Bibliography
Since the course will be delivered in English, all information will be provided here in that language.
Delivery mode
Lectures will be held in person.