Big Data for Official Statistics

Course objectives

The course addresses which subset of Big Data can be used within Official Statistics and which domains of Official Statistics can be enriched by new data sources; how new data sources can be used in Official Statistics, taking into account the challenges, needs and risks involved; the role of Big Data in the context of Official Statistics; and how to frame the measurement of social, demographic and economic phenomena through Big Data, again considering challenges, needs and risks.

Channel 1
FILOMENA MAGGINO

Course program
Dealing with Big Data in the context of Official Statistics requires considering the complexity of using Big Data from a governing perspective. The kind of knowledge involved is twofold:
- knowledge that enriches traditional statistics, making them timelier and more differentiated from the territorial and/or social point of view (e.g., nowcasting)
- knowledge added to traditional statistics by observing phenomena from different angles through an ex-post perspective
Both approaches may have deep consequences for Official Statistics, in terms of challenges, opportunities, needs and risks.

Part I: Key concepts to be known
1. Some crucial definitions:
   a. complexity
   b. data
   c. data quality
   d. indicators
   e. forecasts
   f. classification
   g. data representation and visualization
2. Official Statistics: definition, role and new challenges
3. Big Data in the context of Official Statistics
   a. Opportunities and criticisms: sources, extracting veins, opportunities, challenges, from information to data
   b. Using Big Data sources for the production of Official Statistics: scenarios, impacts and perspectives

Part II: Analytical methods and techniques in the context of Official Statistics
1. Methods for Machine Learning
   a. Supervised learning
   b. Unsupervised learning
   c. Semi-supervised learning
2. Traditional Algorithms for Machine Learning
   a. Linear and logistic regression
   b. Clustering
   c. Decision Making processes
   d. Random forest
3. Machine Learning with Neural Networks
   a. Convolutional Neural Networks
   b. Recurrent Neural Networks
   c. Autoencoders and Generative Adversarial Networks
   d. Transformers and Large Language Models

Part III: Some applications
1. Land cover
2. Sentiment Analysis for social media texts
3. Topic modelling for social media texts
4. Semantic Search and RAG for automatic enterprise classification

The course will be integrated with lessons by
- Dr. Francesco Pugliese (Italian National Institute of Statistics - Istat)
- Dr. Angela Pappagallo (Italian National Institute of Statistics - Istat)
- Dr. Francesco Ortame (Italian National Institute of Statistics - Istat)
Prerequisites
No prerequisites
Books
https://www.egeaeditore.it/ita/prodotti/matematica-statistica-demografia/big-data-and-official-statistics_.aspx
Teaching mode
Lessons will take place face to face and/or remotely.
Frequency
Not mandatory but recommended.
Exam mode
Oral exam including evaluation of an individual project
Bibliography
Students will be provided with the materials through the Moodle platform.
Lesson mode
Lessons will take place face to face.
  • Lesson code: 1056085
  • Academic year: 2025/2026
  • Course: Data Science
  • Curriculum: Single curriculum
  • Year: 2nd year
  • Semester: 1st semester
  • SSD: SECS-S/05
  • CFU: 6