LARGE-SCALE DATA MANAGEMENT

Course objectives

General goals: The goal of the course is to familiarize students with the basic concepts of managing information systems at large scale. Two topics are investigated in detail: information models for Big Data Management, and information integration. Both topics are highly relevant in a data-driven society, where virtually all information systems of reasonably sized organisations need both to manage large data sets and to interact with several data sources.
Specific goals: To study the data models used in Big Data Management, especially NoSQL data models, including column-based, key-value, and document data models, and to become familiar with the notions and techniques of information integration.
Knowledge and understanding: After the course, students will have a good knowledge of the differences and similarities between the relational model and the various classes of NoSQL data models. Moreover, they will understand the theoretical issues in data integration and exchange, and will have a good knowledge of the various architectures of information integration systems.
Apply knowledge and understanding: Students will be able to design their own Big Data repository using one of the data models adopted in practice, to choose an appropriate architecture for information integration, and to build and maintain an information integration system structured according to the chosen architecture.
Critical and judgment skills: Students will be able to evaluate the requirements for a Big Data Management system and to choose the right data model and infrastructure. Analogously, they will be able to understand the requirements for a specific information integration system and to choose the appropriate approaches and techniques for designing a high-quality solution.
Communication skills: Students will learn how to illustrate the results of a design process, both in the context of Big Data Management and in the context of information integration systems.
Learning ability: Students will be able to understand new architectures and approaches to Big Data Management and to information integration that become popular in the future.

Channel 1
ANTONELLA POGGI

Course program
- Introduction to Big Data
- Aggregate Databases: Aggregate NoSQL data models: Key-value, document, column-family databases; Data Modeling; Distribution Models; Consistency: Update consistency, Read consistency, CAP Theorem; MapReduce framework.
- Document-based data models: MongoDB
- Hadoop and its Ecosystem; Hive; Data Lakes.
- RDFS; SPARQL; Linked Open Data; Ontology-based Data Access

Information Integration:
-----------------------------
- Architectures for information integration
- Distributed data management
- Data federation
- Data exchange and data warehousing
- ETL (Extraction, Transformation and Loading), data cleaning and data reconciliation
- Data integration
- Ontology-based data integration
Prerequisites
A good knowledge of the fundamentals of Programming Structures, Programming Languages, Databases (SQL, relational data model, Entity-Relationship data model, conceptual and logical database design) and Database systems.
Books
Notes and slides prepared by the professors
Frequency
There is no attendance requirement, but attendance is strongly advised.
Exam mode
The exam consists of the development of a small project. For the Big Data part, the project focuses on demonstrating a tool for data management in data warehousing or according to a NoSQL data model; for the Information Integration part, it focuses on demonstrating a tool for data integration or data federation. Developing a sample application that makes use of one or more of these tools is also acceptable. The project is presented to the teacher with the aid of slides and is complemented by an oral exam on the course topics.
Bibliography
NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pramod J. Sadalage, Martin Fowler. Addison-Wesley. 2014.
Lesson mode
Lectures, with exercises carried out both during lectures and as additional individual study.
MARCO CONSOLE
  • Lesson code: 1044408
  • Academic year: 2025/2026
  • Course: Engineering in Computer Science and Artificial Intelligence
  • Curriculum: Single curriculum
  • Year: 2nd year
  • Semester: 1st semester
  • SSD: ING-INF/05
  • CFU: 6