Engineering in Computer Science and Artificial Intelligence

Data Mining

Course objectives

The goal of the course is twofold. First, it presents the main theory behind data analysis. Second, it is hands-on: by the end, students will be familiar with various state-of-the-art tools and techniques for analyzing data. We will use Python, both for downloading data and for its rich machine-learning libraries, the R environment for statistical processing, and the MapReduce framework for mining large-scale data.

Channel 1

ARISTIDIS ANAGNOSTOPOULOS

LUCA BECCHETTI

Program - Frequency - Exams

Course program

Large-scale computation and mining large graphs
- Standard approaches and the MapReduce-like paradigm. Solving basic problems using Apache Spark
- MapReduce/Hadoop-like algorithms for triangle counting and connected components
- A quick tour of community detection in large graphs

Hashing and sampling techniques for neighborhood search
- Neighborhood search and the curse of dimensionality in Euclidean spaces
- Reducing dimensions in Euclidean spaces via hashing
- Extensions to different metrics
- Efficient neighborhood search via hashing and bucketing
- Reducing search space via sampling

Dimensionality reduction
- SVD and basic approach
- Sparsification techniques and CUR
- Random projections

Sketching and sampling for streams of data
- Estimating frequency moments in sliding windows
- Sketching algorithms for heavy hitters tracking
- Sketching techniques for join size estimation
- Sketching techniques for large graph mining, with application to neighborhood search and analysis of community structure
- Distributed algorithms following the MapReduce paradigm
- Graph semi-streaming algorithms
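
As an illustration of one topic in the program, heavy hitters tracking, the following is a minimal Python sketch of the Misra-Gries summary, a standard textbook algorithm for this problem. It is not taken from the course material; the function name `misra_gries`, the parameter `k`, and the toy stream are assumptions made for this example. With at most k - 1 counters, every item that occurs more than n/k times in a stream of length n is guaranteed to remain among the candidates, and each stored count underestimates the true frequency by at most n/k.

```python
def misra_gries(stream, k):
    """Return candidate heavy hitters using at most k - 1 counters.

    Illustrative sketch only: names and parameters are hypothetical.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: increase its counter.
            counters[item] += 1
        elif len(counters) < k - 1:
            # There is a free counter: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement all counters and
            # drop the ones that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Toy stream (made up for this example): "a" occurs 5 times out of 9.
stream = ["a", "b", "a", "c", "a", "d", "a", "b", "a"]
candidates = misra_gries(stream, k=3)
print(candidates)
```

The stored counts are lower bounds, so a second pass over the data (when available) is the usual way to obtain exact frequencies for the surviving candidates.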

Prerequisites

- Linear algebra
- Calculus and basic knowledge of probability theory and statistics
- Programming, fundamental algorithms and data structures

Books

- Selected sections of "Foundations of Data Science", by Avrim Blum, John Hopcroft, and Ravindran Kannan, available at https://www.cs.cornell.edu/jeh/book.pdf
- Selected sections and chapters of "Mining of Massive Datasets" (2nd edition), by Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Cambridge University Press, 2014.
- Scientific papers and online resources. Pointers will be given by the instructor when needed.

Frequency

Attendance of theoretical and practical lectures is not mandatory, but it is strongly advised.

Exam mode

- Theoretical and practical homework assignments on the topics covered during the course
- Written exam
- Oral exam

Lesson mode

Lectures will be held in person. Some of the lectures will be applied, with students working together with the instructor to implement the notions and concepts introduced in the course.

Lesson code: 1038138
Academic year: 2025/2026
Course: Engineering in Computer Science and Artificial Intelligence
Curriculum: Single curriculum
Year: 2nd year
Semester: 1st semester
SSD: ING-INF/05
CFU: 6

23/09/2025 - Second extraordinary exam session

11/09/2025 - 1st Semester, Academic Year 2025–26: Start of Classes
