Course program
Large scale computation and mining large graphs
Standard approaches and Map Reduce - like paradigm. Solving basic problems using Apache Spark
MapReduce/Hadoop - like algorithms for triangle counting and connected components
A quick tour of community detection in large graphs
Hashing and sampling techniques for neighborhood search
Neighborhood search and the curse of dimensionality in euclidean spaces
Reducing dimensions in euclidean spaces via hashing
Extensions to different metrics
Efficient neighborhood search via hashing and bucketing
Reducing search space via sampling
Dimensionality reduction
SVD and basic approach
Sparsification techniques and CUR
Random projections
Sketching and sampling for streams of data
Estimating frequency moments in sliding windows
Sketching algorithms for heavy hitters tracking
Sketching techniques for join size estimation
Sketching techniques for large graph mining, with application to neighborhood search, and analysis of community structure
Distributed algorithms in following MapReduce paradigm
Graph semi-streaming algorithms
Prerequisites
- Linear algebra
- Calculus and basic knowledge of probability theory and statistics
- Programming, fundamental algorithms and data structures
Books
- Selected sections of "Foundations of Data Science", by Avrim Blum, John Hopcroft, and Ravindran Kannan, available at https://www.cs.cornell.edu/jeh/book.pdf
- Selected sections and chapters of "Mining of massive datasets" (2nd edition). Cambridge University Press, 2014, by Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman.
- Scientific papers and on line resources. Pointers will be given by the instructor when needed
Frequency
Attendance of theoretical and practical lectures is not mandatory, but it is strongly advised.
Exam mode
- Theoretical and practical homeworks on the topics covered during the course
- Written exam
- Oral exam
Lesson mode
Lectures will be held in presence. Part of the lectures will be applied, with the students involved, together with the instructor, in implementing notions and concepts introduced in the course