HIGH-DIMENSIONAL PROBABILITY AND STATISTICS

Course objectives

General objectives: to acquire knowledge of high-dimensional probability and statistics, with applications to data science.

Specific objectives:
  • Knowledge and understanding: at the end of the course the student will have acquired the basic notions of high-dimensional probability and statistics and will be familiar with algorithms used to solve some relevant problems in data science.
  • Applying knowledge and understanding: at the end of the course the student will be able to solve problems concerning high-dimensional random geometric structures, dimensionality reduction, statistical learning, and high-dimensional regression.
  • Critical and judgmental skills: the student will grasp the ideas behind several algorithms and software tools used in data science, and will understand the conditions under which they work best and their possible limits in applications.
  • Communication skills: the student must show the ability to present the contents of the course in the oral part of the assessment and in the solution of problems in the written test.
  • Learning skills: the acquired knowledge will allow a multidisciplinary understanding of several problems motivated by data science and will facilitate further study in some very active research fields.

Channel 1
ALBERTO FACHECHI
LORENZO TAGGI

Course program
The course covers key concepts and methods in high-dimensional probability and statistics, with a focus on concentration inequalities, subgaussian and subexponential random variables, and their applications in data analysis and machine learning. Topics include:
  • Hoeffding inequality and the definition of subgaussian random variables.
  • Equivalent characterizations and the subgaussian norm.
  • Hoeffding inequality for sums of independent subgaussian variables.
  • Subexponential random variables and Chernoff bounds.
  • Johnson–Lindenstrauss lemma and dimensionality reduction.
  • Epsilon-nets, covering numbers, and packing numbers; quantitative bounds in R^n.
  • Concentration of the operator norm of random matrices with subgaussian entries and related spectral results (Courant–Fischer, Weyl, and Davis–Kahan theorems).
  • Stochastic Block Model and community detection problems.
  • Spectral clustering algorithm and k-means algorithm.
  • Laplacian operator and unnormalized Laplacian spectral clustering.
  • Strong law of large numbers, Glivenko–Cantelli theorem, and the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality.
  • VC dimension, Rademacher complexity, and their role in statistical learning theory.
Throughout the course, both theoretical and computational exercises are assigned and discussed; some of them are implemented during the laboratory sessions.
Prerequisites
A basic knowledge of probability and statistics is required, including elementary concentration inequalities, discrete and continuous random variables, expectation and variance, independence, the law of large numbers, and the central limit theorem. Familiarity with linear algebra (eigenvalues, eigenvectors, matrix norms) and basic mathematical analysis (sequences, limits, derivatives) is also useful. Knowledge of the Matlab programming language is required, as it will be used during the laboratory sessions to perform the assigned computational tasks.
Books
R. Vershynin. "High-Dimensional Probability. An Introduction with Applications in Data Science". Cambridge University Press. Available online (free of charge).
M. J. Wainwright. "High-Dimensional Statistics. A Non-Asymptotic Viewpoint". Cambridge University Press.
Frequency
Lectures will be delivered in person. Attendance is strongly recommended.
Exam mode
The exam is oral and/or written.
Lesson mode
The course includes theoretical lectures (44 hours in total) delivered at the blackboard or using a tablet, focusing on probability exercises, theorem statements, and detailed proofs. In addition, 12 hours of laboratory sessions are scheduled, during which students use Matlab to carry out practical exercises directly related to the theoretical topics covered in class.
  • Lesson code: 10611928
  • Academic year: 2025/2026
  • Course: Applied Mathematics
  • Curriculum: Matematica per Data Science
  • Year: 2nd year
  • Semester: 1st semester
  • SSD: MAT/06
  • CFU: 6