alessandra.faggionato@uniroma1.it's picture
Course Code Year Course - Attendance Bulletin board
ELEMENTI DI PROBABILITA E STATISTICA PER DATA SCIENCE 10595858 2023/2024

The program will be very similar (but not identical) to the one of the course in  2022/23 and  in 2021/22.

 

HOEFFDING’S INEQUALITY
Hoeffiding’S inequality for  Rademacher random variables (rvs) and for limited rvs. Application: boosting randomized algorithms

SUBGAUSSIAN RANDOM VARIABLES
Standard Gaussian random  variable: estimation of tails, moments and moment generating function. Definition of  subgaussian rvs and equivalent characterizations. Normed space of subgaussian rvs. Centering. Estimate of the subgaussian norm of the sum of independent subgaussian rvs  with zero mean.

COVERING NUMBER AND PACKING NUMBER
 epsilon-net and covering number. epsilon-separated set and packing number. Equivalence between covering and packing numbers. Covering number for subsets of R^n. Estimates on the covering number of the Euclidean ball and sphere.

BOUNDS ON THE NORM OF  SUBGAUSSIAN RANDOM MATRIXES
Use of epsilon-nets to approximate Lipschitz functions. Estimation of the norm of random matrices with independent, zero-mean, subgaussian entries. Estimation of the norm of random symmetric matrices with subgaussian, zero-mean, independent entries on and above the diagonal.

PERTURBATIVE THEORY FOR DETERMINISTIC MATRICES
 Weyl's inequality for eigenvalues. Davis-Kahan theorem for eigenvectors. Corollary of the Davis-Kahan Theorem: estimation of the distance between the eigenvectors under perturbations

CLUSTERING: STOCHASTIC BLOCK MODEL
Stochastic block model. Spectral analysis of the expected value of the adjacency matrix of the graph. Spectral clustering algorithm with the adjacency matrix. Effectiveness of the spectral clustering algorithm.

CLUSTER ANALYSIS
Cluster analysis. Dissimilarity function. Within cluster point scatter. Between-cluster point scatter. Combinatorial clustering based on the listing of surjective functions and criticality of its implementation. k-means clustering. Similarity and associated weighted graphs (epsilon-neighbourhood graph, k-nearest neighbor graph, mutual k-nearest neighbor graph, fully connected graph). Unnormalized Laplacian of a non-oriented weighted graph.
Spectral clustering algorithm with the non-normalized Laplacian (unnormalized spectral clustering).

HIGH DIMENSIONAL RANDOM  VECTORS
Covariance matrix for a random vector. Isotropic random vectors.  Basics on Gaussian rvs and Gaussian random vectors. Expression of the expected value of the standard Gaussian rv with Gamma functions. Concentration of the norm for vectors with independent, subgaussian entries, unitary second moment and application to the standard Gaussian vector. 

STANDARD GAUSSIAN MATRICES
 Standard Gaussian matrices and their properties. Grassmanian variety.

DIMENSION REDUCTION: JOHNSON-LINDENSTRAUSS LEMMA
 Concentration theorem for Lipschitz functions of a random vector uniformly distributed on the sphere. Johnson-Lindenstrauss lemma for N points.

GAUSSIAN PROCESSES, GAUSSIAN WIDTH AND SPHERICAL WIDTH
Stochastic processes, covariance function. Gaussian and spherical width and their comparison. Gaussian width of some remarkable sets.

HIGH DIMENSION ESTIMATORS
M*- bound. Estimators for linear Gaussian observations without noise/with noise: feasibility program and optimization program in terms of the Minkowski functional. Estimators for linear regression

SPARSE RECOVERY AND EXACT SPARSE RICOVERY
Sparsity in wavelet representation of signals and linear regression in genetics.  Estimators for linear Gaussian observations of a sparse vectors. Escape theorem. Exact sparse recovery. 

 

You can see the sections dedicated to the courses  of 2022/23 and 2021/22at the page 

 https://www1.mat.uniroma1.it/people/faggionato/didattica/dida1.html

 

Texts:  we will use several texts, mainly available online for free. You do not need to buy books. 

We will treat some parts of the following texts:


R. Vershynin. ``High-Dimensional Probability. An Introduction with Applications in Data Science". Cambridge University Press. Disponibile  online (gratuitamente).

M. J. Wainwright. ``High-Dimensional Statistics. A Non-Asymptotic Viewpoint". Cambridge University Press.

R. Vershynin, Estimation in high dimension: a geometric perspective. https://arxiv.org/pdf/1405.5103.pdf

T. Hastie, R. Tibshirani, J. Friedman. The elements of statistical learning. Springer series in Statistics. Free online

U. von Luxburg, A tutorial in spectral clustering Statistics and Computing, 17 (4), 2007 (free online)

 

Exams: surely oral exam, probably this year there will be also a written exam 

 

Dates of the exams: see the following webage (suggestion, on the top choose ``Ordina per data"): https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vQt4Q8_g7oiydvFHlkdF...

 

PROBABILITA' II 1051922 2023/2024

Program: it will be the same of the course of 2022/23 with exception that I will reduce the part on Markov chains and treat some other subjects, as e.g. the ergodic theorem or percolation.

 


Entropy: surprise of an event, entropy of a random variable, joint entropy for two random variables, conditional entropy, conditional joint entropy, Kullback-Leibler distance, mutual information of two random variables, relative entropy of probability measures.

Coding theory: uniquely decodable codes, prefix codes, Kraft inequality, Shannon theorem for the  estimation from below of the average codeword length by source entropy, Shannon-Fano code, Huffman code as optimal code.


Stochastic processes and entropy: discrete time stochastic processes, stationary stochastic processes, entropy rate of a stochastic process, existence of the entropy rate for stationary stochastic processes.

 

Coupling: coupling of two measures and of two random variables, distance in total variation, maximal coupling theorem, generalized inverse function of the partition function, simulation of a  real random variable  by means  its distribution function and the uniform random variable on [0,1].

 

Stochastic domination: stochastic domination of two probability measures on R, stochastic domination of two real random variables and equivalent formulations (through tails and partition functions), coupling and stochastic domination for real random variables, monotone coupling,  stochastic domination between binomial random variables, stochastic domination between Poisson random variables, partially ordered sets (poset), increasing subsets of poset, increasing functions on poset, stochastic domination of probability measures on a poset, Strassen's theorem and its implications.

 

Bernoulli bond percolation: percolation probability, critical probability, monotony of the critical probability with respect to the dimension, phase transition for d>1, FKG inequality and other basic techniques. Alternatively to Bernoulli bond percolation we could  decide to treat  random graphs.

 

Moment generating functions. 

 

Gaussian random vectors and basic notions about Brownian motion
 

 

For the courses of last two years you can see the dedicated section at the page 

https://www1.mat.uniroma1.it/people/faggionato/didattica/dida1.html

 

 

Texts: we will use several books available online for free, in particular we will treat some chapters of 

T.M. Cover, J.A. Thomas; Elements of information theory. 2nd Edition, John Wiley & Sons, inc., 2006.
J.R. Norris; Markov chains. Cambridge University Press.
 S. Roch. Modern Discrete Probability: An Essential Toolkit (online lecture notes)
G. Grimmett. Percolation. Springer Verlag
File with additional  notes written by  A. Faggionato  

 

Exams: written and oral exams

 

Date of exams: see the webpage 

 https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vQt4Q8_g7oiydvFHlkdF...

 

suggestion: on the top of the above webpage choose ``Ordina per data"

CALCOLO DELLE PROBABILITA' 1020421 2023/2024
PROBABILITA' II 1051922 2022/2023
ELEMENTI DI PROBABILITA E STATISTICA PER DATA SCIENCE 10595858 2022/2023
CALCOLO DELLE PROBABILITA' 1020421 2022/2023
ELEMENTI DI PROBABILITA E STATISTICA PER DATA SCIENCE 10595858 2021/2022
PROBABILITA' II 1051922 2021/2022
CALCOLO DELLE PROBABILITA' 1020421 2021/2022
MATEMATICA E STATISTICA 1045004 2020/2021
MATEMATICA 1039660 2020/2021
MATEMATICA E STATISTICA 1045004 2019/2020
PROCESSI STOCASTICI 1031451 2019/2020
PROBABILITA' II 1051922 2019/2020
PROCESSI STOCASTICI 1031451 2019/2020
MATEMATICA 1039660 2019/2020
MATEMATICA E STATISTICA 1045004 2018/2019
PROBABILITA' II 1051922 2018/2019
PROCESSI STOCASTICI 1031451 2018/2019
PROCESSI STOCASTICI 1031451 2018/2019
MATEMATICA 1039660 2018/2019
MATEMATICA E STATISTICA 1045004 2017/2018
STATISTICA MATEMATICA 1031375 2017/2018
STATISTICA MATEMATICA 1031375 2017/2018
MATEMATICA 1039660 2017/2018
CALCOLO DELLE PROBABILITA' 1020421 2016/2017
STATISTICA MATEMATICA 1031375 2016/2017
STATISTICA MATEMATICA 1031375 2016/2017

Da concordare con il docente