Course program
- Part 1 (about 18 hours)
1.1 Vector and matrix algebra: Properties and operations on vectors and matrices, Geometric representation of a vector space, Eigendecomposition of a symmetric matrix, Decomposition of the covariance matrix, Random vectors and matrices, Mean vector and covariance matrix of linear combinations of random variables, Decomposition of the sample mean vector and covariance matrix.
1.2 Multivariate Gaussian distribution: Definition, Contours of the bivariate Gaussian distribution, Properties, Sampling from the multivariate Gaussian distribution, Multivariate likelihood, Maximum likelihood estimates of the mean vector and of the covariance matrix.
- Part 2 (about 18 hours)
2.1 Linear model: Basic assumptions, Least-squares parameter estimates, Estimate of the error variance, Gauss-Markov theorem, Sample coefficient of determination, Maximum likelihood parameter estimators, Confidence intervals for the parameters, Hypothesis testing on the parameters.
2.2 Generalized linear models: Introduction, Basic assumptions on the probability distribution of the statistical units, Theoretical foundations, Examples of generalized linear models, Logistic regression, Maximum likelihood parameter estimation.
- Part 3 (about 36 hours)
3.1 Principal component analysis: Introduction, Basic descriptive properties, Basic sample results, Sample properties when random variables are multivariate Gaussian, Variable standardization, Interpretation of the principal components, Selection of the optimal number of principal components.
3.2 Canonical correlation analysis: Introduction, Determination of the coefficient of canonical correlation, Developments in canonical correlation analysis, Estimation and hypothesis testing.
3.3 Factor analysis: Introduction, Factor analysis model, Main aspects of the model, Parameter estimates without multivariate Gaussian assumption, Maximum likelihood parameter estimates, Hypothesis testing on the number of factors, Factor rotation, Factor scores.
3.4 Cluster analysis: Introduction, Dissimilarity and distance measures, Hierarchical methods, Non-hierarchical methods, Selection of the number of clusters.
3.5 Classification trees: Introduction, Classification rules, Steps of the procedure, Classification methods, CART.
Prerequisites
Familiarity with the basic concepts of the courses Statistica di Base, Probabilità, Inferenza statistica e laboratorio is useful.
Books
VITALI O. (1993). Statistica per le Scienze Applicate, Cacucci Editore, Bari, vol. II - "Modello lineare Generale Classico" (Chapter 14, vol. I, excluding Sect. 12, 13) - "Modelli Lineari Generalizzati" (Chapter 17, excluding Sect. 6, 7, 8, 9) - "Componenti Principali" (Chapter 27, excluding Sect. 7.2, 7.3, 7.4) - "Analisi dei Fattori" (Chapter 29, excluding Sect. 5.2, 5.3, 8.3).
ZANI S., CERIOLI A. (2007). Analisi dei dati e data mining per le decisioni aziendali, Giuffrè Editore, Milano - Cluster Analysis (Chapter VIII, Sect. 1, 2, 3, 4; Chapter IX, Sect. 1, 2, 3, 4, 5, 6, 8, 11; Chapter XI, Sect. 1, 2, 3, 4, 5).
Lecture notes: Richiami di Algebra Vettoriale e Matriciale (see also O. Vitali, App. A) - Normale Multivariata (see also O. Vitali, Chapter 24, Sect. 1, 2, 3, 4).
Teaching mode
Lectures present the theoretical aspects and apply them to concrete examples using the R software.
Frequency
Attendance is not compulsory but is highly recommended.
Exam mode
Oral examination at the end of the course, aimed at assessing both the understanding of the theoretical aspects and the ability to solve concrete problems.
Bibliography
Books on multivariate statistics