MULTIVARIATE STATISTICS

Course objectives

Learning goals. The main goal of the course is to acquire the fundamental statistical tools for analyzing multivariate data and to use them in real applications. By the end of the course, students should be able to formalize the statistical goal of a real case study, develop a strategy of analysis by selecting appropriate methods, apply the methodology, and draw correct conclusions in a (short) technical report that selects and presents the main results.
Knowledge and understanding. On completing the course, students will have learnt the main issues and essential concepts of multivariate and multidimensional analysis (for example, dependence, dimension reduction, classification) and the standard methodologies for handling such problems (such as linear regression, PCA and cluster analysis).
Applying knowledge and understanding. On completing the course, students will be able to formalize a multivariate statistical problem and select the appropriate methodologies to address it. They will also have the basic skills to justify their choices, make comparisons, and assess assumptions and applicability. Finally, they will be able to apply the methods to real data and interpret the results.
Making judgements. Students develop critical thinking by applying the methodologies learnt, which they will be able to use independently with statistical software. Processing data and producing the output themselves builds autonomy in the analysis and the judgement needed to make choices and comparisons based on theoretical criteria. In addition, students will learn to interpret critically the results obtained in real applications.
Communication skills. By processing data and writing short technical reports, students will learn the correct use of technical language, which is required in both the coursework and the final exam. Special attention is given to communicating results to non-specialists in rigorous but understandable language.
Learning skills. Students who pass the exam will have acquired: a) the theoretical background in advanced statistics needed to proceed to a Second Cycle Degree in either Statistics or Applied Statistics; b) the tools to develop a strategy of analysis independently when analyzing data, which are needed either to take up a job or to continue the Programme of Study.

Channel 1
DONATELLA VICARI

Program - Attendance - Exams

Course program
The course is structured into four theoretical parts and a lab module.
Part 1: Introduction to Multivariate Statistics (about 6 h). Data matrices and their statistical transformations; fundamentals of matrix algebra.
Part 2: Linear Regression (about 14 h). Least squares and ML estimation; model validation: coefficient of determination, goodness-of-fit tests; estimators: unbiasedness, Gauss-Markov theorem; variance estimator, confidence intervals and significance tests for regression parameters, inference about prediction; multicollinearity; model selection; influence: how to measure and handle it.
Part 3: Dimensionality Reduction (about 12 h). Dimension reduction methods; Principal Component Analysis: definition and solution; PCA assessment: global, variable and unit assessment; correlation circle and unit contributions; biplot.
Part 4: Cluster Analysis (about 16 h). Distances and similarities between statistical units; definition of cluster and cluster distance; divisive hierarchical methods: Cavalli-Sforza method; agglomerative hierarchical methods: single linkage, complete linkage, average linkage, centroid linkage, Ward's method; general formula of Lance and Williams; indexed hierarchy and dendrogram; non-hierarchical methods; K-means: model, algorithm, properties; criteria for cluster evaluation and for the choice of the number of clusters.
Computer Laboratory (about 24 h). Real data applications of the multivariate statistical methodologies using the statistical package SAS; analysis and interpretation of the output from the SAS procedures for the methodologies learned in the theoretical part.
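As a small illustration of one Part 4 topic, the general formula of Lance and Williams can be sketched in a few lines. The course lab uses SAS; Python is used here only as an assumption for a compact, runnable example, and the function name `lance_williams` and its arguments are hypothetical. The sketch updates the distance between a cluster k and the cluster obtained by merging i and j, using the standard coefficients for single, complete and average linkage only.

```python
def lance_williams(d_ki, d_kj, d_ij, n_i, n_j, method):
    """Distance between cluster k and the merged cluster (i U j).

    Lance-Williams recurrence:
        d(k, ij) = a_i*d(k,i) + a_j*d(k,j) + beta*d(i,j)
                   + gamma*|d(k,i) - d(k,j)|
    n_i and n_j are the sizes of clusters i and j.
    """
    if method == "single":        # reduces to min(d_ki, d_kj)
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, -0.5
    elif method == "complete":    # reduces to max(d_ki, d_kj)
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, 0.5
    elif method == "average":     # size-weighted mean of d_ki and d_kj
        a_i = n_i / (n_i + n_j)
        a_j = n_j / (n_i + n_j)
        beta, gamma = 0.0, 0.0
    else:
        raise ValueError(f"unsupported method: {method}")
    return a_i * d_ki + a_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)


if __name__ == "__main__":
    # Hypothetical distances: d(k,i)=2, d(k,j)=5, d(i,j)=3; sizes 1 and 3.
    for m in ("single", "complete", "average"):
        print(m, lance_williams(2.0, 5.0, 3.0, 1, 3, m))
```

Centroid linkage and Ward's method fit the same recurrence with different coefficients; they are left out here to keep the sketch short.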
Prerequisites
To benefit from the course and pass the exam, students are REQUIRED to know the fundamentals of Statistics (specifically, basic concepts of descriptive statistics for univariate and bivariate distributions) and Inference (i.e. sampling distributions, point and interval estimation, hypothesis testing). In the degree programme that offers this course, these skills are acquired in the required courses Statistica di base and Inferenza Statistica. Students are also advised to know some basic notions of matrix algebra, which can be acquired in the first-year course Matematica I.
Books
S. Zani, A. Cerioli – Analisi dei Dati Statistici e Data Mining per le Decisioni Aziendali – Giuffrè Editore, 2007.
O. Vitali – Statistica per le scienze applicate, vol. I – Cacucci Editore, 1991.
SAS Documentation, available online (http://support.sas.com/documentation/).
Teaching materials and SAS scripts.
Teaching mode
In-person lectures (barring a health emergency) cover both theoretical and practical aspects of the methodologies of Multivariate Statistics. The coursework in the Computer Lab alternates lectures with applications to real case studies, linking theory and practice through self-directed learning.
Attendance
Attendance is strongly recommended. Students who cannot attend are encouraged to contact the lecturer.
Exam mode
To pass the exam, the student must pass: (a) a final written exam (about 1 hour) requiring discussion of some questions on theoretical issues; (b) a practical exam (about 2.5 hours) in the computer lab, requiring the analysis of a real case study and the interpretation of the output in a (short) technical report. The exam assesses knowledge of the theoretical concepts, the capability to formalize the statistical goal, the ability to build a strategy of analysis to solve practical problems, and the use of appropriate language. The two parts contribute equally to the final grade.
Bibliography
K.V. Mardia, J.T. Kent, J.M. Bibby – Multivariate Analysis – Academic Press, 1994.
A.C. Rencher – Methods of Multivariate Analysis – Wiley, 2002.
G. Bove, A. Okada, D. Vicari – Methods for the Analysis of Asymmetric Relationships – Series: Behaviormetrics: Quantitative Approaches to Human Behavior, Springer Nature, Singapore, 2021.
  • Lesson code: 1022894
  • Academic year: 2024/2025
  • Course: Statistics, Economics, Finance and Insurance
  • Curriculum: Economia e finanza
  • Year: 3rd year
  • Semester: 1st semester
  • SSD: SECS-S/01
  • CFU: 9
  • Subject area: Statistico, statistico applicato, demografico