Course program
Introduction to statistical terminology: population, statistical unit, variable. Disaggregated (unit-level) distribution and frequency distribution. Cumulative frequencies. Frequency distribution by classes.
Graphs. Empirical distribution function and its graphical representation.
Introduction to means: analytical means and location indices. Arithmetic mean and its properties. Arithmetic mean for a frequency distribution and for a frequency distribution by classes. Mode, median and quantiles. Calculation of mode, median and quantiles for a frequency distribution and for a frequency distribution by classes.
Variability indices: mean absolute deviations, standard deviation and variance. Deviance decomposition formula. Mean absolute difference. Range. Interquartile range. Coefficient of variation. Concentration and its measures: Lorenz curve and Gini coefficient for a disaggregated (unit-level) distribution. Heterogeneity and its measures.
Symmetry and measures of skewness.
Index numbers: simple and complex.
Bivariate distribution. Absolute independence, perfect dependence and measures of dependence. Mean independence and measures of the degree of mean dependence. Linear independence and measures of the degree of linear association.
Introduction to simple linear regression. Calculation of the parameter values of the regression line. Goodness of fit of the regression line to the observed data and the coefficient of determination.
Introduction to probability theory. Random events and event space. Definitions of probability. Assignment of probabilities to events. The most important theorems of probability. Conditional probability and stochastic independence. Bayes' theorem.
Introduction to the theory of random variables. Discrete random variable, continuous random variable and distribution function. Expected value and variance of a random variable. Some discrete probability models: Bernoulli, binomial and Poisson. Normal distribution and the use of its tables.
The law of large numbers and the central limit theorem.
Population and sample: introduction to sampling distributions. The sampling distribution of the sample mean. Outline of point estimation theory and of the properties of estimators.
Introduction to the theory of confidence intervals. Confidence intervals for the mean of a normal population with known variance and with unknown variance. Confidence intervals for the mean of a population with a large sample. Confidence intervals for a proportion with a large sample.
Introduction to the theory of statistical hypothesis testing and types of error. Hypothesis testing for the mean of a normal population with known variance and with unknown variance. Hypothesis testing for the mean of a population with a large sample. Hypothesis testing for a proportion with a large sample.
Chi-square test of independence.
Statistical inference for the simple linear regression model.
The time devoted to each part of the program may vary depending on the students' feedback.
Prerequisites
Knowledge of the basic notions and tools of calculus.
Books
G. Cicchitelli, P. D’Urso, M. Minozzo, Statistica: principi e metodi, Pearson, 2022, except for the following sections and sub-sections: 4.3, 4.4, 4.6, 4.14, 5.3, 6.3, 14.7, 21, 24-26.
As an alternative: S. Borra and A. Di Ciaccio, Statistica, McGraw-Hill, 4th ed. or earlier, except for the following sections, sub-sections and chapters: 3.3, 3.4, 6.7, 10.7, 11.9, 13.7.1, 13.8, 14.5-14.7, 16, 18-21; chapters 7 and 17 are not needed for the exam, but they can be useful for practice.
As this is a foundational course, students may refer to any university-level introductory statistics textbook that covers all the topics in the program.
Any further teaching materials will be made available online at https://web.uniroma1.it/memotef/users/guagnano-giuseppina
Attendance
Attendance is not mandatory; lectures are expected to be held in person.
Exam format
The evaluation aims to assess the knowledge that students have acquired, as well as their skills in explaining theoretical concepts using the appropriate terminology, in the quantitative analysis of real data (applying the most appropriate statistical tools) and in the critical interpretation of the results of the statistical analysis.
The evaluation is based on a two-hour written test containing numerical exercises and multiple-choice or open questions. The multiple-choice and open questions test the candidate's knowledge of the theoretical aspects of the subjects covered by the program and the ability to interpret and critically evaluate the results obtained with statistical tools; the numerical exercises test the candidate's ability to use statistical tools in a quantitative analysis.
Each question is assigned a specific mark, and the overall score of the written test is expressed out of 30.
In certain cases (e.g. if a student scores just under 18/30, or if the origin of the answers given is doubtful) the teacher reserves the right to supplement the written test with an oral examination. In all other cases, the oral test is optional. However, students who obtain a score higher than 23/30 in the written test must take the oral exam to confirm that mark; without this confirmation, a score of 23/30 will be recorded, regardless of the written score. In all cases, the final mark will be the average of the marks obtained in the two tests (written and oral).
Teaching mode
The teaching activity is mainly carried out through lectures, but may also include group work and the drafting of a short paper on the analysis of real data.