DATA MINING AND CLASSIFICATION

Course objectives

Learning goals. Thanks to technological advances, acquiring data has become inexpensive, and large data sets are easily obtained, for example via the Internet, e-commerce, or electronic banking services. Such data can be stored in data warehouses and data marts specifically intended to support business decisions. Data mining provides the tools to manage and analyse these data, to extract the relevant information, and to build forecasting models, which are fundamental tools in areas such as credit evaluation, marketing, and customer relationship management. The course will examine data preprocessing methods and their importance. We will cover some non-parametric models for classification and regression: decision trees, neural networks, and support vector machines. Ensemble learning methods (Bagging, Boosting, Stacking, Blending) will be illustrated. The course will also address the analysis of textual data and images.
Knowledge and understanding. Students acquire the basics of data mining techniques. They understand how and why to choose between alternative statistical methods, or how to combine different methods. They become able to handle large amounts of data with the help of appropriate commercial and open-source software.
Applying knowledge and understanding. Students develop critical skills by applying a wide range of statistical and machine learning models. They also sharpen their critical judgement by comparing alternative solutions to the same problem obtained with different learning approaches. They learn to interpret critically the results obtained by applying the procedures to real data sets.
Making judgements. Students develop critical skills by applying a wide range of machine learning and statistical models. They also sharpen their critical judgement by comparing alternative solutions to the same problem obtained with different learning approaches. They learn to interpret critically the results obtained by applying the procedures to real data sets.
Communication skills. Through study and practical exercises, students acquire the technical and scientific language of the discipline, which must be used appropriately in the intermediate and final written tests and in the oral tests. Communication skills are also developed through group activities.
Learning skills. Students who pass the exam have learned a method of analysis that allows them to tackle, in subsequent courses in the statistics area, the study of the formal properties of data mining procedures in more complex modeling contexts.

Channel 1
AGOSTINO DI CIACCIO

Program - Attendance - Exams

Course program
Thanks to technological advances, data acquisition has become inexpensive and large data sets are easily obtainable. These data can be analyzed to extract relevant information and to build forecasting models, which are fundamental tools in areas such as credit assessment, marketing, and customer relationship management. The course will examine data preprocessing methods and their importance. We will cover some non-parametric models for classification and regression: decision trees and support vector machines. Ensemble learning methods (Bagging, Boosting, Stacking, Blending) will be illustrated. Particular attention will be paid to neural network models. The course will also address the analysis of textual data and images. The software used during most of the course is SAS Viya. We will also prepare for the (optional) certification exam, which will take place in September after an additional mini-course. Those who pass the exam by July will also receive a digital badge certifying their skills in using SAS Viya for Machine Learning. In the last part of the course, we will study Natural Language Processing and use Python to apply complex neural network models.
Prerequisites
To follow the course successfully, students must have completed a course in statistical inference.
Books
Course notes provided by the teacher and topics drawn from the following texts: Data Mining: Concepts and Techniques (J. Han, M. Kamber); An Introduction to Statistical Learning with Applications in R (James, Witten, Hastie, Tibshirani); The Elements of Statistical Learning: Data Mining, Inference, and Prediction (T. Hastie, R. Tibshirani, J. Friedman, Springer-Verlag). SAS Manual: Machine Learning using SAS Viya (LWCPML84).
Teaching mode
Teaching will preferably take place in person.
Attendance
Attendance is strongly recommended because of the hands-on activities with SAS software.
Exam mode
Written exam, oral exam, and homework project.
  • Lesson code: 1022798
  • Academic year: 2025/2026
  • Course: Statistical Sciences
  • Curriculum: Demografico sociale
  • Year: 1st year
  • Semester: 2nd semester
  • SSD: SECS-S/01
  • CFU: 9