STATISTICAL LEARNING

Learning objectives

Learning goals
Devising new machine learning methods and statistical models is a fun and extremely fruitful “art”. But these powerful tools are not useful unless we understand when they work and when they fail. The main goal of statistical learning theory is therefore to study, in a statistical framework, the properties of learning algorithms, mainly in the form of so-called error bounds. This course introduces the techniques used to obtain such results, combining methodology with theoretical foundations and computational aspects. It covers both the basic principles for designing successful learning algorithms and the “science” of analyzing an algorithm’s statistical properties and performance guarantees. Theorems are presented together with practical methodology and intuition, to help students develop tools for selecting appropriate methods and approaches to problems in their own data analyses. Methods for a wide variety of applied problems will be explored and implemented in open-source software such as R (www.r-project.org), Keras (https://keras.io/) and TensorFlow (https://www.tensorflow.org/).

Knowledge and understanding
On successful completion of this course, students will: know the main learning methodologies and paradigms, with their strengths and weaknesses; be able to identify a suitable learning model for a given problem; assess the empirical and theoretical performance of different learning models; and know the main platforms, programming languages and solutions for developing effective implementations.

Applying knowledge and understanding
Beyond the theoretical aspects, through applied homework assignments and a final project, possibly linked to hackathons or other data-analysis competitions, students will constantly be challenged to use and evaluate modern learning techniques and algorithms.
Making judgements
On successful completion of this course, students will develop a constructively critical attitude towards the empirical and theoretical evaluation of statistical learning paradigms and techniques.

Communication skills
In preparing the report and oral presentation for the final project, students will learn to communicate original ideas, experimental results and the principles behind advanced data-analytic techniques effectively, in both written and oral form. They will also learn how to offer constructive critiques of their peers’ presentations.

Learning skills
In this course students will develop the skills needed to understand, develop and effectively implement new learning methodologies. The goal, of course, is to foster an active attitude towards continued learning throughout a professional career.

Channel 1
PIERPAOLO BRUTTI


Program
1. Review of basic probability and statistical inference
2. The big-5 of concentration of measure: Markov, Chebyshev, Chernoff, Hoeffding and Bernstein
3. Model complexity: VC dimension and Rademacher complexity
4. The optimization we need: from (stochastic) gradient descent to KKT conditions and convexity
5. Supervised and unsupervised learning: an overview
6. The quest for nonlinearities: from kernels, RKHS, trees and forests to deep networks
7. Nonparametric regression and density estimation
8. Nonparametric classification
9. Nonparametric clustering: k-means, density clustering
10. Ensemble methods
11. Minimaxity and sparsity theory
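As a small taste of item 2, the sketch below checks Hoeffding’s inequality by simulation: for i.i.d. samples bounded in [0, 1], the probability that the sample mean deviates from its expectation by at least t is at most 2·exp(−2nt²). This is an illustrative Python snippet only (the course itself works in R, Keras and TensorFlow); all names and parameter values are chosen here for the example.

```python
import math
import random

random.seed(42)

n, t, mu, trials = 100, 0.1, 0.5, 10_000  # sample size, deviation, true mean, repetitions

# Count how often the mean of n Bernoulli(mu) draws deviates from mu by at least t.
exceed = 0
for _ in range(trials):
    sample_mean = sum(random.random() < mu for _ in range(n)) / n
    exceed += abs(sample_mean - mu) >= t

empirical = exceed / trials
bound = 2 * math.exp(-2 * n * t**2)  # Hoeffding's bound for [0, 1]-valued variables

print(f"empirical tail: {empirical:.4f}   Hoeffding bound: {bound:.4f}")
```

The bound here equals 2·exp(−2) ≈ 0.271, while the true tail probability of a Binomial(100, 0.5) mean deviating by 0.1 is far smaller; Hoeffding’s inequality is valid but often loose, which is exactly the kind of gap the course’s error-bound analysis examines.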
Prerequisites
Statistical Inference, Basic Probability, Linear Algebra, Mathematical Analysis
Reference texts
Main references:
- Mehryar Mohri, Afshin Rostamizadeh and Ameet Talwalkar (2018). Foundations of Machine Learning. MIT Press. Available at: https://cs.nyu.edu/~mohri/mlbook/
- Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (2013). An Introduction to Statistical Learning with Applications in R. Springer. Available at: http://www-bcf.usc.edu/~gareth/ISL/
- Larry Wasserman (2005). All of Nonparametric Statistics. Springer.
- Stephen Boyd and Lieven Vandenberghe (2004). Convex Optimization. Cambridge University Press. Available at: http://stanford.edu/~boyd/cvxbook/
Teaching method
Lectures
Attendance
Optional
Exam format
Homework assignments + final project/hackathon
Bibliography
More advanced / in depth:
- Hastie et al., The Elements of Statistical Learning (https://web.stanford.edu/~hastie/ElemStatLearn/)
- Mohri et al., Foundations of Machine Learning (2018)
- Tsybakov, Introduction to Nonparametric Estimation (2009)
- Shawe-Taylor and Cristianini, Kernel Methods for Pattern Analysis (2004)

Deadly alternatives:
- Devroye et al., A Probabilistic Theory of Pattern Recognition (1996)
- Wainwright, High-Dimensional Statistics (2019)

R programming:
- R for Data Science (http://r4ds.had.co.nz/)
- Applied Predictive Modeling (http://appliedpredictivemodeling.com/)
- Feature Engineering and Selection (http://www.feat.engineering/)
Delivery mode
Lectures and lab activities
  • Course code: 1047208
  • Academic year: 2024/2025
  • Degree program: Scienze statistiche - Statistical Sciences
  • Curriculum: Data analytics
  • Year: 2nd year
  • Semester: 2nd semester
  • SSD (scientific sector): SECS-S/01
  • CFU (credits): 6
  • Disciplinary area: Related or supplementary educational activities