Course program
Module 1 – Introduction
What is Natural Language Processing (NLP): definition, application areas, connections with linguistics.
Brief history: from rule-based approaches to statistical models and deep learning.
Practical examples: machine translation, summarization, chatbots, semantic search.
Module 2 – Basic Tools
Introduction to computers and programming (elementary concepts: data, algorithms, memory, input/output).
Practical introduction to Python:
working with Jupyter Notebook/Google Colab
variables, data types, loops, functions, libraries
essential notions for reading/writing texts and manipulating strings.
(Objective: provide minimal autonomy to run NLP scripts without requiring advanced programming skills.)
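As a rough illustration of the level of Python this module aims for, the sketch below writes a short text to a file, reads it back, and applies a few basic string operations. The file name and the sample sentence are made up for the example.

```python
from pathlib import Path
import tempfile

# Work inside a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"  # hypothetical file name
    # Write a short text, then read it back.
    path.write_text("Natural language processing is fun.\n", encoding="utf-8")
    text = path.read_text(encoding="utf-8")

# Basic string manipulation: trim whitespace, lowercase, split on spaces.
cleaned = text.strip().lower()
words = cleaned.split()

print(words)
print("Number of words:", len(words))
```

Splitting on whitespace leaves punctuation attached to words ("fun."), which is one reason tokenization is treated as a separate topic later in the course.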
Module 3 – Corpora and Linguistic Pre-processing
What is a corpus and why it is fundamental in NLP.
Cleaning and tokenization techniques.
Lemmatization and stemming.
Morphosyntactic analysis (part-of-speech tagging).
Resources: Universal Dependencies, national linguistic corpora.
(Lab: build a mini-corpus and analyze it with NLTK or spaCy.)
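The lab itself relies on NLTK or spaCy; as a dependency-free stand-in, the following sketch shows the same idea on a tiny scale using only the Python standard library. The two-sentence corpus and the `tokenize` helper are invented for illustration, and the regular expression is a deliberately crude tokenizer, not what those libraries do internally.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its word-like character sequences."""
    return re.findall(r"\w+", text.lower())

# A made-up mini-corpus of two "documents".
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
]

# Flatten all documents into a single list of tokens.
tokens = [tok for doc in corpus for tok in tokenize(doc)]

# Count how often each token occurs in the corpus.
freq = Counter(tokens)

print("Tokens:", len(tokens))
print("Types:", len(freq))
print("Most frequent:", freq.most_common(2))
```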
Module 4 – Word Representation
Explicit approaches: bag-of-words, TF-IDF.
Embeddings: word2vec, GloVe, fastText.
Semantic visualization (vector spaces).
(Lab: compute semantic similarity between words using pre-trained embeddings.)
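Semantic similarity between word vectors is commonly measured with cosine similarity. The sketch below computes it by hand; the three-dimensional vectors are toy values made up for the example, not real word2vec, GloVe, or fastText embeddings, which typically have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors (assumed values, for illustration only).
toy_vectors = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.90, 0.15],
    "banana": [0.10, 0.05, 0.95],
}

sim_king_queen = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
sim_king_banana = cosine_similarity(toy_vectors["king"], toy_vectors["banana"])

print(f"king vs queen:  {sim_king_queen:.3f}")
print(f"king vs banana: {sim_king_banana:.3f}")
```

With these toy values the first pair scores much higher than the second, mirroring what one would expect from pre-trained embeddings for related and unrelated words.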
Module 5 – Lexical Semantics
Resources: WordNet, BabelNet, Wikipedia, Wikidata.
Task: Word Sense Disambiguation (WSD).
Techniques: resource-based vs. distributional models.
(Lab: disambiguate polysemous words in a short text.)
Module 6 – Sentence-Level Semantics
Resources: FrameNet, PropBank, VerbAtlas, NounAtlas.
Task: Semantic Role Labeling (SRL).
Introduction to Natural Language Inference (NLI).
(Lab: use a pre-trained model for semantic role labeling.)
Module 7 – Large Language Models
What is an LLM (e.g., GPT, BERT, mBERT, ChatGPT).
Differences between “classical” models and deep neural networks.
Prompting: how to interact with LLMs for linguistic tasks.
Limitations and biases of models.
(Lab: design prompts for translation, summarization, semantic classification.)
Module 8 – Advanced Applications
Automatic summarization (extractive vs. abstractive).
Machine translation (traditional rule-based and statistical MT vs. neural MT).
Sentiment analysis and text classification.
Information extraction and knowledge graphs.
(Final lab: each student selects a small application project.)
Prerequisites
No prerequisites. No prior computer skills are required to start the course.
Books
M. Nissim, L. Pannitto. Che cos'è la linguistica computazionale?, Carocci, 2022.
E. Jezek, R. Sprugnoli. Linguistica computazionale. Introduzione all'analisi automatica dei testi, Il Mulino, 2023.
Additional material will be provided during the lectures.
Attendance
In-class attendance.
Exam mode
Assessment is based on homework assignments completed during the course or, alternatively, on the submission of a project.
Teaching mode
In-class lectures.