Natural Language Processing for Linguists

Course objectives

The course "Natural Language Processing for Linguists" aims to train professionals with both theoretical and practical skills in Natural Language Processing (NLP), with particular attention to the tools and methodologies of computational linguistics. The objectives include acquiring fundamental knowledge of linguistic models and NLP techniques, as well as of the practical applications of these technologies in the humanities, especially linguistics. Lectures will be supported by hands-on computer exercises in which students will learn to implement and use linguistic analysis and annotation software. By the end of the course, students will be able to process, analyze, and create linguistic data, understand and apply NLP techniques to text, and collaborate effectively with technical teams on the development of NLP solutions. The areas covered include the extraction and study of collocations, Word Sense Disambiguation, Entity Linking, Semantic Role Labeling and sentence representations, the creation of benchmarks for language understanding, and Large Language Models.

Channel 1
ROBERTO NAVIGLI


Course program
Module 1 – Introduction
What is Natural Language Processing (NLP): definition, application areas, connections with linguistics. Brief history: from rule-based approaches to statistical models and deep learning. Practical examples: machine translation, summarization, chatbots, semantic search.
Module 2 – Basic Tools
Introduction to computers and programming (elementary concepts: data, algorithms, memory, input/output). Practical introduction to Python: working with Jupyter Notebook/Google Colab; variables, data types, loops, functions, libraries; essential notions for reading/writing texts and manipulating strings. (Objective: provide minimal autonomy to run NLP scripts without requiring advanced programming skills.)
Module 3 – Corpora and Linguistic Pre-processing
What is a corpus and why it is fundamental in NLP. Cleaning and tokenization techniques. Lemmatization and stemming. Morphological analysis (PoS tagging). Resources: Universal Dependencies, national linguistic corpora. (Lab: build a mini-corpus and analyze it with NLTK or spaCy.)
Module 4 – Word Representation
Explicit approaches: bag-of-words, TF-IDF. Embeddings: word2vec, GloVe, fastText. Semantic visualization (vector spaces). (Lab: compute semantic similarity between words using pre-trained embeddings.)
Module 5 – Lexical Semantics
Resources: WordNet, BabelNet, Wikipedia, Wikidata. Task: Word Sense Disambiguation (WSD). Techniques: resource-based vs. distributional models. (Lab: disambiguate polysemous words in a short text.)
Module 6 – Phrasal Semantics
Resources: FrameNet, PropBank, VerbAtlas, NounAtlas. Task: Semantic Role Labeling (SRL). Introduction to Natural Language Inference (NLI). (Lab: use a pre-trained model for semantic role labeling.)
Module 7 – Large Language Models
What is an LLM (e.g., GPT, BERT, mBERT, ChatGPT). Differences between “classical” models and deep neural networks. Prompting: how to interact with LLMs for linguistic tasks. Limitations and biases of models. (Lab: design prompts for translation, summarization, semantic classification.)
Module 8 – Advanced Applications
Automatic summarization (extractive vs. abstractive). Machine translation (traditional MT vs. NMT). Sentiment analysis and text classification. Information extraction and knowledge graphs. (Final lab: each student selects a small application project.)
Prerequisites
None. No prior computer skills are required to start the course.
Books
M. Nissim, L. Pannitto. Che cos'è la linguistica computazionale?, Carocci, 2022.
E. Jezek, R. Sprugnoli. Linguistica computazionale. Introduzione all'analisi automatica dei testi, Il Mulino, 2023.
Additional material provided during the lectures.
Attendance
In-person attendance.
Exam mode
The assessment will be based on the submission of homework assignments to be completed during the course or, alternatively, on the submission of a project.
Lesson mode
In person.
  • Lesson code: 10616149
  • Academic year: 2025/2026
  • Course: Linguistics
  • Curriculum: Single curriculum
  • Year: 2nd year
  • Semester: 1st semester
  • SSD: INF/01
  • CFU: 6