The aim of this tutorial is to give you both a conceptual understanding of state-of-the-art pre-trained language models and hands-on experience adapting them to social data science applications. If you have little hands-on experience with neural language models or with fine-tuning them on downstream tasks, this tutorial will introduce you to state-of-the-art methods from the field. The coding examples and methods introduced here can easily be adapted to a variety of computational modelling questions in fields such as economics, finance, sociology, or political science. The tutorial covers an array of downstream tasks. Afterwards, you will have the codebase (and an understanding of how to use it) for an operational transformer fine-tuning pipeline that you can modify for your own use cases.
BERT (and other models based on transformer architectures): a recent method of pre-training language representations that has obtained state-of-the-art results on a wide array of natural language processing (NLP) tasks. More here: https://arxiv.org/abs/1810.04805.
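To make this concrete, here is a minimal sketch of loading a pre-trained BERT checkpoint and extracting contextual embeddings with the Hugging Face `transformers` library (assumed installed along with PyTorch; `bert-base-uncased` is one of many available checkpoints):

```python
# Minimal sketch: load a pre-trained BERT model and get contextual embeddings.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Tokenize a sentence and run it through the model.
inputs = tokenizer("Transformers changed NLP.", return_tensors="pt")
outputs = model(**inputs)

# One embedding vector per token: shape (batch, tokens, hidden_size).
print(outputs.last_hidden_state.shape)
```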
Hugging Face: a library and API containing a vast collection of pre-trained models that can be downloaded and used off the shelf. They can also be easily adapted to many real-life problems using small amounts of labelled data. More here: https://huggingface.co/docs/transformers/index
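For off-the-shelf use, the `pipeline` API is the quickest entry point. A minimal sketch, assuming `transformers` is installed (passing only the task string downloads a default fine-tuned model for that task):

```python
# Minimal sketch: an off-the-shelf sentiment classifier via the pipeline API.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("This tutorial is remarkably useful."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```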
Structure
A brief history of recent advancements in NLP
Overview of foundational language models (transformers) and how to use them
Building an end-to-end pipeline to fine-tune foundational models on domain-specific datasets and tasks (a minimal sketch follows this list)
An overview of what's ahead: current capabilities, limitations, and latest developments
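As a preview of the fine-tuning pipeline covered in the tutorial, here is a minimal sketch using the Hugging Face `Trainer` on a text-classification task. The dataset (`imdb`), model checkpoint, subset sizes, and hyperparameters are illustrative assumptions, not the tutorial's actual configuration:

```python
# Minimal fine-tuning sketch, assuming the `datasets` and `transformers`
# libraries are installed; all names and hyperparameters are illustrative.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

# Load an example labelled dataset (binary sentiment).
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Convert raw text into token IDs the model expects.
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

# A classification head is added on top of the pre-trained encoder.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=8)

# Small subsets keep this sketch quick to run; use the full splits in practice.
trainer = Trainer(
    model=model, args=args,
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(1000)),
    eval_dataset=tokenized["test"].select(range(500)))
trainer.train()
```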