Artificial Intelligence and Extremism: The Threat of Language Models for Propaganda Purposes

By Stephane Baele

Published 25 October 2022

Recent large-scale projects in the field of Artificial Intelligence have dramatically improved the quality of language models, opening up a wide range of practical applications. Language models are statistical models that calculate probability distributions over sequences of words. They can make many beneficial contributions, but they may also be misused by extremist actors for propaganda purposes.

Recent large-scale projects in the field of Artificial Intelligence have dramatically improved the quality of language models, opening up a wide range of practical applications, from automated speech/voice recognition and autocomplete to more specialized applications in healthcare and finance. Yet the power of this tool has also, inevitably, raised concerns about potential malicious uses by political actors. This CREST guide highlights the threat of one specific misuse: the potential use of language models by extremist actors for propaganda purposes.

The Rise of Language Models
Language models are statistical models that calculate probability distributions over sequences of words. Over the past five years, language modelling has experienced massive improvement – amounting to no less than a ‘paradigm shift’ according to some researchers (Bommasani et al. 2021) – with the rise of ‘foundation models’. Foundation models are large language models with millions or even billions of parameters in their deep learning neural network architecture, trained on extremely large and broad datasets, which can be adapted to a wide range of downstream tasks with minimal fine-tuning.
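To make that definition concrete, the toy sketch below (not taken from the guide, and deliberately simplified) scores a short word sequence using a hand-written table of bigram probabilities, i.e. the probability of each word given the previous one. A foundation model performs the same kind of scoring, but its conditional probabilities are computed by a deep neural network with millions or billions of learned parameters rather than looked up in a small table.

```python
import math

# Toy bigram model: hand-picked values for P(current word | previous word).
# A real language model learns these conditional probabilities from data,
# encoding them in millions or billions of parameters instead of a lookup table.
BIGRAM_PROBS = {
    ("the", "model"): 0.20,
    ("model", "writes"): 0.05,
    ("writes", "text"): 0.10,
}

def sequence_log_probability(words):
    """Score a sequence as the product of conditional probabilities,
    P(w1..wn) = P(w2|w1) * P(w3|w2) * ..., accumulated in log space."""
    log_p = 0.0
    for previous, current in zip(words, words[1:]):
        # Unseen word pairs get a tiny smoothing probability instead of zero.
        log_p += math.log(BIGRAM_PROBS.get((previous, current), 1e-6))
    return log_p

print(sequence_log_probability(["the", "model", "writes", "text"]))
# A higher (less negative) log-probability means the model finds the sequence more plausible.
```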

The development of these models is very expensive, necessitating large teams of developers, numerous servers, and extensive data to train on. As a consequence, performant models have been created by well-resourced projects or companies such as Google (BERT in 2018), OpenAI (GPT-2 in 2019, GPT-3 in 2020), and DeepMind (Gopher in 2022), which entered a race to design and deliver the most powerful model trained on the biggest base corpus, implementing the most parameters, and resting on the most pertinent architecture. GPT-3, for instance, was trained on approximately 500 billion words scraped from a wide range of internet spaces between 2016 and 2019; its development is estimated to have cost over $15 million on top of staff salaries. In July 2019, Microsoft committed an investment of no less than $1 billion in OpenAI.
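The expense sits almost entirely on the training side: once a model's weights are released, generating fluent text takes only a few lines of code and commodity hardware. The sketch below is not part of the guide; it assumes the open-source Hugging Face transformers library and the publicly released GPT-2 checkpoint – a much smaller predecessor of GPT-3 – purely to illustrate how low the barrier to reuse is.

```python
# Illustrative only: uses the open-source Hugging Face `transformers` library
# and the publicly released GPT-2 checkpoint (a small predecessor of GPT-3).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# A single prompt is enough to produce a fluent continuation.
outputs = generator(
    "Language models can",
    max_length=50,           # total length (prompt plus generated text) in tokens
    num_return_sequences=1,
    do_sample=True,          # sample from the probability distribution rather than always picking the likeliest word
)
print(outputs[0]["generated_text"])
```

Adapting such a released model to a narrower register – the ‘minimal fine-tuning’ mentioned above – is similarly lightweight compared with the original training run.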

Warnings of Malicious Use
These rapid developments come with excitement and hype, but also with serious concerns. As Bommasani and colleagues (2021, pp.7-8) ask, “given the protean nature of foundation models and their unmapped capabilities, how can we responsibly anticipate and address the ethical and social considerations they raise?”