Anti-Jewish and anti-Israel Bias Found in Leading AI Models, New ADL Report Finds

Published 25 March 2025

A comprehensive evaluation found that all four large language models (LLMs) — GPT (OpenAI), Claude (Anthropic), Gemini (Google), and Llama (Meta) — exhibited measurable anti-Jewish and anti-Israel bias, though the degree and nature of bias varied across models.

ADL (the Anti-Defamation League) released the most comprehensive evaluation to date of anti-Jewish and anti-Israel bias in major large language models (LLMs): GPT (OpenAI), Claude (Anthropic), Gemini (Google), and Llama (Meta).

The ADL Center for Technology and Society, in collaboration with ADL’s Ratings and Assessments Institute, evaluated responses from the four models and uncovered concerning patterns of bias, misinformation, and selective engagement on issues related to Jewish people, Israel, and antisemitic tropes.

“Artificial intelligence is reshaping how people consume information, but as this research shows, AI models are not immune to deeply ingrained societal biases,” said Jonathan A. Greenblatt, CEO of ADL. “When LLMs amplify misinformation or refuse to acknowledge certain truths, it can distort public discourse and contribute to antisemitism. This report is an urgent call to AI developers to take responsibility for their products and implement stronger safeguards against bias.”

Key Findings from the Report:

·  All four LLMs exhibited measurable anti-Jewish and anti-Israel bias, though the degree and nature of bias varied across models.

·  Meta’s Llama model displayed the most pronounced anti-Jewish and anti-Israel biases, providing unreliable and sometimes outright false responses to questions related to Jewish people and Israel. As the only open-source model in the assessed group, Llama was the lowest-scoring model for both bias and reliability. It was also the only model whose lowest score came on a question about the role of Jews in the Great Replacement conspiracy theory.

·  GPT and Claude showed significant anti-Israel bias, particularly in responses regarding the Israel-Hamas war, where they struggled to provide consistent, fact-based answers.

·  LLMs refused to answer questions about Israel more frequently than questions on other topics, reflecting a troubling inconsistency in how AI models handle political and historical subjects.

·  AI models demonstrated a concerning inability to accurately reject antisemitic tropes and conspiracy theories, highlighting the persistent challenges in preventing AI from amplifying misinformation.

“LLMs are already embedded in classrooms, workplaces, and social media moderation decisions, yet our findings show they are not adequately trained to prevent the spread of antisemitism and anti-Israel misinformation,” said Daniel Kelley, Interim Head of the ADL Center for Technology and Society. “AI companies must take proactive steps to address these failures, from improving their training data to refining their content moderation policies. We are committed to working with industry leaders to ensure these systems do not become vectors for hate and misinformation.”

As AI continues to shape public discourse, its role in disseminating bias, whether intentional or inadvertent, has profound implications. These systems are increasingly used in education, workplaces, and public communications, making it critical to ensure they do not reinforce harmful stereotypes or misinformation. To address these findings, the report offers recommendations for AI developers and for government.

Recommendations for Developers:

·  Conduct rigorous pre-deployment testing in partnership with academia, civil society, and governments.

·  Carefully consider the usefulness, reliability, and potential biases of training data.

·  Follow the NIST AI Risk Management Framework (AI RMF).

Recommendations for Government:

·  Ensure that efforts to encourage AI development include a built-in focus on the safety of content and uses.

·  Prioritize a regulatory framework that requires AI developers to follow industry trust and safety best practices.

·  Invest in AI safety research so that society can achieve the gains of AI while mitigating the harms.   

This research was conducted in partnership with Builders for Tomorrow (BFT), a venture philanthropy and research organization focused on combating anti-Jewish and anti-West ideologies. 

ADL assessed these AI tools by asking each model to indicate its level of agreement with statements in six categories related to antisemitism and anti-Israel bias, then analyzed patterns among the results. Each LLM was queried 8,600 times, for a total of 34,400 responses. A similar methodology has been used to evaluate other forms of bias, such as political bias, implicit reasoning bias, and steerability bias, among others. This project represents the first stage of a broader ADL examination of LLMs and antisemitic bias. The findings shared in this report underscore the need for improved safeguards and mitigation strategies across the AI industry.
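For readers curious how such an agreement-scale survey can be structured and tallied, the sketch below shows one minimal way to do it in Python. It is not ADL’s actual harness: the category names, example statements, model list, and the query_model placeholder are all hypothetical, and a real run would replace the simulated answer with calls to each provider’s API.

```python
import random
from collections import defaultdict

# Hypothetical stand-ins for the report's six categories and statements;
# the actual prompts and category names are not reproduced here.
CATEGORIES = {
    "antisemitic_tropes": ["<statement invoking an antisemitic trope>"],
    "israel_related": ["<statement about Israel>"],
}

AGREEMENT_SCALE = ["Strongly disagree", "Disagree", "Agree", "Strongly agree"]
MODELS = ["GPT", "Claude", "Gemini", "Llama"]


def query_model(model: str, statement: str) -> str:
    """Placeholder for an API call asking `model` to pick one option from
    AGREEMENT_SCALE for `statement`; here it only simulates an answer."""
    return random.choice(AGREEMENT_SCALE)


def run_survey(models, categories, repeats=5):
    """Query every model about every statement `repeats` times and tally
    how often each agreement option is chosen, per model and category."""
    tallies = defaultdict(lambda: defaultdict(int))
    for model in models:
        for category, statements in categories.items():
            for statement in statements:
                for _ in range(repeats):
                    answer = query_model(model, statement)
                    tallies[(model, category)][answer] += 1
    return tallies


if __name__ == "__main__":
    results = run_survey(MODELS, CATEGORIES)
    for (model, category), counts in sorted(results.items()):
        print(model, category, dict(counts))
```

Aggregating agreement counts per model and category in this way makes it straightforward to compare how often different models endorse, reject, or refuse to engage with a given class of statements.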

The ADL Center for Technology and Society and ADL’s Ratings and Assessments Institute will continue to evaluate AI bias and push for greater accountability in the development of AI technologies. This report represents the first step in an ongoing effort to track and mitigate biases in artificial intelligence.

The article is published courtesy of the Anti-Defamation League (ADL).