Spotting climate misinformation with AI requires expertly trained models
General-purpose models such as Meta's Llama and Google's Gemini lag behind expertly trained ones at the task

Organizations that want to counter climate misinformation with AI need to bring in experts to guide training of the models, a new study suggests.
By Ananya
Conversational AI chatbots are making climate misinformation sound more credible, blurring the line between falsehoods and real science. In response, climate experts are turning some of the same tools against fake information online.
But when it comes to classifying false or misleading climate claims, general-purpose large language models, or LLMs — such as Meta’s Llama and OpenAI’s GPT-4 — lag behind models specifically trained on expert-curated climate data, scientists reported in March at the AAAI Conference on Artificial Intelligence in Philadelphia. Climate groups that want to use commonly available LLMs in chatbots and content moderation tools to flag climate misinformation need to carefully consider which models they use and bring in relevant experts to guide training, the findings suggest.
Compared with other types of claims, climate change misinformation is often “cloaked in false or misleading scientific information,” which makes it harder for both humans and machines to spot the intricacies of climate science, says Erik Nisbet, a communications expert at Northwestern University in Evanston, Ill.
To evaluate the models, Nisbet and his colleagues used a dataset called CARDS, which contains approximately 28,900 paragraphs in English from 53 climate-skeptic websites and blogs. The paragraphs fall into five categories: “global warming is not happening,” “human greenhouse gases are not causing global warming,” “climate impacts are not bad,” “climate solutions won’t work” and “climate movement/science is unreliable.”
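To make the five-way labeling concrete, here is a minimal sketch of how CARDS-style records might be represented in Python for a classification task. The field names and example text are illustrative assumptions, not the dataset’s actual schema.

```python
# Illustrative representation of CARDS-style labeled paragraphs for a
# five-way classification task (field names and example text are made up).
CARDS_CATEGORIES = [
    "global warming is not happening",
    "human greenhouse gases are not causing global warming",
    "climate impacts are not bad",
    "climate solutions won't work",
    "climate movement/science is unreliable",
]

example_record = {
    "paragraph": "An illustrative misleading climate claim would go here.",
    "label": CARDS_CATEGORIES[3],  # e.g., "climate solutions won't work"
}
```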
The researchers built a climate-specific LLM by retraining, or fine-tuning, OpenAI’s GPT-3.5 Turbo on about 26,000 paragraphs from the same dataset. The team then compared the performance of the fine-tuned, proprietary model against 16 general-purpose LLMs and an openly available, small-scale language model called RoBERTa, also trained on the CARDS dataset. These models classified the remaining roughly 2,900 paragraphs of misleading claims.
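For readers curious what that fine-tuning step can look like in practice, below is a hedged sketch using OpenAI’s fine-tuning API. It is not the researchers’ actual pipeline; the system prompt, file names and toy training data are assumptions.

```python
# Sketch of fine-tuning a GPT model on expert-labeled CARDS paragraphs via
# OpenAI's fine-tuning API. Not the study's actual pipeline; the prompt, file
# names and the tiny training_records list are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# In practice this would hold roughly 26,000 (paragraph, label) pairs.
training_records = [
    ("An illustrative misleading climate paragraph ...", "climate solutions won't work"),
]

def to_chat_example(paragraph: str, label: str) -> dict:
    # The fine-tuning endpoint expects chat-formatted examples.
    return {
        "messages": [
            {"role": "system", "content": "Classify the claim into one of the five CARDS categories."},
            {"role": "user", "content": paragraph},
            {"role": "assistant", "content": label},
        ]
    }

with open("cards_train.jsonl", "w") as f:
    for paragraph, label in training_records:
        f.write(json.dumps(to_chat_example(paragraph, label)) + "\n")

training_file = client.files.create(file=open("cards_train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-3.5-turbo")
print("fine-tuning job started:", job.id)
```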
Nisbet’s team assessed the models by scoring how well each sorted the claims into the correct categories. The fine-tuned GPT model scored 0.84 out of a possible 1.00. The general-purpose GPT-4o and GPT-4 models had lower scores of 0.75 and 0.74, comparable to the 0.77 of the small RoBERTa model. That gap suggests that including expert feedback during training improves classification performance. But the other nonproprietary models tested, such as those from Meta and Mistral, performed poorly, with scores no higher than 0.28.
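The article doesn’t name the scoring measure; a macro-averaged F1 score, which also runs from 0 to 1, is a common choice for multi-class tasks like this one. A minimal sketch, assuming scikit-learn and made-up labels:

```python
# Scoring predicted categories against the correct ones with macro-averaged
# F1 (an assumed metric; the study's exact measure isn't given here).
from sklearn.metrics import f1_score

true_labels = ["impacts not bad", "solutions won't work", "science unreliable"]
predictions = ["impacts not bad", "science unreliable", "science unreliable"]

print(f"macro F1: {f1_score(true_labels, predictions, average='macro'):.2f}")
```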
This is an unsurprising outcome, says Hannah Metzler, a misinformation expert at the Complexity Science Hub in Vienna. The researchers faced computational constraints when using the nonproprietary models and couldn’t use more powerful ones. “This shows that if you don’t have huge resources, which climate organizations won’t have, of course there will be issues if you don’t want to use the proprietary models,” she says. “It shows there’s a big need for governments to create open-source models and give us resources to use this.”
The researchers also tested the fine-tuned GPT model and the CARDS-trained RoBERTa model on classifying false claims in 914 paragraphs about climate change published on Facebook and X by low-credibility websites. The fine-tuned GPT model’s classifications showed high agreement with categories marked by two climate communication experts and outperformed the RoBERTa model. But the GPT model struggled to categorize claims about the impact of climate change on animals and plants, probably because of a lack of sufficient examples in the training data.
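How closely a model’s labels match the experts’ is a standard agreement question; one chance-corrected statistic often used for it is Cohen’s kappa, sketched below with made-up labels. The article doesn’t say which agreement measure the researchers used.

```python
# Illustrative agreement check between expert- and model-assigned categories
# using Cohen's kappa (an assumed statistic, not confirmed by the article).
from sklearn.metrics import cohen_kappa_score

expert_labels = ["impacts not bad", "solutions won't work", "warming not happening"]
model_labels  = ["impacts not bad", "solutions won't work", "science unreliable"]

print(cohen_kappa_score(expert_labels, model_labels))
```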
Another issue is that generic models might not keep up with shifts in the information being shared. “Climate misinformation constantly varies and adapts,” Metzler says, “and it’s always gonna be difficult to run after that.”