Ethical AI: A bane for science?

AI holds vast potential, but a growing body of evidence has revealed deep flaws in how it is used in science. An interdisciplinary team of researchers led by Princeton University has published guidelines for responsible AI use in science.

Image credits: Canva

AI holds the potential to help doctors find early markers of disease and policymakers avoid decisions that lead to war. But a growing body of evidence has revealed deep flaws in how machine learning is used in science, a problem that has swept through dozens of fields and implicated thousands of erroneous papers.

Now, an interdisciplinary team of 19 researchers, led by Princeton University computer scientists Arvind Narayanan and Sayash Kapoor, has published guidelines for the responsible use of machine learning in science.

“When we graduate from traditional statistical methods to machine learning methods, there are a vastly greater number of ways to shoot oneself in the foot,” said Narayanan, director of Princeton’s Center for Information Technology Policy and a professor of computer science. “If we don’t have an intervention to improve our scientific standards and reporting standards when it comes to machine learning-based science, we risk not just one discipline but many different scientific disciplines rediscovering these crises one after another.”

The authors say their work is an effort to stamp out this smouldering crisis of credibility that threatens to engulf nearly every corner of the research enterprise. A paper detailing their guidelines appeared on May 1 in the journal Science Advances.

Since machine learning has been adopted across virtually every scientific discipline, with no universal standards safeguarding the integrity of those methods, Narayanan said the current crisis, which he calls the reproducibility crisis, could become far more serious than the replication crisis that emerged in social psychology more than a decade ago.

The good news is that a simple set of best practices can help resolve this newer crisis before it gets out of hand, according to the authors, who come from computer science, mathematics, social science and health research.

“This is a systematic problem with systematic solutions,” said Kapoor, a graduate student who works with Narayanan and who organised the effort to produce the new consensus-based checklist.

The checklist focuses on ensuring the integrity of research that uses machine learning. Science depends on the ability to independently reproduce results and validate claims. Otherwise, new work cannot be reliably built atop old work, and the entire enterprise collapses. While other researchers have developed checklists that apply to discipline-specific problems, notably in medicine, the new guidelines start with the underlying methods and apply them to any quantitative discipline.

One of the main takeaways is transparency. The checklist calls on researchers to provide detailed descriptions of each machine learning model, including the code, the data used to train and test the model, the hardware specifications used to produce the results, the experimental design, the project’s goals and any limitations of the study’s findings. The standards are flexible enough to accommodate a wide range of nuance, including private datasets and complex hardware configurations, according to the authors.

While the increased rigour of these new standards might slow the publication of any given study, the authors believe wide adoption of these standards would increase the overall rate of discovery and innovation, potentially by a lot.

“What we ultimately care about is the pace of scientific progress,” said sociologist Emily Cantrell, one of the lead authors, who is pursuing her PhD at Princeton. “By making sure the papers that get published are of high quality and that they’re a solid base for future papers to build on, that potentially then speeds up the pace of scientific progress. Focusing on scientific progress itself and not just getting papers out the door is really where our emphasis should be.”

Kapoor concurred. The errors hurt: “At the collective level, it’s just a major time sink,” he said. That time costs money. And that money, once wasted, could have catastrophic downstream effects, limiting the kinds of science that attract funding and investment, tanking ventures that are inadvertently built on faulty science, and discouraging countless young researchers.

In working towards a consensus about what should be included in the guidelines, the authors said they aimed to strike a balance: simple enough to be widely adopted, comprehensive enough to catch as many common mistakes as possible.

They say researchers could adopt the standards to improve their own work; peer reviewers could use the checklist to assess papers; and journals could adopt the standards as a requirement for publication.

“The scientific literature, especially in applied machine learning research, is full of avoidable errors,” Narayanan said. “And we want to help people. We want to keep honest people honest.”


Crisis Response Journal
