July 30, 2025
On the same day as Donald Trump’s inauguration as president, DeepSeek, a Chinese company, released a world-class large language model (LLM). It was a wake-up call, observed Mr Trump. Mark Warner, vice-chair of the Senate Intelligence Committee, says that America’s intelligence community (IC), a group of 18 agencies and organisations, was “caught off guard”.
Last year the Biden administration grew concerned that Chinese spies and soldiers might leap ahead in the adoption of artificial intelligence (AI). It ordered its own intelligence agencies, the Pentagon and the Department of Energy (which builds nuclear weapons), to experiment more aggressively with cutting-edge models and work more closely with “frontier” AI labs—principally Anthropic, Google DeepMind and OpenAI.
On July 14th the Pentagon awarded contracts worth up to $200m each to Anthropic, Google and OpenAI, as well as to Elon Musk’s xAI—whose chatbot recently (and briefly) self-identified as Hitler after an update went awry—to experiment with “agentic” models. These can act on behalf of their users by breaking down complex tasks into steps and exercising control over other devices, such as cars or computers.
The frontier labs are busy in the spy world as well as the military one. Much of the early adoption has been in LLM chatbots crunching top-secret data. In January Microsoft said that 26 of its cloud-computing products had been authorised for use in spy agencies. In June Anthropic said it had launched Claude Gov, which had been “already deployed by agencies at the highest level of US national security”. The models are now widely used in every American intelligence agency, alongside those from competing labs.
AI firms typically fine-tune their models to suit the spooks. Claude, Anthropic’s public-facing model, might reject documents with classified markings as part of its general safety features; Claude Gov is tweaked to avoid this. It also has “enhanced proficiency” in the languages and dialects that government users might need. The models typically run on secure servers disconnected from the public internet. A new breed of agentic models is now being built inside the agencies.
The same process is under way in Europe. “In generative AI we have tried to be very, very fast followers of the frontier models,” says a British source. “Everyone in UKIC [the UK intelligence community] has access to top-secret [LLM] capability.” Mistral, a French firm and Europe’s only real AI champion, has a partnership with AMIAD, France’s military-AI agency. Mistral’s Saba model is trained on data from the Middle East and South Asia, making it particularly proficient in Arabic and smaller regional languages, such as Tamil. In January +972 Magazine reported that the Israeli armed forces’ use of GPT-4, then OpenAI’s most advanced LLM, increased 20-fold after the start of the Gaza war.
Despite all this, progress has been slow, says Katrina Mulligan, a former defence and intelligence official who leads OpenAI’s partnerships in this area. “Adoption of AI in the national-security space probably isn’t where we want it to be yet.” The NSA, America’s signals-intelligence agency, which has worked on earlier forms of AI, such as voice-recognition, for decades, is a pocket of excellence, says an insider. But many agencies still want to build their own “wrappers” around the labs’ chatbots, a process that often leaves them far behind the latest public models.
“The transformational piece is not just using it as a chatbot,” says Tarun Chhabra, who led technology policy for Joe Biden’s National Security Council and is now the head of national-security policy at Anthropic. “The transformational piece is: once you start using it, then how do I re-engineer the way I do the mission?”
Sceptics believe that these hopes are inflated. Richard Carter of the Alan Turing Institute, Britain’s national institute for AI, argues that what intelligence services in America and Britain really want is for the labs to significantly reduce “hallucinations” in existing LLMs. British agencies use a technique called “retrieval-augmented generation”, in which one algorithm searches for reliable information and feeds it to an LLM, to minimise hallucinations, says the British source. “What you need in the IC is consistency, reliability, transparency and explainability,” Dr Carter warns. Instead, labs are focusing on more advanced agentic models.
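The mechanics of retrieval-augmented generation are simple to sketch. The toy Python below is illustrative only: the document names, the keyword-overlap retriever and the prompt wording are assumptions for the example, not any agency’s or lab’s actual pipeline. The core idea is the same, though: fetch vetted passages first, then instruct the model to answer only from them rather than from its own memory.

```python
# Minimal, self-contained sketch of retrieval-augmented generation (RAG).
# All names (Document, retrieve, build_prompt) are hypothetical.
from dataclasses import dataclass


@dataclass
class Document:
    source: str   # e.g. a vetted report or database entry
    text: str


def retrieve(query: str, corpus: list[Document], k: int = 3) -> list[Document]:
    """Rank documents by naive keyword overlap with the query.
    Real systems would use vector search, but the principle is the same:
    pull trusted material first."""
    terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda d: len(terms & set(d.text.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(query: str, docs: list[Document]) -> str:
    """Hand the retrieved passages to the LLM and tell it to answer only
    from them, which is what keeps hallucination in check."""
    context = "\n".join(f"[{d.source}] {d.text}" for d in docs)
    return (
        "Answer using ONLY the sources below. If they are insufficient, say so.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    corpus = [
        Document("report-A", "The facility resumed centrifuge production in March."),
        Document("report-B", "Grain exports through the port fell sharply last quarter."),
    ]
    docs = retrieve("centrifuge production status", corpus)
    print(build_prompt("What is the status of centrifuge production?", docs))
```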
Mistral, for example, is thought to have shown would-be clients a demonstration in which each stream of information, such as satellite images or voice intercepts, is paired with one AI agent, speeding up decision-making. Alternatively, imagine an AI agent tasked with identifying, researching and then contacting hundreds of Iranian nuclear scientists to encourage them to defect. “We haven’t thought enough about how agents might be used in a war-fighting context,” adds Mr Chhabra.
The problem with agentic models, warns Dr Carter, is that they recursively generate their own prompts in response to a task, making them more unpredictable and increasing the risk of compounding errors. OpenAI’s most recent agentic model, ChatGPT agent, hallucinates in around 8% of answers, a higher rate than the company’s earlier o3 model, according to an evaluation published by the firm.
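To see why errors compound, consider a stripped-down sketch. The loop below is not OpenAI’s or anyone else’s actual agent; `plan_step` is a stand-in for a model call, and the 92% per-step accuracy is simply the article’s roughly 8% hallucination figure treated, for illustration, as an independent per-step error rate.

```python
# Toy agentic loop: the model writes its own next prompt from the transcript
# so far, so one early hallucination contaminates every later step.
def run_agent(task: str, plan_step, max_steps: int = 5) -> list[str]:
    transcript = [f"TASK: {task}"]
    for _ in range(max_steps):
        next_step = plan_step("\n".join(transcript))  # model generates its own next prompt
        transcript.append(next_step)                  # ...and that output feeds the next step
        if next_step.startswith("DONE"):
            break
    return transcript


# Why errors compound: if, hypothetically, each step is right ~92% of the time,
# the chance that a whole multi-step chain is right falls quickly.
p_step = 0.92
for n in (1, 3, 5, 10):
    print(f"{n} steps: whole chain correct ~{p_step ** n:.0%}")
# 1 step ~92%, 3 steps ~78%, 5 steps ~66%, 10 steps ~43%
```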
Some AI labs see such concerns as bureaucratic rigidity, but it is simply a healthy conservatism, says Dr Carter. “What you have, particularly in the GCHQ,” he says, referring to the NSA’s British counterpart, “is an incredibly talented engineering workforce that are naturally quite sceptical about new technology.”
This also relates to a wider debate about where the future of AI lies. Dr Carter is among those who argue that the architecture of today’s general-purpose LLMs is not designed for the sort of cause-and-effect reasoning that would give them a solid grasp of the world. In his view, the priority for intelligence agencies should be to push for new types of reasoning models.
Others warn that China might be racing ahead. “There still remains a huge gap in our understanding as to how and how far China has moved to use DeepSeek” for military and intelligence purposes, says Philip Reiner of the Institute for Security and Technology, a think-tank in Silicon Valley. “They probably don’t have similar guardrails like we have on the models themselves and so they’re possibly going to be able to get more powerful insights, faster,” he says.
On July 23rd, the Trump administration ordered the Pentagon and intelligence agencies to regularly assess how quickly America’s national-security agencies are adopting AI relative to competitors such as China, and to “establish an approach for continuous adaptation”.
Almost everyone agrees on this. Senator Warner argues that American spooks have been doing a “crappy job” tracking China’s progress. “The acquisition of technology [and] penetration of Chinese tech companies is still quite low.” The biggest risk, says Ms Mulligan, is not that America rushes into the technology before understanding the risks. “It’s that DoD and the IC keep doing things the way they’ve always done them. What keeps me up at night is the real possibility that we could win the race to AGI [artificial general intelligence]...and lose the race on adoption.”