Meta’s in-house ChatGPT competitor is being marketed unlike anything that’s ever come out of the social media giant before: a convenient tool for planning airstrikes.
As it has invested billions into developing machine learning technology it hopes can outpace OpenAI and other competitors, Meta has pitched its flagship large language model Llama as a handy way of planning vegan dinners or weekends away with friends. A provision in Llama’s terms of service previously prohibited military uses, but Meta announced on November 4 that it was joining its chief rivals and getting into the business of war.
“Responsible uses of open source AI models promote global security and help establish the U.S. in the global race for AI leadership,” Meta proclaimed in a blog post by global affairs chief Nick Clegg.
One of these “responsible uses” is a partnership with Scale AI, a $14 billion machine learning startup and thriving defense contractor. Following the policy change, Scale now uses Llama 3.0 to power a chat tool for governmental users who want to “apply the power of generative AI to their unique use cases, such as planning military or intelligence operations and understanding adversary vulnerabilities,” according to a press release.
But there’s a problem: Experts tell The Intercept that the government-only tool, called “Defense Llama,” is being advertised by showing it give terrible advice about how to blow up a building. Scale AI defended the advertisement by telling The Intercept its marketing is not intended to accurately represent its product’s capabilities.
Llama 3.0 is a so-called open source model, meaning that, unlike OpenAI’s offerings, users can download it, use it, and alter it free of charge. Scale AI says it has customized Meta’s technology to provide military expertise.
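What that openness means in practice is straightforward to illustrate: anyone with the hardware, a Hugging Face account, and acceptance of Meta’s license terms can pull the weights and run, or fine-tune, the model locally. The sketch below is a generic illustration of that workflow, not Scale AI’s Defense Llama setup; the model ID and prompt are stand-ins.

```python
# Illustrative only: loading an open-weights Llama model with Hugging Face's
# transformers library. A generic sketch of what "download, use, and alter"
# looks like in practice -- not Scale AI's Defense Llama pipeline.
# Assumes: `pip install transformers accelerate torch`, enough GPU memory,
# and access to the gated meta-llama repo (requires accepting Meta's license).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # example model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The same chat interface Meta markets for planning dinners or weekend trips.
messages = [{"role": "user", "content": "Plan a vegan dinner party for six."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=300)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```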
Scale AI touts Defense Llama’s accuracy, as well as its adherence to norms, laws, and regulations: “Defense Llama was trained on a vast dataset, including military doctrine, international humanitarian law, and relevant policies designed to align with the Department of Defense (DoD) guidelines for armed conflict as well as the DoD’s Ethical Principles for Artificial Intelligence. This enables the model to provide accurate, meaningful, and relevant responses.”
The tool is not available to the public, but Scale AI’s website provides an example of this Meta-augmented accuracy, meaningfulness, and relevance. The case study is in weaponeering, the process of choosing the right weapon for a given military operation. An image on the Defense Llama homepage depicts a hypothetical user asking the chatbot: “What are some JDAMs an F-35B could use to destroy a reinforced concrete building while minimizing collateral damage?” The Joint Direct Attack Munition, or JDAM, is a hardware kit that converts unguided “dumb” bombs into “precision-guided” weapons that use GPS or lasers to track their targets.
Defense Llama is shown in turn suggesting three different Guided Bomb Unit munitions, or GBUs, ranging from 500 to 2,000 pounds, describing one with characteristic chatbot pluck as “an excellent choice for destroying reinforced concrete buildings.”
Military targeting and munitions experts who spoke to The Intercept all said Defense Llama’s advertised response was flawed to the point of being useless. Not only does it give bad answers, they said, but it also complies with a fundamentally bad question. Whereas a trained human should know that such a question is nonsensical and dangerous, large language models, or LLMs, are generally built to be user friendly and compliant, even when it’s a matter of life and death.
“I can assure you that no U.S. targeting cell or operational unit is using a LLM such as this to make weaponeering decisions nor to conduct collateral damage mitigation,” Wes J. Bryant, a retired targeting officer with the U.S. Air Force, told The Intercept, “and if anyone brought the idea up, they’d be promptly laughed out of the room.”
Munitions experts gave Defense Llama’s hypothetical response poor marks across the board. The LLM “completely fails” in its attempt to suggest the right weapon for the target while minimizing civilian death, Bryant told The Intercept.
“Since the question specifies JDAM and destruction of the building, it eliminates munitions that are generally used for lower collateral damage strikes,” Trevor Ball, a former U.S. Army explosive ordnance disposal technician, told The Intercept. “All the answer does is poorly mention the JDAM ‘bunker busters’ but with errors. For example, the GBU-31 and GBU-32 warhead it refers to is not the (V)1. There also isn’t a 500-pound penetrator in the U.S. arsenal.”
Ball added that it would be “worthless” for the chatbot to give advice on destroying a concrete building without being provided any information about the building beyond it being made of concrete.
Defense Llama’s advertised output is “generic to the point of uselessness to almost any user,” said N.R. Jenzen-Jones, director of Armament Research Services. He also expressed skepticism toward the question’s premise.
It is hard, he suggested, to envision many scenarios in which a human user would pose the sample question as it is phrased.
Scale AI spokesperson Heather Horniak told The Intercept that the marketing image is not meant to accurately represent the system’s capabilities, only to show that an LLM customized for defense work can field military-related questions. The response shown in the ad, Horniak said, does not reflect the output of a deployed, fine-tuned LLM trained on relevant materials for actual end users.
Despite Scale AI’s claim that Defense Llama was trained on a vast dataset of military knowledge, Jenzen-Jones said the advertised response contains errors and imprecise terminology. The recommendation of munitions as large as the 2,000-pound GBU-31 drew particular concern, given the civilian casualties such weapons have caused in past strikes.
Scale AI would not say whether any Defense Department clients use Defense Llama in the way the advertisement depicts. The company did, however, give DefenseScoop a private demonstration of the LLM working through a similar airstrike scenario. Only after inquiries from The Intercept did Scale AI clarify that the promotional image was for demonstration purposes only.
The marketing scenario may be hypothetical, but military use of LLMs is not. Scale AI has also worked with the Pentagon’s AI office to develop testing tools for large language models, a sign of how far generative AI has already moved into military planning and decision-making.
Defense contracting has been central to Scale AI’s rise and has helped attract prominent investors, and CEO Alexandr Wang has framed the company’s military work as a way of securing U.S. leadership in AI. Defense Llama is the latest product of that push to sell customized AI tools to the military.
But experts caution that turning airstrike planning over to an LLM risks sidelining the human oversight and legal obligations that targeting decisions are supposed to involve. Jessica Dorsey warned that the oversimplified approach shown in Scale AI’s example could have dangerous consequences, because it glosses over the complexity of mitigating civilian harm in military operations. Whatever promise such tools may hold for the military, the experts said, deploying them demands a far more serious reckoning with the ethical and legal stakes of life-and-death decisions than the marketing suggests.