
How federal agencies can separate AI hype from reality

There is a great deal of marketing hype around AI, driven in large part by the technology’s immense potential and the promise that it will transform businesses overnight. However, it’s important to dig beyond the headlines and approach AI with a critical mindset: the actual capabilities of AI systems vary widely, and the technology is not without its challenges and ethical considerations.

Federal agencies should look past the marketing hype created by companies trying to sell to the U.S. government and understand the current state of the technology and how it can be leveraged.

Here are some ways to separate fact from fiction:

Broad Claims vs. Specific Applications: Companies often tout their AI solutions as revolutionary and capable of solving a wide array of problems. In reality, AI technology tends to have narrow, well-defined applications. For example, a company might claim its AI can “transform customer service across all industries,” but genuine AI solutions are usually designed for specific tasks, such as a chatbot trained to handle banking inquiries, rather than being universally applicable across sectors.

Transparency and Peer Review: Genuine AI advancements are typically accompanied by research papers or documentation that details the technology’s workings, often peer-reviewed or presented at reputable conferences. If a company’s claims about an AI technology are not backed by publicly accessible research or technical details, it may be a sign of hype. Google DeepMind’s AlphaGo, for instance, was validated through peer-reviewed research and publicized matches against human Go champions, establishing its legitimacy.

Realistic Limitations: Authentic AI research acknowledges limitations. AI technologies, whether Large Language Models (LLMs) or AI vision systems, have constraints based on current technological capabilities. Companies that claim their AI solutions have no limitations, or that fail to discuss potential challenges, are likely overselling. For example, while LLMs such as the Generative Pre-trained Transformer (GPT) family are powerful, researchers openly discuss their limitations in understanding context or generating accurate information beyond their training data.

Evidence of Practical Deployment: Look for evidence of the AI technology being deployed in real-world scenarios. Companies might claim their AI can automate complex tasks, but real success is demonstrated through case studies or testimonials from credible organizations. For example, IBM Watson’s deployment in healthcare is a mixed example; while it promised to revolutionize cancer treatment, actual outcomes have shown the challenges and limitations of applying AI in such a complex field, illustrating the gap between hype and practical application.

Scalability and Performance Claims: Be wary of claims that an AI solution can effortlessly scale to meet any demand. Scalability is a significant challenge in AI, and genuine solutions will discuss how they address this issue. Overhyped AI products might promise unlimited scalability without discussing the technological infrastructure required to support such claims.

How to separate LLM hype from reality

One example is the hype surrounding LLMs, which often paints them as near-magical tools capable of understanding and generating human-like text with little to no limitations. This overenthusiasm can lead to unrealistic expectations about the capabilities of LLMs, especially among those unfamiliar with the nuances of AI technology. Some claim that LLMs can entirely replace human roles in complex domains like journalism, creative writing, or legal advice, suggesting these models can understand context and nuance at a human level across any subject matter. One such claim holds that an LLM can autonomously write an entire novel that is indistinguishable from one written by a human author, complete with complex characters, intricate plots and deep emotional insights.

However, once you understand the stochastic nature of LLMs (their probabilistic, rather than deterministic, way of generating text), it becomes clear that while LLMs are powerful tools for generating human-like text, they do not possess a true understanding of content or context the way humans do. They generate responses based on patterns in the data they were trained on, rather than any real understanding or reasoning.
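To make that point concrete, here is a minimal sketch in Python of the kind of temperature-scaled sampling step an LLM performs when choosing each next word. The token scores and the sample_next_token helper are invented for illustration and do not reflect any real model’s API; the takeaway is simply that output is drawn from a probability distribution, so the same prompt can produce different text on every run.

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Sample one token from a toy next-token distribution.

    `logits` maps candidate tokens to unnormalized scores, standing in
    for the scores a real LLM produces at each step. Sampling, rather
    than always picking the top score, is what makes output stochastic.
    """
    # Softmax with temperature: a higher temperature flattens the
    # distribution, making less-likely tokens more probable.
    scaled = {tok: math.exp(score / temperature) for tok, score in logits.items()}
    total = sum(scaled.values())
    probs = {tok: w / total for tok, w in scaled.items()}
    # Draw one token in proportion to its probability.
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Hypothetical scores for the word following "The budget request was ..."
logits = {"approved": 2.0, "denied": 1.2, "delayed": 0.8}

# Repeated calls with identical input can yield different outputs:
# the model predicts likely continuations, it does not "decide."
for _ in range(5):
    print(sample_next_token(logits, temperature=0.9))
```

Because every token is chosen this way, fluent output is a statistical property of the training data rather than evidence of comprehension, which is why confident-sounding but inaccurate text is always possible.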

This understanding helps temper expectations, highlighting that while LLMs are revolutionary tools for assisting with creative tasks, augmenting content creation, and automating certain writing tasks, they are not replacements for human creativity, insight, and expertise.

The government holds vast amounts of data in many different locations, and AI offers advantages that can save agencies valuable time, money and resources. But federal agencies must know their data: doing AI well requires a deep understanding of what data is available and the context of that data.

John Mark Suhy is CTO of Greystones Group. He brings more than 20 years of enterprise architecture and software development experience with agencies including the FBI, Sandia Labs, the Department of State, the U.S. Treasury and the intelligence community.
