15 January, 2026

The bad data problem that could worsen in 2026

For AI-driven decisions, bad data is as harmful as no data at all, if not worse, say industry observers. IDC FutureScape has forecast that by 2027, companies that do not prioritise high-quality, AI-ready data will struggle to scale generative AI and agentic solutions, resulting in a 15% productivity loss.

Source: Cloudera. Remus Lim.

Remus Lim, Senior VP, Asia Pacific & Japan, Cloudera, said: "In 2026, many enterprises will confront a hard truth: the excitement of pilots fades in production. We see news of large organisations racing to pour tons of resources into the next breakthrough, and smaller players taking a more measured approach.

"However, no matter their size or ambition, every company will come to the same realisation: the success of AI is dependent on having a strong data foundation. As regulatory pressures tighten and expectations rise, getting their data right will determine how effectively organisations can scale safely, innovate confidently, and deliver measurable business impact." 

A new kind of AI slop will spike as companies generate data faster than they can manage, predicted Steve Yen, Co-Founder, Couchbase. "The messy part of AI for the enterprise won’t be the goofy content everyone jokes about. The real issue is the surge of semi-structured and regenerated data that AI produces, and the companies that shore up their data foundations now will be in the best position to take advantage of it," he said.

"As business teams start building features, rewriting content and generating new forms of data on their own, AI will create new tables, new fields and new analytical artifacts at a pace older systems were never designed to handle. Without the right data infrastructure, the result is confusion, duplicated information and a steady drop in confidence in what the data is telling you.

"To keep this from turning into chaos, enterprises need systems that can absorb constant structural changes, handle heavy ingest and support fast iteration. They also need ways to keep AI grounded in reliable operational data so outputs don’t drift or degrade. Pairing flexible JSON models with capabilities like vector search and production-grade scalability gives teams a practical way to move faster than the competition." 

Gopi Duddi, CTO, Couchbase, provided more insight: "Companies will start pulling far more insight from unstructured data because the cost and accessibility of agents will make it practical to mine information that once sat untouched in silos. Data in Slack, email and documents will no longer sit idle since agents can run continuously and gather it in a usable form. 

"As more communication happens in natural language, the systems that process it will learn to work without predefined schema. JSON becomes even more important because it is easy for humans to read and easy for machines to interpret. This shift favours platforms that that handle unstructured data well and can vectorise it for search and context-rich retrieval."

Matthew Oostveen, VP & CTO, Asia Pacific & Japan, Pure Storage, highlighted that data coherence will be paramount. "Enterprises will recognise that AI accuracy depends on data coherence. Many large organisations still wrestle with multiple versions of truth: slightly different copies of the same data scattered across divisions and regions," he noted.

"In 2026, the focus will shift from collecting and cleaning data to governing datasets: curated, versioned, and contextual sources of truth that can be trusted across the business. Those who can eliminate fragmentation and maintain coherent, traceable datasets will see significant gains in model reliability and decision accuracy. In the era of enterprise AI, the true competitive edge won’t come from having more data—but from having consistent, managed datasets that everyone can trust." 

Source: Syniti. Cody David.

Cody David, Head of AI & Innovation, Syniti, part of Capgemini, called 2026 'the year AI has to tell the truth'. "AI excitement is still there, but it is sobered by what pilots revealed: AI only performs as well as the data and controls surrounding it," he said. 

"You can see it in meetings. The room starts with confidence about a new assistant or agent. Then someone asks two simple questions. 'How will it access the data?', and 'Who trusts that data?' That is usually the moment the conversation shifts from ideas to fundamentals. 

"The organisations that move from pilots to production will do something more disciplined in 2026. They will treat AI readiness as a gate, not a hope. That gate includes four basics: secure access paths, clear data ownership, minimum quality thresholds, and documented business meaning."

David also shared that a data-first approach will be how companies get there, with readiness assessments, remediation, cataloguing, governance, and ongoing monitoring. "None of this is flashy. All of it is essential," he commented.

Source: Syniti. Mary Hartwell.

"Data quality has returned to the forefront. Organisations adopting analytics, AI, and automation now recognise that without trustworthy data, all downstream initiatives fail," added Mary Hartwell, Global Practice Lead for Data Governance at Syniti, part of Capgemini. 

Hartwell said that in 2026, enterprises will invest in continuous quality monitoring, machine learning (ML)-driven anomaly detection, data product service level agreements (SLAs), leadership-visible scorecards, and root-cause analysis tied to ownership.
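
As a simple illustration of continuous monitoring, the sketch below flags anomalies in a daily null-rate metric with a z-score test; production systems would use richer ML models, and the threshold here is an assumption.

```python
import statistics

def anomalies(daily_null_rates: list[float], z_threshold: float = 3.0) -> list[int]:
    # Flag days whose null rate deviates sharply from the series mean.
    mean = statistics.mean(daily_null_rates)
    stdev = statistics.stdev(daily_null_rates)
    if stdev == 0:
        return []
    return [
        day for day, rate in enumerate(daily_null_rates)
        if abs(rate - mean) / stdev > z_threshold
    ]

# A month of stable ~1% null rates, then a spike worth a
# root-cause ticket tied to the dataset's owner.
history = [0.01] * 29 + [0.012, 0.18]
print(anomalies(history))  # flags the final day
```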

"Data quality is no longer just cleanup; it’s a value driver. Leaders are starting to treat clean, reliable data as an asset that powers both compliance and competitive advantage," she said. 

Inadvertent data capture can also happen, Edward Funnekotter, Chief AI Officer at Solace, pointed out. "AI's power lies in its ability to process large amounts of natural language much faster and cheaper than the human mind. As businesses race to give AI agents access to internal documents, SharePoint, and live web searches, we are creating a high-risk environment for data intermingling," Funnekotter said.

Source: Solace. Edward Funnekotter.

"The value of agentic AI lies in its ability to make decisions without constant human oversight, but that autonomy creates a conflict: how do you trust a system with the keys to the shop?"

"The threat landscape is evolving at pace. We are already seeing real concerns around 'prompt injection', where nefarious actors embed malicious text blobs into web pages. When an agent fetches that page to summarise a topic, the hidden text acts like a hypnotist’s keyword, overriding the AI’s instructions and forcing it to exfiltrate internal data. Or, imagine an agent accidentally copying confidential salary information or commercially sensitive data into a public setting because it 'made sense' to the model at that moment," Funnekotter elaborated.

"In 2026, we will see a heavy focus on data management to solve this. The goal is to prevent the AI model from ingesting raw data unnecessarily. Instead of feeding an LLM a thousand rows of a database – which is slow, expensive, and prone to hallucination – we need systems where the AI simply directs a software tool to filter the data and return only the relevant answer."

Source: MongoDB. Thorsten Walther.

Thorsten Walther, MD, CXO Advisory Asia at MongoDB, pointed out that many large language models (LLMs) can do the job, while data remains the constant. "The AI story in 2026 won’t be about which LLM is 'best'. It will be the realisation that models are becoming interchangeable, and the real competitive edge lies in the data that fuels them," he said.

"Enterprises will prioritise LLM portability so they can choose the right model for the right job, and they’ll invest even more in vector search, embeddings, and re-ranking to extract deeper value from their own data. Conversations across the market are already shifting in this direction, signalling a maturing view of what actually moves the needle in enterprise AI."

Against this backdrop, Gartner predicts that by 2029, organisations that fail to invest adequately in digital provenance capabilities will be exposed to sanction risks potentially running into billions of dollars.

"As organisations rely more on third-party software, open-source code, and AI-generated content, verifying digital provenance has become essential. Digital provenance refers to the ability to verify the origin, ownership, and integrity of software, data, media, and processes. New tools such as software bills of materials (SBoM), attestation databases, and digital watermarking offer organisations the means to validate and track digital assets across the supply chain," the consultancy said in a list of 2026 predictions.  

