AI cannot deliver value if built on chaotic, ungoverned data. Enterprises must take a strategic approach to data classification, hygiene, and security to ensure their AI initiatives are accurate, resilient, and accountable.
Executives might not want to hear it, but many organisations are sitting on a chaotic sprawl of data that is duplicated, unclassified, inconsistent, and often unsecured. While artificial intelligence has the potential to deliver profound insights and efficiencies, most enterprises are nowhere near ready to deploy it effectively. The data is simply not in a fit state.
Mark Molyneux, EMEA CTO at Cohesity, calls this the reality of the brownfield site. “Pretty much every company I talk to is completely siloed in how they manage data. They’ve got it all over the place, and that’s for various reasons. It’s the way the company’s grown, through acquisition, or because they’ve been using heritage systems and then bought new ones.”
The temptation is to chase value by liberating every silo. However, Molyneux argues that this is the wrong focus. “There’s this obsession that all data has value, but it simply doesn’t. Unstructured data is growing by 60 to 80 per cent a year. Yes, some of that might be critical: financial records or seismic reports. But there’s also holiday photos, old operating system copies, and test databases that haven’t been touched in years.”
The challenge is not just one of volume but of relevance. What’s needed is not access to everything but intelligent curation: understanding the data, where it lives, how it relates to compliance, and whether it can be trusted. That’s the first step from chaos to confidence and, ultimately, to useful AI.
Redefining backup as a strategic asset
Cohesity’s role in this process is not what one might initially expect. Born from data protection, the company has expanded well beyond simple backup and recovery. “We are a data lake,” Molyneux says. “When you back up the data, that’s our big lake. And because we use a single pane of glass across that, you can back up in one data centre or another; to us, it’s just one big pool. Because we can access that data, we can then use insights into that data.”
This shift from backup as insurance to backup as insight is central. The platform classifies and indexes the data it receives, identifying owners, structure, sensitivity, and retention requirements so that even before AI is applied, a baseline of visibility and control is already in place. “Say you’re in financial services and need to keep mortgage records for 15 years,” Molyneux says. “We can apply that straight away. Or if it’s healthcare data, we can classify it appropriately. When deploying AI, you don’t have to start from scratch; the data is already indexed, classified, and ready to be queried.”
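As a rough illustration of how classification can drive retention from the moment data lands, consider the minimal sketch below. The rule table, record format, and function names are invented for this example rather than Cohesity’s actual interface; the 15-year mortgage rule mirrors the scenario Molyneux describes.

```python
# Illustrative sketch: retention policy driven by classification labels.
# RETENTION_RULES, BackupRecord, and retention_expiry are invented names,
# not Cohesity's API; the 15-year rule echoes the mortgage example above.
from dataclasses import dataclass
from datetime import date, timedelta

RETENTION_RULES = {
    "mortgage_record": timedelta(days=15 * 365),   # financial services example
    "healthcare_record": timedelta(days=8 * 365),  # assumed period
    "general": timedelta(days=2 * 365),            # default fallback
}

@dataclass
class BackupRecord:
    path: str
    classification: str  # label assigned at ingest time
    ingested: date

def retention_expiry(record: BackupRecord) -> date:
    """Return the earliest date this record may be deleted."""
    rule = RETENTION_RULES.get(record.classification, RETENTION_RULES["general"])
    return record.ingested + rule

record = BackupRecord("/finance/mortgages/app-1138.pdf", "mortgage_record",
                      date(2024, 3, 1))
print(retention_expiry(record))  # roughly 15 years out (2039-02-26 here)
```

Because the label travels with the data from ingest, the question “how long must we keep this?” is answered once, at classification time, rather than re-litigated for every query or deletion request.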
This foundation enables the use of natural language interfaces for querying data, not as a backup administrator but as a domain expert. “You start having a conversation with the data,” Molyneux says. “That’s when you unlock real business value.”
Clean to query or query to clean?
There is a common belief that AI only works on clean data. In Molyneux’s view, this is only half the story. “People think that you need to clean the data to have AI, but you can actually use AI to identify the data to clean,” he explains.
Organisations can quickly classify content, identify outliers, and isolate low-risk information by applying topic modelling and other machine learning tools to large, messy data sets. “You end up with three buckets: this is the stuff I absolutely need to address, this is the stuff I’ll address if I have time, and this is the stuff I don’t need to touch at all,” he adds.
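As a minimal sketch of that triage, the snippet below applies off-the-shelf topic modelling from scikit-learn and maps each document’s dominant topic to one of the three buckets. The documents, topic count, and bucket mapping are all illustrative; in practice a human would review samples from each topic before trusting the mapping.

```python
# Illustrative triage of messy documents into the three buckets described
# above. The corpus and the topic-to-bucket mapping are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "mortgage application signed borrower income statement",
    "seismic survey report north sea exploration well",
    "holiday photos beach summer trip 2016",
    "test database dump scratch copy please ignore",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
dominant_topic = lda.fit_transform(X).argmax(axis=1)

# Mapping chosen after a human inspects samples from each topic;
# hard-coded here purely for illustration.
BUCKETS = {0: "address now", 1: "address if time", 2: "leave alone"}
for doc, topic in zip(docs, dominant_topic):
    print(f"{BUCKETS[topic]:>15} <- {doc[:45]}")
```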
Crucially, this process can be largely automated. “Structural stuff is the easiest to fix,” Molyneux continues. “If it’s Paris with a capital P versus a lowercase one, or a typo or extra punctuation, machine learning can handle that. You don’t need human intervention for basic normalisation. The challenge is doing it at scale and knowing where to apply trust and responsibility.”
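What that hands-off normalisation can look like is sketched below, using only Python’s standard library: structural fixes such as casing and stray punctuation are deterministic, and near-miss typos are resolved by fuzzy matching against a canonical list. The list and the similarity cutoff are assumptions for illustration; anything below the cutoff is flagged for review rather than silently changed.

```python
# Illustrative normalisation: fix casing and punctuation deterministically,
# resolve simple typos by fuzzy match, and return None (flag for human
# review) when no confident match exists. List and cutoff are assumed.
import difflib
import string

CANONICAL_CITIES = ["Paris", "London", "Berlin"]

def normalise_city(raw: str) -> str | None:
    cleaned = raw.strip().strip(string.punctuation).title()
    if cleaned in CANONICAL_CITIES:           # exact after structural clean-up
        return cleaned
    match = difflib.get_close_matches(cleaned, CANONICAL_CITIES, n=1, cutoff=0.8)
    return match[0] if match else None        # None -> needs human review

for raw in ["paris", " Paris.", "Pariss", "Lonndon", "Atlantis"]:
    print(repr(raw), "->", normalise_city(raw))
```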
Trust becomes the deciding factor, especially when granting access to sensitive data or building models with business-critical outcomes. “You can’t just toss AI in and expect it to work,” Molyneux adds. “If the underlying data is untrusted, you’ll get hallucinations or misleading results. And if a human intervenes incorrectly, maybe marking confidential data as public – you’re exposing the business to risk.”
Security and resilience by design
“Tensions often arise between innovation teams eager to deploy AI and IT departments worried about compliance, privacy, and security,” Molyneux says. “There’s a disconnect. You get AI developers who feel held back by security and IT teams who feel AI is running wild.”
Cohesity’s approach is to bridge this gap with enforceable governance, built around concepts such as role-based access, multi-factor authentication, and zero-trust security. “If you allow someone to run natural language queries against your data lake, and they can ask things like, ‘Show me acquisition targets for 2026’, you better be sure they’re authorised to see that information,” Molyneux says.
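A minimal sketch of such a gate is below, with invented roles and data classifications: the query only runs if the caller’s role is cleared for every class of data it would touch, so the acquisition-targets question succeeds for the CFO and is refused for a general analyst.

```python
# Illustrative role-based gate in front of a natural language query endpoint.
# Roles, classifications, and the hand-off are invented for this sketch.
ROLE_CLEARANCE = {
    "analyst": {"public", "internal"},
    "cfo": {"public", "internal", "confidential", "m&a"},
}

def run_nl_query(role: str, question: str, touched: set[str]) -> str:
    """Refuse unless the role is cleared for every data class the query touches."""
    allowed = ROLE_CLEARANCE.get(role, set())
    if not touched <= allowed:
        raise PermissionError(f"role {role!r} not cleared for {touched - allowed}")
    return f"running: {question!r}"  # hand off to the real query engine here

print(run_nl_query("cfo", "Show me acquisition targets for 2026", {"m&a"}))
try:
    run_nl_query("analyst", "Show me acquisition targets for 2026", {"m&a"})
except PermissionError as err:
    print("blocked:", err)
```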
The value of this governance is most apparent in scenarios involving ransomware or data breaches. If a threat actor gets into the platform, the organisation already knows what the data is, where it lives, and who has accessed it, so the breach can be reported to regulators immediately. And if the attacker demands a ransom to unlock data that turns out to be meaningless, there is the confidence not to pay.
This is where data hygiene plays directly into cyber resilience. Most organisations still treat ransomware recovery as a disaster recovery problem, restoring everything from backup. “But if you know exactly which data matters and in what order, you can recover the minimum viable company,” Molyneux explains. “You don’t need to restore petabytes of data. You can be strategic and surgical.”
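What strategic and surgical might look like in code is sketched below: rank workloads by an assumed business-criticality tier, restore the small tier-one and tier-two set first, and defer the rest. The workloads, tiers, and sizes are invented for illustration.

```python
# Illustrative 'minimum viable company' recovery plan: restore by assumed
# criticality tier instead of replaying the whole estate from backup.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    tier: int        # 1 = the business cannot run without it
    size_tb: float

ESTATE = [
    Workload("payments-db", 1, 2.0),
    Workload("identity-provider", 1, 0.1),
    Workload("erp", 2, 5.0),
    Workload("marketing-archive", 4, 80.0),
]

def recovery_plan(estate: list[Workload], max_tier: int = 2) -> list[Workload]:
    """Restore critical tiers first, smallest workloads within each tier."""
    urgent = [w for w in estate if w.tier <= max_tier]
    return sorted(urgent, key=lambda w: (w.tier, w.size_tb))

plan = recovery_plan(ESTATE)
print([w.name for w in plan])        # identity-provider, payments-db, erp
print(sum(w.size_tb for w in plan))  # 7.1 TB to restore, not the full 87.1
```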
Third-party accountability in an age of regulation
As AI adoption accelerates, so does the complexity of managing third-party risk, particularly where external providers handle internal data. Legislation is beginning to catch up.
“There’s something called DORA – the Digital Operational Resilience Act – that’s coming into force across Europe,” says Molyneux. “It builds on UK operational resilience rules and is now woven into the AI Act. That means that if your third party does something risky with your data, it’s still your responsibility. Regulators will hold you accountable.”
The implications are clear. Third parties must be treated not as external suppliers but as integrated extensions of the enterprise. “You need to bring them into your trust model, ensure they have the right access controls, understand data lineage and hygiene, and operate under the same rules as your internal teams,” Molyneux continues. “This shift is more than compliance. It signals the professionalisation of AI as a discipline, a move away from the ‘Wild West’ phase of experimentation toward a world of transparency, governance, and responsibility.”
Future-proofing the foundation for AI
For organisations looking to establish long-term readiness for AI, Molyneux highlights three enduring principles: security and compliance, classification, and governance. “Security and compliance is key because AI can see everything,” he says. “If someone queries the data lake and pulls sensitive information, you must know it was permitted. Role-based access, multi-factor authentication, and usage monitoring must be standard.”
Classification, or what Molyneux prefers to call data insights, is essential not only for compliance but also for effective AI use. The power lies in enforcing policies based on the classification of data: not just labelling it, but using that label to drive decisions.
Governance is the third leg. “How is data accessed? Where is it stored? Who holds it? Are guardrails in place in the cloud? The cloud has become a silo, and many organisations have had to backtrack because confidential information was thrown into environments without proper control.”
Ultimately, the guardrails that matter are not built into the AI itself, but the data. “You can’t think of the AI as the safety net,” Molyneux explains. “The data needs to be governed, secured, and trusted. The IP isn’t just in the AI model; it’s in the training data, and if that’s not properly controlled, everything else falls apart.”
As Molyneux puts it, the path from data chaos to AI confidence is not a single leap but a structured journey. It requires enterprises to rethink backup, rethink value, and take responsibility for every byte of information. Only then can AI deliver on its promises, not as a risky experiment, but as a resilient, responsible engine for growth.