As artificial intelligence becomes embedded across business operations, organisations are discovering that the hardest part is not deploying models but feeding them with reliable, governed data. A new study suggests that the infrastructure underpinning AI, particularly the movement of operational data into data lakes and warehouses, is becoming a critical bottleneck.
Research released this week by Conduktor highlights the growing strain placed on enterprise data architectures as more organisations rely on streaming data to support both operational systems and AI-driven applications. The findings point to a widening gap between the ambition to operate in real time and the practical complexity of delivering trusted data at scale.
Based on a survey of 200 senior IT and data executives at large organisations with annual revenues above $50 million, the report identifies three dominant challenges. The first is infrastructure, specifically the ability to scale and manage data pipelines that perform reliably. The second is security, as sensitive data must be protected while in motion. The third is integration, synchronising multiple data sources as they flow into lakes and warehouses.
Together, these issues reveal a structural problem. AI systems increasingly depend on continuous streams of operational data, yet the architectures used to deliver that data are fragmented, complex and often fragile.
Fragmentation at the heart of modern data estates
The study shows that enterprises are operating across a wide range of data lakes and warehouses. Respondents cited platforms including Amazon S3 and Lake Formation, Databricks Delta Lake and Google Cloud Platform for data lakes, alongside warehouses such as Google BigQuery, Amazon Redshift, Azure Synapse Analytics and IBM Db2 Warehouse.
This diversity is mirrored in the tools used to move data from streaming systems into storage. Nearly three quarters of respondents said they build custom pipelines using frameworks such as Spark or Flink. Around 69 per cent rely on Kafka Connect or similar tools, while half use fully managed services such as Firehose or Snowpipe. Others employ micro-batching approaches or traditional ETL and ELT tools including Fivetran and Airbyte.
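To make the survey's categories concrete, the sketch below shows what a minimal "custom pipeline" of the kind respondents describe might look like: a PySpark Structured Streaming job that reads events from a Kafka topic and lands them in object storage. This is an illustration only, not a detail from the report; the broker address, topic name and bucket paths are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Minimal sketch of a hand-built ingestion job of the kind the survey
# groups under "custom pipelines using Spark or Flink".
# (Running it requires the Spark Kafka connector package on the classpath.)
spark = SparkSession.builder.appName("kafka-to-lake-ingestion").getOrCreate()

# Read the operational event stream from Kafka.
# Broker address and topic name are illustrative placeholders.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
)

# Kafka delivers keys and values as bytes; cast them before landing.
parsed = events.select(
    col("key").cast("string").alias("key"),
    col("value").cast("string").alias("payload"),
    col("timestamp"),
)

# Append micro-batches to the lake as Parquet, with a checkpoint
# directory so the job can resume where it left off after a failure.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "s3a://example-lake/raw/orders/")
    .option("checkpointLocation", "s3a://example-lake/_checkpoints/orders/")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```

Every job of this kind carries its own checkpointing, schema handling and credentials, which is why running many of them in parallel alongside Kafka Connect and managed loaders produces the overhead the report goes on to describe.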
While each of these tools and platforms has a valid role, their combined use creates what the report describes as severe pain points. Data teams must manage multiple governance models, schema formats and latency profiles, often in parallel. The result is slower delivery of data to analysts, engineers and AI systems, undermining the very real-time capabilities organisations are trying to achieve.
The three biggest pain points identified were time efficiency, meaning the difficulty of collecting and analysing data in a streamlined way; schema changes, which add complexity as data structures evolve; and parallel architectures, which demand extra resources and skills to operate.
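The schema-change problem in particular is easy to illustrate. The following is a minimal sketch, assuming the same kind of hypothetical JSON order events as above: parsing payloads against an explicit schema keeps a pipeline running when producers add new fields, but at the cost of silently dropping them until the schema and downstream tables are updated.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("schema-drift-example").getOrCreate()

# The schema the pipeline was originally built against.
order_schema = StructType([
    StructField("order_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
])

# Two payloads: the second comes from a producer that has since added a
# "currency" field the pipeline does not yet know about.
raw = spark.createDataFrame(
    [
        ('{"order_id": "o-1", "customer_id": "c-9", "amount": 42.0}',),
        ('{"order_id": "o-2", "customer_id": "c-3", "amount": 7.5, "currency": "EUR"}',),
    ],
    ["payload"],
)

# Parsing against the declared schema keeps the job alive, but the new
# field is quietly discarded until the schema and downstream tables evolve.
orders = raw.select(from_json(col("payload"), order_schema).alias("order")).select("order.*")
orders.show()
```

Deciding whether to tolerate, reject or propagate drift like this is a choice each of the parallel architectures in the survey has to make separately, which is part of what makes them costly to run side by side.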
AI increases the cost of weak governance
These challenges are amplified by AI. Unlike traditional analytics, AI systems are sensitive to data quality, consistency and timeliness. Missed signals, duplicated work or poorly governed data flows can lead not just to inefficiency but to flawed decisions and unreliable automated outcomes.
Security and governance therefore emerge as central concerns. As operational data streams into lakes and warehouses, organisations must be able to control, validate and track what is being ingested. The report also highlights an internal skills gap, with many organisations struggling to build and maintain complex ingestion pipelines without specialist expertise.
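What controlling and validating ingestion can look like in practice is sketched below, under the same illustrative assumptions as the earlier examples: records that fail basic rules are diverted to a quarantine location rather than landing silently in the lake. The paths, field names and rules are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("ingestion-validation-example").getOrCreate()

# Illustrative parsed order records; in a real pipeline these would come
# from the streaming source rather than a literal DataFrame.
orders = spark.createDataFrame(
    [
        ("o-1", "c-9", 42.0),
        ("o-2", None, 7.5),     # missing customer reference
        ("o-3", "c-4", -10.0),  # negative amount
    ],
    ["order_id", "customer_id", "amount"],
)

# Basic ingestion-time rules: required keys present, amounts non-negative.
is_valid = col("customer_id").isNotNull() & (col("amount") >= 0)

valid = orders.filter(is_valid)
rejected = orders.filter(~is_valid)

# Good records land in the curated zone; failures go to a quarantine
# path where they can be inspected and replayed, rather than feeding
# the tables that downstream AI systems train and decide on.
valid.write.mode("append").parquet("s3a://example-lake/curated/orders/")
rejected.write.mode("append").parquet("s3a://example-lake/quarantine/orders/")
```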
Nicolas Orban, chief executive of Conduktor, argued that governance has become inseparable from streaming data adoption. As organisations expand their use of streaming, particularly to support AI, fragmented architectures create operational chaos. In his view, unifying operational data into a single platform is increasingly necessary to maintain visibility, control and productivity.
The findings suggest that many organisations are still treating streaming as an engineering problem rather than a strategic one. Yet as AI systems move closer to real-time decision making, weaknesses in data ingestion and governance become systemic risks rather than technical inconveniences.
A market growing faster than its foundations
The commercial stakes are rising quickly. According to Dataintelo, the global market for streaming data processing software was valued at approximately $9.5 billion in 2023 and is projected to reach $23.8 billion by 2032, growing at a compound annual rate of 10.8 per cent. Dataintelo attributes this growth to the surge in demand for real-time data processing driven by sources such as social media, IoT devices and enterprise systems.
Yet the Conduktor research suggests that growth in streaming adoption is outpacing organisations’ ability to manage it coherently. Without clearer governance and simpler architectures, enterprises risk building AI systems on data foundations that are difficult to scale, secure and trust.
As AI moves from experimentation to operational dependence, the focus is shifting from how much data organisations can process to how well they can control it. The report’s findings point to an uncomfortable truth. For many enterprises, the success of AI will be determined less by model performance and more by whether streaming data can be delivered reliably, securely and at scale.