Video, AI & Data: Why data architecture is the new innovation in Media
TL;DR: The media industry is currently benefiting from an incredible surge in high-value data. While AI can extract thousands of data points from a single video, most companies lack the architecture to make that data actionable. Without a robust data strategy and integrated supply chain, AI results in "metadata noise" rather than operational ROI. Knox Media Hub explores how to turn this data explosion into a strategic asset.
At the end of February, Knox Media Hub’s CTO, Jonatan Roig, joined the DPP Innovation Showcase with a panel titled “AI Can Analyze My Video, Now What?”
The session captured a technological shift in the media industry: we are finally beginning to glimpse the longer-term landscape of AI adoption, one where AI automates workflows, drives smarter insights, accelerates supply chains, and revolutionizes content discovery.
However, a paradox has emerged. While AI is being adopted to drive efficiency, many media companies find themselves drowning in data or paralyzed by a lack of data coherence. What is often framed as an AI adoption challenge is, in reality, a fundamental data architecture challenge. When content data remains unstructured, the result is low-quality, chaotic AI deployments that fail to deliver on their promise.
Part 1 - The Unstructured Gold Rush
The media industry is currently experiencing what some call an "Unstructured gold rush" (IABM’s Mediatech Radar, September 2025), as Artificial Intelligence moves rapidly from experimental pilots into the daily workflows of broadcast and media operations.
We see near-instant transcription and automated highlight generation in newsrooms, allowing editors to surface archival footage in seconds. In sports, AI enables the automated logging of every pass, goal, or shift in momentum, fueling hyper-personalized fan experiences. For broadcasters and content owners, AI deployments allow for faster content turnaround, cost-effective localization, smarter ad placement, and smarter asset discovery that gives new life to deeply buried assets. And while we focus on the content supply chain, the end of that chain offers an endless stream of viewer behavior data too.
This is a positive evolution: we finally have the tools to understand our libraries at a level of detail that was previously impossible.
All of this is possible thanks to the surge in AI-generated data, which is shifting from asset-level metadata (describing the file as a whole) to temporal, frame-by-frame metadata. While a human might tag a video with five keywords, an AI processes the visual and audio streams simultaneously to record every object, face, sentiment, and spoken word at specific millisecond intervals. But as the data becomes richer, it also becomes harder to manage.
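To make the difference between the two kinds of metadata concrete, here is a minimal sketch. The class names, field names, and layer labels are illustrative assumptions, not the schema of any particular MAM or AI vendor.

```python
from dataclasses import dataclass, field


@dataclass
class AssetMetadata:
    """Asset-level metadata: describes the file as a whole."""
    asset_id: str
    title: str
    keywords: list[str] = field(default_factory=list)


@dataclass
class TemporalEntry:
    """Temporal metadata: one detection tied to a time span in the video."""
    asset_id: str
    layer: str          # hypothetical layer name, e.g. "object", "face", "speech"
    value: str
    start_ms: int
    end_ms: int
    confidence: float


def entries_at(entries: list[TemporalEntry], position_ms: int) -> list[TemporalEntry]:
    """Return every entry whose time span covers the given position."""
    return [e for e in entries if e.start_ms <= position_ms < e.end_ms]


# A human-style description of the whole file: a handful of keywords.
asset = AssetMetadata("vid-001", "Cup final highlights", ["football", "final"])

# AI-style enrichment: many small, timecoded entries for the same file.
timeline = [
    TemporalEntry("vid-001", "object", "ball", 0, 4000, 0.97),
    TemporalEntry("vid-001", "speech", "what a goal", 3200, 4800, 0.91),
    TemporalEntry("vid-001", "face", "player_10", 3500, 6000, 0.88),
]

hits = entries_at(timeline, 3600)
print([e.layer for e in hits])
```

The asset-level record stays small no matter how long the video is, while the timeline grows with every second of footage and every AI layer applied to it.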
At Knox Media Hub, we see firsthand how AI transforms the scale of Media Asset Management (MAM). Traditionally, a single video file might carry around 270 original metadata values (comprising basic timecoded, descriptive, and technical information). However, once that same file undergoes AI enrichment, the number of metadata values can easily skyrocket to over 3,100. If you scale this across a standard media library of 50,000 video assets, you are no longer managing roughly 13.5 million metadata entries; you are suddenly wrangling around 155 million.
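A quick back-of-the-envelope check of that scaling, taking the per-asset figures above at face value and assuming every asset carries the same number of values (real libraries will vary from asset to asset):

```python
# Per-asset figures quoted in the text; the uniform-library assumption is ours.
ORIGINAL_VALUES_PER_ASSET = 270
ENRICHED_VALUES_PER_ASSET = 3_100
LIBRARY_SIZE = 50_000


def total_entries(values_per_asset: int, assets: int) -> int:
    """Total metadata entries if every asset carries the same number of values."""
    return values_per_asset * assets


before = total_entries(ORIGINAL_VALUES_PER_ASSET, LIBRARY_SIZE)
after = total_entries(ENRICHED_VALUES_PER_ASSET, LIBRARY_SIZE)
growth = after / before

print(f"before enrichment: {before:,} entries")
print(f"after enrichment:  {after:,} entries")
print(f"growth factor:     {growth:.1f}x")
```

Under these assumptions the library grows by more than an order of magnitude without a single new video being ingested.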
As today's media companies rapidly adopt a range of AI tools —from face recognition and mood detection to automated ad-break placement— we’ve detected an urgent need for a "common place" for this data. Regardless of the vendor or tool, the result is a massive incoming "lake" of new data. Simply put, there is data everywhere: from audiences and users to the content itself.
Part 2 - From Metadata ‘Noise’ to Structural Integrity
This massive influx of information presents a somewhat intimidating prospect: many media companies have a fundamental data challenge coming.
Media companies have the tools to extract the information, but often lack the architecture to make sense of it. It is great to have highly accurate object detection, facial recognition, audience preferences and automated transcriptions, but it is incredibly complex to organize this data and surface it to users in a way that actually drives business value.
According to the latest DPP Media CTO Survey, 75% of CTOs identified AI and automation as their top priority, yet 53% simultaneously noted that data integration and architecture programs are just as critical.
“Data is rapidly moving up the priority list across the business, from metadata to business process, largely driven by a desire to unlock AI as an enabler of significant optimisation.”
We are seeing operational friction where fragmented platforms and incompatible metadata schemas prevent companies from using the data they’ve paid to extract. Rushing into AI extraction without a solid, future-proof data strategy is a missed opportunity. You must ask: Where is the data going? How does it fit your existing organization? How will you visualize and act upon it?
A successful strategy requires a data-first architecture where systems talk to each other and exchange data seamlessly. To bridge this gap, the cloud is the only way to build a future-proof orchestration layer.
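One way to picture systems "talking to each other" is a thin normalization step that maps each tool's output onto one shared record shape before anything is stored. The sketch below is an illustration only: the two vendor payload formats, the adapter names, and the common field names are all invented for the example.

```python
from typing import Any, Callable

Record = dict[str, Any]


def from_vendor_a(payload: Record) -> Record:
    """Hypothetical vendor A: object labels, times in seconds, score in 0..1."""
    return {
        "layer": "object",
        "value": payload["label"],
        "start_ms": int(round(payload["start_sec"] * 1000)),
        "end_ms": int(round(payload["end_sec"] * 1000)),
        "confidence": payload["score"],
    }


def from_vendor_b(payload: Record) -> Record:
    """Hypothetical vendor B: face names, times in ms, confidence in percent."""
    return {
        "layer": "face",
        "value": payload["name"],
        "start_ms": payload["in_ms"],
        "end_ms": payload["out_ms"],
        "confidence": payload["confidence_pct"] / 100,
    }


# Registry of adapters; adding a new tool means adding one function here.
ADAPTERS: dict[str, Callable[[Record], Record]] = {
    "vendor_a": from_vendor_a,
    "vendor_b": from_vendor_b,
}


def normalize(source: str, payload: Record) -> Record:
    """Convert a vendor-specific payload into the shared record shape."""
    if source not in ADAPTERS:
        raise ValueError(f"no adapter registered for {source!r}")
    return ADAPTERS[source](payload)


a = normalize("vendor_a", {"label": "ball", "start_sec": 12.5, "end_sec": 14.0, "score": 0.97})
b = normalize("vendor_b", {"name": "player_10", "in_ms": 12800, "out_ms": 15000, "confidence_pct": 88})
print(a)
print(b)
```

Once every tool's output lands in the same shape, downstream search, visualization, and automation only need to understand one schema instead of one per vendor.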
Part 3 - The Foundation of Video Data: Metadata
When we talk about "data" in media libraries, we are really talking about metadata and timecoded metadata. As highlighted in the NewscastStudio series on AI, metadata is crucial to enabling automation and to making AI investment worthwhile in the media industry.
To turn "messy" data into a core deliverable, we believe media companies must master metadata handling. At Knox Media Hub, we break this down into three essential stages: Sourcing, Transformation, and Delivery.
Metadata Sourcing: As mentioned, a single video file might carry around 270 original metadata values—comprising basic timecoded, descriptive, and technical information. AI enrichment can turn 270 values into 3,100.
Extraction shouldn't be limited to the video file. Massive amounts of value are trapped in "hidden" files like PDFs, scripts, and spreadsheets. Our Document Metadata Scanner automates the extraction of this data so it can be cataloged alongside the video.
Metadata Transformation: Once you have sourced the data points, you need a way to make sense of them and adapt them to your business schemas.
Metadata Delivery: The final hurdle is ensuring data meets the strict requirements of different endpoints—whether that is a distribution platform, a global client, or an internal orchestration layer.
Our Metadata Delivery Compliance tools automatically validate and reshape metadata to ensure it perfectly matches the necessary templates and formats.
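As a rough illustration of what "validate and reshape" can mean at the delivery stage, the sketch below renames internal fields to an endpoint's field names and then checks the result against a list of required fields. It is a generic example under our own assumptions: the template, field names, and functions are hypothetical and do not represent the Knox tooling or any real platform's specification.

```python
from typing import Any


def reshape(record: dict[str, Any], field_map: dict[str, str]) -> dict[str, Any]:
    """Rename internal fields to the names a delivery endpoint expects."""
    return {target: record[source] for source, target in field_map.items() if source in record}


def validate(record: dict[str, Any], required: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the record is compliant."""
    errors = []
    for name, expected_type in required.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"wrong type for {name}: expected {expected_type.__name__}")
    return errors


# Hypothetical internal record and hypothetical endpoint template.
internal = {"title": "Cup final highlights", "duration_ms": 5_400_000, "lang": "en"}
FIELD_MAP = {"title": "Title", "duration_ms": "DurationMs", "lang": "Language"}
REQUIRED = {"Title": str, "DurationMs": int, "Language": str, "Rating": str}

delivery = reshape(internal, FIELD_MAP)
problems = validate(delivery, REQUIRED)
print(delivery)
print(problems)
```

Here the reshaped record is rejected because the endpoint requires a rating the internal record never captured, which is exactly the kind of gap that is cheaper to catch before delivery than after.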
📚 More on the topic: https://www.knoxmediahub.com/blog/7-expert-tips-for-effective-metadata-management-in-media
Part 4 - Moving Toward an Orchestrated Supply Chain
The industry is doing great at extracting information, but we’re still learning how to orchestrate it. You can have the most accurate facial recognition in the world, but if that data isn't integrated into your supply chain, it’s often useless. Success with AI happens when it stops being a "standalone tool" and starts being part of the supply chain.
As our Sales Manager, Aitor Falcó, pointed out in a recent NewscastStudio roundtable:
“ROI typically fails when AI is deployed as isolated point solutions without integration into the MAM and orchestration layer. Even accurate AI outputs deliver limited value if they cannot drive automated, governed actions across the media supply chain.”
When AI is deeply integrated into the media supply chain, it can drive automated, governed actions (such as faster content discovery, rapid localization, and automated compliance) rather than just generating unused metadata.
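A "governed action" can be as simple as a routing rule that decides what happens to each AI detection. The sketch below is one possible shape for such a rule, assuming a confidence threshold and a small mapping from metadata layer to workflow step; the layer names, action names, and threshold value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    layer: str
    value: str
    confidence: float


# Hypothetical governance settings: which layers trigger which workflow step,
# and how confident the AI must be before a step runs without human review.
AUTO_THRESHOLD = 0.90
ACTIONS = {
    "explicit_content": "flag_for_compliance",
    "speech": "queue_for_localization",
}


def route(detection: Detection) -> str:
    """Decide what the supply chain should do with a single detection."""
    action = ACTIONS.get(detection.layer)
    if action is None:
        return "index_only"      # searchable, but triggers no workflow
    if detection.confidence >= AUTO_THRESHOLD:
        return action            # confident enough to automate
    return "human_review"        # keep a person in the loop


detections = [
    Detection("speech", "what a goal", 0.95),
    Detection("explicit_content", "violence", 0.72),
    Detection("object", "ball", 0.99),
]
for d in detections:
    print(d.layer, "->", route(d))
```

The point of the design is that every detection ends in an explicit outcome, so accurate metadata leads to an action rather than sitting unused.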
Our Knox Video Lab serves as the centralized UI for this use case, designed specifically to display and search thousands of multi-layered AI metadata entries seamlessly on a timeline, all while staying deeply connected to our orchestration tools.
Automating the Future
The data challenge is significant, but the rewards are worth it. AI is a tool to assist in understanding your library, but it is not a replacement for a structured asset repository.
By shifting focus from simply collecting data to orchestrating it, media companies can turn the AI metadata explosion from a messy burden into their greatest operational asset. At Knox Media Hub, we’ve built the tools to make this transition seamless.
For more insights, watch Jonatan Roig’s session at the DPP Innovation Showcase.