The 5th episode in the series on "Reinventing L&D in the Age of AI"
Throughout this series, we have explored how AI is reshaping corporate L&D. We reimagined the 70-20-10 model. We challenged how we think about skills. We proposed a new role for L&D as guardian of decision intelligence.
Every one of those ideas depends on the same thing: data.
AI-driven performance support only works if AI has access to the right data. Evidence-based skills intelligence only works if performance data and learning data are connected. Decision intelligence only works if the data that AI learns from is trustworthy, structured, and current.
Data is not a supporting topic. It is the foundation. And yet, in most L&D organizations, data remains an afterthought — something we try to extract from tools after the fact, not something we design for from the start.
This episode is about changing that. It introduces the concept of a data-driven — or more precisely, data-centric — learning ecosystem. Not as an IT project, but as a fundamental shift in how L&D thinks about technology, tools, and architecture.
— Peter
In this episode: From Tools to Data | The Data Problem Beneath the Surface | From Pond to Lake | Getting Started
From Tool-Centric to Data-Centric Thinking
Today, most learning ecosystems are designed around tools.
We typically have a Learning Management System. Often a Learning Experience Platform. Tools for evaluations and surveys. Systems for classroom administration. Platforms for virtual learning. A wide range of tools for content design and development. In more mature environments, many of these are technically integrated — they talk to each other, data and content flow between systems, and processes are reasonably well aligned.

That is good. But it is no longer enough.
Even today, one of the biggest challenges L&D teams face is getting the right data out of their tools. Not just for learning analytics, but increasingly to fuel learning AI. And there is a simple reason for this: most learning tools were not designed with data as a first principle. They were designed from an administrative perspective — the LMS — or sometimes from a user experience perspective — the LXP — but rarely from a data architecture perspective.
Over the years, I have seen many major learning platforms from the inside. And while some have a really solid data structure, many remain poorly structured when it comes to data. I frequently see inconsistencies in how data is stored, difficulties extracting data (and I mean the raw data you need for analytics and AI, not reports!), and restrictions on how data can be reused and shared. Sometimes this is accidental — a legacy of how the tool was originally built. Sometimes it is strategic — vendors prefer customers to use their analytics modules rather than external tools. Either way, the effect is the same: your data is locked in silos.

In addition, many L&D organizations and teams have not fully considered data when configuring their systems. By system configuration I mean not just adjusting the generic LMS or LXP to your company-specific requirements by agreeing on how to use the standard data fields, but also adding data fields that are specific to you. We refer to this as custom data. For example, you might want to identify which business owns which learning assets in your LXP, so you create a custom field "business owner" and populate it with the right data. Most organizations have such custom fields — but we also see gaps and inconsistencies in how they are defined and maintained.
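To make the idea concrete, here is a minimal sketch of what "defining a custom field properly" could look like in practice: the field is declared once, with controlled values, and assets are validated against that definition. The field name, allowed values, and validation logic are illustrative assumptions, not a real product API.

```python
# Illustrative sketch: define custom fields once, with controlled values,
# instead of leaving them as free text. Names are assumptions, not a vendor schema.

CUSTOM_FIELDS = {
    "business_owner": {
        "type": "enum",
        "allowed": ["Sales", "Operations", "Finance", "HR"],
        "required": True,
    },
}

def validate_asset(asset):
    """Check an asset's custom data against the agreed field definitions."""
    errors = []
    for name, spec in CUSTOM_FIELDS.items():
        value = asset.get(name)
        if value is None:
            if spec["required"]:
                errors.append(f"missing required field: {name}")
        elif spec["type"] == "enum" and value not in spec["allowed"]:
            errors.append(f"invalid value for {name}: {value!r}")
    return errors

# A free-text entry like "sales" would be flagged as inconsistent
# with the agreed value list ("Sales").
validate_asset({"title": "GDPR basics", "business_owner": "sales"})
```

Even a lightweight rule like this, agreed across teams, prevents the inconsistencies that make custom data unusable for analytics later.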
Why This Matters Now More Than Ever
Here is the core issue. Learning no longer happens in one place. And it certainly does not happen in one tool. Trying to do analytics within individual tools gives you only partial insight. You can analyze fragments of reality, but never the whole picture.
The same applies to AI. Analytics and AI become more powerful when they have access to more data, when that data comes from multiple sources, and when the data is consistent, structured, and trustworthy. An AI model trained on data from a single learning platform will never be as valuable as one that learns from learning systems, HR data, skills data, performance data, and business systems combined.

This is where the mindset shift begins. A data-driven learning ecosystem does not replace the importance of user experience — that remains critical. But it adds an equally important lens: data architecture. When selecting or designing learning technology, new questions must become central: How well is data structured in this tool? How easy is it to extract and reuse that data? Can the data be aligned with our terminology and definitions? Does the tool support integration into a central data layer?
These questions will matter far more in the future than flashy features or polished interfaces alone.
One Ecosystem, Many Tools — One Data Backbone
In a data-driven ecosystem, you stop obsessing over which tools people use. You move away from the idea that there must be one corporate LMS, that learning must be centralized in one platform, or that content creation must be owned by one team.

Instead, what really matters is this: whatever learning tools are used, the data they generate must be collected centrally.
That central data layer becomes the foundation for learning analytics, the training ground for learning AI, and the source of truth for skills and performance intelligence. As more data flows into this backbone, AI becomes richer, more accurate, and more valuable. Over time, AI itself starts to take a central role in the ecosystem — not replacing tools, but connecting and enhancing them.
The principle is straightforward:
data first, tools second
The Data Problem Beneath the Surface
If the principle is straightforward, the practice is not. Let me be honest about the reality most L&D teams face.
When you try to extract data from your learning tools today, you will typically encounter several problems. Data definitions vary between systems — what one tool calls "completion" another calls "passed," and neither may mean what you think it means. Data quality is inconsistent — records are incomplete, timestamps are unreliable, user identifiers do not match across platforms. Data is hard to access — some tools offer APIs, others only export spreadsheets, and some make extraction deliberately difficult.
These are not edge cases. They are the norm. And they create a hidden cost that most L&D teams underestimate: the enormous amount of time spent cleaning, reconciling, and interpreting data before any analysis can even begin.
This is why I distinguish between having data and having data that is analytics-ready and AI-ready. Most L&D teams have plenty of data. Very few have data they can actually trust and use at scale.
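What "analytics-ready" means can be shown in a few lines. The sketch below reconciles completion records from two hypothetical tools into one shared schema: statuses are mapped to a canonical vocabulary and identifiers are aligned. The tool names, field names, and status mapping are invented for illustration, not drawn from any real vendor export.

```python
# Minimal sketch: two tools report the "same" event with different fields,
# identifiers, and status labels. All names here are illustrative assumptions.

lms_export = [
    {"user": "jdoe@corp.com", "course": "GDPR-101", "status": "Completed"},
]
lxp_export = [
    {"learner_id": "JDOE", "asset": "GDPR-101", "state": "passed"},
]

# Step 1: agree on one canonical status vocabulary.
STATUS_MAP = {"completed": "completed", "passed": "completed", "attended": "completed"}

def normalize_lms(rec):
    return {
        "person_id": rec["user"].split("@")[0].lower(),  # align identifiers
        "asset_id": rec["course"],
        "status": STATUS_MAP.get(rec["status"].lower(), "unknown"),
        "source": "lms",
    }

def normalize_lxp(rec):
    return {
        "person_id": rec["learner_id"].lower(),
        "asset_id": rec["asset"],
        "status": STATUS_MAP.get(rec["state"].lower(), "unknown"),
        "source": "lxp",
    }

# Step 2: everything lands in one schema, ready for analysis.
records = [normalize_lms(r) for r in lms_export] + [normalize_lxp(r) for r in lxp_export]
```

The point is not the code itself but the habit it represents: definitions and identifiers are agreed once, centrally, instead of being re-interpreted in every analysis.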

The path from one to the other requires deliberate design — and that is where the concept of the Learning Data Pond comes in.
From Learning Data Pond to Enterprise Data Lake
Many organizations today are building or already operate an enterprise data lake — a centralized repository that brings together data from across the organization: finance, operations, sales, HR, customer data, and more.
My strong recommendation is simple:
learning data must join the enterprise data lake
Not as a side project. Not as an afterthought. But as a deliberate, structural part of the enterprise data strategy. Only then can learning data become part of the bigger picture — literally and figuratively.
But learning data cannot go straight from individual tools into the enterprise lake. It needs an intermediate step. That step is what I call the Learning Data Pond.
What Is the Learning Data Pond?
In practice, people learn using many different tools — LMS platforms, LXPs, virtual classrooms, content tools, evaluation systems, coaching platforms, performance support solutions. All of these generate data. But for that data to be useful — for analytics and for AI — it must first be cleaned, aligned, and structured.

The Learning Data Pond is the intermediate layer where this happens. It sits between your individual learning tools and the enterprise data lake. It collects data from all learning tools into one place, merges data from different sources, resolves inconsistencies, cleans raw data into something analytics-ready and AI-ready, and structures everything into a coherent model.
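The pond's merge step can be sketched as a toy example: records from several tools, already normalized to a shared schema, are collapsed into one fact per person and asset, with inconsistencies resolved by a simple rule. The schema, tool names, and precedence rule are assumptions made for illustration.

```python
# Toy sketch of the pond's merge step. Field names, sources, and the
# "completed wins" rule are illustrative assumptions, not a prescription.
from collections import defaultdict

normalized = [
    {"person_id": "jdoe",   "asset_id": "GDPR-101", "status": "completed", "source": "lms"},
    {"person_id": "jdoe",   "asset_id": "GDPR-101", "status": "completed", "source": "lxp"},
    {"person_id": "asmith", "asset_id": "GDPR-101", "status": "enrolled",  "source": "lms"},
]

def merge_pond(records):
    """Collapse to one fact per (person, asset); 'completed' in any source wins."""
    facts = {}
    sources = defaultdict(set)
    for r in records:
        key = (r["person_id"], r["asset_id"])
        sources[key].add(r["source"])
        best = facts.get(key)
        if best is None or (r["status"] == "completed" and best["status"] != "completed"):
            facts[key] = {"person_id": r["person_id"],
                          "asset_id": r["asset_id"],
                          "status": r["status"]}
    for key, fact in facts.items():
        fact["sources"] = sorted(sources[key])  # keep lineage for trust
    return list(facts.values())

pond = merge_pond(normalized)  # two facts instead of three raw records
```

Note the lineage field: keeping track of which sources contributed to each fact is what makes the pond's output trustworthy rather than just merged.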
At this stage, you already unlock significant value. You can run meaningful learning analytics and even basic AI use cases on top of the data pond alone. You can see patterns across tools that were previously invisible. You can start answering questions that no single system could answer on its own.

But it is still only part of the story.
From Pond to Lake: Where the Real Power Lies
The real breakthrough happens when cleaned, structured learning data flows onward from the pond into the enterprise data lake. At that moment, learning data stops being isolated.
It becomes linkable to business performance data, operational metrics, financial outcomes, customer data, and workforce movement and productivity. This is where learning meets performance. Where skills meet outcomes. Where evidence replaces assumptions.
Analytics and AI deployed at the enterprise level can then correlate learning with business results, identify which skills actually drive performance, support evidence-based decision intelligence, and continuously learn as the organization evolves.

[Figure: The data-driven learning ecosystem connected to the enterprise data lake]
This is not just better reporting. This is learning becoming an integral part of how the organization understands and runs itself.
Why This Architecture Matters
The pond-to-lake approach gives L&D something it has rarely had: a credible data foundation.
It allows L&D to maintain ownership of learning-specific logic and quality — you know your data better than IT does. It ensures data is trustworthy before it enters the enterprise ecosystem. It positions L&D as an equal partner with IT, data, and business teams — contributing clean, structured data rather than asking others to make sense of your mess. And it scales analytics and AI far beyond what any single learning tool could ever support.
Most importantly, it turns learning data into strategic data. The kind of data that earns L&D a place in conversations about business performance, workforce planning, and organizational strategy.
The Full Picture
Let me step back for a moment and connect this to everything we have discussed across the series.
In Episode 1, we saw how AI — through in-house LLMs and agentic AI — is fundamentally changing how people learn and how L&D operates. In Episode 2, we reimagined the 70-20-10 model: AI-driven performance support, AI-enhanced social learning, and Boutique Learning as L&D's strategic differentiator. In Episode 3, we challenged how we think about skills — arguing for evidence-based, contextual skills intelligence grounded in performance and Jobs to Be Done. In Episode 4, we proposed that L&D should become the guardian of decision intelligence — ensuring AI learns the right things and produces trustworthy guidance.
Every one of those ideas requires data.
The AI-driven 70% needs data to personalize and improve. Skills intelligence needs data to move from opinions to evidence. Decision intelligence needs data to be trustworthy and current. Boutique Learning needs data to demonstrate its value and justify its investment.
The data-driven learning ecosystem is not a fifth idea alongside the other four. It is the infrastructure beneath all of them. Without it, the vision remains a vision. With it, the vision becomes operational.
Getting Started
Building a data-centric learning ecosystem sounds like a large undertaking. And at full scale, it is. But you can begin with steps that are small, practical, and entirely within L&D's control.
1. Audit What You Actually Have
Before thinking about ponds and lakes, understand your current reality. For each major learning tool you use, ask three questions: Can I extract the data I need? Is the data consistent and trustworthy? Can I connect it to data from other systems?
You do not need a formal assessment. A simple inventory — even a spreadsheet — that maps your tools against these three questions will reveal where your biggest gaps are. In my experience, most L&D teams discover that they have far more data than they thought — and far less of it is usable than they assumed.
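If a spreadsheet feels too informal, even the audit can be captured as structured data. The sketch below records the three questions per tool and ranks tools by how many answers are "no". The tool names and answers are made-up examples.

```python
# Minimal sketch of the audit as data. Tools and answers are invented examples;
# the three keys mirror the three audit questions above.

inventory = {
    "LMS":         {"extractable": True,  "trustworthy": False, "connectable": True},
    "LXP":         {"extractable": True,  "trustworthy": True,  "connectable": False},
    "Survey tool": {"extractable": False, "trustworthy": False, "connectable": False},
}

def biggest_gaps(inv):
    """Rank tools by how many of the three questions get a 'no' (most gaps first)."""
    return sorted(inv, key=lambda tool: sum(inv[tool].values()))

biggest_gaps(inventory)  # the tool with the most "no" answers comes first
```

The value is not the ranking itself but the discipline: once the audit is data, it can be revisited, compared over time, and shared with IT.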
2. Start Designing for Data
This connects directly to the tip from Episode 2. Whenever you design a new learning experience — especially a Boutique Learning program — ask two questions before anything else: "What should improve if this works?" and "What data will we need to see whether it did?"
Then design the experience so that it generates that data. Choose tools and formats that produce structured, extractable data. Define your metrics before launch, not after. This single habit, applied consistently, will transform the quality of your data over time.
3. Have the Data Lake Conversation
Find out whether your organization has an enterprise data lake — or is building one. If it does, find out who owns it and start a conversation about including learning data. This conversation may feel premature. You may not have clean data yet. That is fine. The goal is not to deliver data tomorrow. The goal is to ensure that when the data is ready, there is a place for it — and that L&D is seen as a partner in the enterprise data strategy, not an afterthought.
If your organization does not yet have a data lake, this is still a valuable conversation to have with IT and data teams. Understanding where the organization is heading with data helps you design your learning ecosystem in a way that will be compatible when the time comes.
4. Pick One Connection Point
You do not need to connect learning data to every business system at once. Pick one. The most natural starting point is often HR data — employee profiles, role information, tenure, mobility. Connecting learning data to even this one source opens up questions you could never answer before: How does learning participation relate to internal mobility? Do people who complete certain programs stay longer? Is there a pattern between learning activity and time-to-productivity for new hires?
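The retention question above can be answered with a join as simple as this. All data, field names, and the program itself are invented for illustration; in practice the same join would run against your pond and the HR system of record.

```python
# Hedged sketch of the first connection point: learning completions joined to
# HR tenure data. Every value and field name here is an invented example.

completions = {"jdoe", "asmith"}  # completed the (hypothetical) leadership program
hr = [
    {"person_id": "jdoe",   "tenure_months": 40},
    {"person_id": "asmith", "tenure_months": 8},
    {"person_id": "bwu",    "tenure_months": 12},
]

def avg_tenure(people, completed):
    """Average tenure for completers vs. non-completers of a program."""
    groups = {"completed": [], "not_completed": []}
    for p in people:
        key = "completed" if p["person_id"] in completed else "not_completed"
        groups[key].append(p["tenure_months"])
    return {k: sum(v) / len(v) for k, v in groups.items() if v}

avg_tenure(hr, completions)  # → {'completed': 24.0, 'not_completed': 12.0}
```

A comparison this simple proves nothing causal, of course — but it is exactly the kind of first, credible signal that gets business stakeholders interested in connected data.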
These are small questions. But they demonstrate the value of connected data in a way that is immediately credible to business stakeholders.

