It's all about the data
The what, where and how of L&D data in the June edition of the "Learning Analytics Made Easy" Newsletter.
Hi! And welcome to the June edition of the Learning Analytics Made Easy Newsletter!
One of the most frequent pieces of feedback I get from L&D practitioners when talking about analytics is the struggle to access data. In many cases this is linked to an underlying frustration: many people see the lack of data as a roadblock for analytics. Because without data, you cannot do any meaningful analytics, as we saw last month.
Data sits at the heart of analytics, but, possibly even more importantly today, data also sits at the heart of Artificial Intelligence!
In short, data is everything!
That is why in this newsletter we’re going to take a closer look at data for L&D. We’ll discuss when and where in the L&D process you should start thinking about data, and how your intentions and objectives determine what data you require. I will explain that the more impactful your objectives are, the more data you will need. And I’ll share a secret about the best way to collect and store that data!
Let’s get started!
Peter
p.s. As a bonus, I’m sharing why data is so important when you are using, or want to use, Artificial Intelligence, and what you can do to get the most out of your data.
In this edition:
What is the best time to start to consider data?
How the level of impact drives data needs
By far the best way to collect and store your data
BONUS: What you can do tomorrow to prepare your data for AI
When to consider data
Many of us start thinking about data when it’s time to evaluate the learning initiative. You have run a huge program on AI upskilling and leaders are curious how the program improved AI skills in the company, and possibly even how the program increased the use and value of AI in the company.
Only then do we start to think about what data we should pull from which system to answer these questions. We happily jump into our LMS or LXP reporting engine to pull the relevant data and start analyzing.
And then the problems start…
We actually do not really know what programs contributed to AI upskilling
We realize different programs measure skills in different ways
We have no real idea how to link learning to people actually using AI, let alone to AI creating value
So none of the above questions can easily be answered.
And now it is too late to do something about it!
Only thinking about data when you have already delivered your programs is often too late.
The best moment to consider data is RIGHT NOW.
Yes, that is correct: data should be part of your L&D process from the very beginning. No matter what process you use (did somebody say ADDIE?), data needs to be at the forefront. Only when you have clearly defined your objectives, translated these objectives into KPIs, and analyzed what datapoints you need to measure and track those KPIs, and only when you design, develop (and deploy) your programs with these datapoints in mind, can you be sure that you are indeed capturing the data you need to do the analytics you want!
So
Start thinking about data at the very beginning!
How the level of impact drives data needs…
Ok, so now I have your attention and you realize you should start considering data at the beginning of your learning initiative… an urgent question pops up: what data do I need?
Well. That really depends on what you are trying to achieve.
The 3 foundations: Programs, Process and Audience
The 3 data elements, or sources, you always need no matter what learning analytics you want to do, are your program(s), your L&D process and your audience.
Your program or programs tell you what employees do, should do or did in terms of learning. What was the program about, how long did it take, in which format was it delivered, what skill did it address, and at what difficulty level? All of these data points are captured when you create the program. If you did that well, you will be able to do a ton of useful analytics. If you did it poorly, you will not. That is why complete and accurate tagging of learning programs is so important!
Consider the AI upskilling example; if you do not tag relevant programs as AI upskilling programs, you will not be able to identify what programs contribute to AI upskilling!
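To make that concrete, here is a minimal sketch in Python; the program titles, field names and tag values are purely illustrative, not a prescribed catalogue:

```python
# Hypothetical program catalogue; titles, fields and tag values are illustrative only.
programs = [
    {"title": "Prompt Engineering Basics", "tags": ["AI upskilling", "GenAI"]},
    {"title": "Data Privacy Essentials", "tags": ["Compliance"]},
    {"title": "AI for Sales Teams", "tags": ["AI upskilling", "Sales"]},
]

# With consistent tagging, identifying the contributing programs is a one-liner.
ai_programs = [p["title"] for p in programs if "AI upskilling" in p["tags"]]
print(ai_programs)  # ['Prompt Engineering Basics', 'AI for Sales Teams']
```

Without that shared tag, the same question becomes a manual hunt through program titles and descriptions.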
Then you have your learning processes: activities like nominating employees, registrations, assignments and, last but not least, completions. These processes are hopefully all configured in your learning platform; the whole reason you have these platforms is to enable you to execute these processes in a digital environment. Basically a replacement for paper-based sign-up, sign-in and sign-off sheets! If you have clean and consistent processes that neatly capture each step in your system, you are good to go. If you have messy or inconsistent processes, or you still do a lot on paper (a local Excel file also counts as paper in this context), then you will struggle.
Consider the AI upskilling example. If you do not consistently record program progress and completions, you cannot clearly define who really participated. And this will hinder your ability to analyze how much more value employees who participated in the AI upskilling program derive from AI!
The last of the three is your target audience. This always reminds me of a situation from the past where employees were recruited for classroom trainings in the hallway of a large multinational company. The reason was to ‘fill the classrooms’ and show a good number on the learning dashboard I had created for them. Naturally, it also resulted in classrooms filled with people for whom the training was not intended. Your target audience, and more specifically a precise definition of your target audience, is crucial for determining any form of impact through learning!
Consider the example of AI upskilling. If you do not carefully define your target audience, you could end up in a situation where you think you have trained sufficient people, but they were actually the wrong people!
Beyond the basics
Your learning programs, your learning process and your target audience are the 3 foundational data elements, but they are not the only ones.
Defining what data you need depends on your ambitions. The more impact you want to achieve through learning, the more data you will need.
You only need compliance completions? Then the foundational 3 are sufficient.
You want to demonstrate the long-term business benefits of a multi-million leadership program? You will need much, much more!
We’ve deconstructed the question on what data you need into a total of 12 L&D data components:
The 3 foundations plus
Objectives
Evaluation
Transfer
Application
Personal Performance
Business Performance
Costs
ROI
Long-term effects
The full list is shared below. And it would take a full newsletter to go into each of them.
But the key message of my story is that the more impact you want to achieve through learning, the more data you need to demonstrate that you are actually making that impact!
Short intermezzo….
The power of combining 1 & 2
With the realization that (1) you need to bring the topic of data to the very beginning of your learning initiatives, and that (2) aiming for bigger impact means more data (much more data), you already have a powerful tool in hand to start tackling the challenge of insufficient access to data.
By bringing up the topic of data very early in the conversation with your business stakeholders, and bringing the ‘high impact = lots of data’ model to that conversation, you can start managing the expectations of your customers. If they want business performance improvements, great! We want the same. However, that level of ambition comes at a price: the need for more data, including significant data from the business.
If they do not have this data, or are not willing to share it, you essentially have 2 options remaining:
1. You lessen your level of ambition so you do not need as much data
2. You keep the high level of ambition and work together to ensure access to high quality business data!
What to do with all that data
Now you know where to get all the data, and you realize that it’s a very good idea to start talking about data very early on in the L&D process. The next question is then what to do with all that data, and how you can make sure that it can be used for accurate and reliable analytics!
A great question! Many companies struggle with this.
Break down the Silos
Many companies still have their data stored in silos. What that means is that each department, product line, function, and sometimes even location stores its data in ‘their own’ database, isolated from the rest.
In L&D you also see situations where many different systems each have their own isolated data source, or data pond.
I’ve seen L&D teams operating 3 or more different evaluation or survey tools
Many L&D teams have both an LMS and an LXP. Each with their own reporting engine
If you’re in the food or health industry, you might have a completely separate platform to handle GxP related training
I see L&D teams having a separate platform for leadership development programs (often the platform of the vendor)
And then I have not even mentioned skills platforms and external content libraries like Coursera and LinkedIn!
Unfortunately we are not the only ones. Data silos are estimated to cost companies up to 30% of their revenue, not only due to additional operating and maintenance costs, but mostly due to lost opportunities: combining data sources and doing analytics across all of them can surface many more ways to improve things and uncover new business opportunities.
These days there is an additional very compelling argument to break down the data silos: Artificial Intelligence.
Artificial Intelligence runs on data. The more data, the better AI will be able to add value. The same goes for more accurate, higher-quality data. Bringing all your L&D data into your enterprise data lake will massively improve your company’s AI that sits on top of all that data.
That’s why my advice always is…
Join your enterprise data lake!
Chances are that your organization is already using technology to store lots of data: Microsoft Azure/Fabric, SAP, Oracle. If an enterprise data lake exists, I always advise bringing all your L&D data into that data lake.
It provides a lot of benefits:
It saves you the cost of implementing your own data lake
You can leverage all kinds of standards and processes already used to manage and control data, as well as valuable expertise already available in your organization
You can more easily connect L&D data with other sources like HR, Finance
You can link L&D data with business performance data to pursue the ultimate goal of Learning Analytics: business impact measurement (see the sketch below)!
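To illustrate that last point, here is a minimal sketch, assuming two hypothetical extracts already sit in the data lake (completions from the learning platform and quarterly sales per employee); the table and column names are assumptions, not a standard schema:

```python
import pandas as pd

# Hypothetical extracts from the enterprise data lake; names are assumptions.
completions = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "program": ["AI Upskilling", "AI Upskilling", "AI Upskilling"],
    "completed": [True, True, False],
})
sales = pd.DataFrame({
    "employee_id": [1, 2, 3, 4],
    "quarterly_sales": [120_000, 95_000, 80_000, 70_000],
})

# Join learning data to business performance data on a shared employee key.
merged = sales.merge(completions[["employee_id", "completed"]], on="employee_id", how="left")
merged["completed"] = merged["completed"].fillna(False)

# Average sales for completers vs. non-completers: a starting point, not proof of causation.
print(merged.groupby("completed")["quarterly_sales"].mean())
```

A simple comparison like this is only a first step, but it is exactly the kind of analysis that becomes possible once learning and business data sit side by side.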
BONUS: What you can start doing tomorrow to prepare your data for AI
To make AI truly useful, the quality of the data you ‘feed’ to your AI is everything. Even the most advanced AI will give flawed or meaningless insights if the input data is messy, inconsistent, or vague.
So here are six practical actions that, according to ChatGPT, an average L&D professional — who builds, delivers, or manages learning programs — can start doing tomorrow to dramatically improve the usefulness of their data for analytics and AI:
(and yes, this part of the newsletter is mostly written by ChatGPT… that is, after all, an AI LLM that reads all the data and needs to learn from it!)
1. Use Clear and Consistent Program Naming
Why it matters: AI can’t connect “Safety Training”, “safety_train”, and “sft_trn” automatically.
What to do:
Create meaningful and concise program titles
Standardize naming conventions across your learning platforms
Avoid abbreviations and strong jargon unless they’re documented and consistent.
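As an illustration, a tiny clean-up step could look like the sketch below; the name variants and the canonical title are hypothetical:

```python
# Hypothetical mapping of known name variants to one canonical program title.
CANONICAL_TITLES = {
    "safety training": "Safety Training",
    "safety_train": "Safety Training",
    "sft_trn": "Safety Training",
}

def normalize_title(raw: str) -> str:
    """Return the canonical title for a known variant, otherwise the trimmed original."""
    key = raw.strip().lower()
    return CANONICAL_TITLES.get(key, raw.strip())

print(normalize_title("sft_trn"))          # Safety Training
print(normalize_title("Safety Training"))  # Safety Training
```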
2. Define Your Fields and Stick to Them
Why it matters: AI needs to know what each field means (e.g., does “Score” mean test score or engagement score?).
What to do:
Create a data dictionary for key fields (e.g., “Completion Status = Complete, In Progress, Not Started”).
Use dropdowns or validation where possible to avoid free text chaos in your systems
Use data fields only for their intended purposes
Do not combine multiple data points into a single field
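A minimal sketch of what a data dictionary with validation could look like; the fields and allowed values are examples, not a prescribed standard:

```python
# Hypothetical data dictionary: what each field means and which values are allowed.
DATA_DICTIONARY = {
    "completion_status": {
        "description": "Learner progress on a program",
        "allowed": {"Complete", "In Progress", "Not Started"},
    },
    "score": {
        "description": "Final assessment score from 0 to 100 (not an engagement metric)",
        "allowed": None,  # numeric range, checked separately
    },
}

def is_valid_status(value: str) -> bool:
    """True only if the value is one of the agreed completion statuses."""
    return value in DATA_DICTIONARY["completion_status"]["allowed"]

print(is_valid_status("Complete"))  # True
print(is_valid_status("Done"))      # False: free text that would pollute reporting
```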
3. Capture Dates — All of Them
Why it matters: Time series and trend analysis require actual dates. “Q1” or “Spring 2025” are not enough.
What to do:
Log both start and end dates of courses, enrollments, and completions.
For live sessions, capture scheduled and actual delivery dates.
Ideally capture dates in the same format across your organization and platforms
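For example, a small helper like the sketch below can normalize everything to ISO 8601 dates; the source formats are assumptions about what your platforms might export:

```python
from datetime import datetime

# Hypothetical source formats; replace these with whatever your platforms actually export.
SOURCE_FORMATS = ["%Y-%m-%d", "%d-%m-%Y", "%m/%d/%Y"]

def to_iso(raw: str) -> str:
    """Parse a known source format and return an ISO 8601 date string (YYYY-MM-DD)."""
    for fmt in SOURCE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {raw!r}")

print(to_iso("03/15/2025"))  # 2025-03-15
print(to_iso("2025-03-15"))  # 2025-03-15
```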
4. Tag with Purpose
Why it matters: AI can find patterns only if there’s context — think role, topic, function, level, location.
What to do:
Add relevant metadata tags (e.g., “Function: Sales”, “Level: Manager”) to your programs
Don’t overdo it and tag a program with 20 skills — but do include tags that match common search or reporting filters.
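A purposeful set of tags on a single program record might look like the sketch below; the tag names and values are illustrative only:

```python
# Illustrative program record with a handful of purposeful, structured tags.
program = {
    "title": "AI for Sales Teams",
    "tags": {
        "function": "Sales",
        "level": "Manager",
        "topic": "AI upskilling",
        "region": "EMEA",
    },
}

# Tags that mirror common reporting filters make later slicing trivial.
if program["tags"]["function"] == "Sales" and program["tags"]["level"] == "Manager":
    print(f"{program['title']} appears in the sales-manager report")
```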
5. Capture the Right Audience Info
Why it matters: “Who was trained?” is a basic question — and you can’t answer it without employee context.
What to do:
Link training data to HR attributes like role, business unit, location, or seniority.
Make sure the people data attributes in your learning system match those in your HR system, ideally through an automated interface (see the sketch below)!
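One way to check that match is sketched below, assuming hypothetical extracts from the learning system and the HR system that share an employee_id:

```python
import pandas as pd

# Hypothetical extracts: the same employees as seen by the learning system and by HR.
lms_people = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "business_unit": ["Sales", "Sales", "Operations"],
})
hr_people = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "business_unit": ["Sales", "Marketing", "Operations"],
})

# Compare the two sources and flag mismatching attributes for follow-up.
check = lms_people.merge(hr_people, on="employee_id", suffixes=("_lms", "_hr"))
mismatches = check[check["business_unit_lms"] != check["business_unit_hr"]]
print(mismatches)  # employee 2 is 'Sales' in the LMS but 'Marketing' in HR
```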
6. Audit and Clean Regularly
Why it matters: A few errors can skew an entire dashboard.
What to do:
Schedule a monthly data hygiene check to address things like duplicates, missing values and obvious errors (see the sketch after this list).
Use your L&D dashboard to flag issues.
Make people explicitly responsible for their own data quality!
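A monthly check does not need to be fancy; a minimal sketch with hypothetical completion records could look like this:

```python
import pandas as pd

# Hypothetical completion records pulled from the learning platform.
records = pd.DataFrame({
    "employee_id": [1, 2, 2, 3],
    "program": ["AI Upskilling", "AI Upskilling", "AI Upskilling", "AI Upskilling"],
    "completed_on": ["2025-03-01", "2025-03-02", "2025-03-02", None],
})

# Two quick hygiene signals: duplicate records and missing completion dates.
duplicates = records.duplicated(subset=["employee_id", "program"]).sum()
missing = records["completed_on"].isna().sum()

print(f"Duplicate records: {duplicates}")      # 1
print(f"Missing completion dates: {missing}")  # 1
```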
Drop me a note or reply to this newsletter if you’re interested in how you can prepare your data for the age of AI!
Thank you for joining me on this journey.
Remember, taking the first step is the hardest part, but I’ll be here to guide you along the way.
Let’s make data work for you.
Best,
Peter Meerman