A Data Readiness Audit You Can Run in a Week

There's a pattern I keep encountering at the start of AI initiatives: everyone agrees that data readiness matters and should be checked first, and then the check never actually happens. The reason is almost always that "is our data ready?" feels too big and too vague to start. It sounds like a six-month data governance programme, so it gets deferred until the AI project is already underway, at which point the gaps surface mid-flight and cost months to remediate.

I've made the case that data is the non-negotiable foundation for AI, and that most organisations overestimate how ready theirs is. This post is the practical companion to that argument: a way to actually find out, quickly. You don't need a six-month programme to get a defensible answer. You need about a week, the right people in the room, and a structure that turns a vague worry into a scorecard. What follows is a five-day audit — one focused activity per day — scoped to inform a specific decision, not to boil the ocean. The point is not perfection; it's an honest, evidence-based read on where you stand before you commit budget.

A note before you start: scope the audit to a specific intended use case. "Is all our data ready for anything?" is unanswerable. "Is the data this forecasting initiative would depend on ready?" is answerable in a week. Pick the use case first, and audit the data that use case actually needs.

Day 1 — Inventory: What Data Do We Actually Have?

Start by mapping the data sources the intended use case would draw on. Not every source in the organisation — the ones this initiative depends on. For each, capture the basics: what it is, which system holds it, who owns it, how it's structured, and how far back it goes.

This sounds trivial and almost never is. The most common finding on day one is that nobody has a complete picture — the data is spread across systems that were never meant to talk to each other, owned by people who've moved on, or duplicated in several places with no agreed source of truth. The deliverable is simple: a list of the relevant data sources with an owner named against each. If you can't even produce that list in a day, you've already learned something important about your readiness, and it's better to learn it now than three months into a platform rollout.
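To make the day-one deliverable concrete, here is a minimal sketch of the inventory as a small data structure with one check: which sources have no named owner. The field names and the three example sources are hypothetical, chosen for a forecasting use case; a spreadsheet with the same columns would serve equally well.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataSource:
    name: str                          # what it is
    system: str                        # which system holds it
    owner: Optional[str]               # named owner; None or "" if nobody can be identified
    structure: str                     # how it's structured
    history_start_year: Optional[int]  # how far back it goes, if known


def unowned(sources: List[DataSource]) -> List[str]:
    """Return the names of sources that have no named owner."""
    return [s.name for s in sources if not s.owner]


# Hypothetical entries, for illustration only.
inventory = [
    DataSource("sales_orders", "ERP", "Head of Sales Ops", "relational table", 2016),
    DataSource("promotions", "shared spreadsheet", None, "CSV export", 2021),
    DataSource("store_footfall", "legacy POS", "", "fixed-width file", None),
]

print(unowned(inventory))
```

Any name that comes back from `unowned` is a gap in the day-one deliverable: the source is on the list, but nobody is accountable for it yet.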

Day 2 — Accessibility: Can We Actually Reach It?

Having data and being able to use it are different things. Day two tests whether the data you inventoried can actually be reached and combined. Take a small number of the most important sources and try to extract a real sample and join it the way the use case would require.

This is where integration gaps reveal themselves. Data locked in a legacy system with no clean export, records that can't be reliably matched across systems because there's no common key, fields that exist but are populated inconsistently across business units — these are the issues that quietly inflate the cost of every AI project, and an afternoon of actually trying to pull and join the data surfaces them fast. The deliverable is a short, honest accessibility rating for each key source: reachable and joinable, reachable with effort, or effectively locked. As I noted in the readiness piece, an AI initiative is often the first time anyone has needed to consolidate data that's been siloed for years — far better to discover that on day two of an audit than mid-project.
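One way to put a number on "can these two sources be joined?" is to measure how many keys in one extract find a match in the other. The sketch below assumes each extract has been loaded as a list of dictionaries sharing a candidate key. The function names, the sample records, and the 0.95 and 0.5 thresholds are illustrative assumptions, not a standard.

```python
def key_match_report(left_rows, right_rows, key):
    """Compare the distinct, non-empty key values found in two extracts."""
    left_keys = {r[key] for r in left_rows if r.get(key) not in (None, "")}
    right_keys = {r[key] for r in right_rows if r.get(key) not in (None, "")}
    matched = left_keys & right_keys
    return {
        "left_only": sorted(left_keys - right_keys),
        "right_only": sorted(right_keys - left_keys),
        "match_rate": len(matched) / len(left_keys) if left_keys else 0.0,
    }


def accessibility_rating(match_rate, extract_ok, ready_at=0.95, effort_at=0.5):
    """Map the findings onto the three ratings; thresholds are assumptions."""
    if not extract_ok:
        return "effectively locked"
    if match_rate >= ready_at:
        return "reachable and joinable"
    if match_rate >= effort_at:
        return "reachable with effort"
    return "effectively locked"


# Hypothetical samples: the same customers recorded in two systems.
crm = [{"customer_id": "C1"}, {"customer_id": "C2"},
       {"customer_id": "C3"}, {"customer_id": "C4"}]
billing = [{"customer_id": "C1"}, {"customer_id": "C2"},
           {"customer_id": "c3"}, {"customer_id": "C9"}]

report = key_match_report(crm, billing, "customer_id")
print(report)
print(accessibility_rating(report["match_rate"], extract_ok=True))
```

In this sample, `C3` and `c3` fail to match only because of letter case, which is exactly the kind of inconsistency that looks minor on paper and turns into integration work once someone actually attempts the join.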

Day 3 — Quality and Completeness: Is It Good Enough to Learn From?

AI models will find patterns in whatever you give them, including your errors. Day three is an honest look at the quality and depth of the data, because thin or dirty data produces unreliable outputs no matter how good the model is.

Take a sample of the key sources and check the dimensions that matter: completeness (how much is missing, and is it missing at random or systematically?), consistency (are the same things recorded the same way?), accuracy (where you can check against a known truth, does it hold up?), and depth (is there enough history and coverage for the use case to learn from?). You're not aiming for a perfect data-quality report — you're aiming for a defensible judgment on whether the data is rich and clean enough to support what you intend to build. The deliverable is a quality read per source, with the specific problems named rather than a vague "the data's a bit messy."
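Two of these checks, completeness and consistency, are easy to rough out on a sample. The sketch below assumes the sample is a list of dictionaries; the helper names and the four sample rows are hypothetical. Accuracy and depth need a known truth and a view of the history respectively, so they are left out here.

```python
from collections import Counter


def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def missing_by_group(rows, field, group_field):
    """Missing rate per group; a wide spread hints at systematic missingness."""
    totals, missing = Counter(), Counter()
    for r in rows:
        group = r.get(group_field)
        totals[group] += 1
        if r.get(field) in (None, ""):
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


def inconsistent_spellings(rows, field):
    """Raw text values that differ only by case or surrounding whitespace."""
    variants = {}
    for r in rows:
        value = r.get(field)
        if value in (None, ""):
            continue
        variants.setdefault(value.strip().lower(), set()).add(value)
    return {k: sorted(v) for k, v in variants.items() if len(v) > 1}


# Hypothetical sample rows.
rows = [
    {"region": "North", "unit": "A", "price": 10.0},
    {"region": "north", "unit": "A", "price": 12.0},
    {"region": "South", "unit": "B", "price": None},
    {"region": "South", "unit": "B", "price": None},
]

print(completeness(rows, "price"))
print(missing_by_group(rows, "price", "unit"))
print(inconsistent_spellings(rows, "region"))
```

Here the overall completeness of `price` is 50%, but the breakdown shows that every missing value comes from unit B. That is systematic rather than random missingness, and it is the kind of specific, named problem the deliverable calls for.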

Day 4 — Governance and Permissions: Are We Allowed to Use It?

Data you can reach and trust is still unusable if you're not permitted to use it the way you intend. Day four covers the governance questions that, left unanswered, stop a project dead when a vendor asks for a data export or a privacy review lands late.

Work through the essentials for each key source: who owns it and who must sign off on its use; what consent or contractual basis exists for processing it, particularly customer and employee data; whether any of it is sensitive or regulated under GDPR or equivalent regimes; and whether using it to train or inform an AI system is actually permitted under the basis on which it was collected. These are not questions to answer mid-project — they need answers before a project starts. The deliverable is a clear permitted / restricted / needs-review flag against each source, with the restrictions made explicit.
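The flagging logic can be written down as a handful of explicit rules so that every source is judged the same way. This is a sketch under assumptions: the record fields, the rule order, the reason wording, and the example sources are all hypothetical, and nothing here replaces an actual legal or privacy review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GovernanceRecord:
    source: str
    signoff_owner: Optional[str]      # who must sign off on its use
    legal_basis: Optional[str]        # consent or contractual basis; None = unknown
    regulated: bool                   # sensitive or regulated (GDPR or equivalent)
    ai_use_permitted: Optional[bool]  # True / False / None = not yet determined


def governance_flag(rec):
    """Return (flag, reasons) with the restrictions spelled out."""
    if rec.ai_use_permitted is False:
        return "restricted", ["AI use not covered by the basis of collection"]
    reasons = []
    if not rec.signoff_owner:
        reasons.append("no sign-off owner identified")
    if not rec.legal_basis:
        reasons.append("consent or contractual basis unknown")
    if rec.ai_use_permitted is None:
        reasons.append("AI use not yet confirmed as permitted")
    if reasons:
        return "needs-review", reasons
    if rec.regulated:
        return "permitted", ["sensitive or regulated: handle under the applicable regime"]
    return "permitted", []


# Hypothetical records, for illustration only.
records = [
    GovernanceRecord("sales_orders", "Head of Sales Ops", "contract", False, True),
    GovernanceRecord("customer_profiles", "DPO", "consent for service delivery", True, False),
    GovernanceRecord("hr_records", None, None, True, None),
]

for rec in records:
    print(rec.source, *governance_flag(rec))
```

Returning the reasons alongside the flag keeps the restrictions explicit, so a "needs-review" entry says what is missing rather than just signalling that something is.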

Day 5 — Scoring and Roadmap: Where Do We Stand?

The final day turns four days of findings into something a decision-maker can act on. Score each data source across the four dimensions you've assessed — accessibility, quality and completeness, governance, and the inventory clarity you established on day one — using a simple, consistent scale rather than false precision. A three-point scale (ready / workable with effort / not ready) is enough.

The scoring isn't the point in itself; the roadmap it produces is. A source that's high quality but not permitted needs a governance fix, not a data-cleaning project. A source that's permitted and accessible but thin needs more history or supplementation before it can carry a model. Mapping each gap to the kind of work that closes it turns the audit from a verdict into a plan. The deliverable is a one-page scorecard plus a short, prioritised list of what to fix and in what order.
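As an illustration of turning scores into a plan, the sketch below takes a scorecard on the three-point scale and lists every gap next to the kind of work that closes it. The remedy wording, the example scores, and the ordering rule (worst rating first) are assumptions made for this example; real prioritisation would also weigh cost and how critical each source is to the use case.

```python
SCALE = ("ready", "workable with effort", "not ready")
DIMENSIONS = ("inventory", "accessibility", "quality", "governance")

# Hypothetical mapping from a weak dimension to the work that closes the gap.
REMEDY = {
    "inventory": "name an owner and agree a source of truth",
    "accessibility": "integration work: exports and common keys",
    "quality": "cleaning, more history, or supplementation",
    "governance": "governance fix: sign-off, consent, legal review",
}


def roadmap(scorecard):
    """List every gap with its remedy, worst ratings first."""
    items = []
    for source, scores in scorecard.items():
        for dim in DIMENSIONS:
            severity = SCALE.index(scores[dim])  # raises ValueError on an unknown rating
            if severity > 0:
                items.append((severity, source, dim))
    # Sort by severity (descending), then source name, then dimension order.
    items.sort(key=lambda i: (-i[0], i[1], DIMENSIONS.index(i[2])))
    return [(source, dim, SCALE[sev], REMEDY[dim]) for sev, source, dim in items]


# Hypothetical scorecard for two sources.
scorecard = {
    "sales_orders": {"inventory": "ready", "accessibility": "ready",
                     "quality": "workable with effort", "governance": "ready"},
    "customer_profiles": {"inventory": "ready", "accessibility": "ready",
                          "quality": "ready", "governance": "not ready"},
}

for line in roadmap(scorecard):
    print(line)
```

The high-quality but unpermitted source comes out with a governance fix at the top of the list, and the thin source with a data remedy below it, which matches the distinction drawn above between the two kinds of work.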

What Good Looks Like

By the end of the week, you should be able to answer honestly:

Do we have a clear, owned inventory of the data this use case actually depends on?
Can we reach and combine those sources, or are integration gaps going to surface mid-project?
Is the data clean, complete, and deep enough to produce reliable outputs?
Are we permitted to use it the way we intend, with consent and regulatory questions answered up front?
Do we have a prioritised roadmap that maps each gap to the specific work that closes it?

If the audit reveals gaps, and for most organisations it will, that's not a reason to abandon the AI initiative. It's the roadmap for where to start, produced in a week rather than discovered the hard way over the following six months. The organisations that do this preparatory work before committing to a platform consistently move faster and waste less once they do commit, because the delays that derail less-prepared efforts have already been designed out. Readiness isn't a gate you pass once; it's the foundation that makes everything you build on top of it more likely to work.

If you'd find it useful to run an audit like this against a specific initiative, or to pressure-test the readiness case before an investment decision, I'm happy to share the scorecard and frameworks I use.