By Asmaa Gad | 11 min read
Here is a stat that nobody puts in their vendor pitch deck: 60 to 70% of AI project budgets get consumed by data preparation. Not model training. Not tool licences. Just cleaning, mapping, and standardising the data that feeds the AI in the first place.
I have talked to dozens of procurement teams who invested in shiny AI tools only to discover the same frustrating reality: garbage in, garbage out. The tool works fine. The data doesn’t. And nobody warned them.
So before you spend another euro on AI software, let’s talk about the unsexy, unglamorous, critically important foundation that makes everything else work: your data quality.
Why Most Procurement Data Is Not AI-Ready
According to a 2024 Gartner study, 54% of organisations cite data quality as their top barrier to AI adoption. Not budget. Not skills. Not executive buy-in. Data.
Procurement data has a unique set of problems that make it especially challenging:
Duplicate Suppliers Everywhere
The same supplier appears as “Acme Corp,” “ACME Corporation,” “Acme Corp Ltd,” and “acme” across different systems. One FMCG company found 340 unique entries for a single logistics provider. Your spend analysis AI will treat each one as a separate supplier, destroying your category insights.
No Standard Categorisation
Business units use their own naming conventions. Marketing calls it “creative services,” finance calls it “professional fees,” and procurement calls it “consulting.” AI tools need consistent taxonomies to generate useful analysis. Without one, your spend cube is fiction.
Missing and Incomplete Fields
Payment terms blank. Delivery dates missing. Contract numbers not linked. A typical ERP extraction has 15 to 30% empty fields in critical columns. AI models either skip these records entirely or make wrong assumptions to fill the gaps.
Siloed Systems, Fragmented Truth
Spend lives in the ERP. Contract details live in a shared drive. Supplier performance data lives in a spreadsheet on someone’s laptop. AI needs a connected dataset to generate insights. Most procurement teams are working with puzzle pieces scattered across five systems.
The 30-Day Data Quality Sprint: A Practical Framework
You do not need a multi-year data governance programme to start using AI effectively. You need “good enough” data for your first use case. Here is a 30-day sprint framework that works.
Week 1: Audit What You Have
Step 1: Export your top 3 spend categories from your ERP. Just the basics: supplier name, invoice date, amount, category, PO number, and cost centre.
Step 2: Run a completeness check. What percentage of rows have all fields filled? Below 80%? You have a gap to fix before any AI tool will be useful.
Step 3: Count unique supplier names vs. actual unique suppliers. If you have 5,000 names but estimate 2,000 real suppliers, you have a deduplication problem. Use ChatGPT or Claude to help: paste a sample of 200 supplier names and ask it to identify likely duplicates.
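If your export lands in a CSV, a few lines of pandas can run both checks. Here is a minimal sketch, with a tiny sample frame standing in for your real ERP data and assumed column names:

```python
# Minimal sketch of the Week 1 audit. The sample frame and column names
# are illustrative -- swap in your own export, e.g. pd.read_csv("spend.csv").
import pandas as pd

CRITICAL = ["supplier", "invoice_date", "amount", "category"]

df = pd.DataFrame({
    "supplier": ["Acme Corp", "ACME Corporation", "Beta Ltd", None],
    "invoice_date": ["2024-01-05", "2024-02-11", None, "2024-03-02"],
    "amount": [1200.0, 800.0, 450.0, 300.0],
    "category": ["Logistics", "Logistics", "IT", "IT"],
})

# Step 2: completeness -- share of rows with every critical field filled
complete = df[CRITICAL].notna().all(axis=1).mean()
print(f"Rows with all critical fields filled: {complete:.0%}")

# Step 3: raw name count vs a rough normalised count
names = df["supplier"].dropna()
normalised = (names.str.lower()
                   .str.replace(r"\b(ltd|inc|corp|corporation|llc)\b", "", regex=True)
                   .str.replace(r"[^a-z0-9 ]", "", regex=True)
                   .str.strip())
print(f"{names.nunique()} raw names vs roughly {normalised.nunique()} after normalisation")
```

A large gap between the two counts is your first estimate of the deduplication workload.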
Week 2: Fix the Critical 20%
Focus on your top 20% of suppliers by spend. These typically account for 80% of your total spend value. Deduplicating and categorising just these suppliers will dramatically improve any AI analysis.
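In spreadsheet or pandas terms, finding that critical slice is just a sort plus a cumulative sum. A sketch with made-up per-supplier spend figures:

```python
# Pareto slice: which suppliers cover 80% of total spend?
# Spend figures are illustrative.
import pandas as pd

spend = pd.Series({"Acme": 500_000, "Beta": 300_000, "Gamma": 120_000,
                   "Delta": 50_000, "Epsilon": 30_000})

ranked = spend.sort_values(ascending=False)
cumulative = ranked.cumsum() / ranked.sum()

# Suppliers whose cumulative share stays within the 80% threshold
top = ranked[cumulative <= 0.80]
print(f"{len(top)} of {len(ranked)} suppliers cover "
      f"{cumulative[top.index[-1]]:.0%} of spend")
```

Clean those suppliers first; the long tail can wait for the next cycle.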
Use AI to accelerate: Feed your supplier list into Claude or ChatGPT with a prompt like: “Standardise these supplier names to a consistent format: [Company Name] [Legal Entity Type]. Group likely duplicates together. Flag uncertain matches for manual review.”
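If you would rather keep the matching offline, Python's standard library can do a rough first pass before any manual review. A sketch using difflib; the names and the 0.85 cutoff are illustrative, and uncertain matches should always get a human look:

```python
# Rough offline dedup: normalise names, then fuzzy-group with difflib.
# The sample names and the 0.85 cutoff are assumptions -- tune for your data.
import re
from difflib import SequenceMatcher

def normalise(name: str) -> str:
    name = re.sub(r"\b(ltd|inc|corp|corporation|gmbh|llc)\b", "", name.lower())
    return re.sub(r"[^a-z0-9 ]", "", name).strip()

def similar(a: str, b: str) -> float:
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

names = ["Acme Corp", "ACME Corporation", "Acme Corp Ltd", "acme", "Beta GmbH"]

groups: list[list[str]] = []
for name in names:
    for group in groups:
        if similar(name, group[0]) >= 0.85:
            group.append(name)
            break
    else:
        groups.append([name])

print(groups)  # the four Acme variants land in one group
```

Greedy grouping against each group's first member is crude but fast enough for a few thousand names.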
Map to UNSPSC or your internal taxonomy. Even a Level 2 (segment + family) categorisation is enough to start. Perfect is the enemy of useful here.
Week 3: Build Your Clean Dataset
Create a “golden file” for your top categories: one clean Excel workbook with standardised supplier names, consistent categories, and complete key fields. This is your AI-ready dataset.
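Assembling the golden file is worth scripting so the same cleanup is repeatable next quarter. A minimal pandas sketch, with illustrative mappings and sample rows:

```python
# Minimal golden-file assembly: apply name and category mappings, drop
# incomplete rows, save. All mapping values and rows are illustrative.
import pandas as pd

name_map = {"ACME Corporation": "Acme Corp", "acme": "Acme Corp"}
category_map = {"creative services": "Consulting", "professional fees": "Consulting"}

df = pd.DataFrame({
    "supplier": ["ACME Corporation", "acme", "Beta Ltd"],
    "category": ["creative services", "professional fees", "IT"],
    "amount": [800.0, 1200.0, None],
})

golden = (df.assign(supplier=df["supplier"].replace(name_map),
                    category=df["category"].replace(category_map))
            .dropna(subset=["supplier", "category", "amount"]))

golden.to_csv("golden_file.csv", index=False)  # or .to_excel(...) with openpyxl
```

The two mapping dictionaries double as the start of your data dictionary.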
Document your rules. Write down the naming conventions and categorisation logic you used. This becomes your data dictionary and prevents the same mess from returning next quarter.
Validate with stakeholders. Share the clean dataset with two or three category managers. If they spot obvious errors, fix them now. A 10-minute review saves hours of rework later.
Week 4: Test With AI and Iterate
Run your first AI analysis on the clean dataset. Start with something simple: spend concentration analysis, duplicate payment detection, or savings opportunity identification.
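Duplicate payment detection, for instance, can start as a one-liner on the clean dataset: flag invoices that share supplier, date, and amount. A sketch with sample data:

```python
# Simple duplicate-payment flag: same supplier, invoice date, and amount.
# Sample rows are illustrative; real checks usually add tolerance windows.
import pandas as pd

df = pd.DataFrame({
    "supplier": ["Acme Corp", "Acme Corp", "Beta Ltd", "Acme Corp"],
    "invoice_date": ["2024-03-01", "2024-03-01", "2024-03-02", "2024-03-05"],
    "amount": [1200.0, 1200.0, 450.0, 1200.0],
})

# keep=False marks every row in a duplicate group, not just the repeats
dupes = df[df.duplicated(subset=["supplier", "invoice_date", "amount"], keep=False)]
print(dupes)
```

Every flagged pair is either a refund opportunity or a data quality issue; both are wins.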
Compare outputs. If you ran a similar analysis manually before, compare the AI output to your previous results. Where they differ, investigate whether it is a data issue or a genuine new insight.
Document what broke. Every AI analysis will reveal new data quality issues you missed. Add them to your cleanup list for the next cycle. This is normal and expected.
5 Quick Wins You Can Do This Week
If 30 days feels like a lot, start with these. Each one takes under two hours and immediately improves your AI readiness.
| Quick Win | Time | Impact |
|---|---|---|
| Remove test and dummy entries from your supplier master | 30 min | High |
| Standardise currency fields (USD vs US Dollar vs $) | 45 min | High |
| Use AI to batch-categorise your uncategorised spend | 90 min | Very High |
| Merge your top 50 supplier duplicates manually | 60 min | High |
| Create a one-page data dictionary for key fields | 45 min | Medium |
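The currency quick win is mostly a lookup table. A minimal sketch; the variant list is an assumption you would extend from whatever actually appears in your own export:

```python
# Map free-text currency variants to ISO 4217 codes.
# The variant list is illustrative -- build yours from the real data.
CURRENCY_MAP = {
    "usd": "USD", "us dollar": "USD", "$": "USD",
    "eur": "EUR", "euro": "EUR", "€": "EUR",
}

def standardise_currency(value: str) -> str:
    key = value.strip().lower()
    return CURRENCY_MAP.get(key, value)  # leave unknowns for manual review

print(standardise_currency("US Dollar"))  # USD
```

Leaving unknown values untouched, rather than guessing, keeps the cleanup auditable.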
Copy-Paste Prompt: AI-Powered Data Quality Audit
Role: You are a procurement data quality analyst with experience in spend analysis and supplier master data management.
Objective: Audit the attached spend data for data quality issues that would prevent accurate AI-powered analysis.
Process:
1. Check completeness: What percentage of rows have all critical fields filled (supplier name, amount, date, category)?
2. Check consistency: Are there duplicate supplier names, inconsistent formats, or mixed currencies?
3. Check accuracy: Flag any obviously incorrect values (negative amounts where not expected, dates in the future, amounts that look like data entry errors).
4. Check categorisation: Are all transactions mapped to a category? Is the taxonomy consistent?
Output: Provide a Data Quality Scorecard with a score out of 100, broken down by the four dimensions above. List the top 10 specific issues to fix first, ranked by impact on AI analysis accuracy.
Stop: Do not attempt to fix the data. Only diagnose and prioritise the issues.
The Real Cost of Ignoring Data Quality
What Happens When You Skip This Step
Your spend analysis shows phantom savings. Duplicate supplier records make your supply base look more fragmented than it is, so consolidation "opportunities" show up that were captured long ago. When you present these numbers to the CFO and the savings don't materialise, you lose credibility for the next AI business case.
Your supplier risk model has blind spots. If 20% of your supplier records are incomplete, your AI risk model is literally blind to 20% of your supply base. The next disruption could come from a supplier you didn’t even know existed in your data.
Your team loses trust in AI. When AI outputs don’t match reality, people stop using the tool. Not because the AI is bad, but because the data is. And once a team labels a tool as “unreliable,” it is extremely hard to bring them back.
The Boring Truth That Nobody Wants to Hear
The biggest ROI from AI in procurement does not come from picking the right tool. It comes from fixing your data first. Spend 30 days on data quality and every AI tool you use afterwards will perform dramatically better. Skip it, and every tool will disappoint you. Your choice.
Want the Complete Data Quality Toolkit?
Our Spend Analysis Unleashed 2.1 book includes a full data quality assessment framework, cleaning prompts, and Excel templates. And the free AI Skills Toolkit has a data readiness checklist you can use today.
Asmaa Gad is the founder of SupplyChain AI Pro, helping procurement and supply chain professionals master AI tools for real work.
