101 Guide to Microsoft Purview Classifiers: Part 1
- Tawsif Mulani
- Apr 17
- 7 min read
Imagine you’re running a bustling chai stall in Mumbai, serving steaming cups of masala tea to a diverse crowd. Your secret recipe—passed down from your nani—is written on a crumpled piece of paper, tucked away in a tin box. But what if that recipe falls into the wrong hands? Or worse, what if your pani puri vendor starts tweeting your customer list?
Enter Microsoft Purview, your digital vault for keeping sensitive info safe, using clever tools called classifiers. Let’s dive into these classifiers with a few fun examples that show how they protect your data like a security guard.
What Are Classifiers in Microsoft Purview?
Classifiers in Microsoft Purview are used to identify and tag sensitive information—think Aadhaar numbers, credit card details, or your samosa supplier contracts—across your organization’s files, emails, and databases. Purview offers three main types of classifiers: Sensitive Information Types (SITs), Trainable Classifiers, and Exact Data Match (EDM) classifiers.
Let’s explore these classifiers with a few examples!
1. Sensitive Information Types (SITs)
What Are SITs?
Sensitive Information Types (SITs) are pattern-based classifiers that identify specific data formats, like PAN numbers, phone numbers, or credit card details, using regular expressions (regex), keywords, and validation checks. Microsoft Purview provides over 300 pre-built SITs, and you can create custom ones for needs like detecting Indian PAN numbers or Diwali promo codes. On their own, SITs don't protect anything: they take effect when you reference them in DLP policies or in auto-labeling policies for sensitivity and retention labels.
💎 Example: Contoso Jeweler’s PAN Panic Prevented by Pattern Precision
At Contoso Jeweler, a high-end brand renowned for its handcrafted jhumkas and glittering mangalsutras, business is booming — especially during the festive gold rush. But with great sparkle comes great responsibility. 💍✨
To comply with tax regulations in India, sales over ₹2 lakh require the customer's PAN (Permanent Account Number). These PAN details often fly around in emails, spreadsheets, or OneDrive folders faster than a diamond-studded bangle in a flash sale. 🏃‍♀️💨
And sometimes… they land in the wrong hands. Or worse — in unsecured folders.
🛡️ Enter: Microsoft Purview’s Sensitive Information Types (SITs)
You, the vigilant IT admin (with the reflexes of a vault door), step in and configure a custom Sensitive Information Type (SIT) designed to detect PAN numbers across your organization — before they leak like melted gold in a furnace.
🔬 How You Built It: Pattern Precision Like a Master Jeweler
Much like inspecting a flawless diamond, you use three main elements to detect sensitive info with confidence:
🧠 Regular Expression (Regex)
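To catch PAN numbers, you define a regex pattern like: \b[A-Z]{5}[0-9]{4}[A-Z]\b
This matches the 10-character format: five letters, four digits, one letter (e.g., ABCDE1234F). Word boundaries (\b) are used instead of ^...$ anchors so the pattern can find a PAN inside a longer email or document, not only when the PAN is the entire text.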
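To see why the anchors matter, here is a quick local check with Python's `re` module. A pattern wrapped in `^...$` only matches when the whole string is a PAN, so a PAN sitting inside an email sentence slips through; word boundaries (`\b`) find it. Python's regex engine is not the one Purview uses, so treat this as a sketch for trying patterns on your laptop, not a reproduction of Purview's matching.

```python
import re

# Anchored pattern: ^ and $ force the ENTIRE string to be a PAN.
ANCHORED = re.compile(r"^[A-Z]{5}[0-9]{4}[A-Z]$")
# Word-boundary pattern: finds a PAN anywhere inside longer text.
BOUNDED = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")

email = "Customer PAN: VWXYZ4321K for gold bangle invoice #CJ456"

print(ANCHORED.findall(email))              # []
print(BOUNDED.findall(email))               # ['VWXYZ4321K']
print(bool(ANCHORED.match("ABCDE1234F")))   # True: fine for a bare value, useless for email text
```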
🧾 Keywords Nearby
To avoid false positives (like a jewelry SKU such as “GOLDR1234X” that happens to share the PAN shape), you add keywords like “PAN,” “Tax ID,” “KYC,” and “Invoice” that must appear within 300 characters of the matched text.
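The keyword rule is a proximity check: a pattern match only counts as corroborated if a supporting keyword shows up within a window of characters around it. The sketch below assumes a 300-character window on either side of the match and case-insensitive whole-word keywords. The function names `keyword_nearby` and `corroborated_pans` are hypothetical and only illustrate the idea behind Purview's proximity setting, not its internal engine.

```python
import re

PAN = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
KEYWORDS = ["PAN", "Tax ID", "KYC", "Invoice"]

def keyword_nearby(text: str, start: int, end: int, keywords, window: int = 300) -> bool:
    """Return True if any keyword appears within `window` characters of text[start:end]."""
    lo = max(0, start - window)
    hi = min(len(text), end + window)
    nearby = text[lo:hi]
    for kw in keywords:
        # Whole-word, case-insensitive match, so "pan" inside "company" does not count.
        if re.search(r"\b" + re.escape(kw) + r"\b", nearby, re.IGNORECASE):
            return True
    return False

def corroborated_pans(text: str):
    """List PAN-shaped matches that have a supporting keyword nearby."""
    return [m.group() for m in PAN.finditer(text)
            if keyword_nearby(text, m.start(), m.end(), KEYWORDS)]

print(corroborated_pans("Customer PAN: VWXYZ4321K for gold bangle invoice #CJ456"))  # ['VWXYZ4321K']
print(corroborated_pans("Restock SKU GOLDR1234X in the showroom"))                    # []
```

The SKU matches the PAN shape, but with no supporting keyword nearby it never gets reported.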
✅ Validation
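Purview then applies validation on top of the match: additional checks on the custom SIT (for example, excluding known dummy values) and a confidence level that rises when supporting keywords are found. This weeds out lookalikes so only PAN-like identifiers in a PAN-like context are flagged, like verifying a gemstone’s authenticity certificate.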
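Here is a rough idea of what "additional checks" can look like once the regex has matched. The exclusion set and the placeholder rule below are hypothetical examples chosen for illustration, not Purview's built-in logic. Note that no check like this can prove a PAN was actually issued; it only filters out dummies and obvious placeholders.

```python
import re

PAN = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
# Hypothetical exclusion list: sample PANs used in training decks and test data.
EXCLUDED = {"ABCDE1234F"}

def passes_checks(value: str) -> bool:
    """Illustrative additional checks applied after the regex has matched."""
    if value in EXCLUDED:
        return False  # known dummy value
    if len(set(value[:5])) == 1 or value[5:9] == "0000":
        return False  # placeholder-style value such as AAAAA0000A (hypothetical rule)
    return True

def validated_pans(text: str):
    """Return regex matches that also pass the additional checks."""
    return [m.group() for m in PAN.finditer(text) if passes_checks(m.group())]

sample = "PAN on file: VWXYZ4321K. Template PAN: ABCDE1234F. Test PAN: AAAAA0000A."
print(validated_pans(sample))  # ['VWXYZ4321K']
```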
🎬 The Save-the-Day Scenario: Festival Frenzy Edition
Your sales manager, Anjali, sends an email to a VIP client:
“Customer PAN: VWXYZ4321K for gold bangle invoice #CJ456”
Suddenly, your DLP policy, powered by the custom SIT, springs into action like a Bollywood hero in the final act:
🎯 Pattern Match: Detects the valid PAN “VWXYZ4321K”
🗝️ Keyword Match: Finds “PAN” and “invoice” in close proximity
🔍 Validation Passes: Confirms the match is a well-formed PAN, not a dummy or lookalike
💥 Result: Purview blocks the email and shows a friendly-but-firm policy tip:
“Oops! You’re sharing a PAN number. This action is restricted.”
Anjali is saved from an accidental compliance breach, and the compliance team is auto-notified faster than a courier delivering platinum earrings on your wedding anniversary.
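The policy side boils down to a rule like "if the content contains at least N high-confidence PAN matches, block and notify." The sketch below models that decision for Anjali's email. The `Decision` class, the `min_count` threshold, and the `compliance-team` recipient are hypothetical stand-ins for settings you would configure in a real DLP rule, not Purview's API.

```python
import re
from dataclasses import dataclass, field

PAN = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
KEYWORD = re.compile(r"\b(PAN|Tax ID|KYC|Invoice)\b", re.IGNORECASE)

@dataclass
class Decision:
    block: bool
    policy_tip: str = ""
    notify: list = field(default_factory=list)

def evaluate_email(body: str, min_count: int = 1, window: int = 300) -> Decision:
    """Hypothetical DLP rule: block when at least min_count high-confidence PANs are found."""
    high = 0
    for m in PAN.finditer(body):
        nearby = body[max(0, m.start() - window): m.end() + window]
        if KEYWORD.search(nearby):
            high += 1  # pattern plus supporting keyword counts as high confidence
    if high >= min_count:
        return Decision(
            block=True,
            policy_tip="Oops! You're sharing a PAN number. This action is restricted.",
            notify=["compliance-team"],
        )
    return Decision(block=False)

decision = evaluate_email("Customer PAN: VWXYZ4321K for gold bangle invoice #CJ456")
print(decision.block, decision.notify)  # True ['compliance-team']
```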
✨ Why It’s Brilliant:
300+ Pre-built SITs already detect global identifiers like SSNs (US), NINs (UK), and more.
Custom SITs let you tailor protection to your specific needs — like PAN numbers, internal order IDs, or promo codes.
Validation logic cuts out noise by ensuring accuracy — like rejecting fake diamonds.
Policy integration with DLP and retention labels makes detection actionable.
Regulatory Alignment with GDPR, India’s IT Act, and other compliance frameworks keeps your brand reputation as polished as your showroom floor.
2. Trainable Classifiers
What Are Trainable Classifiers?
Trainable Classifiers in Microsoft Purview are AI-powered tools that learn to identify sensitive content based on examples you provide, not just fixed patterns. Unlike traditional classifiers that rely solely on keywords or regular expressions, Trainable Classifiers are "taught" using real documents: you give them positive samples (what you want them to detect) and negative samples (what to ignore). From these examples they learn to recognize context and structure, making them ideal for detecting complex or organization-specific content like business plans, HR reports, or R&D documents. Think of it like training a chef to spot the perfect biryani, not just by ingredients, but by aroma, texture, and timing.
🧃 Example: Juice Lab’s Secret Formula Foiled
At Contoso Fruits Beverages, innovation is everything. Their secret to success? A lab full of nutritionists and food scientists crafting the trendiest superfood juices before anyone else even thinks of them.
This year, they’re cooking up something special — a limited-edition blend called the "Glow-Up Guava Elixir". It's packed with antioxidants, a hint of hibiscus, and just enough sparkle to trend on every influencer’s feed. This formula is top secret — even the interns sign NDAs thicker than mango pulp.
But one sunny Tuesday, something unexpected happens…
😱 The Oops Moment:
An over-enthusiastic marketing executive, while preparing for a pitch deck, drags and drops the entire R&D document — including the formulation breakdown — into the shared "Launch Campaigns" folder on SharePoint. This folder is accessible by almost 60 people, including external contractors and interns.
Cue panic at the juice lab.
💡 Enter: Microsoft Purview’s Trainable Classifier
You, the data protection hero, don’t rely on chance. You’ve already anticipated this kind of situation. So, you’ve built a custom Trainable Classifier in Microsoft Purview called “Product Formulas” to act like a digital food safety inspector.
Here’s how you trained it:
🟢 Positive Samples (What to Catch): 50 internal R&D files with language like:
“Batch Composition v2.3”
“Raw Ingredient Ratios”
“Antioxidant Extraction Process”
“Nutritional Test Summary”
🔴 Negative Samples (What to Ignore): 150 creative briefs, launch posters, juice memes, marketing slogans, and campaign calendars.
After training completes (processing can take a day or two), your classifier becomes the watchdog of wellness.
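Microsoft doesn't publish the model behind trainable classifiers, so the sketch below swaps in a classic technique, multinomial naive Bayes, purely to show the idea of learning from positive and negative samples. The sample snippets are invented for illustration and the `ToyClassifier` name is hypothetical; a real classifier needs far more (and far longer) samples.

```python
import math
import re
from collections import Counter

def tokens(text: str):
    """Lowercase word tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())

class ToyClassifier:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def fit(self, positives, negatives):
        self.counts = {True: Counter(), False: Counter()}
        for label, docs in ((True, positives), (False, negatives)):
            for doc in docs:
                self.counts[label].update(tokens(doc))
        total = len(positives) + len(negatives)
        self.prior = {True: len(positives) / total, False: len(negatives) / total}
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def log_score(self, text: str, label: bool) -> float:
        counts = self.counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.prior[label])
        for tok in tokens(text):
            score += math.log((counts[tok] + 1) / denom)
        return score

    def predict(self, text: str) -> bool:
        """True means 'looks like a product formula'."""
        return self.log_score(text, True) > self.log_score(text, False)

positives = [
    "Batch Composition v2.3: guava 40%, hibiscus extract 5%",
    "Raw Ingredient Ratios for the antioxidant blend",
    "Antioxidant Extraction Process: cold press at 4 C",
    "Nutritional Test Summary: vitamin C per 100 ml",
]
negatives = [
    "Launch poster: Glow up with guava!",
    "Campaign calendar for the summer influencer push",
    "Creative brief: fun, fresh, fruity slogans",
    "Meme ideas for the juice launch",
]

clf = ToyClassifier().fit(positives, negatives)
print(clf.predict("Final formula: raw ingredient ratios and extraction process"))  # True
print(clf.predict("Slogans and poster ideas for the launch campaign"))             # False
```

Neither test sentence matches a fixed keyword list; the decision comes from which set of examples its vocabulary resembles, which is the core idea behind training on samples.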
🚨 The Save-the-Day Moment:
Within minutes, the classifier detects the sensitive content within the document titled: “GlowUp_Elixir_2025_Final_Formula.docx”
Because this trainable classifier is wired into your Purview policies (auto-labeling and DLP), those policies automatically:
Applies an appropriate sensitivity label.
Restricts access to only the head of R&D and the internal product team.
Sends a policy alert to the admin and logs the incident for auditing.
Prevents the file from being shared externally or downloaded.
Meanwhile, the marketing exec gets a friendly message that says: “This file contains sensitive content and can’t be shared outside the approved group.”
The company breathes a sigh of relief — no formula leak, no competitive copycats, no ruined surprise. The Glow-Up Guava stays under wraps until its grand reveal on World Wellness Day.
🎉 Why This Rocks:
Your custom classifier didn’t just match keywords — it understood context, like a juice sommelier sniffing out the perfect blend.
You avoided a data leak that could’ve cost millions in IP and market advantage.
It took zero manual policing, just smart AI doing what it’s trained to do — protect what matters most.
3. Exact Data Match (EDM)
What is Exact Data Match (EDM)?
Exact Data Match (EDM) in Microsoft Purview is a powerful classification tool that matches content against an exact set of values, such as employee IDs, phone numbers, or customer details, taken from a table you supply (e.g., a CSV). Only salted hashes of those values are uploaded, so the raw values don't need to be stored in the cloud, and any file containing an exact match is flagged so your policies can act on it, for example by applying a sensitivity label such as "Restricted." Because it matches your real records, EDM keeps false positives very low, making it ideal for protecting highly sensitive and unique data, like customer records, across large datasets.
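The "hashes, not raw values" idea can be sketched with Python's `hashlib`: normalize each value, hash it with a secret salt, keep only the hashes, then hash candidates found in a document the same way and look them up. Purview's EDM Upload Agent does its own salting and hashing in its own format; the digits-only normalization and the function names here are hypothetical.

```python
import hashlib
import re
import secrets

def normalize_phone(raw: str) -> str:
    """Hypothetical normalization: keep digits only, so '+91 98765-43210' equals '+919876543210'."""
    return re.sub(r"\D", "", raw)

def hash_value(value: str, salt: bytes) -> str:
    """Salted SHA-256 of a normalized value."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()

# Done once, on your own machine: hash the CRM export and keep only the hashes.
salt = secrets.token_bytes(16)
crm_phones = ["+919876543210", "+919812345678"]
hashed_table = {hash_value(normalize_phone(p), salt) for p in crm_phones}

def is_known_customer_phone(candidate: str) -> bool:
    """Hash a candidate found in a document and look it up; raw CRM values are never needed."""
    return hash_value(normalize_phone(candidate), salt) in hashed_table

print(is_known_customer_phone("+91 98765-43210"))  # True
print(is_known_customer_phone("+919000000000"))    # False
```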
🍬 Example: Chai & Chaat Co.'s Loyalty Program Lockdown with EDM Classifiers
At Chai & Chaat Co., known for its irresistible chaats and spicy chai, you’ve built an exclusive loyalty program with over 10,000 customers — all tied to sensitive information like phone numbers and names stored in your CRM. Naturally, you're always worried about sneaky employees leaking this data to competitors offering free kulfi or other sweet deals.
To protect your customer base, you set up an Exact Data Match (EDM) Classifier from a CSV export of your CRM containing Customer Name and Phone Number. Before anything leaves your network, the EDM Upload Agent hashes the values with a salt and uploads only those hashes to Purview, so the raw customer data stays private.
🔒 The EDM Power Move:
One day, your delivery guy, Vijay, accidentally tries to upload a file to OneDrive containing customer details like:
"Ravi Sharma, +919876543210"
As soon as the file hits the system, Purview’s EDM Classifier kicks in:
It matches the data exactly to your CRM records.
Applies a "Restricted" sensitivity label to the file.
Sends a real-time alert to your compliance team.
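EDM doesn't just look up one value. A regular pattern first finds candidate "primary element" values (here, phone numbers), and the match is confirmed when another field from the *same* CRM row (here, the name) appears nearby. The sketch below models that corroboration with hashed rows. The fixed demo salt, the crude capitalized-name pattern, and the 300-character window are hypothetical simplifications, not Purview's implementation.

```python
import hashlib
import re

SALT = b"hypothetical-demo-salt"  # a real deployment keeps its salt secret

def h(value: str) -> str:
    """Salted SHA-256 of a trimmed, lowercased value."""
    return hashlib.sha256(SALT + value.strip().lower().encode("utf-8")).hexdigest()

# Hashed CRM rows: hash of primary element (phone) -> hash of secondary element (name).
rows = {
    h("+919876543210"): h("Ravi Sharma"),
    h("+919812345678"): h("Priya Nair"),
}

PHONE = re.compile(r"\+91\d{10}")
# Crude name candidates: two capitalized words; the lookahead allows overlapping hits.
NAME = re.compile(r"(?=\b([A-Z][a-z]+ [A-Z][a-z]+)\b)")

def edm_matches(text: str, window: int = 300):
    """Phones whose hash is in the table AND whose own row's name appears nearby."""
    found = []
    for m in PHONE.finditer(text):
        expected_name = rows.get(h(m.group()))
        if expected_name is None:
            continue  # phone-shaped, but not one of our customers
        nearby = text[max(0, m.start() - window): m.end() + window]
        if any(h(n) == expected_name for n in NAME.findall(nearby)):
            found.append(m.group())
    return found

print(edm_matches("Ravi Sharma, +919876543210"))         # ['+919876543210']
print(edm_matches("Priya Nair, +919876543210"))          # [] (name belongs to another row)
print(edm_matches("Call Ravi Sharma at +919000000000"))  # [] (not a customer phone)
```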
Faster than you can shout “golgappa!”, access to Vijay’s file is locked down, and your valuable customer data remains as secure as gold in a bank vault.
🌟 Why This is Brilliant:
Precision: EDM Classifiers look for exact values from your own records, so false alarms are rare; a random ten-digit number won’t be mistaken for a customer’s phone number.
Privacy: The data is hashed, so even Purview can't access the raw details, keeping it private and secure.
Scalability: Whether you have thousands of customer records or Holi festival guest lists, EDM Classifiers scale up without a hitch.
🍛 Why Classifiers Are Your Data’s Biryani Base
Just like a perfect biryani needs the right mix of spices to create the ultimate flavor, your data protection strategy relies on classifiers to bring everything together. These tools help you:
Identify sensitive info across various platforms like SharePoint, OneDrive, and Exchange.
Apply labels (sensitivity or retention) to protect or govern your data.
Enforce policies like Data Loss Prevention (DLP) to prevent leaks, such as stopping Aadhaar numbers from leaving your network.
Ensure compliance with regulations like India’s DPDP Act, keeping your business in the clear and out of legal hot water.
🛠️ Getting Started with Purview Classifiers
Ready to spice up your data protection game? Here’s how to start:
Access Purview: Sign in to the Microsoft Purview portal with Compliance or Security admin roles.
Explore SITs: Check out pre-built SITs or create a custom one for India-specific data, like PAN numbers (\b[A-Z]{5}[0-9]{4}[A-Z]\b).
Train Classifiers: Go to Data Classification > Trainable Classifiers, upload samples, and train a custom classifier (give it 1-2 days to process).
Set Up EDM: Define your EDM schema under Data Classification > Exact Data Match, hash and upload your CSV with the EDM Upload Agent, and configure your rules.
Test & Tune: Use the Contextual Summary tab to review matches and improve accuracy.
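Before (and alongside) testing in Purview, it can help to score a candidate pattern against a few labelled samples locally. The harness below counts true/false positives and negatives for two PAN patterns; the samples and the `evaluate` helper are hypothetical, and since Python's `re` is not Purview's regex engine, re-check any pattern in Purview itself before rollout.

```python
import re

BOUNDED = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
NAIVE = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")  # no word boundaries

# Hypothetical labelled samples: (text, should_match)
samples = [
    ("PAN: VWXYZ4321K", True),
    ("KYC done, Tax ID ABCPK1234Q", True),
    ("SKU RING1234X restocked", False),
    ("Order ABCDEF1234G shipped", False),
]

def evaluate(pattern, labelled):
    """Count true/false positives and negatives for a pattern over labelled samples."""
    tally = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for text, expected in labelled:
        hit = pattern.search(text) is not None
        key = ("tp" if expected else "fp") if hit else ("fn" if expected else "tn")
        tally[key] += 1
    return tally

print(evaluate(BOUNDED, samples))  # {'tp': 2, 'fp': 0, 'fn': 0, 'tn': 2}
print(evaluate(NAIVE, samples))    # {'tp': 2, 'fp': 1, 'fn': 0, 'tn': 1}
```

The naive pattern flags "ABCDEF1234G" because it can start matching mid-word, which is exactly the kind of noise word boundaries remove.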
Apply Policies: Link your classifiers to DLP or auto-labeling policies to safeguard your data like a fort around your chai stall.
🍽️ Conclusion: Keep Your Data as Safe as Your Nani’s Recipe
Microsoft Purview’s classifiers — SITs, Trainable Classifiers, and EDM — are like your digital bodyguards, ensuring that sensitive data stays out of the wrong hands, much like you wouldn’t want your pani puri recipe shared with a rival. Whether it’s catching Aadhaar numbers, securing Diwali bonus plans, or locking down customer lists, these tools combine AI-powered precision with a sprinkle of fun. Fire up Purview, safeguard your chai empire, and celebrate a thali of compliance success!