PII redaction: how we handle personal data
Personal data never reaches the analysis models. Here's how the redaction engine works.
By Howzer Team, Engineering
Why redaction comes first
In Howzer's pipeline, PII redaction is the very first processing step after ingest. Before any sentiment model, risk scorer, or language model sees a message, personally identifiable information is detected and replaced with category tokens (e.g., [NAME], [EMAIL], [PHONE]). This is by design, not an afterthought.
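One way to picture the "redaction first" ordering is a pipeline whose analyzers only accept text that has already been redacted. The sketch below is illustrative, not Howzer's actual code: `RedactedText`, `run_pipeline`, and the analyzer callables are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class RedactedText:
    """Hypothetical wrapper marking text that has passed through redaction."""
    value: str


def run_pipeline(
    raw_message: str,
    redact: Callable[[str], str],
    analyzers: Dict[str, Callable[[RedactedText], object]],
) -> Dict[str, object]:
    # Redaction runs exactly once, before any analyzer is called.
    redacted = RedactedText(redact(raw_message))
    # Analyzers receive only the wrapper, never the raw message.
    return {name: analyze(redacted) for name, analyze in analyzers.items()}
```

Because the analyzers take a `RedactedText` rather than a plain string, a static type checker can flag code that tries to pass raw input straight to a model.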
What gets detected
The redaction engine uses a combination of named entity recognition (NER), pattern matching, and context-aware rules. It covers the entity types most common in customer service communication:
- Names (first, last, full): pattern + NER hybrid.
- Email addresses: RFC-compliant pattern matching.
- Phone numbers: German and international formats.
- Postal addresses: street, city, ZIP detection.
- IBAN / account numbers: checksum-validated patterns.
- Dates of birth: contextual detection (not every date is treated as PII).
- Customer IDs: configurable patterns per deployment.
- Free-text identifiers: names embedded in sentences.
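To illustrate what "checksum-validated" means for IBANs, here is a minimal sketch using the standard mod-97 check from ISO 13616. The function names, the candidate pattern, and the [IBAN] token are assumptions made for this example and are not taken from the Howzer engine.

```python
import re

# Rough shape of a compact IBAN: country code, two check digits, 11 to 30 more characters.
IBAN_SHAPE = re.compile(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}")
# Simplified candidate finder for IBANs written in groups of four (assumption).
IBAN_CANDIDATE = re.compile(
    r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
)


def iban_checksum_ok(candidate: str) -> bool:
    """Return True if the candidate passes the ISO 13616 mod-97 check."""
    compact = re.sub(r"\s+", "", candidate).upper()
    if not IBAN_SHAPE.fullmatch(compact):
        return False
    # Move the first four characters to the end, then map A..Z to 10..35.
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def redact_ibans(text: str) -> str:
    """Replace only those candidates that pass the checksum."""
    return IBAN_CANDIDATE.sub(
        lambda m: "[IBAN]" if iban_checksum_ok(m.group(0)) else m.group(0),
        text,
    )
```

Validating the checksum before redacting keeps the engine from masking arbitrary strings that merely look like account numbers, such as order references.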
How it works in practice
Consider a message like: "My name is Maria Schmidt, you can reach me at maria.schmidt@example.com or 0172-1234567." After redaction, the downstream models see: "My name is [NAME], you can reach me at [EMAIL] or [PHONE]." Analysis quality is preserved because sentiment, emotion, and root-cause detection work on the structure and vocabulary of the message, not on the personal data.
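The token replacement for this example message can be sketched with a few regular expressions. This is a deliberately simplified illustration: the patterns and the `redact` function are hypothetical, and the name rule is a single context pattern standing in for the NER and context-aware rules of a real engine.

```python
import re

# Simplified patterns for illustration only (assumptions, not Howzer's rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# German-style numbers such as 0172-1234567 or +49 172 1234567.
PHONE_RE = re.compile(r"(?<!\w)(?:\+49|0)[\d\s/-]{6,}\d")
# Context rule: one or two capitalised words directly after "My name is ".
NAME_RE = re.compile(r"(?<=My name is )[A-ZÄÖÜ][\w-]+(?: [A-ZÄÖÜ][\w-]+)?")


def redact(text: str) -> str:
    """Replace detected entities with category tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    text = NAME_RE.sub("[NAME]", text)
    return text


message = (
    "My name is Maria Schmidt, you can reach me at "
    "maria.schmidt@example.com or 0172-1234567."
)
print(redact(message))
# prints: My name is [NAME], you can reach me at [EMAIL] or [PHONE].
```

The sentence structure around the tokens is left intact, which is what the downstream models rely on.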
Deployment context
Because Howzer runs self-hosted in your tenant, neither raw nor redacted data leaves your network. The redaction engine itself runs locally, with no cloud API calls and no external NER services. This is a hard requirement for our customers in regulated industries.