
PII redaction: how we handle personal data

Personal data never reaches the analysis models. Here's how the redaction engine works.

By Howzer Team, Engineering

Why redaction comes first

In Howzer's pipeline, PII redaction is the very first processing step after ingest. Before any sentiment model, risk scorer, or language model sees a message, personally identifiable information is detected and replaced with category tokens (e.g., [NAME], [EMAIL], [PHONE]). This is by design, not an afterthought.

Redaction in the pipeline
Ingest (raw message) → Redact (PII → tokens) → Analyze (clean text only) → Enrich (history · policy) → Respond (draft reply)
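
The ordering in the diagram can also be enforced in code. The following is a minimal sketch, not Howzer's implementation. It assumes a hypothetical `RedactedText` wrapper type that only `redact()` produces, so the analysis stage can reject raw strings outright. The stage functions are placeholders, and the email-only detector stands in for the full engine.

```python
import re
from dataclasses import dataclass


# Hypothetical marker type: only redact() creates it, so downstream
# stages can insist on receiving redacted text.
@dataclass(frozen=True)
class RedactedText:
    text: str


# Stand-in detector (emails only) to keep the sketch short.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(raw: str) -> RedactedText:
    """Redact step: replace PII with category tokens right after ingest."""
    return RedactedText(EMAIL.sub("[EMAIL]", raw))


def analyze(msg: RedactedText) -> dict:
    """Analyze step: models run on clean text only; raw strings are rejected."""
    if not isinstance(msg, RedactedText):
        raise TypeError("analyze() only accepts RedactedText")
    return {"words": msg.text.split()}  # placeholder for sentiment/risk models


def enrich(analysis: dict, history: list) -> dict:
    """Enrich step: attach context such as conversation history (policy omitted)."""
    return {**analysis, "history_len": len(history)}


def respond(enriched: dict) -> str:
    """Respond step: produce a draft reply (placeholder)."""
    return f"Draft reply based on {len(enriched['words'])} words"


def handle(raw: str, history: list) -> str:
    # `raw` is the ingested message; redaction happens before anything else.
    clean = redact(raw)
    return respond(enrich(analyze(clean), history))
```

Making the redacted text a distinct type turns "redaction comes first" into something a type checker and a runtime check can both catch, rather than a convention.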

What gets detected

At a glance: 8+ entity types, under 5 ms per-message latency, and zero PII in model input.

The redaction engine uses a combination of named entity recognition (NER), pattern matching, and context-aware rules. It covers the entity types most common in customer service communication:
  • Names (first, last, full): pattern + NER hybrid.
  • Email addresses: RFC-compliant pattern matching.
  • Phone numbers: German and international formats.
  • Postal addresses: street, city, ZIP detection.
  • IBAN / account numbers: checksum-validated patterns.
  • Date of birth: contextual detection (not all dates).
  • Customer IDs: configurable patterns per deployment.
  • Free-text identifiers: names embedded in sentences.
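
Two of the detectors listed above do more than match a bare pattern. The Python sketch below shows one way to implement checksum-validated IBAN matching (the ISO 13616 mod-97 check) and context-gated date-of-birth detection. The regexes, the cue-word list, the 30-character window, and the function names `find_ibans` and `find_dobs` are illustrative assumptions, not Howzer's actual rules.

```python
import re

# Candidate IBAN: country code, two check digits, then 11-30 alphanumerics,
# optionally grouped with single spaces (IBANs are 15-34 characters long).
IBAN_CANDIDATE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]){11,30}\b")


def iban_checksum_ok(candidate: str) -> bool:
    """Mod-97 check: move first 4 chars to the end, map letters to numbers, expect 1."""
    iban = candidate.replace(" ", "")
    rotated = iban[4:] + iban[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35, as the check requires.
    digits = "".join(str(int(ch, 36)) for ch in rotated)
    return int(digits) % 97 == 1


def find_ibans(text: str) -> list[tuple[int, int]]:
    """Return spans that match the IBAN shape AND pass the checksum."""
    return [m.span() for m in IBAN_CANDIDATE.finditer(text) if iban_checksum_ok(m.group())]


# Date of birth: flag a date only when a cue word appears shortly before it.
# Date format, cue list and window size are assumptions for illustration.
DATE = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")  # German dd.mm.yyyy only
DOB_CUES = re.compile(r"\b(?:born|date of birth|dob|geboren|geburtsdatum)\b", re.IGNORECASE)
CUE_WINDOW = 30  # characters to look back from the start of the date


def find_dobs(text: str) -> list[tuple[int, int]]:
    """Return spans of dates preceded by a birth-date cue within CUE_WINDOW."""
    spans = []
    for m in DATE.finditer(text):
        window = text[max(0, m.start() - CUE_WINDOW):m.start()]
        if DOB_CUES.search(window):
            spans.append(m.span())
    return spans
```

The checksum step is what keeps random 20-odd-character codes (order numbers, tracking IDs) from being redacted as IBANs, and the cue window is what lets an order date through while a birth date is caught.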

How it works in practice

Consider the message "My name is Maria Schmidt, you can reach me at maria.schmidt@example.com or 0172-1234567." After redaction, the downstream models see "My name is [NAME], you can reach me at [EMAIL] or [PHONE]." Analysis quality is preserved because sentiment, emotion, and root cause detection depend on the structure and vocabulary of the message, not on the personal data it contains.
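
The transformation above can be reproduced with a small span-based redactor. This is a minimal sketch under stated assumptions: the three regexes are illustrative, and the name rule is a hypothetical context cue ("my name is" followed by capitalized words) standing in for the pattern + NER hybrid the engine actually uses. Detected spans are sorted and replaced left to right with category tokens, skipping any span that overlaps one already replaced.

```python
import re

# Illustrative detectors only: (token label, pattern, capture group to replace).
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), 0),
    # German mobile/landline style numbers, optionally with a +CC prefix.
    ("PHONE", re.compile(r"(?<!\w)(?:\+\d{1,3}[ -]?|0)\d{2,4}[ /-]?\d{3,8}(?!\w)"), 0),
    # Hypothetical context rule: capitalized words right after "my name is".
    ("NAME", re.compile(r"\b[Mm]y name is ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"), 1),
]


def redact(text: str) -> str:
    """Replace every detected span with its category token, e.g. [EMAIL]."""
    spans = []
    for label, pattern, group in PATTERNS:
        for m in pattern.finditer(text):
            spans.append((m.start(group), m.end(group), label))
    spans.sort()

    out, cursor = [], 0
    for start, end, label in spans:
        if start < cursor:  # overlaps a span already replaced: keep the first
            continue
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)
```

Only the personal data is replaced; the sentence frame ("you can reach me at … or …") survives intact, which is what the downstream models rely on.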

The original message with PII is stored separately with restricted access and configurable retention periods. Redacted versions are used for all analysis and model inference.
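
A minimal sketch of what "stored separately with restricted access and configurable retention" can look like, assuming an in-memory store keyed by message ID. The `OriginalStore` class, the `privacy_officer` role, and the purge method are hypothetical; a real deployment would back this with an access-controlled database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class OriginalStore:
    """Hypothetical store for un-redacted originals, kept apart from analysis data."""
    retention: timedelta  # configurable per deployment
    allowed_roles: frozenset = frozenset({"privacy_officer"})  # hypothetical role name
    _items: dict = field(default_factory=dict, init=False, repr=False)

    def put(self, msg_id: str, raw: str, received_at: datetime) -> None:
        self._items[msg_id] = (raw, received_at)

    def get(self, msg_id: str, role: str) -> str:
        # Restricted access: only explicitly allowed roles may read originals.
        if role not in self.allowed_roles:
            raise PermissionError(f"role {role!r} may not read original messages")
        return self._items[msg_id][0]

    def purge_expired(self, now: datetime) -> int:
        """Delete originals older than the retention period; return how many."""
        expired = [k for k, (_, ts) in self._items.items() if now - ts > self.retention]
        for k in expired:
            del self._items[k]
        return len(expired)
```

In this sketch the analysis side never holds a reference to the store; it only ever receives the redacted text, so retention and access rules apply to the originals alone.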

Deployment context

Because Howzer runs self-hosted in your tenant, redacted data never leaves your network. The redaction engine itself runs locally, with no cloud API calls and no external NER services. This is a hard requirement for our customers in regulated industries.