
# Data Loss Prevention Policy

The Data Loss Prevention (DLP) policy scans incoming request bodies for
sensitive data — personally identifiable information (PII), secrets and API
keys for dozens of vendors, payment and bank identifiers, and national IDs for
many countries — using a catalog of 60+ built-in recognizers plus any custom
patterns you add. When a match is found it takes a configurable action: mask
the matches, block the request, or log a warning and let it through.

Recognizers are selected individually or via entity groups (`secret`,
`finance`, `pii`, `id-us`, `id-uk`, `region-eu`, …). Detection runs entirely in the
gateway isolate using regular expressions, checksums (Luhn, mod-97, Verhoeff,
and friends), and context-word scoring — no request data leaves the gateway.
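To illustrate why checksums cut false positives, here is a minimal sketch of a Luhn check, one of the checksums named above. The function name `luhnValid` is hypothetical and this is not the gateway's own implementation; it only shows the kind of validation a recognizer such as `finance-credit-card` can apply after a regex match.

```typescript
// Illustrative Luhn check (assumed helper, not the policy's internal code).
// Returns true when the digit string passes the Luhn checksum.
function luhnValid(input: string): boolean {
  // Ignore common separators such as spaces and dashes.
  const digits = input.replace(/[\s-]/g, "");
  if (!/^\d+$/.test(digits)) return false;

  let sum = 0;
  let doubleNext = false; // every second digit from the right is doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}
```

A 16-digit run that fails this check is far less likely to be a real card number, so rejecting it avoids masking order numbers and similar identifiers.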

Pair it with the
[Data Loss Prevention - Outbound](/docs/policies/data-loss-prevention-outbound)
policy to also scan upstream responses before they're returned to the client.

## Configuration

The example below shows how to configure the policy in the `policies.json` document.

```json title="config/policies.json"
{
  "name": "my-data-loss-prevention-inbound-policy",
  "policyType": "data-loss-prevention-inbound",
  "handler": {
    "export": "DataLossPreventionInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "action": "mask",
      "entities": ["secret", "finance", "contact-email", "id-us-ssn"],
      "mask": "[REDACTED]"
    }
  }
}
```

### Policy Configuration

- `name` <code className="text-green-600">&lt;string&gt;</code> - The name of your policy instance. This is used as a reference in your routes.
- `policyType` <code className="text-green-600">&lt;string&gt;</code> - The identifier of the policy. This is used by the Zuplo UI. Value should be `data-loss-prevention-inbound`.
- `handler.export` <code className="text-green-600">&lt;string&gt;</code> - The name of the exported type. Value should be `DataLossPreventionInboundPolicy`.
- `handler.module` <code className="text-green-600">&lt;string&gt;</code> - The module containing the policy. Value should be `$import(@zuplo/runtime)`.
- `handler.options` <code className="text-green-600">&lt;object&gt;</code> - The options for this policy. [See Policy Options](#policy-options) below.

### Policy Options

The options for this policy are specified below. All properties are optional unless specifically marked as required.

- `engine` <code className="text-green-600">&lt;string&gt;</code> - The detection engine. Only `builtin` (in-isolate regex + checksum detection with context-word scoring) is available today. The option exists as an extension point for a future hosted `presidio-service` mode, so that adding that mode later is an additive, non-breaking change. Allowed values are `builtin`. Defaults to `"builtin"`.
- `entities` <code className="text-green-600">&lt;string[]&gt;</code> - Built-in recognizer ids and/or group selectors to enable. Entity ids follow a `{category}-{scope}-{name}` taxonomy, and any dash-aligned id prefix acts as a selector (for example, `secret` selects every secret, `id-au` selects Australia's identifiers, and `secret-aws` selects both AWS entities). The named groups `pii` and `region-eu` are also valid selectors. Available selectors: `contact`, `finance`, `finance-us`, `id`, `id-au`, `id-br`, `id-ca`, `id-es`, `id-fr`, `id-in`, `id-it`, `id-nl`, `id-pl`, `id-sg`, `id-uk`, `id-us`, `network`, `pii`, `region-eu`, `secret`, `secret-aws`. When omitted, the full built-in catalog is used.
- `customPatterns` <code className="text-green-600">&lt;object[]&gt;</code> - Additional customer-defined regex recognizers. Invalid patterns are logged and skipped rather than failing the request.
  - `name` **(required)** <code className="text-green-600">&lt;string&gt;</code> - Identifier reported in findings and block details for this pattern.
  - `pattern` **(required)** <code className="text-green-600">&lt;string&gt;</code> - A JavaScript regular expression source string. Remember to escape backslashes for JSON (for example `\\d` for a digit).
  - `confidence` <code className="text-green-600">&lt;number&gt;</code> - Base confidence (0-1) for matches of this pattern. The default of 0.85 is above the default detection threshold; combine a low value with `context` words for patterns that are only sensitive in context. Defaults to `0.85`.
  - `context` <code className="text-green-600">&lt;string[]&gt;</code> - Context words that boost a match's confidence by 0.45 when one appears near the match (in the surrounding field, label, or key).
- `action` <code className="text-green-600">&lt;string&gt;</code> - What to do when sensitive data is detected. `mask` redacts matches before forwarding the request, `block` rejects the request with a `422` response that lists only the detected entity names, and `log` records a warning and forwards the request unchanged. Allowed values are `mask`, `block`, `log`. Defaults to `"mask"`.
- `mask` <code className="text-green-600">&lt;string&gt;</code> - The string that replaces detected values when `action` is `mask`. Defaults to `"[REDACTED]"`.
- `minConfidence` <code className="text-green-600">&lt;number&gt;</code> - Minimum confidence (0-1) a match must reach to count as a finding. Context-dependent recognizers (for example `finance-us-bank-account` or `finance-us-aba-routing`) sit below the default threshold of 0.5 until a context word near the match boosts them above it. Lower the threshold to surface them everywhere; raise it to keep only prefix- or checksum-validated matches. Defaults to `0.5`.
- `contentTypes` <code className="text-green-600">&lt;string[]&gt;</code> - Override the set of scannable content-type prefixes. When omitted, the built-in text content-type allow-list (JSON, XML, form-encoded, text/\*) is used.

## Using the Policy

This policy inspects the body of each incoming request for sensitive data and
applies a configurable action. It is the inbound counterpart to the
[Data Loss Prevention - Outbound](/docs/policies/data-loss-prevention-outbound)
policy, which inspects upstream responses.

Detection happens entirely inside the gateway isolate — request bodies are never
sent to a third-party service.

## Actions

- **`mask`** (default) — every detected value is replaced with the `mask` string
  and the modified body is forwarded upstream. Overlapping matches are merged
  and masked once.
- **`block`** — the request is rejected with a `422 Unprocessable Content`. The
  problem detail lists only the names of the detected entities, never the
  matched values, so the policy never leaks the data it caught.
- **`log`** — a structured warning is written (entity ids and counts only) and
  the request is forwarded unchanged.
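The merge-then-mask behavior described for `mask` can be sketched as follows. This is a minimal illustration under stated assumptions: the `Span` shape and the `maskSpans` helper are hypothetical names, and the policy's internal representation of matches may differ.

```typescript
// Hypothetical match span: [start, end) character offsets into the body text.
interface Span {
  start: number;
  end: number;
}

// Merge overlapping spans, then replace each merged span with the mask once.
function maskSpans(text: string, spans: Span[], mask = "[REDACTED]"): string {
  const sorted = [...spans].sort((a, b) => a.start - b.start);
  const merged: Span[] = [];
  for (const s of sorted) {
    const last = merged[merged.length - 1];
    if (last && s.start < last.end) {
      // Overlaps the previous span: extend it instead of adding a new one.
      last.end = Math.max(last.end, s.end);
    } else {
      merged.push({ start: s.start, end: s.end });
    }
  }

  let out = "";
  let cursor = 0;
  for (const s of merged) {
    out += text.slice(cursor, s.start) + mask;
    cursor = s.end;
  }
  return out + text.slice(cursor);
}
```

Merging first means a value caught by two recognizers produces a single replacement rather than nested or repeated mask strings.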

## Built-in recognizers

Enable entities individually or by **group selector** in the `entities`
option, or omit it to use the full catalog. Entity ids follow a
`{category}-{scope}-{name}` taxonomy, and any dash-aligned prefix of an id is a
valid selector: `secret` enables every secret, `id-au` enables Australia's
identifiers, `secret-aws` enables both AWS entities. Two named groups (`pii`,
`region-eu`) bundle entities across categories.

| Group       | Entities                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `secret`    | `secret-private-key`, `secret-jwt`, `secret-aws-access-key`, `secret-aws-bedrock`, `secret-github`, `secret-gitlab`, `secret-zuplo`, `secret-openai`, `secret-anthropic`, `secret-google-api-key`, `secret-stripe`, `secret-slack`, `secret-discord-webhook`, `secret-npm`, `secret-pypi`, `secret-sendgrid`, `secret-twilio`, `secret-hugging-face`, `secret-databricks`, `secret-shopify`, `secret-square`, `secret-mailchimp`, `secret-mailgun`, `secret-postman`, `secret-terraform`, `secret-sentry`, `secret-digitalocean`, `secret-heroku`, `secret-perplexity`, `secret-azure-client`, `secret-telegram-bot` |
| `finance`   | `finance-credit-card` (Luhn), `finance-iban` (per-country length + mod-97), `finance-crypto-wallet`, `finance-us-aba-routing` (checksum), `finance-swift-bic`, `finance-us-bank-account`, `finance-cvv`                                                                                                                                                                                                                                                                                                                                                                                                              |
| `id`        | `id-us-ssn`, `id-us-itin`, `id-us-passport`, `id-uk-nino`, `id-uk-nhs` (mod-11), `id-ca-sin` (Luhn), `id-au-abn`, `id-au-acn`, `id-au-tfn`, `id-au-medicare` (all checksummed), `id-in-aadhaar` (Verhoeff), `id-in-pan`, `id-sg-nric` (checksum), `id-es-nif` (checksum), `id-it-fiscal-code` (checksum), `id-pl-pesel` (checksum), `id-nl-bsn` (11-proef), `id-br-cpf` (checksum), `id-fr-nir` (mod-97)                                                                                                                                                                                                             |
| `contact`   | `contact-email`, `contact-phone`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| `network`   | `network-ipv4`, `network-ipv6`, `network-mac`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| `pii`       | `contact` + `id`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| Prefixes    | `id-us`, `id-uk`, `id-au`, `id-ca`, `id-in`, `id-sg`, `id-es`, `id-it`, `id-pl`, `id-nl`, `id-br`, `id-fr`, `finance-us`, `secret-aws` — everything whose id starts with that prefix                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| `region-eu` | `id-es-nif`, `id-it-fiscal-code`, `id-pl-pesel`, `id-nl-bsn`, `id-fr-nir`, `finance-iban`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
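As a rough sketch of how dash-aligned prefix selection could work, the snippet below resolves a list of selectors against a catalog of entity ids. The helper names (`prefixMatches`, `resolveEntities`) are hypothetical, the catalog is a small sample from the table above, and the real resolver may behave differently at the edges.

```typescript
// Named groups from the table above; `pii` expands to two prefix selectors.
const NAMED_GROUPS: Record<string, string[]> = {
  pii: ["contact", "id"],
  "region-eu": [
    "id-es-nif",
    "id-it-fiscal-code",
    "id-pl-pesel",
    "id-nl-bsn",
    "id-fr-nir",
    "finance-iban",
  ],
};

// A selector matches an id when it equals the id or is a prefix ending at a dash.
function prefixMatches(id: string, selector: string): boolean {
  return id === selector || id.startsWith(selector + "-");
}

// Expand named groups, then keep every catalog id matched by any selector.
function resolveEntities(selectors: string[], catalog: string[]): string[] {
  const expanded: string[] = [];
  for (const s of selectors) {
    const group = NAMED_GROUPS[s];
    if (group) {
      expanded.push(...group);
    } else {
      expanded.push(s);
    }
  }
  return catalog.filter((id) => expanded.some((s) => prefixMatches(id, s)));
}

// Sample subset of the built-in catalog, for illustration only.
const sampleCatalog = [
  "contact-email",
  "id-us-ssn",
  "secret-aws-access-key",
  "secret-aws-bedrock",
  "secret-github",
  "finance-iban",
  "id-es-nif",
];
```

With this sample, `resolveEntities(["secret-aws"], sampleCatalog)` selects the two AWS entities, while a partial word such as `secret-a` selects nothing because it does not end at a dash boundary.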

## Context-word scoring

Every match gets a confidence score. Recognizers whose raw pattern is just "a
run of digits" (bank accounts, routing numbers, NHS numbers, …) carry a low
base confidence and a list of **context words**; when one of those words
appears near the match — in prose, or in a JSON key, form field, or header-like
label (`nhsNumber`, `routing_number`, `cvv:`) — the confidence is boosted above
the detection threshold.

For example, with the `id-uk-nhs` entity enabled, `{"nhsNumber": "9434765919"}` is
masked while the same digits in `{"orderId": "9434765919"}` pass through
untouched.

The threshold is configurable via `minConfidence` (default `0.5`): lower it to
detect context-dependent entities everywhere, raise it to keep only prefix- and
checksum-validated matches.
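The interaction between base confidence, context words, and `minConfidence` can be sketched like this. The `0.45` boost is the value documented for `customPatterns`; whether built-in recognizers use the same boost, and how "near" is measured, are assumptions here. The `isFinding` helper and its case-insensitive substring match are hypothetical simplifications.

```typescript
// Boost documented for custom patterns; assumed here for illustration.
const CONTEXT_BOOST = 0.45;

// Score a match: add the boost when any context word appears in the nearby
// text (for example a JSON key or form field label), capped at 1.
function scoreMatch(base: number, contextWords: string[], nearbyText: string): number {
  const haystack = nearbyText.toLowerCase();
  const boosted = contextWords.some((w) => haystack.includes(w.toLowerCase()));
  return Math.min(1, boosted ? base + CONTEXT_BOOST : base);
}

// A match counts as a finding when its score reaches the threshold.
function isFinding(
  base: number,
  contextWords: string[],
  nearbyText: string,
  minConfidence = 0.5,
): boolean {
  return scoreMatch(base, contextWords, nearbyText) >= minConfidence;
}
```

For a pattern with base confidence `0.3` and context word `employee`, a match under the key `employeeId` clears the default threshold, while the same match under `orderId` does not unless `minConfidence` is lowered to `0.3` or below.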

## Custom patterns

Add your own recognizers with `customPatterns`. Each entry has a `name`, a
JavaScript regular expression `pattern`, and optionally a `confidence` and
`context` words to participate in context scoring. Invalid patterns are logged
and skipped rather than failing the request. Remember to escape backslashes for
JSON (for example `\\d` to match a digit).
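The escaping rule and the skip-on-invalid behavior can be checked with a short sketch. The `compilePatterns` helper is hypothetical and only approximates what the policy does; `String.raw` is used so the JSON text appears exactly as it would in `policies.json`.

```typescript
interface CustomPattern {
  name: string;
  pattern: string;
  confidence?: number;
  context?: string[];
}

interface CompiledPattern {
  name: string;
  regex: RegExp;
}

// JSON text as written in the config file: `\\d` in JSON becomes `\d` once parsed.
const optionsJson = String.raw`{
  "customPatterns": [
    { "name": "employee-id", "pattern": "EMP-\\d{6}" },
    { "name": "broken", "pattern": "(" }
  ]
}`;

// Compile each pattern; log and skip any that are not valid regular expressions.
function compilePatterns(patterns: CustomPattern[]): CompiledPattern[] {
  const compiled: CompiledPattern[] = [];
  for (const p of patterns) {
    try {
      compiled.push({ name: p.name, regex: new RegExp(p.pattern) });
    } catch (err) {
      console.warn(`Skipping invalid custom pattern "${p.name}"`);
    }
  }
  return compiled;
}

const parsed = JSON.parse(optionsJson) as { customPatterns: CustomPattern[] };
const compiledPatterns = compilePatterns(parsed.customPatterns);
```

Here the unbalanced `(` pattern is dropped with a warning, and only `employee-id` remains to match values such as `EMP-123456`.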

## Content types

Only text-based bodies (JSON, XML, form-encoded, and `text/*`) are scanned;
binary bodies pass through untouched. Override the allow-list with the
`contentTypes` option if you need to scan a different set of content types.
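A prefix-based allow-list check might look like the sketch below. The exact built-in prefixes are not listed in this document, so `DEFAULT_PREFIXES` is an assumed stand-in for "JSON, XML, form-encoded, and `text/*`", and `isScannable` is a hypothetical helper. Treating a missing `Content-Type` as not scannable is also an assumption.

```typescript
// Assumed stand-ins for the built-in allow-list (not confirmed values).
const DEFAULT_PREFIXES = [
  "application/json",
  "application/xml",
  "application/x-www-form-urlencoded",
  "text/",
];

// Returns true when the Content-Type header starts with an allowed prefix.
// Parameters such as "; charset=utf-8" are ignored by the prefix comparison.
function isScannable(
  contentType: string | null,
  prefixes: string[] = DEFAULT_PREFIXES,
): boolean {
  if (!contentType) return false; // assumption: no header means no scan
  const normalized = contentType.trim().toLowerCase();
  return prefixes.some((p) => normalized.startsWith(p.toLowerCase()));
}
```

Passing a custom list, for example `["application/vnd.api+json"]`, mirrors what the `contentTypes` option does: it replaces the default set rather than extending it.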

## Configuration

- `engine`: The detection engine. Only `builtin` is available today. **Default:**
  `builtin`
- `entities`: Recognizer ids and/or group selectors (prefixes, `pii`,
  `region-eu`) to enable. **Default:** all recognizers
- `customPatterns`: Additional `{ name, pattern, confidence?, context? }` regex
  recognizers
- `action`: `mask`, `block`, or `log`. **Default:** `mask`
- `mask`: Replacement string used when `action` is `mask`. **Default:**
  `[REDACTED]`
- `minConfidence`: Detection threshold (0-1). **Default:** `0.5`
- `contentTypes`: Override the scannable content-type allow-list

## Usage

Define the policy in `policies.json`, then reference it by `name` from the inbound policies of your routes:

```json
{
  "policies": [
    {
      "name": "data-loss-prevention-inbound",
      "policyType": "data-loss-prevention-inbound",
      "handler": {
        "export": "DataLossPreventionInboundPolicy",
        "module": "$import(@zuplo/runtime)",
        "options": {
          "action": "mask",
          "entities": ["secret", "finance", "id-us", "contact-email"],
          "mask": "[REDACTED]",
          "customPatterns": [
            {
              "name": "employee-id",
              "pattern": "EMP-\\d{6}",
              "confidence": 0.3,
              "context": ["employee"]
            }
          ]
        }
      }
    }
  ]
}
```

Read more about [how policies work](/articles/policies).
