Data Loss Prevention Policy

The Data Loss Prevention (DLP) policy scans incoming request bodies for sensitive data — personally identifiable information (PII), secrets and API keys for dozens of vendors, payment and bank identifiers, and national IDs for many countries — using a catalog of 60+ built-in recognizers plus any custom patterns you add. When a match is found it takes a configurable action: mask the matches, block the request, or log a warning and let it through.

Recognizers are selected individually or via entity groups (secret, finance, pii, id-us, id-uk, region-eu, …). Detection runs entirely in the gateway isolate using regular expressions, checksums (Luhn, mod-97, Verhoeff, and friends), and context-word scoring — no request data leaves the gateway.

Pair with the Data Loss Prevention - Outbound policy to also scan upstream responses before they're returned to the client.

Configuration

The example below shows how to configure the policy in the 'policies.json' document.

```json
{
  "name": "my-data-loss-prevention-inbound-policy",
  "policyType": "data-loss-prevention-inbound",
  "handler": {
    "export": "DataLossPreventionInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "action": "mask",
      "entities": ["secret", "finance", "contact-email", "id-us-ssn"],
      "mask": "[REDACTED]"
    }
  }
}
```

Policy Configuration

name <string> - The name of your policy instance. This is used as a reference in your routes.
policyType <string> - The identifier of the policy. This is used by the Zuplo UI. Value should be data-loss-prevention-inbound.
handler.export <string> - The name of the exported type. Value should be DataLossPreventionInboundPolicy.
handler.module <string> - The module containing the policy. Value should be $import(@zuplo/runtime).
handler.options <object> - The options for this policy. See Policy Options below.

Policy Options

The options for this policy are specified below. All properties are optional unless specifically marked as required.

engine <string> - The detection engine. Only builtin (in-isolate regex + checksum detection with context-word scoring) is available today. The option exists as an extension point for a future hosted presidio-service mode; declaring it now means that mode can later be added without a breaking change. Allowed values are builtin. Defaults to "builtin".
entities <string[]> - Built-in recognizer ids and/or group selectors to enable. Entity ids follow a {category}-{scope}-{name} taxonomy, and any dash-aligned id prefix acts as a selector (for example secret is every secret, id-au is Australia's identifiers, secret-aws is both AWS entities), plus the named groups pii and region-eu. Available selectors: contact, finance, finance-us, id, id-au, id-br, id-ca, id-es, id-fr, id-in, id-it, id-nl, id-pl, id-sg, id-uk, id-us, network, pii, region-eu, secret, secret-aws. When omitted, the full built-in catalog is used.
action <string> - What to do when sensitive data is detected. mask redacts matches before forwarding the request, block rejects with a 422 listing only the detected entity names, and log records a warning and forwards the request unchanged. Allowed values are mask, block, log. Defaults to "mask".
mask <string> - The string that replaces detected values when action is mask. Defaults to "[REDACTED]".
customPatterns <object[]> - Additional customer-defined regex recognizers. Invalid patterns are logged and skipped rather than failing the request.
- name (required) <string> - Identifier reported in findings and block details for this pattern.
- pattern (required) <string> - A JavaScript regular expression source string. Remember to escape backslashes for JSON (for example \\d for a digit).
- confidence <number> - Base confidence (0-1) for matches of this pattern. The default of 0.85 is above the default detection threshold; combine a low value with context words for patterns that are only sensitive in context. Defaults to 0.85.
- context <string[]> - Context words that boost a match's confidence by 0.45 when one appears near the match (in the surrounding field, label, or key).
minConfidence <number> - Minimum confidence (0-1) a match must reach to count as a finding. Context-dependent recognizers (for example finance-us-bank-account or finance-us-aba-routing) sit below the default threshold of 0.5 until a context word near the match boosts them above it. Lower the threshold to surface them everywhere; raise it to keep only prefix- or checksum-validated matches. Defaults to 0.5.
contentTypes <string[]> - Override the set of scannable content-type prefixes. When omitted, the built-in text content-type allow-list (JSON, XML, form-encoded, text/*) is used.

Using the Policy

This policy inspects the body of each incoming request for sensitive data and applies a configurable action. It is the inbound counterpart to the Data Loss Prevention - Outbound policy, which inspects upstream responses.

Detection happens entirely inside the gateway isolate — request bodies are never sent to a third-party service.

Actions

mask (default) — every detected value is replaced with the mask string and the modified body is forwarded upstream. Overlapping matches are merged and masked once.
block — the request is rejected with a 422 Unprocessable Content. The problem detail lists only the names of the detected entities, never the matched values, so the policy never leaks the data it caught.
log — a structured warning is written (entity ids and counts only) and the request is forwarded unchanged.
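
To make the difference between the actions concrete, here is a minimal sketch of a handler.options value that rejects requests instead of masking them. The entity selection is only an example, not a recommendation.

```json
{
  "action": "block",
  "entities": ["secret", "finance"]
}
```

The mask option is omitted because it only applies when action is mask. Switching action to log with the same entities may be a useful way to see what would be rejected before enforcing it, since log forwards the request unchanged.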

Built-in recognizers

Enable entities individually or by group selector in the entities option, or omit it to use the full catalog. Entity ids follow a {category}-{scope}-{name} taxonomy, and any dash-aligned prefix of an id is a valid selector: secret enables every secret, id-au enables Australia's identifiers, secret-aws enables both AWS entities. Two named groups (pii, region-eu) bundle entities across categories.

Group	Entities
`secret`	`secret-private-key`, `secret-jwt`, `secret-aws-access-key`, `secret-aws-bedrock`, `secret-github`, `secret-gitlab`, `secret-zuplo`, `secret-openai`, `secret-anthropic`, `secret-google-api-key`, `secret-stripe`, `secret-slack`, `secret-discord-webhook`, `secret-npm`, `secret-pypi`, `secret-sendgrid`, `secret-twilio`, `secret-hugging-face`, `secret-databricks`, `secret-shopify`, `secret-square`, `secret-mailchimp`, `secret-mailgun`, `secret-postman`, `secret-terraform`, `secret-sentry`, `secret-digitalocean`, `secret-heroku`, `secret-perplexity`, `secret-azure-client`, `secret-telegram-bot`
`finance`	`finance-credit-card` (Luhn), `finance-iban` (per-country length + mod-97), `finance-crypto-wallet`, `finance-us-aba-routing` (checksum), `finance-swift-bic`, `finance-us-bank-account`, `finance-cvv`
`id`	`id-us-ssn`, `id-us-itin`, `id-us-passport`, `id-uk-nino`, `id-uk-nhs` (mod-11), `id-ca-sin` (Luhn), `id-au-abn`, `id-au-acn`, `id-au-tfn`, `id-au-medicare` (all checksummed), `id-in-aadhaar` (Verhoeff), `id-in-pan`, `id-sg-nric` (checksum), `id-es-nif` (checksum), `id-it-fiscal-code` (checksum), `id-pl-pesel` (checksum), `id-nl-bsn` (11-proef), `id-br-cpf` (checksum), `id-fr-nir` (mod-97)
`contact`	`contact-email`, `contact-phone`
`network`	`network-ipv4`, `network-ipv6`, `network-mac`
`pii`	`contact` + `id`
Prefixes	`id-us`, `id-uk`, `id-au`, `id-ca`, `id-in`, `id-sg`, `id-es`, `id-it`, `id-pl`, `id-nl`, `id-br`, `id-fr`, `finance-us`, `secret-aws` — everything whose id starts with that prefix
`region-eu`	`id-es-nif`, `id-it-fiscal-code`, `id-pl-pesel`, `id-nl-bsn`, `id-fr-nir`, `finance-iban`
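
The selector rules above can be combined in a single entities list. The sketch below mixes a named group, two prefixes, and one individual id; the particular combination is illustrative only.

```json
{
  "entities": ["region-eu", "id-uk", "secret-aws", "contact-email"]
}
```

Going by the table, region-eu expands to the six entities in its row, id-uk to id-uk-nino and id-uk-nhs, and secret-aws to secret-aws-access-key and secret-aws-bedrock, while contact-email enables only that one recognizer.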

Context-word scoring

Every match gets a confidence score. Recognizers whose raw pattern is just "a run of digits" (bank accounts, routing numbers, NHS numbers, …) carry a low base confidence and a list of context words; when one of those words appears near the match — in prose, or in a JSON key, form field, or header-like label (nhsNumber, routing_number, cvv:) — the confidence is boosted above the detection threshold.

For example, with the id-uk-nhs entity enabled, {"nhsNumber": "9434765919"} is masked while the same digits in {"orderId": "9434765919"} pass through untouched.

The threshold is configurable via minConfidence (default 0.5): lower it to detect context-dependent entities everywhere, raise it to keep only prefix- and checksum-validated matches.
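
As a sketch of lowering the threshold, the handler.options value below logs context-dependent US bank identifiers and NHS numbers even when no context word is nearby. The value 0.2 is an assumption: the base confidences of the built-in recognizers are not stated here, so a different value may be needed in practice.

```json
{
  "action": "log",
  "entities": ["finance-us", "id-uk-nhs"],
  "minConfidence": 0.2
}
```

Pairing a lowered threshold with log rather than mask is a cautious choice, because digit-run recognizers without context are likely to produce false positives such as order ids.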

Custom patterns

Add your own recognizers with customPatterns. Each entry has a name, a JavaScript regular expression pattern, and optionally a confidence and context words to participate in context scoring. Invalid patterns are logged and skipped rather than failing the request. Remember to escape backslashes for JSON (for example \\d to match a digit).
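
The sketch below shows two custom patterns side by side. Both names and both patterns are hypothetical. The first omits confidence, so it uses the default of 0.85 and counts as a finding wherever it matches at the default threshold. The second is a bare six-digit run with a low base confidence: on its own 0.2 is below the default threshold of 0.5, but reading the 0.45 context boost as additive, a nearby context word raises it to 0.65.

```json
{
  "customPatterns": [
    {
      "name": "internal-ticket",
      "pattern": "TCK-\\d{8}"
    },
    {
      "name": "badge-number",
      "pattern": "\\b\\d{6}\\b",
      "confidence": 0.2,
      "context": ["badge", "employee"]
    }
  ]
}
```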

Content types

Only text-based bodies (JSON, XML, form-encoded, and text/*) are scanned; binary bodies pass through untouched. Override the allow-list with the contentTypes option if you need to scan a different set of content types.
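
A minimal sketch of a contentTypes override follows. It assumes the option replaces the built-in allow-list rather than extending it, so JSON is listed explicitly alongside a newline-delimited JSON type chosen purely as an example.

```json
{
  "contentTypes": ["application/json", "application/x-ndjson"]
}
```

Because the values are described as content-type prefixes, an entry such as application/json would presumably also match headers that carry a charset parameter.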

Configuration

engine: The detection engine. Only builtin is available today. Default: builtin
entities: Recognizer ids and/or group selectors (prefixes, pii, region-eu) to enable. Default: all recognizers
customPatterns: Additional { name, pattern, confidence?, context? } regex recognizers
action: mask, block, or log. Default: mask
mask: Replacement string used when action is mask. Default: [REDACTED]
minConfidence: Detection threshold (0-1). Default: 0.5
contentTypes: Override the scannable content-type allow-list

Usage

Apply this policy to inbound requests in your route configuration:


```json
{
  "policies": [
    {
      "name": "data-loss-prevention-inbound",
      "policyType": "data-loss-prevention-inbound",
      "handler": {
        "export": "DataLossPreventionInboundPolicy",
        "module": "$import(@zuplo/runtime)",
        "options": {
          "action": "mask",
          "entities": ["secret", "finance", "id-us", "contact-email"],
          "mask": "[REDACTED]",
          "customPatterns": [
            {
              "name": "employee-id",
              "pattern": "EMP-\\d{6}",
              "confidence": 0.3,
              "context": ["employee"]
            }
          ]
        }
      }
    }
  ]
}
```
