Documentation

LogShield is a deterministic CLI tool for sanitizing sensitive data from logs before they are shared, stored, or processed further. This documentation is authoritative.

Overview

LogShield reads log data from stdin or files, applies explicit rule-based detection, and writes safe, structure-preserving output to stdout.

Note: the npm package ships the CLI only; there is no supported JavaScript API surface.

Design Goals

  • Deterministic output (same input -> same output)
  • Precision over recall (minimize false positives)
  • Pipeline-native (STDIN -> STDOUT)
  • Local-only: no network, no telemetry

Installation

Install LogShield globally via npm:

bash
npm install -g logshield-cli

Verify installation:

bash
logshield --version

Windows PATH note: If logshield is not found after a global install, make sure your npm global bin directory is on PATH. Run npm prefix -g and add the directory it prints to your PATH environment variable.
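
The same symptom can appear on macOS or Linux, where npm places global executables in the bin subdirectory of the prefix. The helper below is a small sketch for checking that case from a POSIX shell; the function name on_path is made up for this example.

```bash
# Return 0 if the given directory is listed in PATH, 1 otherwise.
# on_path is a hypothetical helper name, not part of LogShield or npm.
on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example for macOS/Linux (requires npm on PATH):
#   on_path "$(npm prefix -g)/bin" || echo "add $(npm prefix -g)/bin to PATH"
```

Wrapping PATH in colons lets the pattern match the first and last entries the same way as the ones in the middle.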

Basic Usage

Scan a log file:

bash
logshield scan app.log

Pipe logs via stdin:

bash
cat app.log | logshield scan

Auto-detection: stdin is detected automatically when input is piped. No additional flags are required.

Output Behavior

LogShield is designed to be predictable and non-destructive.

What LogShield guarantees

  • Line order is preserved
  • Whitespace is preserved
  • Non-secret values are never modified
  • Only the value of detected secrets is replaced
  • Keys, field names, and surrounding text are untouched

Input Sources

LogShield supports two input modes:

1. File input

bash
logshield scan app.log

2. STDIN (pipe)

bash
cat app.log | logshield scan

When data is piped, LogShield automatically switches to stdin mode. The --stdin flag exists as an explicit override for edge cases.
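
A common way for a CLI to make this decision is to test whether standard input is a terminal. Whether LogShield uses this exact check is an assumption; the sketch only shows the general mechanism, which can help when working out why a wrapper script does or does not end up in stdin mode. The function name stdin_kind is made up for this example.

```bash
# Report how stdin is connected: "terminal" for interactive use, "piped" otherwise.
# Illustrates the general technique only; not taken from LogShield's source.
stdin_kind() {
  if [ -t 0 ]; then
    echo "terminal"
  else
    echo "piped"
  fi
}

printf 'line\n' | stdin_kind   # prints "piped"
```

Input redirected from a file (logshield scan < app.log) is also not a terminal, so a check like this treats it the same as a pipe.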

Redaction Format

All redacted values follow this format:

format
<REDACTED_TYPE>

No hashing, masking, or partial replacement is performed. Examples:

Before:

password=secret123
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
x-api-key: 1234567890abcdef
email: user@example.com

After (default mode, sanitized output):

password=<REDACTED_PASSWORD>
Authorization: Bearer <REDACTED_TOKEN>
x-api-key: <REDACTED_API_KEY>
email: <REDACTED_EMAIL>
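
The sed command below imitates the replacement shape for one key/value form, to show what "only the value is replaced" looks like in practice. It is a rough illustration, not a LogShield rule: the function name redact_password_demo and the regex (a value ends at the first whitespace) are assumptions made for this example.

```bash
# Illustration only: imitate the <REDACTED_TYPE> shape for password=... pairs.
# The regex is an assumption for this demo, not a rule taken from LogShield.
redact_password_demo() {
  sed -E 's/(password=)[^[:space:]]+/\1<REDACTED_PASSWORD>/g'
}

printf 'user=alice password=secret123 status=ok\n' | redact_password_demo
# prints: user=alice password=<REDACTED_PASSWORD> status=ok
```

The key (password=) and the surrounding fields pass through unchanged; only the value is swapped for the typed placeholder. For real logs, use logshield scan rather than a hand-written pattern.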

Determinism

LogShield is fully deterministic. This means:

  • The same input always produces the same output
  • There is no randomness or sampling
  • Results are reproducible across machines and runs
  • There is no heuristic scoring or context-dependent behavior

Determinism is critical for:

  • CI pipelines
  • Snapshot tests
  • Auditable incident workflows
  • Diff-based workflows
  • Reproducible builds
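
One way to check reproducibility in your own environment is to run the same command twice on the same input and compare the results byte for byte. The helper below is a sketch of that idea; same_output_twice is a hypothetical name, and it works with any command that reads stdin and writes stdout.

```bash
# Run a command twice on the same file and compare the outputs byte for byte.
# Usage: same_output_twice <input-file> <command> [args...]
# Returns 0 if both runs produced identical output, non-zero otherwise.
same_output_twice() {
  input=$1
  shift
  first=$(mktemp) || return 2
  second=$(mktemp) || return 2
  "$@" < "$input" > "$first"
  "$@" < "$input" > "$second"
  cmp -s "$first" "$second"
  rc=$?
  rm -f "$first" "$second"
  return "$rc"
}

# Example (requires logshield on PATH):
#   same_output_twice app.log logshield scan && echo "stable"
```

This only compares two runs on one machine; comparing across machines means saving the output and diffing it there.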

Flags Reference

Command / flag    Description
scan              Scan input logs for secrets
--strict          Aggressive, security-first redaction. May redact more than default; intended for CI and production.
--dry-run         Report detected redactions only. Does not output log content.
--stdin           Force stdin mode (usually auto-detected)
--json            Machine-readable JSON output (safe to serialize; with --dry-run, the output field is an empty string)
--summary         Print compact redaction summary
--fail-on-detect  Exit with code 1 if secrets are found (CI gate)
--version         Print version
--help            Show help

JSON Output

Use --json for structured output. In --dry-run JSON mode, the output field is intentionally an empty string to avoid accidental secret leakage through serialization.

bash
echo "password=secret123" | logshield scan --json --dry-run

json
{"output":"","matches":[{"rule":"PASSWORD"}]}
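
For quick scripting, the rule names can be pulled out of a report with standard text tools. The sketch below assumes the compact shape shown above ("rule":"NAME" with no spaces); the function name list_rules is made up for this example, and a real JSON parser such as jq is the more robust choice.

```bash
# Print one rule name per line from a --json --dry-run report read on stdin.
# Assumes the compact "rule":"NAME" form shown in the sample output above.
list_rules() {
  grep -o '"rule":"[^"]*"' | cut -d'"' -f4
}

printf '%s\n' '{"output":"","matches":[{"rule":"PASSWORD"}]}' | list_rules
# prints: PASSWORD
```

Piping the result through sort | uniq -c gives a per-rule count without handling any log content.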

Exit Codes

LogShield uses explicit exit codes for automation. You should rely on exit codes in CI environments.

Code  Meaning
0     Success (detections alone do not change the exit code)
1     Secrets detected (with --fail-on-detect)
2     Invalid arguments, input limit exceeded, or runtime error
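
In a CI script it helps to separate "secrets found" from "the tool could not run". The sketch below maps the documented codes to short messages; describe_exit is a hypothetical helper name and the message wording is an assumption.

```bash
# Map a LogShield exit code (as listed in the table above) to a short CI message.
describe_exit() {
  case "$1" in
    0) echo "ok" ;;
    1) echo "secrets detected" ;;
    2) echo "invalid arguments, input limit exceeded, or runtime error" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Example (requires logshield on PATH):
#   logshield scan app.log --strict --fail-on-detect > /dev/null
#   describe_exit "$?"
```

Keeping code 2 separate from code 1 avoids reporting an oversized log or a typo in the arguments as a leaked secret.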

Input Limits

LogShield applies two safety caps to avoid runaway memory or CPU usage in pipelines.

  • Maximum input size: 200KB
  • Maximum line length: 64KB per line

If a single line exceeds the line cap, LogShield fails with exit code 2 and a deterministic error message: "Log line <n> exceeds 64KB limit".

These bounds are part of the v0.7.0 regex safety hardening work to keep worst-case input behavior explicit and predictable.

If you need to scan larger logs, split them or pipe a slice (for example: tail -n 2000 app.log | logshield scan).
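
A pre-flight check can catch oversized input before it reaches LogShield. The sketch below assumes that "KB" means 1024 bytes and that the caps are inclusive; neither detail is stated here, so adjust the constants if your results differ. The function and variable names are made up for this example.

```bash
# Assumption: 1 KB = 1024 bytes. Adjust if the tool counts 1000.
MAX_TOTAL_BYTES=$((200 * 1024))
MAX_LINE_BYTES=$((64 * 1024))

# Return 0 if the file fits both caps, 1 otherwise.
within_limits() {
  [ -r "$1" ] || return 1
  # tr strips the padding some wc implementations print before the count.
  total=$(wc -c < "$1" | tr -d ' ')
  [ "$total" -le "$MAX_TOTAL_BYTES" ] || return 1
  # LC_ALL=C makes awk count bytes rather than multibyte characters.
  LC_ALL=C awk -v max="$MAX_LINE_BYTES" 'length($0) > max { exit 1 }' "$1"
}

# Example (requires logshield on PATH):
#   within_limits app.log && logshield scan app.log
```

When the check fails, fall back to the slicing approach above, for example tail -n 2000.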

Supported Secret Types

LogShield detects common secret patterns including:

  • Passwords
  • API keys (headers + key/value forms)
  • Authorization Bearer tokens
  • JWTs
  • GitHub tokens
  • Slack tokens
  • npm access tokens
  • npmrc auth tokens (:_authToken=...)
  • PyPI API tokens
  • SendGrid API keys
  • Private key blocks (PEM/OpenSSH, including ENCRYPTED PRIVATE KEY)
  • Emails
  • URLs with embedded credentials
  • Database credentials (DB_URL)
  • OAuth tokens (access_token, refresh_token)

Strict mode additionally detects:

  • Stripe keys (sk_live_, sk_test_)
  • AWS access keys (AKIA...)
  • AWS secret keys
  • Credit card numbers (Luhn-validated)

In v0.7.0, the strict-mode credit card matcher was tightened to reduce matches on ambiguous, separator-heavy near-misses. Output for normal successful sanitize runs is unchanged.
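
For readers unfamiliar with Luhn validation, the sketch below implements the standard checksum in POSIX shell. It is the textbook algorithm, not LogShield's matcher: it accepts digits only, while the real rule also has to handle separators and surrounding context. The function name luhn_valid is made up for this example.

```bash
# Return 0 if the digit string passes the Luhn checksum, 1 otherwise.
# Textbook Luhn check; digits only, no separators.
luhn_valid() {
  digits=$1
  case "$digits" in
    ''|*[!0-9]*) return 1 ;;   # reject empty input and non-digits
  esac
  sum=0
  double=0
  while [ -n "$digits" ]; do
    rest=${digits%?}           # everything except the last digit
    d=${digits#"$rest"}        # the last digit
    digits=$rest
    if [ "$double" -eq 1 ]; then
      d=$((d * 2))
      [ "$d" -gt 9 ] && d=$((d - 9))
    fi
    sum=$((sum + d))
    double=$((1 - double))     # double every second digit from the right
  done
  [ $((sum % 10)) -eq 0 ]
}

luhn_valid 4111111111111111 && echo "passes Luhn"   # prints "passes Luhn"
```

Requiring the checksum filters out most random 16-digit strings such as order numbers and timestamps, which fits the goal of minimizing false positives.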

v0.7.0 Notes

  • No new CLI flags
  • No breaking change to normal successful scan/sanitize output
  • Adversarial regression coverage was added for input boundaries and regex near-miss cases

See the source code for the full rule list.

Common Pipelines

Local debugging

bash
cat debug.log | logshield scan --dry-run

CI validation

bash
logshield scan app.log --strict --fail-on-detect

Sanitize and save

bash
cat app.log | logshield scan > safe.log

Upload sanitized logs

bash
docker logs app | logshield scan | upload-logs

GitHub Actions

Basic setup:

yaml
- name: Install LogShield
  run: npm install -g logshield-cli
- name: Run tests (sanitized)
  run: npm test 2>&1 | logshield scan

CI gate - fail if secrets detected:

yaml
- name: Check for secrets
  run: |
    npm test > test.log 2>&1
    logshield scan --dry-run --fail-on-detect < test.log

For more advanced patterns (reusable workflows, Docker builds, conditional strict mode), see the GitHub Actions tutorial on the blog.

What LogShield is NOT

To avoid ambiguity, LogShield intentionally does not:

  • Send data to the cloud
  • Use AI or probabilistic models
  • Modify log structure
  • Remove entire lines
  • Perform partial masking (e.g., sk_live_****)
  • Learn or adapt over time

LogShield is designed to be boring, predictable, and safe.

Security Model

Local-only by design: LogShield runs entirely on your machine. It makes no outbound network calls, sends no telemetry, and persists no data.

All detection rules are:

  • Explicit
  • Inspectable
  • Versioned

You can audit every rule applied to your logs by reviewing the source code.

License

LogShield is released under the Apache-2.0 License.

Support