This demo runs 5 representative attacks against llama-3.1-8b-instant via Groq. The full local tool runs the complete attack library against any Ollama model you choose. GitHub ↗
Optionally set a system prompt to test. The scanner runs 5 representative attacks — prompt injection, DAN jailbreak, roleplay bypass, separator trick — and reports whether the model complied or refused each one.
Description
What it does: LLM Protector scans a local Ollama model for prompt injection and jailbreak vulnerabilities. It fires a library of categorized attack prompts at the model, detects whether the model complied or refused, and produces a per-attack report.
Why it matters: Most developers deploying local LLMs don't test their system prompts against adversarial inputs. A single DAN-style prompt or indirect injection can bypass guardrails that seem solid during normal use.
How it runs: FastAPI backend on localhost hits the Ollama/v1/chat/completions endpoint (with fallback to the native /api/chat). Attacks run concurrently up to a configurable concurrency limit. Results stream back to a React/Vite frontend as NDJSON.
Architecture
LLM Protector — Local Tool Architecture
══════════════════════════════════════════
┌──────────────────────┐ ┌──────────────────────┐
│ React / Vite UI │────▶│ FastAPI Backend │
│ localhost:5173 │ │ localhost:8000 │
│ │◀────│ │
│ · Select model │ │ · Load test_attacks │
│ · Set system prompt │ │ .yaml │
│ · View results live │ │ · Run attacks with │
│ · Filter by status │ │ semaphore (3 concurrent)
└──────────────────────┘ │ · Detect: refusal │
│ phrases vs compliance│
│ · Stream NDJSON back │
└──────────┬─────────────┘
│
▼
┌──────────────────────┐
│ Ollama │
│ localhost:11434 │
│ │
│ Any installed model: │
│ llama3, mistral, │
│ gemma, phi3, etc. │
└──────────────────────┘
Attack categories in test_attacks.yaml:
├── Prompt Injection (classic override, suffix, separator)
├── Jailbreaking (DAN, roleplay, fictional framing)
├── System Prompt Leak (extraction attempts)
└── Indirect Injection (via tool/context payloads)
WSL2 is fully supported — the backend auto-detects the Windows gateway IP when Ollama is running on the host.
Dev Notes
Detection Logic
Each attack specifies how to score it: refusal-phrase matching, keyword presence in the response, or both. The detect_vulnerability function returns vulnerable, safe, or uncertain.
WSL2 Support
WSL2 can't reach Windows localhost directly. The backend detects WSL via /proc/version and resolves the Windows host IP from the default gateway, then tests connectivity before choosing which URL to use.
Extending Attacks
All attacks live in test_attacks.yaml. Add a new entry with id, category, severity, prompt, and a detection rule — the backend picks it up with no code changes.