jsonlkit.com
JSONL (JSON Lines) utilities, in the browser

System-prompt Deduplicator

updated 11 May 2026 · for fine-tune dataset hygiene

100% client-side. No upload.

Find rows in a fine-tune dataset that share the same user prompt but have different assistant outputs, a quiet but expensive data-quality bug. If one row maps "Translate Hello to French" to Hola and another maps it to Bonjour, the model learns to be inconsistent. This tool finds such rows, groups them, and lets you keep the first, keep the last, or just report them and leave the file unchanged. It works with OpenAI, Anthropic, Gemini, and ShareGPT shapes and runs 100% in-browser.
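The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual code: it assumes OpenAI-style rows with a `messages` array, treats every non-assistant message as the prompt side, and uses the hypothetical helper names `prompt_key`, `find_conflicts`, and `dedupe`.

```python
import json
from collections import defaultdict

def prompt_key(row):
    # Assumption: OpenAI-style {"messages": [{"role": ..., "content": ...}]}.
    # Everything except assistant turns counts as the prompt side.
    return "\n".join(
        f"{m['role']}:{m['content']}"
        for m in row["messages"]
        if m["role"] != "assistant"
    )

def assistant_output(row):
    return "\n".join(
        m["content"] for m in row["messages"] if m["role"] == "assistant"
    )

def find_conflicts(jsonl_text):
    """Map each prompt key to [(line_no, output), ...] when outputs disagree."""
    groups = defaultdict(list)
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        groups[prompt_key(row)].append((line_no, assistant_output(row)))
    return {
        key: rows
        for key, rows in groups.items()
        if len({out for _, out in rows}) > 1
    }

def dedupe(jsonl_text, keep="first"):
    """Keep one row per prompt key (the first or the last), in file order."""
    chosen = {}  # prompt key -> (line_no, raw line)
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue
        key = prompt_key(json.loads(line))
        if keep == "last" or key not in chosen:
            chosen[key] = (line_no, line)
    return "\n".join(line for _, line in sorted(chosen.values()))
```

Calling `find_conflicts(text)` on the Hola/Bonjour example would return one group containing both line numbers, and `dedupe(text, keep="last")` would keep only the Bonjour row for that prompt. Other dataset shapes (Anthropic, Gemini, ShareGPT) would need their own `prompt_key`.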

What "duplicate" means here

This is not whole-line deduplication; for that, use the JSONL Deduplicator. This tool dedupes on the prompt side of each example and ignores assistant outputs. Three ways to define "same prompt":

Normalization

Small whitespace differences can hide otherwise-identical prompts. Trim & collapse replaces each run of whitespace with a single space and strips leading and trailing whitespace; it is the safest default. Lowercase goes further by also ignoring case, which helps when casing is inconsistent across rows.
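As a rough sketch of the two normalization levels, assuming Lowercase is applied on top of Trim & collapse (the page does not say so explicitly) and using the hypothetical function names `trim_collapse` and `normalize`:

```python
import re

def trim_collapse(text):
    # Replace each run of whitespace (spaces, tabs, newlines) with one
    # space, then strip leading and trailing whitespace.
    return re.sub(r"\s+", " ", text).strip()

def normalize(text, lowercase=False):
    # Assumption: lowercasing is layered on top of trim-and-collapse.
    out = trim_collapse(text)
    return out.lower() if lowercase else out
```

With this sketch, `"  Translate   Hello\tto French \n"` and `"Translate Hello to French"` normalize to the same string, so the two rows would land in the same duplicate group.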

Report mode

Sometimes you want to see what's duplicated before deciding to drop it, since the duplicates may be intentional (e.g. paraphrased outputs for diversity). Pick Report only to get a list of duplicate groups, with line numbers and their assistant outputs, without modifying the file.
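A report of this kind could look roughly like the sketch below. It is an assumption-laden illustration rather than the tool's output format: it assumes OpenAI-style `messages` rows, keys on the joined user messages only, and the function name `report` and the field names `prompt`, `rows`, `line`, and `output` are made up for the example.

```python
import json
from collections import defaultdict

def report(jsonl_text):
    """List groups of rows sharing a user prompt; the input is never modified."""
    groups = defaultdict(list)
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        msgs = json.loads(line)["messages"]
        prompt = " ".join(m["content"] for m in msgs if m["role"] == "user")
        output = " ".join(m["content"] for m in msgs if m["role"] == "assistant")
        groups[prompt].append({"line": line_no, "output": output})
    # Only prompts that appear on more than one line count as a group.
    return [
        {"prompt": prompt, "rows": rows}
        for prompt, rows in groups.items()
        if len(rows) > 1
    ]
```

Each entry carries the 1-based line numbers and the assistant outputs side by side, which is enough to judge whether a group is a real conflict or an intentional paraphrase.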

Tips & common pitfalls
