JSONL Deduplicator
100% client-side. No upload.
Strip duplicate records out of a JSONL file — by full line, by canonical object (key order independent), or by a specific key path like id or user.email.
Three ways to match
Most "duplicates" in a JSONL file are not actually byte-identical lines. The same record exported twice may have keys in a different order, or one source might add a timestamp the other doesn't have. Pick the matching strategy that fits the cleanup you actually need.
Full line
Compares lines as raw strings. The fastest option, but it will treat {"a":1,"b":2} and {"b":2,"a":1} as different. Use this when you trust the source to emit records consistently, typically logs from a single producer.
Canonical object
Parses each line as JSON, sorts keys recursively, and compares the canonical form. Two records with the same data are treated as equal regardless of how the writer ordered keys. Slower than line compare, but it catches the "I joined two exports" class of duplicate.
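A minimal Python sketch of one way to build such a canonical form, not the tool's actual implementation. The function name canonical_signature is hypothetical. Note one difference from a browser: Python's json module keeps 1 as an int and 1.0 as a float, so this sketch would serialize them differently, whereas a JavaScript parse yields the same number.

```python
import json

def canonical_signature(line: str) -> str:
    """Return a key-order-independent string for one JSONL record (hypothetical helper)."""
    record = json.loads(line)
    # sort_keys=True sorts object keys at every nesting level; arrays keep their order.
    # Compact separators remove whitespace differences between writers.
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

a = canonical_signature('{"a":1,"b":{"y":2,"x":3}}')
b = canonical_signature('{"b": {"x": 3, "y": 2}, "a": 1}')
print(a == b)  # True
```

Two lines are then duplicates exactly when their signatures are equal strings, so the comparison itself is as cheap as in full-line mode; the extra cost is the parse and re-serialize step.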
Key path
Compares only the value at a specific path. Use id for top-level keys, or dotted paths like user.email or meta.request_id for nested fields. Records where the path is missing are passed through untouched (treated as not participating in dedup) so you don't accidentally collapse them all into one row.
Keep first vs. keep last
Keep first walks the file top-to-bottom and discards any record whose signature has already been seen. Use this when older records are the source of truth.
Keep last retains the most recent occurrence of each signature. Use this for upserts — when a later record represents an update to an earlier one. Output order follows the position of the kept record.
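As a rough Python illustration of the two policies (the dedupe function and its keep parameter are hypothetical names, and the signature callable stands in for whichever matching strategy is selected):

```python
from typing import Callable, Hashable

def dedupe(lines: list[str], signature: Callable[[str], Hashable], keep: str = "first") -> list[str]:
    if keep == "first":
        seen = set()
        kept = []
        for line in lines:
            sig = signature(line)
            if sig not in seen:  # discard anything whose signature was already seen
                seen.add(sig)
                kept.append(line)
        return kept
    if keep == "last":
        # Later occurrences overwrite earlier ones, leaving the last index per signature.
        last_index = {signature(line): i for i, line in enumerate(lines)}
        keep_positions = set(last_index.values())
        # Output order follows the position of each kept record.
        return [line for i, line in enumerate(lines) if i in keep_positions]
    raise ValueError("keep must be 'first' or 'last'")

events = ["u1:created", "u2:created", "u1:updated"]
by_user = lambda line: line.split(":")[0]
print(dedupe(events, by_user, "first"))  # ['u1:created', 'u2:created']
print(dedupe(events, by_user, "last"))   # ['u2:created', 'u1:updated']
```

With keep last, the surviving u1 record is the update, and it appears after u2 because that is where it sits in the input.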
Tips & common pitfalls
- Numeric values are compared after parsing. 1 and 1.0 become the same number after parse, so canonical and key-path modes treat them as equal. Full-line mode does not.
- Missing keys aren't merged. In key-path mode, records missing the chosen path are kept as-is, not collapsed together. If you want them dropped, use the JSONL Validator first.
- Dedupe before fine-tuning. OpenAI fine-tune jobs charge per training token; running this before the OpenAI Fine-Tune Validator can shave real money off a run.
- Dedupe before splitting. If you're going to feed chunks to Splitter / Merger, split after dedup so each chunk is uniformly sized.
Example
Input: the record with id 1 appears twice, with its keys in a different order:
{"id":1,"name":"alice","ts":1000}
{"id":2,"name":"bob","ts":1001}
{"name":"alice","ts":1000,"id":1}
{"id":3,"name":"carol","ts":1002}
Match by Canonical object, keep first → 3 records. Match by Full line → all 4 kept (key order differs). Match by Key path id → 3 records.
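The counts above can be reproduced with a short Python sketch. It is an approximation of the three modes under a keep-first policy, with hypothetical helper names, not the tool's own code.

```python
import json

INPUT = [
    '{"id":1,"name":"alice","ts":1000}',
    '{"id":2,"name":"bob","ts":1001}',
    '{"name":"alice","ts":1000,"id":1}',
    '{"id":3,"name":"carol","ts":1002}',
]

def keep_first(lines, signature):
    """Keep the first line for each distinct signature."""
    seen, kept = set(), []
    for line in lines:
        sig = signature(line)
        if sig not in seen:
            seen.add(sig)
            kept.append(line)
    return kept

modes = [
    ("full_line", lambda line: line),                                       # raw string compare
    ("canonical", lambda line: json.dumps(json.loads(line), sort_keys=True)),  # key order ignored
    ("key_path_id", lambda line: json.loads(line)["id"]),                   # only the id value
]
counts = {name: len(keep_first(INPUT, fn)) for name, fn in modes}
print(counts)  # {'full_line': 4, 'canonical': 3, 'key_path_id': 3}
```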
Frequently asked questions
How big a file can it handle?
Limited only by browser memory. Tens of millions of short lines work in modern browsers; if your file is gigabytes, dedupe with a CLI instead (sort -u for byte-identical lines, which also sorts the output; awk '!seen[$0]++' if the original order must be preserved; or jq piped through awk for key-based matching).
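For files too large for the browser, one scripted alternative is a streaming pass that remembers only a fixed-size digest per distinct line. The sketch below is a hypothetical Python helper for full-line matching, assuming keep-first semantics; a 16-byte digest makes collisions extremely unlikely but not strictly impossible.

```python
import hashlib
from typing import Iterable, Iterator

def stream_dedupe(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in input order, without holding the file in memory."""
    seen: set[bytes] = set()
    for line in lines:
        # Hash the content without its trailing newline so a final unterminated line still matches.
        content = line.rstrip("\n").encode("utf-8")
        digest = hashlib.blake2b(content, digest_size=16).digest()
        if digest not in seen:
            seen.add(digest)
            yield line

sample = ["a\n", "b\n", "a\n", "b"]
print(list(stream_dedupe(sample)))  # ['a\n', 'b\n']
```

Because it accepts any iterable of lines, the same function can be fed an open file object or sys.stdin and its output written with sys.stdout.writelines.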
Does ordering matter for canonical compare?
No — that's the whole point of canonical mode. Object keys are sorted lexically before comparison; arrays preserve their order (because order in arrays is semantically meaningful in JSON).
Can I dedupe on multiple keys?
Not directly. Workaround: pre-process with JSONL → CSV projecting just the keys you care about and dedup the resulting CSV in a spreadsheet, or use canonical mode after stripping irrelevant fields.
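If scripting is an option, the "strip irrelevant fields, then compare canonically" workaround amounts to building a composite signature from the chosen keys. A minimal Python sketch, assuming top-level keys only and treating an absent key as null (both are assumptions of this example, as are the function names):

```python
import json

def composite_signature(line: str, keys: list[str]) -> str:
    """Canonical string built from only the chosen top-level keys (hypothetical helper)."""
    record = json.loads(line)
    # Assumption: a key that is absent is treated the same as an explicit null.
    projected = {key: record.get(key) for key in keys}
    return json.dumps(projected, sort_keys=True)

def dedupe_on_keys(lines: list[str], keys: list[str]) -> list[str]:
    """Keep the first full record for each distinct combination of `keys`."""
    seen: set[str] = set()
    kept: list[str] = []
    for line in lines:
        sig = composite_signature(line, keys)
        if sig not in seen:
            seen.add(sig)
            kept.append(line)  # the full original record is kept, not the projection
    return kept

rows = [
    '{"email":"a@x.io","tenant":"t1","ts":1}',
    '{"email":"a@x.io","tenant":"t2","ts":2}',
    '{"tenant":"t1","email":"a@x.io","ts":3}',
]
result = dedupe_on_keys(rows, ["email", "tenant"])
print(len(result))  # 2
```

The third row is dropped because its email and tenant match the first row, even though its ts and key order differ.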
Is my data sent to a server?
Never. Everything runs in your browser. See the privacy policy for details.