Remove Duplicate JSON / JSONL Records

updated 16 May 2026

JSONL deduplicator. Remove duplicate records by full line, canonical object, or a chosen key path (such as id or user.email). Keep first or last occurrence. Up to 1 GB, in your browser.

100% client-side. No upload.

⌨ Millions of records? jsonlkit dedupe --key id data.jsonl keeps only a 40-character hex SHA-1 per record in memory.

JSONL Deduplicator

Strip duplicate records out of a JSONL file — by full line, by canonical object (key order independent), or by a specific key path like id or user.email.

— S., [email protected]

Three ways to match

Most "duplicates" in a JSONL file are not actually byte-identical lines. The same record exported twice may have keys in a different order, or one source might add a timestamp the other doesn't have. Pick the matching strategy that fits the cleanup you actually need.

Full line

Compares lines as raw strings. The fastest option, but it will treat {"a":1,"b":2} and {"b":2,"a":1} as different. Use this when you trust the source to emit records consistently — typically logs from a single producer.
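
As a rough illustration of full-line matching, here is a minimal Python sketch. It is not the tool's actual code; the function name dedupe_full_line is made up for this example, and stripping the line terminator before comparing is an assumption.

```python
def dedupe_full_line(lines):
    """Keep the first occurrence of each line, compared as a raw string."""
    seen = set()
    kept = []
    for line in lines:
        # Assumption: the line terminator is ignored, everything else is compared verbatim.
        text = line.rstrip("\r\n")
        if text in seen:
            continue
        seen.add(text)
        kept.append(text)
    return kept


records = [
    '{"a":1,"b":2}',
    '{"b":2,"a":1}',  # same data, different key order: kept
    '{"a":1,"b":2}',  # byte-identical to the first line: dropped
]
result = dedupe_full_line(records)
```

Here result holds the first two lines only. The reordered record survives because nothing is parsed.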

Canonical object

Parses each line as JSON, sorts keys recursively, and compares the canonical form. Two records with the same data are treated as equal regardless of how the writer ordered keys. Slower than line compare, but it catches the "I joined two exports" class of duplicate.
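
One way to build such a canonical form, sketched in Python under a few assumptions: the helper names normalize and canonical are hypothetical, and the extra step that turns 1.0 into 1 is there because Python's parser, unlike a JavaScript parse, keeps the two distinct.

```python
import json


def normalize(value):
    """Recursively rebuild a parsed value so that 1.0 and 1 serialize identically."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize(item) for item in value]  # array order is preserved
    return value


def canonical(line):
    """Return a canonical string: keys sorted at every depth, no extra whitespace."""
    parsed = normalize(json.loads(line))
    return json.dumps(parsed, sort_keys=True, separators=(",", ":"))


a = canonical('{"id":1,"tags":["x","y"],"user":{"name":"alice","age":30}}')
b = canonical('{"user":{"age":30.0,"name":"alice"},"tags":["x","y"],"id":1}')
c = canonical('{"id":1,"tags":["y","x"],"user":{"name":"alice","age":30}}')
```

In this sketch a and b are equal (key order and 30 vs 30.0 do not matter), while c differs because the array elements are in a different order.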

Key path

Compares only the value at a specific path. Use id for top-level keys, or dotted paths like user.email or meta.request_id for nested fields. Records where the path is missing are passed through untouched (treated as not participating in dedup) so you don't accidentally collapse them all into one row.
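
The path lookup and the pass-through rule for missing paths can be sketched like this in Python. The names MISSING, get_path and dedupe_by_key are invented for the example. One caveat: unlike a JavaScript parse, Python keeps 1 and 1.0 distinct, so this sketch would not merge those two key values.

```python
import json

MISSING = object()  # sentinel: distinguishes "path absent" from a JSON null value


def get_path(record, path):
    """Follow a dotted path such as 'user.email'; return MISSING if any step is absent."""
    current = record
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return MISSING
        current = current[part]
    return current


def dedupe_by_key(lines, path):
    """Keep the first record per key value; records without the path pass through."""
    seen = set()
    kept = []
    for line in lines:
        value = get_path(json.loads(line), path)
        if value is MISSING:
            kept.append(line)  # not participating in dedup
            continue
        signature = json.dumps(value, sort_keys=True)  # hashable even for objects and arrays
        if signature not in seen:
            seen.add(signature)
            kept.append(line)
    return kept


lines = [
    '{"user":{"email":"a@example.com"},"n":1}',
    '{"user":{"email":"b@example.com"},"n":2}',
    '{"user":{"email":"a@example.com"},"n":3}',
    '{"n":4}',
    '{"n":5}',
]
kept = dedupe_by_key(lines, "user.email")
```

Only the third record is dropped. The last two have no user.email, so both are kept rather than being collapsed into one.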

Keep first vs. keep last

Keep first walks the file top-to-bottom and discards any record whose signature has already been seen. Use this when older records are the source of truth.

Keep last retains the most recent occurrence of each signature. Use this for upserts — when a later record represents an update to an earlier one. Output order follows the position of the kept record.
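
Keep last needs to know where the final occurrence of each signature sits before anything is emitted. A minimal two-pass Python sketch, assuming every record has the chosen top-level key (dedupe_keep_last is a hypothetical name):

```python
import json


def dedupe_keep_last(lines, key):
    """Keep the last record for each value of a top-level key, in original file order."""
    last_index = {}
    for index, line in enumerate(lines):
        signature = json.dumps(json.loads(line)[key])
        last_index[signature] = index  # later occurrences overwrite earlier ones
    keep = set(last_index.values())
    # Second pass: emit the surviving records at their own positions.
    return [line for index, line in enumerate(lines) if index in keep]


lines = [
    '{"id":1,"status":"new"}',
    '{"id":2,"status":"new"}',
    '{"id":1,"status":"shipped"}',
]
kept = dedupe_keep_last(lines, "id")
```

The result is the id 2 record followed by the updated id 1 record, so the output order follows the position of the kept (later) occurrence.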

Tips & common pitfalls

Numbers are compared by value. 1 and 1.0 become the same number after parse, so canonical and key-path modes treat them as equal. Full-line mode does not, because it compares the raw text.
Missing keys aren't merged. In key-path mode, records missing the chosen path are kept as-is, not collapsed together. If you want them dropped, use the JSONL Validator first.
Dedupe before fine-tuning. OpenAI fine-tune jobs charge per training token; running this before the OpenAI Fine-Tune Validator can shave real money off a run.
Dedupe before splitting. If you're going to feed chunks to Splitter / Merger, run the dedup first and split afterwards so each chunk is uniformly sized.

Example

Input, with the same record appearing twice in a different key order:

{"id":1,"name":"alice","ts":1000}
{"id":2,"name":"bob","ts":1001}
{"name":"alice","ts":1000,"id":1}
{"id":3,"name":"carol","ts":1002}

Match by Canonical object, keep first → 3 records (the third line is dropped). Match by Full line → all 4 kept (key order differs). Match by Key path id, keep first → 3 records (the third line is dropped again).
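
The counts can be reproduced with a short Python sketch. It uses simplified stand-ins for the three signatures, not the tool's own code, and the dedupe helper is a hypothetical name.

```python
import json


def dedupe(lines, signature):
    """Keep the first line for each distinct signature."""
    seen = set()
    kept = []
    for line in lines:
        sig = signature(line)
        if sig not in seen:
            seen.add(sig)
            kept.append(line)
    return kept


lines = [
    '{"id":1,"name":"alice","ts":1000}',
    '{"id":2,"name":"bob","ts":1001}',
    '{"name":"alice","ts":1000,"id":1}',
    '{"id":3,"name":"carol","ts":1002}',
]

full_line = dedupe(lines, lambda line: line)
canonical = dedupe(lines, lambda line: json.dumps(json.loads(line), sort_keys=True))
by_id = dedupe(lines, lambda line: json.loads(line)["id"])

print(len(full_line), len(canonical), len(by_id))  # prints: 4 3 3
```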

Frequently asked questions

How big a file can it handle?

Up to 1 GB, and in practice limited by browser memory. Tens of millions of short lines work in modern browsers; if your file is gigabytes, do dedup with a CLI (sort -u for byte-identical lines, which also reorders the output, or jq piped through awk for key-based).
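
If you script it yourself, memory stays manageable when you remember a fixed-size digest per record instead of the record itself. A minimal streaming sketch in Python for byte-identical lines; dedupe_stream is a hypothetical name, and skipping blank lines is an assumption.

```python
import hashlib
import io


def dedupe_stream(source, sink):
    """Copy unique lines from source to sink, remembering only a digest per line."""
    seen = set()
    dropped = 0
    for raw in source:  # iterating a binary file object yields one line at a time
        line = raw.rstrip(b"\r\n")
        if not line:
            continue  # assumption: blank lines are skipped
        digest = hashlib.sha1(line).digest()  # 20 raw bytes, regardless of line length
        if digest in seen:
            dropped += 1
            continue
        seen.add(digest)
        sink.write(line + b"\n")
    return dropped


source = io.BytesIO(b'{"id":1}\n{"id":2}\n{"id":1}\n')
sink = io.BytesIO()
dropped = dedupe_stream(source, sink)
```

With real files, source and sink would be file objects opened in "rb" and "wb" mode. Output order is preserved, which sort -u does not do.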

Does ordering matter for canonical compare?

No — that's the whole point of canonical mode. Object keys are sorted lexically before comparison; arrays preserve their order (because order in arrays is semantically meaningful in JSON).

Can I dedupe on multiple keys?

Not directly. Workaround: pre-process with JSONL → CSV projecting just the keys you care about and dedup the resulting CSV in a spreadsheet, or use canonical mode after stripping irrelevant fields.
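
If a short script is an option, a composite key is easy to build by hand. The Python sketch below is an illustration, not a feature of this tool. dedupe_by_keys is a made-up name, it only handles top-level keys, and it treats an absent key as null, which differs from the pass-through behavior of key-path mode.

```python
import json


def dedupe_by_keys(lines, keys):
    """Keep the first record for each distinct combination of top-level key values."""
    seen = set()
    kept = []
    for line in lines:
        record = json.loads(line)
        # Serialize each value so the composite signature is hashable; absent keys become null.
        signature = tuple(json.dumps(record.get(key), sort_keys=True) for key in keys)
        if signature not in seen:
            seen.add(signature)
            kept.append(line)
    return kept


lines = [
    '{"tenant":"a","email":"x@example.com","n":1}',
    '{"tenant":"b","email":"x@example.com","n":2}',
    '{"tenant":"a","email":"x@example.com","n":3}',
]
kept = dedupe_by_keys(lines, ["tenant", "email"])
```

The first two records are kept because their tenant values differ. The third repeats the (tenant, email) pair of the first and is dropped.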

Is my data sent to a server?

Never. Everything runs in your browser. The privacy policy is here.