JSONL Dataset Stats
100% client-side. Your data never leaves the page.
Drop a JSONL file and get the numbers you'd otherwise compute by hand: row counts, duplicate rate, parse-error count, and a per-field breakdown of fill rate, types, distinct value count, and top values. Useful for sanity-checking an export before you ingest it.
— S., [email protected]
What it measures
- Total rows — non-blank lines in the file.
- Valid rows — lines that parse as JSON. The rest go to the parse-error list.
- Duplicate rows — lines that produce the same canonical (sorted-keys) JSON as a previously seen row. Key order is ignored, so {"a":1,"b":2} and {"b":2,"a":1} are treated as duplicates.
- Per-field fill rate — what fraction of records have this top-level key. A field at 100% is "required"; a field at 3% is probably an outlier or a legacy column.
- Type histogram — how often each value type (string / integer / number / boolean / null / array / object) appears for the field. Mixed types are usually a sign of inconsistent producers.
- Top values — the three most frequent string/number/boolean values per field, with counts. Skipped for fields with more than 500 distinct values to keep the table small.
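If you want the same numbers outside the browser, the sketch below shows roughly how they can be computed in Python. It is not the tool's actual code: the jsonl_stats function and its top_n / max_distinct parameters are made up for illustration, and the 3-value and 500-distinct cutoffs simply mirror the description above.

```python
import json
from collections import Counter, defaultdict

def jsonl_stats(path, top_n=3, max_distinct=500):
    total = valid = dupes = 0
    seen = set()                     # canonical (sorted-keys) forms seen so far
    present = Counter()              # field -> number of rows containing it
    types = defaultdict(Counter)     # field -> value-type histogram
    values = defaultdict(Counter)    # field -> scalar value counts
    errors = []                      # (line, column, message) for bad lines

    def type_name(v):
        if v is None: return "null"
        if isinstance(v, bool): return "boolean"   # check before int: bool is an int subclass
        if isinstance(v, int): return "integer"
        if isinstance(v, float): return "number"
        if isinstance(v, str): return "string"
        if isinstance(v, list): return "array"
        return "object"

    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            if not line.strip():
                continue                           # blank lines are not rows
            total += 1
            try:
                rec = json.loads(line)
            except json.JSONDecodeError as e:
                errors.append((lineno, e.colno, e.msg))
                continue
            valid += 1
            canon = json.dumps(rec, sort_keys=True)  # key order ignored
            if canon in seen:
                dupes += 1
            seen.add(canon)
            if isinstance(rec, dict):
                for key, val in rec.items():
                    present[key] += 1
                    types[key][type_name(val)] += 1
                    if isinstance(val, (str, int, float, bool)):
                        values[key][val] += 1        # only scalars feed "top values"

    report = {"total": total, "valid": valid, "duplicates": dupes,
              "errors": errors, "fields": {}}
    for key in present:
        top = (values[key].most_common(top_n)
               if len(values[key]) <= max_distinct else [])
        report["fields"][key] = {
            "fill_rate": present[key] / valid if valid else 0.0,
            "types": dict(types[key]),
            "top_values": top,
        }
    return report
```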
How to use it
- Paste the JSONL into the box, or drop a .jsonl file onto the drop zone.
- Click Compute stats.
- Read the Overview table for the high-level counts.
- Skim the Top-level fields table — fields with low fill rate or surprising types are usually the interesting ones.
- If anything failed to parse, the Parse errors list shows the line, column, source text, and a suggested fix. Run them through the auto-fixer to clean them up.
What it doesn't do
- Doesn't recurse into nested objects. Only top-level keys appear in the field table. For nested-shape inference, use the schema inferrer — it walks every level and emits a Draft-07 schema.
- Doesn't compute correlations or distributions. The aim is "fast sanity-check on a fresh dump", not exploratory analysis.
- Doesn't sample. It walks every line, so on very large files (100 MB+) it can be slow — your browser's RAM is the limit.
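If you need nested fields in the field table without switching tools, one workaround is to flatten each record to dotted top-level keys before feeding it in. A small Python sketch, where the flatten helper and the raw.jsonl / flat.jsonl filenames are hypothetical:

```python
import json

def flatten(record, prefix=""):
    """Turn nested objects into dotted top-level keys, e.g.
    {"user": {"id": 1}} -> {"user.id": 1}. Arrays are left as-is."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=path + "."))
        else:
            flat[path] = value
    return flat

# Pre-process once, then drop flat.jsonl onto the tool instead of raw.jsonl.
with open("raw.jsonl", encoding="utf-8") as src, \
     open("flat.jsonl", "w", encoding="utf-8") as dst:
    for line in src:
        if line.strip():
            dst.write(json.dumps(flatten(json.loads(line))) + "\n")
```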
Frequently asked questions
Why is the duplicate count higher than I expected?
Duplicates are computed on the canonical (sorted-keys) JSON form, not the raw line text. Two records that differ only in key order or whitespace count as duplicates. If you actually want byte-for-byte duplicates, the deduplicator has a "literal line" mode.
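In Python terms, the canonical form is just the record re-serialized with sorted keys; this is a sketch of the idea, not the tool's code:

```python
import json

raw_a = '{"a":1,"b":2}'
raw_b = '{"b": 2, "a": 1}'   # same record, different key order and whitespace

def canonical(record):
    # Sorted keys and fixed separators give one string per logical record.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))

assert canonical(json.loads(raw_a)) == canonical(json.loads(raw_b))  # counted as duplicates
assert raw_a != raw_b                                                # but not byte-for-byte equal
```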
What about fields that are nested deep?
Use the schema inferrer instead — it walks every level and reports types and required-ness at any depth.