JSONL Dataset Stats
100% client-side. Your data never leaves the page.
Drop a JSONL file and get the numbers you'd otherwise compute by hand: row counts, duplicate rate, parse-error count, and a per-field breakdown of fill rate, types, distinct value count, and top values. Useful for sanity-checking an export before you ingest it.
— S., [email protected]
What it measures
- Total rows — non-blank lines in the file.
- Valid rows — lines that parse as JSON. The rest go to the parse-error list.
- Duplicate rows — lines that produce the same canonical (sorted-keys) JSON as a previously seen row. Key order is ignored, so {"a":1,"b":2} and {"b":2,"a":1} are treated as duplicates.
- Per-field fill rate — what fraction of records have this top-level key. A field at 100% is "required"; a field at 3% is probably an outlier or a legacy column.
- Type histogram — how often each value type (string / integer / number / boolean / null / array / object) appears for the field. Mixed types are usually a sign of inconsistent producers.
- Top values — the three most frequent string/number/boolean values per field, with counts. Skipped for fields with more than 500 distinct values to keep the table small.
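If you want the same numbers outside the browser, the sketch below shows roughly how they can be computed in Python. It is not the tool's actual code: the jsonl_stats function and its top_n / max_distinct parameters are made up for illustration, and the 3-value and 500-distinct cutoffs simply mirror the description above.

```python
import json
from collections import Counter, defaultdict

def jsonl_stats(path, top_n=3, max_distinct=500):
    total = valid = dupes = 0
    seen = set()                     # canonical (sorted-keys) forms seen so far
    present = Counter()              # field -> number of rows containing it
    types = defaultdict(Counter)     # field -> value-type histogram
    values = defaultdict(Counter)    # field -> scalar value counts
    errors = []                      # (line, column, message) for bad lines

    def type_name(v):
        if v is None: return "null"
        if isinstance(v, bool): return "boolean"   # check before int: bool is an int subclass
        if isinstance(v, int): return "integer"
        if isinstance(v, float): return "number"
        if isinstance(v, str): return "string"
        if isinstance(v, list): return "array"
        return "object"

    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            if not line.strip():
                continue                           # blank lines are not rows
            total += 1
            try:
                rec = json.loads(line)
            except json.JSONDecodeError as e:
                errors.append((lineno, e.colno, e.msg))
                continue
            valid += 1
            canon = json.dumps(rec, sort_keys=True)  # key order ignored
            if canon in seen:
                dupes += 1
            seen.add(canon)
            if isinstance(rec, dict):
                for key, val in rec.items():
                    present[key] += 1
                    types[key][type_name(val)] += 1
                    if isinstance(val, (str, int, float, bool)):
                        values[key][val] += 1        # only scalars feed "top values"

    report = {"total": total, "valid": valid, "duplicates": dupes,
              "errors": errors, "fields": {}}
    for key in present:
        top = (values[key].most_common(top_n)
               if len(values[key]) <= max_distinct else [])
        report["fields"][key] = {
            "fill_rate": present[key] / valid if valid else 0.0,
            "types": dict(types[key]),
            "top_values": top,
        }
    return report
```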
How to use it
- Paste the JSONL into the box, or drop a .jsonl file onto the drop zone.
- Click Compute stats.
- Read the Overview table for the high-level counts.
- Skim the Top-level fields table — fields with low fill rate or surprising types are usually the interesting ones.
- If anything failed to parse, the Parse errors list shows the line, column, source text, and a suggested fix. Run them through the auto-fixer to clean them up.
What it doesn't do
- Doesn't recurse into nested objects. Only top-level keys appear in the field table. For nested-shape inference, use the schema inferrer — it walks every level and emits a Draft-07 schema.
- Doesn't compute correlations or distributions. The aim is "fast sanity-check on a fresh dump", not exploratory analysis.
- Doesn't sample. It walks every line, so on very large files (100 MB+) it can be slow — your browser's RAM is the limit.
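If you need nested fields in the field table without switching tools, one workaround is to flatten each record to dotted top-level keys before feeding it in. A small Python sketch, where the flatten helper and the raw.jsonl / flat.jsonl filenames are hypothetical:

```python
import json

def flatten(record, prefix=""):
    """Turn nested objects into dotted top-level keys, e.g.
    {"user": {"id": 1}} -> {"user.id": 1}. Arrays are left as-is."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=path + "."))
        else:
            flat[path] = value
    return flat

# Pre-process once, then drop flat.jsonl onto the tool instead of raw.jsonl.
with open("raw.jsonl", encoding="utf-8") as src, \
     open("flat.jsonl", "w", encoding="utf-8") as dst:
    for line in src:
        if line.strip():
            dst.write(json.dumps(flatten(json.loads(line))) + "\n")
```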
Frequently asked questions
Why is the duplicate count higher than I expected?
Duplicates are computed on the canonical (sorted-keys) JSON form, not the raw line text. Two records that differ only in key order or whitespace count as duplicates. If you actually want byte-for-byte duplicates, the deduplicator has a "literal line" mode.
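In Python terms, the canonical form is just the record re-serialized with sorted keys; this is a sketch of the idea, not the tool's code:

```python
import json

raw_a = '{"a":1,"b":2}'
raw_b = '{"b": 2, "a": 1}'   # same record, different key order and whitespace

def canonical(record):
    # Sorted keys and fixed separators give one string per logical record.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))

assert canonical(json.loads(raw_a)) == canonical(json.loads(raw_b))  # counted as duplicates
assert raw_a != raw_b                                                # but not byte-for-byte equal
```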
What about fields that are nested deep?
Use the schema inferrer instead — it walks every level and reports types and required-ness at any depth.