jsonlkit.com
JSONL (JSON Lines) utilities, in the browser

JSONL vs JSON vs NDJSON vs CSV vs Parquet

Format comparison guide · updated 21 May 2026

Five formats that look interchangeable but solve different problems. JSON is for documents. JSONL and NDJSON, two names for the same format, are streamable record sets. CSV is for spreadsheets. Parquet is for analytical warehouses. Picking the right one matters: converting later is cheap with tools like ours, but discovering you picked wrong after writing 50 GB is painful.

The 30-second version

If your data has nested structure and you want to stream/append, JSONL wins. If it's a single rich document, regular JSON. If it's a flat table for non-developers, CSV. If it's millions of rows for analytics, Parquet.

JSONL vs JSON

| Aspect | JSON | JSONL |
| --- | --- | --- |
| Top-level shape | One document (object, array, value) | Sequence of independent values, one per line |
| Streamable | No: must read entire document | Yes: parse line by line |
| Appendable | No: would need to rewrite or hack ] | Yes: append a line |
| Splittable for parallel processing | No | Yes: any \n is a safe boundary |
| Memory cost | Whole document in RAM | One record at a time |
| Resilient to corruption | One bad byte = whole file unparseable | One bad line = skip and continue |
| Diff-friendly (git) | Poor: formatting changes look like content changes | Excellent: line-based diffs work natively |
| Pretty-printing | Multi-line, indented | One record per line; pretty-printing breaks the format |
| Used for | Configs, API responses, single records | Datasets, logs, ML training, ETL streams |
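The streaming, appending, and corruption rows can be illustrated with a minimal Python sketch. The file name events.jsonl and the helpers append_record and iter_records are hypothetical names chosen for this example, not part of any library.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a single line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def iter_records(path):
    """Lazily yield (line_number, record), skipping lines that fail to parse."""
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield line_number, json.loads(line)
            except json.JSONDecodeError:
                continue  # one bad line costs one record, not the whole file


# Demo on a temporary file (hypothetical name).
demo_path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
append_record(demo_path, {"id": 1, "name": "Ada"})
with open(demo_path, "a", encoding="utf-8") as f:
    f.write('{"id": 2, "name": "Bab\n')  # simulate a truncated write
append_record(demo_path, {"id": 3, "name": "Lovelace"})

recovered = list(iter_records(demo_path))
print(recovered)
```

Only one line is held in memory at a time, and the truncated second line is skipped while the records on either side of it survive. A production reader would probably log the skipped line numbers instead of dropping them silently.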

Round-trip both ways with JSON → JSONL and JSONL → JSON.
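For readers who want to see what such a round-trip involves, here is a minimal sketch in Python. It assumes the JSON document is a top-level array; a wrapper object such as {"users": [...]} would need to be unwrapped first. The function names are hypothetical.

```python
import json


def json_array_to_jsonl(text):
    """Convert a JSON document whose top level is an array into JSONL text."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps without indent never emits a raw newline,
    # so every record stays on exactly one line.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def jsonl_to_json_array(text):
    """Convert JSONL text back into a pretty-printed JSON array."""
    # Split on "\n" only; str.splitlines() would also split on other
    # Unicode line separators that may legally appear inside JSON strings.
    records = [json.loads(line) for line in text.split("\n") if line.strip()]
    return json.dumps(records, indent=2)


source = '[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Babbage"}]'
jsonl = json_array_to_jsonl(source)
print(jsonl, end="")
round_tripped = json.loads(jsonl_to_json_array(jsonl))
```

The conversion to JSONL can be done record by record, but going back to a single JSON array means collecting every record first, which is the memory cost the table describes.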

When to pick JSON over JSONL

When to pick JSONL over JSON

JSONL vs NDJSON

Same format, different names. NDJSON is the spelling used in JavaScript / Node circles and promoted by ndjson.org. JSONL is the spelling used in Python, ML, and data engineering, and documented at jsonlines.org. There is no functional difference.

If you receive .ndjson and your tool expects .jsonl (or vice versa), just rename the file. Both refer to "one JSON value per line, separated by \n." See the overview's naming section for history.

JSONL vs CSV

| Aspect | JSONL | CSV |
| --- | --- | --- |
| Schema | Self-describing per record (keys present) | External: first row is usually the header |
| Nested data | Native: objects and arrays inside records | Not supported: must flatten via dot-keys or JSON-in-cells |
| Types | Distinguishes string / number / bool / null | Everything is a string; types must be re-parsed |
| Encoding | UTF-8 | UTF-8 in theory, often Windows-1252 or weird locales in practice |
| Quoting rules | Strict JSON quoting | RFC 4180, but in practice quoting is everyone's footgun |
| Heterogeneous shapes | Possible (not recommended) | Impossible: every row must have the same column count |
| File size | Larger: keys repeat every record | Smaller: header once, then values |
| Spreadsheet-friendly | No: Excel can't open JSONL directly | Yes: Excel, Sheets, Numbers all open CSV |
| Streamable / appendable | Yes | Yes |

Round-trip with CSV → JSONL and JSONL → CSV.
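The flattening that a JSONL to CSV conversion requires can be sketched with Python's standard library. This is one possible convention, not the only one: nested objects become dot-keys and arrays become JSON-in-a-cell. The helper names and the sample address field are assumptions made for the example.

```python
import csv
import io
import json


def flatten(record, prefix=""):
    """Flatten nested objects into dot-keys; arrays become JSON text in a cell."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def jsonl_to_csv(jsonl_text):
    rows = [flatten(json.loads(line)) for line in jsonl_text.split("\n") if line.strip()]
    # Union of keys in first-seen order, so records with missing keys
    # still fit under a single header row.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


jsonl_text = (
    '{"id":1,"name":"Ada","address":{"city":"London"},"tags":["math","code"]}\n'
    '{"id":2,"name":"Babbage","tags":["engine"]}\n'
)
csv_text = jsonl_to_csv(jsonl_text)
print(csv_text)
```

Reading the CSV back yields strings only ("1" rather than 1), so the reverse conversion has to re-parse types and decode the JSON cells, which is where the round-trip becomes lossy.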

When to pick CSV

When to pick JSONL over CSV

JSONL vs Parquet

| Aspect | JSONL | Parquet |
| --- | --- | --- |
| Storage | Row-oriented text | Column-oriented binary |
| Human-readable | Yes: open in any text editor | No: need a Parquet reader |
| Schema | Implicit per record | Embedded in file metadata |
| Compression ratio | Good with gzip/zstd | Excellent: column compression + dictionary encoding |
| Read pattern | Full file scan | Read only needed columns |
| Append | Trivial: write a new line | Hard: file is structured; multi-file partitions instead |
| Streaming | Natural fit | Designed for files at rest, not streams |
| Used for | Logs, fine-tune datasets, ETL transport | Analytical queries (DuckDB, Spark, Athena, BigQuery) |

Convert with JSONL ↔ Parquet (DuckDB-WASM in your browser).
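To make "column-oriented" and "dictionary encoding" concrete, here is a toy illustration in plain Python. It is not Parquet's actual on-disk layout, only the underlying idea, and it assumes every record has the same keys. The sample fields (country, plan) are invented for the example.

```python
import json

jsonl_text = (
    '{"id":1,"country":"UK","plan":"free"}\n'
    '{"id":2,"country":"UK","plan":"pro"}\n'
    '{"id":3,"country":"FR","plan":"free"}\n'
    '{"id":4,"country":"UK","plan":"free"}\n'
)
rows = [json.loads(line) for line in jsonl_text.split("\n") if line]


def to_columns(rows):
    """Pivot row-oriented records into one list per column (same keys assumed)."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def dictionary_encode(values):
    """Replace repeated values with small integer codes plus a lookup table."""
    dictionary, index, codes = [], {}, []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


columns = to_columns(rows)
country_dict, country_codes = dictionary_encode(columns["country"])
print(country_dict, country_codes)

# Column pruning: this query reads the "plan" and "id" columns
# and never touches "country".
pro_ids = [columns["id"][i] for i, plan in enumerate(columns["plan"]) if plan == "pro"]
print(pro_ids)
```

Storing each column contiguously is what lets a reader skip columns it does not need, and low-cardinality columns collapse into a short dictionary plus small integers, which compresses far better than repeating the strings in every row.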

The typical workflow

Many data pipelines use both: JSONL for the landing zone / staging area / streaming transport (append-friendly, debuggable in a text editor), then Parquet for the queryable storage after a daily or hourly compaction (columnar reads, dictionary compression). Tools like dbt, Airbyte, and Snowflake's external tables understand this pattern out of the box.

Decision flowchart

Is the data ONE document (config, single API response)?
  └── Yes → JSON
  └── No  → continue

Are records flat with no nesting?
  └── Yes → audience non-developers?
              └── Yes → CSV
              └── No  → continue
  └── No  → continue

Is the data at-rest for analytical queries (warehouse, BI)?
  └── Yes → Parquet
  └── No  → JSONL
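
The flowchart can also be written as a small function, which makes the order of the checks explicit. This is a sketch of the same logic; the function and parameter names are hypothetical.

```python
def pick_format(*, single_document, flat_records, non_developer_audience, analytical_at_rest):
    """Return the format suggested by the decision flowchart."""
    if single_document:
        return "JSON"
    # CSV only when both conditions hold; otherwise fall through.
    if flat_records and non_developer_audience:
        return "CSV"
    if analytical_at_rest:
        return "Parquet"
    return "JSONL"


# Example: nested event logs being appended by a service.
choice = pick_format(
    single_document=False,
    flat_records=False,
    non_developer_audience=False,
    analytical_at_rest=False,
)
print(choice)
```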

Side-by-side: the same data in each format

JSON

{
  "users": [
    {"id": 1, "name": "Ada",     "active": true,  "tags": ["math", "code"]},
    {"id": 2, "name": "Babbage", "active": false, "tags": ["engine"]}
  ]
}

JSONL

{"id":1,"name":"Ada","active":true,"tags":["math","code"]}
{"id":2,"name":"Babbage","active":false,"tags":["engine"]}

CSV (lossy — arrays flattened)

id,name,active,tags
1,Ada,true,"math|code"
2,Babbage,false,engine

Parquet (binary; logical schema shown)

id      : int64
name    : utf8
active  : bool
tags    : list<utf8>

Row 0: 1, "Ada",     true,  ["math", "code"]
Row 1: 2, "Babbage", false, ["engine"]

Performance: what you can expect

Rough numbers on a modern laptop (M-series Mac, 16 GB RAM) for a 1 GB dataset of nested user records:

| Format | File size | Read all rows | Filter on one column |
| --- | --- | --- | --- |
| JSONL (raw) | 1.0 GB | ~12 s | ~12 s (full scan) |
| JSONL + gzip | ~150 MB | ~16 s | ~16 s |
| JSONL + zstd -3 | ~120 MB | ~10 s | ~10 s |
| Parquet (snappy) | ~180 MB | ~6 s | ~0.3 s (column-pruned) |

The headline: Parquet wins for analytical filters because it can skip most of the file. JSONL wins for streaming, debuggability, and append-only workloads.
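
The "full scan" entries follow from how compressed JSONL is read. A minimal sketch with Python's gzip module shows that a filter on one field still decompresses and parses every record. The file name, record shape, and count_active helper are assumptions for illustration.

```python
import gzip
import json
import os
import tempfile

# Write a small gzip-compressed JSONL file (hypothetical name and records).
path = os.path.join(tempfile.mkdtemp(), "users.jsonl.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    for i in range(1000):
        f.write(json.dumps({"id": i, "active": i % 2 == 0}) + "\n")


def count_active(path):
    """Filter on one field; every line is still decompressed and parsed."""
    active = 0
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:  # streams line by line, constant memory
            if json.loads(line)["active"]:
                active += 1
    return active


active_count = count_active(path)
print(active_count)
```

Memory stays flat because the file is streamed, but the work is proportional to the whole file no matter how selective the filter is. That is the gap column pruning closes.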

FAQ

Can I use JSONL as an HTTP response body?

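Yes. Set Content-Type: application/x-ndjson and use chunked transfer encoding. Flush on line boundaries where you can so the client can parse complete records as they arrive, but clients should still buffer until they see a newline, because proxies may re-chunk the stream. Note that the streaming endpoints of AI APIs such as OpenAI and Anthropic use Server-Sent Events (text/event-stream) rather than NDJSON; the idea of sending one JSON payload per event is similar, but the framing differs.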
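
On the client side, chunk boundaries cannot be assumed to match line boundaries, since intermediaries may re-chunk a response. A defensive reader buffers bytes until it sees a newline. The sketch below works on any iterable of byte chunks; iter_ndjson is a hypothetical helper, and the chunks list stands in for whatever an HTTP client yields.

```python
import json


def iter_ndjson(chunks):
    """Yield parsed records from an iterable of byte chunks of any size."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the tail buffered.
        # Splitting bytes on b"\n" is safe in UTF-8: 0x0A never occurs inside
        # a multi-byte character.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    if buffer.strip():
        yield json.loads(buffer)  # final record without a trailing newline


# Chunks deliberately cut in the middle of records.
chunks = [b'{"id":1,"na', b'me":"Ada"}\n{"id":2,', b'"name":"Babbage"}\n']
records = list(iter_ndjson(chunks))
print(records)
```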

Do databases support JSONL natively?

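Many do. BigQuery, Snowflake, DuckDB, and Athena all read JSONL directly. Postgres has no dedicated JSONL loader, but COPY can load each line into a single json or jsonb column (mind COPY's own handling of backslashes in text format). MySQL has no COPY command; the closest equivalent is LOAD DATA INFILE into a JSON column, or a small client-side loader.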

Is JSONL slower than Parquet?

For analytical scans, yes: Parquet's column pruning and dictionary encoding give big wins on selective queries. For streaming, appending, debugging, and small-to-medium files, JSONL is often faster end-to-end because a write is a plain append, with no row groups to buffer and no footer to finalize, and a reader can start on the first line immediately.

Can I store binary data (images, audio) in JSONL?

Only if you base64-encode it. JSON has no binary type. For real binary blobs, store them outside the JSONL and reference them by path or URL: base64 adds about 33% overhead and turns each record into a very long line, which undermines one-record-at-a-time streaming.
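
The overhead figure comes from base64 emitting 4 output characters for every 3 input bytes. A quick check in Python, using stand-in bytes in place of a real image (the field name image_b64 is an assumption):

```python
import base64
import json

blob = bytes(range(256)) * 12  # 3072 bytes of stand-in binary data
encoded = base64.b64encode(blob).decode("ascii")

# The whole blob ends up inside one JSONL line.
line = json.dumps({"id": 1, "image_b64": encoded})

overhead = len(encoded) / len(blob) - 1
print(f"{len(blob)} bytes -> {len(encoded)} base64 chars ({overhead:.0%} overhead)")

# Decoding recovers the original bytes exactly.
restored = base64.b64decode(json.loads(line)["image_b64"])
```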

— S., [email protected]