JSONL vs JSON vs NDJSON vs CSV vs Parquet

Format comparison guide · updated 21 May 2026

Five names that look interchangeable but cover four formats solving different problems. JSON is for documents. JSONL and NDJSON (two names for the same format) are streamable record sets. CSV is for spreadsheets. Parquet is for analytical warehouses. Picking right matters: converting later is cheap with tools like ours, but discovering you picked wrong after writing 50 GB is painful.

The 30-second version

JSON — one document, often hierarchical. Configs, API responses, single-record exchanges.
JSONL / NDJSON — many independent records, one per line. Logs, datasets, streams, ML training data.
CSV — tabular rows, all the same flat shape. Spreadsheets, simple exchange, BI imports.
Parquet — typed columnar storage with metadata. Analytical workloads at scale.

If your data has nested structure and you want to stream/append, JSONL wins. If it's a single rich document, regular JSON. If it's a flat table for non-developers, CSV. If it's millions of rows for analytics, Parquet.

JSONL vs JSON

Aspect	JSON	JSONL
Top-level shape	One document (object, array, value)	Sequence of independent values, one per line
Streamable	Not with a standard parser, which reads the entire document first	Yes, parse line by line
Appendable	No — would need to rewrite or hack `]`	Yes — append a line
Splittable for parallel processing	No	Yes — any `\n` is a safe boundary
Memory cost	Whole document in RAM	One record at a time
Resilient to corruption	One bad byte = whole file unparseable	One bad line = skip and continue
Diff-friendly (git)	Poor — formatting changes look like content changes	Excellent — line-based diffs work natively
Pretty-printing	Multi-line, indented	One record per line; pretty-printing breaks the format
Used for	Configs, API responses, single records	Datasets, logs, ML training, ETL streams
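
The "splittable" row can be sketched as follows. This is a minimal illustration, not a production splitter: the function names `split_offsets` and `read_range` are hypothetical, and it assumes a UTF-8 file with `\n` line endings. Each cut point is moved forward to the next newline, so every worker gets whole records.

```python
import json
import os

def split_offsets(path, n_parts):
    """Return (start, end) byte ranges that each begin and end on a line boundary."""
    size = os.path.getsize(path)
    if size == 0:
        return []
    cuts = [0]
    with open(path, "rb") as f:
        for i in range(1, n_parts):
            target = size * i // n_parts
            if target <= cuts[-1]:
                continue
            f.seek(target - 1)
            f.readline()          # finish the line that contains byte target-1
            pos = f.tell()        # now at the start of the next line (or EOF)
            if cuts[-1] < pos < size:
                cuts.append(pos)
    cuts.append(size)
    return list(zip(cuts[:-1], cuts[1:]))

def read_range(path, start, end):
    """Yield the records whose lines start inside [start, end)."""
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if line.strip():
                yield json.loads(line)
```

Each `(start, end)` pair could be handed to a separate worker process; the ranges never cut a record in half because a raw newline cannot occur inside a JSON string.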

Round-trip both ways with JSON → JSONL and JSONL → JSON.
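
For readers who prefer to script the round trip, here is a minimal sketch using only the Python standard library. The function names are hypothetical, and it assumes the JSON document is a top-level array of records; a wrapper object such as `{"users": [...]}` would need unwrapping first.

```python
import json

def json_to_jsonl(document_text):
    """Turn a JSON array into JSONL text: one compact record per line."""
    records = json.loads(document_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    return "".join(
        json.dumps(r, ensure_ascii=False, separators=(",", ":")) + "\n"
        for r in records
    )

def jsonl_to_json(jsonl_text):
    """Collect every non-blank line into one pretty-printed JSON array."""
    # Split on "\n" only: str.splitlines() would also split on characters
    # such as U+2028, which may legally appear inside a JSON string.
    records = [json.loads(line) for line in jsonl_text.split("\n") if line.strip()]
    return json.dumps(records, ensure_ascii=False, indent=2)
```

Note that `jsonl_to_json` loads every record into memory, which is exactly the cost the table above attributes to JSON.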

When to pick JSON over JSONL

You're sending a single response over HTTP that fits comfortably in memory.
The data is a document, not a set — e.g. a configuration file, a search response with nested facets, an OpenAPI spec.
You need to cross-reference between records inside the same payload (a JSON array makes this easy; a JSONL stream doesn't).
The consumer is a browser or a tool that expects a single document (JSON.parse on the whole body).

When to pick JSONL over JSON

The data is a stream or batch of records, all the same shape.
The file may not fit in memory.
You want to append records over time without rewriting the file.
You want line-based Unix tooling (head, tail, grep, wc, jq) to work directly.
You want partial failure tolerance: one bad record shouldn't poison the rest.
You're feeding an ML training pipeline, log aggregator, or warehouse importer; many of these accept or expect JSONL.
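
Two of the points above, appending without rewriting and tolerating a bad record, can be sketched in a few lines. The helper names `append_record` and `read_tolerant` are hypothetical, and the sketch assumes a UTF-8 file on local disk.

```python
import json

def append_record(path, record):
    """Append one record as a single line; existing content is untouched."""
    with open(path, "a", encoding="utf-8", newline="\n") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_tolerant(path):
    """Return (records, bad_line_numbers); a bad line is skipped, not fatal."""
    records, bad_lines = [], []
    with open(path, encoding="utf-8", newline="\n") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                bad_lines.append(lineno)
    return records, bad_lines
```

Whether to skip, log, or quarantine bad lines is a policy choice; returning the line numbers keeps that decision with the caller.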

JSONL vs NDJSON

Same format, different names. NDJSON is the spelling used in JavaScript / Node circles and promoted by ndjson.org. JSONL is the spelling used in Python, ML, and data engineering, and documented at jsonlines.org. Neither is a formal standard, and there is no functional difference.

If you receive .ndjson and your tool expects .jsonl (or vice versa), just rename the file. Both refer to "one JSON value per line, separated by \n." See the overview's naming section for history.

JSONL vs CSV

Aspect	JSONL	CSV
Schema	Self-describing per record (keys present)	External — first row is usually the header
Nested data	Native — objects and arrays inside records	Not supported — must flatten via dot-keys or JSON-in-cells
Types	Distinguishes string / number / bool / null	Everything is a string; types must be re-parsed
Encoding	UTF-8	UTF-8 in theory, often Windows-1252 or weird locales in practice
Quoting rules	Strict JSON quoting	RFC 4180, but in practice quoting is everyone's footgun
Heterogeneous shapes	Possible (not recommended)	Impossible — every row must have the same column count
File size	Larger — keys repeat every record	Smaller — header once, then values
Spreadsheet-friendly	No — Excel can't open JSONL directly	Yes — Excel, Sheets, Numbers all open CSV
Streamable / appendable	Yes	Yes

Round-trip with CSV → JSONL and JSONL → CSV.
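
The flattening mentioned in the table (dot-keys for nested objects, JSON-in-cells for arrays) might look like the sketch below. It is one possible convention, not a standard; `flatten` and `jsonl_to_csv` are hypothetical names.

```python
import csv
import io
import json

def flatten(record, prefix=""):
    """Nested objects become dot-keys; arrays are stored as JSON text in one cell."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat

def jsonl_to_csv(jsonl_text):
    rows = [flatten(json.loads(line)) for line in jsonl_text.split("\n") if line.strip()]
    # The header is the union of all keys, in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys become empty cells
    return out.getvalue()
```

The result is lossy in the ways the table describes: every cell is text once written, and Python's csv module writes booleans as `True`/`False` rather than JSON's `true`/`false`.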

When to pick CSV

Your audience is non-developers who'll open the file in Excel or Sheets.
The data is genuinely flat (no nested objects, no arrays inside cells).
The file is going into a BI tool that prefers CSV imports.
File size matters more than self-description.

When to pick JSONL over CSV

You have nested data and don't want to flatten lossily.
You need type distinctions, for example a string ID with leading zeros ("00123") versus the number 123. CSV stores both as text, so the reader has to guess.
Your downstream is a programming language, not a spreadsheet.
You want resilience to one row being malformed.
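
The type-distinction point is easy to demonstrate. The sample record below is hypothetical; the sketch round-trips it through CSV and through JSON and shows what a naive "looks numeric, so cast it" rule does to a leading-zero ID.

```python
import csv
import io
import json

record = {"zip": "02134", "count": 2134}  # hypothetical sample row

# CSV round trip: both fields come back as strings.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["zip", "count"], lineterminator="\n")
writer.writeheader()
writer.writerow(record)
csv_row = next(csv.DictReader(io.StringIO(buf.getvalue())))

# A naive type-guessing rule silently turns the ID "02134" into the number 2134.
guessed = {k: int(v) if v.isdigit() else v for k, v in csv_row.items()}

# JSON round trip: the string stays a string and the number stays a number.
jsonl_row = json.loads(json.dumps(record))
```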

JSONL vs Parquet

Aspect	JSONL	Parquet
Storage	Row-oriented text	Column-oriented binary
Human-readable	Yes — open in any text editor	No — need a Parquet reader
Schema	Implicit per record	Embedded in file metadata
Compression ratio	Good with gzip/zstd	Excellent — column compression + dictionary encoding
Read pattern	Full file scan	Read only needed columns
Append	Trivial — write a new line	Hard — file is structured; multi-file partitions instead
Streaming	Natural fit	Designed for files at rest, not streams
Used for	Logs, fine-tune datasets, ETL transport	Analytical queries (DuckDB, Spark, Athena, BigQuery)

Convert with JSONL ↔ Parquet (DuckDB-WASM in your browser).
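
To build intuition for the "column-oriented" and "dictionary encoding" rows, here is a toy sketch in plain Python. It is not Parquet's actual on-disk layout, and the function names are hypothetical; it only shows why a column of repeated values compresses well and why a query on one field need not touch the others.

```python
def to_columns(records):
    """Pivot row-oriented records into one list of values per field."""
    names = []
    for record in records:
        for key in record:
            if key not in names:
                names.append(key)
    return {name: [record.get(name) for record in records] for name in names}

def dictionary_encode(values):
    """Replace repeated values with small integer codes plus a lookup table.

    Intended for a column of strings; values must be hashable.
    """
    table, codes, index = [], [], {}
    for value in values:
        if value not in index:
            index[value] = len(table)
            table.append(value)
        codes.append(index[value])
    return table, codes
```

With the records pivoted, a filter on `country` would read only that one list, which is the idea behind the "read only needed columns" row.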

The typical workflow

Many data pipelines use both: JSONL for the landing zone / staging area / streaming transport (append-friendly, debuggable in a text editor), then Parquet for the queryable storage after a daily or hourly compaction (columnar reads, dictionary compression). Tools like dbt, Airbyte, and Snowflake's external tables understand this pattern out of the box.

Decision flowchart

Is the data ONE document (config, single API response)?
  └── Yes → JSON
  └── No  → continue

Are records flat with no nesting?
  └── Yes → audience non-developers?
              └── Yes → CSV
              └── No  → continue
  └── No  → continue

Is the data at-rest for analytical queries (warehouse, BI)?
  └── Yes → Parquet
  └── No  → JSONL
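
The flowchart can also be written as a small function. This is a sketch of the same three questions, with hypothetical parameter names; records that are not flat simply fall through to the next question.

```python
def pick_format(single_document, flat_records, non_developer_audience, analytical_at_rest):
    """Walk the three questions of the flowchart in order."""
    if single_document:
        return "JSON"
    if flat_records and non_developer_audience:
        return "CSV"
    if analytical_at_rest:
        return "Parquet"
    return "JSONL"
```

For example, nested event logs that are still being appended to answer "no" to all three questions and land on JSONL.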

Side-by-side: the same data in each format

JSON

{
  "users": [
    {"id": 1, "name": "Ada",     "active": true,  "tags": ["math", "code"]},
    {"id": 2, "name": "Babbage", "active": false, "tags": ["engine"]}
  ]
}

JSONL

{"id":1,"name":"Ada","active":true,"tags":["math","code"]}
{"id":2,"name":"Babbage","active":false,"tags":["engine"]}

CSV (lossy — arrays flattened)

id,name,active,tags
1,Ada,true,"math|code"
2,Babbage,false,engine

Parquet (binary; logical schema shown)

id      : int64
name    : utf8
active  : bool
tags    : list<utf8>

Row 0: 1, "Ada",     true,  ["math", "code"]
Row 1: 2, "Babbage", false, ["engine"]

Performance: what you can expect

Rough numbers on a modern laptop (M-series Mac, 16 GB RAM) for a 1 GB dataset of nested user records:

Format	File size	Full read	Filter on one column
JSONL (raw)	1.0 GB	~12 s	~12 s (full scan)
JSONL + gzip	~150 MB	~16 s	~16 s
JSONL + zstd -3	~120 MB	~10 s	~10 s
Parquet (snappy)	~180 MB	~6 s	~0.3 s (column-pruned)

The headline: Parquet wins for analytical filters because it can skip most of the file. JSONL wins for streaming, debuggability, and append-only workloads.
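
The "full scan" entries in the JSONL rows follow from how the file is read. The sketch below, using only the standard library, filters a gzip-compressed JSONL file on one field; the function names are hypothetical and it assumes every line is a JSON object. Every line has to be decompressed and parsed even though only one field is inspected.

```python
import gzip
import json

def write_jsonl_gz(path, records):
    """Write records as gzip-compressed JSONL."""
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def count_matching(path, field, value):
    """Count records where record[field] == value, streaming line by line."""
    matches = 0
    with gzip.open(path, "rt", encoding="utf-8", newline="\n") as f:
        for line in f:
            if line.strip() and json.loads(line).get(field) == value:
                matches += 1
    return matches
```

Memory use stays flat because only one line is held at a time, but the time cost grows with the whole file, not with the size of the column being filtered.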

FAQ

Can I use JSONL as an HTTP response body?

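Yes. Set Content-Type: application/x-ndjson and use chunked transfer encoding. Write one complete record per line and flush after each one. Clients should buffer until they see a newline rather than assume that chunks align with records, because proxies and HTTP libraries may re-chunk the stream. Line-delimited streaming is a common pattern for APIs that return many results. Note that the OpenAI and Anthropic streaming APIs use Server-Sent Events (text/event-stream), a related but different line-oriented framing, rather than NDJSON.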
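
On the client side, a defensive reader does not rely on how the bytes were chunked in transit. The sketch below is a minimal buffering parser; `iter_ndjson` is a hypothetical name, and `chunks` stands for whatever iterable of byte chunks your HTTP library provides.

```python
import json

def iter_ndjson(chunks):
    """Yield parsed records from byte chunks of arbitrary size."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the remainder.
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            if line.strip():
                yield json.loads(line)
    if buffer.strip():
        # Final record without a trailing newline.
        yield json.loads(buffer)
```

Splitting on the byte `\n` is safe for UTF-8 input because that byte never occurs inside a multi-byte character.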

Do databases support JSONL natively?

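Many do. BigQuery, Snowflake, DuckDB, and Athena all read JSONL directly. Postgres has no dedicated JSONL loader, but COPY can load one JSON value per line into a single json or jsonb column, provided you account for COPY's own handling of backslashes. MySQL has no COPY command; the usual routes are LOAD DATA INFILE into a staging column or the JSON import utility in MySQL Shell.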

Is JSONL slower than Parquet?

For analytical scans, yes: Parquet's column pruning and dictionary encoding give big wins on selective queries. For streaming, appending, debugging, and small-to-medium files, JSONL is usually faster end-to-end because there is no columnar encoding step on write and each record can be consumed as soon as its line arrives.

Can I store binary data (images, audio) in JSONL?

Only if you base64-encode it. JSON has no binary type. For real binary blobs, store them outside the JSONL and reference them by path or URL: base64 adds roughly 33% overhead and makes individual lines very large, which undermines line-by-line streaming.
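
The overhead figure comes from base64 mapping every 3 input bytes to 4 output characters. A quick check, using a hypothetical 3072-byte payload and hypothetical field names:

```python
import base64
import json

payload = bytes(range(256)) * 12           # 3072 bytes of sample binary data
encoded = base64.b64encode(payload).decode("ascii")
overhead = len(encoded) / len(payload)     # 4 output chars per 3 input bytes

# The whole blob ends up on one line of the JSONL file.
line = json.dumps({"name": "blob.bin", "data": encoded})
```

The encoded text never contains a newline, so the record is still valid JSONL; the cost is that a reader must hold the entire line, blob included, in memory to parse it.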

-- S.