What is JSONL?
JSONL — also called JSON Lines, NDJSON, or LDJSON — is a text format where every line is one independent JSON value. No enclosing array, no commas between records, no trailing punctuation. Each line is parsed on its own, which makes JSONL ideal for streaming, appending, log files, datasets larger than RAM, and anything that gets read line-by-line.
The shape, in three lines
{"id":1,"name":"Ada","email":"[email protected]"}
{"id":2,"name":"Babbage","email":"[email protected]"}
{"id":3,"name":"Hopper","email":"[email protected]"}
That is a complete, valid JSONL file. Three records, one per line. Compare with the same data as regular JSON:
[
{"id": 1, "name": "Ada", "email": "[email protected]"},
{"id": 2, "name": "Babbage", "email": "[email protected]"},
{"id": 3, "name": "Hopper", "email": "[email protected]"}
]
Same data, very different properties. With a typical whole-document parser (json.load in Python, JSON.parse in JavaScript), the JSON version has to be read entirely into memory before any record is accessible. The JSONL version can be read one line at a time, processed, and discarded, which is exactly what you want for files that don't fit in memory, for log streams, and for fine-tuning datasets that can be terabytes.
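A minimal sketch of that line-at-a-time reading pattern in Python, using only the standard library. The function name iter_records is made up for illustration, and io.StringIO stands in for a real file so the example is self-contained:

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_records(stream: TextIO) -> Iterator[Any]:
    """Yield one parsed value per line; only the current line is held in memory."""
    for line in stream:
        yield json.loads(line)


# io.StringIO stands in for an open file here;
# open("data.jsonl", encoding="utf-8") can be iterated the same way.
sample = io.StringIO(
    '{"id":1,"name":"Ada"}\n'
    '{"id":2,"name":"Babbage"}\n'
    '{"id":3,"name":"Hopper"}\n'
)

names = [record["name"] for record in iter_records(sample)]
print(names)
```

Because iter_records is a generator, each record can be processed and dropped before the next line is read.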
Quick facts
| Property | Value |
|---|---|
| Format name | JSON Lines |
| Common nicknames | JSONL, NDJSON, LDJSON, line-delimited JSON |
| File extensions | .jsonl, .ndjson, sometimes .json |
| MIME type | application/x-ndjson (de-facto); also application/jsonl in some ecosystems |
| Text encoding | UTF-8 (mandatory in the spec, almost universal in practice) |
| Line separator | \n (LF). Many parsers also accept \r\n (CRLF). |
| Final-line newline | Trailing \n after the last record is recommended but not required |
| Comments | Not allowed (same as JSON) |
| Top-level type per line | Any JSON value: object, array, string, number, true/false/null. Objects are by far the most common. |
Where the format came from
JSONL emerged organically from the need to stream JSON. In 2013 the JSON Lines website codified the conventions that had already been used for years in log shippers (Fluentd, Logstash), search engines (Elasticsearch's _bulk API), and ETL pipelines. NDJSON (Newline-Delimited JSON) is the same format under a different brand promoted by ndjson.org and the npm package family. LDJSON is the same idea with a different acronym. There is no functional difference between the three names — most parsers accept any. See JSONL vs NDJSON for the full naming-and-history breakdown.
Why people pick JSONL
- Streamable. You can start processing record 1 while record 1 000 000 is still being written. Regular JSON requires the closing ] before any record is parseable.
- Appendable. Adding a record is just appending a line. A regular JSON array would have to be rewritten or hacked with comma juggling.
- Resilient. One corrupted line doesn't poison the rest of the file. The parser can skip it and continue.
- Splittable. Any line boundary is a valid split point — convenient for MapReduce, Spark, and BigQuery's external tables.
- Diff-friendly. Line-based diff tools (git diff, diff, delta) understand JSONL natively. Diffing a single regular-JSON blob is messy.
- Plays well with Unix. head, tail, wc -l, grep, sort, uniq, and jq all work line-by-line, which is exactly how JSONL is structured.
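The "appendable" and "resilient" properties can be sketched in a few lines of Python. The helper names append_record and read_skipping_bad_lines and the file name events.jsonl are hypothetical, and skipping bad lines is one possible policy rather than something the format mandates:

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a single line; existing content is never rewritten."""
    with open(path, "a", encoding="utf-8", newline="\n") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_skipping_bad_lines(path):
    """Return (parsed records, line numbers that failed to parse)."""
    good, bad_line_numbers = [], []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            try:
                good.append(json.loads(line))
            except json.JSONDecodeError:
                bad_line_numbers.append(number)
    return good, bad_line_numbers


path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
append_record(path, {"id": 1})
with open(path, "a", encoding="utf-8", newline="\n") as f:
    f.write('{"id": 2, "name": \n')  # simulate a truncated write
append_record(path, {"id": 3})

records, bad = read_skipping_bad_lines(path)
print(records, bad)
```

The corrupted second line is reported by number, while the records on either side of it still load.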
Where JSONL is used today
- LLM fine-tuning datasets. OpenAI, Anthropic, Google Gemini, Meta Llama, and Mistral all accept JSONL as the canonical training format. Each row is one conversation or example.
- Application logs. Structured loggers (pino, winston, slog, zap, structlog) emit one JSON object per line so log aggregators can index without re-parsing.
- Search/ingest APIs. Elasticsearch and OpenSearch _bulk, Algolia batch indexing, Solr update/json: all line-delimited.
- Data lakes and warehouses. BigQuery and Snowflake accept JSONL as a native bulk-import format. AWS Athena and DuckDB both read it directly.
- Event streams. Kafka, Kinesis, and Pub/Sub messages serialized as JSON arrive as one-per-line when persisted.
- HuggingFace datasets. Most modern dataset releases ship as compressed .jsonl.gz or .jsonl.zst.
- HTTP streaming. NDJSON is the standard payload for Server-Sent Events alternatives and chunked-response APIs.
The rules in one paragraph
One JSON value per line. Lines separated by \n. Each line, on its own, is valid JSON. The file as a whole is not a single JSON document: handing it to a standard JSON parser in one call fails, so you have to split on newlines first and parse each line. Blank lines are not in the spec but most parsers treat them as no-ops. There is no requirement that every record have the same shape (heterogeneous JSONL is valid), but in practice most consumers assume a consistent schema and many will fail noisily if you mix shapes.
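As a rough illustration of these rules, the Python sketch below feeds a small heterogeneous JSONL string to the standard json module twice: once whole, once line by line. Skipping blank lines here mirrors the tolerant behavior described above and is a choice of this example, not a requirement of the format:

```python
import json

# Three values of different shapes, plus one blank line.
text = '{"id": 1}\n\n[1, 2, 3]\n"just a string"\n'

# Parsing the whole file as one JSON document fails.
try:
    json.loads(text)
    whole_file_ok = True
except json.JSONDecodeError:
    whole_file_ok = False

# Splitting on newlines first, then parsing each non-blank line, works.
values = [json.loads(line) for line in text.split("\n") if line.strip()]

print(whole_file_ok)
print(values)
```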
That's the format. For the formal rules and the edge cases parsers disagree on, see the JSONL specification. For a side-by-side breakdown of when to pick JSONL vs regular JSON vs CSV vs Parquet, see JSONL vs JSON vs NDJSON.
The "JSONL vs NDJSON vs LDJSON" question
All three names refer to the same format. In day-to-day usage:
- JSONL / JSON Lines: the most common name in the Python, ML, and data-engineering world. The extension .jsonl is what HuggingFace, OpenAI, and most dataset providers use.
- NDJSON: the name preferred in JavaScript and the npm ecosystem. The extension .ndjson shows up in logging and observability tools. See What is NDJSON.
- LDJSON: historical synonym, rarely used today.
If you receive a file labeled .jsonl, .ndjson, or .ldjson, treat them the same. Trying to convert between them is a no-op — use the NDJSON → JSONL normalizer if you just need to rename and tidy line endings.
A 60-second tour with our tools
Everything below runs in your browser — nothing uploads:
- Viewer — paste or drop a JSONL file and see every record laid out with a tree view.
- Validator — flags every malformed line with its exact error.
- Auto-fixer — repairs trailing commas, single quotes, smart quotes, BOMs, comments, and other common corruptions.
- JSONL → CSV — flatten nested keys and download a spreadsheet-ready file.
- jq query playground — slice, filter, and reshape with jq syntax, in-browser.
- Schema inferrer — emit a JSON Schema describing every field of an unfamiliar dataset.
FAQ
Is JSONL the same as JSON?
No. JSON is a single document. JSONL is a stream of independent JSON documents, one per line, with no enclosing array. A JSONL file is not parseable as a JSON document; you must split on \n first and then parse each line.
What file extension should I use?
.jsonl is the most widely recognized today, including by OpenAI, HuggingFace, and most data-engineering tools. .ndjson is also fine and commonly used in JavaScript ecosystems. Avoid plain .json for line-delimited content — it leads to parsers assuming a single document and failing.
What MIME type should the server send?
application/x-ndjson is the most widely accepted and what tools like curl and httpie recognize. Some ecosystems use application/jsonl or application/json-lines. text/plain works in a pinch but loses type information.
Can JSONL be pretty-printed?
No — pretty-printed JSON contains newlines inside records, which would break the "one JSON per line" rule. If you have a pretty-printed JSON object spanning multiple lines, it is not JSONL. Use the formatter to flip between pretty-printed JSON and JSONL.
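If no formatter is at hand, the same conversion can be approximated with Python's standard json module. This is a sketch under the assumption that the input is a JSON array small enough to load in one go; the sample data is invented for the example:

```python
import json

pretty = """[
  {"id": 1, "name": "Ada"},
  {"id": 2, "name": "Babbage"}
]"""

records = json.loads(pretty)

# json.dumps without indent emits a single line; newlines inside
# string values are escaped as \n, so each record stays on one line.
jsonl = "".join(
    json.dumps(record, separators=(",", ":"), ensure_ascii=False) + "\n"
    for record in records
)

# Going back: parse each line, then pretty-print the list.
round_trip = json.dumps(
    [json.loads(line) for line in jsonl.splitlines()], indent=2
)

print(jsonl, end="")
```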
Is a single JSON object also a valid JSONL file?
Yes, if it's all on one line and followed by a newline. A JSONL file with one record is just one line of JSON. The format doesn't require multiple records.
What about gzipped JSONL?
.jsonl.gz and .jsonl.zst are common for large datasets. JSONL compresses very well because repeated keys (every record likely has the same keys) deduplicate efficiently. Gzip typically shrinks JSONL by 80–95%. Tooling support varies: DuckDB reads .jsonl.gz directly, Python's gzip module opens it as a line-iterable text stream, and jq needs the file decompressed first (for example, piped through zcat or gunzip -c).
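A small sketch of writing and reading gzipped JSONL with Python's standard gzip module. The file name data.jsonl.gz and the generated rows are placeholders, and the compression ratio you see will depend on your data:

```python
import gzip
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.jsonl.gz")
rows = [{"id": i, "name": f"user{i}"} for i in range(1000)]

# "wt" opens the gzip stream in text mode, so lines are written as usual.
with gzip.open(path, "wt", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# "rt" decompresses on the fly; iteration is still line by line.
with gzip.open(path, "rt", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

raw_size = sum(len(json.dumps(row)) + 1 for row in rows)
compressed_size = os.path.getsize(path)
print(raw_size, compressed_size)
```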
— S., [email protected]