JSONL Specification
JSONL is not standardized by an RFC or ISO, but it has a de-facto specification with strong consensus across implementations. This page documents the rules in formal terms, calls out the spots where parsers diverge, and explains what conformant producers and consumers should do.
The five formal rules
- The file is a sequence of records, separated by line breaks.
- Each record is exactly one valid JSON value (RFC 8259), with no embedded unescaped newlines.
- The line break is \n (U+000A LINE FEED). Implementations should also accept \r\n (CRLF) on input.
- The file's text encoding is UTF-8 without a byte-order mark (BOM).
- A trailing newline after the last record is recommended but not required.
Grammar
In ABNF, with rules borrowed from RFC 8259 (JSON):
jsonl-file = *(record LF) [record [LF]]
record = json-value ; one RFC 8259 value, encoded on a single line
LF = %x0A ; the line feed character
json-value = false / null / true / object / array / number / string
; (full definition from RFC 8259 §3)
The grammar deliberately allows zero records (an empty file is valid JSONL) and allows the file to end without a trailing newline.
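As a rough illustration of the grammar, the sketch below splits a byte string into raw records. The function name split_records is hypothetical, and the code only separates records; it does not validate that each one is a JSON value.

```python
import json


def split_records(data: bytes) -> list[bytes]:
    """Split a JSONL byte string into raw records, following the grammar above."""
    if not data:
        return []  # zero records: an empty file is valid JSONL
    lines = data.split(b"\n")
    if lines[-1] == b"":
        lines.pop()  # the input ended with LF, so the final piece is empty
    return lines


# The last record has no trailing newline, which the grammar permits.
records = split_records(b'{"id": 1}\n{"id": 2}')
print([json.loads(r) for r in records])  # [{'id': 1}, {'id': 2}]
```

Splitting on the LF byte is safe in UTF-8 because 0x0A never occurs inside a multi-byte sequence.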
What "valid JSON per line" actually means
Each line, in isolation, must parse with JSON.parse (JavaScript), json.loads (Python), or any other RFC 8259-conforming parser. That means:
- Strings are double-quoted; single quotes are not allowed.
- Object keys are strings — no unquoted bare-word keys.
- No trailing commas inside objects or arrays.
- No comments (// or /* */).
- No NaN, Infinity, or -Infinity; these are not valid JSON numbers. Use null or a sentinel string.
- No undefined; use null when a field is missing.
- Numbers cannot have leading zeros (0123 is invalid; 0.123 is fine).
- Unicode escapes use \uXXXX hex; when escaping a code point above U+FFFF, a surrogate pair is required.
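One caveat when relying on a general-purpose parser: Python's json.loads accepts NaN, Infinity, and -Infinity by default, so it is more lenient than RFC 8259 on that point. The sketch below, with the hypothetical helper parse_strict, uses the parse_constant hook to reject them; the other rules in the list are already enforced by the standard parser.

```python
import json


def reject_constant(name: str):
    # json.loads calls this hook for the literals NaN, Infinity, and -Infinity.
    raise ValueError(f"{name} is not valid JSON")


def parse_strict(line: str):
    """Parse one line, rejecting the non-standard constants Python accepts by default."""
    return json.loads(line, parse_constant=reject_constant)


samples = ['{"x": 1}', '{"x": NaN}', "{'x': 1}", '{"x": 1,}', '{"x": 0123}']
for line in samples:
    try:
        print("ok     ", parse_strict(line))
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        print("invalid", line, "->", exc)
```

Only the first sample parses; the rest fail on NaN, single quotes, a trailing comma, and a leading zero respectively.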
Encoding
JSONL is UTF-8. The historical JSON spec allowed UTF-16 and UTF-32 too, but every modern producer and consumer uses UTF-8. Tools that emit BOM (EF BB BF at the start of the file) cause more problems than they solve — many parsers treat the BOM as part of the first record's first character. Do not emit a BOM. Consumers should be tolerant: strip a leading BOM silently if present.
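A tolerant consumer can drop the BOM before parsing. This is a minimal sketch with a hypothetical helper name, strip_bom, operating on bytes already read from the input:

```python
import codecs
import json


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


raw = codecs.BOM_UTF8 + b'{"id": 1}\n'
first_line = strip_bom(raw).decode("utf-8").rstrip("\n")
print(json.loads(first_line))  # {'id': 1}
```

When reading from a file in text mode, opening it with encoding="utf-8-sig" has the same effect in Python.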
Non-ASCII characters are valid two ways:
- Encoded directly as their UTF-8 bytes: "Ω" is the two bytes CE A9.
- Escaped as \uXXXX: "\u03A9".
Both are equivalent. Most producers emit raw UTF-8 for readability and only escape control characters and quotes; ASCII-only output is achievable by escaping every non-ASCII codepoint.
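In Python's json module the choice between the two forms is the ensure_ascii flag. The record below is made up for illustration:

```python
import json

record = {"symbol": "Ω"}

raw = json.dumps(record, ensure_ascii=False)  # keeps Ω as a literal character
escaped = json.dumps(record)                  # default ensure_ascii=True escapes it

print(raw)      # {"symbol": "Ω"}
print(escaped)  # {"symbol": "\u03a9"}

# Both lines decode to the same value.
assert json.loads(raw) == json.loads(escaped)
```

Python writes the hex digits in lowercase; escapes are case-insensitive, so \u03a9 and \u03A9 denote the same character.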
Line endings
| Sequence | Spec status | What conformant parsers do |
|---|---|---|
| \n (LF, 0x0A) | Canonical | Always accept |
| \r\n (CRLF, 0x0D 0x0A) | Tolerated on input | Accept, normalize to \n on output |
| \r alone (CR, 0x0D) | Discouraged | Some parsers split on it (Python's splitlines), most don't. Do not produce. |
| U+2028 / U+2029 (line/paragraph separators) | Not a record boundary | Legal unescaped inside a JSON string (RFC 8259 §7 only requires escaping quotes, backslashes, and U+0000 through U+001F), but escaping them as \u2028 / \u2029 avoids trouble with tools that split on them. |
Producers should always emit \n. Consumers should accept \n and \r\n. The auto-fixer normalizes line endings to LF.
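The choice of splitting function matters on the consumer side. The sketch below, using a hypothetical iter_lines helper, splits on LF only and drops one trailing CR; it also skips empty lines, which is a policy choice rather than part of the rules above. It is contrasted with str.splitlines, which also breaks on a lone CR and on U+2028 / U+2029.

```python
def iter_lines(text: str):
    """Yield records split on LF only, tolerating CRLF by dropping one trailing CR."""
    for line in text.split("\n"):
        if line.endswith("\r"):
            line = line[:-1]
        if line:  # skip empty lines, including the piece after a final LF
            yield line


# The first record contains a raw U+2028 inside a JSON string and ends with CRLF.
text = '{"a": "x\u2028y"}\r\n{"b": 2}\n'

print(len(list(iter_lines(text))))  # 2
print(len(text.splitlines()))       # 3: the first record was cut at U+2028
```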
Whitespace and blank lines
The spec is silent on blank lines, but in practice:
- Trailing whitespace on a line — silently ignored by every conformant parser.
- Blank lines (containing only whitespace, or empty) — not in the spec; most parsers silently skip them. Some (strict) parsers raise an error.
- Leading whitespace inside a record — fine; it's part of the JSON value's own whitespace handling.
- Comments preceded by # or //: never allowed. Strip before parsing.
To stay maximally compatible: produce no blank lines, and consume tolerantly (skip whitespace-only lines).
Record content rules
Any RFC 8259 value is allowed per line. In practice:
- Object per line (95%+ of real-world JSONL). One self-contained record. This is what every ETL tool, log shipper, and ML training format assumes.
- Array per line. Valid but rare. Sometimes used for fixed-shape rows ([1, "Ada", "[email protected]"]) where the schema is known.
- Scalar per line (number, string, bool, null). Valid but uncommon. Sometimes seen for token-per-line LLM outputs or single-value log fields.
- Mixed types across lines. Valid per the spec. Almost no consumer expects this — strict tools will reject it, ML pipelines will produce wrong results. Stick to one shape per file.
The maximum size of a single record is bounded by your parser. Most modern parsers handle multi-megabyte records, but a very large nested object packed onto one line is hard to read and inspect; consider splitting it into multiple smaller records keyed by parent ID instead.
MIME type
There is no IANA-registered MIME type for JSONL. The de-facto choices, in order of recognition:
- application/x-ndjson: most widely recognized by HTTP clients (curl, httpie, Postman), libraries, and CDNs. Recommended for HTTP.
- application/jsonl: emerging convention used by some newer APIs.
- application/json-lines: used by a handful of services.
- application/json: do not use this for JSONL. Clients will try to parse the whole body as a single document and fail.
- text/plain: works in a pinch (downloads as a text file) but loses type info.
For HTTP streaming (chunked transfer-encoded responses), application/x-ndjson is the standard.
Streaming and chunked transfers
One of JSONL's main advantages is streamability. For HTTP:
- Use Transfer-Encoding: chunked (or HTTP/2's framing) to deliver records as they become available.
- Set Content-Type: application/x-ndjson.
- Each chunk should end on a \n boundary so the consumer can parse complete records as bytes arrive.
- Producers should flush after each record (or every N records) to avoid the consumer waiting for buffer fills.
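On the consuming side it is safer not to assume that chunks arrive aligned to record boundaries, since intermediaries may re-chunk a response. A minimal sketch of an incremental decoder follows; the class name NDJSONDecoder and its feed/close methods are hypothetical, not part of any library.

```python
import json


class NDJSONDecoder:
    """Buffer arbitrary byte chunks and yield each record once its LF has arrived."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk: bytes):
        # Generator: the buffer is only updated while the caller iterates.
        self._buffer += chunk
        # Everything before the last LF is complete; the remainder stays buffered.
        *complete, self._buffer = self._buffer.split(b"\n")
        for line in complete:
            line = line.strip()  # also removes the CR of a CRLF ending
            if line:
                yield json.loads(line)

    def close(self):
        """Yield the final record if the stream ended without a trailing LF."""
        tail, self._buffer = self._buffer.strip(), b""
        if tail:
            yield json.loads(tail)


decoder = NDJSONDecoder()
# The second record is split across two chunks; the third has no trailing LF.
for chunk in [b'{"id": 1}\n{"i', b'd": 2}\n', b'{"id": 3}']:
    for record in decoder.feed(chunk):
        print(record)
for record in decoder.close():
    print(record)
```

The same decoder could be fed from whatever chunk iterator an HTTP client exposes; that wiring is left out here because it depends on the client library.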
For file IO: parsers should read line-by-line, parse, and emit (or callback) per record, rather than slurping the whole file. Python's for line in open(file), Node's readline module, Go's bufio.Scanner, and Rust's BufReader::lines are the canonical patterns.
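For Python, the line-by-line pattern might look like the sketch below. The function name read_jsonl and the file name events.jsonl are illustrative. Passing newline="\n" is a deliberate choice so that only LF ends a line; Python's default universal-newline mode would also treat a lone CR as a line break.

```python
import json
import os
import tempfile
from typing import Iterator


def read_jsonl(path: str) -> Iterator[object]:
    """Lazily yield one parsed record per line, without loading the whole file."""
    with open(path, encoding="utf-8", newline="\n") as handle:
        for line in handle:
            line = line.strip()  # drops the LF, a CR from CRLF, and stray spaces
            if line:             # skip blank lines
                yield json.loads(line)


# Demonstration with a throwaway file that mixes CRLF and LF endings.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.jsonl")
    with open(path, "wb") as handle:
        handle.write(b'{"id": 1}\r\n{"id": 2}\n')
    for record in read_jsonl(path):
        print(record)
```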
Compression conventions
- .jsonl.gz: gzip. Most universally supported.
- .jsonl.zst: Zstandard. Better compression ratios and faster decompression; supported by DuckDB, recent Pandas, and most modern data tools.
- .jsonl.bz2: bzip2. Slower; rarely worth it over zstd.
- .jsonl.xz: LZMA. Best ratio; slowest. Common for archival.
Gzip's framing supports concatenation, so you can cat a.jsonl.gz b.jsonl.gz > combined.jsonl.gz and consumers that read multi-member gzip will see one continuous stream. The same trick works for plain JSONL via cat, provided each file ends with a trailing newline.
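The concatenation property can be checked in a few lines of Python, whose gzip.decompress reads multi-member data. The two in-memory payloads stand in for the files a.jsonl.gz and b.jsonl.gz:

```python
import gzip

a = gzip.compress(b'{"id": 1}\n')
b = gzip.compress(b'{"id": 2}\n')

# Byte-level concatenation, equivalent to `cat a.jsonl.gz b.jsonl.gz`.
combined = a + b

print(gzip.decompress(combined).decode("utf-8"), end="")
# {"id": 1}
# {"id": 2}
```

Each payload ends with a newline, so the decompressed stream stays one record per line. Without it, the last record of the first file and the first record of the second would land on the same line.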
What the spec deliberately doesn't say
- Field order. JSON objects are unordered; JSONL inherits this. If you need deterministic field order (for diffs, hashing, reproducibility), sort keys at write time — but consumers must not assume order.
- Required fields. JSONL doesn't define schema. Use JSON Schema inferrer + JSON Schema validator for that.
- Field types. No annotations beyond JSON's own (string/number/bool/null/object/array). For typed schemas, layer JSON Schema on top, or use Parquet for typed columnar storage.
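- Maximum line length. Parser-dependent. Many parsers handle very long lines, but some impose default limits (Go's bufio.Scanner stops at 64 KiB per line unless you enlarge its buffer), and downstream tools (text editors, grep) may choke.
- Duplicate keys in objects. RFC 8259 says the behavior of software that receives such an object is "unpredictable." Most parsers keep the last-occurring value. Don't produce duplicates.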
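Two of these points are easy to handle in code. The sketch below shows Python keeping the last value for a duplicated key, a hypothetical reject_duplicates hook that turns duplicates into an error, and sort_keys for deterministic field order at write time.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated key instead of keeping the last value."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


line = '{"id": 1, "id": 2}'
print(json.loads(line))  # {'id': 2}: the last value wins by default

try:
    json.loads(line, object_pairs_hook=reject_duplicates)
except ValueError as exc:
    print(exc)  # duplicate key: 'id'

# Sorting keys on output gives stable lines for diffs and hashing.
print(json.dumps({"b": 1, "a": 2}, sort_keys=True))  # {"a": 2, "b": 1}
```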
Conformance checklist for producers
- Emit UTF-8 without BOM.
- Use \n line endings.
- One JSON value per line, no embedded raw newlines inside records.
- No blank lines, no comments, no trailing commas.
- Same record shape every line.
- Trailing newline after the last record (so wc -l matches record count).
- For HTTP: Content-Type: application/x-ndjson, flush per record.
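A producer following this checklist could be sketched as below. The function name write_jsonl is hypothetical, and the sample records are invented. json.dumps without an indent argument never emits a raw newline, so each record stays on one line, and allow_nan=False makes NaN and Infinity an error instead of invalid output.

```python
import io
import json
from typing import Iterable, TextIO


def write_jsonl(records: Iterable[object], out: TextIO) -> None:
    """Write one JSON value per line, LF-terminated, flushing after each record."""
    for record in records:
        line = json.dumps(record, ensure_ascii=False, allow_nan=False)
        out.write(line + "\n")  # every record, including the last, ends with LF
        out.flush()


buffer = io.StringIO()
write_jsonl(
    [{"id": 1, "note": "line one\nline two"}, {"id": 2, "note": "Ω"}],
    buffer,
)
print(buffer.getvalue(), end="")
# {"id": 1, "note": "line one\nline two"}
# {"id": 2, "note": "Ω"}
```

When writing to a real file, opening it with encoding="utf-8" and newline="\n" avoids both a BOM and the CRLF translation Python applies by default on Windows.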
Conformance checklist for consumers
- Strip leading BOM if present.
- Accept \n and \r\n line endings.
- Skip blank lines silently (or error explicitly with a clear message).
- Parse each line independently — don't fail the whole file because one line is malformed; collect errors with line numbers.
- Stream-parse rather than slurp.
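Put together, a tolerant consumer might look like the following sketch. The function name parse_jsonl and the shape of the tuples it yields are illustrative choices; it takes any iterable of byte lines, such as a file opened in binary mode, which Python splits on LF only.

```python
import codecs
import io
import json
from typing import Iterable, Iterator, Optional, Tuple


def parse_jsonl(lines: Iterable[bytes]) -> Iterator[Tuple[int, object, Optional[str]]]:
    """Yield (line_number, record, error); error is None when the line parsed."""
    for number, raw in enumerate(lines, start=1):
        if number == 1 and raw.startswith(codecs.BOM_UTF8):
            raw = raw[len(codecs.BOM_UTF8):]  # strip a leading BOM
        line = raw.strip()  # drops LF, the CR of CRLF, and surrounding spaces
        if not line:
            continue  # blank line: skipped silently
        try:
            yield number, json.loads(line), None
        except ValueError as exc:  # covers JSON syntax errors and invalid UTF-8
            yield number, None, str(exc)


# BOM, CRLF, a blank line, and one malformed record (trailing comma on line 3).
stream = io.BytesIO(b'\xef\xbb\xbf{"id": 1}\r\n\n{"id": 2,}\n{"id": 3}\n')
for number, record, error in parse_jsonl(stream):
    print(number, record if error is None else f"ERROR: {error}")
```

Lines 1 and 4 parse, line 2 is skipped, and line 3 is reported with its line number while processing continues. A fail-fast consumer would raise on the first error instead.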
Run any file through the validator to check it against these rules, or the auto-fixer to make it conformant.
FAQ
Is there an official RFC for JSONL?
No. There is no IETF RFC or ISO standard. The closest thing to a canonical specification is jsonlines.org, which codifies the conventions. The format relies on RFC 8259 (JSON) for the per-line rules.
Can a record span multiple lines?
No. Records are delimited by line breaks. Any embedded newline inside a string value must be escaped as \n. Multi-line records would break every parser.
How do consumers handle a malformed line in the middle of a file?
The robust pattern is: log the line number and error, then continue with the next line. Failing the whole file is brittle for large datasets. Strict ETL pipelines may choose to fail-fast — both approaches are valid; the producer-consumer contract should make the choice explicit.
Does the format support comments?
No. JSON forbids comments and JSONL inherits that. If you need annotations, put them in a dedicated field like "_comment" per record. The auto-fixer can strip // and /* */ from files that incorrectly include them.
Why not just use a JSON array?
Because a JSON array is hard to stream and hard to append to. The [ must come first, every record except the last must be followed by a comma, and the ] must come at the end. None of that is friendly to appending or to consumers that want to process records as they arrive. JSONL solves both with one rule change.
— S., [email protected]