jsonlkit.com
JSONL (JSON Lines) utilities, in the browser

JSONL Best Practices

Performance, gotchas, libraries · updated 21 May 2026

JSONL is a simple format with a small number of recurring pitfalls. This page is the field guide for using it well — covering streaming, large files, compression choices, encoding traps, schema evolution, and the canonical library for each language.

1. Always stream, never slurp

The whole point of JSONL is that you can read one record at a time. Don't load the file into memory.

Python

# Good — streams line-by-line, constant memory
import json
with open('events.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        record = json.loads(line)
        process(record)

# Bad — reads everything, blows up on a 50 GB file
with open('events.jsonl') as f:
    records = [json.loads(l) for l in f.readlines()]

Node.js

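// Good: node:readline streams line by line
import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

const rl = createInterface({
  input: createReadStream('events.jsonl', 'utf-8'),
  crlfDelay: Infinity, // treat \r\n as a single line break
});

for await (const line of rl) {
  if (!line.trim()) continue; // skip blank lines
  const record = JSON.parse(line);
  // handleRecord is your own function; `process` is a Node global object, not a callable
  handleRecord(record);
}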

Go

// Good: bufio.Scanner streams; raise the buffer limit for long lines
f, err := os.Open("events.jsonl")
if err != nil {
    log.Fatal(err)
}
defer f.Close()
s := bufio.NewScanner(f)
s.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // lines up to 16 MB
lineNum := 0
for s.Scan() {
    lineNum++
    var rec map[string]any
    if err := json.Unmarshal(s.Bytes(), &rec); err != nil {
        log.Printf("line %d: %v", lineNum, err)
        continue
    }
    process(rec)
}
if err := s.Err(); err != nil { // e.g. bufio.ErrTooLong when a line exceeds the limit
    log.Fatal(err)
}

Rust

use std::io::{BufRead, BufReader};
use std::fs::File;
let f = File::open("events.jsonl")?;
for line in BufReader::new(f).lines() {
    let line = line?;
    let rec: serde_json::Value = serde_json::from_str(&line)?;
    process(rec);
}

2. Compress for storage and transport

JSONL compresses extremely well because keys repeat every record. Typical ratios:

| Codec | Ratio | Decompress speed | When to use |
|---|---|---|---|
| gzip (level 6) | 5–10× | Medium | Universal compatibility, default everywhere |
| zstd (level 3) | 5–12× | Fast | Modern stacks (DuckDB, Pandas, ClickHouse); strongly recommended |
| zstd (level 19) | 8–20× | Fast | Archival; small files, cold storage |
| brotli | 6–15× | Medium | HTTP-static delivery (Cloudflare, browsers) |
| xz / LZMA | 10–25× | Slow | Cold archives where compute is cheap |

Recommendation: Use zstd for working files, gzip for cross-tool compatibility, xz only for long-term cold archives. Most tooling reads .jsonl.gz and .jsonl.zst natively without decompressing the whole file first.

Concatenation trick: gzip (multi-member files) and zstd (multi-frame files) both allow compressed files to be joined byte-for-byte:

# Concatenate compressed JSONL files without decompressing
cat 2026-05-21-*.jsonl.gz > day.jsonl.gz
# Consumers see one continuous stream. Works for zstd too.
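
The gzip half of this trick can be checked with Python's standard library alone. This is a minimal sketch on in-memory data; `gzip_jsonl` and `iter_jsonl_gz` are hypothetical helper names chosen for illustration, not part of any library.

```python
import gzip
import io
import json

def gzip_jsonl(records):
    """Serialize records as JSONL and compress them into one gzip member."""
    text = "".join(json.dumps(r) + "\n" for r in records)
    return gzip.compress(text.encode("utf-8"))

def iter_jsonl_gz(blob):
    """Stream records out of gzip bytes; Python's reader continues across members."""
    with gzip.open(io.BytesIO(blob), "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

# Byte-level concatenation, the in-memory equivalent of `cat a.jsonl.gz b.jsonl.gz`
part_a = gzip_jsonl([{"id": 1}, {"id": 2}])
part_b = gzip_jsonl([{"id": 3}])
combined = part_a + part_b

ids = [r["id"] for r in iter_jsonl_gz(combined)]
```

Some decoders are known to stop after the first member, so it is worth running a check like this against each consumer you depend on before relying on the trick.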

3. Set the right MIME type for HTTP

| Header | Use case |
|---|---|
| Content-Type: application/x-ndjson | HTTP requests and responses carrying JSONL |
| Content-Type: application/x-ndjson; charset=utf-8 | Explicit, the most defensive choice |
| Transfer-Encoding: chunked | Streaming; flush after each record |
| Content-Encoding: gzip / br | Compressed-on-the-wire transport |

Never send JSONL with Content-Type: application/json — clients will assume it's a single document and fail at the second record.
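
To make the header advice concrete, here is a small sketch of a WSGI application that labels its body as NDJSON and yields one encoded record per line. The app name `ndjson_app` and the placeholder records are assumptions for illustration; a real service would pull records from its own source.

```python
import json

def ndjson_app(environ, start_response):
    """Hypothetical WSGI app: one JSON record per line, labeled as NDJSON."""
    records = [{"id": 1, "ok": True}, {"id": 2, "ok": False}]  # placeholder data
    headers = [("Content-Type", "application/x-ndjson; charset=utf-8")]
    start_response("200 OK", headers)
    # Yield one encoded line at a time so the server can send records as they are produced
    return (json.dumps(r, ensure_ascii=False).encode("utf-8") + b"\n" for r in records)
```

For a quick local try-out, the standard library's `wsgiref.simple_server.make_server('', 8000, ndjson_app)` can host it. Whether each record is flushed to the client immediately depends on the server in front of the app.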

4. Handle malformed lines gracefully

Real-world JSONL files often have a few bad lines, especially when produced by ad-hoc loggers or by tools that crashed mid-write. The robust pattern is:

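import json
import logging

def read_jsonl(path):
    """Yield parsed records; collect bad lines instead of aborting."""
    errors = []
    with open(path, encoding='utf-8') as f:
        for line_num, line in enumerate(f, start=1):
            line = line.strip()
            if not line:  # blank line: skip silently
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                errors.append((line_num, str(e), line[:200]))
    # Runs once the consumer has exhausted the generator.
    # Decide: warn, log, or fail
    for line_num, msg, snippet in errors:
        logging.warning("line %d: %s: %r", line_num, msg, snippet)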

Don't blow up the whole import on a single bad row. Collect errors with line numbers, decide policy explicitly (warn / log / fail-fast). If you've inherited a file with widespread corruption, run it through the auto-fixer first — repairs trailing commas, single quotes, smart quotes, BOMs, comments, and the dozen other things that break naive parsers.

5. Encoding traps

6. Schema evolution

JSONL is schemaless by design, which is great for prototyping and brutal for production unless you have a strategy:

7. Field naming conventions

8. Sort keys for stable diffs

JSON object keys are unordered, but plenty of workflows (git diffs, SHA hashes for cache busting, reproducible builds) depend on deterministic byte output. Sort keys recursively at write time when stability matters:

# Python
json.dumps(record, sort_keys=True, ensure_ascii=False)

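# Node: use a replacer function so keys are sorted at every depth.
# (An array replacer such as Object.keys(record).sort() is an allowlist and drops nested keys.)
JSON.stringify(record, (_, v) =>
  v && typeof v === 'object' && !Array.isArray(v)
    ? Object.fromEntries(Object.keys(v).sort().map((k) => [k, v[k]]))
    : v)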

# jq filter applied to existing file
jq -c 'walk(if type == "object" then to_entries | sort_by(.key) | from_entries else . end)' \
   input.jsonl > sorted.jsonl
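
As a sketch of why sorted keys matter for hashing, the following shows two records that differ only in key order producing identical bytes and an identical digest. The helper names `canonical_line` and `record_digest` are hypothetical, and the compact separators are an extra assumption that pins down whitespace as well as key order.

```python
import hashlib
import json

def canonical_line(record):
    """Serialize with recursively sorted keys and fixed separators for stable bytes."""
    return json.dumps(record, sort_keys=True, ensure_ascii=False, separators=(",", ":"))

def record_digest(record):
    """SHA-256 over the canonical UTF-8 bytes of a record."""
    return hashlib.sha256(canonical_line(record).encode("utf-8")).hexdigest()

# Same content, different key order at both nesting levels
a = {"user": {"name": "Zoë", "id": 7}, "event": "login"}
b = {"event": "login", "user": {"id": 7, "name": "Zoë"}}

same_bytes = canonical_line(a) == canonical_line(b)
same_digest = record_digest(a) == record_digest(b)
```

This only stabilizes key order and whitespace. Float formatting and Unicode normalization are separate concerns that this sketch does not address.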

9. Handle big files with the right tool

| File size | Tool | Why |
|---|---|---|
| < 100 MB | This site (browser-based) | Loads in seconds, no install, all features available |
| 100 MB – 1 GB | This site or jq / DuckDB locally | Browser memory usually fits; jq for filters, DuckDB for SQL-style |
| 1 GB – 50 GB | jsonlkit CLI or jq | Streaming, line-by-line, never loads whole file |
| > 50 GB | DuckDB, Spark, or partition by date | Native parallel readers; consider Parquet for analytics |

10. Canonical libraries by language

| Language | Read / write | Validate | Query |
|---|---|---|---|
| Python | stdlib json + line iteration; orjson for speed | jsonschema, pydantic | jq (subprocess), jsonpath-ng, duckdb |
| JavaScript / Node | readline + JSON.parse; ndjson npm package | ajv | node-jq, jsonpath-plus |
| Go | encoding/json + bufio.Scanner | gojsonschema | itchyny/gojq |
| Rust | serde_json + BufRead | jsonschema crate | jaq (jq in Rust) |
| Java / Kotlin | Jackson JsonFactory + line stream | everit-json-schema | JsonPath (Jayway) |
| Shell | cat, head, tail, wc -l | jq empty per line (without -e, which exits non-zero when a filter produces no output); our validator | jq |

11. Useful shell one-liners

# Count records
wc -l events.jsonl

# First and last record
head -1 events.jsonl ; tail -1 events.jsonl

# Pretty-print one record
head -n 1 events.jsonl | jq .

# Filter by field
jq -c 'select(.user_id == 4287)' events.jsonl > user-4287.jsonl

# Extract one field across all records (TSV output)
jq -r '[.ts, .user_id, .event_type] | @tsv' events.jsonl

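# Validate every line (jq stops at the first parse error and exits non-zero;
# -e is omitted because it would report failure when the last record is null or false)
jq -c . events.jsonl > /dev/null && echo "all valid"

# Sort by a field (-s loads the whole file into memory; -c keeps one record per line)
jq -s -c 'sort_by(.ts) | .[]' events.jsonl > sorted.jsonl

# Dedupe by full line
sort -u events.jsonl

# Dedupe by a key (also slurps; output comes back sorted by that key)
jq -s -c 'unique_by(.event_id) | .[]' events.jsonl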

# Random sample of 1000 lines
shuf -n 1000 events.jsonl

# Split into 10 equal parts
split -n l/10 -d events.jsonl events_part_

12. Privacy: scrub PII before sharing

JSONL files often pick up personal data — emails, IP addresses, names, account IDs. Before sharing externally:

13. Producer-consumer contracts

Document these explicitly between teams:

14. Common mistakes

Treating .jsonl as a JSON array

The mistake: JSON.parse(fileContents) on a JSONL file. The fix: read line-by-line, parse each. Almost every "invalid JSON" error on a JSONL file is this.
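
The same failure is easy to reproduce in Python, shown here as a small sketch on an in-memory string. The whole-file parse stops after the first object, while the line-by-line parse succeeds.

```python
import json

jsonl_text = '{"id": 1}\n{"id": 2}\n'

# Whole-file parse: the decoder finishes the first object, then finds more data
try:
    json.loads(jsonl_text)
    whole_file_error = None
except json.JSONDecodeError as exc:
    whole_file_error = exc.msg

# Line-by-line parse: each non-empty line is its own JSON document.
# split("\n") is used rather than splitlines(), which also breaks on characters
# such as U+2028 that may legally appear inside JSON strings.
records = [json.loads(line) for line in jsonl_text.split("\n") if line.strip()]
```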

Pretty-printing JSONL

Pretty-printing introduces newlines inside records, which breaks the format. JSONL records are always on one line each. Use the formatter to flip between pretty JSON and JSONL.
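
If the site's formatter is not at hand, a pretty-printed JSON array can be flattened back into JSONL in a few lines. This is a sketch that assumes the whole array fits in memory; `array_to_jsonl` is a hypothetical helper name.

```python
import json

# A pretty-printed JSON array; the first note contains an escaped newline
pretty = """[
  {
    "id": 1,
    "note": "line one\\nline two"
  },
  {
    "id": 2,
    "note": "ok"
  }
]"""

def array_to_jsonl(text):
    """Convert a JSON array document into JSONL text, one compact record per line."""
    records = json.loads(text)
    # Without indent, json.dumps emits no raw newlines; newlines inside strings stay escaped
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)

jsonl = array_to_jsonl(pretty)
round_tripped = [json.loads(line) for line in jsonl.split("\n") if line]
```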

Trailing commas (LLM output)

LLMs love to insert a trailing comma (,) after the last property. Strict JSON rejects this. Run the file through the auto-fixer or strip the commas with sed.

Missing newline before EOF

Common with naively concatenated files. Symptom: wc -l is off-by-one; the last record may be silently dropped by some consumers. Always end with a \n.
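
One way to enforce the final newline before concatenating files is sketched below. `ensure_trailing_newline` is a hypothetical helper; it inspects only the last byte, so it stays cheap on large files.

```python
import os
import tempfile

def ensure_trailing_newline(path):
    """Append b"\\n" if a non-empty file lacks one. Returns True if the file was changed."""
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # empty file: nothing to terminate
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b"\n":
            return False  # already terminated
        f.seek(0, os.SEEK_END)
        f.write(b"\n")
        return True

# Demonstration on a temporary file whose last record has no newline
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.jsonl")
    with open(path, "wb") as f:
        f.write(b'{"id": 1}\n{"id": 2}')
    changed = ensure_trailing_newline(path)
    changed_again = ensure_trailing_newline(path)
    with open(path, "rb") as f:
        fixed = f.read()
```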

Mixed record shapes

Spec-legal but consumer-hostile. Stick to one shape per file. If you genuinely need heterogeneous records, add a discriminator field ("event_type":"...") and document it.

Using application/json for the MIME type

Clients will try to parse the whole body as one document. Use application/x-ndjson instead.

BOM in the file

The bytes EF BB BF (a UTF-8 byte order mark) at the very start of the file break parsing of the first record on naive parsers. Don't write a BOM; do strip it on read.
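
In Python, stripping on read can be as simple as choosing the `utf-8-sig` codec, which decodes UTF-8 and drops a leading BOM when one is present. The sketch below uses a temporary file for the demonstration; `read_jsonl_bom_safe` is a hypothetical helper name.

```python
import json
import os
import tempfile

def read_jsonl_bom_safe(path):
    """Yield records; 'utf-8-sig' removes a leading BOM and is harmless when none exists."""
    with open(path, "r", encoding="utf-8-sig") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

# Demonstration: a file that starts with EF BB BF
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bom.jsonl")
    with open(path, "wb") as f:
        f.write(b'\xef\xbb\xbf{"id": 1}\n{"id": 2}\n')
    bom_records = list(read_jsonl_bom_safe(path))
```

Opening the same file with plain `utf-8` would leave U+FEFF at the front of the first line, and `json.loads` rejects that line.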

15. Where to go from here

-- S.