CSV to JSONL Converter
Turn a CSV file (with a header row) into JSONL — one JSON object per row, ready for fine-tune uploads, BigQuery loads, or any tool that wants newline-delimited JSON. Numbers and booleans get auto-typed.
Before you start
You need a CSV file with a header row on line 1 — the column names become the JSON keys. Without that header, the output won't have meaningful field names. If your CSV is "headerless", open it in Excel or a text editor and prepend a row like col1,col2,col3 first.
The parser follows RFC 4180: quoted fields, escaped double-quotes ("" inside a quoted cell), embedded newlines inside quotes, and CRLF or LF line endings all work as expected. If your file uses semicolons or tabs (common for European exports or TSV dumps), pick the matching delimiter.
There's no hard size limit — everything happens in your browser's memory. For files over ~100 MB you'll want to switch to a CLI tool like mlr or csvkit.
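If you'd rather script the conversion for those larger files, the core of what this tool does fits in a few lines of Python — a rough offline sketch using the standard `csv` module (which also follows RFC 4180), not this tool's actual code. This version keeps every value as a string, like the "strings only" mode:

```python
import csv
import io
import json

def csv_to_jsonl(text: str, delimiter: str = ",") -> str:
    """Convert CSV text (header row first) to JSONL, one object per row.

    All values stay strings, like the "strings only" type mode.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return "\n".join(
        json.dumps(row, separators=(",", ":")) for row in reader
    )

# Quoted cells with embedded commas work, per RFC 4180.
print(csv_to_jsonl('a,b\n1,"x,y"\n2,z'))
```

`csv.DictReader` reads the header row automatically and uses it for the keys, which is exactly the contract described above: no header, no meaningful field names.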
Why convert CSV to JSONL?
The biggest reason today: LLM fine-tuning datasets. People build training data in Google Sheets or Excel because that's where annotators work, then need to ship it to OpenAI, Anthropic, or HuggingFace as JSONL. CSV in, JSONL out, no scripting required.
Other places this conversion matters:
- BigQuery loads — JSONL is the recommended ingestion format for nested schemas; CSV doesn't support nesting at all.
- Vector databases — Pinecone, Weaviate, and Qdrant all consume NDJSON for bulk imports.
- Streaming pipelines — once your data is JSONL it can be split, filtered, and merged with simple `grep`/`sed`/`jq` without breaking a parser.
- Replaying API requests — a JSONL file is a natural job queue: one request per line.
How to use it
- Paste your CSV into the Input pane (header row first), or drop a `.csv` file onto the page.
- Pick your Delimiter — comma is the default, but Tab is safer when cells contain commas.
- Choose how aggressively to convert Types:
  - auto: `"42"` becomes `42`, `"true"` becomes `true`, `"null"` becomes `null`.
  - auto + blanks → null: same as auto, but empty cells are emitted as explicit `null` instead of being omitted.
  - strings only: every value stays a string. Use this for IDs that look like numbers but aren't (zip codes, SKUs, phone numbers).
- Decide whether headers like `user.name` should produce flat keys or get expanded into nested objects.
- Click Convert. Each data row becomes one minified JSON object on its own line.
- Copy or Download .jsonl when you're happy with the output.
Example
Input (CSV with dot.notation header):
id,user.name,user.age,tags
1,Ada,36,"eng,lead"
2,Linus,54,eng
Output (JSONL, types=auto, nesting=dot):
{"id":1,"user":{"name":"Ada","age":36},"tags":"eng,lead"}
{"id":2,"user":{"name":"Linus","age":54},"tags":"eng"}
Notice that the user.name and user.age columns got merged into a nested user object, and that id and age became real numbers, not strings. This is the round-trip pair to my JSONL to CSV tool — flatten with one, restore with the other.
Fine-tune dataset workflow
Most fine-tune datasets I've seen in the wild start as a Google Sheet with three columns: system, user, assistant. The fastest way to ship that to OpenAI is:
- File → Download → CSV from Sheets.
- Paste here. The output will be one object per line: `{"system":"…","user":"…","assistant":"…"}`.
- If you need OpenAI's `messages` shape, do a quick reshape with `jq` or a small script — the OpenAI Fine Tune Validator will tell you exactly what's missing.
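That reshape step is mechanical. Here's a hedged Python sketch — it assumes your flat JSONL has exactly the three keys `system`, `user`, and `assistant`, and targets the `{"messages": […]}` chat format OpenAI's fine-tune endpoint expects; adapt the key names to your sheet:

```python
import json

def to_messages(line: str) -> str:
    """Reshape a flat {"system","user","assistant"} object
    into the {"messages": [...]} chat fine-tune format."""
    row = json.loads(line)
    messages = [
        {"role": "system", "content": row["system"]},
        {"role": "user", "content": row["user"]},
        {"role": "assistant", "content": row["assistant"]},
    ]
    return json.dumps({"messages": messages}, separators=(",", ":"))

flat = '{"system":"Be terse.","user":"Hi","assistant":"Hello."}'
print(to_messages(flat))
```

Run it over the whole file with a loop (one call per line), then validate before uploading.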
Options explained
Delimiter
Pick whatever your file uses. Comma is the default. Tab (TSV) is the safest choice if your cells contain commas, prose, or JSON snippets. Semicolon is common in European locales where the decimal separator is a comma.
Types
CSV is fundamentally string-based — there's no way to tell from the bytes alone whether 007 is the number seven or a James Bond reference. The "auto" mode does what most people want: clean integers, decimals, and the literal words true/false/null get converted. Anything ambiguous stays a string. Use "strings only" when you have IDs that look numeric but must stay as strings (zip codes, phone numbers, leading-zero SKUs).
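The conservative heuristic described above can be sketched like this — an illustrative approximation of the rules documented on this page, not the tool's actual source:

```python
import re

# Strict patterns: no leading zeros, no spaces, no thousands separators.
INT_RE = re.compile(r"^-?(0|[1-9]\d*)$")
FLOAT_RE = re.compile(r"^-?(0|[1-9]\d*)\.\d+$")

def auto_type(cell: str):
    """Convert only unambiguous values; everything else stays a string."""
    if cell == "true":
        return True
    if cell == "false":
        return False
    if cell == "null":
        return None
    if INT_RE.match(cell):
        return int(cell)
    if FLOAT_RE.match(cell):
        return float(cell)
    return cell  # "007", "1,000", " 42 " all stay strings

print([auto_type(c) for c in ["42", "007", "true", "3.14", "1,000"]])
```

Note the anchored regexes: `007` fails the integer pattern because of the leading zero, so a zip code survives the round trip intact.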
Nesting
"Flat keys" gives you exactly what's in the header — {"user.name": "Ada"}. "Expand dot.notation" interprets dots as nested keys: {"user": {"name": "Ada"}}. Pick the second one if your CSV came from my JSONL to CSV flattener and you want the original structure back.
Tips & common pitfalls
- Quote your commas. If a cell contains a comma, it must be wrapped in double quotes: `"hello, world"`. Inside that cell, escape any literal double-quote by doubling it: `"she said ""hi"""`.
- Numbers with leading zeros become strings. The auto-typer is conservative — `00123` stays a string because turning it into `123` would silently corrupt zip codes and IDs.
- Empty cells. By default I omit them from the JSON object so your output stays compact. Switch to "blanks → null" if a downstream consumer requires every key on every line.
- Excel encoding. Files saved from Excel on Windows are sometimes UTF-16 with a BOM. If your output looks like garbage characters, re-save the CSV as "CSV UTF-8".
- Header collisions. If two columns have the same name, the second one overwrites the first. The status bar warns you about duplicates.
Troubleshooting
Some of my "numbers" stayed as strings.
The auto-typer only converts strict-format numbers (no leading zeros, no spaces, no thousands separators). "1,000", "01", and " 42 " all stay as strings on purpose — silently turning them into numbers tends to corrupt real-world data.
My output has commas in weird places.
That usually means the parser got confused about which delimiter you're using, or a quoted cell wasn't closed. Open the input in a CSV-aware editor and look for an unmatched ".
The first JSON object is missing fields.
Check that line 1 of your input is the header row, not a data row. Without a header, the first row's values become field names and you lose that data.
I want OpenAI's messages shape.
This tool produces flat objects from CSV columns. To reshape into {"messages": [{role, content}, …]}, run the JSONL through jq or a small script, then validate with the OpenAI Fine Tune Validator to confirm the schema.
Related tools
See also: if you need to do something adjacent on this site, try Formatter to pretty-print or minify your JSONL, Viewer to scan and filter the converted rows, or JSONL to JSON to wrap the JSONL into a single array for an API payload.
Frequently asked questions
Does the first row have to be a header?
Yes. The header row supplies the JSON keys. Without it, you'd have {"1": "Ada", "2": 36} on row 1 and lose the meaning of each column. If your file has no header, prepend one in any text editor.
What about CSV files with embedded newlines inside cells?
Supported. The parser follows RFC 4180, so a quoted cell can span multiple physical lines. Each logical CSV row produces one JSONL line.
How do I get nested objects from a flat CSV?
Name your columns with dots — user.name, user.email, address.city — and switch the Nesting option to "expand dot.notation". The tool will rebuild the nested structure.
Are arrays supported?
Not natively. CSV cells are scalar by design. If you need arrays, store them as JSON strings inside a cell ("[1,2,3]") and post-process the JSONL afterwards. The matching JSONL to CSV tool flattens arrays into indexed columns (tags.0, tags.1), and this tool can rebuild those if you keep the same naming.
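That post-processing step is a one-liner per cell. A hedged sketch — it assumes array cells are valid JSON strings starting with `[`, and leaves anything else alone:

```python
import json

def parse_json_cells(line: str) -> str:
    """Post-process one JSONL line: parse string cells that hold JSON arrays."""
    row = json.loads(line)
    for key, value in row.items():
        if isinstance(value, str) and value.startswith("["):
            try:
                row[key] = json.loads(value)
            except json.JSONDecodeError:
                pass  # leave malformed cells untouched
    return json.dumps(row, separators=(",", ":"))

print(parse_json_cells('{"id":1,"tags":"[1,2,3]"}'))
```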
How can I do this on the command line?
mlr --c2j cat input.csv from Miller produces a JSON array; pipe through jq -c '.[]' to get JSONL. csvjson from csvkit works similarly.
Is my CSV uploaded?
No — the parser runs entirely in your browser. The data never leaves your machine, which is why this is a fine choice for sensitive datasets, customer lists, or training data with PII.