OpenAI Fine-Tune JSONL Validator
OpenAI fine-tune JSONL validator. Paste a file and see every line that will fail OpenAI's upload checks: missing messages, bad roles, broken tool_calls, content-type mismatches, examples over 16,385 tokens, the legacy prompt/completion shape, and the same seven validation errors the official data-prep cookbook looks for. Runs in your browser, up to 1 GB, nothing uploaded.
Your training data never leaves this tab. The file reaches OpenAI only when you upload it to start a job; this pre-flight check is fully local, which helps when the dataset has PII you don't want to round-trip twice.
Validate your file against the two formats OpenAI's training API actually accepts: the modern chat shape ({"messages":[…]}) used by gpt-4.1, gpt-4o, and gpt-4o-mini, and the legacy prompt / completion shape that still runs on babbage-002 and davinci-002. Every line is parsed, every message is checked for role, content, and structure, and the result is mapped to the exact error code OpenAI returns at upload — so you can fix the source data before the job fails.
For other providers we have dedicated pages: Anthropic (Claude), Google Gemini, Llama 3 / ShareGPT / Alpaca, Mistral.
What this tool does
It runs every check from OpenAI's official chat_finetuning_data_prep cookbook against your file, locally, before you upload — plus the upload-time errors the cookbook misses (token-limit overshoot, BOM-corrupted UTF-8, JSON-array wrapping). Each problem is mapped to a specific line number and the exact OpenAI error string you'd otherwise see in the dashboard.
The problem it solves: "I don't want to find out my training file is broken after the upload-and-queue cycle, or after I've paid for a botched run." An invalid_training_file error costs the upload time (minutes for big files), plus the queue wait, plus the round-trip to find which of 10,000 lines is broken. This page returns the same verdict in seconds, against the same rules, without sending your data anywhere.
When you'd reach for it
- Pre-flight before any fine-tune upload. Run it as the last step in your data-prep pipeline — same checks as the cookbook, but interactive.
- Diagnose an invalid_training_file rejection. Paste the file, every broken line is listed with a line number and a remediation hint.
- Migrate from prompt/completion to chat. Switch the Format dropdown to OpenAI chat and find the lines that still look like the legacy shape.
- Sanity-check Python-generated data. Catches str(dict) repr quotes, ensure_ascii mojibake, missing assistant turns: the usual silent corruptions.
- Filter out unsalvageable rows. Download valid examples only rebuilds a clean file with broken lines stripped, so you can keep training instead of debugging row by row.
- Sense-check size and cost. The summary shows total examples, average messages per example, and an approximate token count — useful before kicking off a billable run.
How validation works
The pipeline runs three passes per click of Validate.
1. Parse every line as JSON
Each non-empty line is parsed as a standalone JSON value. Parse failures are reported with the line number, the parser's column hint, and a suggested fix (Python-repr quotes, trailing commas, smart quotes, BOM). Blank lines are flagged separately rather than silently ignored — they're a common source of file contains 0 valid examples.
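A minimal Python sketch of this first pass, for readers who want to reproduce it in a script. It is not the page's actual implementation; the function name parse_jsonl and the error labels blank_line and invalid_json are hypothetical.

```python
import json


def parse_jsonl(text):
    """Parse JSONL text; return (records, errors) keyed by 1-based line number."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a single trailing newline is not a blank record

    records, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            # Blank lines are reported rather than silently skipped.
            errors.append((lineno, "blank_line", "empty line"))
            continue
        try:
            records.append((lineno, json.loads(line)))
        except json.JSONDecodeError as exc:
            # exc.colno is the parser's 1-based column hint.
            errors.append((lineno, "invalid_json", f"{exc.msg} at column {exc.colno}"))
    return records, errors
```

Splitting on "\n" rather than calling str.splitlines() is deliberate: splitlines() also breaks on characters such as U+2028, which may legally appear inside a JSON string.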
2. Check shape against the active format
For OpenAI chat: the line must be an object with a messages array of at least 2 entries; every message must have a valid role (system, developer, user, assistant, tool, function); every message must have content (or, for assistant turns, tool_calls / function_call); content arrays must contain typed parts; tool messages need tool_call_id; function messages need a string name; at least one assistant turn must exist somewhere in the example. For legacy prompt/completion: both fields must be non-empty strings.
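The chat-shape rules can be approximated in a few lines of Python. This is a sketch loosely modeled on the cookbook's error codes, not a copy of the cookbook script or of this page's code; check_chat_example is a hypothetical name, and rules such as the minimum of 2 messages or the tool_call_id requirement are left out for brevity.

```python
from collections import Counter

ALLOWED_KEYS = {"role", "content", "name", "function_call",
                "tool_calls", "tool_call_id", "weight", "refusal"}
ALLOWED_ROLES = {"system", "developer", "user", "assistant", "tool", "function"}


def check_chat_example(example):
    """Return a Counter of cookbook-style error codes for one parsed line."""
    errors = Counter()
    if not isinstance(example, dict):
        errors["data_type"] += 1
        return errors
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        errors["missing_messages_list"] += 1
        return errors

    for message in messages:
        if not isinstance(message, dict):
            errors["message_missing_key"] += 1
            continue
        if "role" not in message or "content" not in message:
            errors["message_missing_key"] += 1
        if any(key not in ALLOWED_KEYS for key in message):
            errors["message_unrecognized_key"] += 1
        if message.get("role") not in ALLOWED_ROLES:
            errors["unrecognized_role"] += 1
        # Empty content is tolerated only on turns that call a tool or function.
        calls_tool = bool(message.get("tool_calls") or message.get("function_call"))
        if not calls_tool and not message.get("content"):
            errors["missing_content"] += 1

    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        errors["example_missing_assistant_message"] += 1
    return errors
```

An empty Counter means the example passed every check in the sketch.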
3. Aggregate, count, estimate tokens
All errors across all lines are collected into one list with a line number per error — no early exit, so a single bad row doesn't hide the next thousand. Valid lines feed an approximate token count (chars ÷ 4 — fast, not exact; use the token counter when you need a real tiktoken number). The summary block reports total examples, valid examples, average messages per example, total characters, and the rough token estimate.
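As a rough illustration of the summary block, the sketch below computes the same headline figures from already-validated examples. It assumes the character count covers string content fields only, which may differ from what the page actually counts; summarize is a hypothetical name.

```python
def summarize(examples):
    """Summarize valid chat examples (dicts that each hold a "messages" list)."""
    total_chars = 0
    total_messages = 0
    for example in examples:
        for message in example["messages"]:
            total_messages += 1
            content = message.get("content")
            if isinstance(content, str):  # assumption: only string content is counted
                total_chars += len(content)

    count = len(examples)
    return {
        "examples": count,
        "avg_messages": total_messages / count if count else 0.0,
        "total_chars": total_chars,
        "approx_tokens": total_chars // 4,  # the chars ÷ 4 heuristic, not a tokenizer
    }
```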
The OpenAI fine-tune JSONL format, in 30 seconds
Each line is one independent JSON object. Each object has a messages array. Each message has a role and a content (with a few exceptions for tool calls). Every example must end on an assistant turn — that's the target the model learns to produce.
{"messages":[
{"role":"system","content":"You are a terse assistant."},
{"role":"user","content":"2+2?"},
{"role":"assistant","content":"4"}
]}
Allowed roles in 2026: system, developer, user, assistant, tool. The legacy function role still parses but the modern equivalent is tool.
Limits cheatsheet (2026)
| Item | Value |
|---|---|
| Minimum training examples | 10 (≈ 50 to see signal, ≥ 100 for production) |
| Maximum tokens per example | 16,385 (gpt-4o family); anything over is truncated or rejected |
| Maximum training file size | 512 MB per upload (most accounts) |
| File format | UTF-8 JSONL, one record per line, purpose=fine-tune |
| Last message of every example | must be assistant |
| Models with SFT (2026) | gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini |
| Models with RFT (reinforcement) | o4-mini |
| Legacy prompt/completion | babbage-002, davinci-002 only · new training disabled 28 Oct 2024 · existing FTs still callable |
What this validator checks
Every check from OpenAI's official chat-finetuning data-prep cookbook, plus a handful of upload-time errors that the cookbook misses. Each problem maps to an error code you can grep for:
| Cookbook error code | What it means | Fix |
|---|---|---|
| data_type | Line isn't a JSON object | Each line must be a single {…}, not an array, not a string |
| missing_messages_list | No messages key | Wrap the conversation in {"messages":[…]} |
| message_missing_key | A message lacks role or content | Add both; content may be null only when tool_calls is present on an assistant turn |
| message_unrecognized_key | Keys outside the allowed set (role, content, name, function_call, tool_calls, tool_call_id, weight, refusal) | Remove extras like solution, final_answer, metadata |
| unrecognized_role | Role is not system / developer / user / assistant / tool / function | Fix typos like asistant, usre |
| missing_content | Empty or missing content on a non-tool-call assistant turn | Either fill content or make it a tool_calls turn with content:null |
| example_missing_assistant_message | No assistant turn anywhere in the example | Add one; that's the target the model trains on |
How to use it
- Drop a .jsonl file into the dashed box, or paste records directly.
- Pick the format. Default is OpenAI chat. Switch to legacy prompt/completion only for old babbage-002 / davinci-002 datasets.
- Click Validate. Each line is checked against the seven cookbook rules plus token-limit and structural checks.
- Inspect the Error List — every issue has a line number, an error code, and a one-line explanation.
- Download valid examples only rebuilds a clean file with broken lines stripped out — useful when 5–10 rows are unsalvageable and the rest are fine.
Examples for every supported shape
Minimal chat example
{"messages":[{"role":"system","content":"You are a terse assistant."},{"role":"user","content":"2+2?"},{"role":"assistant","content":"4"}]}
Multi-turn with the weight field
Set "weight": 0 on an assistant turn to keep it in the conversation context but exclude it from the training loss. Useful when you want to show prior assistant turns for context without teaching the model to imitate them.
{"messages":[
{"role":"system","content":"Marv is a sarcastic chatbot."},
{"role":"user","content":"What's the capital of France?"},
{"role":"assistant","content":"Paris, as if everyone doesn't know already.","weight":0},
{"role":"user","content":"Who wrote Romeo and Juliet?"},
{"role":"assistant","content":"Shakespeare. Original, I know.","weight":1}
]}
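To see which turns of such an example would count toward the loss, a small helper like the one below can be handy. It is an illustrative sketch only: trained_turns is a hypothetical name, and it relies on weight defaulting to 1 when the field is absent.

```python
def trained_turns(messages):
    """Return indices of assistant messages that count toward the training loss."""
    return [
        index
        for index, message in enumerate(messages)
        if message.get("role") == "assistant" and message.get("weight", 1) != 0
    ]
```

Applied to the Marv example above, only the final assistant turn is selected, because the earlier one carries "weight": 0.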
The developer role (replaces system on o-series and GPT-4.1)
For o1, o3, o4-mini, and the GPT-4.1 family, OpenAI recommends the developer role in place of system. Both are still accepted; developer is the future-proof choice.
{"messages":[
{"role":"developer","content":"Reply in JSON only."},
{"role":"user","content":"Color of grass?"},
{"role":"assistant","content":"{\"color\":\"green\"}"}
]}
Tool calling — full round-trip
Fine-tuning tool use means teaching the model when to emit a tool_calls array (with content:null), then how to compose the final answer after a tool message with the function result arrives. Every assistant turn that calls a tool must have a matching tool turn next, sharing the same tool_call_id. The top-level tools array describes available functions; parallel_tool_calls defaults to true.
{"messages":[
{"role":"user","content":"What's the weather in SF?"},
{"role":"assistant","content":null,"tool_calls":[
{"id":"call_1","type":"function","function":{"name":"get_weather","arguments":"{\"city\":\"San Francisco\"}"}}
]},
{"role":"tool","tool_call_id":"call_1","content":"{\"temp\":62,\"unit\":\"F\"}"},
{"role":"assistant","content":"It's 62 F in San Francisco."}
],
"tools":[{"type":"function","function":{
"name":"get_weather","description":"Get current weather",
"parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}
}}],
"parallel_tool_calls":false}
Vision fine-tuning
Multimodal SFT works on the gpt-4o family. Each content becomes an array of typed parts; image_url can point at a public URL or a base64 data URI.
{"messages":[
{"role":"user","content":[
{"type":"text","text":"What's in this image?"},
{"type":"image_url","image_url":{"url":"https://example.com/cat.jpg"}}
]},
{"role":"assistant","content":"A tabby cat."}
]}
Legacy prompt / completion (babbage-002, davinci-002)
Only used for the two remaining base models. New training runs were disabled on 28 Oct 2024 for everyone except customers with existing fine-tunes. Strings must be non-empty, and OpenAI's old convention is that the completion starts with a single leading space and ends with a stop sequence.
{"prompt":"Translate to French: hello ->","completion":" bonjour\n"}
Exact OpenAI error strings — what you see when upload fails
These are the messages the OpenAI dashboard, the Python SDK, and the CLI actually return. Search them verbatim if you've already hit one; otherwise read down to see what this validator catches in advance.
| Error message | What's actually wrong | Fix |
|---|---|---|
| The job failed due to an invalid training file. Unexpected file format, expected either prompt/completion pairs or chat messages. | Different lines have different shapes, or the file is a single JSON array [ {…}, {…} ] instead of one object per line. | Use real JSONL (newline-separated objects). Pick one schema and stick to it. |
| Invalid file format for Fine-Tuning API. Must be .jsonl | Wrong extension or wrong purpose on upload. | Rename to .jsonl; upload with purpose="fine-tune". |
| Line N, message M, key 'content.str': Input should be a valid string | content is an object/number/null where a string is expected. | Stringify the value, or use the parts-array form for vision. |
| Example N contains invalid tokens | UTF-8 / BOM / surrogate pair issue, common when Python writes with ensure_ascii=True and bad escapes. | Re-encode as UTF-8 without BOM. Use json.dumps(..., ensure_ascii=False). |
| At least one message must be from the assistant | Example ends on a user turn, so there is no target for the model to learn. | Append an assistant message. |
| Example N exceeds the maximum token limit of 16,385 tokens | Single example too long after tokenization. | Split the conversation, drop earlier history, or trim the system prompt. |
| File contains 0 valid examples | Lines look like Python repr ('role' single quotes), or have trailing commas, or the whole file is a JSON array. | Use json.dumps per line, never str(dict). |
| Training file must contain at least 10 examples | Fewer than 10 lines after deduplication. | Add more rows. |
Recipes by intent
Pre-flight a chat fine-tune before upload
Format OpenAI chat. Drop the file. If the status bar says "Valid OpenAI chat file. N / N examples OK", upload with confidence. If anything is red, fix the listed lines first — every problem here would also fail upload.
Recover an already-rejected upload
Retrieve the original file from your local export (don't try to fetch it back from OpenAI; they don't expose it). Paste it in. The line numbers in the error list map 1:1 to the file you uploaded. Cross-reference with the OpenAI error string table above to identify the exact failure mode.
Strip unsalvageable rows and re-upload
Click Download valid examples only. This re-runs validation, keeps lines that produce zero errors, and writes a fresh openai-fine-tune-clean.jsonl. Useful when 5–10 rows of 10,000 are unfixable and you'd rather drop them than block the training run.
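The same filtering can be scripted. The sketch below uses a deliberately loose validity test (parses as a JSON object, has a messages list, contains an assistant turn), so it is an approximation of the button's behavior, not a replica; both function names are hypothetical.

```python
import json


def is_valid_chat_line(line):
    """Loose validity test for one JSONL line (far fewer rules than the page applies)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    messages = record.get("messages")
    if not isinstance(messages, list):
        return False
    return any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages)


def keep_valid_lines(text):
    """Rebuild JSONL text from the lines that pass, preserving their order."""
    kept = []
    for raw in text.split("\n"):
        line = raw.rstrip("\r")  # tolerate Windows line endings
        if line.strip() and is_valid_chat_line(line):
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")
```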
Migrate from prompt/completion to chat
Validate twice: first as legacy prompt/completion to confirm the old file is sound, then convert (most teams do this in code), then validate the result as OpenAI chat. Watch for missing assistant turns — a common migration mistake is using completion verbatim as the assistant content without wrapping it in a message.
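For the conversion step, a minimal sketch in Python might look like this. It assumes the legacy completion uses only whitespace as its leading space and stop sequence; a custom stop token would need to be removed separately. legacy_to_chat and its system_prompt parameter are hypothetical.

```python
def legacy_to_chat(record, system_prompt=None):
    """Convert one {"prompt", "completion"} record to the chat shape."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": record["prompt"]})
    # Assumption: the leading space and trailing stop sequence are whitespace only.
    messages.append({"role": "assistant", "content": record["completion"].strip()})
    return {"messages": messages}
```

The completion ends up inside an assistant message, which is the wrapping step that migrations tend to miss.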
Estimate cost before training
The summary shows an approximate total token count (chars ÷ 4). Multiply by your epochs and the per-token training price for the target model — that's a usable order-of-magnitude estimate. For exact numbers, run the token counter.
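The arithmetic is simple enough to write down. In this sketch the price argument is a placeholder you would look up on OpenAI's pricing page, not a real figure, and estimate_training_cost is a hypothetical name.

```python
def estimate_training_cost(total_chars, epochs, usd_per_million_tokens):
    """Order-of-magnitude training cost in USD from a character count."""
    approx_tokens = total_chars / 4          # the chars ÷ 4 heuristic
    billed_tokens = approx_tokens * epochs   # every epoch re-reads the whole file
    return billed_tokens * usd_per_million_tokens / 1_000_000
```

For example, 4,000,000 characters over 3 epochs at a placeholder price of 25 USD per million tokens comes to roughly 1,000,000 × 3 × 25 / 1,000,000 = 75 USD.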
Errors and how to fix them
"invalid JSON" on a line
The line itself isn't valid JSON. Common causes: trailing commas, single-quote keys from Python's str(dict), unescaped newlines inside strings, stray NaN / Infinity, or unescaped backslashes. The JSONL auto-fixer repairs most of these mechanically. If you generated the file in Python, use json.dumps(record, ensure_ascii=False) per line, never str(record).
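A minimal sketch of the recommended way to write records from Python, one json.dumps call per line. The helper name to_jsonl is hypothetical.

```python
import json


def to_jsonl(records):
    """Serialize records as JSONL: one json.dumps call per record, one record per line."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)
```

By contrast, str(record) produces Python repr with single-quoted keys, which json.loads rejects.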
"missing 'messages' key" or "expected JSON object, got array"
Your file is probably a single JSON array ([ {…}, {…} ]) rather than line-delimited objects. Use JSONL ↔ JSON to flip the array into JSONL, then re-validate.
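If you prefer to do the flip in code, the following sketch converts a top-level JSON array into line-delimited objects. It assumes the whole file fits in memory; array_to_jsonl is a hypothetical name.

```python
import json


def array_to_jsonl(text):
    """Turn a JSON array of records into JSONL text."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array")
    return "".join(json.dumps(item, ensure_ascii=False) + "\n" for item in data)
```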
"messages[i].role 'X' is invalid"
A role typo (asistant, usre, assistat) or an unsupported role. Allowed values: system, developer, user, assistant, tool, function. Anything else fails upload silently — this validator catches it before the round-trip.
"no 'assistant' message found"
The example has no assistant turn. The assistant turn is the only thing the model is trained to predict, so an example without one carries no training signal and fails upload. Add an assistant message — usually as the final turn.
"messages[i].content is null" or "is an empty string"
null content is only allowed on an assistant turn that has tool_calls or function_call. Empty strings are never allowed for non-assistant roles. Either fill the content or restructure the turn.
"tool message missing 'tool_call_id'"
Every tool-role message must reference the assistant tool_calls[].id that asked for it. Pair them up; the IDs are how OpenAI matches request to response inside the turn graph.
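The pairing rule can be checked with a single pass over the messages. This is a sketch under the assumption that a tool message must follow the assistant turn that requested it; check_tool_pairing and its message strings are hypothetical, not what the page prints.

```python
def check_tool_pairing(messages):
    """Return (index, problem) pairs for tool messages that do not match a prior tool call."""
    problems = []
    pending = set()  # ids requested by assistant turns and not yet answered
    for index, message in enumerate(messages):
        role = message.get("role")
        if role == "assistant":
            for call in message.get("tool_calls") or []:
                pending.add(call.get("id"))
        elif role == "tool":
            call_id = message.get("tool_call_id")
            if call_id is None:
                problems.append((index, "tool message missing 'tool_call_id'"))
            elif call_id not in pending:
                problems.append((index, f"no preceding tool_calls entry with id {call_id!r}"))
            else:
                pending.discard(call_id)
    # Anything still pending was requested but never answered.
    for call_id in sorted(pending, key=str):
        problems.append((None, f"tool call {call_id!r} has no matching tool message"))
    return problems
```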
Local validation passes but OpenAI still rejects
Three usual suspects. One: a single example exceeds 16,385 tokens after real tokenization (the page's chars ÷ 4 estimate can undercount the true total); run the token counter on suspicious lines. Two: the file has a UTF-8 BOM or stray non-printables; re-save as UTF-8 without BOM. Three: wrong upload purpose; it must be fine-tune, not assistants or batch.
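For the BOM case, a short sketch of removing the marker from raw file bytes before upload. It handles only a leading UTF-8 BOM, not other stray non-printables; strip_utf8_bom is a hypothetical name.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte-order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Reading the file with open(path, encoding="utf-8-sig") has the same effect on text input, since that codec drops a leading BOM while decoding.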
Browser is lagging on a large file
Use the file-drop zone instead of paste — drag-and-drop reads from disk without round-tripping through the textarea. This tool is happiest under 200 MB. For multi-GB files, prefer the CLI: jsonlkit validate --openai training.jsonl.
FAQ
Is my data uploaded?
Never. There's no backend — the validator runs entirely in your browser. Disconnect from the internet after the page loads and it still works. Safe for files with PII or proprietary prompts.
How many examples do I need to fine-tune?
OpenAI's minimum is 10. You'll see meaningful signal at ~50, and most production fine-tunes run on 100–10,000 examples. Quality and consistency matter more than quantity — adding noisy examples often hurts more than it helps.
What does the weight field do?
weight: 0 on an assistant turn keeps it in the context but excludes it from the loss. weight: 1 is the default. Useful when you want prior assistant turns for grounding without teaching the model to imitate them.
What's the developer role?
It replaces system on the o-series (o1, o3, o4-mini) and the GPT-4.1 family. Both are still accepted; developer is the recommended choice going forward and this validator treats them identically.
How does this differ from running the cookbook script locally?
It runs the same checks in the browser instead of needing a Python install. The cookbook script is authoritative — when in doubt, run both; the verdict should match line-for-line. The advantage here is the interactive error list with line numbers and the "download valid examples only" filter.
Does the token count match tiktoken?
No — the page uses a chars/4 heuristic so it stays fast and dependency-free. It's accurate enough to flag examples that are obviously over 16,385, but for budget calculations or close-to-cap rows, use the token counter page, which runs a real tokenizer.
Related tools
See also: if your file is broken in unrelated ways, JSONL auto-fixer repairs trailing commas, smart quotes, BOMs, and Python-repr quotes; Formatter pretty-prints or minifies each record; JSONL → CSV flattens for spreadsheet review.