jsonlkit.com
JSONL (JSON Lines) utilities, in the browser

Gemini Fine-Tune JSONL Validator

updated 17 May 2026 · Vertex AI supervised tuning · separate pages for OpenAI, Anthropic, Llama, Mistral

A JSONL validator for Google Gemini fine-tuning. It checks Vertex AI supervised-tuning data: the contents array, parts[], user/model alternation, the optional systemInstruction, function-calling tools, multimodal fileData and inlineData, and the 131,072-token-per-example cap. It catches Vertex's consecutive_turns and Invalid role 'assistant' errors before the dataset validation job rejects your file.

Your training data never leaves this tab. Vertex AI pulls your dataset from GCS when a job starts; this pre-flight check is fully local.

⌨ Prefer the terminal? jsonlkit validate --gemini data.jsonl — same checks, in a pipe.

Validates your Gemini supervised-tuning file against Vertex AI's actual shape: a contents array starting with a user turn, strict user/model alternation (Gemini calls the assistant role model, not assistant), every entry's parts array containing at least one part with text, functionCall, functionResponse, inlineData, or fileData, and an optional top-level systemInstruction. Function-calling and multimodal tuning are both supported on the 2.5 family. 100% in-browser.

Validating a different provider? OpenAI, Anthropic (Claude), Llama / ShareGPT, Mistral.

Where can I fine-tune Gemini in 2026?

One platform: Vertex AI supervised tuning on Google Cloud. The free tuning that used to be available in Google AI Studio / Generative Language API was deprecated in May 2025 with Gemini 1.5 Flash-001, and no replacement was launched. If you find a tutorial showing the {"text_input": "...", "output": "..."} AI Studio shape, that is the old format, and the endpoint that consumed it no longer exists.

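Supported Gemini models for supervised tuning (Vertex AI, May 2026): Gemini 2.5 Pro, 2.5 Flash, and 2.5 Flash-Lite. Gemini 2.0 Flash and 2.0 Flash-Lite are scheduled for shutdown on 1 June 2026, so don't start new SFT jobs on them.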

Older 1.x models, including 1.5 Pro and 1.5 Flash, are no longer tunable on either Vertex AI or the deprecated AI Studio endpoint.

The exact JSONL shape Vertex AI accepts

Text — canonical example

{
  "systemInstruction": {
    "role": "system",
    "parts": [{"text": "You are a precise legal summarizer."}]
  },
  "contents": [
    {"role": "user",  "parts": [{"text": "Summarize Clause 4.2."}]},
    {"role": "model", "parts": [{"text": "Clause 4.2 limits liability to fees paid in the prior 12 months."}]}
  ]
}
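
The record above is pretty-printed for readability, but in the file each record has to sit on a single line. A minimal Python sketch of that serialization step follows; the helper name to_jsonl_line is hypothetical and is not part of jsonlkit or any Google SDK.

```python
import json

def to_jsonl_line(record: dict) -> str:
    """Serialize one training record as a single compact JSON line."""
    # json.dumps escapes newlines inside strings as \n, so the output
    # never contains a literal line break.
    return json.dumps(record, ensure_ascii=False, separators=(",", ":"))

record = {
    "systemInstruction": {
        "role": "system",
        "parts": [{"text": "You are a precise legal summarizer."}],
    },
    "contents": [
        {"role": "user", "parts": [{"text": "Summarize Clause 4.2."}]},
        {"role": "model", "parts": [{"text": "Clause 4.2 limits liability to fees paid in the prior 12 months."}]},
    ],
}

line = to_jsonl_line(record)
```

Writing one such line per example, each followed by a newline, gives a file in the one-record-per-line layout this page describes.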

Function calling — full round-trip

Function-calling tuning is a supported modality on the 2.5 family. The training example shows the model how to emit a functionCall, how to consume a functionResponse, and how to compose the final answer.

{
  "contents": [
    {"role": "user",  "parts": [{"text": "Weather in Kyiv tomorrow?"}]},
    {"role": "model", "parts": [{"functionCall": {"name": "get_weather", "args": {"city": "Kyiv", "when": "tomorrow"}}}]},
    {"role": "user",  "parts": [{"functionResponse": {"name": "get_weather", "response": {"tempC": 14, "sky": "rain"}}}]},
    {"role": "model", "parts": [{"text": "14 C with rain."}]}
  ],
  "tools": [{"functionDeclarations": [{
    "name": "get_weather",
    "description": "Get weather by city and time",
    "parameters": {
      "type": "object",
      "properties": {"city": {"type": "string"}, "when": {"type": "string"}},
      "required": ["city"]
    }
  }]}]
}
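
One consistency check that is easy to run locally is whether every function named in a functionCall or functionResponse part is also declared under tools. The sketch below assumes that this is worth checking; the page does not say Vertex AI enforces it, and the helper name undeclared_function_names is hypothetical.

```python
def undeclared_function_names(record: dict) -> set:
    """Return function names used in contents but missing from tools."""
    declared = {
        decl.get("name")
        for tool in record.get("tools", [])
        for decl in tool.get("functionDeclarations", [])
    }
    used = set()
    for turn in record.get("contents", []):
        for part in turn.get("parts", []):
            for key in ("functionCall", "functionResponse"):
                if key in part:
                    used.add(part[key].get("name"))
    return used - declared
```

An empty set means every call and response refers to a declared function.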

Multimodal — image via GCS URI

For dataset-scale multimodal tuning, Vertex AI prefers fileData pointing at a GCS URI over inline base64. Use inlineData for small one-offs.

{"contents": [
  {"role": "user",  "parts": [
    {"fileData": {"mimeType": "image/jpeg", "fileUri": "gs://my-bucket/cat.jpg"}},
    {"text": "What breed?"}
  ]},
  {"role": "model", "parts": [{"text": "Maine Coon."}]}
]}
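
A pre-flight check on fileData parts might look like the sketch below. It assumes that a dataset-scale file should reference gs:// URIs and carry a mimeType, which is stricter than the "prefers" wording above; the helper name file_data_problems is hypothetical.

```python
def file_data_problems(record: dict) -> list:
    """List fileData parts that lack a gs:// URI or a mimeType."""
    problems = []
    for t_idx, turn in enumerate(record.get("contents", [])):
        for p_idx, part in enumerate(turn.get("parts", [])):
            file_data = part.get("fileData")
            if file_data is None:
                continue  # text, inlineData, etc. are not checked here
            where = f"contents[{t_idx}].parts[{p_idx}].fileData"
            if not str(file_data.get("fileUri", "")).startswith("gs://"):
                problems.append(f"{where}: fileUri is not a gs:// URI")
            if not file_data.get("mimeType"):
                problems.append(f"{where}: mimeType is missing")
    return problems
```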

The deprecated AI Studio shape — don't use this

{"text_input": "...", "output": "..."}
// Old Google AI Studio / Generative Language API tuning format.
// Endpoint deprecated May 2025. Will NOT work on Vertex AI.
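
If you still have data in the old shape, each row maps onto a single user/model exchange. A minimal conversion sketch, assuming both fields are plain strings (the helper name from_ai_studio is hypothetical):

```python
def from_ai_studio(legacy: dict) -> dict:
    """Convert a legacy {"text_input", "output"} row to the contents shape."""
    return {
        "contents": [
            {"role": "user", "parts": [{"text": legacy["text_input"]}]},
            {"role": "model", "parts": [{"text": legacy["output"]}]},
        ]
    }
```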

Vertex AI dataset limits

| Item | Value |
|---|---|
| Maximum tokens per training example | 131,072 |
| Minimum dataset size | 16 (Vertex enforced); ≥ 100 recommended |
| Recommended dataset size | 100 – 500 examples to start; thousands for production |
| Recommended validation split | 10 – 20% |
| First message role | user |
| Role names inside contents | user and model only; strict alternation |
| System prompt location | Top-level systemInstruction (not inside contents) |
| Multimodal tuning | Text, document (PDF), image, audio, video, function-calling; all on the 2.5 family |
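
The size floor and the validation split can be applied in one step before upload. The sketch below is one way to do it, under two assumptions that the page does not confirm: the 16-example minimum is checked against the whole file before splitting, and a fixed-seed shuffle is an acceptable way to pick the validation rows. The name split_dataset is hypothetical.

```python
import random

MIN_EXAMPLES = 16  # minimum dataset size from the limits table

def split_dataset(lines, validation_fraction=0.1, seed=0):
    """Shuffle JSONL lines and split them into (train, validation)."""
    if len(lines) < MIN_EXAMPLES:
        raise ValueError(f"need at least {MIN_EXAMPLES} examples, got {len(lines)}")
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_val = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

With 100 lines and the default fraction, this yields 90 training lines and 10 validation lines.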

Common mistakes this validator catches

{"contents":[{"role":"assistant","parts":[{"text":"broken"}]}]}
// Error: Gemini uses 'model', not 'assistant'.

{"contents":[{"role":"user","parts":{"text":"Hi"}}]}
// Error: 'parts' must be an array, not an object.

{"contents":[{"role":"user","content":"Hi"}]}
// Error: Gemini uses 'parts', not 'content'.

{"contents":[
  {"role":"user","parts":[{"text":"A"}]},
  {"role":"user","parts":[{"text":"B"}]}
]}
// Error: consecutive_turns — roles must alternate user/model.

{"text_input":"Q","output":"A"}
// Error: deprecated AI Studio shape; not accepted by Vertex AI.
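
The five mistakes above can all be detected with a short structural pass over each parsed record. The following is a rough sketch of such a pass, not jsonlkit's actual implementation; the function name check_record is hypothetical, and the messages echo the comments above, not Vertex AI's literal output.

```python
ALLOWED_ROLES = ("user", "model")
PART_KEYS = {"text", "functionCall", "functionResponse", "inlineData", "fileData"}

def check_record(record: dict) -> list:
    """Return a list of structural problems found in one parsed record."""
    if "text_input" in record or "output" in record:
        return ["deprecated AI Studio shape; not accepted by Vertex AI"]
    contents = record.get("contents")
    if not isinstance(contents, list) or not contents:
        return ["'contents' must be a non-empty array"]
    errors = []
    previous_role = None
    for i, turn in enumerate(contents):
        if not isinstance(turn, dict):
            errors.append(f"contents[{i}]: turn must be an object")
            previous_role = None
            continue
        role = turn.get("role")
        if role not in ALLOWED_ROLES:
            errors.append(f"contents[{i}]: invalid role {role!r}; use 'user' or 'model'")
        elif i == 0 and role != "user":
            errors.append("contents[0]: first turn must have role 'user'")
        elif role == previous_role:
            errors.append(f"contents[{i}]: consecutive_turns; roles must alternate")
        previous_role = role
        parts = turn.get("parts")
        if "content" in turn and parts is None:
            errors.append(f"contents[{i}]: Gemini uses 'parts', not 'content'")
        elif not isinstance(parts, list):
            errors.append(f"contents[{i}]: 'parts' must be an array")
        elif not any(isinstance(p, dict) and PART_KEYS.intersection(p) for p in parts):
            errors.append(f"contents[{i}]: 'parts' needs at least one recognised part")
    return errors
```

A record that passes returns an empty list, so filtering a file down to valid examples is a matter of keeping the records for which check_record returns nothing.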

Real Vertex AI error strings

| Error | Cause | Fix |
|---|---|---|
| Dataset validation failed: {consecutive_turns: [N]} | Two same-role turns in a row on line N+1. | Enforce strict user/model alternation; merge or split the offending turn. |
| Invalid role 'assistant' | Copy-pasted OpenAI shape. | Rename role to model. |
| parts must be an array | Sent "parts": {...}. | Wrap in [ ]: "parts": [{"text": "..."}]. |
| Field 'content' is unknown | Used OpenAI's content key. | Replace with parts: [{"text": "..."}]. |
| Example exceeds maximum token limit of 131072 | Single example too long. | Split the conversation, drop history, or shorten the system instruction. |
| Dataset must contain at least 16 examples | Too few rows. | Add more; ≥ 100 recommended for any real signal. |
| Unsupported modality: function_call | Target model variant doesn't support function-calling tuning. | Use Gemini 2.5 Pro or 2.5 Flash; check the supported-modality matrix. |
| Invalid JSON on line N | Pretty-printed record, stray newline inside a value, or trailing comma. | Minify each record onto one line; one record = one JSON object = one line. |
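
The last row is the one most easily reproduced locally: parse the file line by line and record which line numbers fail. A small sketch, assuming blank lines are skipped and not reported (the name parse_jsonl is hypothetical):

```python
import json

def parse_jsonl(text: str):
    """Parse JSONL text; return ([(line_no, record)], [(line_no, message)])."""
    records, errors = [], []
    # Split on "\n" only: str.splitlines() also breaks on characters such as
    # U+2028, which may legally appear inside a JSON string.
    for line_no, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((line_no, exc.msg))
            continue
        if not isinstance(obj, dict):
            errors.append((line_no, "record is not a JSON object"))
            continue
        records.append((line_no, obj))
    return records, errors
```

A pretty-printed record shows up as a run of consecutive failing lines, which is usually the quickest way to spot it.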

How to use it

  1. Drop a .jsonl file or paste records directly.
  2. Click Validate. Every line is parsed and every part is checked.
  3. Inspect the Error List — each entry maps to a Vertex error you'd see at dataset-validation time.
  4. Download valid examples only to ship a clean file to GCS.
  5. Upload to GCS, then start the tuning job from the Vertex AI console or gcloud ai CLI.

Troubleshooting

Vertex says consecutive_turns: [73]. What does this mean?

Two turns with the same role appear in a row within one record. As in the error table above, N is a zero-based record index, so [73] points at line 74 of the file. Reorder, merge, or split the turns in that record so the conversation strictly alternates user / model.

Can I still fine-tune in Google AI Studio?

No. AI Studio / Generative Language API tuning was deprecated in May 2025 alongside Gemini 1.5 Flash-001, and no replacement was launched. Use Vertex AI.

Which modalities can I actually train on?

On Gemini 2.5 Pro and 2.5 Flash: text, document (PDF), image, audio, video, and function-calling. Flash-Lite is text-focused. Modalities can be mixed within the same JSONL file: each parts array can contain any combination.

My example is over 131,072 tokens. What do I cut?

The system instruction first (it usually has the most slack). Then strip earlier conversation history. If you're tuning on long documents, switch to fileData with GCS URIs instead of inlining content.

Does the validator know about Gemini's tokenizer exactly?

Not exactly. It uses a character-based estimate. The Gemini tokenizer averages about 1 token per 4 characters for English text; CJK text and code can produce more tokens per character. Use the token counter with the Gemini preset for exact numbers.
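
A character-based estimate of this kind can be sketched as follows. This is an illustration and not the validator's own code: it assumes a flat 4 characters per token, counts only text parts (including the system instruction), and ignores images, audio, and function-call payloads. The name estimate_tokens is hypothetical.

```python
import math

MAX_TOKENS_PER_EXAMPLE = 131_072
CHARS_PER_TOKEN = 4  # rough English-text average; an assumption, not exact

def estimate_tokens(record: dict) -> int:
    """Estimate tokens for one record from the length of its text parts."""
    chars = 0
    for part in record.get("systemInstruction", {}).get("parts", []):
        chars += len(part.get("text", ""))
    for turn in record.get("contents", []):
        for part in turn.get("parts", []):
            chars += len(part.get("text", ""))
    return math.ceil(chars / CHARS_PER_TOKEN)

def over_cap(record: dict) -> bool:
    """True if the estimate exceeds the per-example cap."""
    return estimate_tokens(record) > MAX_TOKENS_PER_EXAMPLE
```

Because the estimate is approximate, records that land close to the cap are worth re-checking with an exact counter.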

Is my data uploaded?

Never. Everything runs in your browser. See the privacy policy.

Frequently asked questions

What is the JSONL format for Gemini fine-tuning?

One JSON object per line, with a contents array of {role, parts} entries. Roles are user and model; parts is always an array with at least one part containing text, functionCall, functionResponse, inlineData, or fileData. Optional top-level systemInstruction and tools.

What's the difference between contents and messages?

OpenAI/Anthropic/Mistral/Llama use messages; Google uses contents. Roles are user/model on Google vs user/assistant on the others. Content lives in parts (always an array) on Google vs a content string on the others. They are not interchangeable.
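
Because the shapes are not interchangeable, an OpenAI-style file has to be converted record by record. The sketch below handles only the simple case and rests on stated assumptions: every message has a string content, the only roles are system, user, and assistant, and system messages are joined into one top-level systemInstruction. The name messages_to_gemini is hypothetical.

```python
ROLE_MAP = {"user": "user", "assistant": "model"}

def messages_to_gemini(example: dict) -> dict:
    """Convert one {"messages": [...]} example to the contents shape."""
    system_texts = []
    contents = []
    for msg in example["messages"]:
        if msg["role"] == "system":
            system_texts.append(msg["content"])
        else:
            # Raises KeyError for roles this sketch does not handle (e.g. tool).
            contents.append({
                "role": ROLE_MAP[msg["role"]],
                "parts": [{"text": msg["content"]}],
            })
    record = {"contents": contents}
    if system_texts:
        record["systemInstruction"] = {
            "role": "system",
            "parts": [{"text": "\n".join(system_texts)}],
        }
    return record
```

Tool calls and multimodal content need their own mapping onto functionCall, functionResponse, inlineData, or fileData parts and are left out here.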

Why is the role called model?

It's Google's convention going back to the early Gemini API. Treat model as the equivalent of assistant elsewhere.

How do I validate a Gemini fine-tune JSONL before uploading?

Use this page, or run Vertex's own dataset validation in dry-run mode. This page catches the same structural and modality errors before you copy to GCS.

What's the minimum dataset size?

Vertex enforces a minimum of 16. Practical recommendation is ≥ 100 examples to see real signal; production fine-tunes typically use thousands.

Can I tune on function calling?

Yes, on Gemini 2.5 Pro and 2.5 Flash. Include the top-level tools array with functionDeclarations, and use functionCall / functionResponse parts inside contents.

What models support fine-tuning in 2026?

On Vertex AI: Gemini 2.5 Pro, 2.5 Flash, 2.5 Flash-Lite. Gemini 2.0 Flash and 2.0 Flash-Lite are scheduled for shutdown on 1 June 2026 — don't start new SFT jobs there. 1.x models are no longer tunable on any platform.
