Tutorial on enforcing JSON output with Llama.cpp or the Gemini API
Large Language Models (LLMs) are great at generating text, but getting structured output like JSON usually requires clever prompting and hoping the LLM understands. Thankfully, JSON mode is becoming more common in LLM frameworks and services. This lets you define the exact output schema you want.
This post gets into constrained generation using JSON mode. We’ll use a complex, nested and realistic JSON schema example to guide LLM frameworks/APIs like Llama.cpp or Gemini API to generate structured data, specifically tourist location information. This builds on a previous post about constrained generation using Guidance, but focuses on the more widely adopted JSON mode.
Save Time and Effort When Building LLM Apps Using Guided Generation
While more limited than Guidance, JSON mode’s broader support makes it more accessible, especially with cloud-based LLM providers.
During a personal project, I discovered that while JSON mode was straightforward with Llama.cpp, getting it to work with Gemini API required some extra steps. This post shares those solutions to help you utilize JSON mode effectively.
Our JSON Schema: A Tourist Location document
Our example schema represents a TouristLocation. It’s a non-trivial structure with nested objects, lists, enums, and various data types like strings and numbers.
Here’s a simplified version:
{
  "name": "string",
  "location_long_lat": ["number", "number"],
  "climate_type": {"type": "string", "enum": ["tropical", "desert", "temperate", "continental", "polar"]},
  "activity_types": ["string"],
  "attraction_list": [
    {
      "name": "string",
      "description": "string"
    }
  ],
  "tags": ["string"],
  "description": "string",
  "most_notably_known_for": "string",
  "location_type": {"type": "string", "enum": ["city", "country", "establishment", "landmark", "national park", "island", "region", "continent"]},
  "parents": ["string"]
}
You can write this type of schema by hand or you can generate it using the Pydantic library. Here is how you can do it on a simplified example:
from typing import List

from pydantic import BaseModel, Field


class TouristLocation(BaseModel):
    """Model for a tourist location"""

    high_season_months: List[int] = Field(
        [], description="List of months (1-12) when the location is most visited"
    )
    tags: List[str] = Field(
        ...,
        description="List of tags describing the location (e.g. accessible, sustainable, sunny, cheap, pricey)",
        min_length=1,
    )
    description: str = Field(..., description="Text description of the location")


# Example usage and schema output
location = TouristLocation(
    high_season_months=[6, 7, 8],
    tags=["beach", "sunny", "family-friendly"],
    description="A beautiful beach with white sand and clear blue water.",
)
schema = location.model_json_schema()
print(schema)
This code defines a simplified version of the TouristLocation data class using Pydantic. It has three fields:
- high_season_months: A list of integers representing the months of the year (1-12) when the location is most visited. Defaults to an empty list.
- tags: A list of strings describing the location with tags like “accessible”, “sustainable”, etc. This field is required (...) and must have at least one element (min_length=1).
- description: A string field containing a text description of the location. This field is also required.
The code then creates an instance of the TouristLocation class and uses model_json_schema() to get the JSON Schema representation of the model. This schema defines the structure and types of the data expected for this class.
model_json_schema() returns:
{'description': 'Model for a tourist location',
 'properties': {'description': {'description': 'Text description of the '
                                               'location',
                                'title': 'Description',
                                'type': 'string'},
                'high_season_months': {'default': [],
                                       'description': 'List of months (1-12) '
                                                      'when the location is '
                                                      'most visited',
                                       'items': {'type': 'integer'},
                                       'title': 'High Season Months',
                                       'type': 'array'},
                'tags': {'description': 'List of tags describing the location '
                                        '(e.g. accessible, sustainable, sunny, '
                                        'cheap, pricey)',
                         'items': {'type': 'string'},
                         'minItems': 1,
                         'title': 'Tags',
                         'type': 'array'}},
 'required': ['tags', 'description'],
 'title': 'TouristLocation',
 'type': 'object'}
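The simplified model above has no enums or nested objects, while the full schema does. The sketch below shows how those could be declared, assuming Pydantic v2. The class names ClimateType, Attraction and NestedTouristLocation are made up for illustration and are not taken from the original code. Nested types end up under a "$defs" key and are referenced through "$ref", which becomes relevant in the Gemini section.

```python
from enum import Enum
from typing import List

from pydantic import BaseModel


class ClimateType(str, Enum):
    """Hypothetical enum mirroring the climate_type values of the schema."""

    tropical = "tropical"
    desert = "desert"
    temperate = "temperate"
    continental = "continental"
    polar = "polar"


class Attraction(BaseModel):
    """Hypothetical nested object for one entry of attraction_list."""

    name: str
    description: str


class NestedTouristLocation(BaseModel):
    """Illustrative model with an enum field and a list of nested objects."""

    name: str
    climate_type: ClimateType
    attraction_list: List[Attraction]


nested_schema = NestedTouristLocation.model_json_schema()

# Nested types are stored once under "$defs" ...
print(sorted(nested_schema["$defs"]))
# ... and the fields point to them through "$ref"
print(nested_schema["properties"]["attraction_list"]["items"])
```

The enum values live in nested_schema["$defs"]["ClimateType"]["enum"], so a consumer of the schema has to follow the reference to find them.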
Now that we have our schema, let’s see how we can enforce it: first in Llama.cpp with its Python wrapper, and second using the Gemini API.
Method 1: The Straightforward Approach with Llama.cpp
Llama.cpp is a C++ library for running Llama models locally. It is beginner-friendly and has an active community. We will be using it through its Python wrapper.
Here’s how to generate TouristLocation data with it:
import time

from llama_cpp import Llama

# TouristLocation and location are the full Pydantic model and an example
# instance, defined as in the previous section

# Model init:
checkpoint = "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF"
model = Llama.from_pretrained(
    repo_id=checkpoint,
    n_gpu_layers=-1,
    filename="*Q4_K_M.gguf",
    verbose=False,
    n_ctx=12_000,
)
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant that outputs in JSON. "
        f"Follow this schema {TouristLocation.model_json_schema()}",
    },
    {"role": "user", "content": "Generate information about Hawaii, US."},
    # One-shot example of a correctly formatted answer
    {"role": "assistant", "content": f"{location.model_dump_json()}"},
    {"role": "user", "content": "Generate information about Casablanca"},
]
# The schema passed here is what constrains the generation
response_format = {
    "type": "json_object",
    "schema": TouristLocation.model_json_schema(),
}
start = time.time()
outputs = model.create_chat_completion(
    messages=messages, max_tokens=1200, response_format=response_format
)
print(outputs["choices"][0]["message"]["content"])
print(f"Time: {time.time() - start}")
The code first imports necessary libraries and initializes the LLM model. Then, it defines a list of messages for a conversation with the model, including a system message instructing the model to output in JSON format according to a specific schema, user requests for information about Hawaii and Casablanca, and an assistant response using the specified schema.
Llama.cpp uses context-free grammars under the hood to constrain the structure and generate valid JSON output for a new city.
In the output we get the following generated string:
{'activity_types': ['shopping', 'food and wine', 'cultural'],
 'attraction_list': [{'description': 'One of the largest mosques in the world '
                                     'and a symbol of Moroccan architecture',
                      'name': 'Hassan II Mosque'},
                     {'description': 'A historic walled city with narrow '
                                     'streets and traditional shops',
                      'name': 'Old Medina'},
                     {'description': 'A historic square with a beautiful '
                                     'fountain and surrounding buildings',
                      'name': 'Mohammed V Square'},
                     {'description': 'A beautiful Catholic cathedral built in '
                                     'the early 20th century',
                      'name': 'Casablanca Cathedral'},
                     {'description': 'A scenic waterfront promenade with '
                                     'beautiful views of the city and the sea',
                      'name': 'Corniche'}],
 'climate_type': 'temperate',
 'description': 'A large and bustling city with a rich history and culture',
 'location_type': 'city',
 'most_notably_known_for': 'Its historic architecture and cultural '
                           'significance',
 'name': 'Casablanca',
 'parents': ['Morocco', 'Africa'],
 'tags': ['city', 'cultural', 'historical', 'expensive']}
This output can then be parsed into an instance of our Pydantic class.
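As a minimal sketch of that parsing step, assuming Pydantic v2 and reusing the simplified three-field model from earlier: the raw_output string below is a hand-written stand-in for the model response, not real generated text. model_validate_json parses and validates in one call and raises a ValidationError when the JSON does not satisfy the model.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class TouristLocation(BaseModel):
    """Simplified model from earlier in the post."""

    high_season_months: List[int] = Field([])
    tags: List[str] = Field(..., min_length=1)
    description: str = Field(...)


# Stand-in for outputs["choices"][0]["message"]["content"] (hand-written, not real output)
raw_output = (
    '{"high_season_months": [6, 7, 8], '
    '"tags": ["city", "cultural"], '
    '"description": "A large and bustling city"}'
)

# Parse and validate the JSON string in a single step
parsed = TouristLocation.model_validate_json(raw_output)
print(parsed.tags)

# A response that violates the schema (empty tags list) is rejected
rejected = False
try:
    TouristLocation.model_validate_json('{"tags": [], "description": "x"}')
except ValidationError:
    rejected = True
print(rejected)
```

Validating even constrained output is a cheap safety net, since a response cut off by max_tokens can still be incomplete JSON.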
Method 2: Overcoming Gemini API’s Quirks
The Gemini API, Google’s managed LLM service, documents only limited JSON mode support for Gemini 1.5 Flash. However, it can be made to work with a few adjustments.
Here are the general instructions to get it to work:
import os

import google.generativeai as genai
from google.generativeai.types import ContentDict

# replace_value_in_dict and delete_keys_recursive are defined further down in this post
schema = TouristLocation.model_json_schema()
schema = replace_value_in_dict(schema.copy(), schema.copy())
del schema["$defs"]
delete_keys_recursive(schema, key_to_delete="title")
delete_keys_recursive(schema, key_to_delete="location_long_lat")
delete_keys_recursive(schema, key_to_delete="default")
delete_keys_recursive(schema, key_to_delete="minItems")
print(schema)
messages = [
    ContentDict(
        role="user",
        parts=[
            "You are a helpful assistant that outputs in JSON. "
            f"Follow this schema {TouristLocation.model_json_schema()}"
        ],
    ),
    ContentDict(role="user", parts=["Generate information about Hawaii, US."]),
    # One-shot example of a correctly formatted answer
    ContentDict(role="model", parts=[f"{location.model_dump_json()}"]),
    ContentDict(role="user", parts=["Generate information about Casablanca"]),
]
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
# The Gemini documentation says that `response_mime_type` with `response_schema`
# requires a Gemini 1.5 Pro model; with the adjusted schema it also works with Flash
model = genai.GenerativeModel(
    "gemini-1.5-flash",
    # Set the `response_mime_type` to output JSON
    # Pass the schema object to the `response_schema` field
    generation_config={
        "response_mime_type": "application/json",
        "response_schema": schema,
    },
)
response = model.generate_content(messages)
print(response.text)
Here’s how to overcome Gemini’s limitations:
- Replace $ref with Full Definitions: Gemini stumbles on schema references ($ref). These are used when you have a nested object definition. Replace them with the complete definition from your schema.
def replace_value_in_dict(item, original_schema):
    # Source: https://github.com/pydantic/pydantic/issues/889
    if isinstance(item, list):
        return [replace_value_in_dict(i, original_schema) for i in item]
    elif isinstance(item, dict):
        if list(item.keys()) == ["$ref"]:
            # "#/$defs/Name" -> ["$defs", "Name"]
            definitions = item["$ref"][2:].split("/")
            res = original_schema.copy()
            # Walk the path to find the referenced definition
            for definition in definitions:
                res = res[definition]
            return res
        else:
            return {
                key: replace_value_in_dict(i, original_schema)
                for key, i in item.items()
            }
    else:
        return item
- Remove Unsupported Keys: Gemini doesn’t yet handle keys like “title”, “anyOf”, or “minItems”. Remove these from your schema. This makes the schema less readable and less restrictive, but we don’t have another choice if we insist on using Gemini.
def delete_keys_recursive(d, key_to_delete):
    if isinstance(d, dict):
        # Delete the key if it exists
        if key_to_delete in d:
            del d[key_to_delete]
        # Recursively process all items in the dictionary
        for k, v in d.items():
            delete_keys_recursive(v, key_to_delete)
    elif isinstance(d, list):
        # Recursively process all items in the list
        for item in d:
            delete_keys_recursive(item, key_to_delete)
- One-Shot or Few-shot Prompting for Enums: Gemini sometimes struggles with enums, outputting all possible values instead of a single selection. The values are also separated by “|” in a single string, making them invalid according to our schema. Use one-shot prompting, providing a correctly formatted example, to guide it towards the desired behavior.
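The enum failure described in the last point is easy to detect after the fact. The sketch below uses only the standard library. The function name invalid_enum_fields and the sample strings are illustrative assumptions, not part of the original code. It reports every enum field whose value is not exactly one of the allowed choices, which also catches the pipe-joined strings.

```python
import json
from typing import Dict, List, Set

# Allowed values, copied from the simplified schema at the top of the post
ALLOWED: Dict[str, Set[str]] = {
    "climate_type": {"tropical", "desert", "temperate", "continental", "polar"},
}


def invalid_enum_fields(raw_json: str, allowed: Dict[str, Set[str]]) -> List[str]:
    """Return the names of enum fields whose value is not a single allowed choice."""
    data = json.loads(raw_json)
    bad = []
    for field, choices in allowed.items():
        value = data.get(field)
        # A missing value, a non-string, or "a|b|c" all count as invalid
        if not isinstance(value, str) or value not in choices:
            bad.append(field)
    return bad


# Hand-written examples of a faulty and a correct response
faulty = '{"climate_type": "tropical|desert|temperate|continental|polar"}'
correct = '{"climate_type": "temperate"}'

print(invalid_enum_fields(faulty, ALLOWED))
print(invalid_enum_fields(correct, ALLOWED))
```

A check like this can decide whether to retry the request with an extra example in the prompt.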
By applying these transformations and providing clear examples, you can successfully generate structured JSON output with Gemini API.
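To make the two schema transformations concrete, here is a small worked example on a hand-written toy schema (the toy_schema dictionary is an assumption made for illustration, much smaller than the real one). The two helpers are compact copies of the functions shown above, repeated only so the snippet runs on its own.

```python
import copy


def replace_value_in_dict(item, original_schema):
    """Swap every {"$ref": ...} node for the definition it points to."""
    if isinstance(item, list):
        return [replace_value_in_dict(i, original_schema) for i in item]
    if isinstance(item, dict):
        if list(item.keys()) == ["$ref"]:
            res = original_schema
            # "#/$defs/Name" -> ["$defs", "Name"]
            for part in item["$ref"][2:].split("/"):
                res = res[part]
            return res
        return {k: replace_value_in_dict(v, original_schema) for k, v in item.items()}
    return item


def delete_keys_recursive(d, key_to_delete):
    """Remove key_to_delete at every nesting level, in place."""
    if isinstance(d, dict):
        d.pop(key_to_delete, None)
        for v in d.values():
            delete_keys_recursive(v, key_to_delete)
    elif isinstance(d, list):
        for item in d:
            delete_keys_recursive(item, key_to_delete)


# Toy schema with one reference and a few keys that have to go
toy_schema = {
    "title": "TouristLocation",
    "type": "object",
    "properties": {
        "climate_type": {"$ref": "#/$defs/ClimateType"},
        "tags": {"type": "array", "items": {"type": "string"}, "minItems": 1, "title": "Tags"},
    },
    "required": ["climate_type", "tags"],
    "$defs": {
        "ClimateType": {"type": "string", "enum": ["tropical", "desert"], "title": "ClimateType"}
    },
}

# Deep copies keep the original schema untouched by the in-place deletions
cleaned = replace_value_in_dict(copy.deepcopy(toy_schema), copy.deepcopy(toy_schema))
del cleaned["$defs"]
for key in ("title", "minItems"):
    delete_keys_recursive(cleaned, key)

print(cleaned)
```

After these steps the climate_type property carries its enum values inline, and no "$ref", "title" or "minItems" key is left anywhere in the schema.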
Conclusion
JSON mode allows you to get structured data directly from your LLMs, making them more useful for practical applications. While frameworks like Llama.cpp offer straightforward implementations, you might encounter issues with cloud services like Gemini API.
Hopefully, this post gave you a better practical understanding of how JSON mode works and how you can use it, even with the Gemini API, which only has partial support so far.
Now that I was able to get Gemini to somewhat work with JSON mode, I can complete the implementation of my LLM workflow where having data structured in a specific way is necessary.
You can find the main code of this post here: https://gist.github.com/CVxTz/8eace07d9bd2c5123a89bf790b5cc39e
How to Get JSON Output from LLMs: A Practical Guide was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.