Avro to JSON Converter
Avro Schema Input
JSON Output
What this tool does
Paste an Apache Avro schema (the JSON-format schema, not the binary record) above and the tool generates a sample JSON document that conforms to it. Records become nested objects, arrays become JSON arrays, unions are resolved to their first non-null branch, and logical types are rendered as their typical string representation.
Note: this tool converts the schema, not Avro’s binary or JSON wire format. If you have an .avro file or a Kafka message body, you need a runtime decoder like avro-tools tojson, the Confluent CLI, or the avro library in your language.
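The schema-to-sample mapping described above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's actual implementation: the function name `sample_for` and the placeholder values are assumptions, and references to previously defined named types are not handled.

```python
import json

# Arbitrary placeholder values for primitive types (assumed for illustration).
PRIMITIVE_SAMPLES = {
    "null": None, "boolean": True, "int": 0, "long": 0,
    "float": 0.0, "double": 0.0, "bytes": "", "string": "string",
}

# Human-readable placeholders for a few logical types.
LOGICAL_SAMPLES = {
    "date": "2026-05-20",
    "timestamp-millis": "2026-05-20T10:30:00Z",
    "decimal": "123.45",
    "uuid": "550e8400-e29b-41d4-a716-446655440000",
}


def sample_for(schema):
    """Return a plain-JSON sample value for a parsed Avro schema."""
    if isinstance(schema, str):      # primitive name such as "string"
        return PRIMITIVE_SAMPLES[schema]
    if isinstance(schema, list):     # union: take the first non-null branch
        branches = [b for b in schema if b != "null"]
        return sample_for(branches[0]) if branches else None
    if schema.get("logicalType") in LOGICAL_SAMPLES:
        return LOGICAL_SAMPLES[schema["logicalType"]]
    kind = schema["type"]
    if kind == "record":
        return {f["name"]: sample_for(f["type"]) for f in schema["fields"]}
    if kind == "array":
        return [sample_for(schema["items"])]
    if kind == "map":
        return {"key": sample_for(schema["values"])}
    if kind == "enum":
        return schema["symbols"][0]
    if kind == "fixed":
        return ""                    # empty base64 string as a placeholder
    return sample_for(kind)          # wrapped type, e.g. {"type": "string"}


schema = json.loads("""
{"type": "record", "name": "User", "fields": [
  {"name": "id", "type": "long"},
  {"name": "email", "type": ["null", "string"], "default": null},
  {"name": "tags", "type": {"type": "array", "items": "string"}},
  {"name": "created", "type": {"type": "long", "logicalType": "timestamp-millis"}}
]}
""")
print(json.dumps(sample_for(schema), indent=2))
```

The nullable `email` field comes out as a plain string because the union resolves to its first non-null branch, and `created` is rendered as a timestamp string rather than a raw long.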
What is Apache Avro?
Apache Avro is a schema-based data serialization format originally built for Hadoop and now most widely used in the Kafka ecosystem via the Confluent Schema Registry. Unlike protobuf, Avro doesn’t require generated code: the schema travels with the data (in .avro container files) or is referenced by ID (in Kafka with the Confluent Schema Registry, the first 5 bytes of every message are a magic byte plus a 4-byte ID that points to a schema in the registry).
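To make the Kafka framing concrete, here is a minimal sketch that splits the 5-byte prefix off a message value. It assumes the Confluent framing of one zero magic byte followed by a 4-byte big-endian schema ID; the function name `split_confluent_header` is made up for this example.

```python
import struct


def split_confluent_header(message: bytes):
    """Split a Confluent-framed Kafka value into (schema_id, avro_payload)."""
    if len(message) < 5:
        raise ValueError("message is shorter than the 5-byte header")
    # ">BI" = big-endian, one unsigned byte (magic) + one unsigned 32-bit int (ID)
    magic, schema_id = struct.unpack(">BI", message[:5])
    if magic != 0:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


schema_id, payload = split_confluent_header(b"\x00\x00\x00\x00\x2a\x06foo")
print(schema_id, payload)
```

The remaining `payload` is what a raw Avro binary decoder expects, together with the schema looked up under `schema_id`.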
The format has three flavours you should be aware of:
- Avro binary: the compact wire format used in Kafka and .avro files. Field names aren’t included; values are positioned by schema order. You can’t decode binary Avro without the schema.
- Avro JSON: a verbose JSON encoding defined by the Avro spec. Used mainly for debugging. Union values look like {"string": "hello"} instead of just "hello".
- Regular JSON: what most engineers actually want to see. This tool produces this form, not Avro JSON encoding.
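The difference between the last two flavours is easiest to see on a union value. The sketch below unwraps an Avro JSON union value into regular JSON; `unwrap_union` is a hypothetical helper, and it only handles a single union field rather than a whole schema.

```python
import json


def unwrap_union(avro_json_value, union_branches):
    """Turn an Avro JSON union value such as {"string": "hello"} into "hello"."""
    if avro_json_value is None:      # the null branch is written as bare null
        return None
    if isinstance(avro_json_value, dict) and len(avro_json_value) == 1:
        (branch, value), = avro_json_value.items()
        if branch in union_branches:
            return value
    raise ValueError("not a valid Avro JSON union value")


# Avro JSON encoding of a record whose "email" field is ["null", "string"].
avro_json = json.loads('{"id": 7, "email": {"string": "hello"}}')
plain = {
    "id": avro_json["id"],
    "email": unwrap_union(avro_json["email"], ["null", "string"]),
}
print(json.dumps(plain))  # {"id": 7, "email": "hello"}
```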
When you encounter Avro schemas
Avro schemas show up most often when you’re working with:
- Apache Kafka topics governed by the Confluent Schema Registry.
- Hadoop / HDFS data lakes storing .avro container files.
- Apache Flink, Apache Spark, or Apache Beam pipelines that read Avro inputs.
- Apache Pulsar with schema-aware producers and consumers.
- AWS Glue Schema Registry or Azure Event Hubs schema management.
- Debezium CDC streams that emit Avro-encoded change events.
If a service hands you a .avsc file or a schema-registry URL, this tool helps you see what a payload conforming to that schema would look like in plain JSON.
Avro’s type system → JSON
| Avro type | JSON representation |
|---|---|
| null | null |
| boolean | true / false |
| int, long | JSON number (long can lose precision above 2^53) |
| float, double | JSON number |
| bytes, fixed | base64 string |
| string | string |
| record | JSON object with named fields |
| array | JSON array |
| map | JSON object with string keys |
| enum | the symbol name as a string |
| union ["null", "T"] (nullable) | null or a value of type T |
| logical: date | string "YYYY-MM-DD" (or a day count in strict Avro JSON encoding) |
| logical: timestamp-millis | string "2026-05-20T10:30:00Z" (or millis-since-epoch) |
| logical: decimal | string "123.45" |
| logical: uuid | string "550e8400-e29b-41d4-a716-446655440000" |
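The logical-type rows can be illustrated with a short sketch that turns the underlying primitive values into the strings shown in the table. The helper names are hypothetical, and the decimal helper assumes the bytes hold a big-endian two's-complement unscaled integer, with the scale taken from the schema.

```python
from datetime import date, datetime, timedelta, timezone
from decimal import Decimal


def render_date(days: int) -> str:
    """date: int counting days since 1970-01-01."""
    return (date(1970, 1, 1) + timedelta(days=days)).isoformat()


def render_timestamp_millis(millis: int) -> str:
    """timestamp-millis: long counting milliseconds since the Unix epoch (UTC)."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (epoch + timedelta(milliseconds=millis)).isoformat().replace("+00:00", "Z")


def render_decimal(raw: bytes, scale: int) -> str:
    """decimal: unscaled integer in bytes, shifted right by `scale` digits."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return str(Decimal(unscaled).scaleb(-scale))


print(render_date(20593))                       # 2026-05-20
print(render_timestamp_millis(1779273000000))   # 2026-05-20T10:30:00Z
print(render_decimal(b"\x30\x39", scale=2))     # 123.45
```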
Common pitfalls
1. Avro JSON encoding is not regular JSON. The Avro spec defines its own JSON encoding where non-null union values are wrapped as {"string": "hello"} and bytes are written as JSON strings in which each character stands for one byte (code points 0 to 255), not as base64. If a tool produces output you don’t recognise, check whether it’s emitting Avro JSON or normal JSON.
2. Nullable fields are unions. There’s no nullable: true in Avro. The idiom is ["null", "string"] — a union where one branch is null. Many Avro tools require null to be listed first for the default value to apply correctly.
3. Schema evolution is asymmetric. A reader using schema v2 can read data written with schema v1 only if the changes are backward-compatible: fields added in v2 must have defaults, and types can widen (int → long → float → double) but not narrow. Fields removed in v2 are simply skipped by the v2 reader; they need a default in v1 only if v1 readers must also read v2 data (forward compatibility). Renaming a field requires an alias.
4. Long values can silently lose precision. Avro long is 64-bit, but JSON parsers that store numbers as IEEE 754 doubles (JavaScript’s among them) are exact only up to 2⁵³ and round larger values without warning. If you’re dealing with snowflake IDs, transaction IDs, or nanosecond timestamps, treat them as strings end-to-end.
5. Logical types are layered on top of primitive types. A decimal is physically bytes with logicalType: "decimal". A timestamp-millis is physically long. If your decoder doesn’t understand the logical type, you’ll get the raw underlying value (bytes or a big integer) instead of the formatted string.
6. Default values must match the schema’s first union branch. This bites everyone: if you declare {"type": ["null", "string"], "default": null}, you’re fine. But {"type": ["string", "null"], "default": null} is invalid — the default must match the first type, which is string.
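Two of these pitfalls are easy to check mechanically. The sketch below covers items 4 and 6: one helper tests whether an integer survives a round trip through a 64-bit float, and the other applies the first-branch default rule as stated in item 6. Both function names are made up for this example, and the default check only looks at primitive first branches.

```python
def survives_double(n: int) -> bool:
    """True if n is unchanged after a round trip through a 64-bit float."""
    return int(float(n)) == n


# Python types that a JSON default may have for each primitive Avro type.
_JSON_TYPES = {
    "null": type(None), "boolean": bool, "int": int, "long": int,
    "float": (int, float), "double": (int, float), "string": str, "bytes": str,
}


def default_matches_first_branch(field: dict) -> bool:
    """Check a union field's default against the union's first branch."""
    union = field.get("type")
    if not isinstance(union, list) or "default" not in field:
        return True   # not a union, or no default to check
    expected = _JSON_TYPES.get(union[0]) if isinstance(union[0], str) else None
    if expected is None:
        return True   # complex first branch: out of scope for this sketch
    default = field["default"]
    if isinstance(default, bool) and expected is not bool:
        return False  # bool is an int subclass in Python; reject it elsewhere
    return isinstance(default, expected)


print(survives_double(2**53))        # True
print(survives_double(2**53 + 1))    # False

ok = {"name": "email", "type": ["null", "string"], "default": None}
bad = {"name": "email", "type": ["string", "null"], "default": None}
print(default_matches_first_branch(ok))    # True
print(default_matches_first_branch(bad))   # False
```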
FAQs
What does this tool do?
It takes an Apache Avro schema written in its standard JSON form (with "type": "record", "fields": [...], etc.) and generates a sample JSON document that matches the schema. It’s useful for previewing what an Avro-encoded Kafka message or HDFS record will look like once decoded into plain JSON.
Does this tool decode binary Avro records?
No, this tool only works on the schema text. To decode an actual binary Avro file you need avro-tools (java -jar avro-tools.jar tojson data.avro), the Confluent CLI (confluent kafka topic consume --value-format avro), or an Avro library in your language. For Kafka specifically, the first 5 bytes of every Avro-encoded message are a magic byte (0x00) plus a 4-byte schema ID, and you must strip those before passing the rest to a raw Avro decoder.
What’s the difference between Avro JSON encoding and regular JSON?
Avro defines a JSON encoding in its own spec that’s not the same as the JSON you’d write by hand. The biggest difference is unions: a value of the union ["null", "string"] holding the string "hello" is encoded as {"string": "hello"} in Avro JSON, but as just "hello" in normal JSON. This tool produces the normal-JSON view, which is what most engineers actually want to read.
How do union types like ["null", "string"] appear in JSON?
In normal JSON they appear as either null or a value of the non-null type. So a field of type ["null", "string"] with value "hello" is just "hello", and with no value is null. In Avro’s own JSON encoding the same value would be {"string": "hello"} or null. The tool uses the normal JSON form.
What about logical types like date, timestamp, and decimal?
Logical types layer a semantic meaning on top of a primitive: date is an int (days since 1970-01-01), timestamp-millis is a long (milliseconds since epoch), decimal is bytes with a fixed precision and scale, and uuid is a string. In a JSON view these are typically rendered as their human-readable form: "2026-05-20", "2026-05-20T10:30:00Z", "123.45", "550e8400-e29b-41d4-a716-446655440000". The tool follows this convention.
How do I convert binary Avro data to JSON?
Use one of: (1) java -jar avro-tools.jar tojson --pretty data.avro for a container file, which embeds its own schema; (2) java -jar avro-tools.jar fragtojson --schema-file schema.avsc data.bin for a single raw binary datum that carries no schema; (3) the Confluent CLI: confluent kafka topic consume --value-format avro topic-name; (4) at runtime, DatumReader<GenericRecord>.read(...).toString() in the Java SDK, avro.io.DatumReader in Python, or avro.Decoder in Go. Remember to strip the Confluent magic-byte + schema-ID header (the first 5 bytes) if you’re consuming raw Kafka messages.
Where do I get the schema for a Kafka message?
If your cluster uses the Confluent Schema Registry, the schema ID is carried in the 5-byte header of every message (the 4 bytes after the magic byte). Fetch the schema with curl http://schema-registry:8081/schemas/ids/<id> or via the Confluent CLI: confluent schema-registry schema describe --subject <topic>-value --version latest. AWS Glue and Azure Event Hubs Schema Registry have equivalent APIs.
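The same lookup can be done from code. Here is a minimal sketch using only the Python standard library, assuming an unauthenticated registry at the placeholder address used above; the function names are hypothetical. The registry returns a JSON object whose "schema" field holds the schema as a JSON-encoded string, so it has to be parsed twice.

```python
import json
from urllib.request import urlopen


def schema_url(registry_base: str, schema_id: int) -> str:
    """Build the lookup URL for a schema ID."""
    return f"{registry_base.rstrip('/')}/schemas/ids/{schema_id}"


def parse_schema_response(body: bytes):
    """Unwrap the registry response: the schema is a JSON string inside JSON."""
    return json.loads(json.loads(body)["schema"])


def fetch_schema(registry_base: str, schema_id: int):
    """Fetch and parse a schema by ID (needs network access to the registry)."""
    with urlopen(schema_url(registry_base, schema_id)) as resp:
        return parse_schema_response(resp.read())


# Offline demonstration with a canned response body.
body = b'{"schema": "{\\"type\\": \\"string\\"}"}'
print(schema_url("http://schema-registry:8081", 42))
print(parse_schema_response(body))
```

Calling fetch_schema("http://schema-registry:8081", 42) would perform the actual request; registries that require authentication or TLS client settings need more than this sketch provides.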