Apache Kafka Message Definitions
================================
					
						
Introduction
------------
The JSON files in this directory define the Apache Kafka message protocol.
This protocol describes what information clients and servers send to each
other, and how it is serialized.  Note that this version of JSON supports
comments.  Comments begin with a double forward slash.
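
As a rough illustration, a specification file might look like the sketch
below.  The message name, the API key, and the field are hypothetical, and real
specification files may require additional top-level keys.  The keys shown here
are described in the following sections.

```json
// Hypothetical request definition, for illustration only.
// The name "ExampleRequest" and the API key 9999 are invented.
{
  "apiKey": 9999,
  "type": "request",
  "name": "ExampleRequest",
  "validVersions": "0-2",
  "fields": [
    // Comments such as this one begin with a double forward slash.
    { "name": "TimeoutMs", "type": "int32", "versions": "0+",
      "about": "How long to wait for the operation, in milliseconds." }
  ]
}
```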
					
						
When Kafka is compiled, these specification files are translated into Java code
to read and write messages.  Any change to these JSON files will trigger a
recompilation of this generated code.

These specification files replace an older system where hand-written
serialization code was used.  Over time, we will migrate all messages to using
automatically generated serialization and deserialization code.
					
						
Requests and Responses
----------------------
The Kafka protocol features requests and responses.  Requests are sent to a
server in order to get a response.  Each request type is uniquely identified by
a 16-bit integer called the "api key".  The API key of the response will always
match that of the request.

Each message has a 16-bit version number.  The schema might be different
for each version of the message.  Sometimes, the version is incremented even
though the schema has not changed.  This may indicate that the server should
behave differently in some way.  The version of a response must always match
the version of the corresponding request.

Each request or response has a top-level field named "validVersions".  This
specifies the versions of the protocol that our code understands.  For example,
specifying "0-2" indicates that we understand versions 0, 1, and 2.  You must
always specify the highest message version which is supported.
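
The sketch below shows how a request and its response might be paired.  Both
files are hypothetical: the names and the API key 9999 are invented, and the
empty field lists are placeholders.

```json
// ExampleRequest.json (hypothetical)
{
  "apiKey": 9999,
  "type": "request",
  "name": "ExampleRequest",
  // Versions 0, 1, and 2 are understood.
  "validVersions": "0-2",
  "fields": []
}

// ExampleResponse.json (hypothetical)
{
  // The response carries the same API key as the request.
  "apiKey": 9999,
  "type": "response",
  "name": "ExampleResponse",
  // A response is always sent with the version of its request,
  // so the same range is listed here.
  "validVersions": "0-2",
  "fields": []
}
```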
					
						
The only old message versions that are no longer supported are version 0 of
MetadataRequest and MetadataResponse.  In general, since we adopted KIP-97,
dropping support for old message versions is no longer allowed without a KIP.
Therefore, please be careful not to increase the lower end of the version
support interval for any message.
					
						
MessageData Objects
-------------------
Using the JSON files in this directory, we generate Java code for MessageData
objects.  These objects store request and response data for Kafka.  MessageData
objects do not contain a version number.  Instead, a single MessageData object
represents every possible version of a message.  This makes working with
messages more convenient, because the same code path can be used for every
version of a message.
					
						
Fields
------
Each message contains an array of fields.  Fields specify the data that should
be sent with the message.  In general, fields have a name, a type, and version
information associated with them.

The order in which fields appear in a message is important.  Fields which come
first in the message definition will be sent first over the network.  Changing
the order of the fields in a message is an incompatible change.

In each new message version, we may add or remove fields.  For example, if we
are creating a new version 3 of a message, we can add a new field with the
version spec "3+".  This specifies that the field only appears in version 3 and
later.  If a field is being removed, we should change its version from "0+" to
"0-2" to indicate that it will not appear in version 3 and later.
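
The version specs described above could look like the following fragment.
The field names are hypothetical; only the "versions" values follow directly
from the text.

```json
"fields": [
  // Present in every version.
  { "name": "TopicName", "type": "string", "versions": "0+",
    "about": "The topic name." },
  // Removed in version 3: the range was changed from "0+" to "0-2".
  { "name": "LegacyFlag", "type": "bool", "versions": "0-2",
    "about": "A flag that is no longer sent as of version 3." },
  // Added in version 3.
  { "name": "TimeoutMs", "type": "int32", "versions": "3+",
    "about": "How long to wait, in milliseconds." }
]
```

Note that the existing fields keep their relative order.  Only the version
ranges change.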
					
						
Field Types
-----------
There are several primitive field types available.

* "bool": either true or false.

* "int8": an 8-bit integer.

* "int16": a 16-bit integer.

* "uint16": a 16-bit unsigned integer.

* "int32": a 32-bit integer.

* "uint32": a 32-bit unsigned integer.

* "int64": a 64-bit integer.

* "float64": a double-precision floating point number (IEEE 754).

* "string": a UTF-8 string.

* "uuid": a type 4 immutable universally unique identifier.

* "bytes": binary data.

* "records": a record set, such as an in-memory record set.

In addition to these primitive field types, there is also an array type.  Array
types start with a "[]" and end with the name of the element type.  For
example, []Foo declares an array of "Foo" objects.  Array fields have their own
array of fields, which specifies what is in the contained objects.
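
As an illustration, an array field with nested fields might be declared as in
the sketch below.  The names "Topics", "ExampleTopic", and the nested field
names are hypothetical.

```json
{ "name": "Topics", "type": "[]ExampleTopic", "versions": "0+",
  "about": "The topics to operate on.",
  // The nested "fields" array describes each ExampleTopic element.
  "fields": [
    { "name": "Name", "type": "string", "versions": "0+",
      "about": "The topic name." },
    // An array of a primitive type needs no nested "fields" array.
    { "name": "PartitionIndexes", "type": "[]int32", "versions": "0+",
      "about": "The partitions of this topic to operate on." }
  ]
}
```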
					
						
For information about how fields are serialized, see the [Kafka Protocol
Guide](https://kafka.apache.org/protocol.html).
					
						
Nullable Fields
---------------
Booleans, ints, and floats can never be null.  However, fields that are strings,
bytes, uuid, records, or arrays may optionally be "nullable".  When a field is
"nullable", that simply means that we are prepared to serialize and deserialize
null entries for that field.

If you want to declare a field as nullable, you set "nullableVersions" for that
field.  Nullability is implemented as a version range in order to accommodate a
very common pattern in Kafka where a field that was originally not nullable
becomes nullable in a later version.
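
For example, a field that becomes nullable in version 2 might be declared as
follows.  The field name is hypothetical.

```json
// Not nullable in versions 0 and 1; nullable from version 2 onward.
{ "name": "ClientRack", "type": "string", "versions": "0+",
  "nullableVersions": "2+",
  "about": "The rack of the client, or null if it is not known." }
```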
					
						
If a field is declared as non-nullable, and it is present in the message
version you are using, you should set it to a non-null value before serializing
the message.  Otherwise, you will get a runtime error.
					
						
Tagged Fields
-------------
Tagged fields are an extension to the Kafka protocol which allows optional data
to be attached to messages.  Tagged fields can appear at the root level of
messages, or within any structure in the message.

Unlike mandatory fields, tagged fields can be added to message versions that
already exist.  Older servers will ignore new tagged fields which they do not
understand.

In order to make a field tagged, set a "tag" for the field, and also set
"taggedVersions" for the field.  The taggedVersions you specify should be
open-ended; that is, they should specify a start version, but not an end
version.
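
A tagged field might be declared as in the sketch below.  The field name and
the tag number are hypothetical, and the sketch assumes that version 1 of the
enclosing message is a flexible version.

```json
// "taggedVersions" is open-ended: it has a start version but no end version.
{ "name": "ClusterHint", "type": "string", "versions": "1+",
  "tag": 0, "taggedVersions": "1+",
  "about": "An optional hint that older peers can safely ignore." }
```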
					
						
You can remove support for a tagged field from a specific version of a message,
but once a tag has been used for something, it can't be reused for anything
else without breaking compatibility.

Note that tagged fields can only be added to "flexible" message versions.

#### Default Value Handling for Tagged Fields

In Kafka's serialization mechanism, a tagged field may be omitted from the
serialized message if all its associated fields are equal to their default
values, whether those defaults are explicit or implicit.  This behavior
optimizes message size by avoiding the transmission of redundant data.
					
						
Flexible Versions
-----------------
Kafka serialization has been improved over time to be more flexible and
efficient.  Message versions that contain these improvements are referred to as
"flexible versions".

In flexible versions, variable-length fields such as strings, arrays, and bytes
fields are serialized in a more efficient way that saves space.  The names of
the new serialization types start with "COMPACT".  For example, COMPACT_STRING
is a more efficient form of STRING.
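
The text above does not say how flexible versions are declared.  The sketch
below assumes a top-level "flexibleVersions" key that takes a version range,
as used in Kafka's specification files.  The message name, API key, and field
are hypothetical.

```json
{
  "apiKey": 9999,
  "type": "request",
  "name": "ExampleRequest",
  "validVersions": "0-3",
  // Assumption: versions 2 and later use the compact serialization.
  "flexibleVersions": "2+",
  "fields": [
    // Serialized as STRING in versions 0-1 and as COMPACT_STRING in 2+.
    { "name": "TopicName", "type": "string", "versions": "0+",
      "about": "The topic name." }
  ]
}
```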
					
						
Serializing Messages
--------------------
The Message#write method writes out a message to a buffer.  The fields that are
written out will depend on the version number that you supply to write().  When
you write out a message using an older version, fields that are not present in
that version's schema will be omitted.

When working with older message versions, please verify that the older message
schema includes all the data that needs to be sent.  For example, it is probably
OK to skip sending a timeout field.  However, a field which radically alters the
meaning of the request, such as a "validateOnly" boolean, should not be ignored.

It's often useful to know how much space a message will take up before writing
it out to a buffer.  You can find this out by calling the Message#size method.
					
						
Deserializing Messages
----------------------
Message objects may be deserialized using the Message#read method.  This method
overwrites all the data currently in the message object with new data.

Any fields in the message object that are not present in the version that you
are deserializing will be reset to default values.  Unless a custom default has
been set, the defaults are as follows:

* Integer fields default to 0.

* Floats default to 0.

* Booleans default to false.

* Strings default to the empty string.

* Bytes fields default to the empty byte array.

* Uuid fields default to zero uuid.

* Records fields default to null.

* Array fields default to empty.

You can specify "null" as a default value for a string field by specifying the
literal string "null".  Note that you can only specify null as a default if all
versions of the field are nullable.
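
A string field with a null default might look like the sketch below.  The
field name is hypothetical.

```json
// "nullableVersions" covers every version listed in "versions",
// so the literal string "null" is an acceptable default.
{ "name": "ErrorMessage", "type": "string", "versions": "0+",
  "nullableVersions": "0+", "default": "null",
  "about": "The error message, or null if there was no error." }
```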
					
						
Custom Default Values
---------------------
You may set a custom default for fields that are integers, booleans, floats, or
strings.  Just add a "default" entry in the JSON object.  The custom default
overrides the normal default for the type.  So for example, you could make a
boolean field default to true rather than false, and so forth.

Note that the default must be valid for the field type.  So the default for an
int16 field must be an integer that fits in 16 bits, and so forth.  You may
specify hex or octal values, as long as they are prefixed with 0x or 0,
respectively.  It is currently not possible to specify a custom default for
bytes or array fields.
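
The fragment below sketches two custom defaults.  The field names are
hypothetical, and writing the defaults as JSON strings is an assumption of this
sketch.

```json
"fields": [
  // A boolean that defaults to true rather than false.
  { "name": "AllowAutoCreate", "type": "bool", "versions": "1+",
    "default": "true",
    "about": "Whether missing topics may be created automatically." },
  // An int16 default written in hex.  0x7fff (32767) is the largest
  // value that fits in a signed 16-bit integer.
  { "name": "MaxEntries", "type": "int16", "versions": "1+",
    "default": "0x7fff",
    "about": "The maximum number of entries to return." }
]
```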
					
						
Custom defaults are useful when an older message version lacked some
information.  For example, if an older request lacked a timeout field, you may
want to specify that the server should assume that the timeout for such a
request is 5000 ms (or some other arbitrary value).
					
						
Ignorable Fields
----------------
When we write messages using an older or newer format, not all fields may be
present.  The message receiver will fill in the default value for the field
during deserialization.  Therefore, if the source field was set to a non-default
value, that information will be lost.

In some cases, this information loss is acceptable.  For example, if a timeout
field does not get preserved, this is not a problem.  However, in other cases,
the field is really quite important and should not be discarded.  One example is
a "validateOnly" boolean which changes the whole meaning of the request.

By default, we assume that information loss is not acceptable.  The message
serialization code will throw an exception if a field that is absent from the
version being written is not set to its default value.  If information loss for
a field is OK, please set "ignorable" to true for the field to disable this
behavior.  When ignorable is set to true, the field may be silently omitted
during serialization.
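
The two cases above might be declared as in the following fragment.  The field
names are hypothetical, and the string form of the default is an assumption of
this sketch.

```json
"fields": [
  // Losing the timeout when writing version 0 is acceptable, so the
  // field is ignorable.  A version 0 receiver assumes the default.
  { "name": "TimeoutMs", "type": "int32", "versions": "1+",
    "ignorable": true, "default": "5000",
    "about": "How long to wait, in milliseconds." },
  // Not ignorable, which is the default behavior: writing version 0 with
  // this field set to true fails instead of silently dropping it.
  { "name": "ValidateOnly", "type": "bool", "versions": "1+",
    "about": "If true, validate the request but do not apply it." }
]
```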
					
						
Hash Sets
---------
One very common pattern in Kafka is to load array elements from a message into
a Map or Set for easier access.  The message protocol makes this easier with
the "mapKey" concept.

If some of the fields of an array's elements are annotated with "mapKey": true,
the entire array will be treated as a linked hash set rather than a list.
Elements in this set will be accessible in O(1) time with an automatically
generated "find" function.  The order of elements in the set will still be
preserved, however.  New entries that are added to the set always show up as
last in the ordering.
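
An array keyed by one of its fields might be declared as in the sketch below.
The array, element, and field names are hypothetical.

```json
{ "name": "Topics", "type": "[]ExampleTopic", "versions": "0+",
  "about": "The topics to operate on.",
  "fields": [
    // The key of each element.  Elements are looked up by this field.
    { "name": "Name", "type": "string", "versions": "0+", "mapKey": true,
      "about": "The topic name." },
    { "name": "NumPartitions", "type": "int32", "versions": "0+",
      "about": "The number of partitions in the topic." }
  ]
}
```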
					
						
Incompatible Changes
--------------------
It's very important to avoid making incompatible changes to the message
protocol.  Here are some examples of incompatible changes:

#### Making changes to a protocol version which has already been released.
Protocol versions that have been released must be regarded as done.  If there
were mistakes, they should be corrected in a new version rather than changing
the existing version.

#### Re-ordering existing fields.
It is OK to add new fields before or after existing fields.  However, existing
fields should not be re-ordered with respect to each other.

#### Changing the default of an existing field.
You must never change the default of a field which already exists.  Otherwise,
new clients and old servers will not agree on the default, and so forth.

#### Changing the type of an existing field.
The type of an existing field must not be changed.  One exception is that an
array of primitives may be changed to an array of structures containing the
same data, as long as the conversion is done correctly.  The Kafka protocol
does not do any "boxing" of structures, so an array of structs that contain a
single int32 is the same as an array of int32s.
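
The exception for arrays might look like the before-and-after sketch below.
All names are hypothetical.  The sketch relies on the statement above that
structures are not boxed, and it does not account for anything that a given
message version might serialize in addition to the struct's own fields.

```json
// Before: an array of primitives.
{ "name": "PartitionIndexes", "type": "[]int32", "versions": "0+",
  "about": "The partition indexes." }

// After: an array of structures that each hold the same int32.
{ "name": "Partitions", "type": "[]ExamplePartition", "versions": "0+",
  "about": "The partitions.",
  "fields": [
    { "name": "PartitionIndex", "type": "int32", "versions": "0+",
      "about": "The partition index." }
  ]
}
```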