Documentation updates for working with DataBuffers

Issue: SPR-17409
2018-10-25 23:43:45 -04:00 · 2018-10-25 23:43:45 -04:00 · 4faee165db
parent 8223ed38c8
commit 4faee165db
3 changed files with 169 additions and 142 deletions
--- a/src/docs/asciidoc/core/core-databuffer-codec.adoc
+++ b/src/docs/asciidoc/core/core-databuffer-codec.adoc
@@ -1,156 +1,95 @@
 [[databuffers]]
 = Data Buffers and Codecs

-The `DataBuffer` interface defines an abstraction over byte buffers.
-The main reason for introducing it (and not using the standard `java.nio.ByteBuffer` instead) is Netty.
-Netty does not use `ByteBuffer` but instead offers `ByteBuf` as an alternative.
-Spring's `DataBuffer` is a simple abstraction over `ByteBuf` that can also be used on non-Netty
-platforms (that is, Servlet 3.1+).
+Java NIO provides `ByteBuffer` but many libraries build their own byte buffer API on top,
+especially for network operations where reusing buffers and/or using direct buffers is
+beneficial for performance. For example Netty has the `ByteBuf` hierarchy, Undertow uses
+XNIO, Jetty uses pooled byte buffers with a callback to be released, and so on.
+The `spring-core` module provides a set of abstractions to work with various byte buffer
+APIs as follows:
+
+* <<databuffers-factory>> abstracts the creation of a data buffer.
+* <<databuffers-buffer>> represents a byte buffer, which may be
+<<databuffers-buffer-pooled,pooled>>.
+* <<databuffers-utils>> offers utility methods for data buffers.
+* <<codecs>> decode or encode data buffer streams into higher level objects.




+[[databuffers-factory]]
 == `DataBufferFactory`

-The `DataBufferFactory` offers functionality to allocate new data buffers as well as to wrap
-existing data.
-The `allocateBuffer` methods allocate a new data buffer with a default or given capacity.
-Though `DataBuffer` implementations grow and shrink on demand, it is more efficient to give the
-capacity upfront, if known.
-The `wrap` methods decorate an existing `ByteBuffer` or byte array.
-Wrapping does not involve allocation. It decorates the given data with a `DataBuffer`
-implementation.
+`DataBufferFactory` is used to create data buffers in one of two ways:

-There are two implementation of `DataBufferFactory`: the `NettyDataBufferFactory`
-(for Netty platforms, such as Reactor Netty) and `DefaultDataBufferFactory`
-(for other platforms, such as Servlet 3.1+ servers).
+. Allocate a new data buffer, optionally specifying capacity upfront, if known, which is
+more efficient even though implementations of `DataBuffer` can grow and shrink on demand.
+. Wrap an existing `byte[]` or `java.nio.ByteBuffer`, which decorates the given data with
+a `DataBuffer` implementation and does not involve allocation.
+
+Note that WebFlux applications do not create a `DataBufferFactory` directly but instead
+access it through the `ServerHttpResponse` on the server side or the `ClientHttpRequest`
+on the client side. The type of factory depends on the underlying client or server, e.g.
+`NettyDataBufferFactory` for Reactor Netty, `DefaultDataBufferFactory` for others.




-== The `DataBuffer` Interface
+[[databuffers-buffer]]
+== `DataBuffer`

-The `DataBuffer` interface is similar to `ByteBuffer` but offers a number of advantages.
-Similar to Netty's `ByteBuf`, the `DataBuffer` abstraction offers independent read and write
-positions.
-This is different from the JDK's `ByteBuffer`, which exposes only one position for both reading and
-writing and a separate `flip()` operation to switch between the two  I/O operations.
-In general, the following invariant holds for the read position, write position, and the capacity:
+The `DataBuffer` interface offers operations similar to those of `java.nio.ByteBuffer` but
+also brings a few additional benefits, some of which are inspired by the Netty `ByteBuf`.
+Below is a partial list of benefits:

-====
-[literal]
-[subs="verbatim,quotes"]
--
-	0 <= read position <= write position <= capacity
--
-====
-
-When reading bytes from the `DataBuffer`, the read position is automatically updated in accordance with
-the amount of data read from the buffer.
-Similarly, when writing bytes to the `DataBuffer`, the write position is updated with the amount of
-data written to the buffer.
-Also, when writing data, the capacity of a `DataBuffer` is automatically expanded, in the same fashion as `StringBuilder`,
-`ArrayList`, and similar types.
-
-Besides the reading and writing functionality mentioned above, the `DataBuffer` also has methods to
-view a (slice of a) buffer as a `ByteBuffer`, an `InputStream`, or an `OutputStream`.
-Additionally, it offers methods to determine the index of a given byte.
-
-As mentioned earlier, there are two implementation of `DataBufferFactory`: the `NettyDataBufferFactory`
-(for Netty platforms, such as Reactor Netty) and
-`DefaultDataBufferFactory` (for other platforms, such as
-Servlet 3.1+ servers).
+* Read and write with independent positions, i.e. not requiring a call to `flip()` to
+alternate between read and write.
+* Capacity expanded on demand as with `java.lang.StringBuilder`.
+* Pooled buffers and reference counting via <<databuffers-buffer-pooled>>.
+* View a buffer as `java.nio.ByteBuffer`, `InputStream`, or `OutputStream`.
+* Determine the index, or the last index, for a given byte.
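+
+To see what independent positions save, recall that `java.nio.ByteBuffer` tracks a single
+position, so code that writes and then reads must call `flip()` in between. The following
+is a minimal sketch that uses only the JDK; the `FlipExample` class and its
+`writeThenRead` method are hypothetical names chosen for illustration:
+
+====
+[source,java,indent=0]
+[subs="verbatim,quotes"]
+----
+	import java.nio.ByteBuffer;
+	import java.nio.charset.StandardCharsets;
+
+	public class FlipExample {
+
+		// Writes the text into a ByteBuffer, then reads it back from the same buffer.
+		public static String writeThenRead(String text) {
+			byte[] source = text.getBytes(StandardCharsets.UTF_8);
+			ByteBuffer buffer = ByteBuffer.allocate(source.length);
+			buffer.put(source); // the single position now points past the written bytes
+			buffer.flip(); // limit = position, position = 0: switch from writing to reading
+			byte[] target = new byte[buffer.remaining()];
+			buffer.get(target);
+			return new String(target, StandardCharsets.UTF_8);
+		}
+
+		public static void main(String[] args) {
+			System.out.println(writeThenRead("hello"));
+		}
+	}
+----
+====
+
+With a `DataBuffer`, writing advances the write position and reading advances the read
+position, so no equivalent of `flip()` is needed between the two.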



-=== `PooledDataBuffer`

-The `PooledDataBuffer` is an extension to `DataBuffer` that adds methods for reference counting.
-The `retain` method increases the reference count by one.
-The `release` method decreases the count by one and releases the buffer's memory when the count
-reaches 0.
-Both of these methods are related to reference counting, a mechanism that we explain <<databuffer-reference-counting,later>>.
+[[databuffers-buffer-pooled]]
+== `PooledDataBuffer`

-Note that `DataBufferUtils` offers useful utility methods for releasing and retaining pooled data
-buffers.
-These methods take a plain `DataBuffer` as a parameter but only call `retain` or `release` if the
-passed data buffer is an instance of `PooledDataBuffer`.
+As explained in the Javadoc for
+https://docs.oracle.com/javase/8/docs/api/java/nio/ByteBuffer.html[ByteBuffer],
+byte buffers can be direct or non-direct. Direct buffers may reside outside the Java heap
+which eliminates the need for copying for native I/O operations. That makes direct buffers
+particularly useful for receiving and sending data over a socket, but they're also more
+expensive to create and release, which leads to the idea of pooling buffers.

+`PooledDataBuffer` is an extension of `DataBuffer` that helps with reference counting, which
+is essential for byte buffer pooling. How does it work? When a `PooledDataBuffer` is
+allocated, the reference count is at 1. Calls to `retain()` increment the count, while
+calls to `release()` decrement it. As long as the count is above 0, the buffer is
+guaranteed not to be released. When the count is decreased to 0, the pooled buffer can be
+released, which in practice could mean the reserved memory for the buffer is returned to
+the memory pool.

-[[databuffer-reference-counting]]
-==== Reference Counting
-
-Reference counting is not a common technique in Java. It is much more common in other programming
-languages, such as Object C and C++.
-In and of itself, reference counting is not complex. It basically involves tracking the number of
-references that apply to an object.
-The reference count of a `PooledDataBuffer` starts at 1, is incremented by calling `retain`,
-and is decremented by calling `release`.
-As long as the buffer's reference count is larger than 0, the buffer is not released.
-When the number decreases to 0, the instance is released.
-In practice, this means that the reserved memory captured by the buffer is returned back to
-the memory pool, ready to be used for future allocations.
-
-In general, the last component to access a `DataBuffer` is responsible for releasing it.
-Within Spring, there are two sorts of components that release buffers: decoders and transports.
-Decoders are responsible for transforming a stream of buffers into other types (see <<codecs>>),
-and transports are responsible for sending buffers across a network boundary, typically as an HTTP message.
-This means that, if you allocate data buffers for the purpose of putting them into an outbound HTTP
-message (that is, a client-side request or server-side response), they do not have to be released.
-The other consequence of this rule is that if you allocate data buffers that do not end up in the
-body (for instance, because of a thrown exception), you have to release them yourself.
-The following snippet shows a typical `DataBuffer` usage scenario when dealing with methods that
-throw exceptions:
-
-====
-[source,java,indent=0]
-[subs="verbatim,quotes"]
----
-	DataBufferFactory factory = ...
-	DataBuffer buffer = factory.allocateBuffer(); <1>
-	boolean release = true; <2>
-	try {
-  		writeDataToBuffer(buffer); <3>
-  		putBufferInHttpBody(buffer);
-  		release = false; <4>
-	}
-	finally {
-  		if (release) {
-			DataBufferUtils.release(buffer); <5>
-		}
-	}
-
-	private void writeDataToBuffer(DataBuffer buffer) throws IOException { <3>
-		...
-	}
----
-
-<1> A new buffer is allocated.
-<2> A boolean flag indicates whether the allocated buffer should be released.
-<3> This example method loads data into the buffer. Note that the method can throw an `IOException`.
-Therefore, a `finally` block to release the buffer is required.
-<4> If no exception occurred, we switch the `release` flag to `false` as the buffer is now
-released as part of sending the HTTP body across the wire.
-<5> If an exception did occur, the flag is still set to `true`, and the buffer is released
-here.
-====
+Note that instead of operating on `PooledDataBuffer` directly, in most cases it's better
+to use the convenience methods in `DataBufferUtils` that apply release or retain to a
+`DataBuffer` only if it is an instance of `PooledDataBuffer`.
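+
+The counting rules described in this section can be sketched in plain Java. The
+`CountedBuffer` class below is a hypothetical, simplified stand-in written for
+illustration. It is not how `PooledDataBuffer` or Netty implement reference counting, it
+is not thread-safe, and where a real pool would return memory it only reports that the
+count has reached 0:
+
+====
+[source,java,indent=0]
+[subs="verbatim,quotes"]
+----
+	// Hypothetical stand-in for a pooled buffer; not a Spring or Netty class.
+	public class CountedBuffer {
+
+		// The reference count starts at 1 when the buffer is allocated.
+		private int refCount = 1;
+
+		// Increments the count so that the buffer survives one more release() call.
+		public CountedBuffer retain() {
+			if (isReleased()) {
+				throw new IllegalStateException("Buffer was already released");
+			}
+			refCount++;
+			return this;
+		}
+
+		// Decrements the count; returns true if this call brought the count to 0.
+		public boolean release() {
+			if (isReleased()) {
+				throw new IllegalStateException("Buffer was already released");
+			}
+			refCount--;
+			return isReleased();
+		}
+
+		// A count of 0 means the memory could be handed back to the pool.
+		public boolean isReleased() {
+			return refCount == 0;
+		}
+	}
+----
+====
+
+Each `retain()` must eventually be matched by a `release()`. Otherwise the count never
+reaches 0 and the buffer is not returned to the pool.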



-=== `DataBufferUtils`

-The `DataBufferUtils` class contains various utility methods that operate on data buffers.
-It contains methods for reading a `Flux` of `DataBuffer` objects from an `InputStream` or NIO
-`Channel` and methods for writing a data buffer `Flux` to an `OutputStream` or `Channel`.
-`DataBufferUtils` also exposes `retain` and `release` methods that operate on plain `DataBuffer`
-instances (so that casting to a `PooledDataBuffer` is not required).
+[[databuffers-utils]]
+== `DataBufferUtils`
+
+`DataBufferUtils` offers a number of utility methods to operate on data buffers:
+
+* Join a stream of data buffers into a single buffer possibly with zero copy, e.g. via
+composite buffers, if that's supported by the underlying byte buffer API.
+* Turn `InputStream` or NIO `Channel` into `Flux<DataBuffer>`, and vice versa, write a
+`Publisher<DataBuffer>` to an `OutputStream` or NIO `Channel`.
+* Release or retain a `DataBuffer` if the buffer is an instance of
+`PooledDataBuffer`.
+* Skip or take from a stream of bytes until a specific byte count.

-Additionally, `DataBufferUtils` exposes `compose`, which merges a stream of data buffers into one.
-For instance, this method can be used to convert the entire HTTP body into a single buffer (and
-from that, a `String` or `InputStream`).
-This is particularly useful when dealing with older, blocking APIs.
-Note, however, that this puts the entire body in memory, and therefore uses more memory than a pure
-streaming solution would.



@@ -158,19 +97,73 @@ streaming solution would.
 [[codecs]]
 == Codecs

-The `org.springframework.core.codec` package contains the two main abstractions for converting a
-stream of bytes into a stream of objects or vice-versa.
-The `Encoder` is a strategy interface that encodes a stream of objects into an output stream of
-data buffers.
-The `Decoder` does the reverse: It turns a stream of data buffers into a stream of objects.
-Note that a decoder instance needs to consider <<databuffer-reference-counting,reference counting>>.
+The `org.springframework.core.codec` package provides the following strategy interfaces:

-Spring comes with a wide array of default codecs (to convert from and to `String`,
-`ByteBuffer`, and byte arrays) and codecs that support marshalling libraries such as JAXB and
-Jackson (with https://github.com/FasterXML/jackson-core/issues/57[Jackson 2.9+ support for non-blocking parsing]).
-Within the context of Spring WebFlux, codecs are used to convert the request body into a
-`@RequestMapping` parameter or to convert the return type into the response body that is sent back
-to the client.
-The default codecs are configured in the `WebFluxConfigurationSupport` class. You can
-change them by overriding the `configureHttpMessageCodecs` when you inherit from that class.
-For more information about using codecs in WebFlux, see <<web-reactive#webflux-codecs>>.
+* `Encoder` to encode `Publisher<T>` into a stream of data buffers.
+* `Decoder` to decode `Publisher<DataBuffer>` into a stream of higher level objects.
+
+The `spring-core` module provides `byte[]`, `ByteBuffer`, `DataBuffer`, `Resource`, and
+`String` encoder and decoder implementations. The `spring-web` module adds Jackson JSON,
+Jackson Smile, JAXB2, Protocol Buffers and other encoders and decoders. See
+<<web-reactive.adoc#webflux-codecs,Codecs>> in the WebFlux section.
+
+
+
+
+[[databuffers-using]]
+== Using `DataBuffer`
+
+When working with data buffers, special care must be taken to ensure buffers are released
+since they may be <<databuffers-buffer-pooled,pooled>>. We'll use codecs to illustrate
+how that works but the concepts apply more generally. Let's see what codecs must do
+internally to manage data buffers.
+
+A `Decoder` is the last to read input data buffers, before creating higher level
+objects, and therefore it must release them as follows:
+
+. If a `Decoder` simply reads each input buffer and is ready to
+release it immediately, it can do so via `DataBufferUtils.release(dataBuffer)`.
+. If a `Decoder` is using `Flux` or `Mono` operators such as `flatMap`, `reduce`, and
+others that prefetch and cache data items internally, or is using operators such as
+`filter`, `skip`, and others that leave out items, then
+`doOnDiscard(PooledDataBuffer.class, DataBufferUtils::release)` must be added to the
+composition chain to ensure such buffers are released prior to being discarded, possibly
+also as a result of an error or cancellation signal.
+. If a `Decoder` holds on to one or more data buffers in any other way, it must
+ensure they are released when fully read, or in case of an error or cancellation signal
+that takes place before the cached data buffers have been read and released.
+
+Note that `DataBufferUtils#join` offers a safe and efficient way to aggregate a data
+buffer stream into a single data buffer. Likewise `skipUntilByteCount` and
+`takeUntilByteCount` are additional safe methods for decoders to use.
+
+An `Encoder` allocates data buffers that others must read (and release). So an `Encoder`
+doesn't have much to do. However, an `Encoder` must take care to release a data buffer if
+a serialization error occurs while populating the buffer with data. For example:
+
+====
+[source,java,indent=0]
+[subs="verbatim,quotes"]
+----
+	DataBuffer buffer = factory.allocateBuffer();
+	boolean release = true;
+	try {
+		// serialize and populate buffer..
+		release = false;
+	}
+	finally {
+		if (release) {
+			DataBufferUtils.release(buffer);
+		}
+	}
+	return buffer;
+----
+====
+
+The consumer of an `Encoder` is responsible for releasing the data buffers it receives.
+In a WebFlux application, the output of the `Encoder` is used to write to the HTTP server
+response, or to the client HTTP request, in which case releasing the data buffers is the
+responsibility of the code writing to the server response, or to the client request.
+
+Note that when running on Netty, there are debugging options for
+https://github.com/netty/netty/wiki/Reference-counted-objects#troubleshooting-buffer-leaks[troubleshooting buffer leaks].
--- a/src/docs/asciidoc/web/webflux-websocket.adoc
+++ b/src/docs/asciidoc/web/webflux-websocket.adoc
@@ -204,6 +204,22 @@ class ExampleHandler implements WebSocketHandler {



+[[webflux-websocket-databuffer]]
+=== `DataBuffer`
+
+`DataBuffer` is the representation for a byte buffer in WebFlux. The Spring Core part of
+the reference has more on that in the section on
+<<core#databuffers,Data Buffers and Codecs>>. The key point to understand is that on some
+servers like Netty, byte buffers are pooled and reference counted, and must be released
+when consumed to avoid memory leaks.
+
+When running on Netty, applications must use `DataBufferUtils.retain(dataBuffer)` if they
+wish to hold on to input data buffers in order to ensure they are not released, and
+subsequently use `DataBufferUtils.release(dataBuffer)` when the buffers are consumed.
+
+
+
+
 [[webflux-websocket-server-handshake]]
 === Handshake
 [.small]#<<web.adoc#websocket-server-handshake,Same as in the Servlet stack>>#
--- a/src/docs/asciidoc/web/webflux.adoc
+++ b/src/docs/asciidoc/web/webflux.adoc
@@ -671,7 +671,7 @@ to encode and decode HTTP message content.
 application, while a `Decoder` can be wrapped with `DecoderHttpMessageReader`.
 * {api-spring-framework}/core/io/buffer/DataBuffer.html[`DataBuffer`] abstracts different
 byte buffer representations (e.g. Netty `ByteBuf`, `java.nio.ByteBuffer`, etc.) and is
-what all codecs work on. See <<core#databuffers, Data Buffers and Codecs>> in the
+what all codecs work on. See <<core#databuffers,Data Buffers and Codecs>> in the
 "Spring Core" section for more on this topic.

 The `spring-core` module provides `byte[]`, `ByteBuffer`, `DataBuffer`, `Resource`, and
@@ -741,7 +741,7 @@ consistently for access to the cached form data versus reading from the raw requ


 [[webflux-codecs-multipart]]
-==== Multipart Data
+==== Multipart

 `MultipartHttpMessageReader` and `MultipartHttpMessageWriter` support decoding and
 encoding "multipart/form-data" content. In turn `MultipartHttpMessageReader` delegates to
@@ -772,6 +772,24 @@ comment-only, empty SSE event or any other "no-op" data that would effectively s
 a heartbeat.


+[[webflux-codecs-buffers]]
+==== `DataBuffer`
+
+`DataBuffer` is the representation for a byte buffer in WebFlux. The Spring Core part of
+the reference has more on that in the section on
+<<core#databuffers,Data Buffers and Codecs>>. The key point to understand is that on some
+servers like Netty, byte buffers are pooled and reference counted, and must be released
+when consumed to avoid memory leaks.
+
+WebFlux applications generally do not need to be concerned with such issues, unless they
+consume or produce data buffers directly, as opposed to relying on codecs to convert to
+and from higher level objects. Or unless they choose to create custom codecs. For such
+cases please review the information in <<core#databuffers,Data Buffers and Codecs>>,
+especially the section on <<core#databuffers-using,Using DataBuffer>>.
+
+
+
+

 [[webflux-logging]]
 === Logging