Datagram BIO API revisions for sendmmsg/recvmmsg
================================================

We need to evolve the API surface of BIO which is relevant to BIO_dgram (and the
eventual BIO_dgram_mem) to support APIs which allow multiple datagrams to be
sent or received simultaneously, such as sendmmsg(2)/recvmmsg(2).

The adopted design
------------------

### Design decisions

The adopted design makes the following design decisions:

- We use a sendmmsg/recvmmsg-like API. The alternative API was not considered
  for adoption because it is an explicit goal that the adopted API be suitable
  for concurrent use on the same BIO.

- We define our own structures rather than using the OS's `struct mmsghdr`.
  The motivations for this are:

  - It ensures portability between OSes and allows the API to be used
    on OSes which do not support `sendmmsg` or `sendmsg`.

  - It allows us to use structures in keeping with OpenSSL's existing
    abstraction layers (e.g. `BIO_ADDR` rather than `struct sockaddr`).

  - We do not have to expose functionality which we cannot guarantee
    we can support on all platforms (for example, arbitrary control messages).

  - It avoids the need to include OS headers in our own public headers,
    which would pollute the environment of applications which include
    our headers, potentially undesirably.

- For OSes which do not support `sendmmsg`, we emulate it using repeated
  calls to `sendmsg`. For OSes which do not support `sendmsg`, we emulate it
  using `sendto` to the extent feasible. This avoids the need for code consuming
  these new APIs to define a fallback code path.

- We do not define any flags at this time, as the flags previously considered
  for adoption cannot be supported on all platforms (Win32 does not have
  `MSG_DONTWAIT`).
- We ensure the extensibility of our `BIO_MSG` structure in a way that preserves
  ABI compatibility using a `stride` argument which callers must set to
  `sizeof(BIO_MSG)`. Implementations can examine the stride field to determine
  whether a given field is part of a `BIO_MSG`; a sketch of this check follows
  this list. This allows us to add optional fields to `BIO_MSG` at a later time
  without breaking ABI. All new fields must be added to the end of the structure.

- The BIO methods are designed to support stateless operation in which they
  are simply calls to the equivalent system calls, where supported, without
  changing BIO state. In particular, this means that things like retry flags are
  not set or cleared by `BIO_sendmmsg` or `BIO_recvmmsg`.

  The motivation for this is that these functions are intended to support
  concurrent use on the same BIO. If they read or modify BIO state, they would
  need to be synchronised with a lock, undermining performance on what (for
  `BIO_dgram`) would otherwise be a straight system call.

- We do not support iovecs. The motivations for this are:

  - Not all platforms can support iovecs (e.g. Windows).

  - The only way we could emulate iovecs on platforms which don't support
    them is by copying the data to be sent into a staging buffer. This would
    defeat all of the advantages of iovecs and prevent us from meeting our
    zero/single-copy requirements. Moreover, it would lead to extremely
    surprising performance variations for consumers of the API.

  - We do not believe iovecs are needed to meet our performance requirements
    for QUIC. The reason for this is that aside from a minimal packet header,
    all data in QUIC is encrypted, so all data sent via QUIC must pass through
    an encrypt step anyway, meaning that all data sent will already be copied
    and there is not going to be any issue depositing the ciphertext in a
    staging buffer together with the frame header.

  - Even if we did support iovecs, we would have to impose a limit
    on the number of iovecs supported, because we translate from our own
    structures (as discussed above) and also intend these functions to be
    stateless and not require locking. Therefore the OS-native iovec structures
    would need to be allocated on the stack.

- Sometimes, an application may wish to learn the local interface address
  associated with a receive operation or specify the local interface address to
  be used for a send operation. We support this, but require this functionality
  to be explicitly enabled before use.

  The reason for this is that enabling this functionality generally requires
  that the socket be reconfigured using `setsockopt`. Doing this on-demand
  would require state in the BIO to determine whether this functionality is
  currently switched on, which would require otherwise unnecessary locking,
  undermining performance in concurrent usage of this API on a given BIO. By
  requiring this functionality to be enabled explicitly before use, this
  initialisation can be done up front without performance cost. It also helps
  users of the API understand that this functionality is not always available,
  and to detect in advance whether it is available.
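
As an illustration of the stride-based extensibility mechanism, the following is
a minimal sketch (the macro and the later-added field named in the comment are
invented for illustration; they are not part of the adopted API):

```c
#include <stddef.h>

/*
 * Sketch: an implementation can tell whether the caller's BIO_MSG is new
 * enough to contain a given field by checking that the caller's stride
 * covers that field in its entirety.
 */
#define BIO_MSG_FIELD_PRESENT(stride, st, field) \
    ((stride) >= offsetof(st, field) + sizeof(((st *)0)->field))

/*
 * Hypothetical usage inside an implementation, for a field added in a
 * later release:
 *
 *     if (BIO_MSG_FIELD_PRESENT(stride, BIO_MSG, hypothetical_field))
 *         ... use msg->hypothetical_field ...
 */
```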

### Design

The currently proposed design is as follows:

```c
typedef struct bio_msg_st {
    void *data;
    size_t data_len;
    BIO_ADDR *peer, *local;
    uint64_t flags;
} BIO_MSG;

#define BIO_UNPACK_ERRNO(e)     /*...*/
#define BIO_IS_ERRNO(e)         /*...*/

ossl_ssize_t BIO_sendmmsg(BIO *b, BIO_MSG *msg, size_t stride,
                          size_t num_msg, uint64_t flags);
ossl_ssize_t BIO_recvmmsg(BIO *b, BIO_MSG *msg, size_t stride,
                          size_t num_msg, uint64_t flags);
```

The API is used as follows:

- `msg` points to an array of `num_msg` `BIO_MSG` structures.

- Both functions have identical prototypes, and return the number of messages
  processed in the array. If no messages were sent due to an error, `-1` is
  returned. If an OS-level socket error occurs, a negative value `v` is
  returned. The caller should determine that `v` is an OS-level socket error by
  calling `BIO_IS_ERRNO(v)` and may obtain the OS-level socket error code by
  calling `BIO_UNPACK_ERRNO(v)`.

- `stride` must be set to `sizeof(BIO_MSG)`.

- `data` points to the buffer of data to be sent or to be filled with received
  data. `data_len` is the size of the buffer in bytes on call. If the
  given message in the array is processed (i.e., if the return value
  exceeds the index of that message in the array), `data_len` is updated
  to the actual amount of data sent or received at return time.

- `flags` in the `BIO_MSG` structure provides per-message flags to
  the `BIO_sendmmsg` or `BIO_recvmmsg` call. If the given message in the array
  is processed, `flags` is written with zero or more result flags at return
  time. The `flags` argument to the call itself provides for global flags
  affecting all messages in the array. Currently, no per-message or global flags
  are defined and all of these fields are set to zero on call and on return.

- `peer` and `local` are optional pointers to `BIO_ADDR` structures into
  which the remote and local addresses are to be filled. If either of these
  are NULL, the given addressing information is not requested. Local address
  support may not be available in all circumstances, in which case processing of
  the message fails. (This means that the function returns the number of
  messages processed so far, or -1 if the failing message is the first in the
  array.)

  Support for `local` must be explicitly enabled before use, otherwise
  attempts to use it fail.
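
As a usage illustration, the following is a minimal, hedged sketch of a receive
call using this API (the BIO `b` and the handlers `handle_os_error` and
`process_datagram` are assumed for illustration):

```c
#define NUM_MSG 32

BIO_MSG msgs[NUM_MSG];
unsigned char bufs[NUM_MSG][1500];
ossl_ssize_t nproc;
size_t i;

/* Point each BIO_MSG at its receive buffer; no addressing info requested. */
for (i = 0; i < NUM_MSG; ++i) {
    msgs[i].data     = bufs[i];
    msgs[i].data_len = sizeof(bufs[i]);
    msgs[i].peer     = NULL;
    msgs[i].local    = NULL;
    msgs[i].flags    = 0;
}

nproc = BIO_recvmmsg(b, msgs, sizeof(BIO_MSG), NUM_MSG, 0);
if (nproc < 0) {
    if (BIO_IS_ERRNO(nproc))
        handle_os_error(BIO_UNPACK_ERRNO(nproc)); /* OS-level socket error */
} else {
    /* data_len has been updated to the received length for each message. */
    for (i = 0; i < (size_t)nproc; ++i)
        process_datagram(msgs[i].data, msgs[i].data_len);
}
```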

Local address support is enabled as follows:

```c
int BIO_dgram_set_local_addr_enable(BIO *b, int enable);
int BIO_dgram_get_local_addr_enable(BIO *b);
int BIO_dgram_get_local_addr_cap(BIO *b);
```

`BIO_dgram_get_local_addr_cap()` returns 1 if local address support is
available. It is then enabled using `BIO_dgram_set_local_addr_enable()`, which
fails if support is not available.
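
For example, a caller might probe and enable local address support like this (a
minimal sketch, assuming a datagram BIO `b` on a bound socket):

```c
/* Enable local address support only if the platform can provide it. */
if (BIO_dgram_get_local_addr_cap(b) == 1
        && BIO_dgram_set_local_addr_enable(b, 1) == 1) {
    /* BIO_MSG.local may now point to a BIO_ADDR to be filled on receive. */
} else {
    /* Local address support unavailable; leave BIO_MSG.local as NULL. */
}
```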

Options which were considered
-----------------------------

Options for the API surface which were considered included:

### sendmmsg/recvmmsg-like API

This design was chosen to form the basis of the adopted design, which is
described above.

```c
int BIO_readm(BIO *b, BIO_mmsghdr *msgvec,
              unsigned len, int flags, struct timespec *timeout);
int BIO_writem(BIO *b, BIO_mmsghdr *msgvec,
               unsigned len, int flags, struct timespec *timeout);
```

We can either define `BIO_mmsghdr` as a typedef of `struct mmsghdr` or redefine
an equivalent structure. The former has the advantage that we can just pass the
structures through to the syscall without copying them.
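
For instance, the typedef option might look as follows (a sketch for
Linux/glibc, where `struct mmsghdr` requires `_GNU_SOURCE`):

```c
#define _GNU_SOURCE
#include <sys/socket.h>

/* Directly alias the OS structure; zero translation cost at call time. */
typedef struct mmsghdr BIO_mmsghdr;
```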

Note that in `BIO_dgram_mem` we will have to process and therefore understand
the contents of `struct mmsghdr` ourselves. Therefore, initially we define a
subset of `struct mmsghdr` as being supported, specifically no control messages;
`msg_name` and `msg_iov` only.

The flags argument is defined by us. Initially we can support something like
`MSG_DONTWAIT` (say, `BIO_DONTWAIT`).

#### Implementation Questions

If we go with this, there are some issues that arise:

- Are `BIO_mmsghdr`, `BIO_msghdr` and `BIO_iovec` simple typedefs
  for OS-provided structures, or our own independent structure
  definitions?

  - If we use OS-provided structures:

    - We would need to include the OS headers which provide these
      structures in our public API headers.

    - If we choose to support these functions when OS support is not available
      (see discussion below), we would need to define our own structures in that
      case (a “polyfill” approach).

  - If we use our own structures:

    - We would need to translate these structures during every call.

      But we would need to have storage inside the BIO_dgram for *m* `struct
      msghdr`, *m\*v* iovecs, etc. Since we want to support multithreaded use,
      these allocations probably need to be on the stack, and therefore
      must be limited.

      Limiting *m* isn't a problem, because `sendmmsg` returns the number
      of messages sent, so the existing semantics we are trying to match
      let us just send or receive fewer messages than we were asked to.

      However, it does seem like we will need to limit *v*, the number of
      iovecs per message. So what limit should we give to *v*? We will need a
      fixed stack allocation of OS iovec structures, and we can allocate from
      this stack allocation as we iterate through the `BIO_msghdr` we have been
      given. So in practice we could simply send messages until we reach our
      iovec limit, and then return.

      For example, suppose we allocate 64 iovecs internally:

      ```c
      struct iovec vecs[64];
      ```

      If the first message passed to a call to `BIO_writem` has 64 iovecs
      attached to it, no further messages can be sent and `BIO_writem`
      returns 1.

      If three messages are sent, with 32, 32, and 1 iovecs respectively,
      the first two messages are sent and `BIO_writem` returns 2.

      So the only important thing we would need to document in this API
      is the limit of iovecs on a single message; in other words, the
      number of iovecs which must not be exceeded if a forward progress
      guarantee is to be made. For example, if we allocate 64 iovecs
      internally, `BIO_writem` with a single message with 65 iovecs will
      never work, and this becomes part of the API contract.

      Obviously these quantities of iovecs are unrealistically large.
      iovecs are small, so we can afford to set the limit high enough
      that it shouldn't cause any problems in practice. We can increase
      the limit later without a breaking API change, but we cannot decrease
      it later. So we might want to start with something small, like 8.

- We also need to decide what to do for OSes which don't support at least
  `sendmsg`/`recvmsg`.

  - Don't provide these functions and require all users of these functions to
    have an alternate code path which doesn't rely on them?

    - Not providing these functions on OSes that don't support
      at least sendmsg/recvmsg is a simple solution but adds
      complexity to code using BIO_dgram. (Though it does communicate
      more realistic performance expectations to that code, since the
      code knows when these functions are actually available.)

  - Provide these functions and emulate the functionality:

    - However, there is a question here as to how we implement
      the iovec arguments on platforms without `sendmsg`/`recvmsg`. (We cannot
      use `writev`/`readv` because we need peer address information.) Logically,
      implementing these would then have to be done by copying buffers around
      internally before calling `sendto`/`recvfrom`, defeating the point of
      iovecs and providing a performance profile which is surprising to code
      using BIO_dgram.

    - Another option could be a variable limit on the number of iovecs,
      which can be queried from BIO_dgram. This would be a constant set
      when libcrypto is compiled. It would be 1 for platforms not supporting
      `sendmsg`/`recvmsg`. This again adds burdens on the code using
      BIO_dgram, but it seems the only way to avoid the surprising performance
      pitfall of buffer copying to emulate iovec support. There is a fair risk
      of code being written which accidentally works on one platform but not
      another, because the author didn't realise the iovec limit is 1 on some
      platforms. Possibly we could have an “iovec limit” variable in the
      BIO_dgram which is 1 by default, and which can be increased by a call to a
      function BIO_set_iovec_limit, but not beyond the fixed size discussed
      above. It would return failure if not possible, and this would give client
      code a clear way to determine whether its expectations are met.
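
Such a negotiation might look as follows (a hedged sketch; `BIO_set_iovec_limit`
is only a possibility discussed above, not an adopted API):

```c
/*
 * Ask for up to 4 iovecs per message; if the compiled-in platform limit
 * is lower (e.g. 1 where sendmsg/recvmsg are unavailable), fall back to
 * a single buffer per datagram.
 */
if (BIO_set_iovec_limit(b, 4) != 1) {
    /* Expectations not met: use one iovec per message only. */
}
```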

### Alternate API

Could we use a simplified API? For example, could we have an API that returns
one datagram, where BIO_dgram uses `recvmmsg` internally and queues the returned
datagrams, thereby still avoiding extra syscalls while offering a simple API?

The problem here is we want to support “single-copy” (where the data is only
copied as it is decrypted). Thus BIO_dgram needs to know the final resting place
of encrypted data at the time it makes the `recvmmsg` call.

One option would be to allow the user to set a callback on BIO_dgram it can use
to request a new buffer, then have an API which returns the buffer:

```c
int BIO_dgram_set_read_callback(BIO *b,
                                void *(*cb)(size_t len, void *arg),
                                void *arg);
int BIO_dgram_set_read_free_callback(BIO *b,
                                     void (*cb)(void *buf,
                                                size_t buf_len,
                                                void *arg),
                                     void *arg);
int BIO_read_dequeue(BIO *b, void **buf, size_t *buf_len);
```

The BIO_dgram calls the specified callback when it needs to generate internal
iovecs for its `recvmmsg` call, and the received datagrams can then be popped by
the application and freed as it likes. (The read free callback above is only
used in rare circumstances, such as when calls to `BIO_read` and
`BIO_read_dequeue` are alternated, or when the BIO_dgram is destroyed prior to
all read buffers being dequeued; see below.) For convenience we could have an
extra call to allow a buffer to be pushed back into the BIO_dgram's internal
queue of unused read buffers, which avoids the need for the application to do
its own management of such recycled buffers:

```c
int BIO_dgram_push_read_buffer(BIO *b, void *buf, size_t buf_len);
```

On the write side, the application provides buffers and can get a callback when
they are freed. BIO_write_queue just queues for transmission, and the `sendmmsg`
call is made when calling `BIO_flush`. (TBD: whether it is reasonable to
overload the semantics of BIO_flush in this way.)

```c
int BIO_dgram_set_write_done_callback(BIO *b,
                                      void (*cb)(const void *buf,
                                                 size_t buf_len,
                                                 int status,
                                                 void *arg),
                                      void *arg);
int BIO_write_queue(BIO *b, const void *buf, size_t buf_len);
int BIO_flush(BIO *b);
```

The status argument to the write done callback will be 1 on success, some
negative value on failure, and some special negative value if the BIO_dgram is
being freed before the write could be completed.
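
The write side might then be used as follows (a hedged sketch; the callback and
the datagram variables `dgram1`/`dgram2` are illustrative):

```c
static void write_done_cb(const void *buf, size_t buf_len,
                          int status, void *arg)
{
    /* status: 1 on success, negative on failure or premature BIO free. */
    OPENSSL_free((void *)buf);
}

/* ... */

BIO_dgram_set_write_done_callback(b, write_done_cb, NULL);

/* Queue several datagrams, then transmit them in one sendmmsg call. */
BIO_write_queue(b, dgram1, dgram1_len);
BIO_write_queue(b, dgram2, dgram2_len);
BIO_flush(b);
```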

For send/receive addresses, we import the `BIO_(set|get)_dgram_(origin|dest)`
APIs proposed in the sendmsg/recvmsg PR (#5257). `BIO_get_dgram_(origin|dest)`
should be called immediately after `BIO_read_dequeue` and
`BIO_set_dgram_(origin|dest)` should be called immediately before
`BIO_write_queue`.

This approach allows `BIO_dgram` to support myriad options via composition of
successive function calls in a “builder” style rather than via a single function
call with an excessive number of arguments or pointers to unwieldy ever-growing
argument structures, requiring constant revision of the central read/write
functions of the BIO API.

Note that since `BIO_set_dgram_(origin|dest)` sets data on outgoing packets and
`BIO_get_dgram_(origin|dest)` gets data on incoming packets, it doesn't follow
that these are accessing the same data (they are not setters and getters of
variables called "dgram origin" and "dgram destination", even though their names
make them look like setters and getters of the same variables). We probably want
to separate these, as there is no need for a getter for outgoing packet
destination, for example, and by separating these we allow the possibility of
multithreaded use (one thread reads, one thread writes) in the future. Possibly
we should choose less confusing names for these functions. Maybe
`BIO_set_outgoing_dgram_(origin|dest)` and
`BIO_get_incoming_dgram_(origin|dest)`.

Pros of this approach:

  - The application can generate one datagram at a time and still get the
    advantages of sendmmsg/recvmmsg (fewer syscalls, etc.)

    We probably want this for our own QUIC implementation built on top of this
    anyway. Otherwise we will need another piece to do basically the same thing
    and agglomerate multiple datagrams into a single BIO call. Unless we only
    want to use `sendmmsg` constructively in trivial cases (e.g. where we send
    two datagrams from the same function immediately after one another... which
    doesn't seem like a common use case.)

  - Flexible support for single-copy (zero-copy).

Cons of this approach:

  - A very different way of doing reads/writes might be strange to existing
    applications. *But* the primary consumer of this new API will be our own
    QUIC implementation, so probably not a big deal. We can always support
    `BIO_read`/`BIO_write` as a less efficient fallback for existing third-party
    users of BIO_dgram.

#### Compatibility interop

Suppose the following sequence happens:

1. BIO_read (legacy call path)
2. BIO_read_dequeue (`recvmmsg` based call path with callback-allocated buffer)
3. BIO_read (legacy call path)

For (1) we have two options:

a. Use `recvmmsg` and add the received datagrams to an RX queue just as for the
   `BIO_read_dequeue` path. We use an OpenSSL-provided default allocator
   (`OPENSSL_malloc`) and flag these datagrams as needing to be freed by OpenSSL,
   not the application.

   When the application calls `BIO_read`, a copy is performed and the internal
   buffer is freed.

b. Use `recvfrom` directly. This means we have a `recvmmsg` path and a
   `recvfrom` path depending on what API is being used.

   The disadvantage of (a) is that it yields an extra copy relative to what we
   have now, whereas with (b) the buffer passed to `BIO_read` gets passed
   through to the syscall and we do not have to copy anything.

   Since we will probably need to support platforms without
   `sendmmsg`/`recvmmsg` support anyway, (b) seems like the better option.

For (2) the new API is used. Since the previous call to BIO_read is essentially
“stateless” (it's just a simple call to `recvfrom`, and doesn't require mutation
of any internal BIO state other than maybe the last datagram source/destination
address fields), BIO_dgram can go ahead and start using the `recvmmsg` code
path. Since the RX queue will obviously be empty at this point, it is
initialised and filled using `recvmmsg`, then one datagram is popped from it.

For (3) we have a legacy `BIO_read` but we have several datagrams still in the
RX queue. In this case we do have to copy; we have no choice. However, this only
happens in circumstances where a user of BIO_dgram alternates between old and
new APIs, which should be very unusual.

Subsequently for (3) we have to free the buffer using the free callback. This is
an unusual case where BIO_dgram is responsible for freeing read buffers and not
the application (the only other case being premature destruction, see below).
But since this seems a very strange API usage pattern, we may just want to fail
in this case.

This is probably not worth supporting, so we can have the following rule:

- After the first call to `BIO_read_dequeue` is made on a BIO_dgram, all
  subsequent calls to ordinary `BIO_read` will fail.

Of course, all of the above applies analogously to the TX side.

#### BIO_dgram_pair

We will also implement from scratch a BIO_dgram_pair. This will be provided as a
BIO pair which provides identical semantics to the BIO_dgram above, both for the
legacy and zero-copy code paths.

#### Thread safety

It is a functional assumption of the above design that we would never want to
have more than one thread doing TX on the same BIO and never have more than one
thread doing RX on the same BIO.

If we did ever want to do this, multiple BIOs on the same FD is one possibility
(for the BIO_dgram case at least). But I don't believe there is any general
intention to support multithreaded use of a single BIO at this time (unless I am
mistaken), so this seems like it isn't an issue.

If we wanted to support multithreaded use of the same FD using the same BIO, we
would need to revisit the set-call-then-execute-call API approach above
(`BIO_(set|get)_dgram_(origin|dest)`), as this would pose a problem. But I
mention this mainly for completeness. Our recently learnt lessons on cache
contention suggest that this probably wouldn't be a good idea anyway.

#### Other questions

BIO_dgram will call the allocation function to get buffers for `recvmmsg` to
fill. We might want to have a way to specify how many buffers it should offer to
`recvmmsg`, and thus how many buffers it allocates in advance.

#### Premature destruction

If BIO_dgram is freed before all datagrams are read, the read buffer free
callback is used to free any unreturned read buffers.