mirror of https://github.com/apache/kafka.git
				
				
				
			KAFKA-15442: add a section in doc for tiered storage (#14382)
Added 6.11: Tiered Storage section and notable changes ini v3.6.0 Reviewers: Satish Duggana <satishd@apache.org>, Gantigmaa Selenge <gselenge@redhat.com>
This commit is contained in:
		
							parent
							
								
									2a41beb0f4
								
							
						
					
					
						commit
						ac39342d47
					
				|  | @ -3859,6 +3859,98 @@ listeners=CONTROLLER://:9093 | |||
| 
 | ||||
| # Other configs ...</pre> | ||||
| 
 | ||||
| 
 | ||||
| <h3 class="anchor-heading"><a id="tiered_storage" class="anchor-link"></a><a href="#kraft">6.11 Tiered Storage</a></h3> | ||||
| 
 | ||||
| <h4 class="anchor-heading"><a id="tiered_storage_overview" class="anchor-link"></a><a href="#tiered_storage_overview">Tiered Storage Overview</a></h4> | ||||
| 
 | ||||
| <p>Kafka data is mostly consumed in a streaming fashion using tail reads. Tail reads leverage OS's page cache to serve the data instead of disk reads. | ||||
|   Older data is typically read from the disk for backfill or failure recovery purposes and is infrequent.</p> | ||||
| 
 | ||||
| <p>In the tiered storage approach, Kafka cluster is configured with two tiers of storage - local and remote. | ||||
|   The local tier is the same as the current Kafka that uses the local disks on the Kafka brokers to store the log segments. | ||||
|   The new remote tier uses external storage systems, such as HDFS or S3, to store the completed log segments. | ||||
|   Please check <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage">KIP-405</a> for more information. | ||||
| </p> | ||||
| 
 | ||||
| <p><b>Note: Tiered storage is considered as an early access feature, and is not recommended for use in production environments</b></p> | ||||
| 
 | ||||
| <h4 class="anchor-heading"><a id="tiered_storage_config" class="anchor-link"></a><a href="#tiered_storage_config">Configuration</a></h4> | ||||
| 
 | ||||
| <h5 class="anchor-heading"><a id="tiered_storage_config_broker" class="anchor-link"></a><a href="#tiered_storage_config_broker">Broker Configurations</a></h5> | ||||
| 
 | ||||
| <p>By default, Kafka server will not enable tiered storage feature. <code>remote.log.storage.system.enable</code> | ||||
|   is the property to control whether to enable tiered storage functionality in a broker or not. Setting it to "true" enables this feature. | ||||
| </p> | ||||
| 
 | ||||
| <p><code>RemoteStorageManager</code> is an interface to provide the lifecycle of remote log segments and indexes. Kafka server | ||||
|   doesn't provide out-of-the-box implementation of RemoteStorageManager. Configuring <code>remote.log.storage.manager.class.name</code> | ||||
|   and <code>remote.log.storage.manager.class.path</code> to specify the implementation of RemoteStorageManager. | ||||
| </p> | ||||
| 
 | ||||
| <p><code>RemoteLogMetadataManager</code> is an interface to provide the lifecycle of metadata about remote log segments with strongly consistent semantics. | ||||
|   By default, Kafka provides an implementation with storage as an internal topic. This implementation can be changed by configuring | ||||
|   <code>remote.log.metadata.manager.class.name</code> and <code>remote.log.metadata.manager.class.path</code>. | ||||
|   When adopting the default kafka internal topic based implementation, <code>remote.log.metadata.manager.listener.name</code> | ||||
|   is a mandatory property to specify which listener the clients created by the default RemoteLogMetadataManager implementation. | ||||
| </p> | ||||
| 
 | ||||
| 
 | ||||
| <h5 class="anchor-heading"><a id="tiered_storage_config_topic" class="anchor-link"></a><a href="#tiered_storage_config_topic">Topic Configurations</a></h5> | ||||
| 
 | ||||
| <p>After correctly configuring broker side configurations for tiered storage feature, there are still configurations in topic level needed to be set. | ||||
|   <code>remote.storage.enable</code> is the switch to determine if a topic wants to use tiered storage or not. By default it is set to false. | ||||
|   After enabling <code>remote.storage.enable</code> property, the next thing to consider is the log retention. | ||||
|   When tiered storage is enabled for a topic, there are 2 additional log retention configurations to set: | ||||
| 
 | ||||
| <ul> | ||||
|   <li><code>local.retention.ms</code></li> | ||||
|   <li><code>retention.ms</code></li> | ||||
|   <li><code>local.retention.bytes</code></li> | ||||
|   <li><code>retention.bytes</code></li> | ||||
| </ul> | ||||
| 
 | ||||
|   The configuration prefixed with <code>local</code> are to specify the time/size the "local" log file can accept before moving to remote storage, and then get deleted. | ||||
|   If unset, The value in <code>retention.ms</code> and <code>retention.bytes</code> will be used. | ||||
| </p> | ||||
| 
 | ||||
| <h4 class="anchor-heading"><a id="tiered_storage_config_ex" class="anchor-link"></a><a href="#tiered_storage_config_ex">Configurations Example</a></h4> | ||||
| 
 | ||||
| <p>Here is a sample configuration to enable tiered storage feature in broker side: | ||||
| <pre> | ||||
| # Sample Zookeeper/Kraft broker server.properties listening on PLAINTEXT://:9092 | ||||
| remote.log.storage.system.enable=true | ||||
| # Please provide the implementation for remoteStorageManager. This is the mandatory configuration for tiered storage. | ||||
| # remote.log.storage.manager.class.name=org.apache.kafka.server.log.remote.storage.NoOpRemoteStorageManager | ||||
| # Using the "PLAINTEXT" listener for the clients in RemoteLogMetadataManager to talk to the brokers. | ||||
| remote.log.metadata.manager.listener.name=PLAINTEXT | ||||
| </pre> | ||||
| </p> | ||||
| 
 | ||||
| <p>After broker is started, creating a topic with tiered storage enabled, and a small log time retention value to try this feature: | ||||
| <pre>bin/kafka-topics.sh --create --topic tieredTopic --bootstrap-server localhost:9092 --config remote.storage.enable=true --config local.retention.ms=1000 | ||||
| </pre> | ||||
| </p> | ||||
| 
 | ||||
| <p>Then, after the active segment is rolled, the old segment should be moved to the remote storage and get deleted. | ||||
| </p> | ||||
| 
 | ||||
| <h4 class="anchor-heading"><a id="tiered_storage_limitation" class="anchor-link"></a><a href="#tiered_storage_limitation">Limitations</a></h4> | ||||
| 
 | ||||
| <p>While the early access release of Tiered Storage offers the opportunity to try out this new feature, it is important to be aware of the following limitations: | ||||
| <ul> | ||||
|   <li>No support for clusters with multiple log directories (i.e. JBOD feature)</li> | ||||
|   <li>No support for compacted topics</li> | ||||
|   <li>Cannot disable tiered storage at the topic level</li> | ||||
|   <li>Deleting tiered storage enabled topics is required before disabling tiered storage at the broker level</li> | ||||
|   <li>Admin actions related to tiered storage feature are only supported on clients from version 3.0 onwards</li> | ||||
| </ul> | ||||
| </p> | ||||
| 
 | ||||
| <p>For more information, please check <a href="https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes">Tiered Storage Early Access Release Note</a>. | ||||
| </p> | ||||
| 
 | ||||
| 
 | ||||
| </script> | ||||
| 
 | ||||
| <div class="p-ops"></div> | ||||
|  |  | |||
|  | @ -169,6 +169,14 @@ | |||
|                         <li><a href="#kraft_zk_migration">ZooKeeper to KRaft Migration</a></li> | ||||
|                     </ul> | ||||
|                 </li> | ||||
|                 <li><a href="#tiered_storage">6.11 Tiered Storage</a> | ||||
|                     <ul> | ||||
|                         <li><a href="#tiered_storage_overview">Tiered Storage Overview</a></li> | ||||
|                         <li><a href="#tiered_storage_config">Configuration</a></li> | ||||
|                         <li><a href="#tiered_storage_config_ex">Configurations Example</a></li> | ||||
|                         <li><a href="#tiered_storage_limitation">Limitations</a></li> | ||||
|                     </ul> | ||||
|                 </li> | ||||
|             </ul> | ||||
|         </li> | ||||
|         <li><a href="#security">7. Security</a> | ||||
|  |  | |||
|  | @ -50,6 +50,11 @@ | |||
|             <code>replication.policy.internal.topic.separator.enabled</code> | ||||
|             property. If upgrading from 3.0.x or earlier, it may be necessary to set this property to <code>false</code>; see the property's | ||||
|             <a href="#mirror_connector_replication.policy.internal.topic.separator.enabled">documentation</a> for more details.</li> | ||||
|         <li>Early access of tiered storage feature is available, and it is not recommended for use in production environments. | ||||
|             Welcome to test it and provide any feedback to us. | ||||
|             For more information about the early access tiered storage feature, please check <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage">KIP-405</a> and | ||||
|             <a href="https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes">Tiered Storage Early Access Release Note</a>. | ||||
|         </li> | ||||
|     </ul> | ||||
| 
 | ||||
| <h4><a id="upgrade_3_5_0" href="#upgrade_3_5_0">Upgrading to 3.5.0 from any version 0.8.x through 3.4.x</a></h4> | ||||
|  |  | |||
		Loading…
	
		Reference in New Issue