Commit Graph

2334 Commits

Author SHA1 Message Date
Vicky Papavasileiou fe72187cb1
KAFKA-13524: Add IQv2 query handling to the caching layer (#11682)
Currently, IQv2 forwards all queries to the underlying store. We add this bypass to allow handling of key queries in the cache. If a key exists in the cache, it will get answered from there.
As part of this PR, we realized we need access to the position of the underlying stores. So, I added the method getPosition to the public API and ensured all state stores implement it. Only the "leaf" stores (Rocks*, InMemory*) have an actual position, all wrapping stores access their wrapped store's position.

Reviewers: Patrick Stuedi <pstuedi@apache.org>, John Roesler <vvcephei@apache.org>
2022-01-26 09:36:39 -06:00
Vicky Papavasileiou 868cbcb8e5
MINOR: Fix bug of empty position in windowed and session stores #11713
Reviewers: John Roesler <vvcephei@apache.org>
2022-01-25 13:46:20 -06:00
John Roesler 96fa468106
MINOR: fix NPE in iqv2 (#11702)
There is a brief window between when the store is registered and when
it is initialized when it might handle a query, but there is no context.
We treat this condition just like a store that hasn't caught up to the
desired position yet.

Reviewers: Guozhang Wang <guozhang@apache.org>, Matthias J. Sax <mjsax@apache.org>, A. Sophie Blee-Goldman <ableegoldman@apache.org>, Patrick Stuedi <pstuedi@apache.org>
2022-01-25 13:23:46 -06:00
A. Sophie Blee-Goldman 9d602a01be
KAFKA-12648: invoke exception handler for MissingSourceTopicException with named topologies (#11686)
Followup to #11600 to invoke the streams exception handler on the MissingSourceTopicException, without killing/replacing the thread

Reviewers: Guozhang Wang <guozhang@confluent.io>, Bruno Cadonna <cadonna@confluent.io>
2022-01-25 10:37:35 -08:00
A. Sophie Blee-Goldman 265d3199ec
KAFKA-12648: fixes for query APIs with named topologies (#11609)
Fixes some issues with the NamedTopology version of the IQ methods that accept a topologyName argument, and adds tests for all.

Reviewers:  Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2022-01-25 05:49:23 -08:00
Sayantanu Dey d13d09fb68
KAFKA-13590:rename InternalTopologyBuilder#topicGroups (#11704)
Renamed the often confusing and opaque #topicGroups API to #subtopologyToTopicsInfo

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-01-24 21:03:37 -08:00
Aleksandr Sorokoumov 7d9b9847f1
KAFKA-6502: Update consumed offsets on corrupted records. (#11683)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-01-20 09:26:38 -08:00
A. Sophie Blee-Goldman 529dde904a
KAFKA-12648: handle MissingSourceTopicException for named topologies (#11600)
Avoid throwing a MissingSourceTopicException inside the #assign method when named topologies are used, and just remove those topologies which are missing any of their input topics from the assignment.

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Bruno Cadonna <cadonna@confluent.io>
2022-01-18 11:49:23 -08:00
Walker Carlson c182a431d2
MINOR: prefix topics if internal config is set (#11611)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-01-10 16:08:48 -08:00
Guozhang Wang 9078451e37
MINOR: Add num threads logging upon shutdown (#11652)
1. Add num of threads logging upon shutdown.
2. Prefix the shutdown thread with client id.

Reviewers: John Roesler <vvcephei@apache.org>
2022-01-06 11:28:27 -08:00
Richard 7567cbc857
KAFKA-13476: Increase resilience timestamp decoding Kafka Streams (#11535)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-01-05 21:38:10 -08:00
John Roesler b424553101
KAFKA-13553: Add PAPI Window and Session store tests for IQv2 (#11650)
During some recent reviews, @mjsax pointed out that StateStore layers
are constructed differently the stores are added via the PAPI vs. the DSL.

This PR adds PAPI construction for Window and Session stores to the
IQv2StoreIntegrationTest so that we can ensure IQv2 works on every
possible state store.

Reviewer: Guozhang Wang <guozhang@apache.org>
2022-01-05 23:16:33 -06:00
John Roesler 7ef8701cca
KAFKA-13553: add PAPI KV store tests for IQv2 (#11624)
During some recent reviews, @mjsax pointed out that StateStore layers
are constructed differently the stores are added via the PAPI vs. the DSL.

This PR adds KeyValueStore PAPI construction to the
IQv2StoreIntegrationTest so that we can ensure IQv2 works on every
possible state store.

Reviewers: Patrick Stuedi <pstuedi@apache.org>, Guozhang Wang <guozhang@apache.org>
2022-01-05 21:04:37 -06:00
Patrick Stuedi b8f1cf14c3
KAFKA-13494: WindowKeyQuery and WindowRangeQuery (#11567)
Implement WindowKeyQuery and WindowRangeQuery as
proposed in KIP-806

Reviewer: John Roesler <vvcephei@apache.org>
2022-01-02 22:17:38 -06:00
John Roesler 018fb88efa
KAFKA-13557: Remove swapResult from the public API (#11617)
Follow-on from #11582 .
Removes a public API method in favor of an internal utility method.

Reviewer: Matthias J. Sax <mjsax@apache.org>
2021-12-20 19:04:08 -06:00
John Roesler 5747788659
KAFKA-13525: Implement KeyQuery in Streams IQv2 (#11582)
Implement the KeyQuery as proposed in KIP-796

Reviewers: Vicky Papavasileiou <vpapavasileiou@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <guozhang@apache.org>
2021-12-20 12:22:05 -06:00
Walker Carlson 247c271353
MINOR: retry when deleting offsets for named topologies (#11604)
When this was made I didn't expect deleteOffsetsResult to be set if an exception was thrown. But it is and to retry we need to reset it to null. Changing the KafkaStreamsNamedTopologyWrapper for remove topology when resetting offsets to retry upon GroupSubscribedToTopicException and swallow/complete upon GroupIdNotFoundException

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@ache.>
2021-12-16 19:39:55 -08:00
Vicky Papavasileiou b38f6ba5cc
KAFKA-13479: Implement range and scan queries (#11598)
Implement the RangeQuery as proposed in KIP-805

Reviewers: John Roesler <vvcephei@apache.org>
2021-12-16 11:09:01 -06:00
Walker Carlson 04787334a5
MINOR: Update log and method name in TopologyMetadata (#11589)
Update an unclear log message and method name in TopologyMetadata

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <guozhang@confluent.io>, Luke Chen <showuon@confluent.io>
2021-12-15 19:43:40 -08:00
John Roesler acd1f9c563
KAFKA-13522: add position tracking and bounding to IQv2 (#11581)
* Fill in the Position response in the IQv2 result.
* Enforce PositionBound in IQv2 queries.
* Update integration testing approach to leverage consistent queries.

Reviewers: Patrick Stuedi <pstuedi@apache.org>, Vicky Papavasileiou <vpapavasileiou@confluent.io>, Guozhang Wang <guozhang@apache.org>
2021-12-11 01:00:59 -06:00
A. Sophie Blee-Goldman 1e459271d7
KAFKA-12648: fix IllegalStateException in ClientState after removing topologies (#11591)
Fix for one of the causes of failure in the NamedTopologyIntegrationTest: org.apache.kafka.streams.errors.StreamsException: java.lang.IllegalStateException: Must initialize prevActiveTasks from ownedPartitions before initializing remaining tasks.

This exception could occur if a member sent in a subscription where all of its ownedPartitions were from a named topology that is no longer recognized by the group leader, eg because it was just removed from the client. We should filter each ClientState based on the current topology only so the assignor only processes the partitions/tasks it can identify. The member with the out-of-date tasks will eventually clean them up when the #removeNamedTopology API is invoked on them

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2021-12-10 14:26:27 -08:00
A. Sophie Blee-Goldman d5eb3c10ec
HOTFIX: fix failing StreamsMetadataStateTest tests (#11590)
Followup to #11562 to fix broken tests in StreamsMetadataStateTest

Reviewers: Walker Carlson <wcarlson@confluent.io>
2021-12-09 16:19:56 -08:00
Vicky Papavasileiou e1dba7af57
MINOR: Cleanup for #11513 (#11585)
Clean up some minor things that were left over from PR #11513

Reviewer: John Roesler <vvcephei@apache.org>
2021-12-09 13:23:01 -06:00
Tamara Skokova 133b515b5e
KAFKA-13507: GlobalProcessor ignores user specified names (#11573)
Use the name specified via consumed parameter in InternalStreamsBuilder#addGlobalStore method for initializing the source name and processor name. If not specified, the names are generated.

Reviewers: Luke Chen <showuon@gmail.com>, Bill Bejeck <bbejeck@apache.org>
2021-12-09 09:42:00 -05:00
Tolga H. Dur e20f102298
KAFKA-12648: extend IQ APIs to work with named topologies (#11562)
In the new NamedTopology API being worked on, state store names and their uniqueness requirement is going to be scoped only to the owning topology, rather than to the entire app. In other words, two different named topologies can have different state stores with the same name.

This is going to cause problems for some of the existing IQ APIs which require only a name to resolve the underlying state store. We're now going to need to take in the topology name in addition to the state store name to be able to locate the specific store a user wants to query

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-12-09 03:54:28 -08:00
Vicky Papavasileiou 7acd12d6e3
KAFKA-13506: Write and restore position to/from changelog (#11513)
Introduces changelog headers to pass position information
to standby and restoring stores. The headers are guarded by an internal
config, which defaults to `false` for backward compatibility. Once IQv2
is finalized, we will make that flag a public config.

Reviewers: Patrick Stuedi <pstuedi@apache.org>, John Roesler <vvcephei@apache.org>
2021-12-08 11:58:35 -06:00
Bruno Cadonna 68c3018a5a
MINOR: Fix internal topic manager tests (#11574)
When the unit tests of the internal topic manager test
are executed on a slow machine (like sometimes in automatic builds)
they sometimes fail with a timeout exception instead of the expected
exception. To fix this behavior, this commit replaces the use of
system time with mock time.

Reviewer: John Roesler <vvcephei@apache.org>
2021-12-07 18:23:52 +01:00
Walker Carlson 965ec40c0a
KAFKA-12648: Make changing the named topologies have a blocking option (#11479)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-12-03 11:32:55 -08:00
John Roesler 14c2449050
KAFKA-13491: IQv2 framework (#11557)
Implements the major part of the IQv2 framework as proposed in KIP-796.

Reviewers: Patrick Stuedi <pstuedi@apache.org>, Vicky Papavasileiou <vpapavasileiou@confluent.io>, Bruno Cadonnna <cadonna@apache.org>
2021-12-03 12:53:31 -06:00
Patrick Stuedi 62f73c30d3
KAFKA-13498: track position in remaining state stores (#11541)
Reviewers: Vicky Papavasileiou <vpapavasileiou@confluent.io>, John Roesler<vvcephei@apache.org>
2021-12-01 11:49:10 -06:00
Bruno Cadonna 4fed0001ec
MINOR: Fix system test StreamsCooperativeRebalanceUpgradeTest.test_upgrade_to_cooperative_rebalance (#11532)
Log messages were changed in the AssignorConfiguration (#11490) that are
also used for verification in system test
StreamsCooperativeRebalanceUpgradeTest.test_upgrade_to_cooperative_rebalance.

This commit fixes the test and adds comments to the log messages
that point to the test that needs to be updated in case of
changes to the log messages.

Reviewers: John Roesler <vvcephei@apache.org>, Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2021-11-25 10:48:09 +01:00
Patrick Stuedi 23e9818e62
KAFKA-13480: Track Position in KeyValue stores (#11514)
Add position tracking to KeyValue stores in support of KIP-796

Reviewers: John Roesler <vvcephei@apache.org>
2021-11-24 18:28:00 -06:00
Jorge Esteban Quilcate Otoya 1e0916580f
KAFKA-13117: migrate TupleForwarder and CacheFlushListener to new Record API (#11481)
* Migrate TupleForwarder and CacheFlushListener to new Processor API
* Update the affected Processors

Reviewers: John Roesler <vvcephei@apache.org>
2021-11-22 21:34:59 -06:00
Bruno Cadonna 9285820df0
MINOR: Set mock correctly in RocksDBMetricsRecorderTest (#11462)
With a nice mock in RocksDBMetricsRecorderTest#shouldCorrectlyHandleHitRatioRecordingsWithZeroHitsAndMisses() and RocksDBMetricsRecorderTest#shouldCorrectlyHandleAvgRecordingsWithZeroSumAndCount() were green although getTickerCount() was never called. The tests were green because EasyMock returns 0 for a numerical return value by default if no expectation is specified. Thus, commenting out the expectation for getTickerCount() did not change the result of the test.

This commit changes the mock to a default mock and fixes the expectation to expect getAndResetTickerCount(). Now, commenting out the expectation leads to a test failure.

Reviewers: Luizfrf3 <lf.fonseca@hotmail.com>, Guozhang Wang <wangguoz@gmail.com>
2021-11-18 18:14:36 +01:00
Luke Chen 1b4cffdcb7
KAFKA-13439: Deprecate eager rebalance protocol in kafka stream (#11490)
Deprecate eager rebalancing protocol in kafka streams and log warning message when upgrade.from is set to 2.3 or lower. Also add a note in upgrade doc to prepare users for the removal of eager rebalancing support

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-11-17 03:05:19 -08:00
Matthias J. Sax 30d1989db1
MINOR: update Kafka Streams standby task config (#11404)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Antony Stubbs <antony@confluent.io>, James Galasyn <jim.galasyn@confluent.io>
2021-11-16 17:34:49 -08:00
Patrick Stuedi ffbef88cd7
Add recordMetadata() to StateStoreContext (#11498)
Implements KIP-791

Reviewers: John Roesler <vvcephei@apache.org>
2021-11-16 10:51:41 -06:00
mkandaswamy babc54333c
MINOR: Improve KafkaStreamsTest: testInitializesAndDestroysMetricsReporters (#11494)
Add additional asserts for KafkaStreamsTest: testInitializesAndDestroysMetricsReporters to help diagnose if it flakily fails in the future.
- MockMetricsReporter gets initialized only once during KafkaStreams construction, so make assert check stricter by ensuring initDiff is one.
- Assert KafkaStreams is not running before we validate whether MockMetricsMetricsReporter close count got incremented after streams close.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2021-11-16 10:50:15 +01:00
A. Sophie Blee-Goldman 908a6d2ad7
KAFKA-12648: introduce TopologyConfig and TaskConfig for topology-level overrides (#11272)
Most configs that are read and used by Streams today originate from the properties passed in to the KafkaStreams constructor, which means they get applied universally across all threads, tasks, subtopologies, and so on. The only current exception to this is the topology.optimization config which is parsed from the properties that get passed in to StreamsBuilder#build. However there are a handful of configs that could also be scoped to the topology level, allowing users to configure each NamedTopology independently of the others, where it makes sense to do so.

This PR refactors the handling of these configs by interpreting the values passed in via KafkaStreams constructor as the global defaults, which can then be overridden for individual topologies via the properties passed in when building the NamedTopology. More topology-level configs may be added in the future, but this PR covers the following:

max.task.idle.ms
task.timeout.ms
buffered.records.per.partition
default.timestamp.extractor.class
default.deserialization.exception.handler

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Walker Carlson <wcarlson@confluent.io>, Luke Chen <showuon@confluent.io>
2021-11-10 11:27:59 -08:00
Jorge Esteban Quilcate Otoya 807c5b4d28
KAFKA-10543: Convert KTable joins to new PAPI (#11412)
* Migrate KTable joins to new Processor API.
* Migrate missing KTableProcessorSupplier implementations.
* Replace KTableProcessorSupplier with new Processor API implementation.

Reviewers: John Roesler <vvcephei@apache.org>
2021-11-08 14:48:54 -06:00
Victoria Xia 01e6a6ebf2
KAFKA-13261: Add support for custom partitioners in foreign key joins (#11368)
Implements KIP-775.

Co-authored-by: Tomas Forsman <tomas-forsman@users.noreply.github.com>
2021-11-03 10:55:24 -07:00
Luizfrf3 252a40ea1f
KAFKA-8941: Add RocksDB Metrics that Could not be Added due to RocksD… (#11441)
This PR adds some RocksDB metrics that could not be added in KIP-471 due to RocksDB version. The new metrics are extracted using Histogram data provided by RocksDB API, and the old ones were extracted using Tickers. The new metrics added are memtable-flush-time-(avg|min|max) and compaction-time-(avg|min|max).

Reviewer: Bruno Cadonnna <cadonna@apache.org>
2021-11-03 12:18:28 +01:00
A. Sophie Blee-Goldman 22aa9d2ce1
KAFKA-12648: fill in javadocs for the StreamsException class with new guarantees (#11436)
Minor followup to #11405 / KIP-783 to write down the new guarantees we're providing about the meaning of a StreamsException in the javadocs of that class

Reviewers: Bruno Cadonna <cadonna@apache.org>
2021-10-27 10:47:57 -07:00
Andrew Patterson 86e83de742
kafka-12994: Migrated SlidingWindowsTest to new API (#11379)
As raised in KAFKA-12994, All tests that use the old API should be either eliminated or migrated to the new API in order to remove the @SuppressWarnings("deprecation") annotations.

This PR migrates the SlidingWindowsTest to the new API.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-10-21 16:50:37 -07:00
A. Sophie Blee-Goldman b534124224
KAFKA-12648: Wrap all exceptions thrown to handler as StreamsException & add TaskId field (#11405)
To help users distinguish which task an exception was thrown from, and which NamedTopology if it exists, we add a TaskId field to the StreamsException class. We then make sure that all exceptions thrown to the handler are wrapped as StreamsExceptions, to help the user simplify their handling code as they know they will always need to unwrap the thrown exception exactly once.

Reviewers: Walker Carlson <wcarlson@confluent.io>, Luke Chen <showuon@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, John Roesler <vvcephei@apache.org>
2021-10-21 16:21:20 -07:00
Jorge Esteban Quilcate Otoya 68223d3227
KAFKA-10539: Convert KStreamImpl joins to new PAPI (#11356)
Part of the migration to new Processor API, this PR converts KStream to KStream joins.

Depends #11315

Reviewers: John Roesler <vvcephei@apache.org>
2021-10-18 15:34:08 -05:00
Lucas Bradstreet da38a1df27
MINOR: "partition" typos and method doc arg fix (#11298)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Luke Chen <showuon@gmail.com>
2021-10-18 10:44:11 +02:00
Matthias J. Sax f58d79d549
KAFKA-13345: Use "delete" cleanup policy for windowed stores if duplicates are enabled (#11380)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Luke Chen <showuon@gmail.com>
2021-10-14 22:06:30 -07:00
Matthias J. Sax 7fbe482289
HOTFIX: suppress deprecation warnings to fix compilation errors (#11400)
Reviewers: Bill Bejeck <bill@confluent.io>
2021-10-14 18:58:20 -07:00
Luke Chen 65b01a0464
KAFKA-13212: add support infinite query for session store (#11234)
Add support for infinite range query for SessionStore.

Reviewers: Patrick Stuedi <pstuedi@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2021-10-12 16:14:38 -07:00
Luke Chen d1415866cc
KAFKA-13021: disallow grace called after grace set via new API (#11188)
Disallow calling grace() if it was already set via ofTimeDifferenceAndGrace/WithNoGrace(). Add the check to disallow grace called after grace set via new API, and add tests for them.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-10-12 16:05:22 -07:00
Guozhang Wang e31a53a2cd
KAFKA-13319: Do not commit empty offsets on producer (#11362)
We observed on the broker side that txn-offset-commit request with empty topics are received. After checking the source code I found there's on place on Streams which is unnecessarily sending empty offsets. This PR cleans up the streams layer logic a bit to not send empty offsets, and at the same time also guard against empty offsets at the producer layer as well.

Reviewers: Luke Chen <showuon@gmail.com>, Jason Gustafson <jason@confluent.io>
2021-10-12 13:33:23 -07:00
Luke Chen 769882d910
MINOR: remove unneeded size and add lock coarsening to inMemoryKeyValueStore (#11370)
Reviewers: Matthias J. Sax <matthias@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
2021-10-06 18:32:10 -07:00
Jorge Esteban Quilcate Otoya ce83e5be66
KAFKA-10540: Migrate KStream aggregate operations (#11315)
As part of the migration of KStream/KTable operations to the new
Processor API https://issues.apache.org/jira/browse/KAFKA-8410,
this PR includes the migration of KStream aggregate/reduce operations.

Reviewers: John Roesler <vvcephei@apache.org>
2021-09-30 11:40:40 -05:00
A. Sophie Blee-Goldman 6d7a785956
MINOR: expand logging and improve error message during partition count resolution (#11364)
Recently a user hit this TaskAssignmentException due to a bug in their regex that meant no topics matched the pattern subscription, which in turn meant that it was impossible to resolve the number of partitions of the downstream repartition since there was no upstream topic to get the partition count for. Debugging this was pretty difficult and ultimately came down to stepping through the code line by line, since even with TRACE logging we only got a partial picture.

We should expand the logging to make sure the TRACE logging hits both conditional branches, and improve the error message with a suggestion for what to look for should someone hit this in the future

Reviewers: Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2021-09-29 18:19:43 -07:00
Luke Chen 361b7845c6
KAFKA-13309: fix InMemorySessionStore#fetch/backwardFetch order issue (#11337)
In #9139, we added backward iterator on SessionStore. But there is a bug that when fetch/backwardFetch the key range, if there are multiple records in the same session window, we can't return the data in the correct order.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman<apache.org>
2021-09-28 17:31:29 -07:00
vamossagar12 7e57300148
KAFKA-12486: Enforce Rebalance when a TaskCorruptedException is throw… (#11076)
This PR aims to utilize HighAvailabilityTaskAssignor to avoid downtime on corrupted tasks. The idea is that, when we hit TaskCorruptedException on an active task, a rebalance is triggered after we've wiped out the corrupted state stores. This will allow the assignor to temporarily redirect this task to another client who can resume work on the task while the original owner works on restoring the state from scratch.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-09-28 16:50:16 -07:00
Andrew Patterson 24a335b338
KAFKA-12994: Migrate TimeWindowsTest to new API (#11215)
As raised in KAFKA-12994, All tests that use the old API should be either eliminated or migrated to the new API in order to remove the @SuppressWarnings("deprecation") annotations. This PR will migrate over all the relevant tests in TimeWindowsTests.java

Reviewers: Anna Sophie Blee-Goldman
2021-09-28 15:58:49 -07:00
Walker Carlson 4eb386f6e0
KAFKA-13296: warn if previous assignment has duplicate partitions (#11347)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Luke Chen <showuon@gmail.com>
2021-09-25 11:41:52 -07:00
Jorge Esteban Quilcate Otoya d15969af28
KAFKA-10544: Migrate KTable aggregate and reduce (#11316)
As part of the migration of KStream/KTable operations to the new Processor API https://issues.apache.org/jira/browse/KAFKA-8410, this PR includes the migration of KTable aggregate/reduce operations.

Reviewers: John Roesler <vvcephei@apache.org>
2021-09-22 14:25:41 -05:00
Luke Chen b61ec0003f
KAFKA-13211: add support for infinite range query for WindowStore (#11227)
Add support for infinite range query for WindowStore. Story JIRA: https://issues.apache.org/jira/browse/KAFKA-13210

Reviewers: Patrick Stuedi <pstuedi@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
2021-09-22 09:14:58 -07:00
andy0x01 5a6f19b2a1
KAFKA-13246: StoreQueryIntegrationTest#shouldQueryStoresAfterAddingAndRemovingStreamThread now waits for the client state to go to REBALANCING/RUNNING after adding/removing a thread and waits for state RUNNING before querying the state store. (#11334)
KAFKA-13246: StoreQueryIntegrationTest#shouldQueryStoresAfterAddingAndRemovingStreamThread does not gate on stream state well

The test now waits for the client to transition to REBALANCING/RUNNING after adding/removing a thread as well as to transition to RUNNING before querying the state store.

Reviewers: singingMan <@3schwartz>, Walker Carlson <wcarlson@confluent.io>, John Roesler <vvcephei@apache.org>
2021-09-21 11:18:15 -05:00
Guozhang Wang 1b0294dfc4
MINOR: Let the list-store return null in putifabsent (#11335)
This is to make sure that even if logging is disabled, we would still return null in order to workaround the deserialization issue for stream-stream left/outer joins.

Reviewers: Matthias J. Sax <mjsax@apache.org>
2021-09-17 12:05:21 -07:00
Guozhang Wang a0c7e6d8b4
KAFKA-13216: Use a KV with list serde for the shared store (#11252)
This is an alternative approach in parallel to #11235. After several unsuccessful trials to improve its efficiency i've come up with a larger approach, which is to use a kv-store instead as the shared store, which would store the value as list. The benefits of this approach are:

Only serde once that compose <timestamp, byte, key>, at the outer metered stores, with less byte array copies.
Deletes are straight-forward with no scan reads, just a single call to delete all duplicated <timestamp, byte, key> values.
Using a KV store has less space amplification than a segmented window store.
The cons though:

Each put call would be a get-then-write to append to the list; also we would spend a few more bytes to store the list (most likely a singleton list, and hence just 4 more bytes).
It's more complicated definitely.. :)
The main idea is that since the shared store is actively GC'ed by the expiration logic, not based on time retention, and since that the key format is in <timestamp, byte, key>, the range expiration query is quite efficient as well.

Added testing covering for the list stores (since we are still use kv-store interface, we cannot leverage on the get() calls in the stream-stream join, instead we use putIfAbsent and range only). Another minor factoring piggy-backed is to let toList to always close iterator to avoid leaking.

Reviewers: Sergio Peña <sergio@confluent.io>, Matthias J. Sax <mjsax@apache.org>
2021-09-16 16:44:32 -07:00
Luke Chen 9628c1278e
KAFKA-13264: fix inMemoryWindowStore backward fetch not in reversed order (#11292)
When introducing backward iterator for WindowStroe in #9138, we forgot to make "each segment" in reverse order (i.e. in descendingMap) in InMemoryWindowStore. Fix it and add integration tests for it.

Currently, in Window store, we store records in [segments -> [records] ].

For example:
window size = 500,
input records:

key: "a", value: "aa", timestamp: 0 ==> will be in [0, 500] window
key: "b", value: "bb", timestamp: 10 ==> will be in [0, 500] window
key: "c", value: "cc", timestamp: 510 ==> will be in [500, 1000] window

So, internally, the "a" and "b" will be in the same segment, and "c" in another segments.
segments: [0 /* window start */, records], [500, records].
And the records for window start 0 will be "a" and "b".
the records for window start 500 will be "c".

Before this change, we did have a reverse iterator for segments, but not in "records". So, when doing backwardFetchAll, we'll have the records returned in order: "c", "a", "b", which should be "c", "b", "a" obviously.

Reviewers: Jorge Esteban Quilcate Otoya <quilcate.jorge@gmail.com>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2021-09-13 14:40:54 -07:00
Oliver Hutchison a03bda61e0
KAFKA-13249: Always update changelog offsets before writing the checkpoint file (#11283)
When using EOS checkpointed offsets are not updated to the latest offsets from the changelog because the maybeWriteCheckpoint method is only ever called when commitNeeded=false. This change will force the update if enforceCheckpoint=true .

I have also added a test which verifies that both the state store and the checkpoint file are completely up to date with the changelog after the app has shutdown.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2021-09-13 14:15:22 -07:00
Josep Prat 286126f9a5
KAFKA-13243: KIP-773 Differentiate metric latency measured in ms and ns (#11302)
KAFKA-13243: KIP-773 Differentiate metric latency measured in ms and ns

Implementation of KIP-773

Deprecates inconsistent metrics bufferpool-wait-time-total,
io-waittime-total, and iotime-total.
Introduces new metrics bufferpool-wait-time-ns-total,
io-wait-time-ns-total, and io-time-ns-total with the same semantics as
before.
Adds metrics (old and new) in ops.html.
Adds upgrade guide for these metrics.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Tom Bentley <tbentley@redhat.com>
2021-09-08 18:00:58 +01:00
Tomer Wizman fb77da941a
KAFKA-12766 - Disabling WAL-related Options in RocksDB (#11250)
Description
Streams disables the write-ahead log (WAL) provided by RocksDB since it replicates the data in changelog topics. Hence, it does not make much sense to set WAL-related configs for RocksDB.

Proposed solution
Ignore any WAL-related configuration and state in the log that we are ignoring them.

Co-authored-by: Tomer Wizman <tomer.wizman@personetics.com>
Co-authored-by: Bruno Cadonna <cadonna@apache.org>

Reviewers: Boyang Chen <boyang@apache.org>, Bruno Cadonna <cadonna@apache.org>
2021-09-08 13:57:08 +02:00
Christo Lolov 6472e79092
KAFKA-12994 Migrate JoinWindowsTest and SessionWindowsTest to new API (#11214)
As detailed in KAFKA-12994, unit tests using the old API should be either removed or migrated to the new API.
This PR migrates relevant tests in JoinWindowsTest.java and SessionWindowsTest.java.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-09-07 19:50:18 -07:00
Jorge Esteban Quilcate Otoya 5f89ce5f20
KAFKA-13201: Convert KTable suppress to new PAPI (#11213)
Migrate Suppress as part of the migration of KStream/KTable
 operations to the new Processor API (KAFKA-8410)

Reviewers: John Roesler <vvcephei@apache.org>
2021-09-07 17:17:44 -05:00
CHUN-HAO TANG 89ee72b5b5
KAFKA-13088: Replace EasyMock with Mockito for ForwardingDisabledProcessorContextTest (#11051)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2021-09-06 08:18:52 -07:00
Yanwen(Jason) Lin 7c64077b34
KAFKA-13032: add NPE checker for KeyValueMapper (#11241)
Currently KStreamMap and KStreamFlatMap classes will throw NPE if the call to KeyValueMapper#apply() return null. This commit checks whether the result of KeyValueMapper#apply() is null and throws a more meaningful error message for better debugging.

Two unit tests are also added to check if we successfully captured nulls.

Reviewers: Josep Prat <josep.prat@aiven.io>,  Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2021-09-06 14:03:16 +02:00
Josep Prat 4835c64f89
KAFKA-12887 Skip some RuntimeExceptions from exception handler (#11228)
Instead of letting all RuntimeExceptions go through and be processed by the uncaught exception handler, IllegalStateException and IllegalArgumentException are not passed through and fail fast. In this PR when setting the uncaught exception handler we check if the exception is in an "exclude list", if so, we terminate the client, otherwise we continue as usual.

Added test checking this new case. Added integration test checking that user defined exception handler is not used when an IllegalStateException is thrown.

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2021-09-01 09:58:36 -07:00
Rohan a5ce43781e
MINOR: add units to metrics descriptions + test fix post KAFKA-13229 (#11286)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-08-31 11:44:45 -07:00
Rohan 01ab888dbd
KAFKA-13229: add total blocked time metric to streams (KIP-761) (#11149)
* Add the following producer metrics:
flush-time-total: cumulative sum of time elapsed during in flush.
txn-init-time-total: cumulative sum of time elapsed during in initTransactions.
txn-begin-time-total: cumulative sum of time elapsed during in beginTransaction.
txn-send-offsets-time-total: cumulative sum of time elapsed during in sendOffsetsToTransaction.
txn-commit-time-total: cumulative sum of time elapsed during in commitTransaction.
txn-abort-time-total: cumulative sum of time elapsed during in abortTransaction.

* Add the following consumer metrics:
commited-time-total: cumulative sum of time elapsed during in committed.
commit-sync-time-total: cumulative sum of time elapsed during in commitSync.

* Add a total-blocked-time metric to streams that is the sum of:
consumer’s io-waittime-total
consumer’s iotime-total
consumer’s committed-time-total
consumer’s commit-sync-time-total
restore consumer’s io-waittime-total
restore consumer’s iotime-total
admin client’s io-waittime-total
admin client’s iotime-total
producer’s bufferpool-wait-time-total
producer's flush-time-total
producer's txn-init-time-total
producer's txn-begin-time-total
producer's txn-send-offsets-time-total
producer's txn-commit-time-total
producer's txn-abort-time-total

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2021-08-30 15:39:25 -07:00
dengziming 1d22b0d706
KAFKA-10774; Admin API for Describe topic using topic IDs (#9769)
Reviewers: Justine Olshan <jolshan@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Satish Duggana <satishd@apache.org>, Rajini Sivaram <rajinisivaram@googlemail.com>
2021-08-28 09:00:36 +01:00
Walker Carlson 49aed781d8
KAFKA-13128: extract retry checker and update with retriable exception causing flaky StoreQueryIntegrationTest (#11275)
Add a new case to the list of possible retriable exceptions for the flaky tests to take care of threads starting up

Reviewers: Leah Thomas <lthomas@confluent.io>, Anna Sophie Blee-Goldman
2021-08-27 20:12:12 -07:00
Andy Lapidas 84b111f968
KAFKA-12963: Add processor name to error (#11262)
This PR adds the processor name to the ClassCastException exception text in process()

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-08-27 20:06:49 -07:00
A. Sophie Blee-Goldman d9bb988954
MINOR: remove unused Properties from GraphNode#writeToTopology (#11263)
The GraphNode#writeToTopology method accepts a Properties input parameter, but never uses it in any of its implementations. We can remove this parameter to clean things up and help make it clear that writing nodes to the topology doesn't involve the app properties.

Reviewers: Bruno Cadonna <cadonna@confluent.io>
2021-08-26 17:19:03 -07:00
Luke Chen 844c1259a9
MINOR: Optimize the OrderedBytes#upperRange for not all query cases (#11181)
Currently in OrderedBytes#upperRange method, we'll check key bytes 1 by 1, to see if there's a byte value >= first timestamp byte value, so that we can skip the following key bytes, because we know compareTo will always return 0 or 1. However, in most cases, the first timestamp byte is always 0, more specifically the upperRange is called for both window store and session store. For former, the suffix is in timestamp, Long.MAX_VALUE and for latter the suffix is in Long.MAX_VALUE, timestamp. For Long.MAX_VALUE the first digit is not 0, for timestamp it could be 0 or not, but as long as it is up to "now" (i.e. Aug. 23rd) then the first byte should be 0 since the value is far smaller than what a long typed value could have. So in practice for window stores, that suffix's first byte has a large chance to be 0, and hence worth optimizing for.

This PR optimizes the not all query cases by not checking the key byte 1 by 1 (because we know the unsigned integer will always be >= 0), instead, put all bytes and timestamp directly. So, we won't have byte array copy in the end either.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-08-26 14:37:34 -07:00
A. Sophie Blee-Goldman 53277a92a6
HOTFIX: decrease session timeout in flaky NamedTopologyIntegrationTest (#11259)
Decrease session timeout back to 10s to improve test flakiness

Reviewers: Walker Carlson <wcarlson@confluent.io>
2021-08-25 21:52:32 -07:00
Phil Hardwick a594747d75
HOTFIX: Fix null pointer when getting metric value in MetricsReporter (#11248)
The alive stream threads metric relies on the threads field as a monitor object for
its synchronized block. When the alive stream threads metric is registered it isn't
initialised so any call to get the metric value before it is initialised will result
in a null pointer exception.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Walker Carlson <wcarlson@confluent.io>
2021-08-23 13:21:38 -07:00
John Roesler 45ecaa19f8
MINOR: Set session timeout back to 10s for Streams system tests (#11236)
We increased the default session timeout to 30s in KIP-735:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout

Since then, we are observing sporadic system test failures
due to rebalances taking longer than the test timeout.
Rather than increase the test wait times, we can just override
the session timeout to a value more appropriate in the testing
domain.

Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>
2021-08-20 11:27:54 -05:00
Luke Chen 2bfd0ae2e9
MINOR: update the branch(split) doc and java doc and tests (#11195)
Reviewers: Ivan Ponomarev <iponomarev@mail.ru>, Matthias J. Sax <matthias@confluent.io>
2021-08-10 13:37:59 -07:00
Guozhang Wang 35ebdd7ed0
MINOR: Fix flaky shouldCreateTopicWhenTopicLeaderNotAvailableAndThenTopicNotFound (#11155)
1. This test is taking two iterations since the firs iteration is designed to fail due to unknow topic leader. However both the timeout and the backoff are set to 50ms, while the actual SYSTEM time is used. This means we are likely to timeout before executing the second iteration. I thought about using a mock time but dropped that idea as it may forgo the purpose of this test, instead I set the backoff time to 10ms so that we are almost unlikely to hit this error anymore.

2. Found a minor issue while fixing this which is that when we have non-empty not-ready topics, but the topics-to-create is empty (which is possible as the test shouldCreateTopicWhenTopicLeaderNotAvailableAndThenTopicNotFound itself illustrates), we still call an empty create-topic function. Though it does not incur any round-trip it would still waste some cycles, so I branched it off and hence also simplified some unit tests.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Bruno Cadonna <cadonna@confluent.io>
2021-08-10 12:27:44 -07:00
Walker Carlson 9565a529e0
KAFKA-12779: rename namedTopology in TaskId to topologyName #11192
Update to KIP-740.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Israel Ekpo <israelekpo@gmail.com>
2021-08-09 15:19:21 -07:00
John Roesler f16a9499ec
MINOR: Increase smoke test production time (#11190)
We've seen a few failures recently due to the driver finishing
the production of data and verifying the results before the
whole cluster is even running.

Reviewers: Leah Thomas <lthomas@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Matthias J. Sax <mjsax@apache.org>
2021-08-09 14:19:35 -05:00
A. Sophie Blee-Goldman 6854eb8332
KAFKA-12648: Pt. 3 - addNamedTopology API (#10788)
Pt. 1: #10609
Pt. 2: #10683
Pt. 3: #10788

In Pt. 3 we implement the addNamedTopology API. This can be used to update the processing topology of a running Kafka Streams application without resetting the app, or even pausing/restarting the process. It's up to the user to ensure that this API is called on every instance of an application to ensure all clients are able to run the newly added NamedTopology. 

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2021-08-06 00:18:27 -07:00
Walker Carlson be8820cdac
MINOR: Port fix to other StoreQueryIntegrationTests (#11153)
Port the fix from #11129 to the other store-query tests.

Reviewers: John Roesler <vvcephei@apache.org>
2021-08-05 16:39:33 -05:00
Guozhang Wang e5df7fd90b
MINOR: Should commit a task if the consumer position advanced as well (#11151)
Reviewers: John Roesler <vvcephei@apache.org>
2021-08-05 13:41:46 -07:00
Walker Carlson d414de3779
MINOR: use relative counts for restores (#11176)
Use a relative count from using 0 for totalNumbRestores to prevent flakiness.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-08-05 00:32:25 -07:00
A. Sophie Blee-Goldman 5d52de2ccf
KAFKA-12648: minor followup from Pt. 2 and some new tests (#11146)
Addresses the handful of remaining feedback from Pt. 2, plus adds two new tests: one verifying a multi-topology application with a FKJ and its internal topics, another to make sure IQ works with named topologies (though note that there is a bit more work left for IQ to be fully supported, will be tackled after Pt. 3

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2021-07-30 10:56:57 -07:00
Patrick Stuedi 22541361b7
Add support for infinite endpoints for range queries (#11120)
Add support to open endpoint range queries in key-value stores

Implements: KIP-763

Reviewers: Almog Gavra <almog@confluent.io>, Luke Chen <showuon@gmail.com>, John Roesler <vvcephei@apache.org>
2021-07-29 21:52:16 -05:00
Patrick Stuedi c074b67395
Fix for flaky test in StoreQueryIntegrationTest (#11129)
Fix a bug in StoreQueryIntegrationTest::shouldQueryOnlyActivePartitionStoresByDefault that causes the test to fail in the case of a client rebalancing. The changes in this PR make sure the test keeps re-trying after a rebalancing operation, instead of failing.

Reviewers: Luke Chen <showuon@gmail.com>, John Roesler <vvcephei@apache.org>
2021-07-29 16:26:00 -05:00
Luke Chen a2aa3b9ea5
close TopologyTestDriver to release resources (#11143)
Close TopologyTestDriver to release resources

Reviewers: Bill Bejeck <bbejeck@apache.org>
2021-07-29 15:50:57 -04:00
Jorge Esteban Quilcate Otoya 3190ebd1e6
KAFKA-10542: Migrate KTable mapValues, passthrough, and source to new Processor API (#11099)
As part of the migration of KStream/KTable operations to the new Processor API (KAFKA-8410), this PR includes the migration of KTable:
* mapValues,
* passthrough,
* and source operations.

Reviewers: John Roesler <vvcephei@apache.org>
2021-07-28 20:58:15 -05:00
Walker Carlson 58402a6fe8
MINOR: add helpful error message (#11139)
I noticed that replace thread actions would not be logged unless the user added a log in the handler. I think it would be very useful for debugging.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-07-28 11:38:31 -07:00
A. Sophie Blee-Goldman 4710a49146
KAFKA-12648: Pt. 2 - Introduce TopologyMetadata to wrap InternalTopologyBuilders of named topologies (#10683)
Pt. 1: #10609
Pt. 2: #10683
Pt. 3: #10788

The TopologyMetadata is next up after Pt. 1 #10609. This PR sets up the basic architecture for running an app with multiple NamedTopologies, though the APIs to add/remove them dynamically are not implemented until Pt. 3

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2021-07-28 11:18:56 -07:00
Luke Chen 818cbfba6d
KAFKA-13125: close KeyValueIterator instances in internals tests (part 2) (#11107)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-07-26 16:26:02 -07:00
Luke Chen ded66d92a4
KAFKA-13124: close KeyValueIterator instance in internals tests (part 1) (#11106)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-07-26 16:25:22 -07:00
Luke Chen f9aeebed05
KAFKA-13123: close KeyValueIterator instances in example code and tests (#11105)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-07-26 16:23:04 -07:00
A. Sophie Blee-Goldman 246a8afb63
MINOR: factor state checks into descriptive methods and clarify javadocs (#11123)
Just a bit of minor cleanup that (a) does some prepwork for another PR I'm working on, (b) updates the javadocs & exception messages to report a more useful error to the user and describe what they actually need to do, and (c) hopefully makes these state checks more future-proof by defining methods for each kind of check in one place that can be easily updated instead of tracking down every individual check.

Reviewers: Walker Carlson <wcarlson@confluent.io>, Luke Chen <showuon@gmail.com>
2021-07-26 15:53:30 -07:00
A. Sophie Blee-Goldman 8b1eca1c58
KAFKA-13126: guard against overflow when computing `joinGroupTimeoutMs` (#11111)
Setting the max.poll.interval.ms to MAX_VALUE causes overflow when computing the joinGroupTimeoutMs and results in the JoinGroup timeout being set to the request.timeout.ms instead, which is much lower.

This can easily make consumers drop out of the group, since they must rejoin now within 30s (by default) yet have no obligation to almost ever call poll() given the high max.poll.interval.ms, especially when each record takes a long time to process or the `max.poll.records` is also very large. We just need to check for overflow and fix it to Integer.MAX_VALUE when it occurs.

Reviewers: Luke Chen <showuon@gmail.com>, John Roesler <vvcephei@apache.org>
2021-07-23 16:22:41 -07:00
A. Sophie Blee-Goldman 7fbc6b73aa
KAFKA-13021: clarify KIP-633 javadocs and address remaining feedback (#11114)
There were a few followup things to address from #10926, most importantly a number of updates to the javadocs. Also includes a few missing verification checks.

Reviewers: Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Israel Ekpo
2021-07-23 16:14:37 -07:00
A. Sophie Blee-Goldman 03edcdd972
KAFKA-13128: wait for all keys to be fully processed in #shouldQueryStoresAfterAddingAndRemovingStreamThread (#11113)
This test is flaky due to waiting on all records to be processed for only a single key before issuing IQ lookups and asserting whether data was found.

Reviewers:  Phil Hardwick, Walker Carlson <wcarlson@confluent.io>
2021-07-23 14:56:46 -07:00
A. Sophie Blee-Goldman d99562a145
HOTFIX: Set session interval back to 10s for StreamsCooperativeRebalanceUpgradeTest (#11103)
This test is hitting pretty frequent timeouts after bouncing a node and waiting for it to come back and fully rejoin the group. It seems to now take 45s for the initial JoinGroup to succeed, which I suspect is due to the new default session.interval.ms (which was recently changed to 45s). Let's try fixing this config to the old value of 10s and see if that helps it rejoin in time.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2021-07-22 19:27:07 +02:00
Phil Hardwick 02dc615c1e
KAFKA-13096: Ensure queryable store providers is up to date after adding stream thread (#10921)
When a new thread is added the queryable store providers continues to use the store providers it was given when KafkaStreams was instantiated. This means IQ will start performing lookups against an out-of-date list of threads, and may eventually become completely broken. We must make sure the QueryableStoreProvider is updated when threads are added and removed.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-07-21 13:23:15 -07:00
leah 89286668eb
MINOR: add serde configs to properly set serdes in failing StreamsStaticMembershipTest (#11093)
After changing the default serde to be null, some system tests started failing. This test didn't explicitly pass in a serde and didn't set the default config so when the test was trying to setup the source node it wasn't able to find any config to use and threw a config exception.

 Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Bruno Cadonna <cadonna@confluent.io>
2021-07-21 12:54:59 -07:00
Luke Chen ad59e3b622
MINOR: update doc to reflect the grace period change (#11100)
We removed default 24 hours grace period in KIP-633, and deprecate some grace methods, but we forgot to update the stream docs.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2021-07-21 13:26:41 +02:00
Bruno Cadonna 9b3687e0ac
HOTFIX: Modify system test config to reduce time to stable task assignment. (#11090)
Currently, we verify the startup of a Streams client by checking the transition
from REBALANCING to RUNNING and if the client processed some records
in the EOS system test. However, if the Streams client only
has standby tasks assigned as it can happen if the client is catching 
up by using warm-up replicas, the client will never process
records within the timeout of the startup verification. Hence, the test 
will fail although everything is fine. This commit fixes this by reducing
the time to the next probing rebalance and by increasing the number of 
max warm-up replicas. In such a way, the catch up of the client and the 
following processing of records should still be within the startup verification 
timeout of the client.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-07-21 07:58:14 +02:00
Walker Carlson eeb788d1b9
KAFKA-13010: Retry getting tasks incase of rebalance for TaskMetadata tests (#11083)
If there is a cooperative rebalance the tasks might not be assigned to a thread at all for a very short timeframe, causing this test to fail. We can just retry getting the metadata until the group has finished rebalancing and all tasks are assigned

Reviewers: Bruno Cadonna <cadonna@apache.org>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Josep Prat <josep.prat@aiven.io>
2021-07-20 14:56:07 -07:00
CHUN-HAO TANG 47a0974f5a
KAFKA-13082: Replace EasyMock with Mockito for ProcessorContextTest (#11045)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-07-20 11:28:11 -07:00
Matthias J. Sax 3e38038278
HOTFIX: Init stream-stream left/outer join emit interval correctly (#11055)
Follow up to #10917

The fix from #10917 intended to reduce the emit frequency to save the creation cost of RocksDB iterators. However, we incorrectly initialized the "timer" with timestamp zero, and thus, the timer was always in the past and we did try to emit left/outer join result too often.

This PR fixes the initialization of the emit interval timer to current wall-clock time to effectively 'enable' the fix from #10917.

Reviewers: Sergio Peña <sergio@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-07-16 13:30:19 -07:00
Guozhang Wang 13b2df733a
MINOR: Default GRACE with Old API should set as 24H minus window-size / inactivity-gap (#10953)
In 2.8 and before, we computed the default grace period with Math.max(maintainDurationMs - sizeMs, 0); in method gracePeriodMs() in TimeWindows, SessionWindows, and JoinWindows. That means that the default grace period has never been 24 hours but 24 hours - window size. Since gracePeriodMs() is used to compute the retention time of the changelog topic for the corresponding window state store and the segments for the window state store it is important to keep the same computation for the deprecated methods. Otherwise, Streams app that run with 2.8 and before might not be compatible with Streams 3.0 because the retention time of the changelog topics created with older Streams apps will be smaller than the assumed retention time for Streams apps in 3.0. For example, with a window size of 10 hours, an old Streams app would have created a changelog topic with retention time 10 hours (window size) + 14 hours (default grace period, 24 hours - 10 hours). A 3.0 Streams app would assume a retention time of 10 hours (window size) + 24 hours (deprecated default grace period as currently specified on trunk). In the presence of failures, where a state store needs to recreated, records might get lost, because before the failure the state store of a 3.0 Streams app contained 10 hours + 24 hours of records whereas the changelog topic that was created with the old Streams app would only contain 10 hours + 14 hours of records.

All this happened due to us always stating that the default grace period was 24 hours although it was not completely correct and a connected and unfortunate misunderstanding when we removed deprecated windows APIs (#10378).

Co-authors: Bruno Cadonna <cadonna@apache.org>
Reviewers: Luke Chen <showuon@gmail.com>, Matthias J. Sax <mjsax@apache.org>, Bruno Cadonna <cadonna@apache.org>
2021-07-16 11:22:26 -07:00
Guozhang Wang 3e3264760b
KAFKA-10847: Remove internal config for enabling the fix (#10941)
Also update the upgrade guide indicating about the grace period KIP and its indication on the fix with throughput impact.

Reviewers: Luke Chen <showuon@gmail.com>, Matthias J. Sax <mjsax@apache.org>
2021-07-15 10:58:15 -07:00
vamossagar12 f413435585
KAFKA-12925: adding presfixScan operation for missed implementations (#10877)
The new prefixScan API may still throw UnsupportedVersionOperationException due to some missing implementations in vast store hierarchy of Streams, this PR adds those missing overrides and expands the test coverage.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-07-14 15:55:50 -07:00
John Gray 5e5d5bff3b
KAFKA-13037: "Thread state is already PENDING_SHUTDOWN" log spam
Demote this from INFO to debug since it's absolutely spamming the logs

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-07-13 21:00:23 -07:00
CHUN-HAO TANG f3017683a2
KAFKA-13075: Consolidate RocksDBStoreTest and RocksDBKeyValueStoreTest (#11034)
Consolidate the RocksDBStoreTest and RocksDBKeyValueStoreTest files into a single test class for the RocksDBStore.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-07-13 13:34:40 -07:00
John Roesler bfdef11b97
KAFKA-12360: Document new time semantics (#11003)
Update the docs for task idling, since the semantics have
changed in 3.0.

Reviewers: Jim Galasyn <jim.galasyn@confluent.io>, Luke Chen <showuon@gmail.com>, Boyang Chen <boyang@apache.org>
2021-07-12 16:16:29 -05:00
Bruno Cadonna 332db13047
HOTFIX: Fix verification of version probing (#10943)
Fixes and improves version probing in system test test_version_probing_upgrade().
2021-07-12 18:50:25 +02:00
A. Sophie Blee-Goldman 66e8b8b413
MINOR: StreamsPartitionAssignor should log the individual members of each client (#10996)
Log the specific StreamThreads participating in the rebalance for each client in the Streams application

Reviewers: Walker Carlson <wcarlson@confluent.io>, John Roesler <vvcephei@apache.org>
2021-07-08 11:11:39 -07:00
dengziming 08757d0d19
MINOR: Add default serde in stream test to fix QA ERROR (#10958)
We changed the default serde in Streams to be null in #10813, but forgot to add some in tests, for example TestTopicsTest and TopologyTestDriverTest.

Reviewers: David Jacot <djacot@confluent.io>, Bruno Cadonna <cadonna@apache.org>
2021-07-02 20:02:20 +02:00
Matthias J. Sax a095e1fd8c
KAFKA-10847: improve throughput of stream-stream join with spurious left/outer join fix (#10917)
The fix to avoid spurious left/outer stream-stream join results, showed
very low throughput for RocksDB, due to excessive creation of iterators.
Instead of trying to emit left/outer stream-stream join result for every
input record, this PR adds tracking of the lower timestamp bound of
left/outer join candidates, and only tries to emit them (and create an
iterator) if they are potentially old enough.

Reviewers: Luke Chen <showuon@gmail.com>, Guozhang Wang <guozhang@confluent.io>, Sergio Peña <sergio@confluent.io>
2021-07-01 15:46:22 -07:00
leah 4fd71a7ef1
KAFKA-9559: Change default serde to be `null` (#10813)
Implements KIP-741

Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-07-01 14:58:29 -07:00
Justine Olshan cee2e975d1
KAFKA-13011; Update deleteTopics Admin API (#10892)
This patch adds two new apis to support topic deletion using topic IDs or names. It uses a new class `TopicCollection` to keep a collection of topics defined either by names or IDs. Finally, it modifies `DeleteTopicsResult` to support both names and IDs and deprecates the old methods which have become ambiguous. Eventually we will want to deprecate the old `deleteTopics` apis as well, but this patch does not do so.

Reviewers: Jason Gustafson <jason@confluent.io>
2021-06-30 23:20:21 -07:00
Israel Ekpo b3905d9f71
KAFKA-8613: New APIs for Controlling Grace Period for Windowed Operations (#10926)
Implements KIP-633.

Grace-period is an important parameter and its best to make it the user's responsibility to set it expliclity. Thus, we move off to provide a default and make it a mandatory parameter when creating a window.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Luke Chen <showuon@gmail.com>, Matthias J. Sax <matthias@confluent.io>
2021-06-30 17:09:19 -07:00
Bruno Cadonna 19edbda164
Avoid increasing app ID when test is executed multiple times (#10939)
The integration test TaskMetadataIntegrationTest will increase
the length of the app ID when its test methods are called multiple
times in one execution. This is for example the case if you
repeatedly run the test until failure in IntelliJ IDEA. This might
also lead to exceptions because the state directory depends on the
app ID and directory names have a length limit.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-06-30 09:55:27 +02:00
Bruno Cadonna dc6805df73
MINOR: Improve test of log messages for dropped records (#10920)
Reviewers: Luke Chen <showuon@gmail.com>,  Boyang Chen <boyang@apache.org>
2021-06-29 13:00:35 +02:00
Juan Gonzalez-Zurita cfcabc368c
KAFKA-12718: SessionWindows are closed too early (#10824)
Session windows should not be close directly when "window end" time is reached, but "window close" time should be "window-end + gap + grace-period".

Reviewer: Matthias J. Sax <matthias@confluent.io>
2021-06-28 15:39:49 -07:00
Matthias J. Sax 2540b77769
KAFKA-12909: add missing tests (#10893)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
2021-06-28 15:32:16 -07:00
Matthias J. Sax 670630ae5b
KAFKA-12951: restore must terminate for tx global topic (#10894)
Reviewers: Guozhang Wang <guozhang@confluent.io>, Luke Chen <showuon@gmail.com>, Gasparina Damien <d.gasparina@gmail.com>
2021-06-28 14:10:25 -07:00
Josep Prat 6655a09e99
KAFKA-12849: KIP-744 TaskMetadata ThreadMetadata StreamsMetadata as API (#10840)
Implementation of KIP-744.

Creates new Interfaces for TaskMetadata, ThreadMetadata, and
StreamsMetadata, providing internal implementations for each of them.

Deprecates current TaskMetadata, ThreadMetadata under o.a.k.s.processor,
and SreamsMetadata under a.o.k.s.state.

Updates references on internal classes from deprecated classes to new interfaces.

Deprecates methods on KafkaStreams returning deprecated ThreadMeatada and
StreamsMetadata, and provides new ones returning the new interfaces.

Update Javadocs referencing to deprecated classes and methods to point
to the right ones.

Co-authored-by: Bruno Cadonna <cadonna@apache.org>

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Bruno Cadonna <cadonna@apache.org>
2021-06-25 18:31:49 +02:00
Geordie b0cfd1f4ca
KAFKA-12336 Custom stream naming does not work while calling stream[K… (#10190)
Custom stream naming does not work while calling stream[K, V](topicPattern: Pattern)

Reviewers: Bill Bejeck <bbejeck@apache.org>
2021-06-24 12:07:22 -04:00
Lee Dongjin 7da881ffbe
KAFKA-12928: Add a check whether the Task's statestore is actually a directory (#10862)
Throw an exception if a state directory exists as a regular file

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Luke Chen <showuon@gmail.com>
2021-06-22 16:35:30 -07:00
John Roesler c3475081c5
KAFKA-10546: Deprecate old PAPI (#10869)
* Deprecate the old Processor API
* Suppress warnings on all internal usages of the old API
  (which will be migrated in other child tickets of KAFKA-8410)
* Add new KStream#process methods, since KAFKA-10603 has not seen any action.
2021-06-22 09:17:11 -05:00
Bruno Cadonna 2ad9350cc1
MINOR: Remove obsolete variables for metric sensors (#10912)
This is a clean-up that we missed for "KIP-743: Remove config value 0.10.0-2.4 of Streams built-in metrics version config"

Reviewers: Luke Chen <showuon@gmail.com>, Matthias J. Sax <mjsax@apache.org>
2021-06-22 14:10:28 +02:00
Bruno Cadonna b38cadca23
MINOR: Remove log warning for RocksDB 6+ upgrade (#10911)
Reviewers: Boyang Chen <boyang@apache.org>
2021-06-22 13:17:39 +02:00
Ismael Juma d27a84f70c
KAFKA-12945: Remove port, host.name and related configs in 3.0 (#10872)
They have been deprecated since 0.10.0. Full list of removes configs:
* port
* host.name
* advertised.port
* advertised.host.name

Also adjust tests to take the removals into account. Some tests were
no longer relevant and have been removed.

Finally, took the chance to:
* Clean up unnecessary usage of `KafkaConfig$.MODULE$` in
related files.
* Add missing `Test` annotations to `AdvertiseBrokerTest` and
make necessary changes for the tests to pass.

Reviewers: David Jacot <djacot@confluent.io>, Luke Chen <showuon@gmail.com>
2021-06-17 05:32:34 -07:00
Matthias J. Sax 96767a60db
KAFKA-12909: disable spurious left/outer stream-stream join fix for old JoinWindows API (#10861)
We changed the behavior of left/outer stream-stream join via KAFKA-10847.
To avoid a breaking change during an upgrade, we need to disable this
fix by default.

We only enable the fix if users opt-in expliclity by changing their
code. We leverage KIP-633 (KAFKA-8613) that offers a new JoinWindows
API with mandatory grace-period to enable the fix.

Reviewers: Sergio Peña <sergio@confluent.io>, Israel Ekpo <israelekpo@gmail.com>, Guozhang Wang <guozhang@confluent.io>
2021-06-16 09:25:16 -07:00
Colin Patrick McCabe 135de5801e
KAFKA-12877: Make flexibleVersions mandatory (#10804)
Many Kafka protocol JSON files were accidentally configured to not use
flexible versions, since it was not on by default.  This PR requires
JSON files to specify a flexibleVersions value. If the JSON file does
not specify the flexibleVersions value, display an error message
suggesting the correct value to use for new messages.

Reviewers: Jason Gustafson <jason@confluent.io>
2021-06-15 16:04:30 -07:00
Matthias J. Sax 01967e48a2
KAFKA-12914: StreamSourceNode should return `null` topic name for pattern subscription (#10846)
Reviewers: Luke Chen <showuon@gmail.com>, Bruno Cadonna <bruno@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-06-15 00:59:55 -07:00
John Roesler 987391958d
MINOR: enable EOS during smoke test IT (#10870)
This IT has been failing on trunk recently. Enabling EOS during the integration test
makes it easier to be sure that the test's assumptions are really true during verification
and should make the test more reliable.

I also noticed that in the actual system test file, we are using the deprecated property
name "beta" instead of "v2".

Reviewers: Boyang Chen <boyang@apache.org>
2021-06-13 21:35:02 -05:00
Luke Chen 4724083a32
KAFKA-8940: decrease session timeout to make test faster and reliable (#10871)
While there might still be some issue about the test as described here by @ableegoldman , but I found the reason why this test failed quite frequently recently. It's because we increased the session timeout to 45 sec in KIP-735.

The reason why increasing session timeout affected this test is because in this test, we will keep adding new stream clients and remove old one, to maintain only 3 stream clients alive. The problem here is, when old stream closed, we won't trigger rebalance immediately due to the stream clients are all static members as described in KIP-345, which means, we will trigger trigger group rebalance only when session.timeout expired. That said, when old client closed, we'll have at least 45 sec with some tasks not working.

Also, in this test, we have 2 timeout conditions to fail this test before verification passed:

1. 6 minutes timeout
2. polling 30 times (each with 5 seconds) without getting any data. (that is, 5 * 30 = 150 sec without consuming any data)

For (1), in my test under 45 session timeout, we'll create 8 stream clients, which means, we'll have 5 clients got closed. And each closed client need 45 sec to trigger rebalance, so we'll have 45 * 5 = 225 sec (~4 mins) of the time having some tasks not working.
For (2), during new client created and old client closed, it need some time to do rebalance. With 45 session timeout, we only got ~100 sec left. In slow jenkins env, it might reach the 30 retries without getting any data timeout.

Therefore, decreasing session timeout can make this test completes faster and more reliable.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-06-13 09:49:05 -07:00
Josep Prat 787b4fe955
MINOR: clean up unneeded `@SuppressWarnings` (#10855)
Reviewers: Luke Chen <showuon@gmail.com>, Matthias J. Sax <mjsax@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2021-06-13 19:00:14 +08:00
Luke Chen 8eecb91419
KAFKA-9295: revert session timeout to default value (#10736)
Revert the hard-coded increased session timeout now that the default is 45s

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-06-10 13:57:25 -07:00
wycccccc 69d507590e
KAFKA-12924 Replace EasyMock and PowerMock with Mockito in streams metrics tests (#10850)
Reviewers: John Roesler <vvcephei@apache.org>, Ismael Juma <ijuma@apache.org>
2021-06-10 15:21:46 -05:00
Lee Dongjin 7cd09b6a14
KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup (#9414)
1. Update StateDirectory#clean
  - Delete application's statestore directory in cleanup process if it is empty.
2. Add Tests
  - StateDirectoryTest#shouldDeleteAppDirWhenCleanUpIfEmpty: asserting the empty application directory is deleted with StateDirectory#clean.
  - StateDirectoryTest#shouldNotDeleteAppDirWhenCleanUpIfNotEmpty: asserting the non-empty application directory is not deleted with StateDirectory#clean and appropriate log message is generated.
  - Add Integration test: StateDirectoryIntegrationTest
3. Improve EOSUncleanShutdownIntegrationTest: test all available cases regarding cleanup process on unclean shutdown.

Reviewers: John Roesler <vvcephei@apache.org>, Guozhang Wang <guozhang@apache.org>
2021-06-10 14:59:57 -05:00
Josep Prat d496103864
MINOR: Small optimizations and removal of unused code in Streams (#10856)
Remove unused methods in internal classes
Mark fields that can be final as final
Remove unneeded generic type annotation
Convert single use fields to local final variables
Use method reference in lambdas when it's more readable

Reviewers: Matthias J. Sax <mjsax@apache.org>, Bruno Cadonna <cadonna@apache.org>
2021-06-10 16:51:31 +02:00
wycccccc 8b6752d183
KAFKA-12905: Replace EasyMock and PowerMock with Mockito for NamedCacheMetricsTest (#10835)
* Development of EasyMock and PowerMock has stagnated while Mockito continues to be actively developed. With the new Java cadence, it's a problem to depend on libraries that do bytecode generation and are not actively maintained. In addition, Mockito is also easier to use.KAFKA-7438

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2021-06-10 12:18:19 +02:00
Matthias J. Sax 953ec98100
MINOR: Improve Kafka Streams JavaDocs with regard to record metadata (#10810)
Reviewers: Luke Chen <howuon@gmail.com>, Josep Prat <josep.prat@aiven.io>, John Roesler <john@confluent.io>
2021-06-09 22:51:36 -07:00
Jason Gustafson a75b5c635b
KAFKA-12874; Increase default consumer session timeout to 45s (#10803)
This patch increases the default consumer session timeout to 45s as documented in KIP-735: https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout.

Reviewers: Luke Chen <showuon@gmail.com>, David Arthur <mumrah@gmail.com>, David Jacot <djacot@confluent.io>
2021-06-09 15:09:31 -07:00
Matthias J. Sax b5f7ce8b7b
KAFKA-12815: Update JavaDocs of ValueTransformerWithKey (#10731)
Reviewers: Luke Chen <howuon@gmail.com>, John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-06-07 22:04:33 -07:00
A. Sophie Blee-Goldman 48379bd6e5
KAFKA-12648: Pt. 1 - Add NamedTopology to protocol and state directory structure (#10609)
This PR includes adding the NamedTopology to the Subscription/AssignmentInfo, and to the StateDirectory so it can place NamedTopology tasks within the hierarchical structure with task directories under the NamedTopology parent dir.

Reviewers: Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-06-07 15:38:12 -07:00
Guozhang Wang b2d463aa12
KAFKA-8897 Follow-up: Consolidate the global state stores (#10646)
1. When register state stores, add the store to globalStateStores before calling any blocking calls that may throw errors, so that upon closing we would close the stores as well.
2. Remove the other list as a local field, and call topology.globalStateStores when needed to get the list.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <cadonna@apache.org>
2021-06-04 08:30:14 -07:00
Viswanathan Ranganathan 93dca8ebd9
KAFKA-12749: Changelog topic config on suppressed KTable lost (#10664)
Refactored logConfig to be passed appropriately when using shutDownWhenFull or emitEarlyWhenFull. Removed the constructor that doesn't accept a logConfig parameter so you're forced to specify it explicitly, whether it's empty/unspecified or not.

Co-authored-by: Bruno Cadonna <cadonna@apache.org>

Reviewers: Walker Carlson <wcarlson@confluent.io>, Bruno Cadonna <cadonna@apache.org>
2021-06-03 20:00:19 +02:00
Vito Jeng c2c08b41f2
MINOR: Apply try-with-resource to KafkaStreamsTest (#10668)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Matthias J. Sax <matthias@confluent.io>
2021-06-02 21:13:09 -07:00
Bruno Cadonna cfe642edee
KAFKA-12519: Remove built-in Streams metrics for versions 0.10.0-2.4 (#10765)
As specified in KIP-743, this PR removes the built-in metrics
in Streams that are superseded by the refactoring proposed in KIP-444.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Luke Chen <showuon@gmail.com>
2021-06-01 14:05:08 +02:00
John Roesler f207bac20c
KAFKA-8410: KTableProcessor migration groundwork (#10744)
* Lay the groundwork for migrating KTable Processors to the new PAPI.
* Migrate the KTableFilter processor to prove that the groundwork works.

This is an effort to help break up #10507 into multiple PRs.

Reviewers: Boyang Chen <boyang@apache.org>
2021-05-28 14:59:35 -05:00
Ismael Juma e4b3a3cdeb
MINOR: Adjust parameter ordering of `waitForCondition` and `retryOnExceptionWithTimeout` (#10759)
New parameters in overloaded methods should appear later apart from
lambdas that should always be last.
2021-05-27 06:25:00 -07:00
A. Sophie Blee-Goldman bbe170af70
MINOR: deprecate TaskMetadata constructor and add KIP-740 notes to upgrade guide (#10755)
Quick followup to KIP-740 to actually deprecate this constructor, and update the upgrade guide with what we changed in KIP-740. I also noticed the TaskId#parse method had been modified previously, and should be re-added to the public TaskId class. It had no tests, so now it does

Reviewers: Matthias J. Sax <mjsax@confluent.io>, Luke Chen <showuon@gmail.com>
2021-05-26 10:35:12 -07:00
Matthias J. Sax 77573d88c8
MINOR: add window verification to sliding-window co-group test (#10745)
Reviewers: Luke Chen <showuon@gmail.com>, A. Sophie Blee-Goldman <sophie@confluent.io>
2021-05-25 23:43:33 -07:00
Luke Chen efb7cda178
MINOR: update java doc for deprecated methods (#10722)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-05-23 18:33:01 -07:00
Ismael Juma 47796d2f87
MINOR: Fix deprecation warnings in SlidingWindowedCogroupedKStreamImplTest (#10703)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-05-22 14:22:42 -07:00
Boyang Chen ae8b784537
KAFKA-12499: add transaction timeout verification (#10482)
This PR tries to add the check for transaction timeout for a comparison against commit interval of streams. If transaction timeout is smaller than commit interval, stream should crash and inform user to update their commit interval to be larger or equal to the given transaction timeout, or vise versa.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, A. Sophie Blee-Goldman <sophie@confluent.io>
2021-05-21 15:05:39 -07:00
Josep Prat b46e17b1d7
KAFKA-12808: Remove Deprecated Methods under StreamsMetrics (#10724)
Removal of methods already deprecated since 2.5.
Adapt test to use the new alternative method.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2021-05-21 12:37:35 +02:00
A. Sophie Blee-Goldman b56d9e4416
KAFKA-12779: KIP-740, Clean up public API in TaskId and fix TaskMetadata#taskId() (#10735)
As described in KIP-740, we clean up the public TaskId class and introduce new APIs to return it from TaskMetadata

Reviewers: Guozhang Wang <guozhang@confluent.io>
2021-05-20 15:01:23 -07:00
Josep Prat 26b5352260
KAFKA-12814: Remove Deprecated Method StreamsConfig getConsumerConfigs (#10737)
Removes method deprecated via KIP-276.

Reviewer: Matthias J. Sax <matthias@confluent.io>
2021-05-20 14:32:51 -07:00
Josep Prat e23ede1ece
KAFKA-12809: Remove deprecated methods of Stores factory (#10729)
Removes methods deprecated via KIP-319 and KIP-358.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-05-19 16:07:35 -07:00
Josep Prat 0af37730fc
KAFKA-12813: Remove deprecated schedule method in ProcessorContext (#10730)
Removes methods deprecated via KIP-358.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-05-19 16:03:01 -07:00
Matthias J. Sax 476eccb968
KAFKA-12815: Preserve context for KTable.transformValues when getting value from upstream state store (#10720)
Reviewers: Victoria Xia <victoria.xia@confluent.io>, John Roesler <john@confluent.io>
2021-05-19 14:58:46 -07:00
Josep Prat b58da356be
KAFKA-12810: Remove deprecated TopologyDescription.Source#topics (#10727)
Removes methods that were deprecated via KIP-321.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-05-19 11:49:26 -07:00
Luke Chen e11f249327
KAFKA-9295: increase session timeout to fix flaky KTableKTableForeignKeyInnerJoinMultiIntegrationTest (#10715)
Increase session timeout to fix flaky KTableKTableForeignKeyInnerJoinMultiIntegrationTest.shouldInnerJoinMultiPartitionQueryable

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-18 23:00:43 -07:00
A. Sophie Blee-Goldman 3a42baa260
HOTFIX: undo renaming of public part of Subtopology API (#10713)
In #10676 we renamed the internal Subtopology class that implemented the TopologyDescription.Subtopology interface. By mistake, we also renamed the interface itself, which is a public API. This wasn't really the intended point of that PR, so rather than do a retroactive KIP, let's just reverse the renaming.

Reviewers: Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-05-18 12:59:20 -07:00
A. Sophie Blee-Goldman cc6f4c49a9
KAFKA-12574: remove internal Producer config and auto downgrade logic (#10675)
Minor followup to #10573. Removes this internal Producer config which was only ever used to avoid a very minor amount of work to downgrade the consumer group metadata in the txn commit request in Kafka Streams

Reviewers: Ismael Juma <ismael@juma.me.uk>, Matthias J. Sax <mjsax@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-05-17 10:25:35 -07:00
vamossagar12 b9acc492a5
KAFKA-12313: KIP-725: Streamlining configs for Windowed Deserialisers (#10542)
This PR aims to streamline the configurations for WindowedDeserialisers as described in KIP-725. It deprecates default.windowed.key.serde.inner and default.windowed.value.serde.inner configs in StreamConfig and adds windowed.inner.class.serde. 

Reviewers: Anna Sophie Blee-Goldman<ableegoldman@apache.org>
2021-05-17 10:17:31 -07:00
Walker Carlson f2785f3c4f
KAFKA-12754: Improve endOffsets for TaskMetadata (#10634)
Improve endOffsets for TaskMetadata by updating immediately after polling a new batch

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-14 12:17:31 -07:00
A. Sophie Blee-Goldman 4153e754f1
MINOR: prevent cleanup() from being called while Streams is still shutting down (#10666)
Currently KafkaStreams#cleanUp only throw an IllegalStateException if the state is RUNNING or REBALANCING, however the application could be in the process of shutting down in which case StreamThreads may still be running. We should also throw if the state is PENDING_ERROR or PENDING_SHUTDOWN

Reviewers: Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-05-13 16:16:35 -07:00
Daniyar Yeralin 6d1ae8bc00
KAFKA-8326: Introduce List Serde (#6592)
Introduce List serde for primitive types or custom serdes with a serializer and a deserializer according to KIP-466

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Matthias J. Sax <mjsax@conflunet.io>, John Roesler <roesler@confluent.io>, Michael Noll <michael@confluent.io>
2021-05-13 15:54:00 -07:00
A. Sophie Blee-Goldman 4b2736570c
KAFKA-12648: MINOR - Add TopologyMetadata.Subtopology class for subtopology metadata (#10676)
Introduce a Subtopology class to wrap the topicGroupId and namedTopology metadata.

Reviewers: Walker Carlson <wcarlson@confluent.io>
2021-05-13 11:25:18 -07:00
Vito Jeng fae0784ce3
KAFKA-5876: KIP-216 Part 4, Apply InvalidStateStorePartitionException for Interactive Queries (#10657)
KIP-216, part 4 - apply InvalidStateStorePartitionException

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-10 17:29:58 -07:00
Guozhang Wang 25f4ee879c
KAFKA-12747: Fix flakiness in shouldReturnUUIDsWithStringPrefix (#10643)
Consecutive UUID generation could result in same prefix.

Reviewers: Josep Prat <josep.prat@aiven.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-10 12:32:51 -07:00
Jorge Esteban Quilcate Otoya 8f8f914efc
KAFKA-12536: Add Instant-based methods to ReadOnlySessionStore (#10390)
Implements: KIP-666 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-666%3A+Add+Instant-based+methods+to+ReadOnlySessionStore)

Reviewers: John Roesler <vvcephei@apache.org>
2021-05-07 13:24:41 -05:00
Sergio Peña 45d7440c15
KAFKA-10847: Set StreamsConfig on InternalTopologyDriver before writing topology (#10640)
Reviewers: Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2021-05-06 17:27:23 -07:00
Bruno Cadonna 90fc875e24
KAFKA-8897: Upgrade RocksDB to 6.19.3 (#10568)
This PR upgrades RocksDB to 6.19.3. After the upgrade the Gradle build exited with code 134 due to SIGABRT signals ("Pure virtual function called!") coming from the C++ part of RocksDB. This error was caused by RocksDB state stores not properly closed in Streams' code. This PR adds the missing closings and updates the RocksDB option adapter.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2021-05-06 15:29:26 -07:00
Jorge Esteban Quilcate Otoya 12a1e68aeb
KAFKA-12451: Remove deprecation annotation on long-based read operations in WindowStore (#10296)
Complete https://cwiki.apache.org/confluence/display/KAFKA/KIP-667%3A+Remove+deprecated+methods+from+ReadOnlyWindowStore by removing deprecation annotation on long-based read operations in WindowStore.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-05-06 11:25:23 -07:00
Bruno Cadonna 94be57d610
MINOR: Fix formatting in RelationalSmokeTest (#10639)
Fixes formatting in RelationalSmokeTest.

Reviewers: Leah Thomas <lthomas@confluent.io>
2021-05-06 17:25:02 +02:00
leah 03690d7a1f
MINOR: Stop using hamcrest in system tests (#10631)
We currently use hamcrest imports to check the outputs of the RelationalSmokeTest, but with the new gradle updates, the proper hamcrest imports are no longer included in the test jar.

This is a bit of a workaround to remove the hamcrest usage so we can get system tests up and running again. Potential follow-up could be to update the way we create the test-jar to pull in the proper dependencies.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2021-05-06 12:05:36 +02:00
Luke Chen 80aea23beb
KAFKA-9295: increase startup timeout for flaky test in KTableKTableForeignKeyInnerJoinMultiIntegrationTest (#10635)
Try to address the extreme flakiness of shouldInnerJoinMultiPartitionQueryable since the recent test cleanup. Since we need to wait for 3 streams reach RUNNING state, it makes sense to increase the waiting time to make the test more reliable.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-05 21:58:27 -07:00
Matthias J. Sax 6a5992a814
KAFKA-8531: Change default replication factor config (#10532)
Implements KIP-733

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
2021-05-05 16:16:06 -07:00
Sergio Peña d915ce58d2
KAFKA-10847: Set shared outer store to an in-memory store when in-memory stores are supplied (#10613)
When users supply in-memory stores for left/outer joins, then the internal shared outer store must be switch to in-memory store too. This will allow users who want to keep all stores in memory to continue doing so.

Added unit tests to validate topology and left/outer joins work fine with an in-memory shared store.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-05-05 10:21:43 -07:00
vamossagar12 9a71468cb0
KAFKA-10767: Adding test cases for all, reverseAll and reverseRange for ThreadCache (#9779)
The test cases for ThreaCache didn't have the corresponding unit tests for all, reverseAll and reverseRange methods. This PR aims to add the same.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2021-05-05 12:26:51 +02:00
Jorge Esteban Quilcate Otoya 45f24c4195
KAFKA-12450: Remove deprecated methods from ReadOnlyWindowStore (#10294)
Implement first part of https://cwiki.apache.org/confluence/display/KAFKA/KIP-667%3A+Remove+deprecated+methods+from+ReadOnlyWindowStore.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-05-04 09:23:30 -07:00
Sergio Peña 62221edaff
KAFKA-10847: Add internal flag to disable KAFKA-10847 fix (#10612)
Adds an internal flag that can be used to disable the fixes in KAFKA-10847. It defaults to true if the flag is not set or has an invalid boolean value.

The flag is named __enable.kstreams.outer.join.spurious.results.fix__. This flag is considered internal only. It is a temporary flag that will be used to help users to disable the join fixes while they do a transition from the previous semantics of left/outer joins. The flag may be removed in future releases.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-05-03 14:10:05 -07:00
Vito Jeng 816f5c3b86
KAFKA-5876: KIP-216 Part 3, Apply StreamsNotStartedException for Interactive Queries (#10597)
KIP-216 Part 3: Throw StreamsNotStartedException if KafkaStreams state is CREATED

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-03 13:53:35 -07:00
Guozhang Wang bee3cf7d98
MINOR: Remove unused Utils.delete (#10622)
Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-01 08:32:14 -07:00
Guozhang Wang 3ec6317ee6
KAFKA-12683: Remove deprecated UsePreviousTimeOnInvalidTimestamp (#10557)
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-01 08:31:41 -07:00
A. Sophie Blee-Goldman 16b2ce7da7
KAFKA-12648: basic skeleton API for NamedTopology (#10615)
Just the API for NamedTopology.

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2021-04-30 22:46:00 -07:00
Valery Kokorev e454becb33
KAFKA-12396: added null check for state stores key (#10548)
Reviewers: Bruno Cadonna <bruno@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2021-04-29 19:47:26 -07:00
A. Sophie Blee-Goldman 9dbf2226cd
MINOR: clean up some remaining locking stuff in StateDirectory (#10608)
Minor followup to #10342 that I noticed while working on the NamedTopology stuff. Cleans up a few things:

We no longer need locking for the global state directory either, since it's contained within the top-level state directory lock. Definitely less critical than the task directory locking, since it's less vulnerable to IOExceptions given that it's just locked and unlocked once during the application lifetime, but nice to have nonetheless
Clears out misc. usages of the LOCK_FILE_NAME that no longer apply. This has the awesome side effect of finally being able to actually delete obsolete task directories, whereas previously we had to leave behind the empty directory due to a ridiculous Windows bug (though I'm sure they would claim "it's not a bug it's a feature" 😉 )
Lazily delete old-and-now-unused lock files in the StateDirectory#taskDirIsEmpty method to clean up the state directory for applications that upgraded from an older version that still used task locking

Reviewers: Walker Carlson <wcarlson@confluent.io>
2021-04-29 12:30:48 -07:00
Sergio Peña bf359f8e29
KAFKA-10847: Fix spurious results on left/outer stream-stream joins (#10462)
Fixes the issue with https://issues.apache.org/jira/browse/KAFKA-10847.

To fix the above problem, the left/outer stream-stream join processor uses a buffer to hold non-joined records for some time until the window closes, so they are not processed if a join is found during the join window time. If the window of a record closes and a join was not found, then this should be emitted and processed by the consequent topology processor.

A new time-ordered window store is used to temporary hold records that do not have a join and keep the records keys ordered by time. The KStreamStreamJoin has a reference to this new store . For every non-joined record seen, the processor writes it to this new state store without processing it. When a joined record is seen, the processor deletes the joined record from the new state store to prevent further processing.

Records that were never joined at the end of the window + grace period are emitted to the next topology processor. I use the stream time to check for the expiry time for determinism results . The KStreamStreamJoin checks for expired records and emit them every time a new record is processed in the join processor.

The new state store is shared with the left and right join nodes. The new store needs to serialize the record keys using a combined key of <joinSide-recordKey>. This key combination helps to delete the records from the other join if a joined record is found. Two new serdes are created for this, KeyAndJoinSideSerde which serializes a boolean value that specifies the side where the key is found, and ValueOrOtherValueSerde that serializes either V1 or V2 based on where the key was found.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2021-04-28 17:57:28 -07:00