Rewrites how continuous data frame transforms calculates and handles buckets that require an update. Instead of storing the whole set in memory, it pages through the updates using a 2nd cursor. This lowers memory consumption and prevents problems with limits at query time (max_terms_count). The list of updates can be re-retrieved in a failure case (#43662)
* With the removal of the incompatible snapshots list in RepositoryData
the get snapshots and get all snapshots methods are equivalent so I
removed one of them
* The assertion added in #44214 is tripped by tests running dedicated
test clusters per test needlessly.This breaks existing tests like the one in #44245.
* Closes#44245
This change makes the process of verifying the signature of
official plugins FIPS 140 compliant by defaulting to use the
BouncyCastle FIPS provider and adding a dependency to bcpg-fips
that implement parts of openPGP in a FIPS compliant manner.
In already FIPS 140 enabled environments that use the
BouncyCastle FIPS provider, the bcfips dependency is redundant
but doesn't cause an issue as it will be added only in the classpath
of the cli-tools
* Fix port range allocation with large worker IDs
Relates to #43983
The IDs gradle uses are incremented for the lifetime of the daemon which
can result in port ranges that are outside the valid range.
This change implements a modulo based formula to wrap the port ranges
when the IDs get too large.
Adresses #44134 but #44157 is also required to be able to close it.
Today if a master-eligible node is converted to a master-ineligible node it may
remain in the voting configuration, meaning that the master node may count its
publish responses as an indication that it has properly persisted the cluster
state. However master-ineligible nodes do not properly persist the cluster
state, so it is not safe to count these votes.
This change adjusts `CoordinationState` to take account of this from a safety
point of view, and also adjusts the `Coordinator` to prevent such nodes from
joining the cluster. Instead, it triggers a reconfiguration to remove from the
voting configuration a node that now appears to be master-ineligible before
processing its join.
* Remove Redundant Setting of OP_WRITE Interest
* We shouldn't have to set OP_WRITE interest before running into a partial write. Since setting OP_WRITE is handled by the `eventHandler.postHandling` logic, I think we can simply remove this operation and simplify/remove tests that were testing the setting of the write interest
There are currently 3 variants of TransportAction.execute. The
implementations of these require additional ctor arguments to all
TransportAction implementations. While the non test uses can be
converted to using NodeClient to execute other actions, using that for
test cases would be cumbersome and defeat the purpose of unit tests
testing an action's implementation directly. This commit adds a public
test-only utility method for test to use to call execute. This method
will continue to be available when the execute implementations are
collapsed and made package private.
relates #43881
Fixes a bug in the PKI authentication. This manifests when there
are multiple PKI realms configured in the chain, with different
principal parse patterns. There are a few configuration scenarios
where one PKI realm might parse the principal from the Subject
DN (according to the `username_pattern` realm setting) but
another one might do the truststore validation (according to
the truststore.* realm settings).
This is caused by the two passes through the realm chain, first to
build the authentication token and secondly to authenticate it, and
that the X509AuthenticationToken sets the principal during
construction.
* Add l1norm and l2norm distances for vectors
Add L1norm - Manhattan distance
Add L2norm - Euclidean distance
relates to #37947
* Address Christoph's feedback
- organize vector functions as a separate doc
- increase precision in tests calculations
- add a separate test when sparse doc dims
are bigger and less than query vector dims
* Made examples more realistic
For this test, we randomize the CLUSTER_MAX_CLAUSE_COUNT on test setup
(@BeforeClass) between 50 and 100. Some queries in the test generate 56 clauses
which hasn't been an issue before LUCENE-8811, but we slightly need to increase
the minimal possible clause count now.
Closes#44192
This will help in investigations where the real memory circuit breaker is tripped to better understand
on what the actual memory is used, i.e. whether it's a temporary thing (e.g. requests) in contrast to
more permanently allocated memory (e.g. accounting).
* Safer Shard Snapshot Delete
* We shouldn't delete the snapshot meta file before we update the index
in the shard folder. If we fail to update the index-N after deleting the
existing index-N is broken because the snap- blob it references is gone.
* Fix ShrinkIndexIT
* Move this test suit to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node which randomly turns out to be the only shared master node so the cluster reset fails on account of the fact that no shared master node survived.
* Closes#44164
With connection management now being non-blocking, we can make NodeConnectionsService
avoid the use of MANAGEMENT threads that are blocked during the connection attempts.
I had to fiddle a bit with the tests as testPeriodicReconnection was using both the mock Threadpool
from the DeterministicTaskQueue as well as the real ThreadPool initialized at the test class level,
which resulted in races.
* Improoce how log is tailed in testclusters on failure
- only print last few lines
- print all errors and warnings
- compact repeating errors and warnings
Now that ML job configs are stored in an index rather than
cluster state, availability of the .ml-config index is very
important to the operation of ML. When a cluster starts up
the ML persistent tasks will be considered for node
assignment very early on. It is best in this case if
assignment is deferred until after the .ml-config index is
available.
The introduction of data frame analytics jobs has made this
problem worse, because anomaly detection jobs already waited
for the primary shards of the .ml-state, .ml-anomalies-shared
and .ml-meta indices to be available before doing node
assignment, and by coincidence this would probably lead to
the primary shards of .ml-config also being searchable. But
data frame analytics jobs had no other index checks prior to
this change.
This fixes problem 2 of #44156
Simplifies AbstractSimpleTransportTestCase to use JVM-local ports and also adds an assertion so
that cases like #44134 can be more easily debugged. The likely reason for that one is that a test,
which was repeated again and again while always spawning a fresh Gradle worker (due to Gradle
daemon) kept increasing Gradle worker IDs, causing an overflow at some point.
Two new settings were introduced in #43669 (bb130f5) to control the
behaviour of the Document Level Security BitSet cache.
This change adds documentation for these 2 settings.