Using a static `diff` or epsilon just doesn't work for this test, as the
scores can be very large in absolute terms while still being relatively close.
Maybe there is a simpler way, but my mind wasn't wanting to "math" very
much.
For example, the seed that this previously failed on had scores like
`1.726524E9` and `1.7265239E9`, which, given their magnitude, are really
close together (within 128). A static epsilon sized for small scores would
reject that pair, and one sized for scores this large would be far too loose
for small ones.
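
To illustrate the idea (this is only a sketch, not necessarily what this change does), one way to compare at a tolerance that scales with magnitude is to measure the difference in ULPs. The class and method names below are hypothetical, and it assumes the scores are Java `float`s, in which case the two values above differ by exactly one ULP (128 at this magnitude):

```java
public class ScoreCloseness {

    // Hypothetical helper: true when a and b differ by at most maxUlps
    // "units in the last place", so the tolerance grows with the magnitude
    // of the values instead of being a fixed epsilon.
    public static boolean closeInUlps(float a, float b, int maxUlps) {
        if (Float.isNaN(a) || Float.isNaN(b)) {
            return false;
        }
        if (Float.isInfinite(a) || Float.isInfinite(b)) {
            return a == b;
        }
        // Math.ulp(float) is the spacing between adjacent floats at that magnitude.
        float tolerance = maxUlps * Math.max(Math.ulp(a), Math.ulp(b));
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        float a = 1.726524E9f;
        float b = 1.7265239E9f;
        System.out.println(Math.abs(a - b));          // 128.0
        System.out.println(Math.ulp(a));              // 128.0
        System.out.println(closeInUlps(a, b, 1));     // true
        System.out.println(Math.abs(a - b) <= 1e-3f); // false: a static epsilon rejects the pair
    }
}
```

A relative tolerance (for example `Math.abs(a - b) <= rel * Math.max(Math.abs(a), Math.abs(b))`) would be another option with the same scaling property; the ULP form just makes the "within 128" observation explicit.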
closes: https://github.com/elastic/elasticsearch/issues/128485