elasticsearch/libs/native
Chris Hegarty b486d90ea2
Add low-level optimized Neon, AVX2, and AVX 512 float32 vector operations (#130635)
This commit adds low-level optimized Neon, AVX2, and AVX 512 float32 vector operations; cosine, dot product, and square distance.

The changes in this PR give approximately 2x performance increase for float32 vector operations across Linux/ Mac AArch64 and Linux x64 (both AVX2 and AVX 512).

The performance increase comes mostly from being able to score the vectors off-heap (rather than copying on-heap before scoring). The low-level native scorer implementations show only approx ~3-5% improvement over the existing Panama Vector implementation. However, the native scorers allow to score off-heap. The use of Panama Vector with MemorySegments runs into a performance bug in Hotspot, where the bound is not optimally hoisted out of the hot loop (has been reported and acknowledged by OpenJDK) .

This vector ops will be used by higher-level vector scorers in #130541
2025-07-04 16:49:35 +01:00
..
libraries
src
build.gradle