Late-interaction models are powerful rerankers. While their size and
overall cost do not lend themselves to HNSW indexing, using them as a
second-phase "brute-force" reranker can provide excellent boosts in
relevance, generally at lower inference times than large cross-encoders.
This commit exposes a new experimental `rank_vectors` field that allows
for maxSim operations. This unlocks the initial, and most common, use of
late-interaction dense models.
For example, this is how you would use it via the API:
```
PUT index
{
  "mappings": {
    "properties": {
      "late_interaction_vectors": {
        "type": "rank_vectors"
      }
    }
  }
}
```
Then to index:
```
POST index/_doc
{
  "late_interaction_vectors": [[0.1, ...],...]
}
```
For querying, scoring can be exposed with scripting:
```
POST index/_search
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "maxSimDotProduct(params.query_vector, 'late_interaction_vectors')",
        "params": {
          "query_vector": [[0.42, ...], ...]
        }
      }
    }
  }
}
```
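For intuition, here is a minimal Python sketch of what a maxSim dot-product
score computes, assuming the usual late-interaction definition: for each
query vector, take the best dot product against any document vector, then
sum those maxima. The helper names (`dot`, `max_sim_dot_product`) are
illustrative only and are not part of this change.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product; both vectors must have the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


def max_sim_dot_product(
    query_vectors: Sequence[Vector], doc_vectors: Sequence[Vector]
) -> float:
    """Sum, over query vectors, of the max dot product with any doc vector.

    Hypothetical reference implementation for illustration; it assumes the
    standard maxSim definition rather than mirroring the actual script code.
    """
    if not doc_vectors:
        raise ValueError("document must contain at least one vector")
    return sum(
        max(dot(q, d) for d in doc_vectors)
        for q in query_vectors
    )


# Two query vectors scored against a document holding two vectors.
score = max_sim_dot_product(
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 0.0]],
)
```

Every query vector is compared against every document vector, which is why
this is best applied to a small candidate set rather than the whole index.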
Of course, the initial ranking should be done first. Either rescore or
combine scores via the `rescore` parameter, or simply pass whatever
first-phase retrieval you want as the inner query in `script_score`.
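As a rough sketch of the `rescore` variant, the Python helper below builds a
search request body that runs a first-phase query and then rescores the top
hits with the maxSim script. The helper name `build_rescore_request`, the
default `window_size` of 50, and the weight defaults are assumptions for
illustration; the `rescore` structure follows the standard search API shape.

```python
import json
from typing import Any, Dict, List


def build_rescore_request(
    first_phase_query: Dict[str, Any],
    query_vectors: List[List[float]],
    field: str = "late_interaction_vectors",
    window_size: int = 50,            # assumed value, tune per use case
    query_weight: float = 1.0,
    rescore_query_weight: float = 1.0,
) -> Dict[str, Any]:
    """Build a search body: first-phase query plus maxSim rescoring."""
    script_score = {
        "script_score": {
            # Score every document that reaches the rescore window.
            "query": {"match_all": {}},
            "script": {
                "source": f"maxSimDotProduct(params.query_vector, '{field}')",
                "params": {"query_vector": query_vectors},
            },
        }
    }
    return {
        "query": first_phase_query,
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": script_score,
                "query_weight": query_weight,
                "rescore_query_weight": rescore_query_weight,
            },
        },
    }


# Hypothetical first phase: a text match on an assumed "title" field.
body = build_rescore_request(
    {"match": {"title": "late interaction"}},
    [[0.42, 0.1], [0.3, 0.7]],
)
request_json = json.dumps(body, indent=2)
```

The resulting JSON could be sent as the body of `POST index/_search`. Setting
`query_weight` to 0 would rank the window by the maxSim score alone instead
of combining it with the first-phase score.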