A vector similarity search in Hippo calculates the distance between the query vectors and the vectors in the table, and returns the most similar results. By specifying a scalar filter condition, users can perform a hybrid search.
curl -u shiva:shiva -XGET 'localhost:8902/hippo/v1/{table}/_search?pretty' -H 'Content-Type: application/json' -d'{
"output_fields": ["book_id"],
"search_params": {
"anns_field": "book_intro",
"topk": 2,
"params": {
"nprobe": 10
},
"embedding_index": "ivf_flat_index"
},
"vectors": [ [0.1,0.2], [0.3, 0.4] ],
"round_decimal": 2,
"only_explain" : false
}';
Result:
{
"num_queries" : 2,
"top_k" : 2,
"results" : [
{
"query" : 0,
"fields_data" : [
{
"field_name" : "book_id",
"field_values" : [1,2]
}
],
"scores" : [1.45,4.25]
},
{
"query" : 1,
"fields_data" : [
{
"field_name" : "book_id",
"field_values" : [1,2]
}
],
"scores" : [0.85,3.25]
}
]
}
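Each entry in "results" holds parallel arrays: one "field_values" array per output field, plus "scores". The sketch below zips those arrays into one row per hit. It assumes the response has the shape shown above; the helper name `rows_per_query` is hypothetical and not part of Hippo.

```python
import json

# Sample response copied from the example above.
RESPONSE = json.loads("""
{
  "num_queries": 2,
  "top_k": 2,
  "results": [
    {"query": 0,
     "fields_data": [{"field_name": "book_id", "field_values": [1, 2]}],
     "scores": [1.45, 4.25]},
    {"query": 1,
     "fields_data": [{"field_name": "book_id", "field_values": [1, 2]}],
     "scores": [0.85, 3.25]}
  ]
}
""")


def rows_per_query(response):
    """Hypothetical helper: map each query index to a list of hit dicts."""
    out = {}
    for result in response["results"]:
        names = [f["field_name"] for f in result["fields_data"]]
        columns = [f["field_values"] for f in result["fields_data"]]
        hits = []
        # zip(*columns) walks the parallel field arrays one hit at a time.
        for values, score in zip(zip(*columns), result["scores"]):
            hit = dict(zip(names, values))
            hit["score"] = score
            hits.append(hit)
        out[result["query"]] = hits
    return out


rows = rows_per_query(RESPONSE)
for query, hits in rows.items():
    print(query, hits)
```

Because the arrays are positional, the first hit of each query is always the most similar one.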
Parameter description:
Parameters | Description | Required |
---|---|---|
table | Table name, such as "book" created in this example | Yes |
database_name | Database where the table is located | No, defaults to "default" database |
output_fields | Names of the fields to return | Yes |
anns_field | Name of the field to search on | Yes |
topk | The number of records to return | Yes |
params | Search parameter specific to vector index | No |
embedding_index | Vector index used in current search | No, uses the first activated vector index built on the field specified in "anns_field" by default |
vectors | Query vector | Yes |
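As a rough illustration of the table above, the following Python sketch assembles a request body and rejects calls that omit a required parameter. The body layout mirrors the curl example. The function name `build_search_body` and the check that "topk" is a positive integer are assumptions made for this sketch, not documented Hippo behavior.

```python
import json


def build_search_body(output_fields, anns_field, topk, vectors,
                      params=None, embedding_index=None):
    """Hypothetical helper: assemble a _search request body as a dict."""
    # Required parameters, per the table above.
    if not output_fields:
        raise ValueError("output_fields is required")
    if not anns_field:
        raise ValueError("anns_field is required")
    if not isinstance(topk, int) or topk < 1:
        # Assumption: topk must be a positive integer.
        raise ValueError("topk must be a positive integer")
    if not vectors:
        raise ValueError("vectors is required")

    search_params = {"anns_field": anns_field, "topk": topk}
    # Optional parameters are only sent when the caller sets them.
    if params is not None:
        search_params["params"] = params
    if embedding_index is not None:
        search_params["embedding_index"] = embedding_index

    return {
        "output_fields": list(output_fields),
        "search_params": search_params,
        "vectors": vectors,
    }


body = build_search_body(
    ["book_id"], "book_intro", 2, [[0.1, 0.2], [0.3, 0.4]],
    params={"nprobe": 10}, embedding_index="ivf_flat_index",
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with `-d`, as in the example request.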
Note:
- Hippo returns results ordered by similarity, with the most similar result first.
- The meaning of the returned scores depends on the metric type used when the index was built:
  - For L2, scores represent the distance between vectors. The shorter the distance, the more similar the vectors, so scores are returned in ascending order.
  - For Cosine or IP, scores represent similarity, so scores are returned in descending order.
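The ordering rule can be checked on the client side. The sketch below is a minimal Python example; the helper `is_best_first` and the metric labels "L2", "COSINE" and "IP" passed to it are hypothetical names chosen for illustration.

```python
def is_best_first(scores, metric):
    """Return True if scores are ordered most-similar-first for the metric."""
    pairs = list(zip(scores, scores[1:]))
    metric = metric.upper()
    if metric == "L2":
        # Distance: smaller is more similar, so expect ascending order.
        return all(a <= b for a, b in pairs)
    if metric in ("COSINE", "IP"):
        # Similarity: larger is more similar, so expect descending order.
        return all(a >= b for a, b in pairs)
    raise ValueError("unknown metric: " + metric)


print(is_best_first([1.45, 4.25], "L2"))
print(is_best_first([1.45, 4.25], "IP"))
```

Under this rule, ascending scores are consistent with an L2 index and inconsistent with a Cosine or IP index.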
When "only_explain" is set to true, it is recommended to pipe the output through sed for formatting, as in the example below:
curl -u shiva:shiva -XGET 'localhost:8902/hippo/v1/{table}/_search?pretty' -H 'Content-Type: application/json' -d'{
"output_fields": ["book_id"],
"search_params": {
"anns_field": "book_intro",
"topk": 2,
"params": {
"nprobe": 10
},
"embedding_index": "ivf_flat_index"
},
"vectors": [ [0.1,0.2], [0.3, 0.4] ],
"round_decimal": 2,
"only_explain" : true
}'|sed 's/\\n/\ /g';
The output example when using explain is:
{
"num_queries" : 2,
"top_k" : 2,
"explanations" : [
{
"tablet" : "default#byte_table#4e3caf34c22e41349b6ad8a846781ce0@0",
"explanation" : "client schema : ( book_id INT64 NOT NULL, PRIMARY KEY () ) embedding scan path : ( vector column book_intro embedding index 0 ) execution plan type : naive plan current engine meta : current schema 0, ( 0:book_id INT64 NOT NULL, 1:word_count INT64 NULLABLE, 2:book_intro BINARY_VECTOR(8) NULLABLE, 3:__schemaversion__ INT64 NULLABLE virtual, 4:__rowid__ INT64 NULLABLE virtual, PRIMARY KEY (book_id) ) current index number 1 PUBLIC index -1 : (book_id INT64 NOT NULL) "
}
]
}
Below is an output example with "with_profile" enabled. With "only_explain" and "with_profile", users can conduct a detailed analysis of a specified query.
{
"num_queries" : 2,
"top_k" : 2,
"results" : [
{
"query" : 0,
"fields_data" : [
{
"field_name" : "book_id",
"field_values" : [
1,
2
]
}
],
"scores" : [
0,
1
]
},
{
"query" : 1,
"fields_data" : [
{
"field_name" : "book_id",
"field_values" : [
40,
36
]
}
],
"scores" : [
1,
2
]
}
],
"profiles" : [
{
"tablet" : 0,
"ann search" : {
"cost" : "3 Milliseconds",
"optimize" : {
"cost" : "0 Milliseconds"
},
"init iterators" : {
"cost" : "0 Milliseconds"
},
"naive embedding search" : {
"cost" : "3 Milliseconds",
"ann index search" : {
"cost" : "3 Milliseconds"
},
"batch get data from db" : {
"cost" : "0 Milliseconds"
},
"decode, eval and fill matched results" : {
"cost" : "0 Milliseconds"
}
}
}
}
]
}
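The profile is a nested tree in which each stage carries a "cost" string. To make the slowest stage easier to spot, the Python sketch below flattens that tree into (path, milliseconds) rows. It assumes every cost has the form "<integer> Milliseconds", as in the sample above; `parse_ms` and `flatten_costs` are hypothetical helper names.

```python
# One entry of "profiles", copied from the sample output above.
PROFILE = {
    "tablet": 0,
    "ann search": {
        "cost": "3 Milliseconds",
        "optimize": {"cost": "0 Milliseconds"},
        "init iterators": {"cost": "0 Milliseconds"},
        "naive embedding search": {
            "cost": "3 Milliseconds",
            "ann index search": {"cost": "3 Milliseconds"},
            "batch get data from db": {"cost": "0 Milliseconds"},
            "decode, eval and fill matched results": {"cost": "0 Milliseconds"},
        },
    },
}


def parse_ms(text):
    """Parse a cost string such as "3 Milliseconds" into an int."""
    value, unit = text.split()
    if unit != "Milliseconds":
        # Assumption: only the unit seen in the sample is handled.
        raise ValueError("unexpected unit: " + unit)
    return int(value)


def flatten_costs(node, path=()):
    """Return (path, milliseconds) for every stage that has a "cost"."""
    rows = []
    if "cost" in node:
        rows.append((" > ".join(path), parse_ms(node["cost"])))
    for key, child in node.items():
        if isinstance(child, dict):
            rows.extend(flatten_costs(child, path + (key,)))
    return rows


rows = flatten_costs(PROFILE)
for stage, ms in rows:
    print(ms, "ms", stage)
```

A parent's cost includes the cost of its child stages, so the rows should be compared within one level rather than summed.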