Vector Similarity Search

A vector similarity search in Hippo calculates the distance between the query vectors and the vectors in the table, and returns the most similar results. Adding a scalar filter condition to the search turns it into a hybrid search.

curl -u shiva:shiva -XGET 'localhost:8902/hippo/v1/{table}/_search?pretty' -H 'Content-Type: application/json' -d'{
  "output_fields": ["book_id"],
  "search_params": {
    "anns_field": "book_intro",
    "topk": 2,
    "params": {
      "nprobe": 10
    },
    "embedding_index": "ivf_flat_index"
  },
  "vectors": [ [0.1,0.2], [0.3, 0.4] ],
  "round_decimal": 2,
  "only_explain" : false
}';

Result:

{
  "num_queries" : 2,
  "top_k" : 2,
  "results" : [
    {
      "query" : 0,
      "fields_data" : [
        {
          "field_name" : "book_id",
          "field_values" : [1,2]
        }
      ],
      "scores" : [1.45,4.25]
    },
    {
      "query" : 1,
      "fields_data" : [
        {
          "field_name" : "book_id",
          "field_values" : [1,2]
        }
      ],
      "scores" : [0.85,3.25]
    }
  ]
}
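
The response is column-oriented: each query carries parallel lists of field values and scores. As a minimal sketch of how a client might pair them back up, the helper below (rows_per_query is a hypothetical name, not part of any Hippo client) zips each field's values with the scores. It assumes every "field_values" list has the same length as "scores", as in the result above.

```python
import json

# Response body copied from the result example above.
SAMPLE_RESPONSE = """
{
  "num_queries": 2,
  "top_k": 2,
  "results": [
    {"query": 0,
     "fields_data": [{"field_name": "book_id", "field_values": [1, 2]}],
     "scores": [1.45, 4.25]},
    {"query": 1,
     "fields_data": [{"field_name": "book_id", "field_values": [1, 2]}],
     "scores": [0.85, 3.25]}
  ]
}
"""


def rows_per_query(response):
    """Map each query index to a list of row dicts, one per hit."""
    out = {}
    for result in response["results"]:
        # Start one row per score, then attach each output field by position.
        rows = [{"score": score} for score in result["scores"]]
        for field in result["fields_data"]:
            for row, value in zip(rows, field["field_values"]):
                row[field["field_name"]] = value
        out[result["query"]] = rows
    return out


for query, rows in rows_per_query(json.loads(SAMPLE_RESPONSE)).items():
    print(query, rows)
```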

Parameter description:

| Parameter | Description | Required |
| --- | --- | --- |
| table | Table name, such as "book" created in this example | Yes |
| database_name | Database where the table is located | No, defaults to the "default" database |
| output_fields | Names of the fields to return | Yes |
| anns_field | Name of the field to search on | Yes |
| topk | The number of records to return | Yes |
| params | Search parameters specific to the vector index | No |
| embedding_index | Vector index used in the current search | No, defaults to the first activated vector index built on the field specified in "anns_field" |
| vectors | Query vectors | Yes |
Table 35 Vector Similarity Search (Python API)
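
The parameters above can be assembled into a request body programmatically. The following is a rough sketch only: build_search_body and missing_required are hypothetical helpers, and the nesting of "anns_field", "topk", "params" and "embedding_index" under "search_params" is taken from the curl example. "table" is part of the URL path, and "database_name" is left out because the example does not show where it is passed.

```python
# Required keys, following the parameter table.
REQUIRED_TOP_LEVEL = ("output_fields", "vectors")
REQUIRED_SEARCH_PARAMS = ("anns_field", "topk")


def build_search_body(output_fields, anns_field, topk, vectors,
                      params=None, embedding_index=None):
    """Build the JSON body for the _search endpoint; optional keys are omitted when unset."""
    search_params = {"anns_field": anns_field, "topk": topk}
    if params is not None:
        search_params["params"] = params
    if embedding_index is not None:
        search_params["embedding_index"] = embedding_index
    return {
        "output_fields": list(output_fields),
        "search_params": search_params,
        "vectors": [list(vector) for vector in vectors],
    }


def missing_required(body):
    """Return the required parameters that are absent or empty in a body."""
    missing = [key for key in REQUIRED_TOP_LEVEL if not body.get(key)]
    search_params = body.get("search_params", {})
    missing += ["search_params." + key
                for key in REQUIRED_SEARCH_PARAMS if not search_params.get(key)]
    return missing


body = build_search_body(
    output_fields=["book_id"],
    anns_field="book_intro",
    topk=2,
    vectors=[[0.1, 0.2], [0.3, 0.4]],
    params={"nprobe": 10},
    embedding_index="ivf_flat_index",
)
print(missing_required(body))
```

Checking for missing parameters on the client side is a convenience, not a substitute for the server's own validation.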

Note:

  1. Hippo returns the results ranked by similarity, with the most similar result first.
  2. The meaning of the returned scores, as shown in the example above, depends on the metric type used when the index was built:
    • For L2, scores represent the distance between vectors. The shorter the distance, the more similar the vectors, so scores are returned in ascending order.
    • For Cosine or IP, scores represent the similarity, so scores are returned in descending order.
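
The ordering rule in the note can be expressed as a small check. This is an illustrative sketch: the function names are hypothetical, and the metric strings "L2", "COSINE" and "IP" are assumed spellings that may differ from what Hippo actually expects.

```python
def lower_is_better(metric_type):
    """Return True when a smaller score means a closer match."""
    metric = metric_type.strip().upper()
    if metric == "L2":
        return True   # score is a distance
    if metric in ("COSINE", "IP"):
        return False  # score is a similarity
    raise ValueError("unsupported metric type: " + metric_type)


def is_ranked_best_first(scores, metric_type):
    """Check that scores are ordered with the most similar hit first."""
    pairs = list(zip(scores, scores[1:]))
    if lower_is_better(metric_type):
        return all(a <= b for a, b in pairs)
    return all(a >= b for a, b in pairs)


# Scores from the first query in the result example, read as L2 distances.
print(is_ranked_best_first([1.45, 4.25], "L2"))
```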

When "only_explain" is set to true, it is recommended to pipe the output through sed for formatting, as shown below:

curl -u shiva:shiva -XGET 'localhost:8902/hippo/v1/{table}/_search?pretty' -H 'Content-Type: application/json' -d'{
  "output_fields": ["book_id"],
  "search_params": {
    "anns_field": "book_intro",
    "topk": 2,
    "params": {
      "nprobe": 10
    },
    "embedding_index": "ivf_flat_index"
  },
  "vectors": [ [0.1,0.2], [0.3, 0.4] ],
  "round_decimal": 2,
  "only_explain" : true
}'|sed 's/\\n/\ /g';

The output example when using explain is:

{
  "num_queries" : 2,
  "top_k" : 2,
  "explanations" : [
    {
      "tablet" : "default#byte_table#4e3caf34c22e41349b6ad8a846781ce0@0",
      "explanation" : "client schema : (     book_id INT64 NOT NULL,     PRIMARY KEY () ) embedding scan path : (     vector column book_intro     embedding index 0 ) execution plan type : naive plan current engine meta : current schema 0, (     0:book_id INT64 NOT NULL,     1:word_count INT64 NULLABLE,     2:book_intro BINARY_VECTOR(8) NULLABLE,     3:__schemaversion__ INT64 NULLABLE virtual,     4:__rowid__ INT64 NULLABLE virtual,     PRIMARY KEY (book_id) ) current index number 1 PUBLIC index -1 : (book_id INT64 NOT NULL) "
    }
  ]
}
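
A client that parses the JSON does not need sed, because the JSON decoder turns the escaped \n sequences into real line breaks. The sketch below assumes the raw "explanation" string contains such escapes, as the sed command implies. The RAW fragment is a shortened, hypothetical response that follows the shape of the example above, and the function names are illustrative.

```python
import json

# Hypothetical raw response: same shape as the example, explanation shortened.
# The raw string keeps each \n as a JSON escape, the form sed would see.
RAW = r'''{
  "num_queries": 2,
  "top_k": 2,
  "explanations": [
    {"tablet": "default#byte_table#4e3caf34c22e41349b6ad8a846781ce0@0",
     "explanation": "client schema : (\n    book_id INT64 NOT NULL,\n    PRIMARY KEY ()\n)\nexecution plan type : naive plan\n"}
  ]
}'''


def explanations_by_tablet(response):
    """Map each tablet identifier to its decoded explanation text."""
    return {item["tablet"]: item["explanation"]
            for item in response["explanations"]}


def one_line(text):
    """Mirror the sed call: replace every line break with a single space."""
    return text.replace("\n", " ")


for tablet, text in explanations_by_tablet(json.loads(RAW)).items():
    print(tablet)
    print(text)  # multi-line plan, one clause per line
```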

Below is an output example using "with_profile". Together, "only_explain" and "with_profile" let users analyze a specific query in detail.

{
  "num_queries" : 2,
  "top_k" : 2,
  "results" : [
    {
      "query" : 0,
      "fields_data" : [
        {
          "field_name" : "book_id",
          "field_values" : [
            1,
            2
          ]
        }
      ],
      "scores" : [
        0,
        1
      ]
    },
    {
      "query" : 1,
      "fields_data" : [
        {
          "field_name" : "book_id",
          "field_values" : [
            40,
            36
          ]
        }
      ],
      "scores" : [
        1,
        2
      ]
    }
  ],
  "profiles" : [
    {
      "tablet" : 0,
      "ann search" : {
        "cost" : "3 Milliseconds",
        "optimize" : {
          "cost" : "0 Milliseconds"
        },
        "init iterators" : {
          "cost" : "0 Milliseconds"
        },
        "naive embedding search" : {
          "cost" : "3 Milliseconds",
          "ann index search" : {
            "cost" : "3 Milliseconds"
          },
          "batch get data from db" : {
            "cost" : "0 Milliseconds"
          },
          "decode, eval and fill matched results" : {
            "cost" : "0 Milliseconds"
          }
        }
      }
    }
  ]
}
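
To compare stages at a glance, the nested profile can be flattened into (stage path, milliseconds) pairs. The sketch below is one possible approach and not part of Hippo: flatten_costs and parse_ms are hypothetical helpers, and the parser only handles the "N Milliseconds" form shown in the example. Other units, if Hippo emits any, would raise an error.

```python
import re

# One entry of "profiles", copied from the example above.
PROFILE = {
    "tablet": 0,
    "ann search": {
        "cost": "3 Milliseconds",
        "optimize": {"cost": "0 Milliseconds"},
        "init iterators": {"cost": "0 Milliseconds"},
        "naive embedding search": {
            "cost": "3 Milliseconds",
            "ann index search": {"cost": "3 Milliseconds"},
            "batch get data from db": {"cost": "0 Milliseconds"},
            "decode, eval and fill matched results": {"cost": "0 Milliseconds"},
        },
    },
}


def parse_ms(cost):
    """Convert a string such as '3 Milliseconds' to a float."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?) Milliseconds", cost)
    if match is None:
        raise ValueError("unexpected cost format: " + repr(cost))
    return float(match.group(1))


def flatten_costs(node, path=()):
    """Walk the nested profile and collect (stage path, milliseconds) pairs."""
    rows = []
    for key, value in node.items():
        if key == "cost":
            rows.append((" > ".join(path), parse_ms(value)))
        elif isinstance(value, dict):
            rows.extend(flatten_costs(value, path + (key,)))
    return rows


for stage, ms in flatten_costs(PROFILE):
    print(stage, ms)
```

Each stage's cost appears to include the cost of its nested stages, so the flattened values should be read per level rather than summed.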