Vector Similarity Search

A vector similarity search in Hippo calculates the distance between the query vectors and the vectors stored in the table, and returns the most similar results. Adding a scalar filter condition to the search turns it into a hybrid search.

curl -u shiva:shiva -XGET 'localhost:8902/hippo/v1/{table}/_search?pretty' -H 'Content-Type: application/json' -d'{
  "output_fields": ["book_id"],
  "search_params": {
    "anns_field": "book_intro",
    "topk": 2,
    "params": {
      "nprobe": 10
    },
    "embedding_index": "ivf_flat_index"
  },
  "vectors": [ [0.1,0.2], [0.3, 0.4] ],
  "round_decimal": 2,
  "only_explain" : false
}';

Result:

{
  "num_queries" : 2,
  "top_k" : 2,
  "results" : [
    {
      "query" : 0,
      "fields_data" : [
        {
          "field_name" : "book_id",
          "field_values" : [1,2]
        }
      ],
      "scores" : [1.45,4.25]
    },
    {
      "query" : 1,
      "fields_data" : [
        {
          "field_name" : "book_id",
          "field_values" : [1,2]
        }
      ],
      "scores" : [0.85,3.25]
    }
  ]
}
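
The response keeps field values and scores in separate arrays. The following is a minimal Python sketch of pairing them up per query, assuming the i-th entry of "field_values" corresponds to the i-th entry of "scores" (the example suggests this, but it is an assumption). The helper name `pair_hits` is hypothetical and not part of Hippo.

```python
import json

# Response body copied from the example above.
response_text = """
{
  "num_queries": 2,
  "top_k": 2,
  "results": [
    {"query": 0,
     "fields_data": [{"field_name": "book_id", "field_values": [1, 2]}],
     "scores": [1.45, 4.25]},
    {"query": 1,
     "fields_data": [{"field_name": "book_id", "field_values": [1, 2]}],
     "scores": [0.85, 3.25]}
  ]
}
"""


def pair_hits(response, field_name="book_id"):
    """Map each query index to a list of (field value, score) tuples.

    Assumption: field_values and scores are aligned by position.
    """
    paired = {}
    for result in response["results"]:
        # Pick the requested output field out of fields_data.
        values = next(
            field["field_values"]
            for field in result["fields_data"]
            if field["field_name"] == field_name
        )
        paired[result["query"]] = list(zip(values, result["scores"]))
    return paired


hits = pair_hits(json.loads(response_text))
for query_index, rows in hits.items():
    print(query_index, rows)
```

Keeping the order in which Hippo returned the hits means the first tuple for each query is the most similar record.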

Parameter description:

Parameters	Description	Required
table	Table name, such as "book" created in this example	Yes
database_name	Database where the table is located	No, defaults to "default" database
output_fields	Name of the field to return	Yes
anns_field	Name of the field to search on	Yes
topk	The number of records to return	Yes
params	Search parameter specific to vector index	No
embedding_index	Vector index used in current search	No, uses the first activated vector index built on the field specified in "anns_field" by default
vectors	Query vector	Yes

Table 35 Vector Similarity Search (Python API)
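
The request body from the curl example can also be assembled programmatically. The sketch below is one possible way to do it in Python, using only the field names shown above. The function name `build_search_body` and its validation rules are assumptions made for illustration. "database_name", "round_decimal" and "only_explain" are left out because this section does not specify how they should be validated or, for "database_name", where it is passed.

```python
import json


def build_search_body(anns_field, vectors, topk, output_fields,
                      params=None, embedding_index=None):
    """Build the JSON body for a _search request (hypothetical helper)."""
    # The parameter table marks these four inputs as required.
    if not anns_field:
        raise ValueError("anns_field is required")
    if not vectors:
        raise ValueError("at least one query vector is required")
    if not output_fields:
        raise ValueError("output_fields is required")
    # Assumption: topk must be a positive integer.
    if not isinstance(topk, int) or topk < 1:
        raise ValueError("topk must be a positive integer")

    search_params = {"anns_field": anns_field, "topk": topk}
    # Optional entries are only sent when the caller sets them, so that
    # the server-side defaults described in the table apply otherwise.
    if params is not None:
        search_params["params"] = dict(params)
    if embedding_index is not None:
        search_params["embedding_index"] = embedding_index

    return {
        "output_fields": list(output_fields),
        "search_params": search_params,
        "vectors": [list(vector) for vector in vectors],
    }


body = build_search_body(
    anns_field="book_intro",
    vectors=[[0.1, 0.2], [0.3, 0.4]],
    topk=2,
    output_fields=["book_id"],
    params={"nprobe": 10},
    embedding_index="ivf_flat_index",
)
payload = json.dumps(body, indent=2)
print(payload)
```

The printed payload can be passed to curl with -d in place of the inline JSON.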

📘
Kindly note

Hippo returns the results ordered from most similar to least similar, so the first result is always the most similar one.

The meaning of the scores returned in the example above depends on the metric type used when the index was built:

For L2, scores represent the distance between vectors. The shorter the distance, the more similar the vectors are, so the scores are returned in ascending order.

For Cosine or IP, scores represent the similarity, so the scores are returned in descending order.
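
To illustrate why the sort direction depends on the metric, here is a small self-contained Python sketch. It uses textbook definitions of L2 distance, inner product and cosine similarity, not Hippo's internal implementation. Whether Hippo reports squared or unsquared L2 distance is not stated here, and the ranking is the same either way. All names in the sketch are hypothetical.

```python
import math


def l2_distance(a, b):
    # Euclidean distance: smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    # Inner product: larger means more similar.
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Inner product of the normalized vectors: larger means more similar.
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


# (scoring function, sort descending?) per metric type.
METRICS = {
    "L2": (l2_distance, False),
    "IP": (inner_product, True),
    "COSINE": (cosine_similarity, True),
}


def rank(query, candidates, metric):
    """Return (id, score) pairs with the most similar candidate first."""
    score_fn, descending = METRICS[metric]
    scored = [(cid, score_fn(query, vec)) for cid, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=descending)


query = [1.0, 0.0]
candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [3.0, 3.0]}

for metric in METRICS:
    print(metric, rank(query, candidates, metric))
```

In every case the first entry is the best match, but the scores in the list rise for L2 and fall for IP and Cosine. The three metrics can also disagree on the ranking itself.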

When "only_explain" is set to true, it is recommended to pipe the output through sed for formatting. An example is shown below:

curl -u shiva:shiva -XGET 'localhost:8902/hippo/v1/{table}/_search?pretty' -H 'Content-Type: application/json' -d'{
  "output_fields": ["book_id"],
  "search_params": {
    "anns_field": "book_intro",
    "topk": 2,
    "params": {
      "nprobe": 10
    },
    "embedding_index": "ivf_flat_index"
  },
  "vectors": [ [0.1,0.2], [0.3, 0.4] ],
  "round_decimal": 2,
  "only_explain" : true
}'|sed 's/\\n/\ /g';

The output example when using explain is:

{
  "num_queries" : 2,
  "top_k" : 2,
  "explanations" : [
    {
      "tablet" : "default#byte_table#4e3caf34c22e41349b6ad8a846781ce0@0",
      "explanation" : "client schema : (     book_id INT64 NOT NULL,     PRIMARY KEY () ) embedding scan path : (     vector column book_intro     embedding index 0 ) execution plan type : naive plan current engine meta : current schema 0, (     0:book_id INT64 NOT NULL,     1:word_count INT64 NULLABLE,     2:book_intro BINARY_VECTOR(8) NULLABLE,     3:__schemaversion__ INT64 NULLABLE virtual,     4:__rowid__ INT64 NULLABLE virtual,     PRIMARY KEY (book_id) ) current index number 1 PUBLIC index -1 : (book_id INT64 NOT NULL) "
    }
  ]
}
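
As an alternative to sed, the explanation text can be read by decoding the JSON response, which turns the escaped line breaks into real ones. The sketch below assumes the raw response carries line breaks as `\n` escapes inside the "explanation" string, which is what the sed expression targets. The sample response is a shortened, hypothetical stand-in rather than real Hippo output.

```python
import json

# Shortened, hypothetical raw response; the backslash-n sequences are
# JSON escapes, just as they would appear in the unformatted curl output.
raw_response = (
    r'{"explanations": [{"tablet": "t0", '
    r'"explanation": "client schema : (\n    book_id INT64 NOT NULL\n)"}]}'
)

decoded = json.loads(raw_response)

for entry in decoded["explanations"]:
    print("tablet:", entry["tablet"])
    # After decoding, the explanation contains real line breaks.
    print(entry["explanation"])

explanation_lines = decoded["explanations"][0]["explanation"].splitlines()
```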

Below is an output example using "with_profile". With "only_explain" and "with_profile", users can conduct a detailed analysis of a specified query.

{
  "num_queries" : 2,
  "top_k" : 2,
  "results" : [
    {
      "query" : 0,
      "fields_data" : [
        {
          "field_name" : "book_id",
          "field_values" : [
            1,
            2
          ]
        }
      ],
      "scores" : [
        0,
        1
      ]
    },
    {
      "query" : 1,
      "fields_data" : [
        {
          "field_name" : "book_id",
          "field_values" : [
            40,
            36
          ]
        }
      ],
      "scores" : [
        1,
        2
      ]
    }
  ],
  "profiles" : [
    {
      "tablet" : 0,
      "ann search" : {
        "cost" : "3 Milliseconds",
        "optimize" : {
          "cost" : "0 Milliseconds"
        },
        "init iterators" : {
          "cost" : "0 Milliseconds"
        },
        "naive embedding search" : {
          "cost" : "3 Milliseconds",
          "ann index search" : {
            "cost" : "3 Milliseconds"
          },
          "batch get data from db" : {
            "cost" : "0 Milliseconds"
          },
          "decode, eval and fill matched results" : {
            "cost" : "0 Milliseconds"
          }
        }
      }
    }
  ]
}
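
The nested "profiles" structure is easier to scan once it is flattened into one row per stage. The following Python sketch walks one profile entry copied from the output above. It assumes every "cost" value has the form "<integer> Milliseconds", the only form shown in this section, and it raises an error for anything else. The helper names are hypothetical.

```python
# One entry of "profiles", copied from the example output above.
profile = {
    "tablet": 0,
    "ann search": {
        "cost": "3 Milliseconds",
        "optimize": {"cost": "0 Milliseconds"},
        "init iterators": {"cost": "0 Milliseconds"},
        "naive embedding search": {
            "cost": "3 Milliseconds",
            "ann index search": {"cost": "3 Milliseconds"},
            "batch get data from db": {"cost": "0 Milliseconds"},
            "decode, eval and fill matched results": {"cost": "0 Milliseconds"},
        },
    },
}


def parse_cost_ms(text):
    """Convert a string such as '3 Milliseconds' to an integer."""
    value, unit = text.split()
    if unit != "Milliseconds":
        # Other units may exist, but none are documented in this section.
        raise ValueError(f"unexpected unit: {unit}")
    return int(value)


def flatten_costs(node, path=()):
    """Return (stage path, cost in ms) rows for every nested stage."""
    rows = []
    for key, value in node.items():
        if key == "cost":
            rows.append((" > ".join(path), parse_cost_ms(value)))
        elif isinstance(value, dict):
            rows.extend(flatten_costs(value, path + (key,)))
    return rows


rows = flatten_costs(profile)
for stage, cost_ms in rows:
    print(f"{cost_ms:>4} ms  {stage}")
```

Reading the rows top to bottom shows that the 3 milliseconds of the search in this example are spent in the "ann index search" stage.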
