Statistics

Hippo maintains data distribution statistics internally and uses them to choose an optimal query plan when querying data. Users can check or update the statistics with the command below. Note that updating statistics is resource-intensive. Hippo updates them automatically once the amount of changed data reaches a certain threshold, so there is generally no need to trigger an update manually.

curl -u shiva:shiva -XPOST 'localhost:8902/hippo/v1/{table}/_analyze_db?pretty' -H 'Content-Type: application/json' -d'{
  "is_update" : false,
  "columns" : ["word_count", "book_id"],
  "wait_for_completion" : true,
  "timeout" : "10m"
}';

Result:

{
  "job_id" : "0725c69b15734f3fa82abcf7fdf5fd9b",
  "job_status" : "SHIVA_JOB_SUCCESS",
  "task_results" : [
    {
      "id" : "d53e6662c3474a7193fe099a82012175",
      "status" : "TASK_SUCCESS",
      "server" : "172.29.203.189:27841",
      "execute_time" : 0.0,
      "column statistics" : [
        {
          "column_id" : 0,
          "num_distinct_keys" : 300,
          "num_keys" : 300,
          "null_frequency" : 0.0,
          "correlation" : 0.0
        },
        {
          "column_id" : 1,
          "num_distinct_keys" : 100,
          "num_keys" : 300,
          "null_frequency" : 0.0,
          "correlation" : 0.0
        }
      ]
    }
  ]
}
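Each entry in "column statistics" is identified by its column_id. Comparing num_distinct_keys with num_keys shows how repetitive a column's values are. Column 0 above has 300 distinct keys out of 300, while column 1 has only 100 out of 300. The sketch below prints that comparison for each column. column_stats is a hypothetical helper, not part of Hippo. It assumes the pretty-printed, one-field-per-line layout shown above (as produced with ?pretty) and is not a general JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hippo): summarize the "column statistics"
# entries of a pretty-printed _analyze_db response read from stdin.
# It relies on the one-field-per-line layout produced by ?pretty.
column_stats() {
  awk -F' : ' '
    /"column_id"/         { gsub(/[ ,]/, "", $2); id = $2 }
    /"num_distinct_keys"/ { gsub(/[ ,]/, "", $2); nd = $2 + 0 }
    /"num_keys"/          {
      gsub(/[ ,]/, "", $2); nk = $2 + 0
      # In each entry shown above, num_keys follows column_id and
      # num_distinct_keys, so all three values are known at this point.
      printf "column %s: %d distinct of %d keys (ratio %.2f)\n", id, nd, nk, (nk > 0 ? nd / nk : 0)
    }
  '
}
```

To use it, pipe the output of the curl command in this section into column_stats. For the sample result it reports a ratio of 1.00 for column 0 and 0.33 for column 1. For anything beyond a quick look, a real JSON tool is more robust than line matching.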

Parameter description:

| Parameter | Description | Required |
|---|---|---|
| database_name | Database where the target table is located | No; defaults to the "default" database |
| table_name | Name of the target table | Yes |
| is_update | Whether to trigger a statistics update | No; defaults to false |
| columns | Columns whose statistics to check | No; defaults to all columns. Only takes effect when is_update is true |
| wait_for_completion | Whether to wait until the job completes | No; defaults to true |
| timeout | Operation timeout | No; defaults to 5 minutes |

Table 32 Statistics (RESTful API)
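The request body can also be assembled from the parameters in the table above. The sketch below is a minimal shell helper, assuming the endpoint, the shiva:shiva credentials and the body fields from the curl example in this section. The function names analyze_body and analyze_table are hypothetical. Optional fields are left out of the body when no value is given, so the defaults listed in the table apply. database_name is not handled, because the example does not show how it is passed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Hippo). Endpoint, credentials and field
# names are copied from the curl example in this section.

# Print a JSON body for _analyze_db.
#   $1     is_update ("true" or "false")
#   $2     timeout, e.g. "10m"; pass "" to omit it (default: 5 minutes)
#   $3...  column names; pass none to omit "columns" (default: all columns)
# wait_for_completion is always omitted, so its default (true) applies.
analyze_body() {
  local is_update="$1" timeout="$2" cols="" c
  shift 2
  # Join the column names into a quoted, comma-separated list.
  for c in "$@"; do
    cols="${cols:+$cols, }\"$c\""
  done
  printf '{"is_update": %s' "$is_update"
  [ -n "$cols" ] && printf ', "columns": [%s]' "$cols"
  [ -n "$timeout" ] && printf ', "timeout": "%s"' "$timeout"
  printf '}'
}

# Send the request for table $1; the remaining arguments go to analyze_body.
analyze_table() {
  local table="$1"
  shift
  curl -u shiva:shiva -XPOST "localhost:8902/hippo/v1/${table}/_analyze_db?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$@")"
}
```

For example, analyze_table book true "" book_id would request a statistics update of the book_id column of a (hypothetical) table named book, using the default timeout and waiting for the job to complete.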