Hippo maintains data distribution statistics internally and uses them to choose the best way to execute a query. Use the command below to check or update these statistics. Note that updating statistics is resource-intensive, and Hippo refreshes them automatically once the amount of changed data reaches a certain threshold, so there is generally no need to trigger this operation manually.
curl -u shiva:shiva -XPOST 'localhost:8902/hippo/v1/{table}/_analyze_db?pretty' -H 'Content-Type: application/json' -d'{
  "is_update" : false,
  "columns" : ["word_count", "book_id"],
  "wait_for_completion" : true,
  "timeout" : "10m"
}';
Result:
{
  "job_id" : "0725c69b15734f3fa82abcf7fdf5fd9b",
  "job_status" : "SHIVA_JOB_SUCCESS",
  "task_results" : [
    {
      "id" : "d53e6662c3474a7193fe099a82012175",
      "status" : "TASK_SUCCESS",
      "server" : "172.29.203.189:27841",
      "execute_time" : 0.0,
      "column statistics" : [
        {
          "column_id" : 0,
          "num_distinct_keys" : 300,
          "num_keys" : 300,
          "null_frequency" : 0.0,
          "correlation" : 0.0
        },
        {
          "column_id" : 1,
          "num_distinct_keys" : 100,
          "num_keys" : 300,
          "null_frequency" : 0.0,
          "correlation" : 0.0
        }
      ]
    }
  ]
}
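When the command is run from a script, it can be useful to confirm that the job succeeded before continuing. The following is a minimal sketch, not part of Hippo itself: the helper names `analyze_job_succeeded` and `analyze_task_statuses` are hypothetical, and the sketch assumes that success is reported as `"job_status" : "SHIVA_JOB_SUCCESS"` in the pretty-printed layout shown above, since that is the only status value the sample output documents.

```shell
# Hypothetical helpers for checking a statistics response read from stdin.
# Assumption: the response is pretty-printed (?pretty), one field per line.

# Exits with status 0 when the response reports a successful job.
analyze_job_succeeded() {
  grep -Eq '"job_status"[[:space:]]*:[[:space:]]*"SHIVA_JOB_SUCCESS"'
}

# Prints the "status" value of every task, one per line.
# The leading quote in the pattern keeps "job_status" from matching.
analyze_task_statuses() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

For example, after saving the response to a file named `response.json` (a name chosen here for illustration), `analyze_job_succeeded < response.json && echo "statistics job finished"` prints the message only when the job status matches.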
Parameter description:
Parameter | Description | Required |
---|---|---|
database_name | Database that contains the destination table | No, defaults to the "default" database |
table_name | Name of the destination table | Yes |
is_update | Whether to trigger a statistics update | No, defaults to false |
columns | Columns whose statistics are checked | No, defaults to all columns; only takes effect when is_update is set to true |
wait_for_completion | Whether to wait until the job is done | No, defaults to true |
timeout | Operation timeout | No, defaults to 5 minutes |
Table 32 Statistics (RESTful API)
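To show how the parameters in the table combine, here is a hedged sketch of a shell wrapper that triggers a manual statistics update (`is_update` set to true) for selected columns. The function names `build_analyze_payload` and `update_table_statistics` and the environment variables `HIPPO_URL` and `HIPPO_AUTH` are hypothetical; the endpoint path, credentials, and field names are copied from the sample request above. When no columns are passed, the `columns` field is omitted so that the documented default (all columns) applies.

```shell
# Builds the JSON request body for the statistics endpoint.
# Usage: build_analyze_payload IS_UPDATE TIMEOUT [COLUMN...]
build_analyze_payload() {
  is_update=$1
  timeout=$2
  shift 2
  columns=""
  for col in "$@"; do
    # Join column names as a comma-separated list of JSON strings.
    columns="${columns:+$columns, }\"$col\""
  done
  printf '{ "is_update" : %s, ' "$is_update"
  if [ -n "$columns" ]; then
    printf '"columns" : [%s], ' "$columns"
  fi
  printf '"wait_for_completion" : true, "timeout" : "%s" }\n' "$timeout"
}

# Sends an update request for one table and waits up to 10 minutes.
# HIPPO_URL and HIPPO_AUTH are assumed variables; the defaults mirror the sample.
# Usage: update_table_statistics TABLE [COLUMN...]
update_table_statistics() {
  table=$1
  shift
  curl -u "${HIPPO_AUTH:-shiva:shiva}" -XPOST \
    "${HIPPO_URL:-localhost:8902}/hippo/v1/${table}/_analyze_db?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(build_analyze_payload true 10m "$@")"
}
```

Calling `update_table_statistics my_table word_count book_id` (with `my_table` standing in for a real table name) would send the same kind of request as the sample, but with `is_update` set to true. Because updating statistics is resource-intensive, a wrapper like this is best reserved for cases where the automatic refresh is known to be insufficient.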