Introduction

Overview

Unstructured data management has grown in importance with the rise of big data, and extracting business value from unstructured data is now a priority for enterprises. Advances in machine learning and deep learning enable organizations to process unstructured data efficiently and to improve quality assurance efforts.

In artificial intelligence and machine learning, embeddings and vector databases have become increasingly important for tackling a wide range of problems. Embeddings represent data as dense numeric vectors in a high-dimensional space, where items can be compared, manipulated, and analyzed more easily.
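As a minimal sketch of why vector representations are convenient to compare, the snippet below ranks items by cosine similarity. The three-dimensional vectors and their labels are made-up illustrative values, not the output of any real embedding model, and nothing here reflects how Hippo computes similarity internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings; real models typically emit hundreds of dimensions.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "truck": [0.0, 0.1, 0.9],
}

# Rank every other item by its similarity to "cat", most similar first.
query = embeddings["cat"]
ranked = sorted(
    (name for name in embeddings if name != "cat"),
    key=lambda name: cosine_similarity(query, embeddings[name]),
    reverse=True,
)
print(ranked)
```

Items whose vectors point in nearly the same direction score close to 1, so "kitten" ranks ahead of "truck" for this toy data.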

Transwarp Hippo (“Hippo”) is a proprietary, enterprise-grade, cloud native distributed vector database. It stores, indexes, and manages massive vector datasets and accelerates workloads such as vector similarity search and clustering of dense vectors. Hippo provides high availability, high performance, and easy scale-out and scale-in. Its features include vector search indexes, data sharding, partitioning, data persistence, incremental data ingestion, and hybrid search that combines vector and scalar filtering, enabling enterprises to run real-time queries, searches, and candidate generation against massive vector data.
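The idea of hybrid search (a scalar filter combined with a vector similarity ranking) can be sketched with a brute-force scan. This is an illustration only: the function `hybrid_search`, the row layout, and the field names `category` and `vector` are assumptions made for the example and are not Hippo's API. A production vector database would use an index rather than scanning every row.

```python
import heapq
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hybrid_search(rows, query_vector, predicate, top_k):
    """Return the top_k rows nearest to query_vector among rows passing predicate."""
    # Scalar filtering first: keep only rows whose scalar fields match.
    candidates = (row for row in rows if predicate(row))
    # Vector ranking second: smallest distance means most similar.
    return heapq.nsmallest(
        top_k, candidates, key=lambda row: l2_distance(row["vector"], query_vector)
    )

# Hypothetical rows mixing a scalar field with a 2-dimensional vector.
rows = [
    {"id": 1, "category": "shoes", "vector": [0.1, 0.9]},
    {"id": 2, "category": "shoes", "vector": [0.8, 0.2]},
    {"id": 3, "category": "hats", "vector": [0.1, 0.8]},
    {"id": 4, "category": "shoes", "vector": [0.2, 0.7]},
]

hits = hybrid_search(rows, [0.1, 0.85], lambda r: r["category"] == "shoes", top_k=2)
print([h["id"] for h in hits])
```

Row 3 is as close to the query as row 1, but the scalar filter excludes it, so only "shoes" rows are ranked.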

Hippo Architecture

Diagram

Figure 1 Hippo Architecture

Major Components

Component | Functionality | Brief Introduction
TDDMS | Transwarp distributed data management system | TDDMS is a proprietary enterprise distributed data management system. It provides strong or eventual consistency among replicas and best-matching data distribution. TDDMS also redistributes data automatically when storage is scaled out, and it keeps data available without interrupting ongoing storage services when a storage device fails.
Vector Engine | Vector search engine | Vector Engine is a proprietary search engine designed by Transwarp. It supports vector search over massive datasets and similarity search with high accuracy and high performance.
Sophon Model Cube | Model cube | Sophon Model Cube (“SMC”) unifies the stages of the LLM lifecycle, including model release, evaluation, and deployment. SMC manages multimodal LLMs with high maintainability and operability, supports image, text, and hybrid models, and provides model evaluation and hands-on model experience, helping organizations get the most value from their LLMs.
Table 1 Hippo Major Components

Roles in Hippo

  1. Hippo Master
  • Also called Shiva Master
  • Stores TDDMS metadata
  • Runs as a Raft group made up of multiple Master nodes
  • 3 or 5 Master nodes are recommended
  2. Hippo Tablet Server
  • Also called Shiva Tablet Server
  • Stores TDDMS data
  • At least 5 Tablet Server nodes are recommended
  3. Hippo Webserver
  • Built-in lightweight component for monitoring cluster and index status
  • Provides a REST API GUI
  • At least one Webserver is required
  4. Hippo HTTP Server
  • Handles HTTP requests
  • Supports integration through Python, Java, and HTTP clients
  • At least one HTTP Server is recommended
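The recommendation of 3 or 5 Master nodes fits the general majority rule of Raft: a group makes progress only while more than half of its members are reachable. The sketch below works through that arithmetic. It describes Raft in general, not any Hippo-specific behavior, and the function names are illustrative.

```python
def raft_quorum(nodes):
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1

def tolerated_failures(nodes):
    """Nodes that can fail while a majority is still reachable."""
    return nodes - raft_quorum(nodes)

for n in (3, 4, 5):
    print(f"{n} nodes: quorum={raft_quorum(n)}, tolerated failures={tolerated_failures(n)}")
```

A 4-node group tolerates one failure, the same as a 3-node group, which is why odd group sizes are the usual choice.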

Advantages

Advantage | Details
Cloud native system | Hippo is deployed on our proprietary cloud native operating system, which provides strong scale-out/in, multi-tenancy, and resource management capabilities.
Distributed deployment | Hippo supports distributed deployment, ensures strong consistency through the Raft algorithm, and enables failover and data recovery.
Multi-model architecture | Hippo can integrate with other services deployed on the Transwarp Data Hub (“TDH”) platform to achieve federated search.
High performance search | Hippo's multiprocessing architecture and GPU acceleration enable parallel search. Hippo also provides multiple index types, tuning techniques for search speed and memory usage, and register-level algorithm optimization, so that organizations can run analytics across different business scenarios.
Multiple API integration | Hippo currently supports Python, RESTful, and Java APIs.
Table 2 Hippo Advantages

Management Components

Component | Functionality | Brief Introduction
Transwarp Aquila | Intelligent operation and maintenance analysis platform | Aquila is a one-stop platform deployed in TDH for cluster monitoring, service monitoring, and database query monitoring. It provides an integrated O&M portal for each data platform, offering security audit, log retrieval, performance monitoring, alerting, online operation and maintenance, root cause analysis, and other functions.
Transwarp Manager | Big data management platform | Manager is the component used to deploy, manage, and operate Hippo clusters, as well as other services deployed through Manager. It supports one-click installation, one-click upgrade, and graphical operation and maintenance of products, and it provides health checks that simplify the operation and maintenance process.
Transwarp Cloud Operating System (“TCOS”) | Cloud native OS | TCOS is a proprietary cloud operating system built on Docker and Kubernetes. It offers a unified resource scheduling framework and, through container orchestration, schedules compute, storage, network, and other resources in a unified way.
Table 3 Management Components