# Getting Started with HugeGraph

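## 1. Introduction to HugeGraph

> I have recently been working on friend recommendation and chose a graph-database approach. I used and learned HugeGraph along the way, so I am recording my notes here.

[HugeGraph](https://hugegraph.github.io/hugegraph-doc/) is a graph database system that Baidu open-sourced in mid-2018. It can store massive numbers of vertices and edges, implements the Apache TinkerPop 3 framework, and supports the Gremlin graph query language. HugeGraph supports concurrent operations by multiple users: a user can submit a Gremlin query and get the result back promptly, or call the HugeGraph API from an application to run graph analysis or queries.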

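## 2. HugeGraph Features

HugeGraph supports graph operations in both online and offline environments, supports bulk data import and efficient analysis of complex relationships, and integrates seamlessly with big-data platforms.

HugeGraph has the following features: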

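* Built on the TinkerPop 3 framework, with support for the Gremlin query language;
* Bulk data import from files in TXT, CSV, JSON, and similar formats;
* Independent schema metadata, which makes integration with third-party systems easier;
* A visual user interface that lowers the barrier to entry;
* A pluggable storage layer that supports RocksDB, Cassandra, ScyllaDB, HBase, MySQL, and other backends;
* Optimized graph interfaces such as shortest path, K-step connected subgraph, and K-step reachable neighbors;
* Property graph support: both vertices and edges can carry properties, with a rich set of property types;
* Indexes on vertex and edge properties, supporting exact-match queries, range queries, and full-text search;
* Four vertex ID strategies: primary-key ID, automatically generated ID, user-defined string ID, and user-defined numeric ID;
* Integration with big-data systems such as Hadoop and Spark GraphX, including Bulk Load operations.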

## 3. HugeGraph Modules

* [HugeGraph-Server](https://hugegraph.github.io/hugegraph-doc/quickstart/hugegraph-server.html): the core of the HugeGraph project, containing the Core, Backend, and API submodules:
  * Core: the graph engine implementation. It connects to the Backend module below it and supports the API module above it;
  * Backend: stores graph data in a storage backend. Supported backends are Memory, Cassandra, ScyllaDB, RocksDB, HBase, and MySQL; pick one according to your situation;
  * API: a built-in REST server that exposes a RESTful API to users and is also compatible with Gremlin queries.
* [HugeGraph-Client](https://hugegraph.github.io/hugegraph-doc/quickstart/hugegraph-client.html): a client for the RESTful API, used to connect to HugeGraph-Server. Only a Java version is implemented so far; users of other languages can implement their own;
* [HugeGraph-Studio](https://hugegraph.github.io/hugegraph-doc/quickstart/hugegraph-studio.html): HugeGraph's web-based visualization tool, which can run Gremlin statements and display graphs;
* [HugeGraph-Loader](https://hugegraph.github.io/hugegraph-doc/quickstart/hugegraph-loader.html): a data import tool built on HugeGraph-Client that converts plain-text data into vertices and edges and inserts them into the graph database;
* [HugeGraph-Spark](https://hugegraph.github.io/hugegraph-doc/quickstart/hugegraph-spark.html): a graph analysis tool based on Spark GraphX that can run parallel computations on the graph, such as PageRank;
* [HugeGraph-Tools](https://hugegraph.github.io/hugegraph-doc/quickstart/hugegraph-tools.html): HugeGraph's deployment and management tool, covering graph management, backup/restore, Gremlin execution, and more.

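**Summary:** Deploying HugeGraph requires HugeGraph-Server; operating on graphs from a web page requires HugeGraph-Studio; operating on graphs from a Java project requires HugeGraph-Client. The other three modules can be deployed when the need arises.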

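## 4. Installing and Deploying HugeGraph

### 4.1 Install HugeGraph-Server (required)

**Dependencies:**

JDK 1.8

If RocksDB is used as the storage backend, GCC >= 4.3.0 is also required. The steps below assume RocksDB as the backend.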

**Step 1:**

```
# Download the tar package and extract it
# (replace ${version} with the release you want to install)
wget https://github.com/hugegraph/hugegraph/releases/download/v${version}/hugegraph-${version}.tar.gz
tar -zxvf hugegraph-${version}.tar.gz
```

**Step 2:**

Edit `hugegraph.properties`:

```
backend=rocksdb
serializer=binary
rocksdb.data_path=.
rocksdb.wal_path=.
```

**Step 3:**

Initialize the database (only needed before the first start):

```
cd hugegraph-${version}
bin/init-store.sh
```

**Step 4:**

Start the server:

```
bin/start-hugegraph.sh
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)....OK
```

**Step 5:**

Check the service status:

```
jps
6475 HugeGraphServer
# Request the RESTful API with curl; an HTTP status code of 200 means the server started normally
echo `curl -o /dev/null -s -w %{http_code} "http://localhost:8080/graphs/hugegraph/graph/vertices"`
```
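For deployment scripts, the one-off status check above can be wrapped in a small polling function. The following is only a sketch: the helper name `wait_for_hugegraph` and the default of 30 tries are hypothetical choices for illustration and are not part of HugeGraph; the `curl` flags are the same ones used in the check above.

```shell
# Poll a HugeGraph REST URL until it answers with HTTP 200.
# wait_for_hugegraph is a hypothetical helper name, not a HugeGraph tool.
# Usage: wait_for_hugegraph URL [MAX_TRIES]
wait_for_hugegraph() {
  url=$1
  max_tries=${2:-30}
  try=1
  while [ "$try" -le "$max_tries" ]; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
    if [ "$code" = "200" ]; then
      echo "HugeGraphServer is up (HTTP 200)"
      return 0
    fi
    sleep 1
    try=$((try + 1))
  done
  echo "HugeGraphServer did not return 200 after $max_tries tries" >&2
  return 1
}
```

For example, `wait_for_hugegraph "http://localhost:8080/graphs/hugegraph/graph/vertices" 10` returns 0 as soon as the endpoint responds with 200, and returns 1 if it still has not done so after 10 tries.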

**Step 6:**

```
# Stop the server
$ cd hugegraph-${version}
$ bin/stop-hugegraph.sh
```

**Problem 1: startup times out with an error**

```
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://10.118.32.32:8080/graphs)................The operation timed out when attempting to connect to http://10.118.32.32:8080/graphs
See /home/work/soft/hugegraph-tools-1.2.0/services/hugegraph-0.8.0/logs/hugegraph-server.log for HugeGraphServer log output.
```

Fix: export `JAVA_HOME` before starting the server; the JDK version must be 1.8 or later.
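To check this before starting the server, something like the sketch below may help. Both `java_major` and `check_java_home` are hypothetical helper names written for this note and are not shipped with HugeGraph. The sketch covers only the `JAVA_HOME` cause described here; a timeout can have other causes, which the server log mentioned in the error output should reveal.

```shell
# Extract the major Java version from a version string.
# Handles the legacy "1.8.0_202" scheme as well as "11.0.2" or "17".
java_major() {
  case "$1" in
    1.*) rest=${1#1.}; echo "${rest%%.*}" ;;
    *)   echo "${1%%[.-]*}" ;;
  esac
}

# Check that JAVA_HOME points at a JDK of version 1.8 or later.
# check_java_home is a hypothetical helper, not part of HugeGraph.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME is unset or has no bin/java" >&2
    return 1
  fi
  # java -version prints to stderr, e.g.: java version "1.8.0_202"
  ver=$("$JAVA_HOME/bin/java" -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')
  [ "$(java_major "$ver")" -ge 8 ]
}
```

A possible use, where the JDK path is only an example and must be adapted to your machine: `export JAVA_HOME=/usr/lib/jvm/java-8-openjdk && check_java_home && bin/start-hugegraph.sh`.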

### 4.2 Install HugeGraph-Studio

**Step 1:**

```
# Download the tar package and extract it
wget https://github.com/hugegraph/hugegraph-studio/releases/download/v${version}/hugegraph-studio-${version}.tar.gz
tar zxvf hugegraph-studio-${version}.tar.gz
```

**Step 2:**

Edit the configuration file `hugegraph-studio.properties`:

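* Change `studio.server.host` from `localhost` to the machine name or IP address. This is the `host` on which HugeGraphStudio serves requests; leave it unchanged if you only need local access;
* Change `studio.server.port` from `8088` to the port you want. This is the `port` on which HugeGraphStudio serves requests;
* Change `graph.server.host` from `localhost` to the `host` of HugeGraphServer. HugeGraphStudio uses this setting together with `graph.server.port` to connect to HugeGraphServer;
* Change `graph.server.port` from `8080` to the `port` of HugeGraphServer. HugeGraphStudio uses `graph.server.host` together with this setting to connect to HugeGraphServer;
* Change `graph.name` from `hugegraph` to the name of the graph on HugeGraphServer that you want to connect to. Only one graph can be connected at present.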
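If you prefer to script these changes rather than edit the file by hand, a small helper along the following lines could be used. This is a sketch under assumptions: `set_prop` is a hypothetical function written for this note, and it only handles simple single-line `key=value` entries.

```shell
# Set KEY=VALUE in a Java-style .properties file, replacing an existing
# entry or appending a new one. set_prop is a hypothetical helper name.
# Usage: set_prop FILE KEY VALUE
set_prop() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp)
  awk -v k="$key" -v v="$value" '
    BEGIN { FS = "=" }
    {
      name = $1
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # tolerate "key = value" spacing
      if (name == k) { print k "=" v; found = 1 } else { print }
    }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # overwrite in place, keeping permissions
  status=$?
  rm -f "$tmp"
  return $status
}
```

For example, `set_prop conf/hugegraph-studio.properties graph.server.host 192.168.0.10` would point Studio at a server on that address. Both the `conf/` location and the IP address here are assumptions for illustration; use the path and host that match your installation.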

```
# Start HugeGraph-Studio
$ cd hugegraph-studio-${version}
$ bin/hugegraph-studio.sh
```

**Step 3:**

Open [http://localhost:8088](http://localhost:8088/) in a browser to access Studio.

## 5. Performance Comparison of HugeGraph, Neo4j, and Titan

The official site provides a performance test report: <https://hugegraph.github.io/hugegraph-doc/performance/hugegraph-benchmark-0.5.6.html>

In summary:

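* Bulk insert performance: HugeGraph(RocksDB) > Neo4j > Titan(thrift+Cassandra)
* Traversal performance: Neo4j > HugeGraph(RocksDB) > Titan(thrift+Cassandra)
* Common graph analysis performance: in the FS (find shortest path) scenario, HugeGraph outperforms Neo4j and Titan; in the K-neighbor and K-out scenarios, HugeGraph returns results within seconds for depths of up to 5 hops
* Community clustering algorithm performance: Neo4j > HugeGraph > Titan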

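I chose HugeGraph for two reasons. First, the project had to import a large volume of data, on the order of a billion or more inserts, so I needed a database with high insert performance; the graph also has to be updated asynchronously whenever friend relationships change. Second, although HugeGraph is a newcomer, its official Chinese documentation is concise and clear, which makes it easy to learn and use.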

> References:
>
> <https://hugegraph.github.io/hugegraph-doc/>

