Using Titan + HBase + Elasticsearch

HBase: Hadoop cluster

ES: Elasticsearch cluster

In 'conf/titan-hbase-es.properties', set the HBase table name and the ES index name:

storage.hbase.table=hbase_es_test
index.search.index-name=hbase_es_test
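
For context, here is a minimal sketch of what the rest of the file might contain. The four extra keys are standard Titan options; the HBase hosts are the ones echoed by the console further down, while the ES host is only a placeholder that has to be replaced with the real cluster address.

```properties
# Storage backend: HBase (hosts taken from the console output in this post)
storage.backend=hbase
storage.hostname=10.10.113.192,10.10.113.193,10.10.113.194
storage.hbase.table=hbase_es_test

# Index backend: Elasticsearch (es-host-placeholder is an assumption)
index.search.backend=elasticsearch
index.search.hostname=es-host-placeholder
index.search.index-name=hbase_es_test
```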

Start gremlin.sh:

$ ./bin/gremlin.sh

It prints the following:

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: aurelius.titan
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
15:51:46 INFO  org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph  - HADOOP_GREMLIN_LIBS is set to: /Users/liuchaozhen/Downloads/titan-1.0.0-hadoop1/lib
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.tinkergraph
gremlin>

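Load the configuration and create the graph. An HDFS warning shows up in the log: HBase's DynamicClassLoader cannot reach hdfs://localhost:9000/hbase/lib and the connection is refused. I did not track down the cause or handle it, but it does not affect the steps that follow.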

gremlin> graph = TitanFactory.open('conf/titan-hbase-es.properties')
15:52:36 WARN  org.apache.hadoop.hbase.util.DynamicClassLoader  - Failed to identify the fs of dir hdfs://localhost:9000/hbase/lib, ignored
java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
        at org.apache.hadoop.ipc.Client.call(Client.java:1118)
		...
15:53:20 WARN  com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration  - Local setting cache.db-cache-time=180000 (Type: GLOBAL_OFFLINE) is overridden by globally managed value (10000).  Use the ManagementSystem interface instead of the local configuration to control this setting.
15:53:20 WARN  com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration  - Local setting cache.db-cache-clean-wait=20 (Type: GLOBAL_OFFLINE) is overridden by globally managed value (50).  Use the ManagementSystem interface instead of the local configuration to control this setting.
==>standardtitangraph[hbase:[10.10.113.192, 10.10.113.193, 10.10.113.194]]
gremlin>

Get a traversal source for the graph:

gremlin> g = graph.traversal()
==>graphtraversalsource[standardtitangraph[hbase:[10.10.113.192, 10.10.113.193, 10.10.113.194]], standard]

Good, now we can start querying.
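
The queries in this post assume the graph already holds vertices labelled 'category' and 'product'. To try them without an HBase or ES cluster, one option is the in-memory TinkerGraph whose plugin the console reports as activated above. The following is only a sketch with made-up sample values, not the data used in this post.

```groovy
// The Gremlin console already imports these; they are listed so the
// script also runs as plain Groovy with TinkerPop 3.0.x on the classpath.
import org.apache.tinkerpop.gremlin.structure.T
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph

// In-memory stand-in for the Titan graph; all values are invented.
graph = TinkerGraph.open()
graph.addVertex(T.label, 'category', 'name', 'Beverages', 'description', 'Soft drinks, coffees, teas')
graph.addVertex(T.label, 'category', 'name', 'Seafood', 'description', 'Seaweed and fish')
graph.addVertex(T.label, 'product', 'name', 'Chai', 'unitsInStock', 39)
g = graph.traversal()

// Same shape as the select, limit and count queries in this post
categoryCount = g.V().hasLabel('category').count().next()
categoryNames = g.V().hasLabel('category').values('name').toList().sort()
firstCategory = g.V().hasLabel('category').limit(1).valueMap().next()
```

With this data, categoryCount is 2 and firstCategory is a map whose keys are 'name' and 'description', each pointing to a list of values.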

Select

Select all

gremlin> g.V().hasLabel('category').valueMap()

Use limit(n) to cap the number of results:

gremlin> g.V().hasLabel('category').limit(5).valueMap()

Use count() to count the results:

gremlin> g.V().hasLabel('Disease').count()

Query a single property value

gremlin> g.V().hasLabel('category').values('name')

Count the vertices that have a given label

gremlin> g.V().hasLabel('category').count()

Query several property values

gremlin> g.V().hasLabel('category').values('name', 'description')

Select calculated column

Compute the length of each name:

gremlin> g.V().hasLabel('category').values('name').map{ it.get().length()}

Distinct

gremlin> g.V().hasLabel('category').values('name').map {it.get().length()}.dedup()

Maximum and minimum values (swap max() for min() to get the minimum):

gremlin> g.V().hasLabel('category').values('name').map {it.get().length()}.max()
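
To see how the calculated column, dedup() and max()/min() fit together, here is a small sketch on an in-memory TinkerGraph. The four category names are invented for illustration; two of them deliberately have the same length so that dedup() has something to remove.

```groovy
// The Gremlin console already imports these; they are listed so the
// script also runs as plain Groovy with TinkerPop 3.0.x on the classpath.
import org.apache.tinkerpop.gremlin.structure.T
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph

graph = TinkerGraph.open()
// Name lengths: 9, 10, 7, 7
['Beverages', 'Condiments', 'Seafood', 'Produce'].each { name ->
    graph.addVertex(T.label, 'category', 'name', name)
}
g = graph.traversal()

// Calculated column: one length per vertex, duplicates included
lengths = g.V().hasLabel('category').values('name').map { it.get().length() }.toList().sort()

// dedup() drops the repeated 7
distinctLengths = g.V().hasLabel('category').values('name').map { it.get().length() }.dedup().toList().sort()

// max() and min() reduce the stream to a single number
longest = g.V().hasLabel('category').values('name').map { it.get().length() }.max().next()
shortest = g.V().hasLabel('category').values('name').map { it.get().length() }.min().next()
```

The results are sorted on the client side because the order in which TinkerGraph returns vertices is not something the queries above rely on.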

Filtering

Equality

SQL:

SELECT ProductName, UnitsInStock FROM Products WHERE UnitsInStock = 0

Gremlin:

gremlin> g.V().has('product', 'unitsInStock', 0).valueMap('name', 'unitsInStock')

Inequality

SQL:

SELECT ProductName, UnitsOnOrder FROM Products WHERE NOT(UnitsOnOrder = 0)

Gremlin:

gremlin> g.V().has('product', 'unitsOnOrder', neq(0)).valueMap('name', 'unitsOnOrder')

Value range

SQL:

SELECT ProductName, UnitPrice FROM Products WHERE UnitPrice >= 5 AND UnitPrice < 10

Gremlin:

gremlin> g.V().has('product', 'unitPrice', between(5f, 10f)).valueMap('name', 'unitPrice')
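
between(low, high) includes the lower bound and excludes the upper one, which is why it lines up with `>= 5 AND < 10` in the SQL. A quick way to convince yourself is a sketch like the one below, run against an in-memory TinkerGraph with invented products whose prices sit exactly on the boundaries.

```groovy
// The Gremlin console already imports these; they are listed so the
// script also runs as plain Groovy with TinkerPop 3.0.x on the classpath.
import org.apache.tinkerpop.gremlin.structure.T
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
import static org.apache.tinkerpop.gremlin.process.traversal.P.*

graph = TinkerGraph.open()
// Invented prices: below the range, on the lower bound, inside, on the upper bound
[['Konbu', 4.5f], ['Chai', 5f], ['Tofu', 9.99f], ['Ikura', 10f]].each { name, price ->
    graph.addVertex(T.label, 'product', 'name', name, 'unitPrice', price)
}
g = graph.traversal()

// Keeps 5 <= unitPrice < 10, so 'Chai' is in and 'Ikura' is out
inRange = g.V().has('product', 'unitPrice', between(5f, 10f)).values('name').toList().sort()
```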

Multiple filter conditions

SQL:

SELECT ProductName, UnitsInStock FROM Products WHERE Discontinued = 1 AND UnitsInStock <> 0

Gremlin:

gremlin> g.V().has('product', 'discontinued', true).has('unitsInStock', neq(0)).valueMap('name', 'unitsInStock')
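
Chained has() steps behave like SQL's AND: each step only sees what the previous one let through. A minimal sketch on an in-memory TinkerGraph, with three invented products that each match a different combination of the two conditions:

```groovy
// The Gremlin console already imports these; they are listed so the
// script also runs as plain Groovy with TinkerPop 3.0.x on the classpath.
import org.apache.tinkerpop.gremlin.structure.T
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
import static org.apache.tinkerpop.gremlin.process.traversal.P.*

graph = TinkerGraph.open()
graph.addVertex(T.label, 'product', 'name', 'A', 'discontinued', true, 'unitsInStock', 0)
graph.addVertex(T.label, 'product', 'name', 'B', 'discontinued', true, 'unitsInStock', 20)
graph.addVertex(T.label, 'product', 'name', 'C', 'discontinued', false, 'unitsInStock', 15)
g = graph.traversal()

// A fails the stock check, C fails the discontinued check, only B passes both
matches = g.V().has('product', 'discontinued', true).has('unitsInStock', neq(0)).values('name').toList()
```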

Ordering

SQL:

SELECT ProductName, UnitPrice FROM Products ORDER BY UnitPrice ASC
SELECT ProductName, UnitPrice FROM Products ORDER BY UnitPrice DESC

Gremlin:

gremlin> g.V().hasLabel('product').order().by('unitPrice', incr).valueMap('name', 'unitPrice')
gremlin> g.V().hasLabel('product').order().by('unitPrice', decr).valueMap('name', 'unitPrice')

Paging

gremlin> g.V().hasLabel('product').order().by('unitPrice', incr).range(5, 10).valueMap('name', 'unitPrice')
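
range(low, high) is zero-based and excludes the upper index, so range(5, 10) returns the sixth through tenth results, i.e. the second page when the page size is 5. The sketch below wraps that arithmetic in a closure; `page`, `pageNo` and `pageSize` are hypothetical names introduced here, and the twelve products are invented sample data in an in-memory TinkerGraph.

```groovy
// The Gremlin console already imports these; they are listed so the
// script also runs as plain Groovy with TinkerPop 3.0.x on the classpath.
import org.apache.tinkerpop.gremlin.structure.T
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
import static org.apache.tinkerpop.gremlin.process.traversal.Order.*

graph = TinkerGraph.open()
// Products p1..p12 priced 1.0..12.0, so price order equals name order
(1..12).each { i ->
    graph.addVertex(T.label, 'product', 'name', 'p' + i, 'unitPrice', i as float)
}
g = graph.traversal()

// Hypothetical helper: pageNo starts at 0
page = { int pageNo, int pageSize ->
    g.V().hasLabel('product').order().by('unitPrice', incr).
        range(pageNo * pageSize, (pageNo + 1) * pageSize).
        values('name').toList()
}

secondPage = page(1, 5)   // same window as range(5, 10)
lastPage = page(2, 5)     // only two products are left
```

Sorting before range() matters: without order(), the window is taken over whatever order the backend happens to return, so pages are not guaranteed to be stable between calls.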

Reference:

http://sql2gremlin.com