graph - How to speed up a two-hop query in TitanDB with Cassandra


I am testing TitanDB with Cassandra. The graph schema is as follows:

vertex: user(userid), ip(ip), session_id(sessionid), device(deviceid)
edge: user->ip, user->session_id, user->device
data size: 100 million vertices, 1 billion edges
index: vertex-centric indexes on each kind of edge; indexes on userid, ip, sessionid, and deviceid

I set vertex partitioning on ip, device, and session_id, with 32 partitions in total.

Cassandra hosts: 24 AWS EC2 i2.2xlarge instances. Currently, every host holds 30 GB of data.

Use case: given a userid and an edge label, find the users related to that user through the out-vertices of that edge. For example:

g.V().has(T.label, 'user').has('user_id', '12345').out('user_ip').in().valueMap()

But this kind of query is pretty slow, sometimes taking hundreds of seconds. One user can have many related IPs (hundreds), and each of these IPs can have lots of users (thousands).
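To put rough numbers on this fan-out, here is a minimal Python sketch. The two counts are hypothetical values picked from the "hundreds" and "thousands" ranges above, not measurements from the actual graph.

```python
# Rough fan-out estimate for the two-hop pattern user -> ip -> user.
# Both figures are hypothetical picks from the ranges given in the question.
ips_per_user = 300     # "hundreds" of IPs per user (assumed value)
users_per_ip = 2000    # "thousands" of users per IP (assumed value)


def two_hop_fanout(first_hop: int, second_hop: int) -> int:
    """Number of traversers reaching the second hop, duplicates included."""
    return first_hop * second_hop


unbounded = two_hop_fanout(ips_per_user, users_per_ip)
print(unbounded)  # 600000
```

Under these assumptions a single unbounded query touches several hundred thousand user vertices, and valueMap() then has to load the properties of each one.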

Does Titan run this kind of query in parallel against the partitions of the backend storage? I tried using limit:

g.V().has(T.label, 'user').has('user_id', '12345').out('user_ip').limit(50).in().limit(100).valueMap()

It's still slow. I hope this kind of query can be done in 5 seconds. How does Titan's limit() work? Does it fetch the whole result first and then apply the limit?

How can I increase its performance? Can anyone give some advice?
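On the limit() question: in Gremlin, limit() is a global step that stops pulling from the previous step once it has enough results, so in().limit(100) returns 100 users in total, not 100 per IP. The Python sketch below models that pull-based behaviour with generators. The functions out_ips and in_users and their sizes are hypothetical stand-ins, not Titan APIs, and a real Titan backend may still read adjacency lists from Cassandra in larger batches than this model suggests.

```python
from itertools import islice

reads = []  # records every simulated adjacency read


def out_ips(user):
    # Hypothetical stand-in for out('user_ip'): yields 300 IPs lazily.
    for i in range(300):
        reads.append(("ip", i))
        yield f"ip-{i}"


def in_users(ip):
    # Hypothetical stand-in for in(): yields 2000 users per IP lazily.
    for j in range(2000):
        reads.append(("user", ip, j))
        yield f"{ip}/user-{j}"


def two_hop(user, ip_limit, user_limit):
    ips = islice(out_ips(user), ip_limit)            # models limit(50)
    users = (u for ip in ips for u in in_users(ip))  # models in()
    return list(islice(users, user_limit))           # models limit(100), global


result = two_hop("12345", 50, 100)
print(len(result), len(reads))  # 100 101
```

In this model the second limit is satisfied entirely by the first IP, so only one IP and 100 users are ever read. If the real query is still slow with both limits in place, the time is probably going somewhere other than the fan-out itself, for example the initial lookup of the user vertex or the valueMap() property loads.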

One quick performance gain would be to use Titan's vertex-centric indices, which allow you to make quick leaps from one vertex to another. For example, you can try something like this:

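mgmt = graph.openManagement()
userid = mgmt.getPropertyKey('userid')
userip = mgmt.getEdgeLabel('user_ip')
// Sort key: must be a property key set on the user_ip edges (here userid, assumed to be stored on each edge)
mgmt.buildEdgeIndex(userip, 'useridbyuserip', Direction.BOTH, Order.decr, userid)
mgmt.commit()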

This creates a simple vertex-centric index.

If you want to look up multiple user IPs from multiple user vertices, then you can try using Titan-Hadoop. However, that is a more involved process.

