vendredi 18 avril 2014

neo4j - calcul coût total chemin en monogramme, prenant la directionnalité de la relation compte - Stack Overflow


Using a cypher query on neo4j, in a directed, cyclic graph I need a BFS query and a target node sorting per depth level.


For the within-depth sorting, a custom "total path cost function" should be used, calculated based on



  • all relation attributes r.followrank between start and end node.

  • relation directionality (followrank if it points towards end node, or 0 if not)


At any search depth level n, a node connected to a high ranked node at level n-m, m>0 should be ranked higher than a node connected to a low ranked node at level n-m. Reverse directionality should result in a 0 rank (which means, the node and its subtree are still part of the ranking).


I'm using neo4j community-1.9.M01. The approach I've taken so far was to extract an array of followranks for the shortest path to each end node


I thought I've come up with a great first idea for this query but it seems to break down at multiple points.


My query is:


START strt=node(7)
MATCH p=strt-[*1..]-tgt
WHERE not(tgt=strt)
RETURN ID(tgt), extract(r in rels(p): r.followrank*length(strt-[*0..]-()-[r]->() )) as rank, extract(n in nodes(p): ID(n));

which outputs


==> +-----------------------------------------------------------------+
==> | ID(tgt) | rank | extract(n in nodes(p): ID(n)) |
==> +-----------------------------------------------------------------+
==> | 14 | [1.0] | [7,14] |
==> | 15 | [1.0,1.0] | [7,14,15] |
==> | 11 | [1.0,1.0,1.0] | [7,14,15,11] |
==> | 8 | [1.0,1.0,1.0,1.0,0.0] | [7,14,15,11,7,8] |
==> | 9 | [1.0,1.0,1.0,1.0,0.0] | [7,14,15,11,7,9] |
==> | 10 | [1.0,1.0,1.0,1.0,0.0] | [7,14,15,11,7,10] |
==> | 12 | [1.0,1.0,1.0,0.0] | [7,14,15,11,12] |
==> | 8 | [0.0] | [7,8] |
==> | 9 | [0.0] | [7,9] |
==> | 10 | [0.0] | [7,10] |
==> | 11 | [1.0] | [7,11] |
==> | 15 | [1.0,1.0] | [7,11,15] |
==> | 14 | [1.0,1.0,1.0] | [7,11,15,14] |
==> | 8 | [1.0,1.0,1.0,1.0,0.0] | [7,11,15,14,7,8] |
==> | 9 | [1.0,1.0,1.0,1.0,0.0] | [7,11,15,14,7,9] |
==> | 10 | [1.0,1.0,1.0,1.0,0.0] | [7,11,15,14,7,10] |
==> | 12 | [1.0,0.0] | [7,11,12] |
==> +-----------------------------------------------------------------+
==> 17 rows
==> 38 ms

It looks similar to what I need, but the issues are



  1. nodes 8, 9, 10, 11 have the same relation direction to 7! The inverse query result ...*length(strt-[*0..]-()-[r]->() )... looks even stranger - see the queries right below.

  2. I don't know how to normalize the results of the length() expression to 1.


Directionality:


START strt=node(7)
MATCH strt<-[r]-m
RETURN ID(m), r.followrank;
==> +----------------------+
==> | ID(m) | r.followrank |
==> +----------------------+
==> | 8 | 1 |
==> | 9 | 1 |
==> | 10 | 1 |
==> | 11 | 1 |
==> +----------------------+
==> 4 rows
==> 0 ms
START strt=node(7)
MATCH strt-[r]->m
RETURN ID(m), r.followrank;
==> +----------------------+
==> | ID(m) | r.followrank |
==> +----------------------+
==> | 14 | 1 |
==> +----------------------+
==> 1 row
==> 0 ms

Inverse query:


START strt=node(7)
MATCH p=strt-[*1..]-tgt
WHERE not(tgt=strt)
RETURN ID(tgt), extract(rr in rels(p): rr.followrank*length(strt-[*0..]-()<-[rr]-() )) as rank, extract(n in nodes(p): ID(n));
==> +-----------------------------------------------------------------+
==> | ID(tgt) | rank | extract(n in nodes(p): ID(n)) |
==> +-----------------------------------------------------------------+
==> | 14 | [1.0] | [7,14] |
==> | 15 | [1.0,1.0] | [7,14,15] |
==> | 11 | [1.0,1.0,1.0] | [7,14,15,11] |
==> | 8 | [1.0,1.0,1.0,1.0,3.0] | [7,14,15,11,7,8] |
==> | 9 | [1.0,1.0,1.0,1.0,3.0] | [7,14,15,11,7,9] |
==> | 10 | [1.0,1.0,1.0,1.0,3.0] | [7,14,15,11,7,10] |
==> | 12 | [1.0,1.0,1.0,2.0] | [7,14,15,11,12] |
==> | 8 | [3.0] | [7,8] |
==> | 9 | [3.0] | [7,9] |
==> | 10 | [3.0] | [7,10] |
==> | 11 | [1.0] | [7,11] |
==> | 15 | [1.0,1.0] | [7,11,15] |
==> | 14 | [1.0,1.0,1.0] | [7,11,15,14] |
==> | 8 | [1.0,1.0,1.0,1.0,3.0] | [7,11,15,14,7,8] |
==> | 9 | [1.0,1.0,1.0,1.0,3.0] | [7,11,15,14,7,9] |
==> | 10 | [1.0,1.0,1.0,1.0,3.0] | [7,11,15,14,7,10] |
==> | 12 | [1.0,2.0] | [7,11,12] |
==> +-----------------------------------------------------------------+
==> 17 rows
==> 30 ms

So my questions are:



  1. what's going on with this query?

  2. is there a working approach?


For an additional detail, I know the min(length(path)) aggregator, but it doesn't work in this case where I'm trying to extract information about the best hit - the additional information I return about the best hit will disaggreate the result again - I think that's a cypher limitation.



Using a cypher query on neo4j, in a directed, cyclic graph I need a BFS query and a target node sorting per depth level.


For the within-depth sorting, a custom "total path cost function" should be used, calculated based on



  • all relation attributes r.followrank between start and end node.

  • relation directionality (followrank if it points towards end node, or 0 if not)


At any search depth level n, a node connected to a high ranked node at level n-m, m>0 should be ranked higher than a node connected to a low ranked node at level n-m. Reverse directionality should result in a 0 rank (which means, the node and its subtree are still part of the ranking).


I'm using neo4j community-1.9.M01. The approach I've taken so far was to extract an array of followranks for the shortest path to each end node


I thought I've come up with a great first idea for this query but it seems to break down at multiple points.


My query is:


START strt=node(7)
MATCH p=strt-[*1..]-tgt
WHERE not(tgt=strt)
RETURN ID(tgt), extract(r in rels(p): r.followrank*length(strt-[*0..]-()-[r]->() )) as rank, extract(n in nodes(p): ID(n));

which outputs


==> +-----------------------------------------------------------------+
==> | ID(tgt) | rank | extract(n in nodes(p): ID(n)) |
==> +-----------------------------------------------------------------+
==> | 14 | [1.0] | [7,14] |
==> | 15 | [1.0,1.0] | [7,14,15] |
==> | 11 | [1.0,1.0,1.0] | [7,14,15,11] |
==> | 8 | [1.0,1.0,1.0,1.0,0.0] | [7,14,15,11,7,8] |
==> | 9 | [1.0,1.0,1.0,1.0,0.0] | [7,14,15,11,7,9] |
==> | 10 | [1.0,1.0,1.0,1.0,0.0] | [7,14,15,11,7,10] |
==> | 12 | [1.0,1.0,1.0,0.0] | [7,14,15,11,12] |
==> | 8 | [0.0] | [7,8] |
==> | 9 | [0.0] | [7,9] |
==> | 10 | [0.0] | [7,10] |
==> | 11 | [1.0] | [7,11] |
==> | 15 | [1.0,1.0] | [7,11,15] |
==> | 14 | [1.0,1.0,1.0] | [7,11,15,14] |
==> | 8 | [1.0,1.0,1.0,1.0,0.0] | [7,11,15,14,7,8] |
==> | 9 | [1.0,1.0,1.0,1.0,0.0] | [7,11,15,14,7,9] |
==> | 10 | [1.0,1.0,1.0,1.0,0.0] | [7,11,15,14,7,10] |
==> | 12 | [1.0,0.0] | [7,11,12] |
==> +-----------------------------------------------------------------+
==> 17 rows
==> 38 ms

It looks similar to what I need, but the issues are



  1. nodes 8, 9, 10, 11 have the same relation direction to 7! The inverse query result ...*length(strt-[*0..]-()-[r]->() )... looks even stranger - see the queries right below.

  2. I don't know how to normalize the results of the length() expression to 1.


Directionality:


START strt=node(7)
MATCH strt<-[r]-m
RETURN ID(m), r.followrank;
==> +----------------------+
==> | ID(m) | r.followrank |
==> +----------------------+
==> | 8 | 1 |
==> | 9 | 1 |
==> | 10 | 1 |
==> | 11 | 1 |
==> +----------------------+
==> 4 rows
==> 0 ms
START strt=node(7)
MATCH strt-[r]->m
RETURN ID(m), r.followrank;
==> +----------------------+
==> | ID(m) | r.followrank |
==> +----------------------+
==> | 14 | 1 |
==> +----------------------+
==> 1 row
==> 0 ms

Inverse query:


START strt=node(7)
MATCH p=strt-[*1..]-tgt
WHERE not(tgt=strt)
RETURN ID(tgt), extract(rr in rels(p): rr.followrank*length(strt-[*0..]-()<-[rr]-() )) as rank, extract(n in nodes(p): ID(n));
==> +-----------------------------------------------------------------+
==> | ID(tgt) | rank | extract(n in nodes(p): ID(n)) |
==> +-----------------------------------------------------------------+
==> | 14 | [1.0] | [7,14] |
==> | 15 | [1.0,1.0] | [7,14,15] |
==> | 11 | [1.0,1.0,1.0] | [7,14,15,11] |
==> | 8 | [1.0,1.0,1.0,1.0,3.0] | [7,14,15,11,7,8] |
==> | 9 | [1.0,1.0,1.0,1.0,3.0] | [7,14,15,11,7,9] |
==> | 10 | [1.0,1.0,1.0,1.0,3.0] | [7,14,15,11,7,10] |
==> | 12 | [1.0,1.0,1.0,2.0] | [7,14,15,11,12] |
==> | 8 | [3.0] | [7,8] |
==> | 9 | [3.0] | [7,9] |
==> | 10 | [3.0] | [7,10] |
==> | 11 | [1.0] | [7,11] |
==> | 15 | [1.0,1.0] | [7,11,15] |
==> | 14 | [1.0,1.0,1.0] | [7,11,15,14] |
==> | 8 | [1.0,1.0,1.0,1.0,3.0] | [7,11,15,14,7,8] |
==> | 9 | [1.0,1.0,1.0,1.0,3.0] | [7,11,15,14,7,9] |
==> | 10 | [1.0,1.0,1.0,1.0,3.0] | [7,11,15,14,7,10] |
==> | 12 | [1.0,2.0] | [7,11,12] |
==> +-----------------------------------------------------------------+
==> 17 rows
==> 30 ms

So my questions are:



  1. what's going on with this query?

  2. is there a working approach?


For an additional detail, I know the min(length(path)) aggregator, but it doesn't work in this case where I'm trying to extract information about the best hit - the additional information I return about the best hit will disaggreate the result again - I think that's a cypher limitation.


0 commentaires:

Enregistrer un commentaire