com.bigdata.rdf.graph.analytics

Class FuzzySSSP

• All Implemented Interfaces:
Callable<FuzzySSSP.FuzzySSSPResult>

public class FuzzySSSP
extends Object
implements Callable<FuzzySSSP.FuzzySSSPResult>
This algorithm provides a fuzzy implementation of the shortest paths between a set of source vertices and a set of target vertices. This can be used to identify a set of vertices that are close to the shortest paths between those source and target vertices. For some domains, the resulting set of vertices can be understood as an "interesting subgraph".

Problem: We want to find a set of not more than N vertices out of a data set that are "close" to the shortest path between two sets of vertices.

Approach: We want to find the set of SP (Shortest Path) vertices that lie along the shortest path between each source vertex and each target vertex. We would also like to know whether a source is connected to each target. To do this, we do NSOURCES SSSP traversals. For each traversal, we note the depth of each target from each source, and mark the depth as -1 if the target was not reachable from that source. The vertices along the shortest path to the target are collected. The sets of collected vertices are merged and duplicates are removed.

Finally, we do a BFS starting with all of the vertices in that merged collection and stopping when we have N vertices, including those along the shortest paths. This grows the initial set of vertices that lie along the shortest paths into a broader collection of vertices that are close to that shortest path.

Outputs: The N vertices, their distances from the shortest paths (which we get out of the final BFS), and the distance of each target from each source along the shortest path (which we get from the per-source SSSP traversals). TODO Support breaking out of the analytic as soon as the frontier is known to contain at least N(=2k) distinct vertices (or M=10k edges). Note that for frontier implementations that allow duplicates, this means that you need to wait for the end of the iteration to make the decision. We already support a decision point at the end of each iteration. This would allow us to lift the decision point inside of the iteration and terminate processing eagerly when the frontier size exceeds a specified value. TODO: Implement unit test with ground truth.

Author:
Bryan Thompson
• Constructor Detail

• FuzzySSSP

public FuzzySSSP(org.openrdf.model.Value[] src,
org.openrdf.model.Value[] tgt,
int N,
IGASEngine gasEngine,
IGraphAccessor graphAccessor)
Parameters:
src - The source vertices (there must be at least one).
tgt - The target vertices (there must be at least one).
N - The maximum number of vertices to report (must be positive), i.e., the stopping criteria for the BFS expansion.
gasEngine - The IGASEngine will be used to execute the analytic.
graphAccessor - The object used to access the graph.