PR (Blazegraph Database Platform 2.1.5 API)

java.lang.Object
- com.bigdata.rdf.graph.impl.BaseGASProgram<PR.VS,PR.ES,Double>
- - com.bigdata.rdf.graph.analytics.PR

All Implemented Interfaces:

IBindingExtractor<PR.VS,PR.ES,Double>, IGASOptions<PR.VS,PR.ES,Double>, IGASProgram<PR.VS,PR.ES,Double>
```
public class PR
extends BaseGASProgram<PR.VS,PR.ES,Double>
```
Page rank assigns weights to the vertices in a graph based on their relative "importance" as determined by the patterns of directed links in the graph. The algorithm is stated in terms of a computation that is iterated until the delta in the computed values for the vertices is within epsilon of ZERO. However, in practice, convergence based on epsilon is problematic due to the manner in which the results of the floating point operations depend on the sequence of those operations (which is why this implementation uses double precision). Thus, page rank is typically executed for a specific number of iterations, e.g., 50 or 100. If convergence is based on epsilon, then it is possible that the computation will never converge, especially for smaller values of epsilon.
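The order dependence of floating point results mentioned above can be seen in a few lines of plain Java. This is only an illustrative sketch; the class and method names are hypothetical and are not part of the Blazegraph API.

```java
// Hypothetical illustration (not Blazegraph code): the same three values
// summed in two different orders give two different double results.
final class SummationOrderSketch {

    /** Sums the values from the first element to the last. */
    static double sumForward(double[] xs) {
        double s = 0.0;
        for (int i = 0; i < xs.length; i++) {
            s += xs[i];
        }
        return s;
    }

    /** Sums the values from the last element to the first. */
    static double sumBackward(double[] xs) {
        double s = 0.0;
        for (int i = xs.length - 1; i >= 0; i--) {
            s += xs[i];
        }
        return s;
    }

    public static void main(String[] args) {
        double[] xs = {0.1, 0.2, 0.3};
        // The two printed values differ in the last digit.
        System.out.println(sumForward(xs));
        System.out.println(sumBackward(xs));
    }
}
```

Because the edges of a vertex may be visited in any order, a delta that is compared against a very small epsilon can keep flipping between "changed" and "unchanged" for reasons unrelated to the algorithm itself.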

init

All vertices are inserted into the initial frontier.

Gather

sum( neighbor_value / neighbor_num_out_edges ) over the in-edges of the graph.

Apply

value = resetProb + (1.0 - resetProb) * gatherSum

Scatter

if (a) value has significantly changed (fabs(old-new) GT epsilon); or (b) iterations LT limit
- where resetProb is a value that determines a random reset probability and defaults to DEFAULT_RESET_PROB.
- where epsilon controls the degree of convergence before the algorithm terminates and defaults to DEFAULT_EPSILON.
FIXME PR UNIT TEST. Verify computed values (within variance) and max iterations. Ground truth from GL?
FIXME PR: The out-edges can be taken directly from the vertex distribution. That will reduce the initialization overhead for PR.
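The init, Gather and Apply steps described above can be sketched without any Blazegraph types. The following is a minimal, self-contained illustration that runs a fixed number of rounds over an adjacency array. The class name, the array-based graph representation and the value 0.15 used in the usage note are assumptions made for the example; they are not taken from this class.

```java
import java.util.Arrays;

// Hypothetical sketch of the computation described above; it does not use
// or reproduce the Blazegraph GAS engine.
final class PageRankSketch {

    /**
     * @param outEdges  outEdges[u] lists the target vertices of u's out-edges
     * @param resetProb the random reset probability
     * @param limit     the number of rounds to execute
     * @return the value computed for each vertex
     */
    static double[] rank(int[][] outEdges, double resetProb, int limit) {
        int n = outEdges.length;
        double[] value = new double[n];
        // init: every vertex starts at the reset probability.
        Arrays.fill(value, resetProb);
        for (int round = 0; round < limit; round++) {
            double[] gatherSum = new double[n];
            // GATHER + SUM: each vertex v accumulates
            // neighbor_value / neighbor_num_out_edges over its in-edges.
            // Walking the out-edges of u and adding into the target v
            // produces the same per-vertex sums.
            for (int u = 0; u < n; u++) {
                for (int v : outEdges[u]) {
                    gatherSum[v] += value[u] / outEdges[u].length;
                }
            }
            // APPLY: value = resetProb + (1.0 - resetProb) * gatherSum
            for (int v = 0; v < n; v++) {
                value[v] = resetProb + (1.0 - resetProb) * gatherSum[v];
            }
        }
        return value;
    }
}
```

For example, `PageRankSketch.rank(new int[][] {{1}, {2}, {0}}, 0.15, 50)` ranks a three-vertex cycle; by symmetry all three vertices receive the same value. The sketch stops purely on the round count, which mirrors the recommendation to run a specific number of iterations rather than rely on epsilon.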
Author:

Bryan Thompson

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static interface`	`PR.Bindings` Additional `IBindingExtractor.IBinder`s exposed by `PR`.
`static class`	`PR.ES` Edge state is not used.
`class`	`PR.PageRankReducer` Class reports a map containing the page rank associated with each visited vertex.
`static class`	`PR.VS`

Field Summary

Fields
Modifier and Type	Field and Description
`protected static double`	`DEFAULT_EPSILON`
`protected static int`	`DEFAULT_LIMIT`
`protected static double`	`DEFAULT_MIN_PAGE_RANK`
`protected static double`	`DEFAULT_RESET_PROB`

Constructor Summary

Constructors
Constructor and Description

`PR()`

Method Summary

Methods
Modifier and Type	Method and Description
`PR.VS`	`apply(IGASState<PR.VS,PR.ES,Double> state, org.openrdf.model.Value u, Double sum)` Apply the reduced aggregation computed by GATHER + SUM to the vertex.
`Double`	`gather(IGASState<PR.VS,PR.ES,Double> state, org.openrdf.model.Value u, org.openrdf.model.Statement e)` GATHER is a map/reduce over the edges of the vertex.
`List<IBinder<PR.VS,PR.ES,Double>>`	`getBinderList()` Return a list of interfaces that may be used to extract variable bindings for the vertices visited by the algorithm.
`Factory<org.openrdf.model.Statement,PR.ES>`	`getEdgeStateFactory()` Return a factory for edge state objects -or- `null` if the `IGASProgram` does not use edge state (in which case the edge state will not be allocated or maintained).
`EdgesEnum`	`getGatherEdges()` Return the set of edges to which the GATHER is applied for a directed graph -or- `EdgesEnum.NoEdges` to skip the GATHER phase.
`FrontierEnum`	`getInitialFrontierEnum()` Return the nature of the initial frontier for this algorithm.
`EdgesEnum`	`getScatterEdges()` Return the set of edges to which the SCATTER is applied for a directed graph -or- `EdgesEnum.NoEdges` to skip the SCATTER phase.
`Factory<org.openrdf.model.Value,PR.VS>`	`getVertexStateFactory()` Return a factory for vertex state objects.
`void`	`initVertex(IGASContext<PR.VS,PR.ES,Double> ctx, IGASState<PR.VS,PR.ES,Double> state, org.openrdf.model.Value u)` Callback to initialize the state for each vertex in the initial frontier before the first iteration.
`boolean`	`isChanged(IGASState<PR.VS,PR.ES,Double> state, org.openrdf.model.Value u)` Return `true` iff the vertex should run its SCATTER phase.
`boolean`	`nextRound(IGASContext<PR.VS,PR.ES,Double> ctx)` Return `true` iff the algorithm should continue.
`void`	`scatter(IGASState<PR.VS,PR.ES,Double> state, IGASScheduler sch, org.openrdf.model.Value u, org.openrdf.model.Statement e)` The remote vertex is scheduled for activation unless it has already been visited.
`Double`	`sum(IGASState<PR.VS,PR.ES,Double> state, Double left, Double right)` SUM

Methods inherited from class com.bigdata.rdf.graph.impl.BaseGASProgram
before, getSampleEdgesFilter

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - DEFAULT_LIMIT
```
protected static final int DEFAULT_LIMIT
```
    See Also:
    Constant Field Values
  - DEFAULT_RESET_PROB
```
protected static final double DEFAULT_RESET_PROB
```
    See Also:
    Constant Field Values
  - DEFAULT_EPSILON
```
protected static final double DEFAULT_EPSILON
```
    See Also:
    Constant Field Values
  - DEFAULT_MIN_PAGE_RANK
```
protected static final double DEFAULT_MIN_PAGE_RANK
```
    See Also:
    Constant Field Values
- Constructor Detail
  - PR
```
public PR()
```
- Method Detail
  - getVertexStateFactory
```
public Factory<org.openrdf.model.Value,PR.VS> getVertexStateFactory()
```
    Description copied from interface: IGASOptions
    
    Return a factory for vertex state objects.
    Note: A null value may not be allowed in the visited vertex map, so if the algorithm does not use vertex state, then the factory should return a singleton instance each time it is invoked.
  - getEdgeStateFactory
```
public Factory<org.openrdf.model.Statement,PR.ES> getEdgeStateFactory()
```
    Description copied from class: BaseGASProgram
    
    Return a factory for edge state objects -or- null if the IGASProgram does not use edge state (in which case the edge state will not be allocated or maintained).
    The default implementation returns null. Override this if the algorithm uses per-edge computation state.
    
    Specified by:
    
    getEdgeStateFactory in interface IGASOptions<PR.VS,PR.ES,Double>
    
    Overrides:
    
    getEdgeStateFactory in class BaseGASProgram<PR.VS,PR.ES,Double>
  - getInitialFrontierEnum
```
public FrontierEnum getInitialFrontierEnum()
```
    Description copied from interface: IGASOptions
    
    Return the nature of the initial frontier for this algorithm.
  - getGatherEdges
```
public EdgesEnum getGatherEdges()
```
    Description copied from class: BaseGASProgram
    
    Return the set of edges to which the GATHER is applied for a directed graph -or- EdgesEnum.NoEdges to skip the GATHER phase. This will be interpreted based on the value reported by IGASContext#isDirectedTraversal().
    TODO We may need to set this dynamically when visiting the vertex in the frontier rather than having it be a one-time property of the vertex program.
    The default gathers on the EdgesEnum.InEdges.
    
    Specified by:
    
    getGatherEdges in interface IGASOptions<PR.VS,PR.ES,Double>
    
    Overrides:
    
    getGatherEdges in class BaseGASProgram<PR.VS,PR.ES,Double>
  - getScatterEdges
```
public EdgesEnum getScatterEdges()
```
    Description copied from class: BaseGASProgram
    
    Return the set of edges to which the SCATTER is applied for a directed graph -or- EdgesEnum.NoEdges to skip the SCATTER phase. This will be interpreted based on the value reported by IGASContext#isDirectedTraversal().
    The default scatters on the EdgesEnum.OutEdges.
    
    Specified by:
    
    getScatterEdges in interface IGASOptions<PR.VS,PR.ES,Double>
    
    Overrides:
    
    getScatterEdges in class BaseGASProgram<PR.VS,PR.ES,Double>
  - initVertex
```
public void initVertex(IGASContext<PR.VS,PR.ES,Double> ctx,
              IGASState<PR.VS,PR.ES,Double> state,
              org.openrdf.model.Value u)
```
    Callback to initialize the state for each vertex in the initial frontier before the first iteration. A typical use case is to set the distance of the starting vertex to ZERO (0).
    The default is a NOP.
    Each vertex is initialized to the reset probability. FIXME We need to do this efficiently. E.g., using a scan to find all of the vertices together with their in-degree or out-degree. That should be done to populate the frontier, initializing the #of out-edges at the same time.
    
    Specified by:
    
    initVertex in interface IGASProgram<PR.VS,PR.ES,Double>
    
    Overrides:
    
    initVertex in class BaseGASProgram<PR.VS,PR.ES,Double>
    
    u - The vertex. TODO We do not need both the IGASContext and the IGASState. The latter is available from the former.
  - gather
```
public Double gather(IGASState<PR.VS,PR.ES,Double> state,
            org.openrdf.model.Value u,
            org.openrdf.model.Statement e)
```
    GATHER is a map/reduce over the edges of the vertex. The SUM provides pair-wise reduction over the edges visited by the GATHER.
    
    u - The vertex for which the gather is being performed. The gather will be invoked for each edge incident on u (as specified by IGASOptions.getGatherEdges()).
    e - An edge (s,p,o).
    
    Returns:
    The new edge state accumulant. FIXME DESIGN: The problem with pushing the ISPO onto the ES is that we are then forced to maintain edge state (for the purposes of accessing those ISPO references) even if the algorithm does not require any memory for the edge state!
    Note: by lazily resolving the vertex and/or edge state in the GAS callback methods we avoid eagerly materializing data that we do not need. [Lazy resolution does not work on a cluster. The only available semantics there are lazy resolution of state that was materialized in order to support a gather() or scatter() for a vertex.]
    Note: The state associated with the source/target vertex and the edge should all be immutable for the GATHER. The vertex state should only be mutable for the APPLY(). The target vertex state and/or edge state MAY be mutable for the SCATTER, but that depends on the algorithm. How can we get these constraints into the API?
  - sum
```
public Double sum(IGASState<PR.VS,PR.ES,Double> state,
         Double left,
         Double right)
```
    SUM
    SUM is a pair-wise reduction that is applied during the GATHER.
    
    left - An edge state accumulant.
    right - Another edge state accumulant.
    
    Returns:
    Their "sum". TODO DESIGN: Rather than pair-wise reduction, why not use vectored reduction? That way we could use an array of primitives as well as objects. TODO DESIGN: This should be a reduced interface since we only need access to the comparator semantics while the [state] provides random access to vertex and edge state. The comparator is necessary for MIN semantics for the Value implementation of the backend. E.g., Value versus IV.
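    The pair-wise reduction described above can be pictured as a left-to-right fold over the per-edge values returned by GATHER. The sketch below is a hypothetical stand-alone illustration; the class name and the use of a null accumulant for "no edges visited" are assumptions that mirror the wording of this page rather than the engine's actual code.

```java
import java.util.function.BinaryOperator;

// Hypothetical sketch (not Blazegraph code) of applying a pair-wise SUM
// to the values produced by GATHER for one vertex.
final class PairwiseSumSketch {

    /**
     * Folds the per-edge contributions using the supplied pair-wise sum.
     *
     * @return the reduced value, or null if there were no contributions
     */
    static Double reduce(double[] contributions, BinaryOperator<Double> sum) {
        Double acc = null; // null stands for "no edges were visited"
        for (double c : contributions) {
            // The first contribution seeds the accumulant; later ones are
            // combined with it two at a time.
            acc = (acc == null) ? Double.valueOf(c) : sum.apply(acc, c);
        }
        return acc;
    }
}
```

    For page rank the pair-wise operator is ordinary addition, e.g. `PairwiseSumSketch.reduce(new double[] {0.25, 0.5, 0.25}, Double::sum)`.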
  - apply
```
public PR.VS apply(IGASState<PR.VS,PR.ES,Double> state,
          org.openrdf.model.Value u,
          Double sum)
```
    Apply the reduced aggregation computed by GATHER + SUM to the vertex.
    Compute the new value for this vertex, making a note of the last change for this vertex.
    
    u - The vertex.
    sum - The aggregated accumulant across the edges as computed by GATHER and SUM -or- null if there is no accumulant (this will happen if the GATHER did not find any edges to visit).
    
    Returns:
    The new state for the vertex. TODO How to indicate if there is no state change? return the same object? This only matters with secondary storage for the vertex state. Alternative is to side-effect the vertex state, but then we can not manage the barriers (BFS versus asynchronous). Except for indicating that the state is dirty, we do not need a return value. [The pattern appears to be that a vertex leaves a marker on its vertex state object indicating whether or not it was changed and then tests that state when deciding whether or not to scatter]. TODO There could be a big win here if we are able to detect when a newly initialized vertex state does not "escape" and simply not store it. For some graphs, the vertexState map grows very rapidly when compared to either the frontier or the set of states that have been in the frontier during the computation.
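    The pattern described above, in which APPLY leaves a marker on the vertex state that is later tested when deciding whether to scatter, might look roughly like the following. This is a hypothetical sketch, not the actual PR.VS class; in particular, treating a null sum as 0.0 is an assumption made for the example.

```java
// Hypothetical vertex state (not the real PR.VS) that records the size of
// the last change so that a later isChanged() test can consult it.
final class VertexStateSketch {

    private double value;
    private double lastChange;

    VertexStateSketch(double initialValue) {
        this.value = initialValue;
    }

    /** APPLY: value = resetProb + (1.0 - resetProb) * gatherSum. */
    void apply(double resetProb, Double gatherSum) {
        // Assumption: a missing accumulant (no edges gathered) counts as 0.0.
        double sum = (gatherSum == null) ? 0.0 : gatherSum;
        double newValue = resetProb + (1.0 - resetProb) * sum;
        lastChange = Math.abs(newValue - value); // note the last change
        value = newValue;
    }

    /** True iff the last change was greater than epsilon. */
    boolean isChanged(double epsilon) {
        return lastChange > epsilon;
    }

    double value() {
        return value;
    }
}
```

    Keeping the marker on the state object means the scatter decision needs no second pass over the edges.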
  - isChanged
```
public boolean isChanged(IGASState<PR.VS,PR.ES,Double> state,
                org.openrdf.model.Value u)
```
    Return true iff the vertex should run its SCATTER phase. This may be used to avoid visiting the edges if it is known (e.g., based on the APPLY) that the vertex has not changed. This can save a substantial amount of effort.
    The default implementation returns true. Override this if you know whether or not the computation state of this vertex has changed.
    Returns true iff the last change was greater than epsilon.
    
    Specified by:
    
    isChanged in interface IGASProgram<PR.VS,PR.ES,Double>
    
    Overrides:
    
    isChanged in class BaseGASProgram<PR.VS,PR.ES,Double>
    
    u - The vertex.
    
    Returns:
    true iff the vertex should run its SCATTER phase.
  - scatter
```
public void scatter(IGASState<PR.VS,PR.ES,Double> state,
           IGASScheduler sch,
           org.openrdf.model.Value u,
           org.openrdf.model.Statement e)
```
    The remote vertex is scheduled for activation unless it has already been visited.
    Note: We are scattering to out-edges. Therefore, this vertex is Statement.getSubject(). The remote vertex is Statement.getObject().
    
    u - The vertex for which the scatter is being performed.
    e - The edge.
  - nextRound
```
public boolean nextRound(IGASContext<PR.VS,PR.ES,Double> ctx)
```
    Return true iff the algorithm should continue. This is invoked after every iteration, once the new frontier has been computed and IGASState.round() has been advanced. An implementation may simply return true, in which case the algorithm will continue IFF the current frontier is not empty.
    Note: While this can be used to make custom decisions concerning the halting criteria, it can also be used as an opportunity to handshake with a custom IGraphAccessor in order to process a dynamic graph.
    The default returns true.
    Continue unless the iteration limit has been reached.
    
    Specified by:
    
    nextRound in interface IGASProgram<PR.VS,PR.ES,Double>
    
    Overrides:
    
    nextRound in class BaseGASProgram<PR.VS,PR.ES,Double>
    
    Parameters:
    ctx - The evaluation context.
    
    Returns:
    true if the algorithm should continue (as long as the frontier is non-empty).
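    The interaction between the frontier and nextRound described above can be sketched as a simple driver loop. Everything here is hypothetical: the per-round frontier sizes are passed in as an array purely for illustration, and the `round < limit` test is an assumed reading of "continue unless the iteration limit has been reached", not the library's code.

```java
// Hypothetical driver loop (not the Blazegraph engine) showing when a
// nextRound-style callback is consulted.
final class RoundLoopSketch {

    /** Assumed halting rule: continue while the round count is below the limit. */
    static boolean nextRound(int round, int limit) {
        return round < limit;
    }

    /**
     * @param frontierSizes the frontier size at the start of each round
     * @param limit         the iteration limit
     * @return the number of rounds that were executed
     */
    static int run(int[] frontierSizes, int limit) {
        int round = 0;
        // The computation only proceeds while the frontier is non-empty.
        while (round < frontierSizes.length && frontierSizes[round] > 0) {
            // GATHER, APPLY and SCATTER for this round would run here.
            round++; // the round counter is advanced first ...
            if (!nextRound(round, limit)) {
                break; // ... and then the program is asked whether to continue
            }
        }
        return round;
    }
}
```

    With this rule, `RoundLoopSketch.run(new int[] {3, 3, 3, 3, 3}, 2)` stops after two rounds even though the frontier is still populated, while an empty frontier ends the loop regardless of the limit.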
  - getBinderList
```
public List<IBinder<PR.VS,PR.ES,Double>> getBinderList()
```
    Return a list of interfaces that may be used to extract variable bindings for the vertices visited by the algorithm.
    
    The visited vertex itself.
    
    The page rank associated with the vertex.
    
    Specified by:
    
    getBinderList in interface IBindingExtractor<PR.VS,PR.ES,Double>
    
    Overrides:
    
    getBinderList in class BaseGASProgram<PR.VS,PR.ES,Double>
