FAST-PPR: scaling personalized pagerank estimation for large graphs

Abstract

We propose a new algorithm, FAST-PPR, for estimating personalized PageRank: given start node $s$ and target node $t$ in a directed graph, and given a threshold $\delta$, FAST-PPR estimates the Personalized PageRank $\pi_s(t)$ from $s$ to $t$, guaranteeing a small relative error as long $\pi_s(t)>\delta$. Existing algorithms for this problem have a running-time of $\Omega(1/\delta)$; in comparison, FAST-PPR has a provable average running-time guarantee of $O(\sqrt{ d/\delta})$ (where $d$ is the average in-degree of the graph). This is a significant improvement, since $\delta$ is often $O(1/n)$ (where $n$ is the number of nodes) for applications. We also complement the algorithm with an $\Omega(1/\sqrt{\delta})$ lower bound for PageRank estimation, showing that the dependence on $\delta$ cannot be improved.

We perform a detailed empirical study on numerous massive graphs, showing that FAST-PPR dramatically outperforms existing algorithms. For example, on the 2010 Twitter graph with 1.5 billion edges, for target nodes sampled by popularity, FAST-PPR has a 20 factor speedup over the state of the art. Furthermore, an enhanced version of FAST-PPR has a 160 factor speedup on the Twitter graph, and is at least 20 times faster on all our candidate graphs.

Publication
KDD ‘14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
Siddhartha Banerjee
Siddhartha Banerjee
Associate Professor

Sid Banerjee is an associate professor in the School of Operations Research at Cornell, working on topics at the intersection of data-driven decision-making and stochastic control, economics and computation, and large-scale network algorithms.

Related