NAME
Algorithm::PageRank::XS - Fast PageRank implementation
DESCRIPTION
Algorithm::PageRank does some pagerank calculations, but it's slow and
memory intensive. This module was developed to compute pagerank on graphs
with millions of arcs. It will not, however, scale up to quadrillions of
arcs unless you have a lot of local memory. This is not a distributed
algorithm.
SYNOPSIS
use Algorithm::PageRank::XS;
my $pr = Algorithm::PageRank::XS->new(alpha => 0.85);
$pr->graph([
0 => 1,
0 => 2,
1 => 0,
2 => 1,
]
);
$pr->result();
# This simple program reads tab-separated arcs from input and prints the ranks.
use Algorithm::PageRank::XS;
my $pr = Algorithm::PageRank::XS->new(alpha => 0.85);
while (<>) {
chomp;
my ($from, $to) = split(/\t/, $_);
$pr->add_arc($from, $to);
}
# Fetch the result once so that each() walks a single hash.
my $result = $pr->result();
while (my ($name, $rank) = each(%$result)) {
print("$name,$rank\n");
}
CONSTRUCTORS
new %PARAMS
Create a new PageRank object. Parameters are: "alpha", "max_tries",
and "convergence". "alpha" is the damping constant (how far from the
true eigenvector you are). "max_tries" is the maximum number of
iterations to run. "convergence" is how close successive rank vectors
must be before the computation stops.
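To illustrate how the three parameters interact, here is a minimal pure-Perl sketch of a damped power iteration. It is not the module's XS implementation. The helper name "pagerank_sketch", the default values for "max_tries" and "convergence", the L1 stopping test, and the even redistribution of rank from nodes without outgoing arcs are all assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Algorithm::PageRank::XS: a plain-Perl
# damped power iteration over a list of [from, to] arcs.
sub pagerank_sketch {
    my (%args) = @_;
    my $alpha       = $args{alpha}       // 0.85;   # damping constant
    my $max_tries   = $args{max_tries}   // 200;    # assumed default
    my $convergence = $args{convergence} // 1e-6;   # assumed default
    my @arcs        = @{ $args{arcs} };

    # Collect the node set and each node's outgoing links.
    my (%out, %nodes);
    for my $arc (@arcs) {
        my ($from, $to) = @$arc;
        push @{ $out{$from} }, $to;
        $nodes{$_} = 1 for $from, $to;
    }
    my @nodes = sort keys %nodes;
    my $n     = scalar @nodes;

    # Start from the uniform distribution.
    my %rank = map { $_ => 1 / $n } @nodes;

    for my $try (1 .. $max_tries) {
        # Rank held by nodes with no outgoing arcs is spread evenly (assumption).
        my $dangling = 0;
        $dangling += $rank{$_} for grep { !exists $out{$_} } @nodes;

        my $base = (1 - $alpha) / $n + $alpha * $dangling / $n;
        my %next = map { $_ => $base } @nodes;
        for my $from (keys %out) {
            my $share = $alpha * $rank{$from} / scalar @{ $out{$from} };
            $next{$_} += $share for @{ $out{$from} };
        }

        # The L1 distance between successive vectors decides when to stop.
        my $delta = 0;
        $delta += abs($next{$_} - $rank{$_}) for @nodes;
        %rank = %next;
        last if $delta < $convergence;
    }
    return \%rank;
}

my $ranks = pagerank_sketch(
    alpha => 0.85,
    arcs  => [ [0, 1], [0, 2], [1, 0], [2, 1] ],
);
printf "%s => %.4f\n", $_, $ranks->{$_} for sort keys %$ranks;
```

In this sketch a smaller "convergence" or a larger "max_tries" trades running time for precision, while "alpha" controls how much of each node's rank follows its links rather than being spread uniformly.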
METHODS
add_arc
Add an arc to the pagerank object before running the computation.
Node names can be any values, so you can write:
$pr->add_arc("Apple", "Orange");
to mean that "Apple" links to "Orange".
graph
Add a graph, given as a reference to a flat array of from, to pairs.
This is equivalent to calling "add_arc" once per pair, but may be more
convenient.
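The flat layout can be read as consecutive from, to pairs. The sketch below shows that pairing in plain Perl. The helper "arcs_from_flat" is hypothetical and exists only for this example; it is not part of the module and makes no claim about how "graph" is implemented internally.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: turn the flat
# (from, to, from, to, ...) list into [from, to] pairs.
sub arcs_from_flat {
    my ($flat) = @_;
    die "odd number of elements in arc list\n" if @$flat % 2;
    my @pairs;
    for (my $i = 0; $i < @$flat; $i += 2) {
        push @pairs, [ $flat->[$i], $flat->[$i + 1] ];
    }
    return @pairs;
}

my @arcs = arcs_from_flat([
    0 => 1,
    0 => 2,
    1 => 0,
    2 => 1,
]);

# Each pair corresponds to one $pr->add_arc($from, $to) call.
print "add_arc($_->[0], $_->[1])\n" for @arcs;
```

The fat comma is only a visual cue here: Perl treats it like an ordinary comma between the numbers, so the list really is flat.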
result
Compute the pagerank vector, and return it as a hash.
Whatever you called the nodes when specifying the arcs will be the
keys of this hash, and the values will be the corresponding entries
of the pagerank vector (which should sum to 1).
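A common next step is to list the nodes by rank and check the sum. The sketch below does this on a placeholder hash reference shaped like the one described above. The node names and values are made up for illustration and are not output from the module.

```perl
use strict;
use warnings;

# Placeholder data shaped like the returned hash; the values are
# made up for illustration and are not output from the module.
my $result = { Apple => 0.2, Orange => 0.5, Pear => 0.3 };

# Sort node names by descending rank.
my @by_rank = sort { $result->{$b} <=> $result->{$a} } keys %$result;
printf "%-8s %.3f\n", $_, $result->{$_} for @by_rank;

# The ranks form a probability vector, so they should sum to about 1.
my $total = 0;
$total += $_ for values %$result;
warn "ranks sum to $total, expected about 1\n" if abs($total - 1) > 1e-6;
```

Sorting the keys rather than calling each() gives a stable, ranked listing, which is usually what a report needs.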
PERFORMANCE
This module is pretty fast. I ran this on a 1 million node set with 4.5
million arcs in 57 seconds on my 32-bit 1.8GHz laptop. Let me know if
you have any performance tips.
COPYRIGHT
Copyright (C) 2008 by Michael Axiak
This package is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.