Graphs

please contact Christian Sommer for comments and questions, or if you have other data sets.
last update April 2010

used for shortest path queries, DIMACS means 9th DIMACS Implementation Challenge - Shortest Paths


DBLP graph, degrees, log log, showing power law

DBLP graph

The DBLP Computer Science Bibliography
co-author graph
largest connected component
Web graph, degrees, log log, showing power law

Web graph

WebGraph by the Laboratory for Web Algorithmics
link graph
interpreted as undirected graph (in which case it is already connected)
Router topology, degrees, log log, showing power law

Router topology

CAIDA's Router-Level Topology Measurements
"The [...] data file holds link directions corresponding to the traceroute directions."
second file (itdk0304_rlinks_undirected), interpreted as undirected graph, largest connected component
Citation graph, degrees, log log, showing power law

Citation graph

KDD competition, citation graph of the hep-th portion of the arXiv
hep-th citations tarball, interpreted as undirected graph, largest connected component

Database of Interacting Proteins

DIP, downloaded dip20090126.txt
experimentally determined interactions between proteins
interpreted as undirected graph, largest connected component

BioGRID

BioGRID, downloaded HC-BIOGRID-2.0.31.tab
Biogrid: A General Repository for Interaction Datasets, database of physical and genetic interactions
interpreted as undirected graph, largest connected component

DIMACS format

copied from DIMACS

A graph contains n nodes and m arcs
Nodes are identified by integers 1...n
Graphs can be interpreted as directed or undirected, depending on the problem being studied
Graphs can have parallel arcs and self-loops
Arc weights are signed integers
By convention, shortest path graph file names should have the suffix .gr. Line types are as follows. In the format descriptions below, bold characters should appear exactly as typed:
Comment lines can appear anywhere and are ignored by programs. c This is a comment
The problem line is unique and must appear as the first non-comment line. This line has the format on the right, where n and m are the number of nodes and the number of arcs, respectively. p sp n m
Arc descriptors are of the form on the right, where U and V are the tail and the head node ids, respectively, and W is the arc weight. a U V W

Some of the DIMACS graph files that come with coordinates can be visualized using Google Earth and KML. The program compiled from the source code kmlgen.cpp takes as input a .gr and a .co file of the same graph and generates a .kml file.