GRASP is aimed at developing scalable high-performance solutions for graph processing employing resources available on a heterogeneous computing cluster. The key goal of this work is to develop easy to use programming frameworks with sophisticated runtimes that handle diverse and emerging scenarios. To date GRASP has achieved the following:
MULTIQUERY & BATCHING SCENARIOS
We have developed the distributed MultiLyra [BigData'19] system whose scalablity via overhead amortization enables simultaneous evaluation of batches of hundreds of iterative graph queries. BEAD [BigData'20] extends MultiLyra to consider scenarios in which a batch of queries needs to be continuously reevaluated due to changes to the graph. For the shared-memory setting, we have developed VRGQ [SIGOPS'21] that employs value reuse to speedup evaluation of future graph queries using results of prior queries. SimGQ [HiPC'20] is an online system that combines value reuse with batching. It optimizes evaluation of a batch of queries by reusing results of common subcomputations. SimGQ+ [JPDC'22] extends this batched system to evaluate point-to-point queries. It is based upon the PnP [ASPLOS'19] system that efficiently evaluates point-to-point graph queries. The distinctive feature of PnP is its bidirectional phase followed by a unidirectional phase based upon dynamic direction selection. Our most recent work on Glign [ASPLOS'23a] develops a set of techniques that improve the efficiency with which a batch of queries is evaluted by aligning their associated graph traversals. Our recent work shows the use of batched-PnP solver in designing the P4[IROS'24] Multi Agent Path Finding algorithm that scales to several hundred agents.
STREAMING & EVOLVING GRAPHS
To process graphs that change over time, EvoG [TACO'16] incorporates optimizations for incrementally processing multiple graph snapshots in a pipelined fashion. We have developed BEAD [BigData'20] to consider the scenario in which graph is allowed to change requiring simultaneous evaluation of a batch of queries to be continuously repeated. For the streaming graphs scenario we have developed KickStarter [ASPLOS'17b] is a dynamic dependence tracking based technique to correctly and efficiently reevaluating an installed query in presence of edge additions and deletions. We developed an improvement IncBoost [SoCC'24] that provides greater scalability and incrementally handles updates to edge weights. Our Tripoline [EuroSys'21] system generalizes the streaming model to support incremental evaluation of arbitrary queries from results of fixed set of (different) installed queries. It essentially employs value reuse by using the results of installed queries to incrementally evaluate other user queries. Finally, our work on CommonGraph [ASPLOS'23b] develops a technique for efficiently supporting deletions of edges in an evolving graph and eliminating computational redundancy that enables scaling parallel evaluation of a query on large number of snapshots. In recent work we have developed PANNS [PPoPP'25], a new approximate nearest neighbor search algorithm that considers temporal dimension in evaluating a query.
CUSTOM ACCELERATORS & GPUs FOR
GRAPH PROCESSING
In collaboration with Dr. Abu-Ghazaleh's group we have designed the GraphPulse [MICRO'20] custom accelerator for Graph processing that implements an asynchronous event-driven model of computation directly in hardware. GraphPulse eliminates a number of inefficiencies associated with BSP-based Push and Pull models. We have further generalized the above accelerator to allow for incremental reevaluation of graph queries following graph updates that take the form of edge additions and deletions. The JetStream [MICRO'21] accelerator is the first accelerator for graph queries that handles the streaming graph scenario. We have recently developed the MEGA [MICRO'23] accelerator that simultaneously process multiple snapshots of an evolving graph.

     CuSha [HPDC'14] is a CUDA-based graph processing framework that uses two novel graph representations, G-Shards and Concatenated Windows to allow coalesced memory access and overcome GPU under-utilization. WS-VR [PACT’15] is a multi-GPU graph processing framework that delivers high SIMD-efficiency and inter-GPU communication efficiency. Solutions to branch divergence and load balancing can be found in CCC [MICRO'15] and CTE [IPDPS'16]. In recent work we present G-Morph that uses a GPU to accelerate the subgraph isomorphism problem [Euro-Par'21]. We are also developing techniques for efficiently processing large graphs that cannot fit in GPU memory. In work led by Dr. Zhao's group, we presented Subway [EuroSys'20] a partition-based graph processing framework that minimizes the overhead of transferring subgraphs between host and GPU memory.

LIMITED MEMORY SINGLE-MACHINE GRAPH GENERATION, PARTITIONING & PROCESSING
While today's machines are well equipped with multiple cores and large main memories, the ever growing graph sizes pose a memory scalability challenge in processing real world graphs on single machines. For in-memory scaling, our Core Graph [EuroSys'24] work identifies a proxy graph of a power law graph that contains roughly 10% of edges and yet produces precise results for over 95% of vertices across random queries. Also our Input Reduction [HPDC'16] based processing trasforms the original graph to a smaller graph which is used to calculate intermediate approximation that is usable for incremental processing over the original graph. For out-of-core graph processing, Dynamic Partitions [USENIX ATC'16] truly capture the dynamic nature of graph computation by adapting the set of edges that are held in memory during runtime. This approach greatly reduces IO in out-of-core graph processing. We have als exploited similar approch for distrubuted systems via ExpressWay [BigGraph'23].

     We have also developed graph generation and graph partitioning algorithms under limited memory constraint. PaRMAT [GitHuB] is a tool designed to create very large RMAT graphs on machines with limited amount of memory. It provides various options for the RMAT graph: being directed or non-directed, disallowing duplicate edges, etc. InfiniMem [LCPC'15] enables generation and processing of large graphs that do not fit in memory by having them reside on disk and allows the programmer to code the applications in a size oblivious fashion. Our GO [NAS'21] out-of-core system performs Graph Partitioning on large input data sets that do not fit in the memory. Finally, we have developed OMR [ISMM'18] out-of-core system for MapReduce computations on a single machine with limited memory. We have built upon this system and developed OMRGx that transparently performs out-of-core partitioning and partition-based graph processing on a single machine [ISMM'23].

ASYNCHRONOUS DISTRIBUTED PROCESSING
The long communication latencies involved in cluster based processing heavily impact the performance of graph processing frameworks. While the communication cannot be completely eliminated, the inherent asynchrony involved in graph algorithms allow tolerating the inter-node latency by using stale vertex values. ASPIRE [OOPSLA'14] introduces a novel relaxed memory consistency model & protocol that simultaneously tolerate communication latency and minimize the use of stale values. CoRAL [ASPLOS'17a] is the first fault-tolerance protocol for distributed asynchronous graph processing. It uses local snapshots, avoids rollback, and leverages the algorithmic asynchrony to progress the global execution state to a resumable state from which forward progress produces accurate results.
PAPERS ACCEPTED
PANNS paper accepted in PPoPP 2025!
IncBoost paper accepted in SoCC 2024!
P4 paper accepted in IROS 2024!
Core Graph paper accepted
in EuroSys 2024!
MEGA evolving graph accelerator
accepted in MICRO 2023!
OMRGx paper accepted in ISMM 2023!
Glign paper accepted in ASPLOS 2023!
CommonGraph paper accepted in ASPLOS 2023!
SimGQ+ batched query evaluation paper accepted in JPDC 2022!
JetStream streaming graph accelerator paper accepted in MICRO'21!
GO paper on out-of-core partitionng of large graphs accepted in NAS'21!
G-Morph & DSGEN papers on GPUs accepted in Euro-Par'21 and ICS'21!
VRGQ paper invited to SIGOPS'21 special issue on Graph Computing!
Tripoline paper on streaming graphs accepted in EuroSys'21!
SimGQ & BEAD papers accepted in HiPC'20 and IEEE BigData'20! Also SimGQ received a Best Paper Award.
GraphPulse & Subway, first joint papers with Prof. Abu-Ghazaleh and Prof. Zhao accepted in MICRO'20 and EuroSys'20!
MultiLyra paper on evaluating batches of queries accepted in IEEE BigData'19!
PnP paper on point-to-point graph queries accepted in ASPLOS'19!
OMR paper on an Out-of-core MapReduce system accepted in ISMM'18!
CoRAL paper on recovery for asynchronous graph processing accepted in ASPLOS'17a!
KickStarter paper on streaming graphs accepted in ASPLOS'17b!
Evolving graph paper accepted in
TACO'16!
Dynamic Shards paper accepted in USENIX ATC'16!
Graph Reduction paper accepted in HPDC'16!
CTE paper on intra-warp load balancing in GPUs accepted in IPDPS'16!
CCC paper on tolerating divergence in GPUs accepted in MICRO'15!
InfiniMem paper on size oblivious programming accepted in LCPC'15!
WS&VR paper on scalable graph processing using GPUs accepted in PACT'15!
Stash paper on scalable hashing for GPUs accepted in PACT'15!
ASPIRE paper on DSM based graph processing accepted in OOPSLA'14!
CuSha paper on processing graphs using GPU accepted in HPDC'14!

SOFTWARE
Glign is available here
for download!
12/19/2022
OMR is available on GitHub
for download!
10/21/2019
KickStarter is available on GitHub
for download!
10/21/2019
WS&VR is available on GitHub
for download!
09/29/2015
PaRMAT is now available on GitHub
for download!
10/29/2014
CuSha is now available on GitHub
for download!
04/08/2014