PecanPy: a fast, efficient, and parallelized Python implementation of node2vec

Abstract

Learning low-dimensional representations (embeddings) of nodes in large graphs is key to applying machine learning to massive biological networks. Node2vec is the most widely used method for node embedding. However, its original Python and C++ implementations scale poorly with network density and fail on dense biological networks with hundreds of millions of edges. We have developed PecanPy, a new Python implementation of node2vec that combines cache-optimized compact graph data structures with precomputation and parallelization to produce fast, high-quality node embeddings for biological networks of all sizes and densities. PecanPy software and documentation are available at https://github.com/krishnanlab/pecanpy.
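
As a rough illustration of the ideas the abstract names, the sketch below stores a small undirected, unweighted graph in compressed sparse row (CSR) form and runs a node2vec-style second-order random walk over it. It is not taken from PecanPy and does not use its API: the function names (build_csr, has_edge, node2vec_walk) are hypothetical, CSR is assumed here as one common compact graph layout, and plain Python lists stand in for whatever arrays a real implementation would use. The return parameter p and in-out parameter q follow the usual node2vec convention.

```python
import random
from bisect import bisect_left


def build_csr(num_nodes, edges):
    """Build a CSR adjacency (indptr, indices) for an undirected, unweighted graph.

    Hypothetical helper for illustration only; not part of PecanPy.
    """
    neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    indptr = [0]
    indices = []
    for nbrs in neighbors:
        # Sorted, de-duplicated neighbor lists allow binary-search edge lookups.
        indices.extend(sorted(set(nbrs)))
        indptr.append(len(indices))
    return indptr, indices


def has_edge(indptr, indices, u, v):
    """Return True if v appears in the (sorted) neighbor slice of u."""
    lo, hi = indptr[u], indptr[u + 1]
    i = bisect_left(indices, v, lo, hi)
    return i < hi and indices[i] == v


def node2vec_walk(indptr, indices, start, length, p=1.0, q=1.0, rng=None):
    """Generate one second-order biased walk containing up to `length` nodes."""
    rng = rng or random.Random()
    walk = [start]
    prev = None
    while len(walk) < length:
        cur = walk[-1]
        nbrs = indices[indptr[cur]:indptr[cur + 1]]
        if not nbrs:
            break  # isolated node: the walk cannot continue
        if prev is None:
            # First step has no previous node, so sample uniformly.
            nxt = rng.choice(nbrs)
        else:
            weights = []
            for x in nbrs:
                if x == prev:
                    weights.append(1.0 / p)  # return to the previous node
                elif has_edge(indptr, indices, prev, x):
                    weights.append(1.0)      # stay at distance 1 from prev
                else:
                    weights.append(1.0 / q)  # move outward, away from prev
            nxt = rng.choices(nbrs, weights=weights, k=1)[0]
        prev = cur
        walk.append(nxt)
    return walk


if __name__ == "__main__":
    # Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
    indptr, indices = build_csr(4, [(0, 1), (1, 2), (2, 0), (2, 3)])
    print(node2vec_walk(indptr, indices, start=0, length=10, p=0.5, q=2.0,
                        rng=random.Random(0)))
```

In this sketch the transition weights are computed on the fly at each step. Precomputing them per edge trades memory for speed, and because walks from different start nodes are independent they can be generated in parallel; both are the kinds of optimization the abstract refers to, though the details here are assumptions rather than a description of PecanPy's internals.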

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

* https://github.com/krishnanlab/pecanpy

Details

Title

PecanPy: a fast, efficient, and parallelized Python implementation of node2vec

Author

Liu, Renming; Krishnan, Arjun

University/institution

Cold Spring Harbor Laboratory Press

Section

New Results

Publication year

2020

Publication date

Jul 24, 2020

Publisher

Cold Spring Harbor Laboratory Press

ISSN

2692-8205

Source type

Working Paper

Language of publication

English

DOI

https://doi.org/10.1101/2020.07.23.218487

ProQuest document ID

2426706409

© 2020. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
