Abstract

Learning low-dimensional representations (embeddings) of nodes in large graphs is key to applying machine learning on massive biological networks. Node2vec is the most widely used method for node embedding. However, its original Python and C++ implementations scale poorly with network density and fail on dense biological networks with hundreds of millions of edges. We have developed PecanPy, a new Python implementation of node2vec that uses cache-optimized compact graph data structures together with precomputation and parallelization to produce fast, high-quality node embeddings for biological networks of all sizes and densities. PecanPy software and documentation are available at https://github.com/krishnanlab/pecanpy.
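To make the two ideas named in the abstract more concrete, here is a minimal sketch of a compact graph layout and a node2vec-style biased random walk over it. It is not PecanPy's code or API: the function names (build_csr, neighbors, node2vec_walk) are hypothetical, the compressed sparse row (CSR) layout is an assumption about what a "compact graph data structure" can look like, and the graph is assumed to be undirected and unweighted. The parameters p and q are node2vec's return and in-out parameters.

```python
import numpy as np


def build_csr(num_nodes, edges):
    """Build an undirected, unweighted graph in CSR form (indptr, indices)."""
    # Store every undirected edge in both directions.
    src = np.array([u for u, v in edges] + [v for u, v in edges], dtype=np.int64)
    dst = np.array([v for u, v in edges] + [u for u, v in edges], dtype=np.int64)
    order = np.lexsort((dst, src))  # sort by source node, then by destination
    src, dst = src[order], dst[order]
    indptr = np.zeros(num_nodes + 1, dtype=np.int64)
    np.add.at(indptr, src + 1, 1)  # count the neighbors of each node
    indptr = np.cumsum(indptr)     # turn counts into row offsets
    return indptr, dst


def neighbors(indptr, indices, node):
    """Return the neighbors of a node as one contiguous array slice."""
    return indices[indptr[node]:indptr[node + 1]]


def node2vec_walk(indptr, indices, start, length, p, q, rng):
    """Second-order biased walk with return parameter p and in-out parameter q."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = neighbors(indptr, indices, cur)
        if nbrs.size == 0:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(int(rng.choice(nbrs)))
            continue
        prev = walk[-2]
        prev_nbrs = neighbors(indptr, indices, prev)
        weights = np.full(nbrs.size, 1.0 / q)        # moves away from prev
        weights[np.isin(nbrs, prev_nbrs)] = 1.0      # stays adjacent to prev
        weights[nbrs == prev] = 1.0 / p              # returns to prev
        walk.append(int(rng.choice(nbrs, p=weights / weights.sum())))
    return walk


# Example: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
indptr, indices = build_csr(4, [(0, 1), (1, 2), (2, 0), (2, 3)])
rng = np.random.default_rng(0)
walk = node2vec_walk(indptr, indices, start=0, length=10, p=1.0, q=0.5, rng=rng)
```

Here the transition weights are computed on the fly at every step. Precomputing them for every (previous, current) pair trades memory for speed, and because walks from different start nodes are independent, they can be generated in parallel.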

Competing Interest Statement

The authors have declared no competing interest.

Details

Title
PecanPy: a fast, efficient, and parallelized Python implementation of node2vec
Author
Liu, Renming; Krishnan, Arjun
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2020
Publication date
Jul 24, 2020
Publisher
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
Copyright
© 2020. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”).