Performance issues with Neo4j embedded

I have some old graph data with 1 million nodes and 3 million edges that I'd like to convert into Neo4j.

I'm using Neo4j embedded, and my program looks roughly like this:

for (each edge (node1_id, node2_id) in old graph data):
    node1 = neo4jdb.findNode(node1_id)
    node2 = neo4jdb.findNode(node2_id)
    if (node1 or node2 doesn't exist):
        create the missing node(s)
    if (! relationExistBetween(node1, node2)):
        create new relation between node1 and node2

However, the creation process is super slow. With the exact same logic, the program runs much faster with TinkerGraph.
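A toy model can show how the cost of a loop like this grows. The sketch below is plain Java, not Neo4j code. `ScanImport`, `findNodeByScan` and `importChain` are hypothetical names, and the linear scan is an assumption standing in for a property lookup that has no index to use.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy in-memory model of the import loop; this is NOT the Neo4j API.
// findNodeByScan mimics a property lookup with no index: it checks every
// stored node until it finds one with the requested id.
public class ScanImport {
    final List<Long> nodeIds = new ArrayList<>(); // "node_id" of each created node
    final Set<String> rels = new HashSet<>();     // "id1->id2" for each relationship
    long comparisons = 0;                         // property comparisons made by lookups

    // Returns the position of the node with this id, or -1 if it does not exist.
    int findNodeByScan(long id) {
        for (int i = 0; i < nodeIds.size(); i++) {
            comparisons++;
            if (nodeIds.get(i) == id) return i;
        }
        return -1;
    }

    // One iteration of the loop: look up both endpoints, create any missing
    // node, then add the relationship if it is not already there.
    void importEdge(long id1, long id2) {
        if (findNodeByScan(id1) < 0) nodeIds.add(id1);
        if (findNodeByScan(id2) < 0) nodeIds.add(id2);
        rels.add(id1 + "->" + id2); // a Set ignores duplicate relationships
    }

    // Imports the chain 0-1, 1-2, ..., (n-1)-n.
    static ScanImport importChain(int n) {
        ScanImport g = new ScanImport();
        for (long i = 0; i < n; i++) g.importEdge(i, i + 1);
        return g;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 2_000, 4_000}) {
            System.out.println("edges=" + n + " comparisons=" + importChain(n).comparisons);
        }
    }
}
```

For a chain of n edges, this scan makes exactly n(n+1) - 1 comparisons, so doubling the number of edges roughly quadruples the work. With a million nodes, that quadratic growth, rather than the node and relationship creation itself, would be expected to dominate.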

Are there any tricks to make this faster? Thanks!

1 answer

  • answered 2018-10-11 20:39 peidaqi

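    Figured it out. I profiled the code and found that the bottleneck is the findNode operation. That led me to suspect indexing: without an index on the property being looked up, each findNode call has to check the nodes one by one instead of jumping straight to the match.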
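A rough, order-of-magnitude estimate (not a measurement) shows why this can dominate. It assumes that an unindexed lookup may check every node already carrying the label, and that an index lookup costs about $O(\log N)$. With the question's figures, $N \approx 10^6$ nodes and $E \approx 3 \times 10^6$ edges, and two lookups per edge:

$$
\underbrace{2E \cdot N}_{\text{no index}} \approx 2 \cdot (3\times10^{6}) \cdot 10^{6} = 6\times10^{12}
\qquad\text{vs.}\qquad
\underbrace{2E \cdot \log_2 N}_{\text{with index}} \approx 6\times10^{6} \cdot 20 \approx 1.2\times10^{8}
$$

The first figure is an upper bound, because fewer nodes exist early in the import. Even so, the gap is several orders of magnitude.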

    You have to create an index on that property manually, for the same label and property that findNode looks up, to speed things up. With Neo4j Embedded, that's something like:

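    val transaction = graphDB.beginTx()
    try {
      // Schema index on "node_id" for nodes with nodeLabel; lookups by that
      // label and property can then use the index instead of scanning.
      graphDB.schema()
        .indexFor(nodeLabel).on("node_id")
        .create()
      transaction.success()  // mark the transaction so close() commits it
    } finally {
      transaction.close()
    }
    // Create the index before the import. It is populated in the background and
    // only helps once it is online (see Schema.awaitIndexesOnline).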
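The effect of the index can be illustrated with another toy model. This sketch is plain Java, not the Neo4j API: `ToyLabelIndex`, its methods, and the label name `"Node"` are hypothetical, and modelling the unindexed path as a scan over the label is an assumption about cost, not a description of Neo4j internals. It also shows that the index only helps lookups on exactly the label and property it was created for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a label + property lookup; this is NOT the Neo4j API.
// findNode uses an index only if one exists for exactly that
// (label, property) pair, and otherwise scans the nodes with the label.
public class ToyLabelIndex {
    static final class Node {
        final String label;
        final Map<String, Object> props;
        Node(String label, Map<String, Object> props) {
            this.label = label;
            this.props = props;
        }
    }

    final List<Node> nodes = new ArrayList<>();
    // "label:property" -> (property value -> node)
    final Map<String, Map<Object, Node>> indexes = new HashMap<>();
    long scanned = 0; // nodes examined by unindexed lookups

    Node createNode(String label, String key, Object value) {
        Node n = new Node(label, Map.of(key, value));
        nodes.add(n);
        Map<Object, Node> idx = indexes.get(label + ":" + key);
        if (idx != null) idx.put(value, n); // keep an existing index in sync
        return n;
    }

    // Builds the index from the nodes that already exist. (Neo4j populates its
    // indexes in the background; in this toy it happens immediately.)
    void createIndex(String label, String key) {
        Map<Object, Node> idx = new HashMap<>();
        for (Node n : nodes) {
            if (n.label.equals(label) && n.props.containsKey(key)) {
                idx.put(n.props.get(key), n);
            }
        }
        indexes.put(label + ":" + key, idx);
    }

    Node findNode(String label, String key, Object value) {
        Map<Object, Node> idx = indexes.get(label + ":" + key);
        if (idx != null) return idx.get(value); // indexed: one hash lookup
        for (Node n : nodes) {                   // unindexed: scan the label
            if (!n.label.equals(label)) continue;
            scanned++;
            if (value.equals(n.props.get(key))) return n;
        }
        return null;
    }

    public static void main(String[] args) {
        ToyLabelIndex db = new ToyLabelIndex();
        for (long id = 0; id < 100_000; id++) db.createNode("Node", "node_id", id);

        db.findNode("Node", "node_id", 99_999L);
        System.out.println("scanned without index: " + db.scanned);

        db.createIndex("Node", "node_id");
        db.scanned = 0;
        db.findNode("Node", "node_id", 99_999L);
        System.out.println("scanned with index: " + db.scanned);
    }
}
```

Running `main` prints `scanned without index: 100000` and then `scanned with index: 0`. A lookup on a property with no index, such as `findNode("Node", "other", 5L)`, still falls back to the scan.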