Get disconnected pairs of nodes in the network graph?

This is my dataset:

4095    546
3213    2059 
4897    2661 
...
3586    2583
3437    3317
3364    1216

Each line is a pair of nodes which have an edge between them. The whole dataset build an graph. But I want to get many node pairs which are disconnected with each other. How can I get 1000(or more) such node pairs from dataset? Such as:

2761    2788
4777    3365
3631    3553
...
3717    4074
3013    2225

Each line is a pair of nodes without edge.

2 answers

  • answered 2018-11-08 05:45 Paulpro

    Just do a BFS or DFS to get the size of every connected component in O(|E|) time. Then once you have the component sizes, you can get the number of disconnected nodes easily: it's the sum of the products of every pair of sizes.

    Eg. If your graph has 3 connected components with sizes: 50, 20, 100. Then the number of pairs of disconnected nodes is: 50*20 + 50*100 + 20*100 = 8000.

    If you want to actually output the disconnected pairs instead of just counting them, you should probably use union-find and then just iterate through all pairs of nodes and output them if they're not in the same component.

  • answered 2018-11-08 06:19 sr_steinkamp

    I think the other options are more general, and probably nicer from a programmatic view. I just had a quick idea how you could get the list in a very easy way using numpy.

    First create the adjacency matrix and your list of nodes is an array:

        import numpy as np
        node_list= np.random.randint(10 , size=(10, 2))
        A = np.zeros((np.max(node_list) + 1, np.max(node_list) + 1)) # + 1 to account for zero indexing
        A[node_list[:, 0],  node_list[:, 1]] = 1 # set connected nodes to 1
        x, y = np.where(A == 0) # Find disconnected nodes
        diconnected_list = np.vstack([x, y]).T # The final list of disconnected nodes
    

    I have no idea though, how this will work with really large scale networks.