# Get disconnected pairs of nodes in the network graph?

This is my dataset:

``````4095    546
3213    2059
4897    2661
...
3586    2583
3437    3317
3364    1216
``````

Each line is a pair of nodes which have an edge between them. The whole dataset build an graph. But I want to get many node pairs which are disconnected with each other. How can I get 1000(or more) such node pairs from dataset? Such as:

``````2761    2788
4777    3365
3631    3553
...
3717    4074
3013    2225
``````

Each line is a pair of nodes without edge.

Just do a BFS or DFS to get the size of every connected component in `O(|E|)` time. Then once you have the component sizes, you can get the number of disconnected nodes easily: it's the sum of the products of every pair of sizes.

Eg. If your graph has 3 connected components with sizes: 50, 20, 100. Then the number of pairs of disconnected nodes is: `50*20 + 50*100 + 20*100 = 8000`.

If you want to actually output the disconnected pairs instead of just counting them, you should probably use union-find and then just iterate through all pairs of nodes and output them if they're not in the same component.

I think the other options are more general, and probably nicer from a programmatic view. I just had a quick idea how you could get the list in a very easy way using numpy.

First create the adjacency matrix and your list of nodes is an array:

``````    import numpy as np
node_list= np.random.randint(10 , size=(10, 2))
A = np.zeros((np.max(node_list) + 1, np.max(node_list) + 1)) # + 1 to account for zero indexing
A[node_list[:, 0],  node_list[:, 1]] = 1 # set connected nodes to 1
x, y = np.where(A == 0) # Find disconnected nodes
diconnected_list = np.vstack([x, y]).T # The final list of disconnected nodes
``````

I have no idea though, how this will work with really large scale networks.