MongoDB sharding storage usage

I am reading about sharding in MongoDB. After understanding how it works, I have a very basic question regarding the storage space used by it.

Suppose I have a server with 1 GB of storage. Assuming my data will grow beyond 1 GB, that won't be sufficient for my purpose. So, I add one more server and shard Mongo.

So now, let's say I have 2 servers with 1 GB of storage each, both included in the cluster. If I shard, both servers will be used to distribute the Mongo data, so in total I should have 2 GB of storage available for Mongo. But the official sharding documentation mentions that shards are replica sets. If that is so, wouldn't adding the second 1 GB server mean that I still have only 1 GB of storage (like before) for actual MongoDB data, with the remaining 1 GB holding just replicated data?

If my understanding is correct, then is there any way to not create a replica set? Can we use 2 GB storage from both the servers like a logical volume?

Otherwise, if my understanding is wrong, what is the correct picture?

1 answer

  • answered 2018-07-11 05:59 Vaibhav Magon

    The MongoDB sharding documentation says, in its section on storage capacity: "Sharding distributes data across the shards in the cluster, allowing each shard to contain a subset of the total cluster data. As the data set grows, additional shards increase the storage capacity of the cluster".

    Since each shard holds a subset, the two shards contain different data. A replica set can therefore serve different purposes depending on how it is used: as a shard that stores a subset of the data, or as redundancy that keeps copies of the same data.
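    To make the "each shard holds a different subset" point concrete, here is a minimal Python sketch. It is only an illustration: MongoDB actually splits the shard key space into ranges and balances those across shards, and it does not route by taking a hash modulo the shard count. The names `pick_shard`, `distribute`, `shardA` and `shardB` are made up for this example.

```python
import hashlib


def pick_shard(shard_key_value, shard_names):
    """Map a shard key value to exactly one shard (simplified stand-in for routing)."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and pick a shard from it.
    return shard_names[int.from_bytes(digest[:8], "big") % len(shard_names)]


def distribute(doc_ids, shard_names):
    """Return a dict of shard name -> list of document ids stored on that shard."""
    placement = {name: [] for name in shard_names}
    for doc_id in doc_ids:
        placement[pick_shard(doc_id, shard_names)].append(doc_id)
    return placement


# Hypothetical two-shard cluster holding 1000 documents.
placement = distribute(range(1000), ["shardA", "shardB"])
for name, ids in placement.items():
    print(name, len(ids))
```

    Every document lands on exactly one shard, so the two shards together hold the full data set and neither is a copy of the other.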

    Sharding happens one level above replication.

    When you use both sharding and replication, your cluster consists of many replica-sets and one replica-set consists of many mongod instances.

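    However, a shard does not have to be replicated across several servers. In older versions it was possible to build a cluster from stand-alone mongod instances, or to mix shards implemented as replica-sets with shards implemented as stand-alone mongod instances. Since MongoDB 3.6 every shard must be deployed as a replica set, but a replica set can consist of a single member, so two 1 GB servers can still act as two separate shards.

    Note that the documentation means each shard is its own replica set, not that the shards together form one replica set. The shards are not copies of each other.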
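    As a rough back-of-the-envelope check for the two-server case in the question, the sketch below compares the two layouts. It rests on simplifying assumptions: every member of a replica set stores a full copy of its shard's data, and overhead such as indexes, the oplog and config servers is ignored. The function name `usable_capacity_gb` is made up for this example.

```python
def usable_capacity_gb(shards):
    """Estimate usable data capacity of a cluster.

    `shards` is a list of shards; each shard is a list with the disk size
    in GB of every replica-set member belonging to that shard.
    """
    # Each member holds a full copy of its shard's data, so a shard can hold
    # no more than its smallest member. Shards hold different data, so their
    # capacities add up.
    return sum(min(members) for members in shards)


# Two servers used as two shards, each a single-member replica set.
two_shards = [[1], [1]]
# Two servers used as one shard with two replica-set members.
one_replicated_shard = [[1, 1]]

print(usable_capacity_gb(two_shards))            # 2
print(usable_capacity_gb(one_replicated_shard))  # 1
```

    So adding a server as a new shard adds capacity, while adding it as another member of an existing replica set adds redundancy instead.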

    I hope this helps.