email addr as _id instead of OID?

I’m using MongoDB as the database back-end for a listserv I built and run for the two high schools in my district.

Each school will have its own database. Within the database for a school there will be collections.

  • There are multiple e-lists (subscription channels) at each school
  • Each subscriber can be subscribed to 1 or more e-lists (if they are subscribed to zero e-lists, they aren’t a subscriber)
  • Each e-list (subscription channel) will have its own collection where each subscriber has a document (record).
  • The document for a subscriber within an e-list collection will have the email address of the subscriber and some administrative info

There will be another collection containing all email addresses.

  • Each document in this collection will be a master record for the subscriber.
  • Each doc will contain the info about the user (email addr, first name, last name, etc.) and I will keep track of all the e-lists to which they are subscribed so I don’t have to iterate over all the e-list collections to find which subscriptions a user has.
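
The layout described above can be sketched with plain Python dictionaries standing in for MongoDB documents. The collection names, the field names (`lists`, `added`, `first_name`, `last_name`), and the sample addresses are all hypothetical, and nothing here talks to a real server.

```python
# In-memory stand-ins for one school's database (hypothetical names and fields).
school_db = {
    # One collection per e-list: a document per subscriber with admin info.
    "elist_sports": [
        {"email": "pat@example.org", "added": "2020-09-01"},
    ],
    "elist_band": [
        {"email": "pat@example.org", "added": "2020-09-03"},
        {"email": "lee@example.org", "added": "2020-09-04"},
    ],
    # Master collection: one record per subscriber, listing their e-lists.
    "subscribers": [
        {"email": "pat@example.org", "first_name": "Pat", "last_name": "Doe",
         "lists": ["elist_sports", "elist_band"]},
        {"email": "lee@example.org", "first_name": "Lee", "last_name": "Roe",
         "lists": ["elist_band"]},
    ],
}


def subscriptions_for(db, email):
    """Read a subscriber's e-lists from the master record only."""
    for doc in db["subscribers"]:
        if doc["email"] == email:
            return doc["lists"]
    return []  # zero e-lists means the address is not a subscriber
```

The lookup never touches the per-list collections, which is the point of keeping the master record.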

I know for certain that all email addresses will be unique. My question is this… should I:

  1. Create the collections with a standard OID for the mongo _id field and add an index for the email field
  2. Use the email addr as the mongo _id field since I know it will be unique and forgo using OID
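
For illustration, the two options might produce master records shaped like this. This is a sketch in plain Python, not driver code: `new_surrogate_id` is a hypothetical stand-in for the ObjectId a MongoDB driver would generate, and the field names are assumptions.

```python
import uuid


def new_surrogate_id():
    # Stand-in for a MongoDB ObjectId; a real driver generates this for you.
    return uuid.uuid4().hex


def subscriber_option_1(email, first_name, last_name):
    """Option 1: generated _id, email kept as an ordinary field."""
    return {"_id": new_surrogate_id(), "email": email,
            "first_name": first_name, "last_name": last_name}


def subscriber_option_2(email, first_name, last_name):
    """Option 2: the email address itself is the _id."""
    return {"_id": email, "first_name": first_name, "last_name": last_name}
```

With option 1, uniqueness of the email field is not automatic; it would have to be enforced by a unique index on that field.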

It might be worth noting that I am running on a small-footprint server.

  • Option 1 is more classic but the OID and email address are redundant since the email address is unique.
  • From a storage perspective, option 2 should use less disk space and possibly less RAM
  • All my foreign key relationships are based on email addr. I can’t think of a reason I need the OID for the _id field.
  • Being a relative newbie with Mongo I’m wondering if I will be sorry later on down the road if I don’t go with option 1 (classic OID). In other words, am I “over optimizing”?

1 answer

  • answered 2020-09-24 14:54 D. SM

    Your proposed approach of using email address as _id will run into problems when:

    • A user wants to change their email address (not easily possible when other documents reference this email address)
    • A user wants to have multiple email addresses
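
The cost of the first problem can be sketched in memory. Here dictionaries keyed by `_id` stand in for collections; the function names and data shapes are hypothetical. In MongoDB itself the `_id` of an existing document cannot be modified, so a rename under option 2 also means inserting a new document and deleting the old one.

```python
def rename_with_email_key(master, elists, old, new):
    """Email is the key everywhere: every referencing document must change."""
    touched = 0
    # The key cannot be edited in place, so the record is re-inserted.
    master[new] = master.pop(old)
    touched += 1
    for members in elists.values():
        if old in members:
            members[new] = members.pop(old)
            touched += 1
    return touched


def rename_with_surrogate_key(master, subscriber_id, new):
    """Generated id is the key: only the master record's email field changes."""
    master[subscriber_id]["email"] = new
    return 1
```

With the email as key, the number of writes grows with the number of e-lists the user belongs to; with a generated key it stays at one.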

    Using the generated identifiers for _id field is a smart default.

    Don’t spend time on space optimization until your dataset could plausibly exceed the RAM that is realistically justifiable for the project. For example, if you are developing an app for a small business and their customer base grows, they can likely afford a server with 16 or 32 GB of RAM.
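
To put a rough number on the space question, here is a back-of-the-envelope estimate of what option 1 adds to each document compared with option 2. The element sizes follow the BSON encoding (a 12-byte ObjectId, length-prefixed strings); the subscriber count is an assumed figure, and index size and storage-engine compression are not counted.

```python
def string_element_size(name, value):
    # BSON string element: type byte + name + NUL + int32 length + UTF-8 bytes + NUL
    return 1 + len(name.encode()) + 1 + 4 + len(value.encode()) + 1


def objectid_element_size(name):
    # BSON ObjectId element: type byte + name + NUL + 12-byte id
    return 1 + len(name.encode()) + 1 + 12


def extra_bytes_option_1(email):
    """Per-document bytes option 1 stores beyond option 2 (indexes not counted)."""
    option_1 = objectid_element_size("_id") + string_element_size("email", email)
    option_2 = string_element_size("_id", email)
    return option_1 - option_2


SUBSCRIBERS = 5000  # assumed figure for illustration only
total_extra = SUBSCRIBERS * extra_bytes_option_1("pat@example.org")
```

Under these assumptions the difference is 19 bytes per document, or 95,000 bytes across 5,000 subscribers, which is tiny next to the RAM figures mentioned above.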