Everything you wanted to know about MongoDB but were too afraid to ask

At CodeSignal we use MongoDB to store the majority of our data, from programming tasks and test cases to coding session recordings (which can be quite large). We like MongoDB because its query interface feels like JavaScript: queries are written as JSON-like documents, so it fits in nicely with our web stack, where we use Node.js, React, and other JS frameworks. MongoDB is also very flexible: as a NoSQL document database, it lets you store objects together even if they have completely different structures.

In this article, we address some of the questions you might have if you’re approaching MongoDB for the first time, or if you’ve been using it for a while but ran into a tricky issue. We’ve become familiar with many of the things to take advantage of or watch out for, and we’re excited to share what we’ve learned!

I’m familiar with SQL…what is NoSQL exactly?

To explain, let’s use a basic example. Imagine you’re designing a database to store users and products. With SQL, you would need to define ahead of time what fields each of these objects has, like a firstname and lastname for the user, or a description for the product. You’d also have to strictly specify your data types. For example, you might decide that a product description is always a string with a 1000-character limit. These fields become the columns of your user table and product table respectively, and once they are defined, changing or adding to them means you have to restructure your database.

MongoDB works differently. Instead of rows in a table, your users and products become documents grouped into collections. A document is a JSON-like object, and you can insert an object with nearly any set of keys and values into a collection (there are some restrictions, which we’ll discuss later).
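As a minimal sketch of what this means in practice, the snippet below models a collection as a plain Python list of dictionaries rather than a real MongoDB connection. The field names and values are made up for illustration.

```python
# A "collection" modeled as a list of dict "documents" (no real database involved).
# All field names and values are illustrative assumptions.
users = [
    {"_id": 1, "firstname": "Ada", "lastname": "Lovelace"},
    # A second document in the same collection with a different set of keys.
    {"_id": 2, "firstname": "Alan", "nickname": "al", "tags": ["admin", "beta"]},
]

# No shared schema is required, so the set of keys differs per document.
all_keys = sorted({key for doc in users for key in doc})
print(all_keys)
```

With a driver such as pymongo, the same list could be passed to a collection's `insert_many` method; the point here is only that both documents are accepted side by side.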

What does NoSQL mean for how I design my schema?

With SQL, best practices for designing associations between objects are enforced by the database structure itself. In other words, there’s really only one “right way” to do things. With MongoDB, you have several options that each have advantages and drawbacks. This flexibility can make your code more elegant and optimized, but it also means you need to think deeply about your data to make the right choice.

Returning to our example of users and products, let’s say that you want to represent the idea of a many-to-many relationship between users and products. In SQL, there’s essentially just one definitive way to do this: you’d create an associative table where every row describes a user-product pair. There are clear best practices to follow, like using foreign keys to ensure that when you delete items, your tables stay in sync.

In MongoDB, there’s no single right way to represent relationships between objects. If users “own” products, logically it might make sense to store products or product IDs in the user document itself. This will make it easy to get a list of products for each user. But maybe you really want a list of users for each product instead, and decide to store user IDs in product documents. No matter what you choose (and especially if you do both!) you have to be careful not to introduce inconsistencies when manipulating objects.
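To make the consistency risk concrete, here is a small sketch that stores the association on both sides and funnels every change through one pair of helpers. It uses in-memory dictionaries instead of a database, and the `product_ids` and `user_ids` field names are hypothetical.

```python
# Toy in-memory "collections" keyed by _id; field names are hypothetical.
users = {"u1": {"_id": "u1", "product_ids": []}}
products = {"p1": {"_id": "p1", "user_ids": []}}


def link(user_id: str, product_id: str) -> None:
    """Record the association on both documents so the two sides stay in sync."""
    user, product = users[user_id], products[product_id]
    if product_id not in user["product_ids"]:
        user["product_ids"].append(product_id)
    if user_id not in product["user_ids"]:
        product["user_ids"].append(user_id)


def unlink(user_id: str, product_id: str) -> None:
    """Remove the association from both documents."""
    user, product = users[user_id], products[product_id]
    if product_id in user["product_ids"]:
        user["product_ids"].remove(product_id)
    if user_id in product["user_ids"]:
        product["user_ids"].remove(user_id)


link("u1", "p1")
print(users["u1"]["product_ids"], products["p1"]["user_ids"])
```

If any code path updates only one of the two lists, the documents silently disagree, which is why routing all changes through helpers like these can be worthwhile.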

Once you’ve made a decision about how to represent associations, it can take a lot of refactoring to make a change later on. It’s important to consider your use cases carefully—not just for now, but thinking about how your needs might change in the future. Here are a few more pointers we’ve found helpful:

For associations, try asking yourself: can concept X be used on its own, without object Y? If the answer is no, consider storing X directly in the document for object Y. For example, imagine you have labels for products. Can labels be used without products? You might decide that the answer is no, and that’s a good indicator that labels should be stored in the product document itself. If you create a separate collection for labels, you’ll be doing an extra lookup to bring labels and products together every single time you use a label.
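A sketch of the embedded option, using a plain dictionary to stand in for a product document (the `labels` field and its contents are assumptions for illustration):

```python
# Labels live inside the product document, so reading the product
# already gives you its labels without a second lookup.
product = {
    "_id": "p1",
    "name": "Desk lamp",
    "labels": [
        {"text": "new", "color": "green"},
        {"text": "sale", "color": "red"},
    ],
}

label_texts = [label["text"] for label in product["labels"]]
print(label_texts)
```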

When you have arbitrary objects, store them as strings. It’s possible to get carried away because of how flexible MongoDB is. For example, we have test cases in our product that can contain pretty much any keys and values—it’s essentially unstructured data. MongoDB will let you store this as an object, but beware. Since you don’t really know what shape or properties the data will have, you can run into all kinds of problems, as we’ll explain in the next section. Typically in these cases, it’s preferable to stringify the data and parse it back into an object when needed.
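Here is a minimal sketch of the stringify-and-parse approach using Python's standard `json` module. The `input_json` field name is hypothetical, and the document is a plain dictionary rather than something written to a real database.

```python
import json

# Unstructured data whose keys and depth we do not control.
arbitrary_input = {"a.b": [1, 2, {"$weird": None}], "nested": {"x": {"y": 1}}}

# Store it as a single string field instead of a nested object.
test_case = {
    "_id": "t1",
    "input_json": json.dumps(arbitrary_input),
}

# Parse it back into an object only when it is actually needed.
restored = json.loads(test_case["input_json"])
print(restored == arbitrary_input)
```

The trade-off is that the database can no longer query or index inside the stored value, which is usually acceptable for data that is unstructured anyway.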

What are some limitations I should watch out for in MongoDB?

Size limits

Documents in MongoDB have a size limit of 16 MB. This can become a problem when you decide to store a lot of data directly in a single parent document. We experienced this firsthand: originally we had decided to store associations directly in the documents of one of our collections, expecting a few dozen associations per object at most. But as our business requirements changed, we ended up having thousands of items to store. At a certain point, we had to restructure this data and move it out into its own collection. You can only do so much to predict the future, but you may want to consider storing large data elsewhere and linking to it from your collection.
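One way to catch this early is a rough size check before writing. The sketch below estimates size from the JSON encoding, which is only an approximation of the stored (BSON) size, and the 50% safety margin is an arbitrary assumption.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16 MB limit mentioned above


def approx_size_bytes(document: dict) -> int:
    """Rough estimate from the UTF-8 JSON encoding (not the exact stored size)."""
    return len(json.dumps(document).encode("utf-8"))


def fits_in_one_document(document: dict, safety_margin: float = 0.5) -> bool:
    """Return True if the estimate stays under a fraction of the hard limit."""
    return approx_size_bytes(document) <= MAX_DOCUMENT_BYTES * safety_margin


small = {"_id": "a", "associations": list(range(10))}
print(fits_in_one_document(small))
```

A check like this can log a warning long before a write actually fails, giving you time to move the growing field into its own collection.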

Nesting level limits

If you are storing a lot of arbitrary objects (which we recommend against!) you may run into limits with nested data. Currently, the limit is 100 levels. If you legitimately need to have a lot of nested objects, it can be useful to stringify them as described in the previous section.
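A sketch of a depth check you could run on untrusted data before storing it. How levels are counted here (every dictionary or list adds one) is an assumption and may not match MongoDB's own accounting exactly, so treat the result as a warning signal rather than a precise test.

```python
MAX_NESTING_LEVELS = 100  # the limit mentioned above


def nesting_depth(value) -> int:
    """Depth of nested dicts/lists; scalars count as 0. Iterative, so deep input is safe."""
    deepest = 0
    stack = [(value, 0)]
    while stack:
        current, depth = stack.pop()
        if isinstance(current, dict):
            children = list(current.values())
        elif isinstance(current, list):
            children = current
        else:
            continue  # scalars do not add a level
        depth += 1
        deepest = max(deepest, depth)
        stack.extend((child, depth) for child in children)
    return deepest


print(nesting_depth({"a": {"b": [1, {"c": 2}]}}))
```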

Restricted characters

Some characters are restricted in MongoDB field names. For example, MongoDB has historically rejected a . in an object key (and you can’t use escaping to get around this, either); newer releases (5.0 and later) relax this, but such keys remain awkward to query and update. If you don’t test for this, you might encounter an issue down the road when you try to store some data in production that doesn’t meet the requirements. This is yet another reason why arbitrary objects can be harmful.
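A small validation sketch that walks an object and reports keys likely to cause trouble. Checking for a leading `$` is an addition beyond the dot example above, and the function assumes all keys are strings.

```python
def find_risky_keys(value, path=""):
    """Return slash-separated paths of keys that contain "." or start with "$"."""
    risky = []
    if isinstance(value, dict):
        for key, child in value.items():
            here = f"{path}/{key}"
            if "." in key or key.startswith("$"):
                risky.append(here)
            risky.extend(find_risky_keys(child, here))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            risky.extend(find_risky_keys(child, f"{path}/{index}"))
    return risky


sample = {"ok": 1, "a.b": {"$set": 2}, "list": [{"x.y": 1}]}
print(find_risky_keys(sample))
```

Running a check like this in tests, or at the boundary where external data enters your system, surfaces the problem before it reaches production.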

Can I subscribe to updates?

MongoDB has an important feature called the oplog (short for “operations log”). It’s a capped system collection, kept on replica set members, that records database changes in chronological order; once it reaches its fixed size, the oldest entries are overwritten.

The oplog can be very helpful for root cause analysis for recent changes. Compared to backups, which capture changes every day or every hour, the oplog is fully granular. You can see the exact time of the change and which values were updated, along with some other arguments as well. From this information, you might be able to determine where the change came from.
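The sketch below imitates the two properties described above: a log that is capped, and entries you can filter when investigating a change. It is a toy model, not a way to read the real oplog. The short field names (`ts`, `op`, `ns`, `o`) mirror ones commonly seen in oplog entries, but the entries are heavily simplified and the `shop.*` namespaces are made up.

```python
from collections import deque

# Keep only the 3 most recent entries. A real oplog is capped by size,
# not by entry count; the count cap just keeps this example small.
oplog = deque(maxlen=3)


def record(ts: int, op: str, ns: str, o: dict) -> None:
    """Append a simplified change entry: timestamp, operation, namespace, payload."""
    oplog.append({"ts": ts, "op": op, "ns": ns, "o": o})


record(1, "i", "shop.users", {"_id": "u1"})                 # insert
record(2, "i", "shop.products", {"_id": "p1"})              # insert
record(3, "u", "shop.products", {"_id": "p1", "price": 10})  # update
record(4, "d", "shop.users", {"_id": "u1"})                 # delete; evicts ts=1

# Root cause analysis: what happened to products recently, and when?
product_changes = [entry for entry in oplog if entry["ns"] == "shop.products"]
print([(entry["ts"], entry["op"]) for entry in product_changes])
```

Note that the first entry is already gone by the end, which is why the oplog only helps with recent changes.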

Useful integrations are possible with the oplog. If you’re using MeteorJS, for example, you can subscribe to certain database documents and display live data without implementing live updates on your own.

How do I ensure reliability?

For production deployments, MongoDB supports replica sets. A replica set is a group of MongoDB instances (its members), each of which holds a copy of the whole database. It’s essential to use a replica set to prevent data loss or downtime.

At any time, at most one member of the replica set is the primary, and every other data-bearing member is a secondary. The secondaries automatically sync up with the primary, with some small delay. You can read from any member (reads from a secondary may be slightly stale), but you can only write to the primary. If the primary fails, the remaining members hold an election and a secondary automatically becomes the new primary.
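In application code, this usually shows up in the connection string: the driver is given several hosts plus the replica set name, and it works out which one is currently the primary. The helper below only builds such a string; the hostnames and the `rs0` name are placeholders, and `replica_set_uri` is a hypothetical function, not part of any driver.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="primary"):
    """Build a MongoDB connection string listing several hosts of one replica set."""
    options = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/?{options}"


uri = replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
    replica_set="rs0",
    # Prefer reading from a secondary, falling back to the primary.
    read_preference="secondaryPreferred",
)
print(uri)
```

Listing more than one host means the application can still connect if the host that used to be primary is the one that went down.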

How do I host MongoDB?

Making sure the database is reliable and stable is a major responsibility. You can decide to take this on yourself, or you can choose from a variety of managed service providers for MongoDB.

We can only speak for our experience, but having tried a few of the leading solutions, we’ve been quite satisfied with MongoDB Atlas. It offers the best tooling that we’ve found, at a competitive price. Plus, since MongoDB develops it, Atlas tends to always have the latest version of the software and support the newest features.

What if I need to migrate my data?

In our opinion, one of the highlights of Atlas is that it supports live migration. In theory, you could migrate your MongoDB data by taking a snapshot at some point in time, spinning up a new instance, and copying the database over. But this introduces some downtime because you can’t have people making changes to the data after the snapshot happens. What if your database is powering a user experience in production? That’s where live migration comes in.

To get started, you create a direct connection between your current production database and the new instance that you’re migrating to. The new instance pulls all the data in, and then it starts mirroring the updates in the production database. In essence, you end up with two running copies of the same database that are kept in sync live. At any point you can switch over to the new instance and take down the old one, making for a smooth transition.
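The copy-then-mirror idea can be sketched with two in-memory dictionaries. This is a toy illustration of the sequence described above, not how Atlas implements live migration, and every name in it is hypothetical.

```python
import copy

source = {"u1": {"name": "Ada"}, "u2": {"name": "Alan"}}
pending_changes = []  # writes made to the source that the target has not seen yet


def write_to_source(doc_id, document):
    """Apply a write to the source and queue it so the target can mirror it."""
    source[doc_id] = document
    pending_changes.append((doc_id, copy.deepcopy(document)))


def mirror_pending(target):
    """Replay queued writes onto the target, oldest first."""
    while pending_changes:
        doc_id, document = pending_changes.pop(0)
        target[doc_id] = document


# Step 1: the new instance pulls in all existing data.
target = copy.deepcopy(source)

# Step 2: production keeps writing to the old instance in the meantime.
write_to_source("u2", {"name": "Alan Turing"})
write_to_source("u3", {"name": "Grace"})

# Step 3: the new instance mirrors those updates until both copies match.
mirror_pending(target)
print(target == source)
```

Once the two copies match and the queue is empty, switching the application over to the target loses no writes.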

We hope this article has helped you in some way! Feel free to visit our careers page if you’re interested in learning more about engineering at CodeSignal.
