6 Rules of Thumb for MongoDB Schema Design: Part 3

William Zola, Lead Technical Support Engineer at MongoDB

This is the last stop in our tour of modeling one-to-N relationships in MongoDB. In the first post, I covered the three basic ways to model a one-to-N relationship. In the last post, I covered some extensions to those basics: two-way referencing and denormalization.

Denormalization lets you avoid some application-level joins at the expense of more complex and expensive updates. Denormalizing one or more fields makes sense if those fields are read much more often than they are updated.

If you need a refresher, see Part 1 and Part 2.

Look at the choices

To review:

  • You can embed, reference from the one side, or reference from the N side, or combine a pair of these techniques.

  • You can denormalize as many fields as you like into the one side or the N side.
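As a reminder of what those choices look like, here is a minimal sketch of the three referencing styles written as plain Python dictionaries. The collection names, field names, and values are hypothetical and chosen only for illustration; they are not taken from a real schema.

```python
# Hypothetical document shapes; every name and value here is illustrative.

# 1. Embedding: the N-side objects live inside the "one" document.
person_embedded = {
    "_id": "person1",
    "name": "Example Person",
    "addresses": [
        {"street": "1 First St", "city": "Anytown"},
        {"street": "2 Second Ave", "city": "Othertown"},
    ],
}

# 2. Referencing from the one side: the parent holds an array of child ids.
product = {"_id": "prod1", "name": "smoke shifter", "parts": ["part1", "part2"]}
part = {"_id": "part1", "name": "#4 grommet"}

# 3. Referencing from the N side: each child carries a parent reference.
host = {"_id": "host1", "name": "host1.example.com"}
log_message = {"_id": "log1", "host": "host1", "message": "disk almost full"}

# Combining a pair of techniques: child references plus a parent reference.
person_two_way = {"_id": "person2", "tasks": ["task1"]}
task = {"_id": "task1", "owner": "person2"}
```

The last pair shows the two-way variant: the same relationship can be followed from either document, at the cost of keeping both references in step.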

Denormalization, in particular, gives you a lot of choices: if there are eight candidates for denormalization in a relationship, there are 2^8 (256) different ways to denormalize (including not denormalizing at all). Multiply that by the three different ways to do referencing, and you have hundreds of ways to model the relationship, and every additional candidate field doubles the count.
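The counting argument can be checked with a few lines of Python. This is only a sketch: it assumes that each candidate field is an independent yes/no decision and that the three referencing styles are mutually exclusive, and the function name `count_models` is made up for this example.

```python
def count_models(candidate_fields: int, referencing_styles: int = 3) -> int:
    """Count the modeling options under the assumptions stated above."""
    # Each candidate field is either denormalized or not: 2 choices per field.
    denormalization_choices = 2 ** candidate_fields
    return denormalization_choices * referencing_styles


print(2 ** 8)            # 256
print(count_models(8))   # 768
print(count_models(10))  # 3072
```

Under these assumptions, eight candidate fields already give hundreds of options, and ten push the total past 3,000.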

Guess what? You are now stuck in the “paradox of choice”: because you have so many potential ways to model a one-to-N relationship, choosing how to model it just got harder.

Rules of thumb: your guide through the rainbow

Here are some rules of thumb to guide you through the myriad options.

  • One: favor embedding unless there is a compelling reason not to.

  • Two: needing to access an object on its own is a compelling reason not to embed it.

  • Three: arrays should not grow without bound. If there are more than a few hundred documents on the many side, don’t embed them; if there are more than a few thousand documents on the many side, don’t use an array of ObjectID references. High-cardinality arrays are a compelling reason not to embed.

  • Four: don’t be afraid of application-level joins. If you index correctly and use projection (as shown in Part 2), application-level joins are barely more expensive than server-side joins in a relational database.

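  • Five: consider the read/write ratio when denormalizing. A field that is read frequently and rarely updated is a good candidate for denormalization. If you denormalize a field that is updated frequently, the extra work of finding and updating every copy is likely to outweigh the savings.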

  • Six: as always with MongoDB, how you model your data depends largely on the data access patterns of your particular application. You want to structure your data to match the way your application queries and updates it.
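To make rule four concrete, here is a minimal in-memory sketch of an application-level join. It does not talk to a database: the two dictionaries stand in for collections keyed by `_id`, the helper `project` imitates a query projection, and all names and values are hypothetical.

```python
# In-memory stand-ins for two collections. Keying each dict by _id plays the
# role of the _id index. All names and values are hypothetical.
PRODUCTS = {
    "prod1": {"_id": "prod1", "name": "smoke shifter", "parts": ["part1", "part3"]},
}
PARTS = {
    "part1": {"_id": "part1", "name": "#4 grommet", "qty": 94, "price": 3.99},
    "part2": {"_id": "part2", "name": "#8 washer", "qty": 12, "price": 0.25},
    "part3": {"_id": "part3", "name": "1/4in bolt", "qty": 40, "price": 0.75},
}


def project(doc, fields):
    """Return only the requested fields, in the spirit of a query projection."""
    return {field: doc[field] for field in fields if field in doc}


def parts_for_product(product_id, fields=("name",)):
    """Application-level join: fetch the parent, then its referenced children."""
    product = PRODUCTS[product_id]  # step 1: fetch the one side by _id
    part_ids = product["parts"]
    # Step 2: fetch every referenced part by id, keeping only projected fields.
    return [project(PARTS[pid], fields) for pid in part_ids if pid in PARTS]


print(parts_for_product("prod1"))
```

With a real driver, the second step would typically be a single query that uses the `$in` operator on `_id` together with a projection, so the join costs two indexed round trips rather than one per child.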

Your guide to the rainbow

When modeling one-to-N relationships in MongoDB, you have a variety of choices, so you have to think carefully about the structure of your data.

The main criteria you need to consider are:

  • What is the cardinality of the relationship: is it one-to-few, one-to-many, or one-to-squillions?

  • Do you need to access the objects on the N side on their own, or only in the context of the parent object?

  • What is the ratio of reads to updates for this particular field?

Your main options for structuring the data are:

  • For one-to-few, you can use an array of embedded documents.

  • For one-to-many, or when the objects on the N side must stand alone, you should use an array of references. You can also use a parent reference on the N side if doing so optimizes your data access pattern.

  • For one-to-squillions, you should use a parent reference in the documents that store the N side.
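The criteria and options above can be condensed into a small decision helper. This is a sketch, not a definitive rule: the function name `suggest_model` is made up, and the cutoffs of 200 and 2,000 are assumed stand-ins for “a few hundred” and “a few thousand”; the right numbers depend on your documents and workload.

```python
FEW_LIMIT = 200     # assumed stand-in for "a few hundred" N-side documents
MANY_LIMIT = 2000   # assumed stand-in for "a few thousand" N-side documents


def suggest_model(n_side_count: int, accessed_on_its_own: bool) -> str:
    """Map cardinality and access pattern to one of the three main options."""
    if n_side_count > MANY_LIMIT:
        # Too many children even for an array of ids in the parent.
        return "parent reference on the N side"
    if n_side_count > FEW_LIMIT or accessed_on_its_own:
        # Too many to embed, or the children must stand alone.
        return "array of references on the one side"
    return "array of embedded documents"


print(suggest_model(5, accessed_on_its_own=False))
print(suggest_model(5, accessed_on_its_own=True))
print(suggest_model(1_000_000, accessed_on_its_own=False))
```

The helper deliberately ignores the read/update ratio, because that criterion governs denormalization rather than the choice of referencing style.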

Once you have decided on the overall structure of the data, you can choose to denormalize data across multiple documents, either from the one side into the N side or from the N side into the one side.

Do this only for fields that are read frequently, are read much more often than they are updated, and do not require strong consistency, because updating a denormalized value is slower, more expensive, and not atomic.

Productivity and Flexibility

The upshot is that MongoDB lets you design your schema to match the needs of your application. You can structure your data in MongoDB so that it adapts easily to change and supports the queries and updates you need to get the most out of your application.