So why am I writing this article?


Is it because I think NoSQL solutions are inferior to RDBMS solutions? Of course not!


Is it because I’m focused on the way SQL does things and don’t want to get caught up in the uncertainty of a relatively new technology? No, it’s not! In fact, I was very excited to learn and use the facilities offered by various distributed databases.


Then why am I writing this?


The reason is simple: a few years ago, I witnessed the design of a system that provided schema management facilities for telemetry events. It turned out to be much more expensive than originally planned. Why? Because the wrong database solution was chosen.


One requirement of this system was that schema editing be consistent, with every schema editor seeing the latest version of the schema. It also had to support concurrent editing.


Moreover, the number of simultaneous users accessing the system would never exceed a few hundred. The amount of data stored would not reach terabytes; a few hundred gigabytes at most.


Therefore, if we consider the trade-offs of the CAP theorem, the choice should have been obvious: use an RDBMS, which directly supports the consistency and transaction requirements of the system.


Instead, a NoSQL database (Azure Table storage) was chosen for prototyping. The official reason for this choice was that it would make prototyping faster and provide greater flexibility when updating the schema of individual telemetry events. The low cost of Azure Table storage compared to Azure SQL was cited as another reason.


Fast forward 5 months…


The system began to experience many problems with maintaining the integrity of CRUD operations. The thin application logic layer designed to handle transactions was no longer so thin. The upgrade and backward-compatibility story started to get more complicated.


Plagued by many other problems, the engineers went back to the drawing board — this time replacing the storage tier with Azure SQL! I don’t remember the exact details, but this change added about 40% extra time and cost.

Management was not happy and the project was almost cancelled. But the engineers on the team were excellent and they were able to complete the project despite some delays and initial bad technical decisions.


The project had a happy ending, but it might not have. In fact, many internal projects have been shut down because they could not deliver the promised features by the promised dates.


So how do you know a NoSQL solution is right for your next software project? Start by asking yourself and your team these 10 questions:


#1: Are you prepared for the cost of developer/system administrator training?

If you’re a mature IT software development company, chances are you already have people familiar with SQL. This group includes not only developers, but also database administrators (DBAs).

Unless you plan to hire for a new NoSQL project, there will be training costs for existing developers and DBAs. Additional training may also extend project delivery dates.

A simple way to think about it is:

  • Count the total number of years of experience that your team members (developers and DBAs) have with relational database technology.

  • Calculate the cost of acquiring the same number of years of NoSQL experience through training or new hires.

  • Finally, figure out what you’re getting for that cost. What is your return on investment?

On this particular project, none of the team’s developers had prior NoSQL experience, but all had extensive SQL Server experience. Adopting the NoSQL solution cost about one sprint of ramp-up time, and the team paid again later for its inexperience in the form of design errors.


#2: What are your data transaction needs? In other words, what level of transaction support do you need?

If your system requires ACID properties, you’re better off sticking with an RDBMS solution. Otherwise, you will spend a lot of time trying to replicate ACID guarantees in your application/business logic layer, and you may still not be as efficient as an RDBMS solution.
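
To make this concrete, here is a minimal sketch of what a relational transaction gives you without extra application logic. It uses Python's built-in sqlite3 module as a stand-in for any RDBMS. The table names, column names, and the `save_new_version` helper are hypothetical and invented for illustration; they are not taken from the project described in this article.

```python
import sqlite3

def make_db():
    """Create an in-memory database with one hypothetical event schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE schemas ("
        "event_name TEXT PRIMARY KEY, latest_version INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE schema_versions ("
        "event_name TEXT, version INTEGER, body TEXT NOT NULL, "
        "PRIMARY KEY (event_name, version))"
    )
    conn.execute("INSERT INTO schemas VALUES ('player_login', 1)")
    conn.execute("INSERT INTO schema_versions VALUES ('player_login', 1, '{}')")
    conn.commit()
    return conn

def save_new_version(conn, event_name, expected_version, new_body):
    """Store a new schema version atomically, or change nothing at all.

    The edit is rejected if another editor has already moved the schema
    past `expected_version` (a simple optimistic concurrency check).
    """
    try:
        with conn:  # commits on success, rolls back on any exception
            cur = conn.execute(
                "UPDATE schemas SET latest_version = latest_version + 1 "
                "WHERE event_name = ? AND latest_version = ?",
                (event_name, expected_version),
            )
            if cur.rowcount != 1:
                raise RuntimeError("stale edit: someone else saved first")
            conn.execute(
                "INSERT INTO schema_versions (event_name, version, body) "
                "VALUES (?, ?, ?)",
                (event_name, expected_version + 1, new_body),
            )
        return True
    except (RuntimeError, sqlite3.IntegrityError):
        return False

if __name__ == "__main__":
    db = make_db()
    print(save_new_version(db, "player_login", 1, '{"fields": ["user_id"]}'))
    # A second editor still working from version 1 is rejected.
    print(save_new_version(db, "player_login", 1, '{"fields": ["ip"]}'))
```

If the second statement fails, the first one is rolled back as well, so the version counter and the version history cannot drift apart. Reproducing that guarantee on a store without multi-row transactions is the kind of work that tends to end up in the application layer.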


#3: Do you need web-scale/high scalability?

Always figure out what scalability you need first. In this particular case, we were building a system for Microsoft’s in-house game studios.

  • Between 10 and 15 game studios are under consideration, which bounds the number of registered users of the system

  • Each studio has a maximum of 3-5 active titles

  • Each game title stores telemetry schemas in three environments: Development, Pre-production (PPE), and Production

  • For each title, 2-5 data scientists will modify the game title data simultaneously

  • Each title has at most about 50 KB of event data

  • We are required to store all versions, and we estimate about 1000 versions over the lifetime of a title

With the above rough estimates, we can calculate concurrency and storage requirements:

Total concurrency = Number of studios x Number of titles per studio x Number of users per title

= 15 x 5 x 5 = 375 concurrent users


Maximum storage = Number of studios x Number of titles per studio x Number of environments x Event storage size per version x Number of versions to be stored

= 15 x 5 x 3 x 50 KB x 1000 = 11,250,000 KB = 11.25 GB maximum storage capacity


SQL Azure supports 1024 concurrent open connections and can easily meet this concurrency requirement. Also, 11.25 GB is a very small amount of data by cloud computing standards.

This system isn’t the next Facebook or Bing, so is the NoSQL route really worth it?
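
The back-of-the-envelope estimate above can be written down as a small script, which makes it easy to rerun with your own numbers. This is only a sketch: the constant and function names are mine, the inputs are the upper bounds from the list above, and the connection limit is the figure quoted in this article rather than a value checked against current Azure documentation.

```python
# Upper-bound estimates taken from the bullet list above.
STUDIOS = 15
TITLES_PER_STUDIO = 5
USERS_PER_TITLE = 5
ENVIRONMENTS = 3            # Development, PPE, Production
EVENT_DATA_KB = 50          # maximum event data per title
VERSIONS_PER_TITLE = 1000   # versions kept over a title's lifetime

# Connection limit as quoted in this article (assumption, not re-verified).
SQL_AZURE_MAX_CONNECTIONS = 1024

def peak_concurrent_users():
    """Worst case: every data scientist on every title is active at once."""
    return STUDIOS * TITLES_PER_STUDIO * USERS_PER_TITLE

def max_storage_gb():
    """Worst-case storage in decimal gigabytes (1 GB = 1,000,000 KB)."""
    total_kb = (
        STUDIOS * TITLES_PER_STUDIO * ENVIRONMENTS
        * EVENT_DATA_KB * VERSIONS_PER_TITLE
    )
    return total_kb / 1_000_000

if __name__ == "__main__":
    users = peak_concurrent_users()
    print(f"peak concurrent users: {users}")
    print(f"within connection limit: {users <= SQL_AZURE_MAX_CONNECTIONS}")
    print(f"maximum storage: {max_storage_gb():.2f} GB")
```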


#4: Can NoSQL solutions really save you money?

On paper, Azure Table storage is the cheaper option because it costs only a few cents per gigabyte of data, while SQL Azure charges about $5 for the same amount of data.

But since our system will never hold more than 12 GB, does it really matter? At that size SQL Azure costs about $60 a month, which is what we pay for 30 minutes of coding on the same system.

So before you decide to use NoSQL simply because of its lower unit cost, find out if the savings are a big part of your budget.
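
A quick way to run that check is to express the monthly savings in developer time. The sketch below uses the article's figures for SQL Azure ($5 per GB) and storage (12 GB). The Table storage price is a placeholder I chose to stand in for "a few cents per GB", and the hourly developer cost is simply what the "$60 equals 30 minutes of coding" comparison implies; substitute your own rates.

```python
STORAGE_GB = 12              # upper bound used in this article
SQL_PRICE_PER_GB = 5.00      # approximate SQL Azure figure from this article
TABLE_PRICE_PER_GB = 0.05    # ASSUMPTION: placeholder for "a few cents per GB"
DEV_COST_PER_HOUR = 120.00   # implied by "$60 = 30 minutes of coding"

def monthly_cost(price_per_gb, gb=STORAGE_GB):
    """Monthly storage bill in dollars for a given per-GB price."""
    return price_per_gb * gb

def savings_in_dev_minutes():
    """Monthly savings from the cheaper store, expressed as developer minutes."""
    savings = monthly_cost(SQL_PRICE_PER_GB) - monthly_cost(TABLE_PRICE_PER_GB)
    return savings / DEV_COST_PER_HOUR * 60

if __name__ == "__main__":
    print(f"SQL Azure:     ${monthly_cost(SQL_PRICE_PER_GB):.2f} per month")
    print(f"Table storage: ${monthly_cost(TABLE_PRICE_PER_GB):.2f} per month")
    print(f"savings buy about {savings_in_dev_minutes():.0f} developer minutes")
```

With these inputs the cheaper store saves a little under half an hour of developer time per month, which any extra engineering effort on the NoSQL side would quickly consume.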


#5: Do you need to attract venture capital?

Interestingly, Silicon Valley has a bias toward NoSQL. This is because NoSQL is perceived as inherently scalable, while RDBMSs are perceived as non-scalable. Remember, the key word is “perceived”!

This perception of scalability may convince investors that your software is on the right track and ready for mass adoption, attracting their investment dollars.

Many NoSQL companies are VC-backed themselves, which gives investors a positive bias toward the technology.

Finally, all the marketing around “NoSQL” helps drive positive investor sentiment towards your product.


#6: Are you hiring entrepreneurial people?

If you are going to hire entrepreneurial people, many of them will probably already have knowledge of NoSQL.

However, if you’re not in a major tech hub, there are fewer opportunities to get that talent. Your region may have an existing pool of RDBMS developers — trying to recruit NoSQL engineers and DBAs in such a region can delay project delivery dates and cost you more money due to the supply-demand curve.

My advice is to work with your recruitment agency/HR department to research the local developer market and factor the results into your technology choice.


#7: What technologies are your customers using downstream?

Consider a scenario where you deliver analytics data to customers. You are using NoSQL to store the analytics data. However, one of your customers decides to stick with a SQL-based reporting system.

What does that mean to you?

This means that you now need to convert all your NoSQL data into SQL format and push it down to the customer’s SQL database via services like Azure Data Factory. This is where you incur additional development and operating costs. If all of your downstream customers are using SQL, then you need to seriously consider whether using NoSQL and doing all this expensive data transformation makes sense for your system.
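
As a rough illustration of that conversion work, the sketch below flattens nested documents into rows of a fixed relational table. Everything in it is hypothetical: the document shape, the `analytics` table, and the column names are invented, and sqlite3 merely stands in for the customer's SQL database. A real pipeline built on a service such as Azure Data Factory would look different, but it has to solve the same mapping problem.

```python
import sqlite3

# Hypothetical analytics documents as they might come out of a document store.
# Note the nested object and the field that is missing from the second record.
DOCUMENTS = [
    {"id": "e1", "title": "GameA", "metrics": {"sessions": 120, "crashes": 3}},
    {"id": "e2", "title": "GameB", "metrics": {"sessions": 45}},
]

def flatten(doc):
    """Map one nested document onto a fixed relational row.

    Fields absent from the document become NULL in the target table.
    """
    metrics = doc.get("metrics", {})
    return (
        doc["id"],
        doc["title"],
        metrics.get("sessions"),
        metrics.get("crashes"),
    )

def load(conn, documents):
    """Create the target table if needed and upsert the flattened rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS analytics ("
        "id TEXT PRIMARY KEY, title TEXT NOT NULL, "
        "sessions INTEGER, crashes INTEGER)"
    )
    with conn:  # commit all rows together, or none of them
        conn.executemany(
            "INSERT OR REPLACE INTO analytics VALUES (?, ?, ?, ?)",
            [flatten(doc) for doc in documents],
        )

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    load(db, DOCUMENTS)
    for row in db.execute("SELECT * FROM analytics ORDER BY id"):
        print(row)
```

Every new field or change of shape in the documents means revisiting both `flatten` and the target table, which is where the ongoing development and operating cost comes from.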


#8: Does availability trump consistency for your product?

If you’re building a system like Facebook’s News Feed, you probably want the system to be highly available and eventually consistent.

On the other hand, if you are building a banking system (or a schema store, as in our case), you may want to support strong consistency and forgo high availability.

Either way, you should first consider the implications of the CAP theorem and then decide whether your system needs an SQL or NoSQL solution.


#9: Do you anticipate significant changes to the database schema?

If you expect to make a lot of changes to your database schema, as is often the case with mobile applications, real-time analytics, content management systems, and so on, then a NoSQL solution might be the way to go.

You can use a partitioning scheme that allows you to update your database schema in a more convenient way than most SQL databases allow.


#10: Do you want to use NoSQL for personal enrichment/fulfillment?

Please don’t do this!

I’ve seen people who are just obsessed with learning a NoSQL system and putting it on their resume. There’s nothing wrong with that — I’m also fascinated by NoSQL technology.

But don’t let this be the driver (consciously or subconsciously) behind the choice of technology stack. You can study on your own time if you like.


Who won the database wars?


Let’s be frank: there is no winner-takes-all here!

In many cases, you may want SQL and NoSQL technologies to coexist on the same system. For example, if you’re building a photo-sharing application like Instagram, your photos might be in a NoSQL database, and your login/ACL information might be in an SQL database.


Original English article: Deb Haldar. Translation: Open Source China

Reference: www.oschina.net/translate/10-questions-to-ask-yourself-before-choosing-a-nosql-database




