Do you need a classic database?

For some time now, the sheer volume of data being processed has forced many applications to add a caching layer in front of the database. Even with a lot of underlying optimization, classic databases still don't provide enough speed or convenience. The main reason is that the farther away data is stored, the harder it is to get to it; another is that data in a database is usually kept on disk, not in memory. Classic databases mitigate this by embedding in-memory caches, but keeping a dedicated, separate cache is also a common strategy.

A common solution to the performance problem of accessing a database is caching. Caching is nothing new: it simply means keeping a small amount of frequently accessed data closer to where it is needed. There are caches in processors, caches inside databases, and you can write caches in your own applications.

But as things have evolved, we now have highly available, distributed in-memory caches that can be used by many application instances simultaneously.

Cache – Redis

Perhaps the most popular distributed in-memory data store is Redis, which is not just a cache, but is very often used as one. The official description is as follows:

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, HyperLogLogs, geospatial indexes, and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.

Redis is fast, and it is considered one of the fastest data stores available. It is friendly to CPU caches and, because commands run on a single thread, it avoids the cost of context switches. It was designed from the start as an in-memory store, which means much more than just moving data from disk to memory: its data structures and code paths were optimized for memory from day one.

Because Redis is fast and can store a variety of data structures, it is a natural fit for distributed caching.

Redis has gained a lot of popularity as a cache, and there are several cache-loader libraries that use it as a caching layer between the application and the database. Take Redisson's map loader as an example:
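Below is a minimal sketch of how such a loader can be wired up; the UserDao class is a hypothetical stand-in for your database access code, and the exact MapOptions builder API may differ between Redisson versions.

    import org.redisson.Redisson;
    import org.redisson.api.MapOptions;
    import org.redisson.api.RMap;
    import org.redisson.api.RedissonClient;
    import org.redisson.api.map.MapLoader;
    import org.redisson.config.Config;

    import java.util.List;

    public class UserCacheLoaderExample {

        // Hypothetical DAO standing in for your relational database access code.
        static class UserDao {
            String findNameById(String id) { return "user-" + id; }
            List<String> findAllIds() { return List.of("42"); }
        }

        public static void main(String[] args) {
            Config config = new Config();
            config.useSingleServer().setAddress("redis://127.0.0.1:6379");
            RedissonClient redisson = Redisson.create(config);

            UserDao userDao = new UserDao();

            // The loader is invoked on a cache miss, reading through to the database.
            MapLoader<String, String> loader = new MapLoader<String, String>() {
                @Override
                public String load(String key) {
                    return userDao.findNameById(key);   // read-through on a miss
                }

                @Override
                public Iterable<String> loadAllKeys() {
                    return userDao.findAllIds();        // used when pre-loading the map
                }
            };

            RMap<String, String> users = redisson.getMap("users",
                    MapOptions.<String, String>defaults().loader(loader));

            // A miss in Redis transparently falls back to the database via the loader.
            System.out.println(users.get("42"));

            redisson.shutdown();
        }
    }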

Using distributed caching in this way can greatly improve performance, but the code and architecture become more complex. Data is copied between the database and the cache, and we must keep the two in sync. The application code has to manage the entire caching policy, controlling cache invalidation and repopulating the cache, all to keep the data consistent. We gain performance and scalability, but we also introduce a risky amount of complexity.
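To make that complexity concrete, here is a rough cache-aside sketch using the Jedis client; the ProductDao class, the key layout, and the TTL are all hypothetical, and real code would also need error handling and protection against cache stampedes.

    import redis.clients.jedis.Jedis;

    public class CacheAsideExample {
        private static final int TTL_SECONDS = 300; // arbitrary expiry for cached entries

        // Hypothetical DAO standing in for the relational database.
        static class ProductDao {
            String findDescription(String id) { return "description-of-" + id; }
            void updateDescription(String id, String description) { /* UPDATE ... */ }
        }

        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                ProductDao dao = new ProductDao();
                String key = "product:42:description";

                // Read path: try the cache first, fall back to the database and repopulate.
                String value = jedis.get(key);
                if (value == null) {                       // cache miss
                    value = dao.findDescription("42");     // hit the database
                    jedis.setex(key, TTL_SECONDS, value);  // repopulate the cache
                }
                System.out.println(value);

                // Write path: update the database, then invalidate the cached entry
                // so the next read repopulates it. Getting this ordering wrong is a
                // classic source of stale data.
                dao.updateDescription("42", "new description");
                jedis.del(key);
            }
        }
    }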

The data is duplicated

You might ask: why keep the data in both places? Can't you just keep it in Redis? Doing so would greatly reduce the complexity of the code. But first, let's look at some of the features and advantages of classic databases and see whether we can get them directly from Redis.

Advantages of relational databases

Traditionally, caches do not hold data for long periods of time. We keep data in the cache only for quick access; for long-term persistence, we usually rely on a central database.

In addition to data persistence, relational databases provide other features, such as data consistency. With a relational database you can define relationships between data, constraints, and complex queries, all built to ensure consistency across multiple related tables.

These are important advantages, and even though NoSQL databases are popular, relational databases are not going away anytime soon.

But using Redis as a cache in front of a relational database adds another layer of complexity, because you have to keep the two data stores in sync in your code.

Depending on your caching strategy, you end up building fairly complex code to move data between Redis and the database. Don't get me wrong, sometimes you have to: as mentioned earlier, relational databases have their advantages, and we can't simply throw them away.

But do we have to do this every time? What if you don't need complicated relationships between different pieces of data, and a simple key-to-value mapping is enough? Can we get rid of the relational database?

Redis serves as the central data store

As mentioned earlier, the advantages of a relational database are consistency and persistence. If we do not need relational mappings between our data, then persistence is the only thing left to cover. There are many NoSQL databases that provide key-value storage, but we can use Redis directly.
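As a small illustration, assuming a Jedis client and an invented key-naming convention, an entity can be stored straight into Redis as a hash, with no relational database behind it:

    import java.util.Map;
    import redis.clients.jedis.Jedis;

    public class RedisAsPrimaryStore {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // Store a "user" entity directly as a Redis hash; Redis is the
                // system of record here, not a cache in front of another database.
                jedis.hset("user:1001", Map.of(
                        "name", "Alice",
                        "email", "alice@example.com",
                        "plan", "pro"));

                // Read a single field or the whole entity back.
                String email = jedis.hget("user:1001", "email");
                Map<String, String> user = jedis.hgetAll("user:1001");

                System.out.println(email);
                System.out.println(user);
            }
        }
    }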

Redis persistence

Redis has two persistence models: RDB and AOF.

RDB persistence saves snapshots of your data at specified intervals. Snapshots are ideal for backups and fast recovery after a restart. RDB also maximizes Redis performance, because the only work the parent process does is fork a child process that creates the snapshot.

But since RDB snapshots are taken only at scheduled intervals, it is not a good option if you cannot afford to lose recent data. Forking is a costly operation and cannot be done on every change, so the most recent writes may not make it into the latest snapshot.

AOF is a different persistence model. It consists of an append-only file to which every write operation is logged. It is more durable because the fsync policy typically runs far more often than a full RDB snapshot. And because the file is append-only, existing data is never overwritten in place: even if a power outage occurs before the last entry is fully written, the state from just before the outage can easily be restored.

But it also has disadvantages. AOF files are usually larger than RDB files, and if the fsync policy is too aggressive, for example fsyncing after every write command, performance suffers. By default, fsync runs once per second.

Which one should you use?

If you want a level of safety similar to what PostgreSQL offers, you have to use both: RDB lets you restore backups faster after a restart, while AOF minimizes data loss. If you can afford to lose some recent data, RDB alone is enough. Keep in mind that the Redis project plans to eventually merge the two into a single persistence model.
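As an illustration, the sketch below enables both models at runtime through CONFIG SET using Jedis; the save intervals are example values, and in practice you would normally put the equivalent directives in redis.conf instead.

    import redis.clients.jedis.Jedis;

    public class PersistenceConfigExample {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // RDB: snapshot after 900s if at least 1 key changed,
                // or after 300s if at least 10 keys changed.
                // redis.conf equivalent:  save 900 1 300 10
                jedis.configSet("save", "900 1 300 10");

                // AOF: append every write to the AOF file and fsync once per second.
                // redis.conf equivalents:  appendonly yes / appendfsync everysec
                jedis.configSet("appendonly", "yes");
                jedis.configSet("appendfsync", "everysec");
            }
        }
    }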

Other advantages

The future belongs to byte addressing

Because rotating disks were the persistence medium for a long time, most databases today are still optimized around them: data is laid out to reduce seek and rotational latency, and specialized formats place indexes in specific parts of the disk. These optimizations make little sense on modern hardware such as SSDs. Redis stores its data optimized for byte addressing. The future belongs to byte addressing, and Redis is already there.

Scalability and high availability

Redis provides different ways to achieve scalability and high availability.

You can split data across different Redis nodes to achieve scalability. Sharding reduces the load on any single instance, and you benefit from multiple cores and machines. However, you should be aware of the limitations of sharding: multi-key operations and transactions across shards are not supported.
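Here is a minimal sketch of talking to a sharded deployment with the Jedis cluster client; the node address is a placeholder, and the hash-tag trick shown in the comments is the usual way to keep related keys on the same shard.

    import java.util.HashSet;
    import java.util.Set;
    import redis.clients.jedis.HostAndPort;
    import redis.clients.jedis.JedisCluster;

    public class ClusterExample {
        public static void main(String[] args) {
            // Any known node is enough; the client discovers the rest of the cluster.
            Set<HostAndPort> nodes = new HashSet<>();
            nodes.add(new HostAndPort("127.0.0.1", 7000));

            try (JedisCluster cluster = new JedisCluster(nodes)) {
                // Keys are routed to shards by hash slot.
                cluster.set("user:1:name", "Alice");
                cluster.set("user:2:name", "Bob");

                // Multi-key operations only work when all keys map to the same slot.
                // Hash tags ({...}) force related keys onto the same shard:
                cluster.set("{user:1}:name", "Alice");
                cluster.set("{user:1}:email", "alice@example.com");
                System.out.println(cluster.mget("{user:1}:name", "{user:1}:email"));
            }
        }
    }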

High availability is achieved through replication. The master node is replicated to one or more replicas, which protects you from node failures, data-center failures, and Redis process failures: if the primary node goes down, a replica takes over. You can also keep a replica in a different availability zone, which protects you from larger disasters, such as an entire AZ going down.
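For a replicated setup with Sentinel-managed failover, a sketch with Jedis might look like this; the master name and sentinel addresses are placeholders.

    import java.util.HashSet;
    import java.util.Set;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisSentinelPool;

    public class SentinelExample {
        public static void main(String[] args) {
            // Sentinels monitor the master named "mymaster" and promote a replica on failure.
            Set<String> sentinels = new HashSet<>();
            sentinels.add("127.0.0.1:26379");
            sentinels.add("127.0.0.1:26380");
            sentinels.add("127.0.0.1:26381");

            try (JedisSentinelPool pool = new JedisSentinelPool("mymaster", sentinels);
                 Jedis jedis = pool.getResource()) {
                // The pool resolves the current master address through the sentinels,
                // so application code does not change when a failover happens.
                jedis.set("greeting", "hello");
                System.out.println(jedis.get("greeting"));
            }
        }
    }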

If you use Redis Enterprise clustering, all of this is abstracted away from you: you get sharding and high availability without writing any extra code, and your application can talk to the cluster as if it were a single Redis instance.

Complex data structures

Redis can handle not only strings but also richer data structures: binary-safe strings, lists, sets, sorted sets, hashes, bitmaps, HyperLogLogs, streams, and more. This makes Redis not just a key-value store, but a complete data structure server.
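A quick tour of a few of these structures through Jedis; the key names are arbitrary examples.

    import redis.clients.jedis.Jedis;

    public class DataStructuresExample {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // List: recent events, newest first.
                jedis.lpush("events", "login", "purchase", "logout");

                // Set: unique tags.
                jedis.sadd("article:1:tags", "redis", "cache", "database");

                // Sorted set: leaderboard ordered by score.
                jedis.zadd("leaderboard", 1500, "alice");
                jedis.zadd("leaderboard", 1320, "bob");

                // Hash: an object with named fields.
                jedis.hset("user:1", "name", "Alice");

                // HyperLogLog: approximate count of unique visitors.
                jedis.pfadd("visitors:2024-01-01", "alice", "bob", "carol");

                System.out.println(jedis.zrevrange("leaderboard", 0, 1)); // top two players
                System.out.println(jedis.pfcount("visitors:2024-01-01")); // roughly 3
            }
        }
    }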

Not a silver bullet

It all sounds great, but nothing is a silver bullet, and neither is Redis. The main drawback is that all the data has to fit in memory, which makes Redis suitable only when you have enough memory for your dataset. If you don't, the data must be sharded across nodes, and you then lose guarantees such as cross-shard transactions, pipelines, and publish/subscribe.

Conclusion

For a long time, Redis was thought of as just a cache: a nice distributed cache, but still just a cache sitting between the application and the main database. As we have seen, Redis is much more than that, and it is trying to shake off this misconception. Redis is not merely a cache; it is a distributed data store. It handles different data structures with incredible speed, executes each command atomically, and provides several mechanisms for data persistence.

With all this in mind, even though Redis is used very successfully as a cache, it can do a lot more. If you don't need SQL features such as relational data, or a dataset larger than your available memory, why build a complex three-layer system of application, Redis as cache, and a separate database? In such cases, you can simply use Redis as the main persistence layer.

If you enjoyed this article, feel free to follow my WeChat official account; if there are foreign-language technical articles you like, you can recommend them to me by sending a message through the account.