- Performance Differences Between Postgres and MySQL
- Blessing Krofegha
- The Nuggets Translation Project
- Permanent link to this article: github.com/xitu/gold-m…
- Translator: Greycodee
- Proofreaders: 1autodidact, Kamly
Introduction
In the Arctype community, we answer a lot of questions about database performance, especially about the differences between Postgres and MySQL. Performance is a critical and complex part of managing a database: it can be affected by configuration, hardware, and the operating system, and how well PostgreSQL and MySQL perform and hold up also depends on the hardware infrastructure underneath them.
Not all relational databases (RDBMSs) are created equal. Although PostgreSQL and MySQL are similar in some ways, each has its own performance advantages in different usage scenarios. We discussed some of the basic differences between them in a previous article, but there are many performance differences worth digging into.
In this article, we will discuss workload analysis and running queries. Then we'll cover some basic configuration choices that can improve the performance of MySQL and PostgreSQL databases. Finally, we'll summarize some key differences between the two.
Contents
- How to Measure Performance
- JSON Query Performance
- Index Overhead
- Database Replication and Clustering
- Concurrency
- Conclusion
How to Measure Performance
MySQL is praised for its excellent read speed, even though concurrency suffers under mixed read and write workloads. PostgreSQL (commonly known as Postgres) bills itself as the most advanced open source relational database and has been developed as a standards-compliant, feature-rich database.
Previously, Postgres performance was more balanced: reads were generally slower than in MySQL, but it has since improved and can now write large amounts of data more efficiently, which also makes its concurrency handling better. The latest versions of MySQL and Postgres have largely smoothed out the performance differences between the two databases.
MySQL's old MyISAM engine can read data very quickly. However, recent versions of MySQL no longer use it by default, and with InnoDB (which adds foreign key constraints and transactions) the difference is negligible. The features InnoDB provides are essential for enterprises or applications with a large number of users, so falling back to the old engine is not really an option. As MySQL versions continue to be updated, the difference keeps getting smaller.
A database benchmark is a reproducible experimental framework for characterizing and comparing the performance (time, memory, or quality) of database systems, or of algorithms running on those systems. The framework defines the system under test, the workload, the metrics, and the experiments.
In the next four sections, we discuss the performance benefits of each database.
JSON queries are faster in Postgres
In this section, we look at a benchmark comparing PostgreSQL and MySQL.
Steps
- Create a project (Java, Node, or Ruby) that uses both PostgreSQL and MySQL as its database.
- Create a JSON object, then read, write, and update it (a minimal SQL sketch of this kind of workload follows the list).
- The JSON object is about 14 MB in total and produces roughly 200 to 210 entries in the database.
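The benchmark itself runs application code, but the database side of the workload looks roughly like the following. This is a minimal sketch under assumptions, not the benchmark's actual schema: the events table and payload column are made-up names, and it assumes the document lands in a jsonb column in Postgres and a JSON column in MySQL.
-- PostgreSQL: store, read, and update JSON documents in a jsonb column
CREATE TABLE events (
    id      SERIAL PRIMARY KEY,
    payload JSONB NOT NULL
);
INSERT INTO events (payload) VALUES ('{"user": "alice", "action": "login"}');
SELECT payload->>'user' FROM events WHERE payload->>'action' = 'login';
UPDATE events SET payload = jsonb_set(payload, '{action}', '"logout"') WHERE id = 1;

-- MySQL: the equivalent using the JSON column type
CREATE TABLE events (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    payload JSON NOT NULL
);
INSERT INTO events (payload) VALUES ('{"user": "alice", "action": "login"}');
SELECT payload->>'$.user' FROM events WHERE payload->>'$.action' = 'login';
UPDATE events SET payload = JSON_SET(payload, '$.action', 'logout') WHERE id = 1;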
Statistics
PostgreSQL average time (ms): write: 2279.25, read: 31.65, update: 26.26
MySQL average time (ms): write: 3501.05, read: 49.99, update: 62.45
Conclusion
As you can see from the data above, PostgreSQL handles JSON better than MySQL, which is one of PostgreSQL’s highlights.
We can run the operations a project uses most often (reads, writes, updates) against each database to understand its performance, and then pick the database that suits the project best. MySQL can be faster than PostgreSQL, but only under certain conditions.
Indexes
Indexing is one of the most important features of any database. A database answers queries much faster with indexes than without them. However, indexes also add overhead to the database, so they should be used wisely. Without an index, the database has to perform a full table scan, comparing the data row by row from the first row onward, so the more data there is, the slower the query becomes.
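To see the effect, you can ask the planner how it intends to execute a query. This is only a sketch: it assumes a reasonably large users table with an email column (like the one defined below), and the exact plan output depends on the data and the planner's statistics.
-- Without an index on email, PostgreSQL has to scan every row:
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';
--   Seq Scan on users  (cost=... rows=...)

CREATE INDEX users_email_idx ON users (email);

-- With the index in place, a selective lookup can use it instead:
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';
--   Index Scan using users_email_idx on users  (cost=... rows=...)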
PostgreSQL and MySQL each have specific ways of handling indexes:
- PostgreSQL: most indexes are B-trees, and Hash indexes are supported as well. On top of that, PostgreSQL also offers:
- Expression index: instead of indexing a column directly, we can create an index on an expression or function.
- Partial index: an index built over only a subset of a table's rows.
Suppose we have a PostgreSQL users table in which each row represents a user. The table can be defined like this:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR DEFAULT NULL,
name VARCHAR
);
Suppose we create the following two indexes for this table:
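The exact definitions can vary; a pair that fits the discussion below might look like this (the index names and the choice of the email column are illustrative assumptions):
-- Index #1: a partial index covering only the rows with a non-null email
CREATE INDEX users_with_email_idx ON users (email) WHERE email IS NOT NULL;

-- Index #2: an expression index on a function of the column rather than the column itself
CREATE INDEX users_lower_email_idx ON users (lower(email));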
What’s the difference between the two indexes? Index #1 is a partial index, and index #2 is an expression index. As the PostgreSQL documentation describes it:
“A partial index is built over a subset of a table’s rows, defined by a conditional expression (called the predicate of the partial index). The index contains entries only for those table rows that satisfy the predicate. One major reason for using a partial index is to avoid indexing common values: a query searching for a common value (one present in more than a few percent of all table rows) will not use the index anyway, so there is no point in keeping those rows in the index at all. A partial index reduces the size of the index, which speeds up the queries that do use it. It also speeds up many table update operations, because the index does not need to be updated in all cases.” — from the PostgreSQL documentation
MySQL: most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, FULLTEXT) are stored in B-trees. Indexes on spatial data types use R-trees instead, Hash indexes are also supported, and FULLTEXT indexes are implemented as inverted lists when the InnoDB engine is used.
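As a quick illustration of those index types on the MySQL side, here is a sketch; the articles table and its columns are made-up names, not anything from the example above.
-- A regular secondary (B-tree) index and a full-text index on an InnoDB table
CREATE TABLE articles (
    id    INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200),
    body  TEXT,
    INDEX (title),                -- B-tree index
    FULLTEXT (title, body)        -- inverted-list full-text index under InnoDB
) ENGINE = InnoDB;

SELECT * FROM articles WHERE MATCH(title, body) AGAINST('database performance');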
Database replication
Another performance difference between PostgreSQL and MySQL is replication. Replication means copying data from one database server to a database on another server. Distributing data this way means more users can access it without directly affecting one another, and one of the biggest challenges of replication is keeping data consistent across the distributed system. Beyond the basic setup of one primary server with one or more standby databases, PostgreSQL and MySQL each provide several replication options, such as streaming and logical replication in Postgres, and asynchronous, semi-synchronous, and group replication in MySQL.
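To give a feel for what this looks like in practice, here is a sketch of pointing a MySQL replica at its source (MySQL 8.0 syntax; the host name and credentials are placeholders) and of creating a physical replication slot for a Postgres standby to stream from. It is only a fragment of a full setup; both systems also need server-side configuration.
-- MySQL: run on the replica to connect it to the source and start replicating
CHANGE REPLICATION SOURCE TO
    SOURCE_HOST = 'primary.example.com',
    SOURCE_USER = 'repl',
    SOURCE_PASSWORD = 'secret',
    SOURCE_AUTO_POSITION = 1;    -- requires GTIDs to be enabled on both servers
START REPLICA;

-- PostgreSQL: run on the primary so a standby can stream WAL from a named slot
SELECT pg_create_physical_replication_slot('standby_1');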
Multi-version Concurrency Control (MVCC)
When users read from and write to a database at the same time, that is concurrency. Multiple clients reading and writing simultaneously can produce all sorts of edge cases and race conditions, for example one client reading record X while another is updating it, among many others. All modern databases use transactions to mitigate these concurrency problems.
Postgres was the first DBMS to introduce Multi-Version Concurrency Control (MVCC), meaning that reads never block writes and vice versa. This feature is one of the main reasons enterprises prefer Postgres over MySQL.
“Unlike most other database systems, which use locks for concurrency control, Postgres maintains data consistency by using a multiversion model. This means that while querying the database, each transaction sees a snapshot of the data (a database version) as it was some time ago, regardless of the current state of the underlying data. This protects the transaction from viewing inconsistent data produced by (other) concurrent transactions updating the same data rows, and provides transaction isolation for each database session.” — from the PostgreSQL documentation
MVCC allows multiple readers and writers to interact with a Postgres database at the same time, eliminating the need for a read/write lock every time someone touches the data; a side benefit is that this can be significantly more efficient. MySQL, through the InnoDB storage engine, likewise lets reads and writes of the same row proceed without interfering with each other. Each time a row is written, MySQL also adds an entry to the rollback segment, a data structure that stores the undo logs used to restore rows to their previous state; it is called a rollback segment because it is the mechanism used to handle transaction rollbacks.
“InnoDB is a multi-versioned storage engine: it keeps information about old versions of changed rows to support transactional features such as concurrency and rollback. This information is stored in the tablespace in a data structure called a rollback segment (after an analogous data structure in Oracle). InnoDB uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read.” — from the MySQL documentation
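The practical effect is easy to see with two concurrent sessions. The sketch below assumes a hypothetical accounts table with id and balance columns and the READ COMMITTED isolation level (the default in Postgres; MySQL defaults to REPEATABLE READ, where the second read would keep seeing the original snapshot).
-- session A
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- new row version, not yet committed

-- session B, running at the same time
BEGIN;
SELECT balance FROM accounts WHERE id = 1;
-- returns immediately with the last committed balance: the read is served
-- from a snapshot and is not blocked by session A's uncommitted update

-- session A
COMMIT;

-- session B
SELECT balance FROM accounts WHERE id = 1;  -- under READ COMMITTED, now sees the new balance
COMMIT;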
Conclusion
In this article, we’ve addressed some of the performance differences between PostgreSQL and MySQL. Although database performance depends on hardware, operating system type, and so on, the most important thing is your knowledge of the target database. PostgreSQL and MySQL both have their pros and cons, but knowing which features are right for a project and incorporating them can ultimately improve performance.
I’d love to hear about your experience with database performance.