This article was first published on the public account “Code-Full” and the personal blog “Li Yi’s small station”. If you find it helpful, please follow the public account!
About PyCon
PyCon is the world’s largest technical conference devoted to the Python programming language. It is organized by the Python community and held annually, bringing together Python users and core developers from around the world to share what is new in the Python world, examples of the language in use, tips on how to use it, and more.
A brief introduction to Instagram
Instagram is a mobile photo and video sharing app founded by Kevin Systrom and Mike Krieger in 2010. It quickly became popular after launch and was acquired by Facebook in 2012 for $1 billion. At the time, Instagram had just 13 employees.
Today, Instagram has a total of 3 billion registered users and more than 700 million monthly active users (for comparison, WeChat recently reported 938 million monthly active users). Surprisingly, all of this traffic is served by Python + Django, a combination with a reputation for being slow.
At PyCon 2017, Instagram engineers gave a keynote about Python at Instagram and told the story of how Instagram upgraded the entire project to Python 3.
This article is a summary of the speech.
Python @Instagram
Why Python and Django
Instagram chose Django for a simple reason: The two founders of Instagram (Kevin Systrom and Mike Krieger) were product managers by training. Django was one of the most stable and mature technologies they knew when they wanted to create Instagram.
Today, even with more than 3 billion registered users, Instagram is still a heavy user of Python and Django. Hui Ding, an engineer at Instagram, said: “Until we passed the 32-bit int limit (about 2 billion), Django itself hadn’t been a bottleneck for us.”
However, in addition to using Django’s native features, Instagram has done a lot of customization on top of Django:
- Extend Django models to support sharding, a technique for splitting a database across servers. The Instagram Engineering blog has a post about this, see Sharding & IDs at Instagram
- Improve Python memory usage by manually turning off GC. They also wrote a blog post about this: Dismissing Python Garbage Collection at Instagram
- Deploy the entire system in multiple data centers located in different geographical locations
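To make the "turning off GC" item more concrete, here is a minimal sketch using only the standard gc module. It is not Instagram's actual change; the helper name disable_cyclic_gc is hypothetical, and the assumption is that it would be called once at process start-up.

```python
import gc


def disable_cyclic_gc():
    """Turn off CPython's cyclic garbage collector for this process.

    Reference counting keeps working, so most objects are still freed
    promptly; only reference cycles are no longer collected automatically.
    Hypothetical helper, for illustration only.
    """
    gc.disable()
    return gc.isenabled()


still_enabled = disable_cyclic_gc()
print("cyclic GC enabled:", still_enabled)

# Restore the default so the rest of this example process is unaffected.
gc.enable()
```

Whether this helps depends heavily on the workload, so it is worth measuring memory before and after rather than assuming a gain.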
The advantages of the Python language
Mike Krieger, co-founder of Instagram, said, “Our users don’t care what kind of relational database Instagram uses, and they certainly don’t care what programming language Instagram was developed in.”
So Python, a simple and practical programming language, finally won Instagram’s favor. Using a simple language like Python, they argue, helped shape Instagram’s engineering culture:
- Focus on locating the problem and solving it – not on the colorful features of the tool itself
- Use solutions that are proven in the market – without being bothered by the problems of the tool itself
- Users first: Bring value to users by focusing on new features they can see
But even with all the benefits of using the Python language, it’s still slow, isn’t it?
However, this is not a problem for Instagram. As they put it: “At Instagram, our bottleneck is development velocity, not pure code execution.”
So, the bottom line: You can use Python to implement a product used by billions of users without worrying about performance bottlenecks in the language or framework itself.
How to improve runtime efficiency
However, even with all the benefits of Python and Django, performance issues arose as Instagram’s user base grew rapidly: the number of servers slowly began to grow faster than the number of users. How does Instagram deal with this?
They used these techniques to mitigate performance problems:
- Develop tools to help with tuning: Instagram has developed a number of tools at various levels to help them tune performance and find performance bottlenecks.
- Rewrite some components in C/C++: The most stable and performance-sensitive components, such as the library used to access memcache, are rewritten in C or C++.
- Use Cython: Cython is another tool they use to make Python code run faster.
In addition to the above, they are exploring the performance potential of asynchronous IO and of new Python runtimes.
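The article does not describe Instagram's in-house tuning tools, but the general idea of finding a bottleneck before optimizing it can be sketched with the standard library profiler. The function render_feed below is a hypothetical stand-in for real request-handling code.

```python
import cProfile
import io
import pstats


def render_feed(n):
    """Hypothetical stand-in for an expensive request handler."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = render_feed(10000)
profiler.disable()

# Collect the five most expensive entries by cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Functions that dominate such a report and rarely change are the natural candidates for a C/C++ or Cython rewrite.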
Upgrade to Python 3
For quite some time, Instagram has been running on top of the Python 2.7 + Django 1.3 combination. Their engineers also made many, many small patches in an environment that had been lagging behind the community for many years. Are they going to be stuck with this version forever?
So, after a series of discussions, they finally made the big decision to upgrade to Python 3!!
In fact, Instagram has now completed the migration to Python 3: their entire service has been running on Python 3 for several months. So how did they do it? Here’s the story of how Instagram migrated to Python 3, as told by Instagram engineer Lisa Guo.
The story of Instagram upgrading to Python 3
Why upgrade to Python 3
For Instagram, the following factors are the main reasons for moving their runtime to Python 3:
1. New feature: Type Annotations
Take a look at this code:
def compose_from_max_id(max_id):
    '''@param str max_id'''
What type is the max_id parameter of the function above? An int? A tuple? Or a list? Wait, the docstring says it is of type str.
But what if the type of the parameter changes over time? If a careless engineer changes the code and forgets to update the documentation at the same time, callers of the function will end up in a lot of trouble; the comment might as well not have been there at all. Type annotations in Python 3 make the expected type part of the function signature, where tools can check it.
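For comparison, the same signature with Python 3 type annotations might look like the sketch below. The function body and the return type are hypothetical, since the article only shows the signature and docstring.

```python
from typing import List


def compose_from_max_id(max_id: str) -> List[str]:
    """Split a comma-separated max_id into its parts (hypothetical behavior)."""
    return max_id.split(",")


# The annotations are stored on the function object, so tools can read them.
print(compose_from_max_id.__annotations__)
```

A static checker such as mypy can then flag a call like compose_from_max_id(42) before the code ever runs, which a docstring cannot do.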
2. Performance
Instagram’s entire Django stack runs on top of uWSGI, using synchronous network IO throughout. This means that a single uWSGI process can only receive and process one request at a time. That makes tuning the number of uWSGI processes to run on each machine a chore:
Use more processes to make better use of the CPU? That consumes a lot of memory. Use fewer processes? Then the CPU is underutilized.
To this end, they decided to skip the unsatisfying asynchronous IO options available in Python 2 (Gevent, Tornado, Twisted) and upgrade directly to Python 3 to explore the possibilities of the asyncio module in the standard library.
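As a rough illustration of what asyncio offers here, the sketch below waits on three simulated backend calls concurrently inside a single request. All names (fetch, handle_request, the backend labels) are hypothetical, and asyncio.sleep stands in for real network IO.

```python
import asyncio


async def fetch(name, delay):
    """Simulate a network call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return name


async def handle_request():
    # The three calls wait concurrently, so the total time is close to the
    # slowest call rather than the sum of all three.
    return await asyncio.gather(
        fetch("feed", 0.05),
        fetch("stories", 0.05),
        fetch("suggestions", 0.05),
    )


results = asyncio.run(handle_request())
print(results)
```

While one coroutine is waiting on IO, the same process can make progress on another, which is what a synchronous worker cannot do.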
3. Community
The Python community is moving away from Python 2. By upgrading the entire runtime environment to Python 3, Instagram engineers would be closer to the Python community and better able to give their work back to it.
Determine the migration plan
At Instagram, there were two prerequisites for the Python 3 migration:
- No downtime: no service may become unavailable as a result
- It must not affect the development of new product features
However, given Instagram’s development process, completing a project as large as the move to Python 3.6 while meeting both requirements would be very difficult.
Development process based on the main branch
Even though Git is known for its branching capabilities, almost all of Instagram’s development work is done on the master branch, following the philosophy that “no matter how big a new feature or code refactoring is, it should be broken down into smaller commits.”
Any code merged into the master branch is released to production within an hour. **This happens hundreds of times a day.** With such frequent releases, it is especially difficult to complete the migration while satisfying the two prerequisites above.
Rejected migration options
Create a new branch
The first thought that pops into many people’s heads when they’re dealing with this kind of problem is, “Let’s create a branch, and when we’re done, we’ll merge the branches.”
But with Instagram’s high iteration rate, using a separate branch isn’t a good idea:
- Instagram’s codebase changes many times a day, so keeping a Python 3 branch in sync with the master branch during development would be costly and error prone
- Merging the Python 3 branch back into master at the end, after master had changed a lot, would be very risky
- Only the few engineers working on the Python 3 branch could contribute to the upgrade; others who wanted to help with the migration would be unable to participate
Replace interfaces one by one
Another option is to replace Instagram’s APIs one by one. But Instagram’s different interfaces share many common modules, so this is also very difficult to implement.
Microservices
Another option is to transform Instagram into a microservice architecture. The migration would then be done step by step by rewriting those shared modules as Python 3 microservices.
But this solution requires reorganizing huge amounts of code. Also, when in-process function calls become RPCs, site-wide latency increases. In addition, more microservices introduce higher deployment complexity.
So, in keeping with Instagram’s development philosophy, **take small steps, iterate quickly**, the solution they finally decided on was to proceed one step at a time and eventually make the code on the master branch compatible with both Python 2 and Python 3.
Start migration
Since the entire codebase had to become compatible with both Python 2 and Python 3, the first things to deal with were the heavily used third-party packages. For these, Instagram did the following:
- Reject all new packages that are not compatible with Python 3
- Remove packages that are no longer used
- Replace packages that are not compatible with Python 3
They used tools to help them in the migration process.
There is a trick to using such tools: fix one compatibility issue across multiple files at a time, instead of fixing multiple compatibility issues in one file at a time. This makes the code review process much simpler, as the reviewer only needs to focus on one kind of issue each time.
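To give a feel for what "compatible with both Python 2 and Python 3" means in practice, here is a small sketch of one common pattern: importing a function from whichever location the running interpreter provides. The helper build_query is hypothetical and not taken from Instagram's code.

```python
try:
    from urllib.parse import urlencode  # Python 3 location
except ImportError:
    from urllib import urlencode  # Python 2 location


def build_query(params):
    """Build a query string the same way on either Python version.

    dict.items() exists on both versions (unlike iteritems()), and sorting
    makes the output independent of dictionary ordering.
    """
    return urlencode(sorted(params.items()))


print(build_query({"max_id": "42", "count": 20}))
```

A change like this touches one kind of incompatibility only, which is the shape of commit the review trick above favors.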
Use unit tests to aid in migration
For a dynamic language as flexible as Python, there are few good ways to check code for errors other than actually executing it.
As mentioned earlier, every commit merged into master goes live within an hour, but not unconditionally: all commits need to pass thousands of unit tests before going live.
So they started running all the unit tests under Python 3 as well. At first, very few of the unit tests passed in Python 3, but as Instagram’s engineers kept fixing the failures, eventually all of the unit tests ran successfully in Python 3.
Limitations of unit testing
However, there are limitations to unit testing:
- Instagram’s unit tests did not achieve 100% code coverage
- Many third-party modules are mocked in the tests, and the mocks may behave differently from the real online services
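The second limitation can be illustrated with a small sketch. Every name here is hypothetical, and the assumption that the real client returns bytes under Python 3 is made up for the example: the test passes against a mock that returns text, while the same code fails against a stand-in for the real service.

```python
from unittest import mock


def build_greeting(cache, user_id):
    """Code under test: assumes the cache returns a text string."""
    name = cache.get("name:%d" % user_id)
    return "Hello, " + name


# Unit test: the mock returns whatever the test author configured (text).
fake_cache = mock.Mock()
fake_cache.get.return_value = "alice"
assert build_greeting(fake_cache, 1) == "Hello, alice"


class BytesCache:
    """Stand-in for a real client that (by assumption) returns bytes."""

    def get(self, key):
        return b"alice"


try:
    build_greeting(BytesCache(), 1)
    failed_against_real_service = False
except TypeError:
    # On Python 3, text and bytes cannot be concatenated.
    failed_against_real_service = True

print("fails against the real service:", failed_against_real_service)
```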
So, when all the unit tests were fixed, they started running the service online in Python 3.
This process did not happen overnight. First, all Instagram engineers started accessing the new services running on Python 3, then all Facebook employees, then 0.1% of users, then 20%, and finally all Instagram users.
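The article does not say how users were assigned to each stage. One common approach, shown here purely as an assumption, is to hash each user ID into a stable bucket and compare it against the current rollout size. All names and the bucket scheme are hypothetical.

```python
import hashlib

BUCKETS = 10000  # one bucket corresponds to 0.01% of users


def bucket_for(user_id):
    """Map a user ID to a stable bucket in the range 0..9999."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % BUCKETS


def serve_with_python3(user_id, rollout_buckets):
    """Return True if this user falls inside the current rollout stage."""
    return bucket_for(user_id) < rollout_buckets


# Stages from the article, expressed in buckets: 0.1%, 20%, 100%.
STAGES = {"0.1%": 10, "20%": 2000, "100%": 10000}

sample = range(20000)
share = sum(serve_with_python3(uid, STAGES["20%"]) for uid in sample) / len(sample)
print("share of sample users in the 20% stage:", share)
```

Because the bucket depends only on the user ID, a user who is included at one stage stays included at every later stage.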
Technical issues with the migration process
Instagram ran into a number of problems when moving to Python 3, and here are some of the most typical ones:
Unicode related string issues
One of the biggest changes in Python 3 over Python 2 is the handling of Unicode within the language.
In Python 2, the boundary between the text type (unicode) and the binary type (str) is very fuzzy. Many functions can take either text or binary arguments. In Python 3, however, text and binary strings are completely separated.
Thus, the following code that works fine in Python 2 will report an error in Python 3:
import hmac
mymac = hmac.new('abc')
# TypeError: key: expected bytes or bytearray, but got 'str'
The solution is as simple as adding a check: if value is a text type, convert it to binary. As follows:
import hmac
import six

value = 'abc'
if isinstance(value, six.text_type):
    value = value.encode(encoding='utf-8')
# digestmod must be passed explicitly on Python 3.8+; md5 was the old default
mymac = hmac.new(value, digestmod='md5')
However, there are many cases like this across the entire codebase. If a developer has to ask, for every function call, “Should this be encoded to binary or decoded to text?”, it becomes a huge burden.
So Instagram wrapped this logic in helper functions called ensure_str(), ensure_binary(), and ensure_text(), which let developers convert a string of uncertain type up front.
mymac = hmac.new(ensure_binary('abc'), digestmod='md5')
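The article does not show how these helpers are implemented. A minimal Python 3 only sketch of what ensure_binary() and ensure_text() might look like is given below; the bodies are an assumption, not Instagram's code.

```python
import hmac


def ensure_binary(value, encoding="utf-8"):
    """Return `value` as bytes, encoding it first if it is text."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


def ensure_text(value, encoding="utf-8"):
    """Return `value` as str, decoding it first if it is bytes."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


# Works whether the caller passes text or bytes.
mac_from_text = hmac.new(ensure_binary("abc"), digestmod="md5").hexdigest()
mac_from_bytes = hmac.new(ensure_binary(b"abc"), digestmod="md5").hexdigest()
print(mac_from_text == mac_from_bytes)
```

Raising TypeError for anything that is neither text nor bytes keeps accidental inputs such as None from being silently converted.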
Pickling differences between Python versions
Instagram makes heavy use of pickle in its code to serialize objects and store them in memcache. The code looks like this:
import pickle
memcache_data = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)  # serialize before writing to memcache
data = pickle.loads(memcache_data)  # deserialize after reading back
The problem is that the pickle module in Python 2 is different from that in Python 3.
Suppose the first line above runs in a service on Python 3, and the serialized result is stored in memcache. When the second line then deserializes it in Python 2, the code fails with the following error:
ValueError: unsupported pickle protocol: 4
This is because in Python 3 (as of 3.6), pickle.HIGHEST_PROTOCOL has a value of 4, whereas the highest protocol Python 2 supports is 2. So how to solve this problem?
Instagram eventually chose to have Python 2 and Python 3 access memcache using completely different namespaces. Completely separating the data read and written by each version solved the problem.
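A minimal sketch of the namespace idea, assuming the namespace is simply a key prefix derived from the interpreter's major version. A plain dict stands in for memcache, and all function names are hypothetical.

```python
import pickle
import sys

# Assumption: the namespace is a prefix such as "py2" or "py3".
NAMESPACE = "py%d" % sys.version_info[0]

fake_memcache = {}  # stand-in for a real memcache client


def namespaced_key(key):
    """Prefix the key so each Python version only sees its own entries."""
    return "%s:%s" % (NAMESPACE, key)


def cache_set(key, value):
    fake_memcache[namespaced_key(key)] = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)


def cache_get(key):
    raw = fake_memcache.get(namespaced_key(key))
    return None if raw is None else pickle.loads(raw)


cache_set("user:1", {"name": "alice"})
print(list(fake_memcache), cache_get("user:1"))
```

The cost of this design is that the two versions cannot share cached entries, so each starts with its own cold cache.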
Iterators
In Python 3, many built-in functions were changed to return a lazy object instead of a list (map() and filter() return iterators, and dict.items() returns a view):
map()
filter()
dict.items()
Iterators have many benefits. The biggest benefit is that using iterators does not require a large amount of memory to be allocated at once, so it is more memory efficient.
But iterators have an inherent limitation. Once you have iterated over an iterator and consumed its contents, you cannot access those contents again. Everything in an iterator can only be accessed once.
During the Python 3 migration, Instagram was bitten once by this behavior. Take a look at this code:
CYTHON_SOURCES = ['a.pyx', 'b.pyx', 'c.pyx']
builds = map(BuildProcess, CYTHON_SOURCES)
while any(not build.done() for build in builds):
    pending = [build for build in builds if not build.started()]
    # <do some work>
The purpose of this code is to compile the Cython source files one by one. When they switched to Python 3, a strange problem occurred: **the first file in CYTHON_SOURCES was always skipped.** Why?
It’s all about iterators. In Python 3, instead of returning the entire list, the map() function returns an iterator.
So, after the second line creates the builds iterator, the any() call in the while condition on the third line consumes its first element. The pending list built afterwards therefore never contains the first element.
It’s easy to fix by manually converting builds to a list:
builds = list(map(BuildProcess, CYTHON_SOURCES))
But this type of bug is very difficult to locate. Users rarely notice if their feeds are permanently missing the most recent item.
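The behavior described above can be reproduced with a small self-contained sketch. BuildProcess here is a hypothetical stub whose builds are never started or finished, and pending_sources performs a single pass of the loop.

```python
class BuildProcess:
    """Hypothetical stub: a build that has not started and is not done."""

    def __init__(self, source):
        self.source = source

    def done(self):
        return False

    def started(self):
        return False


CYTHON_SOURCES = ["a.pyx", "b.pyx", "c.pyx"]


def pending_sources(builds):
    """One pass of the loop: check for unfinished builds, then list pending ones."""
    if any(not build.done() for build in builds):
        return [build.source for build in builds if not build.started()]
    return []


# With a bare map() iterator, any() consumes 'a.pyx' before the list is built.
buggy = pending_sources(map(BuildProcess, CYTHON_SOURCES))
# With a list, every pass starts again from the first element.
fixed = pending_sources(list(map(BuildProcess, CYTHON_SOURCES)))

print(buggy)  # ['b.pyx', 'c.pyx']
print(fixed)  # ['a.pyx', 'b.pyx', 'c.pyx']
```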
Dictionary ordering
Take a look at this code:
>>> import json
>>> testdict = {'a': 1, 'b': 2, 'c': 3}
>>> json.dumps(testdict)
What does it print out?
# Python 2
'{"a": 1, "c": 3, "b": 2}'
# Python 3.5.1
'{"c": 3, "b": 2, "a": 1}' # or
'{"c": 3, "a": 1, "b": 2}'
# Python 3.6
'{"a": 1, "b": 2, "c": 3}'
The json.dumps results differ across Python versions. In 3.5.1 the output even varied between runs, because dictionary iteration order depended on randomized string hashing. Instagram has a module that determines whether a configuration file has changed, and this unstable output is why it went wrong.
The solution to this problem is to pass sort_keys=True in the call to json.dumps:
>>> json.dumps(testdict, sort_keys=True)
'{"a": 1, "b": 2, "c": 3}'
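The article does not describe how the change-detection module works. As one possible illustration, the sketch below fingerprints a configuration by hashing its canonical JSON form; the function name and the use of SHA-256 are assumptions.

```python
import hashlib
import json


def config_fingerprint(config):
    """Hash a canonical JSON form so key order cannot affect the result."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


old_config = {"a": 1, "b": 2, "c": 3}
same_config_other_order = {"c": 3, "a": 1, "b": 2}
new_config = {"a": 1, "b": 2, "c": 4}

print(config_fingerprint(old_config) == config_fingerprint(same_config_other_order))
print(config_fingerprint(old_config) == config_fingerprint(new_config))
```

With sort_keys=True the fingerprint changes only when the content changes, regardless of Python version or dictionary ordering.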
Performance improvements after the migration to Python 3.6
Even after Instagram had worked through the odd version differences, one big puzzle still plagued them: performance.
At Instagram, they use two main metrics to measure the performance of their service:
- Number of CPU instructions executed per request (the lower the better)
- Number of requests per second that can be processed (the higher the better)
So, after all the migration was done, they were pleasantly surprised to find that the first metric, the number of CPU instructions per request, dropped by 12%!
The second metric, requests per second, should then have improved by nearly 12% as well. But the final change was 0 percent. What was the problem?
They determined that the performance gain from the reduced instruction count was cancelled out because a memory optimization behaved differently under the two Python versions. So why did it behave differently?
This is the code they use to check the uWSGI configuration:
if uwsgi.opt.get('optimize_mem', None) == 'True':
    optimize_mem()
Notice the == 'True' comparison? In Python 3, this condition is never satisfied. The problem is Unicode again: a binary value is being compared with a text string. The problem was solved by replacing 'True' with b'True' in the code (i.e. replacing the text type with the binary type; the two are indistinguishable here in Python 2).
So, in the end, adding a single letter 'b' increased the overall performance of the program by 12%.
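The comparison pitfall is easy to reproduce without uWSGI. In the sketch below, option_value is a hypothetical stand-in for the value read from the configuration, assumed to arrive as bytes under Python 3.

```python
# Assumption: under Python 3 the option value arrives as bytes.
option_value = b"True"


def is_enabled_buggy(value):
    # bytes never compare equal to str on Python 3, so this is always False.
    return value == "True"


def is_enabled_fixed(value):
    # Comparing bytes with bytes behaves as intended.
    return value == b"True"


print(is_enabled_buggy(option_value))
print(is_enabled_fixed(option_value))
```

No exception is raised in the buggy version, which is why the optimization was silently skipped rather than failing loudly.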
Conclusion
In February of this year (2017), Instagram’s back-end code was completely switched to Python 3.
After all the code had been migrated to the Python 3 runtime:
- Overall CPU usage savings of 12% (Django/uWSGI)
- Memory usage savings of 30% (Celery)
Meanwhile, over the course of the migration, Instagram’s monthly users grew from 400 million to 600 million. The product also launched many new features, such as comment filtering and live streaming.
So, what were the reasons for their migration to Python 3 in the first place?
- Type annotations: Instagram has added type annotations to 2% of its entire codebase and has developed tools to help developers add them
- asyncio: They use asyncio to do multiple things in parallel within a single API endpoint, resulting in 20-30% lower request latency.
- Community: They teamed up with Intel engineers to help them better tune CPU utilization. New tools have also been developed to help them with performance tuning.
What Instagram has taught us
The Instagram talk was short but informative, and before I started writing I had no idea how long the final post would turn out to be.
So what can Instagram’s talk teach us?
- The combination of Python and Django is enough to support a billion users, so if you’re starting a project, feel free to use Python!
- Good unit testing is essential for complex projects. Without those “thousands of unit tests”, it’s hard to imagine Instagram’s migration project succeeding.
- Developers and colleagues are also users of your product. Use them. Use them to test your new features before they are released.
- A development process based entirely on the main branch will give you faster iterations. The premise is to have a good unit testing and continuous deployment process.
- Python 3 is the way to go, so if you’re ready to start a new project, don’t hesitate to embrace Python 3!
All right, that’s it. Happy Hacking!
Reprint: This article is reprinted from Excerpt of Instagram’s PyCon 2017 talk by Piglei.