• Firefox Stability on Linux — Mozilla Hacks — The Web Developer Blog
  • By Gabriele Svelto
  • Translation from: The Gold Project
  • Permalink to this article: github.com/xitu/gold-m…
  • Translator: No problem
  • Proofread: Kimhooo PingHGao

Improving Firefox stability on Linux

About a year ago at Mozilla, we started working on improving the stability of Firefox on Linux. This effort quickly turned into a case study of good collaboration between Free and Open Source Software (FOSS) projects.

Every time Firefox crashes, the user can send us a crash report, which we use to analyze the problem and hopefully fix it.

This report contains, among other things, a minidump: a small snapshot of the process memory taken at the time of the crash. It includes the contents of the processor registers as well as data from the stack of every thread.

It usually goes something like this:

If you are familiar with core dumps, minidumps are essentially a scaled-down version of them. The minidump format was originally designed by Microsoft, and Windows has a native way of writing minidumps. On Linux, we use Breakpad for this task. Breakpad originated at Google for their own software (Picasa, Google Earth, etc.), but we forked it, modified it heavily for our purposes and recently rewrote parts of it in Rust.

Once the user submits a crash report, a server-side component called Socorro processes it and extracts a stack trace from the minidump. Reports are then clustered by the name of the top function in the stack trace of the crashing thread. When a new kind of crash is spotted, we file a bug for it and start working on a fix. Here is an example of how crashes are grouped:
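
The clustering step can be illustrated with a minimal sketch. The report layout (`id`, `crashing_thread`, `threads`, `frames`, `function`) and the helper names are hypothetical and chosen only for illustration. Socorro's real signature generation is more involved than taking the top frame.

```python
from collections import defaultdict


def top_frame(report):
    """Return the function name of the top frame of the crashing thread.

    The dictionary layout used here is an assumption for this sketch,
    not Socorro's actual report schema.
    """
    thread = report["threads"][report["crashing_thread"]]
    return thread["frames"][0]["function"]


def cluster_by_signature(reports):
    """Group report ids by the top function of the crashing thread."""
    clusters = defaultdict(list)
    for report in reports:
        clusters[top_frame(report)].append(report["id"])
    return dict(clusters)
```

Two reports whose crashing threads both end in the same function land in the same cluster, even when the crashing thread has a different index in each report.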

Extracting a meaningful stack trace from a minidump requires two more things: unwinding information and symbols. The unwinding information is a set of instructions describing how to find the various frames on the stack given an instruction pointer. The symbol information contains the names of the functions that correspond to given address ranges, the source files they come from, and the line number that a given instruction corresponds to.
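
As a rough sketch of what the symbol half of this looks like, the lookup below maps an address to the function whose range contains it and falls back to the raw address when nothing matches. The `FunctionRecord` and `SymbolTable` names are hypothetical. Real Breakpad symbol files also carry line records and unwinding rules, which this sketch leaves out.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionRecord:
    start: int  # first address covered by the function
    size: int   # number of bytes covered
    name: str


class SymbolTable:
    """Address-range lookup over function records (illustrative only)."""

    def __init__(self, records):
        self._records = sorted(records, key=lambda r: r.start)
        self._starts = [r.start for r in self._records]

    def lookup(self, address):
        # Find the last record that starts at or before the address.
        i = bisect.bisect_right(self._starts, address) - 1
        if i < 0:
            return None
        record = self._records[i]
        if address < record.start + record.size:
            return record
        return None  # the address falls in a gap between functions


def symbolicate(addresses, table):
    """Turn raw frame addresses into 'function+offset' strings."""
    frames = []
    for address in addresses:
        record = table.lookup(address)
        if record is None:
            frames.append(f"{address:#x}")  # no symbol: only the address
        else:
            frames.append(f"{record.name}+{address - record.start:#x}")
    return frames
```

Without a matching record the trace degrades to bare addresses, which is what happens when the right symbol file is missing.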

In regular Firefox releases, we extract this information from our builds and store it in symbol files in the standard Breakpad format. With this information, Socorro can produce a human-readable stack trace. Here is the entire flow chart:

An example of a correct stack trace:

If Socorro does not have the right symbol files for a crash, the resulting trace contains only addresses, which is not very helpful:

On Linux, the situation is different: most of our users do not install our builds; they install the version of Firefox packaged by their preferred distribution.

This leads to a serious problem when dealing with Firefox stability issues on Linux: For a large number of crash reports, we were unable to generate a high-quality stack trace because the Firefox build that submitted the report was not created by us, so we lacked the required symbol files. To make matters worse, Firefox relies on a number of third-party packages (such as GTK, Mesa, FFmpeg, SQLite, and so on). If a crash happens not in Firefox but in any of the third-party packages, we can’t get the correct stack trace because we don’t have their symbol files.

To address this problem, we started collecting debugging information for Firefox builds and their dependencies from the package repositories of multiple distributions: Arch, Debian, Fedora, openSUSE and Ubuntu. Since each distribution does things a little differently, we had to write distribution-specific scripts that go through the package lists in their repositories and find the relevant debugging information. This data is then fed into a tool that extracts symbol files from the debugging information and uploads them to our symbol server.

With that information available, we could analyze more than 99% of the crash reports we received from Linux users, up from less than 20%. Here is an example of a high-quality trace extracted from a distribution-packaged build of Firefox. We did not build any of the libraries involved, yet the function names, source files and line numbers of the affected code are all present.

One important point should not be underestimated: Linux users tend to be more tech-savvy and more likely to help us solve problems, so all these reports are a treasure trove for improving stability, even on other operating systems (Windows, Mac, Android, etc.). In fact, we often identify Fission bugs on Linux first.

The first effect of this new ability to inspect Linux crashes was a much faster response to Linux-specific issues. It also often let us identify problems in Firefox Nightly and Beta before they reached users on the release channel.

We could also quickly identify problems in bleeding-edge components such as WebRender, WebGPU, Wayland and VA-API video acceleration, often providing a fix within days of the change that triggered the issue.

We didn't stop there: we could now identify distribution-specific issues and regressions. This allowed us to notify package maintainers and have the problems resolved quickly. For example, we identified a Debian-specific issue only two weeks after it was introduced and fixed it right away. The crash was caused by a change Debian had made to one of Firefox's dependencies, which could cause a crash at startup. If you are curious about the details, see bug 1679430.

Another good example comes from Fedora: they had been using their own crash reporting system (ABRT) to catch crashes in their Firefox builds, but given the improvements on our side they started sending Firefox crashes to us instead.

We could also identify regressions and other problems in our dependencies. This allowed us to report issues upstream, and sometimes even contribute fixes, to the benefit of both our users and theirs.

For example, at some point Debian updated the fontconfig package by backporting an upstream fix for a memory leak. However, the fix contained a bug that caused Firefox, and possibly other software, to crash. We spotted the new crash only six days after the change landed in the Debian sources, and within a few weeks the problem was fixed both upstream and in Debian. We also sent reports and fixes to other projects: Mesa, GTK, GLib, PCSC, SQLite, etc.

Nightly versions of Firefox also include a tool for detecting security-sensitive issues: the Probabilistic Heap Checker. This tool randomly pads a handful of memory allocations in order to detect buffer overflows and use-after-free accesses. When it detects one, it sends us a very detailed crash report. Given Firefox's large user base on Linux, this allowed us to find elusive issues in upstream projects and report them.
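
The sampling idea behind such a checker can be shown with a toy model. This is not how the checker in Firefox is implemented: the real tool operates on native memory, while the sketch below only tracks bookkeeping in Python. The class name, method names and the `sample_rate` parameter are all hypothetical.

```python
import random


class GuardedHeapError(Exception):
    """Raised when a checked allocation is misused."""


class ToyCheckedHeap:
    """Toy model: only a random sample of allocations is checked."""

    def __init__(self, sample_rate=0.01, rng=None):
        self.sample_rate = sample_rate
        self.rng = rng or random.Random()
        self._next_id = 0
        self._blocks = {}  # block id -> {"size", "checked", "freed"}

    def malloc(self, size):
        block_id = self._next_id
        self._next_id += 1
        # Decide at allocation time whether this block gets checked.
        checked = self.rng.random() < self.sample_rate
        self._blocks[block_id] = {"size": size, "checked": checked, "freed": False}
        return block_id

    def free(self, block_id):
        self._blocks[block_id]["freed"] = True

    def access(self, block_id, offset):
        block = self._blocks[block_id]
        if not block["checked"]:
            return  # unchecked allocation: a bad access goes unnoticed
        if block["freed"]:
            raise GuardedHeapError("use-after-free")
        if not 0 <= offset < block["size"]:
            raise GuardedHeapError("buffer overflow")
```

Because each allocation is checked only with a small probability, a single user rarely hits a detection. Across a large user base the same bug eventually lands on a checked allocation, which is why the number of users matters.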

This work also exposed some limitations in the tools we use for crash analysis, so we decided to rewrite them in Rust, relying heavily on the excellent crates developed by Sentry. Compared to the original tools, the new ones are much faster, use less memory, and produce more accurate results. The benefits flow both ways: we contribute improvements to their crates (and their dependencies), and they extend their APIs to handle our new use cases and fix the problems we find.

Another pleasant side effect of this work is that Thunderbird now benefits from the improvements we’ve made to Firefox.

This continues to show that collaboration between FOSS projects not only benefits their users, but ultimately enhances the entire ecosystem and the wider community that depends on it. Special thanks to Calixte Denizet, Nicholas Nethercote, Jan Auer and others for their contributions!
