Ever since Apple announced the M1 on 10 November 2020, expectations for the new hardware have been high. Users and developers alike have been wondering whether a native Dolphin build could run on Apple’s powerful new silicon. Now we have the answer.
Apple’s M1 hardware is very powerful and does a great job running Dolphin. The news has been brewing for a while, and sharp-eyed users may have noticed earlier this month that the macOS builds are now labeled “Universal.” That’s because Delroth and Skyler set up a new buildbot, using hardware from a service called MacStadium, to produce Universal macOS binaries. These builds are available right now and support both M1 and Intel Macs.
Tackling macOS on ARM
It’s no exaggeration to say that Apple dropped a bombshell on the PC industry with the M1 and its ARM processor. ARM is a Reduced Instruction Set Computer (RISC) architecture originally designed around the power efficiency that portable devices need. By using a reduced instruction set rather than the ever-expanding mess that is x86, ARM was able to use fewer transistors to perform optimized tasks, resulting in superior power efficiency. However, ARM processors typically needed many more cycles than x86 CPUs to execute unoptimized workloads. Taken together, ARM CPUs became the processors of choice for battery life in portable devices, but their overall performance was poor compared to Intel’s x86 processors. They were processors for casual things like phones, not for “real work.” But that’s in the past.
Intel’s iron-fisted grip on technological superiority has long since slipped, and the ARM instruction set has been carefully expanded to handle more workloads faster without sacrificing power efficiency. However, even as ARM has moved into the data center, and even as some interesting hardware has given us a glimpse of what’s possible, ARM’s reputation for being weaker than x86 has remained entrenched.
With the M1, Apple has crushed that silly notion. Not only can the M1 perform the same tasks as Apple’s previous Intel processors, it can even do them faster while going through the Rosetta 2 translation layer, all while delivering much better single-threaded performance than Intel. Let’s just say they’ve got our attention.
We put it to the test. Using the Rosetta 2 translation layer and Dolphin’s x86-64 JIT, the M1 ran most games at full speed and easily outperformed comparable Intel Macs. The experience was not entirely smooth because of JIT-related stuttering, but the processor proved itself capable of handling Dolphin. Still, having to go through a translation layer is a huge performance bottleneck. Why not use Dolphin’s AArch64 JIT for native support? And so the race began, with several people working out what stood between Dolphin’s AArch64 JIT and the M1.
Unfortunately, getting the AArch64 JIT running was not an easy task. Apple enforces W^X (Write XOR Execute) for native applications on M1 Macs. Under W^X, a memory region can be marked writable or executable, but never both at the same time. Because it was simpler, and no platform Dolphin previously supported prohibited it, the emulator had always just marked the memory used by its JITs as both writable and executable. W^X is a security feature aimed primarily at preventing bugs in programs that read untrusted data from being exploited to run malicious code. Outside of emulators, the main place you actually see self-modifying code is in web browsers, which are a frequent vehicle for attacks on computers.
Thankfully, this is far less strict than the situation on iOS, which prohibits ordinary apps from mapping memory as executable at all; that restriction is why we cannot officially support iOS. Apple even provides documentation to help developers port JITs to macOS on ARM. Skyler used a method described in that documentation, which switches the JIT’s mapped memory from writable to executable before the generated code is run. Since Dolphin was not designed with this in mind, there were a few hiccups along the way, but in the end everything worked under the new restrictions.
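To make the W^X rule concrete, here is a small Python model of a JIT code buffer that is either writable or executable but never both. It is a sketch under stated assumptions, not Dolphin’s code: `JitRegion`, its `writable()` context manager, and `WXViolation` are hypothetical names. On Apple silicon, the mechanism Apple documents is to map the buffer with the `MAP_JIT` flag, call `pthread_jit_write_protect_np()` to flip the current thread between writing and executing, and invalidate the instruction cache with `sys_icache_invalidate()` after writing new code. The class below only mimics that toggle.

```python
from contextlib import contextmanager


class WXViolation(Exception):
    """Raised when the buffer is used in a way its current mode forbids."""


class JitRegion:
    """Toy model of a W^X JIT buffer: writable or executable, never both.

    Hypothetical class. Real code on Apple silicon would mmap the buffer with
    MAP_JIT and flip modes with pthread_jit_write_protect_np().
    """

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)
        # Assumption for this sketch: the region starts out executable.
        self.executable = True

    @contextmanager
    def writable(self):
        # Stand-in for pthread_jit_write_protect_np(0): writable, not executable.
        self.executable = False
        try:
            yield self
        finally:
            # Stand-in for pthread_jit_write_protect_np(1): executable again,
            # even if code generation raised an exception.
            self.executable = True

    def emit(self, offset: int, code: bytes) -> None:
        """Write generated code; only legal while the region is writable."""
        if self.executable:
            raise WXViolation("write while region is executable")
        self.buf[offset:offset + len(code)] = code

    def run(self, offset: int, length: int) -> bytes:
        """'Execute' code; here it just returns the bytes a real JIT would jump to."""
        if not self.executable:
            raise WXViolation("execute while region is writable")
        return bytes(self.buf[offset:offset + length])


region = JitRegion(64)
with region.writable():
    region.emit(0, b"\x01\x02\x03")
print(region.run(0, 3))
```

Wrapping every write in `with region.writable():` keeps the window during which the buffer is writable as short as possible, which is the point of W^X. A JIT that tries to write while the region is executable fails loudly instead of silently running with writable code.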
Once that was resolved, the focus shifted to maintainability and infrastructure. Aside from getting the JIT to work correctly, this was by far the most difficult part of officially supporting the M1. Dolphin’s infrastructure is complex and sensitive to change, and moving macOS builds to Universal binaries (x86-64 and AArch64 combined), while also acquiring the hardware needed to build them, was a challenge that could have proved costly. In the end, MacStadium offered M1 hardware free of charge, which made the move very cheap and let the team focus on getting Dolphin’s buildbot infrastructure to handle the new builds.
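A Universal binary is a single Mach-O file that bundles a complete x86-64 build and a complete arm64 build behind a small “fat” header, and macOS loads the slice that matches the CPU. The sketch below parses that header to list the slices. It assumes the classic 32-bit fat header layout (`FAT_MAGIC` 0xCAFEBABE followed by big-endian `fat_arch` records) and ignores the 64-bit variant; `universal_archs` is a hypothetical helper, and in practice Apple’s `lipo -info` tool reports the same information.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # big-endian magic at the start of a Universal binary
CPU_NAMES = {
    0x01000007: "x86_64",  # CPU_TYPE_X86_64
    0x0100000C: "arm64",   # CPU_TYPE_ARM64
}


def universal_archs(data: bytes) -> list[str]:
    """List the architecture slices named in a Mach-O fat header."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat header")
    archs = []
    for i in range(nfat_arch):
        # Each fat_arch record is 20 bytes:
        # cputype, cpusubtype, offset, size, align.
        cputype, _sub, _off, _size, _align = struct.unpack_from(
            ">iiIII", data, 8 + 20 * i)
        archs.append(CPU_NAMES.get(cputype, hex(cputype)))
    return archs


# A synthetic header with illustrative offsets and sizes, not a real Dolphin binary.
header = struct.pack(">II", FAT_MAGIC, 2)
header += struct.pack(">iiIII", 0x01000007, 3, 0x4000, 0x1000, 14)
header += struct.pack(">iiIII", 0x0100000C, 0, 0x8000, 0x1000, 14)
print(universal_archs(header))
```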
Testing the M1 hardware
Now that it’s up and running, you might be wondering how well it performs. There are a few things to keep in mind. Dolphin’s AArch64 JIT is not as mature as its x86-64 JIT. Thanks to JosJuice’s efforts, things aren’t nearly as bad as they were a few years ago, and compatibility should be roughly the same between the two JITs, but the AArch64 JIT is still less complete.
One difference is instruction coverage. Any PowerPC instruction that a JIT does not implement must fall back to the interpreter, at a significant performance cost. At this point, both JITs cover the most common instructions. However, one important feature is missing from the AArch64 JIT: memory checking. Thankfully, this only affects full MMU games such as Star Wars: Rogue Squadron II and III and Spider-Man 2. The AArch64 JIT also lacks some nice-to-haves, such as reusing freed JitCache space to avoid unnecessary JitCache flushes.
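The cost of incomplete instruction coverage is easiest to see in a toy model. In the Python sketch below, opcodes the “JIT” knows become specialized closures, while anything else is compiled into a call back into the general-purpose interpreter, which is the slow path described above. Everything here is hypothetical: `compile_block`, `JIT_EMITTERS`, and `interpret` are made-up names, and the three-opcode subset only borrows PowerPC mnemonics (`add`, `subf`, `mulhwu`).

```python
from typing import Callable

MASK32 = 0xFFFFFFFF
Regs = list[int]


def interpret(regs: Regs, op: str, d: int, a: int, b: int) -> None:
    """Reference interpreter: slow, but handles every opcode in this toy subset."""
    if op == "add":
        regs[d] = (regs[a] + regs[b]) & MASK32
    elif op == "subf":                      # PowerPC subf: rD = rB - rA
        regs[d] = (regs[b] - regs[a]) & MASK32
    elif op == "mulhwu":                    # high 32 bits of the unsigned product
        regs[d] = (regs[a] * regs[b]) >> 32
    else:
        raise NotImplementedError(op)


def emit_add(d: int, a: int, b: int) -> Callable[[Regs], None]:
    def code(r: Regs) -> None:
        r[d] = (r[a] + r[b]) & MASK32
    return code


def emit_subf(d: int, a: int, b: int) -> Callable[[Regs], None]:
    def code(r: Regs) -> None:
        r[d] = (r[b] - r[a]) & MASK32
    return code


# Opcodes this hypothetical JIT can compile; note that mulhwu is missing.
JIT_EMITTERS = {"add": emit_add, "subf": emit_subf}


def compile_block(block):
    """Compile a block; uncovered opcodes become calls into the interpreter."""
    ops, fallbacks = [], 0
    for op, d, a, b in block:
        emitter = JIT_EMITTERS.get(op)
        if emitter is not None:
            ops.append(emitter(d, a, b))
        else:
            fallbacks += 1
            # Default arguments pin each closure to its own operands.
            ops.append(lambda r, op=op, d=d, a=a, b=b: interpret(r, op, d, a, b))

    def run(regs: Regs) -> None:
        for f in ops:
            f(regs)

    return run, fallbacks
```

A real JIT emits host machine code rather than Python closures, but the shape is the same: every fallback in a hot block means leaving compiled code to run the interpreter, which is why covering the common instructions matters so much.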
Even without memory checking in the AArch64 JIT, Inception 2 performed admirably.
The AArch64 JIT does have its advantages, though. For one, AArch64 processors have 31 general-purpose registers, compared to 16 on x86-64. The emulated PowerPC processor has 32 general-purpose registers, and while it’s rare for a block of code to use all of them, more host registers are always welcome. Another difference is that AArch64 and PowerPC have three-operand instructions, while most x86-64 instructions take only two operands.
```
PPC:     A = B + C
AArch64: A = B + C
x86-64:  A = B; A = A + C
```
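To see why three-operand forms are convenient, here is a hedged sketch of how a JIT might lower the guest operation A = B + C for each style of target. The function names and the textual “instructions” are illustrative only; real JITs emit binary machine code, and this ignores flags and register allocation. The two-operand version has to special-case register aliasing, which the three-operand version never needs to think about.

```python
def lower_add_three_operand(d: str, b: str, c: str) -> list[str]:
    """AArch64-style target: the destination is separate from both sources."""
    return [f"add {d}, {b}, {c}"]


def lower_add_two_operand(d: str, b: str, c: str) -> list[str]:
    """x86-64-style target: the destination doubles as the first source."""
    if d == b:
        return [f"add {d}, {c}"]                # d = d + c already has the right shape
    if d == c:
        return [f"add {d}, {b}"]                # addition commutes, so d = d + b works
    return [f"mov {d}, {b}", f"add {d}, {c}"]   # copy first, then accumulate


def run_two_operand(code: list[str], regs: dict[str, int]) -> dict[str, int]:
    """Tiny evaluator so the two-operand lowering can be checked, not just eyeballed."""
    for line in code:
        mnemonic, operands = line.split(" ", 1)
        dst, src = (s.strip() for s in operands.split(","))
        if mnemonic == "mov":
            regs[dst] = regs[src]
        elif mnemonic == "add":
            regs[dst] = regs[dst] + regs[src]
        else:
            raise ValueError(mnemonic)
    return regs


print(lower_add_three_operand("A", "B", "C"))
print(lower_add_two_operand("A", "B", "C"))
```

The general case reproduces the two-step x86-64 sequence shown earlier (copy B into A, then add C). A non-commutative operation such as subtraction would make the aliased case harder still, typically needing a scratch register.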
As you can see, this makes emulating some instructions much cleaner and simpler than on the x86-64 JIT. But enough of the boring details. How does the M1 hardware stack up against some of the beasts in the GameCube and Wii libraries?
There’s no denying that the M1 hardware absolutely beats the two-and-a-half-year-old Intel MacBook Pro, which costs more than three times as much, while staying within arm’s reach of a powerful desktop computer. We made a second graph to show it.
Its efficiency is breathtaking. It delivers 65% of the performance of an absolutely monstrous desktop computer while drawing less than a tenth of the power. The poor Intel MacBook Pro doesn’t even come close.
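Combining the two figures quoted above gives a rough performance-per-watt comparison. This is a back-of-the-envelope calculation that uses only those two numbers; P and W are just labels for the desktop’s performance and power draw, and W_M1 is the M1’s power draw.

```latex
\frac{0.65\,P / W_{\mathrm{M1}}}{P / W}
  = 0.65 \cdot \frac{W}{W_{\mathrm{M1}}}
  > 0.65 \times 10 = 6.5
  \qquad \text{since } W_{\mathrm{M1}} < 0.1\,W
```

In other words, per watt, the M1 does more than six and a half times as much work as that desktop in this comparison.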
Taking things one step further
After rigorous performance testing, it’s clear that the M1 and its Apple silicon are strong. The problem is that when you give developers a new toy, they inevitably push it further and further. This was the first time we had seen Dolphin’s AArch64 JIT really stretch its legs on something other than a phone or tablet hampered by ultra-aggressive governors and limited graphics drivers. So what’s the worst idea we could come up with for all this newfound power? NetPlay.
This is the ultimate test of whether the AArch64 and x86-64 JITs are truly equivalent. The testers couldn’t properly test this before because Dolphin’s Android GUI lacks NetPlay support, but macOS runs the full desktop version without compromise, including full NetPlay support. The odds of this working were close to zero, but nobody stopped to think about whether we should, because now we could.
Sometimes tests produce unexpected results!
And it actually works! With only limited testing, though, the team isn’t sure how well it works. So far, every game they’ve tested over NetPlay has stayed in sync, although Dolphin’s desync checker does report false positives. Testers tried everything from Super Smash Bros. Melee and Mario Party 5 to The Legend of Zelda: The Wind Waker, and every session stayed in sync.
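A desync checker boils down to comparing a fingerprint of each player’s emulated state frame by frame. The sketch below is a hypothetical illustration of that idea, not how Dolphin’s checker is implemented: `frame_digest` and `first_desync` are made-up names, and the “state” is just raw bytes standing in for things like RAM and CPU registers.

```python
import hashlib
from typing import Optional, Sequence


def frame_digest(state: bytes) -> str:
    """Fingerprint one frame's emulated state (here just opaque bytes)."""
    return hashlib.sha256(state).hexdigest()


def first_desync(frames_a: Sequence[bytes], frames_b: Sequence[bytes]) -> Optional[int]:
    """Return the first frame index where two players' states differ, or None."""
    for i, (a, b) in enumerate(zip(frames_a, frames_b)):
        if frame_digest(a) != frame_digest(b):
            return i
    return None


player1 = [b"frame0", b"frame1", b"frame2"]
player2 = [b"frame0", b"frame1", b"frameX"]
print(first_desync(player1, player2))
```

A checker like this reports a false positive whenever the hashed data includes something that can differ between machines without affecting gameplay, so choosing what to hash matters as much as hashing it.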
This won’t be true for every game. Until earlier this month, games like Mario Kart: Double Dash!!, F-Zero GX, and Mario Kart Wii would desync almost immediately because of differences in physics calculations. Thanks to JosJuice’s work, the rounding errors in the AArch64 JIT and the interpreter that caused those differences have now been fixed, which means these games should at least have a chance of staying in sync in online play.
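Rounding differences like these come from code paths that are each reasonable but round at different points. One classic example is a fused multiply-add, which rounds once, versus a separate multiply and add, which rounds twice; PowerPC has fused instructions such as `fmadd`. The Python sketch below illustrates the general effect with hand-picked inputs; it is not a reproduction of the specific bugs JosJuice fixed. `fused_mul_add` is a hypothetical helper that emulates an FMA with exact rational arithmetic from the standard `fractions` module.

```python
from fractions import Fraction


def mul_then_add(a: float, b: float, c: float) -> float:
    """Two roundings: the product is rounded to a double before the add."""
    return a * b + c


def fused_mul_add(a: float, b: float, c: float) -> float:
    """One rounding: compute a*b + c exactly, then round once, as an FMA does."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


print(mul_then_add(0.1, 10.0, -1.0))
print(fused_mul_add(0.1, 10.0, -1.0))
```

With these inputs the unfused version returns `0.0`, while the fused one returns `2**-54` (about 5.55e-17), because the exact product of the double nearest 0.1 and 10 is slightly above 1. A difference that small is harmless once, but a physics engine feeds each frame’s results into the next, so two machines that disagree in the last bit can drift apart until the game state no longer matches.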
With so few games tested, we don’t yet have a good idea of which games work and which have problems. As a stress test, Techjar and Skyler played Super Mario Sunshine. Its physics calculations are extremely sensitive to CPU rounding errors, making it a severe test for both JITs. They also enabled the 60 FPS hack to make things even more interesting.
Not only does the game stay in sync, but the M1 MacBook Air also handles Super Mario Sunshine’s 60 FPS hack.
Everyone familiar with Dolphin’s JITs had agreed that cross-JIT NetPlay was impossible, at least without a lot of dedicated fixes. Having now seen it firsthand, we believe it will only get better, since JIT determinism can now be monitored and tested over NetPlay. While you might be excited to jump right in, be aware that only a handful of games have been tested so far, and it is unknown how compatible the wider game library will be.
Note: We know there were AArch64 devices running Windows and Linux before the M1. Testing NetPlay on them never appealed to us because they don’t run Dolphin well. We honestly didn’t think it would work, or we might have tried it sooner.
Conclusion
The M1 hardware is great, and higher-end chips are on the way, promising even better performance. What we have right now is efficient and powerful, and it gives us a mainstream AArch64 device that isn’t running Android and that lets the AArch64 JIT reach its full potential. The only downside is macOS’s proprietary graphics API (Metal), whose dominance keeps us from using the latest versions of OpenGL and forces us to go through MoltenVK to use Vulkan. That’s a very small price to pay for some really cool hardware that redefines what ARM processors can do. There’s no denying we’re excited about the next generation of AArch64 hardware and want to see how far it can go.
Original link: dolphin-emu.org/blog/2021/0…