Original address: medium.com/flutter/imp…

Author: medium.com/gaaclarke

Published: June 15, 2021 · 9 minute read

For the past few years, I have been interested in the question "how can we make communication between Flutter and its host platform faster and easier?" This question is of particular interest to Flutter plugin developers and add-to-app developers.

Communication between Flutter and the host platform is usually done through platform channels, so my efforts have been focused there. In late 2019, to remedy the heavy boilerplate and loosely typed code required to use platform channels, I designed a codegen package, Pigeon, to make platform channels type-safe, and the team has continued to improve it. In the spring of 2020, I conducted a performance audit of platform channels and the foreign function interface (FFI). Now, I've set my sights on improving the performance of platform channels. Since Pigeon is built on top of platform channels, and I plan to build a data synchronization solution on top of Pigeon for multiple Flutter instances, this was a great opportunity to help meet many different developer needs, including my own.

After some investigation, I was able to identify redundant copies of data being made as messages are sent over platform channels, and to remove them. Below you'll see the results of this change, along with an overview of the work that led to finding and removing these copies.

Results

When 1MB of binary data was sent from Flutter to the host platform and a 1MB response was received, we saw approximately a 42% performance improvement on iOS after removing the redundant copies. On Android, the results are more nuanced: after migrating to the new BinaryCodec.INSTANCE_DIRECT codec, our automated performance tests improved by about 15%, while my local tests showed an improvement of about 52%. The difference could be because the automated performance tests run on an older device, or it could be an artifact of how the microbenchmarks are executed (for example, hammering the garbage collector). You can find the source code for the automated performance tests at platform_channels_benchmarks/lib/main.dart.

For platform channels that use StandardMessageCodec, I saw a smaller performance improvement (about 5% with 14k payloads). I tested with a large mixture of supported types, stress-testing encoding and decoding, and found that the time spent copying messages between platforms paled in comparison to the MessageCodecs' encoding and decoding time. Much of that encoding time comes from the cost of traversing the data structures and using reflection to figure out their contents.

So, your mileage may vary, depending on how you use platform channels and on your device. If you want the fastest communication over platform channels, use BasicMessageChannel with FlutterBinaryCodec on iOS and BinaryCodec.INSTANCE_DIRECT on Android, and devise your own protocol for encoding and decoding messages that doesn't depend on reflection (implementing it as a new MessageCodec might be cleaner).
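To make the last point concrete, here is a minimal sketch of such a hand-rolled, reflection-free protocol in Java. The PointListProtocol class and its wire format are hypothetical illustrations, not part of Flutter; in a real app you would send the resulting buffer over a BasicMessageChannel&lt;ByteBuffer&gt; configured with BinaryCodec.INSTANCE_DIRECT.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical example: a hand-rolled wire format for one fixed message shape
// (a list of x/y double pairs). Because the shape is known ahead of time,
// encoding needs no type tags and no reflection, unlike StandardMessageCodec.
public class PointListProtocol {
  // Layout: [int32 count][double x, double y] * count
  public static ByteBuffer encode(double[] xs, double[] ys) {
    ByteBuffer buffer = ByteBuffer.allocateDirect(4 + 16 * xs.length);
    buffer.order(ByteOrder.nativeOrder());
    buffer.putInt(xs.length);
    for (int i = 0; i < xs.length; i++) {
      buffer.putDouble(xs[i]);
      buffer.putDouble(ys[i]);
    }
    buffer.flip(); // make the buffer readable from the start
    return buffer;
  }

  public static double[][] decode(ByteBuffer buffer) {
    buffer.order(ByteOrder.nativeOrder());
    int count = buffer.getInt();
    double[][] points = new double[count][2];
    for (int i = 0; i < count; i++) {
      points[i][0] = buffer.getDouble();
      points[i][1] = buffer.getDouble();
    }
    return points;
  }
}
```

Because the message shape is fixed, no traversal or reflection is needed, which is exactly where the speedup over StandardMessageCodec comes from.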

If you want to play with the new, faster platform channels, they are available now on the master channel.

Copy removal details

If you are not interested in learning more about how I achieved these results and the problems I had to overcome, stop reading now. If you like these details, read on.

The API for platform channels has not changed much since 2017. Because platform channels are the foundation that the engine and plugins are built on, they aren't easy to change. While I had a general idea of how platform channels work, they are somewhat convoluted, so the first step toward improving their performance was understanding what they actually do.

The following diagram summarizes the original flow that the framework follows when communicating with iOS from Flutter using platform channels.

We can glean a few things from the diagram:

  • Messages jump from the UI thread to the platform thread and back to the UI thread. (In Flutter engine parlance, the UI thread is where Dart executes and the platform thread is the host platform's main thread.)
  • Messages and their responses use C++ as an intermediate layer of communication between Flutter and the target language of the host platform.
  • The message's data is copied four times (steps 3, 5, 7, 8) before reaching the Objective-C handler. Steps 3 and 7 are translations, while steps 5 and 8 are copies that transfer ownership of the data to a new memory layout. The same process is repeated in reverse for the reply.
  • Steps 1, 9, and 16 are code written by developers using Flutter.

Sending messages from Flutter to Java/Kotlin is similar, except that there is a Java Native Interface (JNI) layer between C++ and the Java virtual machine (JVM).

After working out how platform channels operate, it became clear that eliminating the copies made when transferring data between these layers (for example, from C++ to Objective-C) was an obvious way to improve performance. To achieve this, the Flutter engine must lay the data out in memory in a way that can be accessed directly from Java/Objective-C and that has memory-management semantics compatible with the host platform.

Platform channel messages are ultimately consumed by a MessageCodec's decodeMessage method on the host platform. On Android that means a ByteBuffer, and on iOS it means an NSData, so the data in C++ needs to be presented through those interfaces. As I approached the problem, I found that the message's data resides in C++ memory as a std::vector, inside a PlatformMessage object held by a shared pointer. That meant we could not safely delete the copy made when sending the data from C++ to the host platform, because we could not guarantee the data wouldn't be mutated by C++ after it was handed over. I also had to be careful because BinaryCodec's implementation treats encodeMessage and decodeMessage as no-ops, so code using BinaryCodec could unknowingly start receiving the ByteBuffer directly. While it's unlikely anyone would be surprised by changes to MessageCodec, since few people implement their own codecs, the use of BinaryCodec is quite common.
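The aliasing hazard can be demonstrated in pure Java. In this small demo (the names are mine, not Flutter's), a ByteBuffer that wraps an array without copying still observes the sender's later mutations; a direct ByteBuffer pointing at C++-owned memory behaves the same way.

```java
import java.nio.ByteBuffer;

// Demonstrates the hazard described above: a ByteBuffer that wraps (rather
// than copies) an array still aliases the sender's memory, so mutations made
// after the "handoff" are visible to the receiver. A direct ByteBuffer over
// C++-owned memory has the same property.
public class AliasingDemo {
  public static byte firstByteAfterMutation() {
    byte[] senderMemory = new byte[] {1, 2, 3, 4};
    ByteBuffer handedOff = ByteBuffer.wrap(senderMemory); // no copy is made
    senderMemory[0] = 99; // the sender mutates the data after the handoff
    return handedOff.get(0); // the receiver observes the mutation
  }
}
```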

After reading through the code, I realized that while the PlatformMessage is managed by a shared pointer, it is semantically a unique pointer: the intent is that only one client accesses it at a time. (This isn't strictly true, since multiple copies exist while the PlatformMessage is passed between threads, but that was for convenience, not by design.) That meant we could move from a shared pointer to a unique pointer, allowing us to safely hand the data over to the host platform.

After migrating to unique pointers, I had to find a way to pass ownership of the data from C++ to Objective-C. (I implemented Objective-C first; I'll discuss Java in more detail later.) The data lives in a std::vector, which has no way to release ownership of its underlying buffer. Your only options are to copy the data out, provide an adapter around the std::vector, or stop using std::vector.

My first attempt was to subclass NSData, eliminating the copy by moving the std::vector into the subclass and reading its data from there. That didn't go well, because it turns out that NSData is a class cluster in Foundation, which means you can't simply subclass it. After reading a lot of Apple documentation, their advice seems to be to use composition and message forwarding to make an object behave like an NSData. That fools everyone who uses the proxy object, except anyone who calls -[NSObject isKindOfClass:]. That's unlikely, but I couldn't rule it out. While I suspected there was some Objective-C runtime trickery that could make the object behave the way I wanted, it was getting more and more complicated. Instead, I chose to move the memory out of the std::vector and into our own buffer class that can release ownership of its data. That way, I could use -[NSData dataWithBytesNoCopy:length:] to transfer ownership of the data to Objective-C.

Replicating this process on Android proved to be a bit harder. On Android, platform channels traffic in ByteBuffers, which have the concept of direct ByteBuffers that allow Java code to interface directly with memory allocated in C/C++. I got a migration to direct ByteBuffers working quickly, but I didn't see the improvements I expected. I spent a lot of time learning the Android profiling tools, and when those tools failed or returned results I couldn't believe, I eventually fell back to trace statements. It turned out that responses to platform channel messages, scheduled from the platform thread onto the UI thread, were very slow, and the slowness seemed to scale with the message payload. To make a long story short, I had used an incorrect compile flag when building the Dart VM: I thought the unoptimized flag only meant no link-time optimization, but it actually turned off runtime optimization.
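For readers unfamiliar with the distinction, the sketch below (a standalone demo, not engine code) shows the difference between a direct ByteBuffer, which is backed by native memory outside the Java heap, and an ordinary heap buffer.

```java
import java.nio.ByteBuffer;

// A direct ByteBuffer is backed by native memory outside the Java heap,
// which is what lets JNI code share a buffer with C/C++ without copying.
// A heap buffer, by contrast, is backed by an ordinary Java byte array.
public class DirectBufferDemo {
  public static boolean roundTrip() {
    ByteBuffer direct = ByteBuffer.allocateDirect(8);
    ByteBuffer heap = ByteBuffer.allocate(8);
    direct.putLong(0, 0x1234L); // absolute write into native memory
    return direct.isDirect() && !heap.isDirect() && direct.getLong(0) == 0x1234L;
  }
}
```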

In the excitement of discovering my error, I had forgotten the consequences of handing direct ByteBuffers to Flutter client code, particularly custom MessageCodecs or BinaryCodec clients. A direct ByteBuffer means you have a Java object that points at C/C++ memory, so if the C/C++ memory is freed, Java ends up interacting with random garbage and may crash with an operating-system access violation.

Following the iOS example, I tried to pass ownership of the C/C++ memory to Java so that it would be deleted when the Java object was garbage collected. It turns out this is not possible when the direct ByteBuffer is created from JNI via NewDirectByteBuffer: JNI provides no hook to know when the Java object has been collected, and you cannot subclass ByteBuffer to call into JNI when it is finalized. The only alternative was to allocate the direct ByteBuffer from the Java API in step 5 of the diagram above, since direct ByteBuffers allocated through Java do not have this limitation. However, introducing a new entry point into Java would have been a big change, and anyone who has worked with JNI knows how perilous it is.

Instead, I chose to petition the team to accept direct ByteBuffers in the decodeMessage call. At first, I proposed introducing a new method on MessageCodec, boolean wantsDirectByteBufferForDecoding(), to ensure no one would receive a direct ByteBuffer unless they asked for it and understood its semantics (that is, that it is only valid while the underlying C/C++ memory is still alive). That turned out to be complicated, and the worry was that developers might still opt in without understanding the semantics of direct ByteBuffers, which behave differently from typical ByteBuffers because the C memory backing them may have been freed. Caching the buffer inside a codec is an atypical use, but the team couldn't rule it out. After much discussion and negotiation, it was decided that every MessageCodec would get a direct ByteBuffer that is cleared after decodeMessage is called. That way, if someone caches the encoded message and tries to use the ByteBuffer after the underlying C memory has been freed, they get a deterministic, well-defined error in Java.
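The fail-fast behavior of that contract can be simulated in pure Java. The engine's real implementation frees the native memory; the sketch below (my own simulation, not engine code) instead zeroes the buffer's limit once the decode callback returns, so a cached reference throws a standard Java exception rather than reading stale memory.

```java
import java.nio.ByteBuffer;

// Simulation of the "cleared after decodeMessage" contract described above.
// After the decoder runs, the buffer's limit is zeroed so that any further
// read on a cached reference fails fast with BufferUnderflowException
// instead of silently reading invalid data.
public class InvalidatingMessenger {
  public interface Decoder<T> { T decodeMessage(ByteBuffer message); }

  public static <T> T handleMessage(ByteBuffer message, Decoder<T> decoder) {
    try {
      return decoder.decodeMessage(message);
    } finally {
      message.rewind();
      message.limit(0); // "clear" the buffer: all subsequent reads underflow
    }
  }
}
```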

Giving everyone the performance boost of direct ByteBuffers would be nice, but it is a breaking change for BinaryCodec, whose encodeMessage and decodeMessage implementations are no-ops that simply return their input. To keep BinaryCodec's memory semantics intact, I introduced a new instance variable that controls whether the decoded message is a direct ByteBuffer (the new, faster path) or a standard ByteBuffer (the old, slower path). That way we didn't break anyone, but still created a way for BinaryCodec clients to get the performance boost.
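The idea can be sketched as follows. This MiniBinaryCodec is a simplified illustration of the design, not Flutter's actual BinaryCodec source: the default instance preserves the old copy-on-decode semantics, while the opt-in instance returns the direct buffer as-is.

```java
import java.nio.ByteBuffer;

// Simplified sketch of the two-instance BinaryCodec design described above
// (not Flutter's actual source). INSTANCE keeps the old copy semantics;
// INSTANCE_DIRECT opts in to the zero-copy path.
public final class MiniBinaryCodec {
  public static final MiniBinaryCodec INSTANCE = new MiniBinaryCodec(false);
  public static final MiniBinaryCodec INSTANCE_DIRECT = new MiniBinaryCodec(true);

  private final boolean returnsDirectByteBuffer;

  private MiniBinaryCodec(boolean returnsDirectByteBuffer) {
    this.returnsDirectByteBuffer = returnsDirectByteBuffer;
  }

  public ByteBuffer encodeMessage(ByteBuffer message) {
    return message; // pass-through, as in the real BinaryCodec
  }

  public ByteBuffer decodeMessage(ByteBuffer message) {
    if (message == null || returnsDirectByteBuffer) {
      return message; // fast path: hand back the direct buffer itself
    }
    // Slow path: copy into a heap buffer that outlives the native memory.
    ByteBuffer copy = ByteBuffer.allocate(message.remaining());
    copy.put(message);
    copy.flip();
    return copy;
  }
}
```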

Future work

Now that the copy elimination is complete, my next tasks for improving communication between Flutter and the host platform are:

  1. Implement a custom MessageCodec for Pigeon that does not rely on reflection for faster encoding and decoding.
  2. Implement FFI platform channels that allow you to make calls from Dart to the host platform without jumping between UI and platform threads.

I hope you enjoyed this in-depth look at the details of performance improvements!
