I’ve been working on Internet of Things projects for the last year. I first took over a Wifi project, and then built a BLE project from scratch. Here is a simple demo for your reference: GitHub – BLE Demo. When it runs, it scans for nearby Bluetooth devices and displays the basic information it obtains. Click the link to view the various services and fields. I ran into many problems during actual development. In this article I summarize what I learned, drawing also on the experience of the earlier Wifi project.

The BLE project uses a self-organizing mesh network. The mobile terminal scans for surrounding devices and connects to any one Bluetooth device in the network. The directly connected device then acts like a router: all communication between the mobile terminal and every other device in the network passes through it.

Everyday iOS application development happens at the application layer: we call system APIs or use a networking library, both of which are highly encapsulated, so we rarely worry about the underlying transport. The application can gather enough information from callbacks to handle the current network situation. The mesh network, by contrast, is similar to the network layer in the OSI model. Without the support of an upper-layer protocol, transmission over the mesh is unreliable, and the amount of data it can carry is very limited. Frequent transmission (for example, fetching a large amount of information from a device, which requires several consecutive transfers) may cause a network storm in the mesh. All of these issues have to be dealt with in our application.

The BLE project is divided into three layers. At the bottom is the network layer, which is responsible for communicating with the mesh network. At the top is our business logic code. Between them, the Model layer is responsible for data processing and forwarding. At first I did not plan a middle layer. I just wanted the network layer to interact directly with the business logic code through callbacks after sending and receiving messages. However, as the number of features grew, problems began to appear.

  • The network layer is too heavy. First, it has to communicate with the Bluetooth device. In the various callbacks for establishing a connection and communicating with Bluetooth devices, we need to check devices, reconnect automatically, verify devices, and so on. After discovering a device, the network layer has to query device information for the local cache. During communication it also has to process and distribute data, acting as a router. In the other direction, every type of command sent by the upper layer has to be assembled and sent by the network layer. The network layer gradually becomes bloated, and with so many tasks its original job, communication, is no longer clear.

  • The network layer cannot filter what it passes up. When we receive data from a device, we pass it back to the business logic code for processing. The network layer does not know what the business logic layer does, so it can only throw the data straight up each time. The project has screens that display a list of devices and refresh their status immediately based on the information we receive. Normally this is fine, but during batch operations and in some other situations we receive a lot of device status information at once. The tableView on the front end then keeps refreshing, and the interface lags. Much of this information is redundant and does not need a refresh; we can filter and buffer it to smooth out these peaks.

  • The code is hard to reuse. In the BLE project we need to show the customer the Bluetooth devices that can be operated in the current environment. Only local control is supported, not remote, so we do not need to persist the devices found at runtime ourselves. (In fact, we do cache something: the MAC address and device type of each device are stored locally, and when we find a device we match it against the local cache first to avoid unnecessary requests.) The lifecycle of the device model is the same as the lifecycle of the app. The list of Bluetooth devices in the current environment is needed in many places in the project, and at each place we had to process the device information returned by the network layer. As the project progressed, there was more and more duplicate code, and it became difficult to keep the device list consistent from place to place.

To solve these problems and clarify the responsibilities of each layer, I added a middle layer, the Model layer, between the network layer and the business logic layer. To the network layer, it is the object that handles the business logic above; to the business logic code, it is the manager of the data communication below. The network layer cares only about authentication and communication with Bluetooth devices, and the business logic layer cares only about handling events. The processing and forwarding of network data, and the management of device models, are moved out of those two layers into the middle layer.

With the layers defined, consider the communication between them. From the top down:

The network layer exposes a command-sending interface to the Model layer. It receives only the already assembled command and does not care about its content. Commands are assembled by the Model layer. Commands that operate on a single device are bound to Model objects as instance methods; commands that operate on all devices in the mesh network are bound to the Model layer’s management class as class methods. The business logic layer calls the Model layer’s methods as needed.
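The top-down split can be sketched as follows. This is a minimal illustration in Python rather than the project's iOS code, and every name in it (NetworkLayer, Device, DeviceManager, the opcode value, the packet layout) is hypothetical. The only detail taken from the article is the group address 0xFFFF described later; using it for the whole-network command is an assumption.

```python
from typing import Callable, Optional


class NetworkLayer:
    """Sends already-assembled commands and never inspects their content."""

    def __init__(self, transport: Callable[[bytes], None]) -> None:
        self._transport = transport  # stands in for the real BLE write

    def send(self, command: bytes) -> None:
        self._transport(command)


OPCODE_SWITCH = 0x01    # hypothetical opcode
GROUP_ADDRESS = 0xFFFF  # group address mentioned later in the article


def build_switch(address: int, on: bool) -> bytes:
    """Assemble a switch command: 2-byte address, opcode, value (assumed layout)."""
    return address.to_bytes(2, "big") + bytes([OPCODE_SWITCH, 1 if on else 0])


class Device:
    """Model object: single-device commands are instance methods."""

    def __init__(self, address: int, network: NetworkLayer) -> None:
        self.address = address
        self._network = network

    def switch(self, on: bool) -> None:
        self._network.send(build_switch(self.address, on))


class DeviceManager:
    """Management class: whole-network commands are class methods."""

    network: Optional[NetworkLayer] = None

    @classmethod
    def switch_all(cls, on: bool) -> None:
        assert cls.network is not None, "set DeviceManager.network first"
        cls.network.send(build_switch(GROUP_ADDRESS, on))
```

The business logic would call only `Device.switch` or `DeviceManager.switch_all`; it never builds bytes itself, and the network layer never parses them.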

From the bottom up, the network layer receives messages from the mesh network and passes them up through callbacks. iOS offers three common callback mechanisms: blocks, delegates, and notifications. In this project, a delegate is used between the network layer and the Model layer, and notifications are used between the Model layer and the business logic layer.

Blocks are my favorite and most frequently used way to handle callbacks in daily development. They are lightweight, easy to embed in the surrounding logic, and easy to read. As anonymous functions that capture their lexical scope, they make it simple to access the context at the call site. Unfortunately, they do not fit this case, because their disadvantages are just as obvious. Because blocks are so lightweight, they become unwieldy in even slightly complex cases. There are so many interaction scenarios between the business logic layer and the Model layer that setting a block for each one would be tedious and unmanageable. The callback between the Model layer and the network layer is simpler, since it mainly hands the lightly processed message up to the Model layer, but blocks very easily cause retain cycles. Besides the usual strong/weak self handling, note that this block would be held and never released, because we need it for a long time. Once a memory leak appears, it is very hard to detect. We can take a page from Apple’s playbook. Cocoa uses both blocks and delegates: UIView’s animation class methods and the completion handlers of some screen transitions use blocks, while UITableView uses a delegate. The contrast suggests a rule: for temporary one-off callbacks, or callbacks that are held for a long time but have a definite end, use blocks; for callbacks that are invoked frequently, a delegate is the better choice.

Of the remaining two mechanisms, notifications are much looser than delegates: you can post and observe them anywhere, and you have to unregister manually once an observer is no longer needed. But notifications have one advantage over delegates: they are one-to-many, like a broadcast.

In fact, blocks can also implement one-to-many, but it is a bit of a hassle. In SDWebImage, a dictionary maps an identifier generated from the URL to an array of blocks. When a task finishes, the array is looked up by its identifier and the blocks are executed one after another.
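The dictionary-of-callbacks idea can be sketched like this. It is a simplified illustration in Python, not SDWebImage's actual code; CallbackRegistry and its method names are made up for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[bytes], None]


class CallbackRegistry:
    """Maps a task identifier to every callback waiting for that task."""

    def __init__(self) -> None:
        self._callbacks: DefaultDict[str, List[Callback]] = defaultdict(list)

    def add(self, identifier: str, callback: Callback) -> bool:
        """Register a callback. Returns True if it is the first one for this
        identifier, i.e. the caller should actually start the task."""
        is_first = identifier not in self._callbacks
        self._callbacks[identifier].append(callback)
        return is_first

    def finish(self, identifier: str, result: bytes) -> None:
        """Remove the callbacks for a finished task and run them in order."""
        for callback in self._callbacks.pop(identifier, []):
            callback(result)
```

Every caller interested in the same identifier gets the result, while the task itself runs once. The bookkeeping is the "hassle" compared with a notification, which provides one-to-many delivery for free.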

Back to the project: the relationship between the network layer and the Model layer is one-to-one, which suits a delegate well. We use a protocol to define each step of the communication, which keeps things orderly. It also matters that the data passed between the network layer and the Model layer consists of primitive types, whereas a notification can only carry objects. Between the Model layer and the business logic layer, however, a change to one Model property often needs several responders, so notifications are more appropriate.

Of course, some of the shortcomings of Notification itself can be remedied by standardizing the code. In this project:

  • All notification names are defined as macros in one file.
  • Each notification name is defined as a pair of macros, one for GET and one for POST, named by the rule GET/POST + event name.
  • The handler for a given event has the same name everywhere, following the rule get + event name + Notification.

These conventions make notification-related code much easier to manage.


In the BLE project, the app communicates closely with the Bluetooth devices. Besides requests from the app to a Bluetooth device, the device itself also pushes messages to the app when its status changes. The network layer only sends and receives; it neither knows nor cares about the content. All of that is left to the Model layer.

What was the problem with the Wifi project I took over earlier? In that project, every message sent or received went through notifications. The notification only distinguished which protocol the message arrived on, not the type of message, and every message was broadcast. Each receiver then had to inspect the content of the message (a dictionary) to decide whether it was the one it needed.

This caused serious problems:

  • There are many messages and many message types. A screen often cares about only one message type but is forced to process every message that comes back. This is not a simple check: you have to take values out of the dictionary and compare them, and every key piece of data has to be validated. If the logic for determining the message type changes, it has to be changed in every notification handler.

  • These messages cannot be filtered. For example, a device reports its status periodically. Suppose the device is on and its latest report still says on. Such a message could be absorbed before it reaches the business logic, which would then be notified only when the device state actually changes.

  • The scattered checks may affect future features. The project is constantly gaining new business requirements. A newly added feature can trigger operations in existing callbacks without the developer knowing, causing bizarre problems such as unexpected reconnection or the login name and password being switched.
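The filtering described in the second point can be sketched as a small state-change filter placed in front of the business logic. This is an illustrative Python sketch, not code from either project; StateFilter and its callback signature are hypothetical.

```python
from typing import Callable, Dict


class StateFilter:
    """Forwards a status report only when the device state has changed."""

    def __init__(self, on_change: Callable[[str, bool], None]) -> None:
        self._last: Dict[str, bool] = {}  # last known state per device
        self._on_change = on_change       # stands in for the business logic

    def report(self, device_id: str, is_on: bool) -> bool:
        """Handle one periodic report; return True if it was forwarded."""
        if self._last.get(device_id) == is_on:
            return False  # same state as before: absorb the message here
        self._last[device_id] = is_on
        self._on_change(device_id, is_on)
        return True
```

With the check in one place, a change to how messages are classified is made once instead of in every notification handler.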

Beyond these points, the BLE project has one more troublesome problem. Devices in the mesh network communicate with each other periodically to confirm that they are online. If data is being requested from outside at the same time, this internal communication is disturbed; after a certain time a device is judged to be offline, but it is refreshed back to online almost immediately. In the app, the button that shows the device’s switch blinks. We want to add checks in the application to take care of this.

After receiving a message, the network layer passes the data to the Model layer through callbacks. The Model layer first determines the task type of the message from its flag bits. The message is then cleaned: all invalid messages are discarded, and the valid information (an array of char values) is parsed into a model, so that the data is known to be correct before it is distributed via notification. The Model layer also adds a delay at the point where messages are received, similar to ACK piggybacking in TCP. After receiving a status message, we wait for a period of time for other status messages to arrive, and perform the next operation only after confirming that there are none. If a new message arrives, the delay is reset. If several status messages come from the same device, only the last one is kept. If they come from different devices, the models of these devices are collected until the delay ends and then passed to the business logic layer as a batch. This effectively avoids the problems caused by poor communication in the mesh network and reduces the pressure on the front-end UI.
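The delay-and-merge step can be sketched as follows. This is a simplified Python illustration under stated assumptions: the class and method names are hypothetical, the state is reduced to an on/off flag, and time is passed in explicitly instead of using a real timer so that the behavior is easy to follow.

```python
from typing import Dict, List, Optional, Tuple


class StatusCoalescer:
    """Collects status messages and releases them as one batch after a quiet period."""

    def __init__(self, delay: float) -> None:
        self._delay = delay
        self._pending: Dict[str, bool] = {}       # latest state per device
        self._deadline: Optional[float] = None    # when the batch may be released

    def receive(self, device_id: str, is_on: bool, now: float) -> None:
        self._pending[device_id] = is_on     # same device: keep only the last state
        self._deadline = now + self._delay   # any new message resets the delay

    def poll(self, now: float) -> List[Tuple[str, bool]]:
        """Return the batch once the delay has passed with no new messages."""
        if self._deadline is None or now < self._deadline:
            return []
        batch = list(self._pending.items())
        self._pending.clear()
        self._deadline = None
        return batch
```

In an app, `poll` would be driven by a timer that is rescheduled on every `receive`; the batch it returns is what gets posted to the business logic in a single notification.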

If the status information of multiple devices is reported at the same time, the table can be refreshed once instead of calling a single-row refresh several times in a short period. This is similar to how WeChat handles incoming messages: a single message produces a prompt of its own, but when many messages arrive together, for example when someone floods a chat with stickers, it shows a prompt such as ‘XX unread messages’.

Legitimate messages are parsed into models for distribution, and the Model layer does some further work here. First, we divide the returned messages into several categories and distribute them under different notification names. The criteria are the screens and the service level that each model is involved with in the project. “Service level” may not be the most accurate term. As a simple analogy, Python’s logging module has a level parameter, and different levels distinguish how important a log record is. In the same way, the BLE project cares about different messages to different degrees. Basic messages such as switches are frequent and simple. Query messages such as device type are simple, and the places that need them are concentrated. Timer queries return a large volume of data and serve an important feature. We distribute messages hierarchically based on these characteristics, without overdoing it. The aim is to keep each notification as independent as possible so that it does not disturb unrelated parts, but not so fine-grained that there are too many notifications to manage and some places have to register for several of them to get the messages they need. Once the notifications are layered, an int mask is attached to each notification to specify the types it carries. Each bit identifies one message type, so an observer can quickly determine what the models in a notification represent with a bitwise operation on the mask.
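The mask check can be sketched in a few lines. This is an illustrative Python example; the message types and bit assignments are hypothetical, chosen only to match the categories mentioned above.

```python
from enum import IntFlag


class MessageType(IntFlag):
    """One bit per message type (assignments are illustrative)."""
    SWITCH = 1 << 0
    DEVICE_TYPE = 1 << 1
    TIMER = 1 << 2


def wants(mask: int, interested_in: MessageType) -> bool:
    """True if the notification's mask contains any type the observer cares about."""
    return bool(mask & interested_in)


# A notification that carries switch and timer models:
mask = MessageType.SWITCH | MessageType.TIMER
```

An observer that only handles timers would test `wants(mask, MessageType.TIMER)` and return early when the result is false, without inspecting the models themselves.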


There are multiple channels in a mesh network. To ensure that basic messages such as switches are transmitted in real time, the resources left for other services are limited. If the application exchanges a large amount of data in a burst, the result is severe packet loss and network instability. When the hardware cannot be improved, we have to compensate in the application. And where key messages are involved, reliability has to be provided by our application.

Messages sent in the BLE project fall into two categories. The first is commands whose result we do not care about and which have no callback, such as switches. The second is query commands that need to obtain information from BLE devices, such as fetching alarm clocks or querying device information. Basic commands of the first kind are sent directly: the Model layer packs the device and message type into the command and hands it to the network layer. Since the mesh has already reserved enough resources for these commands, we can send them without any delay. Group switches are no concern either. Unlike in ordinary projects, a BLE group command simply sets the address field to 0xFFFF, and the rest is identical to a single switch message. The message travels through the mesh network, and each device only needs to match the address to decide whether to respond, so a group operation does not multiply the number of messages. Query commands are much more complicated. The mesh network reserves few resources for them, yet they involve a large amount of data and require the BLE device to send messages back to us. As mentioned at the beginning of this article, the mesh network is similar to the network layer, whose counterpart is IP. In the TCP/IP protocol suite, reliability, including that of HTTP built on top of it, is provided by TCP, the protocol layered above IP. So how are queries that require a definite result handled in the BLE project?

Let’s start with a brief list of the actual problems.

  • The packet loss rate is high and data transmission is unreliable. Put simply, the mesh network cannot bear heavy communication pressure. Generally, if the BLE device is not the directly connected one, 10% to 30% of packets are lost; the BLE device may even crash, leaving the mesh network unable to communicate for a short time.

  • If the data is too large to be sent in one packet, the mobile terminal has to ask the BLE device to send it piece by piece (similar to sharding). We need to handle the sending and receiving logic ourselves to ensure that the data is correct.

  • BLE devices come in multiple versions, and the protocol for some message types differs between versions.

Before solving these problems, we need to be clear about what reliability means: every query gets a definite answer, either the correct reply from the BLE device or a status code telling us that an error occurred. As the saying goes, “if alive, we want to see the person; if dead, we want to see the body”: we must know the outcome of every command we send. Compared with the Internet, the mesh network is much simpler. In an HTTP request, the status code of the response tells the client how the request went. There are many status codes, covering almost every situation that can occur on the network, and they are numerous and complicated. In the BLE project, however, neither the BLE device nor the mesh network notifies the mobile terminal when something goes wrong, whether a timeout, a rejection for insufficient permission, or an oversized packet. The only two outcomes we can observe are that the message was sent and a reply was received correctly, or that nothing came back before the timeout.

In principle, we could send several messages and get several results back at the same time, and then receive them incorrectly. As described below, this does not happen, because we use a serial queue to send messages one at a time, given the mesh’s limited capacity and the fact that the device does not let us query a specific item within a set. This is similar to Nagle’s algorithm in TCP.

To avoid packet loss, we should avoid transferring large amounts of data in a short period of time. When the requirements do call for a large amount of data and the hardware cannot be improved, we limit the transmission rate to reduce the pressure on the mesh network. The implementation is an NSOperationQueue used as a serial queue, with maxConcurrentOperationCount set to 1, so the next command is executed only after the previous one has finished. This serves two purposes. First, it reduces the communication pressure. Second, it avoids ambiguity: if several replies came back at the same time they might be out of order (the return path of a message is not fixed, so a later message may take a shorter path; and our device does not support querying a specific index, only everything at once), and the mobile terminal could not match them correctly. Transmission reliability is also built on the serial queue. When a command is sent, the format of the data it will return is known in advance, and the mobile terminal considers the current command complete only after receiving the corresponding message. After the query command is sent for the first time, we wait 1 s for a reply. If nothing comes back, the command is sent a second time after 2 s, and we again wait 1 s for the reply. If there is still nothing, the query is tried one last time after 5 s. If the reply still has not arrived, we conclude that the network is in poor condition: since this command failed, we have reason to believe other commands would not be answered either, so the program returns a timeout error and tells the user about the current network condition. The interval grows with each attempt because a missing reply suggests that the network environment is poor, and waiting longer gives the mesh network time to recover and work through the congested messages.
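The retry logic on top of the serial queue can be sketched as below. This is a Python illustration, not the project's NSOperationQueue code: a plain loop stands in for the serial queue, `send_and_wait` is a hypothetical function that sends a command and returns the reply or None if nothing arrives within the given time, and the 1 s wait after the last attempt is an assumption.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple

# (pause before sending, time to wait for the reply), in seconds.
# The 1 s / 2 s / 5 s values follow the article; the wait after the
# final attempt is assumed to be 1 s as well.
SCHEDULE: Sequence[Tuple[float, float]] = ((0.0, 1.0), (2.0, 1.0), (5.0, 1.0))

SendAndWait = Callable[[bytes, float], Optional[bytes]]


class QueryTimeout(Exception):
    """No reply after all attempts; the network is assumed to be congested."""


def query_with_retry(
    command: bytes,
    send_and_wait: SendAndWait,
    sleep: Callable[[float], None] = time.sleep,
    schedule: Sequence[Tuple[float, float]] = SCHEDULE,
) -> bytes:
    for pause, wait in schedule:
        if pause:
            sleep(pause)  # back off longer before each further attempt
        reply = send_and_wait(command, wait)
        if reply is not None:
            return reply
    raise QueryTimeout(f"no reply to {command!r}")


def run_serially(
    commands: Sequence[bytes],
    send_and_wait: SendAndWait,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """One command at a time; a timeout aborts the remaining commands."""
    return [query_with_retry(c, send_and_wait, sleep) for c in commands]
```

Letting the timeout propagate out of `run_serially` mirrors the reasoning above: if one command cannot get through, the rest are unlikely to, so the caller reports the network condition instead of queuing more traffic.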

After getting the data, we store it locally. Each later query for the same data tries the local store first and requests it from the device again only if it is not there. This reduces the amount of data that has to be queried over the mesh.
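A cache-first lookup of this kind can be sketched as follows. This is a minimal Python illustration; CachedQuery and `fetch_from_device` are hypothetical names, and an in-memory dictionary stands in for whatever local storage the app uses.

```python
from typing import Callable, Dict


class CachedQuery:
    """Answers from local storage when possible, from the device otherwise."""

    def __init__(self, fetch_from_device: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_device   # the expensive query over the mesh
        self._cache: Dict[str, bytes] = {}  # stand-in for local storage

    def get(self, key: str) -> bytes:
        if key not in self._cache:
            self._cache[key] = self._fetch(key)  # only the first lookup hits the mesh
        return self._cache[key]
```

A real implementation would also need a way to invalidate an entry when the device's data changes, which this sketch leaves out.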


The above describes the planning and design of the project as a whole. In the real project, categories are used to split some management classes into multiple parts. Composition is preferable to inheritance, especially when new business features are constantly being added, as in the BLE project. We implement the basic functions in the main class and then peel the business code off into separate parts, one per business area, which makes later maintenance and extension easier.