Hello everyone, I am Lao Pan

Not long ago, I couldn’t resist buying an M1 Mac mini, and I sold my old 2017 MacBook Pro to ease the guilt. My data has been fully migrated, and my daily work has moved entirely from the MBP 2017 to the Mac mini. If you’re going to switch, switch completely and don’t leave yourself a way back.

Why buy the mini instead of a MacBook? To keep the cost of trying a new product low, of course. For anyone who already has a monitor and keyboard, the mini should be the most cost-effective way to test the M1 chip (only about ¥4,700 on Taobao).

Also note that although the Apple Silicon M1 is an ARM design like the chips in previous iPads, its performance is a big step up from the iPad’s A12Z. I won’t post benchmark scores here; they’re easy to find online and mostly good for amusement. What matters more is how the machine behaves in everyday use.

A quick look at the unboxing: the mini looks small in pictures, but in hand it feels fairly substantial, and in a bag it’s definitely not something you can ignore.

As for the ports on the back: for me the exact number doesn’t matter much as long as the basics are covered. With a monitor attached, a single HDMI link to a 4K screen is perfect.

The keyboard is an IKBC silent red switch; the monitor is an LG 27UL550, a 27-inch 4K panel. It’s not the ideal size for 4K, but the image is reasonably crisp; a solid entry-level 4K screen.

The display resolution is set to 2304 x 1296 at 60Hz, which is just right; at native 4K the text is so tiny it would ruin your eyes 😓. Note that 30Hz vs. 60Hz makes a huge difference to mouse smoothness; when my MBP 2017 drove a 4K screen at 30Hz, it was too sluggish to use comfortably.

Usage experience

After more than a month of use, most things feel exactly as they did before. My setup is just VSCode + PyCharm + a few other tools (Paste, EasyConnect, iTerm2, etc.), and with a little initial setup there’s no real difference from normal use. Even without an IDE, iTerm plus vim plugins covers most of my compiling and editing scenarios.

Commonly used software such as Xunlei, QQ, WeChat, DingTalk, and iQIYI all work without problems; some run through Rosetta 2 translation and some natively, and so far there’s no noticeable difference. Feel free to use them boldly! According to doesitarm.com, as of January 10th most software already supports the M1 chip.

There are plenty of M1 reviews online already, so I won’t rehash them here; I’ll just pick out the aspects I’m interested in.

The latest news

PyCharm and CLion added native Apple Silicon support in their latest update on January 2nd (the whole JetBrains family should support M1 by now). I tried it: truly silky smooth.

CPU performance

To test the performance of the 8-core M1 CPU, the following code uses the PyTorch library to do matrix addition (the code is adapted from github.com/pytorch/pyt…):

from tqdm import tqdm
import torch

@torch.jit.script
def foo():
    # On the 1080 Ti the tensors are created on the CPU, then moved to the GPU.
    x = torch.ones((1024 * 12, 1024 * 12), dtype=torch.float32).cuda()
    y = torch.ones((1024 * 12, 1024 * 12), dtype=torch.float32).cuda()
    z = x + y
    return z


if __name__ == '__main__':
    z0 = None
    for _ in tqdm(range(10000000000)):
        zz = foo()
        if z0 is None:
            z0 = zz
        else:
            z0 += zz

The above code runs at about 325 it/s on a 1080 Ti, and the nvidia-smi command shows the GPU fully loaded:

0%|          | 11936/10000000000 [00:44<8543:10:59, 325.15it/s]

Running the same code on the M1’s CPU (with the .cuda() calls above removed) yields about 45 it/s, again with the CPU fully loaded.

The gap is almost 7x, but the comparison is flawed: the 1080 Ti timing includes transferring the tensors from CPU to GPU on every iteration, while the M1 run involves no transfer at all, so this is not an objective comparison of CPU performance. PS: I don’t actually train on CPU!
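
A fairer timing would allocate the tensors directly on the target device and synchronize before stopping the clock. Here is a minimal sketch of that idea (my own illustration, not from the original benchmark; the iteration count is arbitrary):

import time
import torch

def bench(device: str, n: int = 1024 * 12, iters: int = 100) -> float:
    # Allocate directly on the target device so no CPU->GPU copy is timed.
    x = torch.ones((n, n), dtype=torch.float32, device=device)
    y = torch.ones((n, n), dtype=torch.float32, device=device)
    if device == 'cuda':
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        z = x + y
    if device == 'cuda':
        # CUDA kernels are queued asynchronously; wait before stopping the clock.
        torch.cuda.synchronize()
    return iters / (time.time() - start)

if __name__ == '__main__':
    print('cpu: %.1f it/s' % bench('cpu'))
    # On the 1080 Ti machine: print('cuda: %.1f it/s' % bench('cuda'))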

Hopefully PyTorch will eventually run on the M1’s GPU (relying on the PyTorch team alone will be difficult; the official developers are busy with other priorities, so other open-source contributors are needed).

M1 Mac Mini Scores Higher Than My RTX 2080TI in TensorFlow Speed Test

Compile PyTorch on M1

To build PyTorch on the M1, you need to install an ARM-based conda, which can be downloaded from conda-forge.org/blog/posts/…

After installing Miniconda, follow these steps to compile and install PyTorch:

github.com/pytorch/pyt…

torch-1.8.0a0-cp38-cp38-macosx_11_0_arm64.whl:

Link: pan.baidu.com/s/10WSazrv3… Password: ipp0

Additional reference links: iphonesdkdev.blogspot.com/2020/11/202…
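
Whichever route you take, a quick sanity check along these lines confirms the interpreter and the build are both native (a minimal sketch of mine; the exact version string depends on your build):

import platform
import torch

print(platform.machine())   # should print 'arm64' on Apple Silicon
print(torch.__version__)    # e.g. '1.8.0a0' for a self-compiled build
x = torch.ones(4, 4)
print((x + x).sum())        # a trivial op to confirm the install works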

Neural Engine

In fact, what attracts me most about the M1 chip is its neural network engine (hereafter referred to as the ANE):

The neural network engine, or neural engine, first appeared in the A11 Bionic, the chip used in the iPhone X/8, but at the time it was only available for Face ID and Animoji. Only later, with the A12 Bionic, was it exposed to apps via Core ML; then came the A13 Bionic and the A14 Bionic, each generation better than the last.

The neural engine in the M1 appears to match the one in the A14 Bionic: 16 cores and up to 11 TFLOPS. Remember that the GTX TITAN X was about 11 TFLOPS too, though of course the precisions differ; the ANE only computes fp16 and (u)int8 data.
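
Since the ANE computes only in fp16/(u)int8, Core ML models are usually weight-quantized before deployment. A minimal sketch using coremltools’ quantization utilities (the model file name is hypothetical, and this only quantizes weights; it doesn’t by itself guarantee the model is dispatched to the ANE):

import coremltools as ct
from coremltools.models.neural_network import quantization_utils

# Load an existing Core ML model (hypothetical path).
model = ct.models.MLModel('model.mlmodel')

# Quantize the weights to fp16, the precision the ANE computes in.
model_fp16 = quantization_utils.quantize_weights(model, nbits=16)
model_fp16.save('model_fp16.mlmodel')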

Details about ANE can be found here.

coremltools

The easiest way to use Apple’s neural engine is through coremltools. The first step is to install coremltools: clone it from the official GitHub repo and execute:

1. cd to root of coremltools
2. mkdir build && cd build
3. cmake ..
4. make install
5. python setup.py install

I recommend compiling it yourself, though installing via pip should also work (check whether libcoremlpython.so is included in Python’s site-packages).
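
If you went the pip route, here is a quick way to perform that check from Python (a small sketch of mine; it just lists the compiled extension files shipped with the installed package):

import glob
import os

import coremltools

# List any libcoremlpython* binaries next to the installed package.
pkg_dir = os.path.dirname(coremltools.__file__)
print(glob.glob(os.path.join(pkg_dir, 'libcoremlpython*')))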

import numpy as np
import coremltools as ct
from coremltools.models.neural_network import datatypes, NeuralNetworkBuilder

input_features = [('image', datatypes.Array(3))]
output_features = [('probs', datatypes.Array(3))]

weights = np.zeros((3, 3)) + 3
bias = np.ones(3)

builder = NeuralNetworkBuilder(input_features, output_features)
builder.add_inner_product(name='ip_layer', W=weights, b=None, input_channels=3, output_channels=3, has_bias=False, input_name='image', output_name='med')
builder.add_bias(name='bias', b=bias, input_name='med', output_name='probs', shape_bias=(3,))

mlmodel = ct.models.MLModel(builder.spec)
# ANE is used for the actual execution
out = mlmodel.predict({"image": np.array([1337.0, 0, 0], dtype=np.float32)})
print(out)

Running the above code invokes the ANE engine. But how do we know the ANE is actually being called?

Observing the calls

We can use dmesg to see whether the ANE is invoked.

The dmesg command examines and controls the kernel ring buffer; it is normally used to read system boot messages, and here it lets us see whether macOS calls the neural engine.

Execute the following command in a separate window to print the relevant messages whenever the system calls the neural engine:

watch -n 0.1 'sudo dmesg | grep H11'

Then run the .py code above:

python coreml_ane.py
{'probs': array([4012., 4012., 4012.])}

Meanwhile, watch the dmesg output:

[14453.207863]: Sandbox: ContextStoreAgen(482) deny(1) mach-lookup com.apple.ocspd
virtual IOReturn H11ANEIn::newUserClient(task_t, void *, UInt32, IOUserClient **) : H11ANEIn::newUserClient type=2
[14453.228654]: virtual IOReturn H11ANEIn::newUserClient(task_t, void *, UInt32, IOUserClient **) : H11ANEIn::newUserClient : Creating Default full-entitlement Client
[14453.228663]: virtual bool H11ANEInUserClient::init(task_t, OSDictionary *) - New UserClient for process: aned (pid 6887)
[14453.228720]: IOReturn H11ANEInUserClient::ANE_PowerOn() - Client homicide PowerOn
[14453.228723]: IOReturn H11ANEIn::ANE_PowerOn_gated(void *, const char *, bool) : H11ANEIn::Powering on ANE
[14453.228728]: IOReturn H11ANEIn::ANE_PowerOn_gated(void *, const char *, bool) : H11ANEIn::ANE_PowerOn_gated - Wait until ANE gets powered up for Client <ptr> retries = 1
[14453.228775]: IOReturn H11ANEIn::setPowerStateGated(unsigned long, IOService *) : H11ANEIn::setPowerStateGated: 1
[14453.234362]: H11ANEIn::POWER_on_Hardware - FW App Image...
[14453.252851]: IOReturn H11ANEIn::ANE_Init(): Statistics: ColdStarts: 7, JetsamTriggeredColdStarts: 0, Resumes: 0, ResumesFailed: 0, SuspendsSuccessful: 0, SuspendsFailed: 0, FirmwareTimeouts: 0, ANEDeInits: 6, ANEInitFailures: 0
[14453.252864]: IOReturn H11ANEIn::ANE_Init(): Work Stats: WorkSubmitted: 6 WorkBegin: 6 WorkEnded: 6 PendingRequests: 0
[14453.253097]: H11ANEIn: ANE_ProgramCreate_gated:, ZinComputeProgramMake, get Mcache size: 0x0
[14453.253100]: H11ANEIn: ANE_ProgramCreate_gated:, Program Identifier: ANEC V1
[14453.253108]: IOReturn H11ANEIn::ANE_ProgramCreate_gated(H11ANEProgramCreateArgs *, H11ANEProgramCreateArgsOutput *, H11ANEProgramCreateArgsAdditionalParams *) : H11ANEIn: the kernel is non-mutable kernel section
[14453.253162]: IOReturn H11ANEIn::ANE_ProgramCreate_gated(H11ANEProgramCreateArgs *, H11ANEProgramCreateArgsOutput *, H11ANEProgramCreateArgsAdditionalParams *) : WARN: H11ANEIn: Intermediate buffer size is zero
[14453.253342]: IOReturn H11ANEIn::ANE_ProcessCreate_gated(H11ANEProcessCreateArgs *, H11ANEProcessCreateArgsOutput *) : ProgramBuffer programHandle = 0x50c38b4FA8 programId = 0
[14453.254432]: virtual IOReturn H11ANEIn::newUserClient(task_t, void *, UInt32, IOUserClient **) : H11ANEIn::newUserClient type=1
[14453.254434]: virtual IOReturn H11ANEIn::newUserClient(task_t, void *, UInt32, IOUserClient **) : H11ANEIn::newUserClient: Creating Direct evaluate Client
[14453.254438]: virtual bool H11ANEInDirectPathClient::init(task_t, OSDictionary *) - New UserClient for process: python3.8 (pid 63314)
[14453.286145]: IOReturn H11ANEIn::FreeIntermediateBuffer(H11ANEIntermediateBufferSurfaceParams *, bool): Passing NULL for Intemediate buffer. Returning from here
[14453.286163]: IOReturn H11ANEIn::ANE_ProcessDestroy_gated(H11ANEProcessDestroyArgs *, bool, bo…

Focusing on the ANE parts above, you can see the flow: H11ANEInUserClient::ANE_PowerOn() -> H11ANEIn::ANE_Init() -> ANE_ProcessCreate_gated -> H11ANEIn::FreeIntermediateBuffer -> ANE_ProcessDestroy_gated.

If the call fails (which can happen when the process is not authorized), the log looks like this:

[14822.089254]: AMFI: core dump for pid 73626 (a.out)
Sandbox: 5 duplicate reports for ContextStoreAgen deny(1) mach-lookup com.apple.ocspd
Sandbox: bird(516) deny(1) file-read-data /Users/guoyanzong
Failed to write key 1950826800 to SMC with error code 86
Failed to write key 1950826829 to SMC with error code 86
Failed to write key 1950826801 to SMC with error code 86
Failed to write key 1950829892 to SMC with error code 86
virtual IOReturn H11ANEIn::newUserClient(task_t, void *, UInt32, IOUserClient **) : H11ANEIn::newUserClient type=2
[14822.989968]: virtual IOReturn H11ANEIn::newUserClient(task_t, void *, UInt32, IOUserClient **) : H11ANEIn::newUserClient: Creating Default full-entitlement Client
[14822.989977]: virtual bool H11ANEInUserClient::init(task_t, OSDictionary *) - process a.out (pid 73673) denied access

Extract the dynamic link library

If you want to drive the M1’s ANE directly (instead of through coremltools), you can refer to the ANE section of tinygrad (not very mature yet). Decompiling dyld_shared_cache_arm64e reveals the ANEServices dynamic library that actually gets called:

strings dyld_shared_cache_arm64e | grep ANEServices

/System/Library/PrivateFrameworks/ANEServices.framework/Versions/A/ANEServices
/System/Library/PrivateFrameworks/ANEServices.framework/Versions/A/ANEServices
H11ANEServicesThread
/System/Library/PrivateFrameworks/ANEServices.framework/Versions/A/ANEServices
/System/Library/PrivateFrameworks/ANEServices.framework/ANEServices
__ZN6H11ANEL25H11ANEServicesThreadStartEPNS_26H11ANEServicesThreadParamsE
/System/Library/PrivateFrameworks/ANEServices.framework/Versions/A/ANEServices
/System/Library/PrivateFrameworks/ANEServices.framework/ANEServices
ANEServices
Versions/A/ANEServices
/System/iOSSupport/System/Library/PrivateFrameworks/ANEServices.framework/Versions/A/ANEServices

The repository for extracting dynamic link libraries is as follows:

github.com/madordie/ds…

Just follow the steps in that repo’s README to extract them; we usually need ANECompiler, ANEServices, AppleNeuralEngine, CoreML, and Espresso.

The call stack is: libcoremlpython.so -> CoreML -> Espresso -> AppleNeuralEngine -> ANEServices.

I won’t go into the details of calling the ANE externally here; it’s fairly involved and deserves a separate article.

For more background on the ANE, this deck is also worth a look: www.slideshare.net/kstan2/why-…

brew

Homebrew runs fine under Rosetta 2 translation. Install it directly with the following command:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

After installing, prefix brew with arch -x86_64 when running it, for example:

arch -x86_64 brew install opencv

Note that libraries installed this way are x86 by default! If you link against these x86 libraries when compiling ARM code, you’ll get an architecture mismatch.

Update 2021-01-31:

Native brew has been available for a while now; most libraries have already been compiled for arm64, and it can coexist with the Intel version. See this article:

noahpeeters.de/posts/apple…

VSCODE

Native M1 support in VSCode is still in the preview build (the one with the yellow icon!). Most plug-ins work through translation.

Plug-ins are currently x86 binaries and need translation to run:

Allow extension’s x64 binaries to run on Apple Silicon x64 emulator

Games

As for games, I only play the cloud version of LoL through Tencent’s START client; the Mac version is free during its public beta.

It’s surprisingly smooth, almost indistinguishable from playing locally, probably thanks to Wi-Fi 6; my 100 Mbps connection finally comes in handy, and I play over the wireless network. I recorded a video as well: no pressure at all.

I’ve also seen people play LoL in a Windows virtual machine; as that kind of software matures, running Windows on the Mac should gradually become practical.

Check whether a library is ARM architecture

Use the following command:

lipo -info xxx

This checks whether the executable or dynamic library you are using is built for the ARM architecture, to make sure the software has the right architecture. For example, after compiling the PyTorch source directly on the Mac, check whether _C.cpython-38-darwin.so is ARM:

@bogon Torch % lipo -info _C.cpython-38-darwin.so
// x86 architecture: does not work properly on M1
Non-fat file: _C.cpython-38-darwin.so is architecture: x86_64
// ARM architecture
Architectures in the fat file: _C.cpython-38-darwin.so are: x86_64 arm64

So if an executable or dynamic library doesn’t run properly on the M1 chip, first use this command to check whether it’s ARM architecture!
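
The same kind of check works from inside Python for the interpreter itself (a small sketch of mine, not from the original post):

import platform

# 'arm64' means a native Apple Silicon build; 'x86_64' means the
# interpreter is running under Rosetta 2 translation.
print(platform.machine())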

A few minor problems

There are a few minor bugs (which may be fixed later, but exist as of now):

  • When the Mac mini is connected to a monitor over HDMI, the picture occasionally freezes. It isn’t the system that hangs, just the display output; replugging the monitor cable or switching display sources brings it back
  • When using Paste, there is an occasional brief stutter

The current system is Big Sur Version 11.1.

Afterword

That’s all for now. On the whole, the M1 Mac mini is no different from the x86 MBP in daily use. You do need to watch out for architecture issues when compiling and linking some source code, which can be a bit of a hassle, but isn’t that part of the fun of being a programmer?

Get in touch

If you’re like-minded, Lao Pan is happy to chat with you; if you like Lao Pan’s content, please follow and support. The blog publishes an in-depth original article every week; follow the Oldpan blog so you don’t miss the latest posts. Lao Pan also organizes some of his private collections in the hope they help: reply “888” on the official account to get Lao Pan’s learning roadmap and article index, with more waiting for you to dig into. If you don’t want to miss Lao Pan’s latest updates, check out the mystery link.