Alibaba Cloud open-sources image-syncer, a tool for migrating and synchronizing container images

Why build this tool?

Compared with a self-built, self-maintained Kubernetes cluster, Alibaba Cloud Container Service for Kubernetes (ACK) is far ahead in usage cost, operations and maintenance cost, convenience, and long-term stability, so many companies want to migrate their self-maintained Kubernetes workloads to ACK. The migration often hits a small pitfall: how to smoothly move the existing container images to Alibaba Cloud Container Registry (ACR). The problem looks simple. With only three or five images, a docker pull followed by a docker push is enough. In real production, however, there are hundreds or thousands of images and several terabytes of registry data, so the migration becomes very time-consuming and can even lose data.

We, the engineers of the Alibaba Cloud cloud-native application platform, found that this is a common requirement: users migrate between all kinds of container image registries, and some also expect ongoing synchronous replication. We therefore developed image-syncer to support cloud migration and open-sourced it to solve the general problem of batch container image migration and synchronization.

In production, the tool has already helped many customers migrate their images. The largest registry migrated held more than 3 TB of images in total. During synchronization the tool can saturate the machine's bandwidth, and it places no requirement on the disk capacity of the machine that runs the task.

Introduction to image-syncer

As mentioned above, image migration and synchronization between registries is a basic requirement when migrating a K8s cluster. The traditional approach, scripting docker pull and docker push, has the following limitations:

It depends on disk storage: local images must be cleaned up promptly, and writing to disk costs extra time, so large numbers of images cannot be migrated this way in production.
It depends on the Docker program: the Docker daemon strictly limits the number of concurrent pulls and pushes, so highly concurrent synchronization is not possible.
Some operations are only available through the HTTP API and cannot be done with the Docker CLI alone, which makes the scripts complicated.

image-syncer is a simple, easy-to-use tool for batch image migration and synchronization. It supports almost all mainstream image registry services based on Docker Registry V2, such as ACR, Docker Hub, Quay, and self-hosted Harbor. It has been initially verified on TB-scale production image migrations and is open-sourced at github.com/AliyunConta… . You are welcome to download it, use it, and send us suggestions.
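
For contrast, the script-based approach described above amounts to roughly the following for each image. This is a minimal sketch that only builds the command lines instead of running them; the function name and the example tag are illustrative assumptions, not part of image-syncer.

```python
def docker_copy_commands(source: str, target: str) -> list[list[str]]:
    """Return the docker CLI calls a pull/tag/push script would run for one image."""
    return [
        ["docker", "pull", source],         # downloads every layer to local disk
        ["docker", "tag", source, target],  # re-labels the local image for the target registry
        ["docker", "push", target],         # uploads the layers again from local disk
        ["docker", "rmi", source, target],  # frees disk space before the next image
    ]


if __name__ == "__main__":
    # The tag v0.4.1 is a hypothetical example value.
    for argv in docker_copy_commands(
        "quay.io/coreos/kube-rbac-proxy:v0.4.1",
        "quay.io/ruohe/kube-rbac-proxy:v0.4.1",
    ):
        print(" ".join(argv))
```

Every image passes through the local disk and the Docker daemon once per copy, which is where the disk and concurrency limits listed above come from.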

Tool features

Image-syncer has the following features:

Supports Docker image registry services built on Docker Registry V2 (such as Docker Hub, Quay, Alibaba Cloud Container Registry (ACR), and Harbor)
Incremental synchronization: blob information of synchronized images is recorded on disk, so images that have already been synchronized are not synchronized again
Concurrent synchronization: the number of concurrent synchronization tasks can be adjusted through configuration
Automatic retry of failed synchronization tasks, which resolves most occasional failures caused by network jitter during image synchronization
No dependency on Docker or other programs

With image-syncer, you only need to make sure that the environment running it has network access to the registries to be synchronized; you can then quickly migrate, copy, and incrementally copy image repositories. It also has almost no hardware requirements. image-syncer strictly keeps the number of network connections equal to the number of concurrent images, so memory can only fill up when a single image layer is very large and the concurrency is too high: memory usage <= number of concurrent images x maximum image layer size. Besides using a retry mechanism to avoid occasional problems during synchronization, image-syncer counts the images that failed to synchronize at the end of the run and prints detailed logs to help users locate the problems.
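
The memory bound stated above can be turned into a quick sizing check. This is a minimal sketch; the function name and the example numbers are assumptions chosen for illustration.

```python
GIB = 1024 ** 3


def max_memory_bytes(concurrency: int, largest_layer_bytes: int) -> int:
    """Upper bound from the text: concurrent images x maximum image layer size."""
    if concurrency < 1 or largest_layer_bytes < 0:
        raise ValueError("concurrency must be >= 1 and layer size must be >= 0")
    return concurrency * largest_layer_bytes


if __name__ == "__main__":
    # Hypothetical workload: 10 concurrent images, largest layer 2 GiB.
    bound = max_memory_bytes(10, 2 * GIB)
    print(f"worst-case memory: {bound / GIB:.1f} GiB")
```

With these example numbers the worst case is 20 GiB, so on a smaller machine it would make sense to lower the concurrency.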

Usage guide

To run image-syncer, the user only needs to provide a configuration file, which is as follows:

{
    "auth": {
        // Authentication field. Each object holds the account and password of one registry.
        // In general, the source account needs permission to pull images and list tags,
        // and the target account needs permission to push images and create repositories.
        // If no account is given, anonymous access is used by default.
        "quay.io": {                  // URL of the registry; must match the registry URL used in "images" below
            "username": "xxx",        // User name, optional
            "password": "xxxxxxxxx",  // Password, optional
            "insecure": true          // Set to true if the registry is an HTTP service. Defaults to false, optional.
                                      // This option requires image-syncer version > v1.0.1
        },
        "registry.cn-beijing.aliyuncs.com": {
            "username": "xxx",
            "password": "xxxxxxxxx"
        },
        "registry.hub.docker.com": {
            "username": "xxx",
            "password": "xxxxxxxxxx"
        }
    },
    "images": {
        // Synchronization rules. Each rule maps a source repository (key) to a target repository (value).
        // The largest unit of synchronization is the repository (repo); a single rule cannot
        // synchronize a whole namespace or a whole registry.
        // The source and target use a format similar to the image URL of docker pull/push
        // (registry/namespace/repository:tag).
        // The source and the target (unless the target is an empty string) must contain at least
        // registry/namespace/repository.
        // The source field cannot be empty. To synchronize one source repository to several
        // target repositories, configure several rules.
        // The target repository name may differ from the source repository name (and so may the tag).
        // In that case the result is similar to docker pull + docker tag + docker push.
        "quay.io/coreos/kube-rbac-proxy": "quay.io/ruohe/kube-rbac-proxy",
        "xxxx": "xxxxx",
        "xxx/xxx/xx:tag1,tag2,tag3": "xxx/xxx/xx"
        // If the source field contains no tag, all tags of the source repository are synchronized
        // to the target repository. In this case the target cannot contain a tag, and the source
        // tags are used by default.
        // The source field may contain several tags (e.g. "a/b/c:1,2,3"), separated by ",".
        // If the target is an empty string, the source image is synchronized to the default
        // namespace of the default registry, with the same repo and tag as the source. The default
        // registry and default namespace can be set with command-line parameters or environment
        // variables, as described below.
    }
}

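The rule format described in the comments above can be illustrated with a small parser. This is a sketch written for this article, not image-syncer's own code; the names ImageRef and parse_image_ref are hypothetical, and treating everything after the namespace as the repository name is an assumption.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    namespace: str
    repository: str
    tags: list[str]  # an empty list means "all tags of the repository"


def parse_image_ref(url: str) -> ImageRef:
    """Split registry/namespace/repository[:tag1,tag2] into its parts."""
    # Only a colon after the last "/" starts the tag list;
    # an earlier colon belongs to the registry port (e.g. host:32080).
    last_slash = url.rfind("/")
    colon = url.find(":", last_slash + 1)
    if colon == -1:
        path, tags = url, []
    else:
        path = url[:colon]
        tags = [t.strip() for t in url[colon + 1:].split(",") if t.strip()]
    parts = path.split("/")
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected registry/namespace/repository, got {url!r}")
    return ImageRef(parts[0], parts[1], "/".join(parts[2:]), tags)


if __name__ == "__main__":
    print(parse_image_ref("quay.io/coreos/kube-rbac-proxy"))
    print(parse_image_ref("harbor.myk8s.paas.com:32080/library/nginx:1,2,3"))
```

The port check matters for registries such as harbor.myk8s.paas.com:32080, where a naive split on the first colon would cut the registry address in half.
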
Users can combine image synchronization rules to meet different migration and synchronization needs, for example synchronizing one repository to several different repositories, synchronizing several source repositories into a single repository (distinguished by tags), or copying a repository under a different name within the same registry. When configuring registry addresses, note that anonymous access to a source registry may run into permission problems such as being unable to pull images or list tags. In that case, add an account and password with the required permissions to "auth". Likewise, anonymous access to a target registry may fail because images cannot be pushed, and an account and password with the required permissions is needed there too. image-syncer also supports insecure registries (similar to Docker's --insecure-registry option; add "insecure": true to the corresponding entry of "auth"), so it can migrate between HTTP and HTTPS registry services. image-syncer also provides a few simple parameters to control how the program runs, including concurrency and the number of retries:

-h  --help       Show usage information, including the current default values of some startup parameters
    --config     Path to the configuration file provided by the user. The file must be created before use. Defaults to image-syncer.json in the current working directory
    --log        Path of the log file. Logs are printed to stderr by default. If logs are written to a file there is no command-line output, and you need to inspect the log file (for example with cat)
    --namespace  Default target namespace. Takes effect when the target repository of an images rule in the configuration file is empty and the default registry is not empty. Can also be set with the environment variable DEFAULT_NAMESPACE; a value passed on the command line takes precedence
    --registry   Default target registry. Takes effect when the target repository of an images rule in the configuration file is empty and the default namespace is not empty. Can also be set with the environment variable DEFAULT_REGISTRY; a value passed on the command line takes precedence
    --proc       Concurrency, that is, the number of goroutines used for image synchronization. Default is 5
    --records    Path of the file used to save and read blob records during transfer. Defaults to the current working directory. A records file stores what has already been migrated to the corresponding target registry and can be reused across consecutive migrations (which saves a lot of time). If an error such as unknown blob occurs, delete the file and try again
    --retries    Number of retries for failed synchronization tasks. Default is 2. Failed task generation is also retried the same number of times. For occasional network errors such as IO timeout and TLS handshake timeout, raising the retry count reduces the number of failed tasks
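
The way --registry and --namespace interact with DEFAULT_REGISTRY, DEFAULT_NAMESPACE, and an empty target can be sketched as follows. This is an illustration of the rules described above under stated assumptions, not the tool's actual implementation; the function names are hypothetical.

```python
import os
from typing import Mapping, Optional


def pick_default(cli_value: Optional[str], env_name: str, env: Mapping[str, str]) -> str:
    """A command-line value wins; otherwise fall back to the environment variable."""
    return cli_value if cli_value else env.get(env_name, "")


def resolve_target(
    source: str,
    target: str,
    cli_registry: Optional[str] = None,
    cli_namespace: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the effective target of one rule from the "images" field."""
    if target:
        return target  # an explicit target is used unchanged
    env = os.environ if env is None else env
    registry = pick_default(cli_registry, "DEFAULT_REGISTRY", env)
    namespace = pick_default(cli_namespace, "DEFAULT_NAMESPACE", env)
    if not registry or not namespace:
        raise ValueError("an empty target needs a default registry and a default namespace")
    parts = source.split("/", 2)  # registry, namespace, repository[:tags]
    if len(parts) != 3:
        raise ValueError(f"expected registry/namespace/repository, got {source!r}")
    # Keep the source repository name and tags; swap in the default registry and namespace.
    return f"{registry}/{namespace}/{parts[2]}"


if __name__ == "__main__":
    print(resolve_target(
        "harbor.myk8s.paas.com:32080/library/nginx", "",
        cli_registry="registry.cn-beijing.aliyuncs.com",
        cli_namespace="image-syncer",
        env={},
    ))
```

Passing env explicitly keeps the sketch easy to test; when it is omitted, the real process environment is consulted.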

After synchronization finishes, image-syncer counts the successful and failed synchronization tasks (each task represents one image). It prints a summary line beginning with "Finished" to standard output and to the log, reporting how many sync tasks failed and how many tasks failed to generate. For more, see the FAQ in faqs.md.

Usage examples

Alibaba Cloud Container Registry (ACR) is the container image hosting service provided by Alibaba Cloud. It supports full life-cycle management of images in 20 regions worldwide and works with Container Service and other cloud products to provide a one-stop experience for cloud-native applications. The basic examples below show how to use image-syncer by synchronizing images from a self-hosted Harbor to ACR.

Synchronizing images from a self-hosted Harbor to ACR

1. Activate the container image service in the Alibaba Cloud console and open the ACR console.

2. Create a namespace. The default repository type determines whether a repository that docker push creates automatically, because it does not yet exist, is public or private. If some of the target repositories to be synchronized do not exist yet, turn on automatic repository creation so that operations like docker push can create them.

3. Create access credentials. The corresponding account is the one used for docker login.

4. The preceding operations are performed with the primary account, which has all permissions by default. To manage permissions, we can also create a RAM sub-account and grant it the corresponding permissions. In this scenario only the permissions for creating, updating, pushing to, and pulling from image repositories are needed. The minimum permissions are as follows, with access control scoped to the image-syncer namespace:

{
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cr:CreateRepository",
                "cr:UpdateRepository",
                "cr:PushRepository",
                "cr:PullRepository"
            ],
            "Resource": [
                "acs:cr:*:*:repository/image-syncer/*"
            ]
        }
    ],
    "Version": "1"
}

5. Similarly, the RAM sub-account needs to log in to the Alibaba Cloud console through the RAM user login entrance and open the ACR console to create access credentials (same as step 3).

6. We can then use the password created in the access credentials to complete the image-syncer configuration (the configuration uses the access credentials of the RAM sub-account). Here we synchronize the library/nginx repository on the Harbor instance (an HTTP service, so insecure must be set; reachable at harbor.myk8s.paas.com:32080) to the image-syncer namespace in the China North 2 (Beijing) region (reachable at registry.cn-beijing.aliyuncs.com), keeping the repository name nginx. The configuration file (named harbor-to-acr.json in the command below) is as follows:

{
    "auth": {
        "harbor.myk8s.paas.com:32080": {
            "username": "admin",
            "password": "xxxxxxxxx",
            "insecure": true
        },
        "registry.cn-beijing.aliyuncs.com": {
            "username": "acr_pusher@1938562138124787",
            "password": "xxxxxxxx"
        }
    },
    "images": {
        "harbor.myk8s.paas.com:32080/library/nginx": ""
    }
}

7. Download the latest image-syncer executable (prebuilt binaries currently support only Linux amd64; for other platforms you can compile it yourself), unpack it, and run the tool.

Execute command:

# Set the default target registry to registry.cn-beijing.aliyuncs.com and the default target namespace to image-syncer
# Use a concurrency of 10 and retry failed tasks up to 10 times
# Write logs to the file ./log, which is created automatically if it does not exist; if not specified, logs are printed to stderr by default
# The configuration file is harbor-to-acr.json, with the content described above
./image-syncer --proc=10 --config=./harbor-to-acr.json --registry=registry.cn-beijing.aliyuncs.com --namespace=image-syncer --retries=10 --log=./log
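
Before launching a run like the one above, it can help to check which registries named in the rules have no entry in "auth" and will therefore be accessed anonymously. The sketch below is a hypothetical helper, not a feature of image-syncer, and the embedded configuration is the example from this section. The default registry used for empty targets is passed on the command line, so this check does not cover it.

```python
import json


def registries_without_auth(config: dict) -> set[str]:
    """Return registries used in "images" that have no entry in "auth"."""
    auth = config.get("auth", {})
    used = set()
    for source, target in config.get("images", {}).items():
        for url in (source, target):
            if url:  # an empty target means "use the default registry and namespace"
                used.add(url.split("/", 1)[0])  # the registry is the first path segment
    return used - set(auth)


CONFIG = json.loads("""
{
    "auth": {
        "harbor.myk8s.paas.com:32080": {"username": "admin", "password": "xxxxxxxxx", "insecure": true},
        "registry.cn-beijing.aliyuncs.com": {"username": "acr_pusher@1938562138124787", "password": "xxxxxxxx"}
    },
    "images": {
        "harbor.myk8s.paas.com:32080/library/nginx": ""
    }
}
""")

if __name__ == "__main__":
    missing = registries_without_auth(CONFIG)
    print("registries accessed anonymously:", sorted(missing))
```

An empty result means every registry named in the rules has credentials configured.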

A synchronization run goes through three phases: generating synchronization tasks, executing them, and retrying failed tasks. Each synchronization task represents one tag (image) to be synchronized. If a rule in the configuration file specifies no tag, image-syncer automatically lists all tags in the source repository and generates the corresponding tasks. If task generation fails, it is tried again in the retry phase.

【 Output when the account password is intentionally mismatched 】

【 Output for normal operation 】

【 Log information printed by image-syncer while running 】
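
The three phases described above (task generation, execution, retry) can be sketched as follows. This is a simplified, sequential illustration; the real tool runs tasks concurrently, and run_sync, list_tags, and sync_image are hypothetical names supplied by the caller, not image-syncer APIs.

```python
from typing import Callable


def _succeeds(fn: Callable, *args) -> bool:
    """Run fn and report whether it finished without raising."""
    try:
        fn(*args)
        return True
    except Exception:
        return False


def run_sync(rules, list_tags, sync_image, retries=2):
    """rules: list of (repository, tags); an empty tag list means "all tags".

    Returns (failed_sync_tasks, failed_generations).
    """
    tasks, failed_rules = [], []
    # Phase 1: generate one task per tag, listing tags when a rule names none.
    for repo, tags in rules:
        try:
            resolved = tags or list_tags(repo)
            tasks.extend((repo, tag) for tag in resolved)
        except Exception:
            failed_rules.append((repo, tags))
    # Phase 2: execute every task once and remember the failures.
    failed_tasks = [t for t in tasks if not _succeeds(sync_image, *t)]
    # Phase 3: retry failed generation and failed tasks up to `retries` times.
    for _ in range(retries):
        if not failed_tasks and not failed_rules:
            break
        still_failed_rules = []
        for repo, tags in failed_rules:
            try:
                resolved = list_tags(repo)
                failed_tasks.extend((repo, tag) for tag in resolved)
            except Exception:
                still_failed_rules.append((repo, tags))
        failed_rules = still_failed_rules
        failed_tasks = [t for t in failed_tasks if not _succeeds(sync_image, *t)]
    return failed_tasks, failed_rules
```

The two returned lists correspond to the two failure counts reported at the end of a run: tasks that failed to synchronize and tasks that failed to generate.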

Synchronizing images from a self-hosted Harbor to ACR Enterprise Edition

ACR Enterprise Edition provides enterprise-grade hosting for container images and Helm charts, dedicated enterprise security features, image distribution to thousands of nodes, and global multi-region synchronization. It provides a cloud-native application delivery chain so that a single application change is delivered automatically to multiple scenarios worldwide. It is recommended for enterprise customers with high security requirements, services deployed in multiple regions, or large clusters.

The operations required to synchronize to ACR Enterprise Edition are basically the same as for the ACR shared edition:

1. Create an ACR Enterprise Edition instance.

2. Create a namespace, set the default repository type, and enable automatic repository creation.

3. Configure public network access control: enable the public access endpoint of the ACR Enterprise Edition instance and add a public network whitelist so that the image service can be reached from outside.

4. Configure access credentials. This part is the same as for the ACR shared edition.

5. Use the password created in the access credentials to complete the image-syncer configuration. Unlike the ACR shared edition, each ACR Enterprise Edition instance has its own domain names (one reachable from the public network and one reachable only from private networks). If the synchronization tool runs in your own environment, use the public domain name. To use the private domain name, run the tool on an Alibaba Cloud ECS instance and make the domain name visible to the private network where the ECS instance is located. Here we use the public domain name ruohe-test-registry.cn-shanghai.cr.aliyuncs.com. Namespaces are isolated between Enterprise Edition instances. As before, we synchronize the library/nginx repository on the Harbor instance (an HTTP service, so insecure must be set; reachable at harbor.myk8s.paas.com:32080) to the image-syncer namespace of the ACR Enterprise Edition instance, keeping the repository name nginx. The configuration file (named harbor-to-acr.json in the command below) is as follows:

{
    "auth": {
        "harbor.myk8s.paas.com:32080": {
            "username": "admin",
            "password": "xxxxxxxxx",
            "insecure": true
        },
        "ruohe-test-registry.cn-shanghai.cr.aliyuncs.com": {
            "username": "ruohehhy",
            "password": "xxxxxxxx"
        }
    },
    "images": {
        "harbor.myk8s.paas.com:32080/library/nginx": ""
    }
}

6. Run the tool with the following command:

# Set the default target registry to ruohe-test-registry.cn-shanghai.cr.aliyuncs.com and the default target namespace to image-syncer
# Use a concurrency of 10 and retry failed tasks up to 10 times
# --log is not specified here, so logs are printed to stderr by default
# The configuration file is harbor-to-acr.json, with the content described above
./image-syncer --proc=10 --config=./harbor-to-acr.json --registry=ruohe-test-registry.cn-shanghai.cr.aliyuncs.com --namespace=image-syncer --retries=10

The output is the same as in the previous example.

More capabilities

Does image-syncer as described above meet all of your container image migration and synchronization needs? If you need more, or want to help build more capabilities, please visit github.com/AliyunConta… to open an issue, and join the discussion in the Kubernetes DingTalk group.

【 Kubernetes DingTalk group QR code 】

Open source is not easy, and neither is maintaining a project over the long term. If you like this project, please give it a star: the managers in our company decide whether to invest more R&D resources in maintaining it based on the number of stars it has.

One More Thing

So once the image registry can be migrated smoothly, will the whole move to the cloud go smoothly? It is not that simple. The registry is only one of the problems in moving to the cloud, and there are other pain points that need to be addressed.

For users who already run their business applications on K8s in private or public clouds, keeping the business unaffected while moving to the cloud is the top priority. The solution architects of the Alibaba Cloud cloud-native application platform have taken this into account and are committed to helping users migrate their applications to ACK efficiently and stably. While helping users carry out their cloud migration plans, we are also distilling what these cases have in common into solutions, best practices, and tools that help users complete the migration quickly.

If you need to migrate to Alibaba Cloud ACK, please contact us. We look forward to your message.