This article is reprinted from Liu Yue's technology blog: v3u.cn/a_id_201

The AnimeGAN project has released the latest version of its popular animation-style filter, to a lot of buzz. Speaking of the two-dimensional (anime) world, Douyin undoubtedly has the largest user base in China: its built-in "Turn into a Cartoon" filter lets users transform their real appearance into an anime "painting style" during live broadcasts. For anime fans, "breaking the dimensional wall and becoming a paper person" is always a fun way to amuse themselves:

However, after seeing too many of these, a certain aesthetic fatigue sets in. The same pointed "awl face" on everyone and the invariably oversized "Cajilan" eyes begin to feel as bland as chewing wax: too much of a good thing, with all sense of reality lost.

The AnimeGAN animation-style filter, based on CartoonGAN, preserves the features of the original image while combining the coolness of the 2D world with the realism of the 3D one, a balance of hard and soft that handles the heavy lifting with apparent ease:

The AnimeGAN team has also published an online demo where you can try the model directly: huggingface.co/spaces/akha… However, limited bandwidth and online resources mean the conversion queue is often backed up, and uploading your own photos may also leak personal privacy.

So instead, we will set up a local AnimeGANv2 service for static-image and video conversion, based on the PyTorch deep learning framework, on macOS Monterey with the M1 chip.

As we know, the CPU build of PyTorch currently supports Python 3.8 on M1-chip Macs. In a previous article on this blog, which walked through building a Python 3 development environment on M1 macOS (Apple Silicon) with the TensorFlow/PyTorch deep learning frameworks, we used conda-forge to set up PyTorch. This time we install from the native package instead: go to the official Python website and download the stable Python 3.8.10 universal2 release: www.python.org/downloads/r…

Double-click the installer, then open a terminal and install PyTorch:

pip3.8 install torch torchvision torchaudio

Here we install the latest stable version, 1.10, by default. Then open the Python 3.8 command line and import the torch library:

(base) ➜ video git:(main) python3.8
Python 3.8.10 (v3.8.10:3d8993a744, May  3 2021, 09:09:08)
[Clang 12.0.5 (clang-1205.0.22.9)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>>

Once Pytorch is ready to use, clone the official project:

git clone https://github.com/bryandlee/animegan2-pytorch.git

AnimeGAN is also based on a generative adversarial network (GAN). The idea is that we have a set of real images, which we can call 3D (photographic) images, and their features follow some distribution: normal, uniform, or something more complex. The goal of the GAN is then to have a generator produce data close to that real distribution. Here the generated data can be understood as a 2D-stylized (anime) rendition that still keeps some of the 3D characteristics of the input, such as larger eyes or a face shape closer to the filter's model. In practice the generator is a neural network, because a network can represent far more complex data distributions.
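
To make the generator/discriminator relationship concrete, here is a deliberately tiny PyTorch sketch. It is only an illustration of the adversarial setup, not AnimeGAN's actual architecture, and the class names are made up for this example:

import torch
import torch.nn as nn

# Toy GAN skeleton for intuition only -- NOT AnimeGAN's real network.
# The generator maps a "real photo" tensor toward the target (anime) style;
# the discriminator scores how convincing the generated style looks.
class ToyGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)

class ToyDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x)

G, D = ToyGenerator(), ToyDiscriminator()
photo = torch.randn(1, 3, 256, 256)   # stand-in for a real photo
fake_anime = G(photo)                 # generator output ("2D-ized" image)
realism_score = D(fake_anime)         # discriminator judges the style
print(fake_anime.shape, realism_score.shape)

During training the two networks are optimized against each other, which is what pushes the generator's output toward the target style while it still starts from the real photo.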

The pretrained weights celeba_distill.pt and paprika.pt are intended for transforming landscape photos, while face_paint_512_v1.pt and face_paint_512_v2.pt focus on transforming portraits.

First install the image processing library, Pillow:

pip3.8 install Pillow

Then create a new test_img.py file:

from PIL import Image  
import torch  
import ssl  
ssl._create_default_https_context = ssl._create_unverified_context  
  
model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="celeba_distill")  
#model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="face_paint_512_v1")  
#model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="face_paint_512_v2")  
#model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="paprika")  
  
  
face2paint = torch.hub.load("bryandlee/animegan2-pytorch:main", "face2paint", size=512)  
  
img = Image.open("Arc.jpg").convert("RGB")  
out = face2paint(model, img)  
  
out.show()

Here we take a photo of the Arc de Triomphe as an example and apply the celeba_distill and paprika filters in turn to compare the results. Note that SSL certificate verification has to be disabled for the local request, and the pretrained model parameters are downloaded the first time the script runs:
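
Incidentally, those pretrained parameters are only downloaded once; torch.hub keeps them in a local cache afterwards. A small sketch to check (or relocate) that cache, using the standard torch.hub helpers:

import torch

# Where torch.hub stores downloaded repos and checkpoints
# (typically ~/.cache/torch/hub unless TORCH_HOME is set)
print(torch.hub.get_dir())

# Optionally redirect the cache, e.g. to keep downloads inside the project
# torch.hub.set_dir("./torch_hub_cache")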

Here the size parameter is the pixel width/height that the face2paint helper resizes the image to, 512 in our case. Next, let's convert a character portrait to the animation style: switch the generator to the portrait weights and feed it a portrait photo:

from PIL import Image  
import torch  
import ssl  
ssl._create_default_https_context = ssl._create_unverified_context  
  
import numpy as np  
  
#model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="celeba_distill")  
#model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="face_paint_512_v1")  
model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="face_paint_512_v2")  
#model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="paprika")  
  
  
face2paint = torch.hub.load("bryandlee/animegan2-pytorch:main", "face2paint", size=512)  
  
img = Image.open("11.png").convert("RGB")  
  
out = face2paint(model, img)  
  
  
out.show()

It can be seen that the v1 filter is more heavily stylized, while v2 retains the characteristics of the original image on top of the stylization: drawn from the 3D original yet not bound by it, ethereal without being superficial. It is leagues ahead of Douyin's cartoon filter.

Let's now look at animation-filter conversion for video. A video is, broadly speaking, just a sequence of images played back at a given frame rate. The frame rate, or frames per second (FPS), is the number of images refreshed per second, i.e. how many times per second the graphics processor redraws the picture. The higher the frame rate, the smoother and more lifelike the motion appears.
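
As a quick back-of-the-envelope check (assuming the 15 FPS extraction rate and the two-second clip we cut below), the expected number of frames is simply fps times duration:

# Rough frame-count arithmetic: frames ≈ fps × duration (seconds).
fps = 15                   # extraction rate we pass to ffmpeg below
clip_seconds = 22 - 20     # the 00:00:20 -> 00:00:22 window used later
print(fps * clip_seconds)  # roughly 30 frames expected in ./myvideo/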

Here we can use third-party software to split a continuous video into individual frames. On M1 macOS, the well-known video-processing tool FFmpeg is recommended.

Install it using the ARM-architecture (Apple Silicon) build of Homebrew:

brew install ffmpeg

After the installation is successful, type the ffmpeg command on the terminal to view the version:

(base) ➜ animegan2-pytorch git:(main) ffmpeg
ffmpeg version 4.4.1 Copyright (c) 2000-2021 the FFmpeg developers
built with Apple clang version 13.0.0 (clang-1300.0.29.3)
configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/4.4.1_3 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-avresample --enable-videotoolbox

The installation went smoothly. Now prepare a video file and create a new video_img.py:

import os

os.system("ffmpeg -i ./video.mp4 -r 15 -s 1280x720 -ss 00:00:20 -to 00:00:22 ./myvideo/%03d.png")

Here we use Python 3's built-in os module to run the ffmpeg command directly against the video in the current directory: -r extracts frames at 15 per second, -s sets the output resolution, -ss and -to mark the start and end of the clip to extract, and the final argument is the directory and filename pattern the frames are written to.
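
If you would rather not hard-code the whole command string, the same extraction can be written with subprocess and an argument list. This is only an equivalent sketch of the os.system call above, with the frame rate, resolution and time window pulled out into variables:

import subprocess

# Parameterized version of the extraction command above (a sketch).
src = "./video.mp4"
fps, size = 15, "1280x720"
start, end = "00:00:20", "00:00:22"

subprocess.run([
    "ffmpeg", "-i", src,
    "-r", str(fps),             # output frame rate
    "-s", size,                 # output resolution
    "-ss", start, "-to", end,   # time window to extract
    "./myvideo/%03d.png",       # frames numbered by index
], check=True)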

After running the script, go to the myvideo directory:

(base) ➜ animegan2-pytorch git:(main) cd myvideo
(base) ➜ myvideo git:(main) ls
004.png 007.png 010.png 013.png 016.png 019.png 022.png 025.png 028.png
002.png 005.png 008.png 011.png 014.png 017.png 020.png 023.png 026.png
(base) ➜ myvideo git:(main)

As you can see, the video has been split into individual images whose file names are numbered by frame.

Next, we need to batch convert the images using AnimeGAN filters:

from PIL import Image  
import torch  
import ssl  
ssl._create_default_https_context = ssl._create_unverified_context  
  
import numpy as np  
  
import os  
  
img_list = os.listdir("./myvideo/")  
  
  
# model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="celeba_distill")  
# model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="face_paint_512_v1")  
model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="face_paint_512_v2")  
#model = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator", pretrained="paprika")  
  
face2paint = torch.hub.load("bryandlee/animegan2-pytorch:main", "face2paint", size=512)  
  
for x in img_list:  
  
    if os.path.splitext(x)[-1] == ".png":  
  
        print(x)  
  
        img = Image.open("./myvideo/"+x).convert("RGB")  
  
        out = face2paint(model, img)  
  
        out.show()  
        out.save("./myimg/"+x)  
  
        # exit(-1)

For each conversion the original frame is kept, and the filtered frame is written to the relative directory myimg. Then create img_video.py to stitch the filtered frames back into a video:

import os

os.system("ffmpeg -y -r 15 -i ./myimg/%03d.png -vcodec libx264 ./myvideo/test.mp4")

It’s still 15 frames per second, the same rate as the original video.
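
If you are unsure what frame rate the source video actually uses, ffprobe (installed alongside FFmpeg) can report it. Here is a small sketch wrapping it in Python; the ./video.mp4 path is just a placeholder for your own file:

import subprocess

# Query the source video's frame rate with ffprobe (ships with FFmpeg).
# r_frame_rate is reported as a fraction such as "15/1" or "30000/1001".
rate = subprocess.run(
    ["ffprobe", "-v", "error", "-select_streams", "v:0",
     "-show_entries", "stream=r_frame_rate",
     "-of", "default=noprint_wrappers=1:nokey=1",
     "./video.mp4"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(rate)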

If the original video has an audio track, you can extract it first:

import os

os.system("ffmpeg -y -i ./Lisa.mp4 -ss 00:00:20 -to 00:00:22 -vn -acodec copy ./myvideo/3.aac")

After the animation filter conversion, merge the converted video with the original audio track:

import os

os.system("ffmpeg -y -i ./myvideo/test.mp4 -i ./myvideo/3.aac -vcodec copy -acodec copy ./myvideo/output.mp4")

Test case for the original video:

Effect after conversion:

The CPU version of PyTorch runs well on the M1 chip, but unfortunately the GPU version will have to wait a while. Last month soumith, a member of the PyTorch core team, said:

So, here’s an update.

We plan to get the M1 GPU supported. @albanD, @ezyang and a few core-devs have been looking into it. I can’t confirm/deny the involvement of any other folks right now.

So, what we have so far is that we had a prototype that was just about okay. We took the wrong approach (more graph-matching-ish), and the user-experience wasn’t great — some operations were really fast, some were really slow, there wasn’t a smooth experience overall. One had to guess-work which of their workflows would be fast.

So, we’re completely re-writing it using a new approach, which I think is a lot closer to your good ole PyTorch, but it is going to take some time. I don’t think we’re going to hit a public alpha in the next ~4 months.

We will open up development of this backend as soon as we can.

In other words, GPU support for PyTorch on the M1 will not ship in the near future, perhaps not until the second half of next year.

Conclusion: Tsinghua University's CartoonGAN and the AnimeGANv2 built on top of it are undoubtedly among the best in the field, a pinnacle of the craft; even placed next to a project like PyTorch-GAN on the world stage of artificial intelligence, they are in no way inferior. In this field, AnimeGANv2 announces to the world that the days when Chinese developers could only imitate are gone forever.

This article is reprinted from Liu Yue's technology blog: v3u.cn/a_id_201