Computational vision - content-based image scaling

Problems with image scaling

I believe you have seen such a problem, the image needs to adapt to the size of the problem, a very simple and rough way is stretching, but the effect is not satisfactory, take the blogger’s own monitor as an example, see the following figure.

In fact, image scaling is so ubiquitous that it can cause problems when the same image is displayed on different display devices (such as mobile phones with different display sizes). To take a more straightforward example, the size saved during filming of a movie often doesn’t match the size of a movie theater screen, which requires scaling.

The most obvious solution (the violent solution) is to stretch the image in the image above, and the result is rotten, but the purpose has been achieved. However, if you are watching a movie and encounter a film that has been violently stretched, I personally think it will not have a better viewing experience, or at least affect the normal perception. Therefore, various processing methods of image scaling have emerged.

The Letterbox and Pillarbox

Letterbox and Pillarbox are Letterbox and Pillarbox.

~~There’s a beefcake’s favorite Pikachu and a Bilibili watermark in the upper right corner.~~

Similarly, Pillarbox is a method of using low aspect ratio images on high aspect ratio display devices, with black boxes on both sides of the video, which won’t be explained much.

Pan&Scan

Pan&Scan is a method still used for transcribing film and television works, such as displaying 16 on 4:3 devices (such as a short film screen) : When the image of 9 (such as a very long movie picture, the length refers to the horizontal length of the picture) is made to be the same height as the device (the length is the same in the vertical direction), the redundant image in the horizontal direction will be truncated, and the range of interception can be adjusted according to the center point offset.

As shown in the figure below, the dark area is the capture frame with the same proportion with the display size, while the combination of the light part and the dark part is the original picture to be cut, just like a frame is sliding scanning, so it is called Pan & Scan. Appropriate scaling is achieved by searching the intercept region with the least loss of information.

The paper on pan&scan: graphics.cs.wisc.edu/Papers/2006…

Pixels to remove

Compared with the previous three methods, pixel removal finally has the meaning of image processing, and the idea is relatively easy to understand. If the important content of the image is on both sides, cropping must not be able to preserve two important content at the same time, then in case some strange party a needs to have this effect in extreme circumstances, wouldn’t photoshop be the only option? Of course, it seems impossible, but it’s actually possible, like prime elimination.

Since pixels are spatially correlated with each other, if image compression is to be carried out, the compression effect of discarding some rows (or columns) of pixels that are not important (i.e. areas with less gradient information) is actually quite considerable, as shown in the figure.

The image at the top is the original image, the heat image is a gradient image, and the image at the bottom right is the result of pixel culling (select the result with the least gradient information accumulated in each column and cycle the culling result). In fact, the effect is ok (unintentionally).

Seam Carving

Round 4 circles (mark) sub (topic) finally arrived at the key method of this article! Seam Carving is a magical image scaling algorithm.

Links up to jilt paper first paper on the seam carving: graphics.cs.cmu.edu/courses/15-…

As mentioned in Pan&Scan, we can distinguish the main important content by the gradient information of the image.

Seam Carving Algorithm

Assumption: You only need to crop in one direction.
Basic idea: Remove “unimportant pixels” from the image
Based on the gradient as a function of energy
- I keep the outline
- Human vision is more sensitive to edge information — thus removing content from smooth areas
- The algorithm is simple and the idea is easy to understand, but it can produce good results
A path of pixels from top to bottom (or left to right). Each row contains only one pixel and the horizontal movement of the path of pixels up and down the two rows is less than or equal to 1

The mathematical description is as follows

Seam Carving aims to find the Seam with the best path

Based on the dynamic programming algorithm, we can use recursive relations to solve, as shown in the following equation.

The accumulation of energy loss is easier to understand as shown below

The algorithm pseudocode is shown in the figure below.

Put on some amazing results

Actual code

# Python
# seam_carving.py
import numpy as np
from skimage import color


def energy_function(image):
    """Computes energy of the input image. For each pixel, calculate its x- and y- gradients and add their absolute values. Remember to turn color images into grayscale images first. Hint: Args: image: (H, W, 3) Returns: out: Energy (H, W) in each pixel""
    H, W, _ = image.shape
    out = np.zeros((H, W))
    gray_image = color.rgb2gray(image)

    ### YOUR CODE HERE
    out1,out2= np.gradient(gray_image)
    out1 = np.abs(out1)
    out2 = np.abs(out2)
    out = out1 + out2
    ### END YOUR CODE

    return out


def compute_cost(image, energy, axis=1):
    """In this step, from the top to the bottom of the picture, the path to find the minimum cost starts from the first line and calculates the cumulative energy sum of each pixel, namely cost. Pixel cost is defined as the minimum cumulative energy sum of pixels on the same Seam, starting from the top. In the meantime, we need to go back to this path. There are only three possibilities for the value of each pixel on the path: -1,0, and 1, where -1 means the current pixel is connected to the element in its upper left corner,0 means the current pixel is connected to the element directly above it, and 1 means the current pixel is connected to the element to the upper right of it. For example, for a 3 by 3 matrix, if the value of point (2,2) is -1, it means that point (2,2) is connected to point (1,1). When the energy is the same, we specify that we take the leftmost path. Note that the np.argmin function returns the position of the minimum value in the array (the index value). Tip: since this function is used a lot, it can slow down your program if you loop too much. Normally, you loop through columns only once, not through elements in each row. If you take the cost of (I,j), then (I,j) can only be attached to (I -1,j-1), (I -1,j), or (I -1,j+1), and be the smallest of them. To avoid looping through the elements of each row, we can vectorize. For example: Assuming our energy = [1, 2, 3; 4, 5, 6], now we need to determine which elements of the second row [4, 5, 6] are connected to the first row, then we only need to construct a new matrix M = [∞, 1, 2, 2; 3;2, 3, ∞]. The first column of the matrix M represents the three possible corresponding elements of element 4, namely: [infinity, 1,2]; The second column represents the three possible corresponding elements of element 5, namely [1, 2, 3]; The third column represents the three possible counterparts of element 6, namely [2, 3, infinity]. And in this way, we can just minimize M once vertically, and we can figure out all of the elements in the second row. It avoids looping through elements in each row. Meanwhile, we can use the np.argmin function to compute the path argument once: image: this function is not used (left here to have the same interface as the compute_forward_cost function) Axis: Determine along which axis (axis=1 horizontal, axis=0 vertical) cost: array paths of shape (H, W) Array of shape (H, W) with elements -1, 0, or 1"""
    energy = energy.copy()

    if axis == 0:
        energy = np.transpose(energy, (1, 0))

    H, W = energy.shape

    cost = np.zeros((H, W))
    paths = np.zeros((H, W), dtype=np.int)

    # initialization
    cost[0] = energy[0]The cosine of t in the first row is itself
    paths[0] = 0  # For the first line, we don't care

    ### YOUR CODE HERE
    
    for i in range(1,H):
        M=np.zeros((3,W))
        M[0][0]=float('inf'[0, 1) M:] = cost [] I - 1, : W - 1 M = cost [1, :] [I - 1, :] M [2: W - 1] = cost [I - 1, 1:] M [2] = [W - 1]float('inf')
        cost[i]=np.min(M,0)+energy[i]
        paths[i]=np.argmin(M,0)-1
        
        
    ### END YOUR CODE

    if axis == 0:
        cost = np.transpose(cost, (1, 0))
        paths = np.transpose(paths, (1, 0))

    The path contains only -1, 0, or 1
    assert np.all(np.any([paths == 1, paths == 0, paths == -1], axis=0)), \
           "paths contains other values than -1, 0 or 1"

    return cost, paths


def backtrack_seam(paths, end):
    ""To do this, we need to start from the bottom row of the image and find the entire Seam in the direction given in the Paths diagram: - top left (-1) - right (0) - top right (1) Returns: Seam: An array of shape (H, W) where the i-th element holds the index value of the i-th row. That is, the position of each element in Seam is (I, seam[I]). """
    H, W = paths.shape
    Initialize with -1 to ensure that each element is correctly assigned (if not, the value is still -1, and -1 is an invalid index)
    seam = - np.ones(H, dtype=np.int)

    # Last element
    seam[H-1] = end
    G=np.ones(H, dtype=np.int)
    a=np.ones(H, dtype=np.int)
    ### YOUR CODE HERE
    for i inRange (H - 1, 0, 1) :if paths[i][seam[i]]==-1:
            seam[i-1]=seam[i]-1
        if paths[i][seam[i]]==0:
            seam[i-1]=seam[i]
        if paths[i][seam[i]]==1:
            seam[i-1]=seam[i]+1          
    ### END YOUR CODE

    Seam contains only [0, W-1]
    assert np.all(np.all([seam >= 0, seam < W], axis=0)), "seam contains values out of bounds"

    return seam


def remove_seam(image, seam):
    """Remove a seam from the image by filling the output image out with the original image image. This function can be used in reduce function and reduce_forward function. Seam: An array of shape (H,). The i-th element of the array holds the i-th row index. That is, the position of each element in Seam is (I, seam[I]). Return value: out: array of shape (H, w-1, C) or (H, W-1). Make sure 'out' and 'image' are of the same type.""

    # If it is a 2-dimensional image (grayscale image), add dimensions, that is, the image becomes (H, W, 1)
    if len(image.shape) == 2:
        image = np.expand_dims(image, axis=2)

    out = None
    H, W, C = image.shape
    out = np.zeros((H, W - 1, C), dtype=image.dtype)        Delete one pixel per line
    ### YOUR CODE HERE
    for i in range(H):
        for j in range(W-1):
            if j<seam[i]:
                out[i][j]=image[i][j]
            if j>=seam[i]:
                out[i][j]=image[i][j+1]
    ### END YOUR CODE
    out = np.squeeze(out)  # remove the dimension of shape 1. That is, if the previous dimension is increased, the added dimension is removed

    Make sure 'out' and 'image' are of the same type
    assert out.dtype == image.dtype, \
       "Type changed between image (%s) and out (%s) in remove_seam" % (image.dtype, out.dtype)

    return out


def reduce(image, size, axis=1, efunc=energy_function, cfunc=compute_cost):
    ""Seam carving algorithm was used to reduce the image size. Each time we loop we remove the seam with the least energy. Loop through this operation until we get the desired image size. -efunc -cfunc -backtrack_seam -remove_seam Args: image: array of shape (H, W, 3) size: height or width of target (determined by axis) Axis: Reduce width (axis=1) or height (axis=0) efunc: the function used to calculate energy cfunc: Functions used to calculate cost (including backtrack and forward versions) directly use cFUNc (image, energy) to call cFUNc to calculate cost. Compute_cost Returns: out: Output size is (size, W, 3) if axis=0, output size is (H, size, 3) if axis=1.""

    out = np.copy(image)
    if axis == 0:
        out = np.transpose(out, (1, 0, 2))

    H = out.shape[0]
    W = out.shape[1]

    assert W > size, "Size must be smaller than %d" % W

    assert size > 0, "Size must be greater than zero"

    ### YOUR CODE HERE
    while(out.shape[1]>size):
        energy = efunc(out)
        vcost,vpaths=cfunc(out, energy)
        end = np.argmin(vcost[-1])
        seam = backtrack_seam(vpaths, end)
        out=remove_seam(out, seam)
    ### END YOUR CODE

    assert out.shape[1] == size, "Output doesn't have the right shape"

    if axis == 0:
        out = np.transpose(out, (1, 0, 2))

    return out


def duplicate_seam(image, seam):
    """Copy the pixels on Seam so that they appear twice. This function would be called by enlarge_naive and enlarge. Seam: An array of shape (H,). The i-th element of the array holds the i-th row index. That is, the position of each element in Seam is (I, seam[I]). Returns: out: array of shape (H, W + 1, C)"""
    if len(image.shape) == 2:
        image = np.expand_dims(image, axis=2)
    H, W, C = image.shape
    out = np.zeros((H, W + 1, C))
    ### YOUR CODE HERE
    for i in range(H):
        for j in range(W+1):
            if j<=seam[i]:
                out[i][j]=image[i][j]
            if j>seam[i]:
                out[i][j]=image[i][j-1]
    ### END YOUR CODE

    return out


def enlarge_naive(image, size, axis=1, efunc=energy_function, cfunc=compute_cost):
    """Copy Seam to increase the size of the image. For each loop, we copy seam, the one with the lowest energy in the image. Repeat the process until the image size meets the requirements. -efunc -cfunc -backtrack_duplicate_seam Args: image: array of shape (H, W, C) size: target height or width (determined by axis) Axis: Increase width (axis=1) or height (axis=0) efunc: the function used to calculate energy cfunc: Functions used to calculate cost (including backtrack and forward versions) directly use cFUNc (image, energy) to call cFUNc to calculate cost. Compute_cost Returns: out: Output size is (size, W, C) if axis=0, output size is (H, size, C) if axis=1.""

    out = np.copy(image)
    if axis == 0:
        out = np.transpose(out, (1, 0, 2))

    H = out.shape[0]
    W = out.shape[1]

    assert size > W, "size must be greather than %d" % W

    ### YOUR CODE HERE
    while(out.shape[1]<size):
        energy = efunc(out)
        vcost,vpaths=cfunc(out, energy)
        end = np.argmin(vcost[-1])
        seam = backtrack_seam(vpaths, end)
        out=duplicate_seam(out, seam)
    ### END YOUR CODE

    if axis == 0:
        out = np.transpose(out, (1, 0, 2))

    return out


def find_seams(image, k, axis=1, efunc=energy_function, cfunc=compute_cost):
    """Find the k seams with the least energy in the image. We can record the first K seams as removed and then use the function enlarge to copy them. The problem is that each time you remove a Seam from the image, the relative position of the pixels changes, so you can't remove it directly. To solve this problem, we define two matrices, Lays and Indices. They fit perfectly into the original image to record where seam is placed in the original image each time it is removed. The Indices matrix, along with the image image, becomes smaller as Seam is removed and is used to record the location of Seam in the image each time it is removed. We also use it to track Seam's position in the original image. -efunc -cfunc -backtrack_seam -remove_seam parameter: image: array of shape (H, W, C) k: number of seams to find axis: Find efunc on width (axis=1) or height (axis=0) : Functions used to calculate cost (including backtrack and forward versions) directly use cFUNc (image, energy) to call cFUNc to calculate cost. Lower seams: arrays of size (H, W)"""

    image = np.copy(image)
    if axis == 0:
        image = np.transpose(image, (1, 0, 2))

    H, W, C = image.shape
    assert W > k, "k must be smaller than %d" % W

    Generate indices to remember the indexes of the original pixels
    Indices [row, COL] specifies the COL value of the current pixel.
    Indices [row, COL]
    # By doing this, we record the pixel coordinates in pixel values. Since row values do not change, even while removing Seam
    We can also trace the original pixels from Seam
    For an image of shape (2, 4), the shape of the initial 'indices' matrix is:
    # [[0, 1, 2, 3],
    # [0, 1, 2, 3]
    indices = np.tile(range(W), (H, 1))  # array of size (H, W)

    * We use the Seams array to record the seams that have been deleted
    # In the Seams array, the seams line I will record as I +1 pixels (seam numbers start at 0).
    For example, the first two parts of a picture (3, 4) might look like this:
    # [[0, 1, 0, 2],
    # [1, 0, 2, 0],
    # [1, 0, 0, 2]
    # As you can see, pixels with a value of 1 or 2 can form a seam
    seams = np.zeros((H, W), dtype=np.int)

    # Loop to find k seam
    for i in range(k):
        # Get the current best seam, so you can compare it with the functions you wrote earlier
        energy = efunc(image)
        cost, paths = cfunc(image, energy)
        end = np.argmin(cost[H - 1])
        seam = backtrack_seam(paths, end)

        # Remove the current seam
        image = remove_seam(image, seam)

        Save the seam with I +1 in the image
        # Check to see if all the paths this seam passes through are 0
        assert np.all(seams[np.arange(H), indices[np.arange(H), seam]] == 0), \
            "we are overwriting seams"
        seams[np.arange(H), indices[np.arange(H), seam]] = i + 1

        At the same time, we remove Seam from the indices array to make it the same shape as the image shape
        indices = remove_seam(indices, seam)

    if axis == 0:
        seams = np.transpose(seams, (1, 0))

    return seams


def enlarge(image, size, axis=1, efunc=energy_function, cfunc=compute_cost):
    """By replicating lower-energy seams, we can enlarge the images. First, we use the function find_seams to capture k lower-energy seams. They fit perfectly into the seams. Then we loop k times to copy the seams. We use the -find_shew-duplicate_seam function to copy seams. Find efunc on width (axis=1) or height (axis=0) : Functions used to calculate cost (including backtrack and forward versions) directly use cFUNc (image, energy) to call cFUNc to calculate cost. Default compute_cost Returns: out: If axis=0, output is an array of size (size, W, C). If axis=1, output an array of (H, size, C)"""

    out = np.copy(image)
    Determine whether the transpose is required
    if axis == 0:
        out = np.transpose(out, (1, 0, 2))

    H, W, C = out.shape

    assert size > W, "size must be greather than %d" % W

    assert size <= 2 * W, "size must be smaller than %d" % (2 * W)

    ### YOUR CODE HERE
    K=size-W
    seams=find_seams(out, K)
    for k in range(K):
        seam = - np.ones(H, dtype=np.int)
        for i in range(H):
            for j in range(W+k):
                if k+1==seams[i][j]:
                    seam[i]=j
        out=duplicate_seam(out, seam)
        seams=duplicate_seam(seams, seam)
    ### END YOUR CODE

    if axis == 0:
        out = np.transpose(out, (1, 0, 2))

    return out


def compute_forward_cost(image, energy):
    """Calculate forward cost map (vertical) and corresponding Seams paths. From the first line, calculate the cumulative energy sum of each pixel point, namely cost. Pixel cost is defined as the minimum cumulative energy sum of pixels on the same Seam, starting from the top. Also, make sure you have added the new energy introduced by removing the Pixel to the original cost. As before, we need to go back to this path. There are only three possibilities for the value of each pixel on the path: -1,0, and 1, where -1 means the current pixel is connected to the element in its upper left corner,0 means the current pixel is connected to the element directly above it, and 1 means the current pixel is connected to the element to the upper right of it. Parameter: image: not used in this function (left here to have the same interface as the compute_forward_cost function) Array of shape (H, W) with elements -1, 0, or 1"""

    image = color.rgb2gray(image)
    H, W = image.shape

    cost = np.zeros((H, W))
    paths = np.zeros((H, W), dtype=np.int)

    # initialization
    cost[0] = energy[0]
    for j in range(W):
        if j > 0 and j < W - 1:
            cost[0, j] += np.abs(image[0, j+1] - image[0, j-1])
    paths[0] = 0  We don't need to consider the first row of the Paths matrix

    ### YOUR CODE HERE
    for i inrange(1,H): m1 = np.insert(image[i, 0:W - 1], 0, 0, axis=0) m2 = np.insert(image[i, 1:W], W - 1, 0, axis=0) m3 = image[i - 1] v = abs(m1 - m2) v[0] = 0 v[-1] = 0 l = v + abs(m3 - m1) r = v + abs(m3 - m2) l[0] = 0 r[-1] =  0 i1 = np.insert(cost[i - 1, 0:W - 1], 0, 1e10, axis=0) i2 = cost[i - 1] i3 = np.insert(cost[i - 1, 1:W], W - 1, 1e10, axis=0) C = np.r_[i1 + l, i2 + v, i3 + r].reshape(3, -1) cost[i] = energy[i] + np.min(C, axis=0) paths[i] = np.argmin(C, axis=0) - 1### END YOUR CODE

    Make sure paths contains only -1, 0, or 1
    assert np.all(np.any([paths == 1, paths == 0, paths == -1], axis=0)), \
           "paths contains other values than -1, 0 or 1"

    return cost, paths

Copy the code

Running instance

Initial configuration
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rc
from skimage import color
from skimage import io, util
from time import time
from IPython.display import HTML
import seam_carving

%matplotlib inline
plt.rcParams['figure.figsize'] = (30.0, 24.0) Set the default size
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Automatic reload of external modules
%load_ext autoreload
%autoreload 2

# load image
img = io.imread('imgs/wave.jpg')
img = util.img_as_float(img)

plt.title('Original Image')
plt.imshow(img)
plt.show()
Copy the code

# Calculate the image energy
energy = energy_function(img)
plt.title('Energy')
plt.axis('off')
plt.imshow(energy)
plt.show()
Copy the code

# Vertical energy loss
start = time()
vcost, _ = compute_cost(_, energy, axis=1)  The first argument is not required
end = time()

print("Computing vertical cost map: %f seconds." % (end - start))

plt.title('Vertical Cost Map')
plt.axis('off')
plt.imshow(vcost, cmap='inferno')
plt.show()
Copy the code

# Horizontal energy loss
start = time()
hcost, _ = compute_cost(_, energy, axis=0)
end = time()

print("Computing horizontal cost map: %f seconds." % (end - start))

plt.title('Horizontal Cost Map')
plt.axis('off')
plt.imshow(hcost, cmap='inferno')
plt.show()
Copy the code

Using the cost map we found above, we can determine seam which has the least energy in the image. We can then remove the Seam and repeat this step until we get the image size we want.

Reduce the width of the image
H, W, _ = img.shape
W_new = 400

start = time()
out = reduce(img, W_new)
end = time()

print("Reducing width from %d to %d: %f seconds." % (W, W_new, end - start))

plt.subplot(2, 1, 1)
plt.title('Original')
plt.imshow(img)

plt.subplot(2, 1, 2)
plt.title('Resized')
plt.imshow(out)

plt.show()
Copy the code

Seam carving results as follows

Seam carving can also be executed horizontally

# Reduce the height of the image
H, W, _ = img.shape
H_new = 300

start = time()
out = reduce(img, H_new, axis=0)
end = time()

print("Reducing height from %d to %d: %f seconds." % (H, H_new, end - start))

plt.subplot(1, 2, 1)
plt.title('Original')
plt.imshow(img)

plt.subplot(1, 2, 2)
plt.title('Resized')
plt.imshow(out)

plt.show()
Copy the code

Image expanding

We can also solve the problem the other way around, which is to scale up the image. One of the simplest ways to do this is to add the least energy Seam over and over again (the seam that has been used cannot be used again) until the size fits our requirements.

W_new = 750

start = time()
out = enlarge(img, W_new)
end = time()

# It takes about 20 seconds
print("Enlarging width from %d to %d: %f seconds." \
      % (W, W_new, end - start))

plt.subplot(2, 1, 1)
plt.title('Original')
plt.imshow(img)

plt.subplot(2, 1, 2)
plt.title('Resized')
plt.imshow(out)

plt.show()
Copy the code

The original image

Enlarged image

I always believe that many models in deep learning are black box models, in which you can’t see the changes of data. Why the accuracy of this network model is higher than that of another network model is a question that has bothered me for a long time…. I guess I’m just not good enough.

Seam carving algorithm demonstrates very beautiful research methodology and results through simple mathematical models and ideas, just as in the Theory of Everything, Redmayne played Hawking said: Wouldn’t that be nice,Professor? With one simple elegant equation to explain everything.

In fact, Seam carving still has a lot of room for optimization. For example, the image edge created by Forward energy is smoother. Due to excessive writing, readers may not have enough energy to read the article again

If you like this blog post, please give it a thumbs up! If you like my style and want to know more about it, please follow me. I will update you occasionally with interesting graphics and visual knowledge and add my own unique understanding

Your support is the biggest affirmation for me.

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Computational vision — content-based image scaling

Problems with image scaling

The Letterbox and Pillarbox

Pan&Scan

Pixels to remove

Seam Carving

Seam Carving Algorithm

Actual code

Running instance

Image expanding

Computational vision — content-based image scaling

Problems with image scaling

The Letterbox and Pillarbox

Pan&Scan

Pixels to remove

Seam Carving

Seam Carving Algorithm

Actual code

Running instance

Image expanding

Related Posts

Whose kitten are you?

From zero to one, implement your first multilayer neural network

Machine learning parameter server PS-Lite (2) —– communication module Van