The Shadow is Within.
Blind watermarking and steganography
Blind watermarking
First, a demonstration
First of all, here is a girlfriend:
Decoding of the watermark
Next we type a magical command:
python bwm.py --action decode --origin Demo.jpg --img ../Gakki.jpg --result res.jpg
You can get a picture like this:
From now on, if anyone tries to steal your girlfriend, you can assert your copyright like this, hehe.
(The script and the original image are in the appendix at the end. Interested readers only need to save the image above as Demo.jpg and the original image in the appendix as Gakki.jpg to decode the information above.)
Encoding the watermark
Today's method lets you hide a message inside any image.
Using the script in the appendix, encoding works like this:
python bwm.py --action encode --origin Gakki.jpg --img wm1.png --result Demo.jpg --alpha 2
Second, the purpose
The watermark above is called a blind watermark. A hidden watermark is added to audio, images or video in the form of digital data, but cannot be seen under normal conditions. One of its most important applications is copyright protection: deterring or preventing unauthorized copying of digital media.
1. The same watermark for everyone
Copyright statement
Application case:
- Some painters, photographers, and designers add watermarks to their work.
Around 2013 there was a self-described "hyperrealist" painter who claimed that his purely hand-made paintings were more realistic than a camera, and who ran paid training classes. After a photographer added blind watermarks to his photos, it turned out that the "paintings" were simply photographs photoshopped to give them a hand-painted texture.
- Taobao anti-theft map function
Images uploaded by Taobao sellers are automatically watermarked by Taobao. If another seller saves such an image and uploads it as their own, it will be detected.
2. A different watermark for each recipient
When confidential digital data is sent to different people, a different identifier can be added to each copy. If the data is copied or passed on, the unique identifier decoded from the leaked copy shows who is responsible.
Application case:
- When a film is first released, a different invisible watermark is added to the copy sent to each theater. If the film is leaked, the theater concerned can be held responsible.
- The internal forums and platforms of large Chinese tech companies add unique identifiers to their HTML pages in ways that are hard to notice. When sensitive internal information is leaked through screenshots or similar means, it can be traced back to an individual.
Third, the principle
Schematic diagram
Fourier transform
- Just a quick review of the Fourier transform
Put simply, the Fourier transform converts a signal from a function in the time domain or spatial domain into a representation in the frequency domain, and it has many applications in science and engineering. It is named after the French scholar Joseph Fourier, who first proposed its basic idea systematically.
- Let’s think about the time domain and the frequency domain
So what does the Fourier transform actually do?
- First, let me draw sin(x) on paper. It need not be perfectly accurate, but it looks about right, and it is not hard to do.
- Next, let me draw sin(3x) + sin(5x). This one is hard to draw.
Now suppose I hand you the graph of sin(3x) + sin(5x). You could not tell me the equation of the curve just by looking at it. And if I then asked you to take sin(5x) out of the graph and show what is left, that would be almost impossible.
But what about in the frequency domain? It’s as simple as a few vertical lines.
This is the simplest usage; the other more complex uses will not be covered here.
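The sin(3x) + sin(5x) example can be checked numerically. The following is a minimal sketch using NumPy; the sample count of 1000 and the names `signal`, `peaks` and `remaining` are choices made for this illustration. It shows that the spectrum really is two "vertical lines", and that removing sin(5x) becomes a one-line operation in the frequency domain.

```python
import numpy as np

# Sample exactly one period of 2*pi so that frequency bin k
# corresponds to sin(k*x).
n = 1000
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
signal = np.sin(3 * x) + np.sin(5 * x)

# Real-input FFT; rescale so that a unit sine has amplitude 1.
spectrum = np.fft.rfft(signal)
amplitude = 2.0 * np.abs(spectrum) / n

# The only bins with noticeable amplitude are the two "vertical lines".
peaks = np.flatnonzero(amplitude > 0.5)
print("frequencies present:", peaks)

# Taking sin(5x) out is trivial here: zero its bin and transform back.
filtered = spectrum.copy()
filtered[5] = 0
remaining = np.fft.irfft(filtered, n=n)
print("what is left equals sin(3x):", np.allclose(remaining, np.sin(3 * x)))
```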
spectrum
Once you understand the transform of a one-dimensional signal, what does the spectrum of an image look like?
In the spectrum image, the bright region is the low-frequency part (flat areas of the picture) and the dark regions are the high-frequency part (abrupt boundaries). The low-frequency part of the spectrum is usually moved to the center for display. There is no one-to-one correspondence between points in the spectrum and points in the original image: each point in the spectrum is derived from all the pixels of the image (just like the relationship between points on a time-domain curve and points on its frequency-domain graph).
This might not be intuitive enough, but let’s look at this.
This is a 400×400 picture with 160,000 pixels.
So how do we represent a picture? First, in Cartesian coordinates we use x and y to locate a point. Then how do we describe that point?
As we know, all colors are made up of three primary colors. The red, yellow and blue (more precisely magenta, yellow and cyan) often mentioned in daily life are the subtractive primaries. The primaries of light are red, green and blue, that is, R, G and B.
We usually describe a pixel by its RGB values. Image processing actually tends to work on a grayscale representation of the image, but for ease of understanding the demonstration below uses RGB.
The figure above plots the RGB values along one row of the image. Each curve fluctuates up and down continually, and the three curves fluctuate at the same frequency. In some areas the fluctuations are relatively small, while in others there are sudden, large swings.
Comparing this with the image, you can see that the places where the curves fluctuate sharply are also the places where the image changes.
The spectrum of an image can be understood as the one-dimensional spectrum rotated around its vertical axis to form the graph of a three-dimensional function (this is the centrally symmetric case; mirror symmetry and other cases are similar). The x and y axes represent the frequencies in the two directions and the z axis represents the amplitude. Because the spectrum image is only a 2D figure, brightness is used to represent the amplitude.
The physical meaning of two-dimensional Fourier transform is to transform the grayscale distribution function of the image into the frequency distribution function of the image.
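As a rough illustration of the spectrum image described above, the sketch below computes a centered spectrum with NumPy. The synthetic 400×400 test image (dark left half, bright right half) and the function name `centered_spectrum` are assumptions made for this example. A real photo would first be loaded as a grayscale array.

```python
import numpy as np

def centered_spectrum(gray):
    """Return the log-amplitude spectrum with low frequencies at the center."""
    freq = np.fft.fft2(gray)          # 2D discrete Fourier transform
    freq = np.fft.fftshift(freq)      # move the zero frequency to the center
    return np.log1p(np.abs(freq))     # brightness = log of the amplitude

# Synthetic stand-in for a grayscale picture: two flat areas and one edge.
image = np.zeros((400, 400))
image[:, 200:] = 255.0

spec = centered_spectrum(image)
brightest = np.unravel_index(np.argmax(spec), spec.shape)
print("brightest point of the spectrum:", brightest)
```

The logarithm is only there for display: the center of the spectrum is usually so much brighter than everything else that the other points would otherwise look black.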
Characteristics of blind watermarking
In general, a blind watermark is required to withstand attacks such as compression, cropping, painting over and rotation.
- Concealment
Because we do not want the watermark to be detected, to interfere with the user experience, or to be imitated, it must be invisible, that is, hidden.
- Non-removability
Being hard to remove is similar to robustness, with this difference:
Robustness emphasizes that the watermark should survive unintentional interference and damage while the digital resource is being transmitted.
Non-removability means that even when someone with ulterior motives detects the blind watermark, they cannot deliberately remove or destroy it.
- Robustness
In Chinese, robustness is also commonly called 鲁棒性, a transliteration of the English word "robust".
Simply put, it means resilience.
It is important to note that robustness and concealment usually pull against each other: strengthening one tends to weaken the other.
- Clarity
There is not much to say here, except that a blind watermark needs to convey a clear message.
Fourth, the "seed image"
For example, here is a plain-looking image. Save it, then change the extension to "rar" or open it directly with a decompression tool, and you will find some mysterious goodies inside.
Making a "seed image" is also very easy. On Windows, enter the following command:
copy /b A.jpg + B.zip C.jpg
About a decade ago, seed images were widely uploaded to forums and similar places to share resources. Later, many websites began checking the end-of-image marker when a picture was uploaded and discarding everything after it, so seed images gradually fell out of use. (The image host sm.ms is very good: in my test, the seed could still be parsed after uploading.)
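The `copy /b` trick and the end-of-image check can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not how any particular website does it: the function names are invented here, the "JPEG" is a few stand-in bytes rather than a real picture, and the check simply relies on a well-formed JPEG normally ending with the marker bytes FF D9.

```python
import io
import zipfile

JPEG_EOI = b"\xff\xd9"  # end-of-image marker that normally closes a JPEG

def append_archive(image_bytes, archive_bytes):
    """Same effect as `copy /b A.jpg + B.zip C.jpg`: plain concatenation."""
    return image_bytes + archive_bytes

def ends_with_jpeg_eoi(data):
    """A simple check for extra data appended after the picture."""
    return data.endswith(JPEG_EOI)

def hidden_file_names(data):
    """ZIP readers locate the archive from the end of the file,
    so the leading image bytes do not get in the way."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()

# Build a small ZIP archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("secret.txt", "hello")
archive = buf.getvalue()

# Stand-in bytes with JPEG start and end markers (not a viewable picture).
fake_jpeg = b"\xff\xd8" + bytes(16) + JPEG_EOI
combined = append_archive(fake_jpeg, archive)

print(ends_with_jpeg_eoi(fake_jpeg), ends_with_jpeg_eoi(combined))
print(hidden_file_names(combined))
```

Image viewers stop reading at the end-of-image marker, while archive tools start from the end of the file, which is why the same file opens as both.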
Hidden files
Images can be combined with seed files, as well as with other files.
In fact, hidden files and blind watermarking are both image steganography.
Image steganography
Steganography is another application of digital watermarking, in which two parties can communicate using information hidden in digital signals.
The annotated data in a digital photo can record information such as when the photo was taken, the aperture and shutter used, and even the camera’s brand, which is one of the applications of digital watermarking.
Some file formats can contain this additional information, called “metadata.”
Uses
Avoid sensitive word filtering
Anyone who often "climbs over the wall" will be very familiar with so-called sensitive word filtering. Hiding information in pictures avoids sensitive word filtering by the GFW.
Avoid visual inspection
Many websites in China manually review uploaded pictures. If information is hidden in a picture by technical means and the picture itself looks no different, the human reviewer will not notice it.
Transmit encrypted information
Material, information and so on that you do not want others to see.
Common methods
Principle
Content overlay method
Generally, an image file has two parts: the header and the data section.
The "content overlay method" writes the file to be hidden directly over the tail of the image file's data area.
For example, if an image is 100KB and the file header is 1KB, the data area is 99KB. That is, you can only hide a maximum of 99KB of files.
Remember: when overwriting, do not destroy the header. Once the header is corrupted, the image file is no longer a valid image file.
With this method the choice of image format matters. A 24-bit color BMP is the best choice:
- The BMP format itself is relatively simple, so the data area can be overwritten freely without problems;
- A 24-bit color BMP has a larger file size than other BMP variants and can therefore hide more content.
import sys


def embed(container_file, data_file, output_file):
    """Overwrite the tail of container_file with the contents of data_file.

    The code does not strictly calculate the BMP file header size;
    it only roughly reserves 1024 bytes.
    """
    with open(container_file, "rb") as f:
        container = f.read()
    with open(data_file, "rb") as f:
        data = f.read()
    if len(data) + 1024 >= len(container):
        print("Not enough space to save " + data_file)
    else:
        with open(output_file, "wb") as f:
            # Keep the front of the container, then write the hidden data
            f.write(container[:len(container) - len(data)])
            f.write(data)


if "__main__" == __name__:
    try:
        if len(sys.argv) == 4:
            embed(sys.argv[1], sys.argv[2], sys.argv[3])
        else:
            print("Usage:\n%s container data output" % sys.argv[0])
    except Exception as err:
        print(err)
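The script above only covers hiding. Getting the data back out is the reverse slice, as in the sketch below. It assumes the receiver already knows the length of the hidden data, which the original script neither stores nor transmits. The byte-oriented helper names are invented for this illustration.

```python
def embed_bytes(container, data, header_reserve=1024):
    """Overwrite the tail of `container` with `data`, keeping the size unchanged."""
    if len(data) + header_reserve >= len(container):
        raise ValueError("not enough space in the container")
    return container[:len(container) - len(data)] + data

def extract_bytes(stego, data_len):
    """Read back the last `data_len` bytes (the length must be known in advance)."""
    return stego[len(stego) - data_len:]

# Stand-in for a BMP file: 4 KB of zero bytes.
container = bytes(4096)
secret = b"meet at noon"

stego = embed_bytes(container, secret)
print(len(stego) == len(container))       # file size does not change
print(extract_bytes(stego, len(secret)))  # the hidden bytes come back
```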
LSB (least significant bit)
A lot of commercial software uses this principle.
For example, in a PNG image each color channel is stored in 8 bits. LSB (Least Significant Bit) steganography modifies only the lowest bit of each channel value: the human eye cannot see the difference, yet information is hidden. (Each pixel can carry 3 bits of information.)
For example, to hide 'a', as shown in the figure below, we convert it to 0x61 in hexadecimal and then to 01100001 in binary, and then set the lowest bit of the red channel of successive pixels to these bits in turn.
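The bit manipulation can be sketched in plain Python. Here the "image" is just a list of (R, G, B) tuples, and the function names are invented for this illustration. Only the red channel is used, one bit per pixel, to hide the byte 0x61 (binary 01100001).

```python
def embed_lsb(pixels, message):
    """Hide the bits of `message` in the red-channel LSB of successive pixels."""
    bits = [(byte >> shift) & 1
            for byte in message
            for shift in range(7, -1, -1)]   # most significant bit first
    if len(bits) > len(pixels):
        raise ValueError("not enough pixels for this message")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        r, g, b = stego[i]
        stego[i] = ((r & ~1) | bit, g, b)    # clear the lowest bit, then set it
    return stego

def extract_lsb(pixels, n_bytes):
    """Rebuild `n_bytes` bytes from the red-channel LSBs."""
    out = bytearray()
    for k in range(n_bytes):
        byte = 0
        for r, _, _ in pixels[8 * k: 8 * k + 8]:
            byte = (byte << 1) | (r & 1)
        out.append(byte)
    return bytes(out)

cover = [(200, 120, 40)] * 8           # eight identical pixels as a toy cover
stego = embed_lsb(cover, bytes([0x61]))

print([p[0] for p in stego])           # red values change by at most 1
print(extract_lsb(stego, 1))
```

Using the green and blue channels as well would triple the capacity to the 3 bits per pixel mentioned above.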
Finally
- Attached is the implementation of the code used in the earlier demo:
(Several projects on GitHub were referenced, but none of them is very robust.)
# coding=utf-8
import os
import random
from argparse import ArgumentParser

import cv2
import numpy as np

ALPHA = 5


class BlindWaterMark():
    """Blind watermark encryption and decryption, simple version without frequency shift."""

    def __init__(self):
        self.parser = ArgumentParser()
        self.parser.add_argument('--action', dest='action', required=True)
        self.parser.add_argument('--origin', dest='ori', required=True)
        self.parser.add_argument('--img', dest='img', required=True)
        self.parser.add_argument('--result', dest='res', required=True)
        self.parser.add_argument('--alpha', dest='alpha', default=ALPHA)

    def encode(self, ori_path, wm_path, res_path, alpha):
        img = cv2.imread(ori_path)
        img_f = np.fft.fft2(img)  # 2-dimensional discrete Fourier transform
        height, width, channel = np.shape(img)
        watermark = cv2.imread(wm_path)
        wm_height, wm_width = watermark.shape[0], watermark.shape[1]
        # Scramble the watermark positions (list() and // are needed on Python 3)
        x, y = list(range(height // 2)), list(range(width))
        random.seed(height + width)  # Fixed seed, so decoding can repeat the shuffle
        # The shuffle order can differ between Python versions,
        # so encode and decode with the same one.
        random.shuffle(x)
        random.shuffle(y)
        # Lay the watermark out symmetrically at the size of the target image
        tmp = np.zeros(img.shape)  # Zero-filled matrix with the image's shape
        for i in range(height // 2):
            for j in range(width):
                if x[i] < wm_height and y[j] < wm_width:
                    tmp[i][j] = watermark[x[i]][y[j]]
                    tmp[height - 1 - i][width - 1 - j] = tmp[i][j]
        res_f = img_f + alpha * tmp  # Original frequency-domain values + watermark
        res = np.fft.ifft2(res_f)  # Inverse Fourier transform
        res = np.real(res)  # Convert to real numbers
        cv2.imwrite(res_path, res, [int(cv2.IMWRITE_JPEG_QUALITY), 100])

    def decode(self, ori_path, img_path, res_path, alpha):
        ori = cv2.imread(ori_path)
        img = cv2.imread(img_path)
        ori_f = np.fft.fft2(ori)
        img_f = np.fft.fft2(img)
        height, width = ori.shape[0], ori.shape[1]
        watermark = (ori_f - img_f) / alpha
        watermark = np.real(watermark)
        res = np.zeros(watermark.shape)
        # Repeat the same seeded shuffle to undo the scrambling
        random.seed(height + width)
        x = list(range(height // 2))
        y = list(range(width))
        random.shuffle(x)
        random.shuffle(y)
        for i in range(height // 2):
            for j in range(width):
                res[x[i]][y[j]] = watermark[i][j]
                res[height - i - 1][width - j - 1] = res[i][j]
        cv2.imwrite(res_path, res, [int(cv2.IMWRITE_JPEG_QUALITY), 100])

    def run(self):
        options = self.parser.parse_args()
        action = options.action
        ori = options.ori
        img = options.img
        res = options.res
        alpha = float(options.alpha)
        if not os.path.isfile(ori):
            self.parser.error("image %s does not exist." % ori)
        if not os.path.isfile(img):
            self.parser.error("watermark %s does not exist." % img)
        if action == "encode":
            self.encode(ori, img, res, alpha)
        elif action == "decode":
            self.decode(ori, img, res, alpha)


if __name__ == '__main__':
    bwm = BlindWaterMark()
    bwm.run()
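The core idea of the script (add the watermark to the frequency-domain values, then subtract the original to get it back) can also be tried without OpenCV. The sketch below is a simplified variant, not the script itself. It works on a single grayscale array, skips the random scrambling, keeps everything in floating point instead of saving a JPEG, and places the mirrored copy so that the frequency data stays exactly conjugate-symmetric. All names and sizes are assumptions made for this illustration.

```python
import numpy as np

def embed_watermark(image, mark, alpha):
    """Add `alpha * mark` to the spectrum of a grayscale `image`."""
    h, w = image.shape
    mh, mw = mark.shape
    if 2 * mh >= h or mw >= w:
        raise ValueError("watermark too large for this layout")
    layer = np.zeros((h, w))
    layer[1:mh + 1, 1:mw + 1] = mark            # the watermark itself
    layer[h - mh:, w - mw:] = mark[::-1, ::-1]  # point-mirrored copy
    freq = np.fft.fft2(image) + alpha * layer
    # Thanks to the mirrored copy the inverse transform is real
    # up to rounding error, so taking the real part loses nothing.
    return np.real(np.fft.ifft2(freq))

def extract_watermark(marked, original, alpha, shape):
    """Subtract the original spectrum and read the watermark region."""
    diff = np.real(np.fft.fft2(marked) - np.fft.fft2(original)) / alpha
    mh, mw = shape
    return diff[1:mh + 1, 1:mw + 1]

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64)).astype(float)  # toy "photo"
mark = np.zeros((8, 16))
mark[2:6, 4:12] = 255.0                                    # toy watermark

marked = embed_watermark(image, mark, alpha=5.0)
recovered = extract_watermark(marked, image, 5.0, mark.shape)
print(np.allclose(recovered, mark, atol=1e-6))
```

Once the watermarked array is rounded to 8-bit pixels or saved as a JPEG, the recovered watermark is only approximate, which is where the `alpha` strength and the robustness trade-off come in.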
This is a very deep field with very wide applications, and the coverage here is quite shallow. It is offered only as a starting point, in the hope of prompting better contributions. Steganography is just one part of it, and those of you who are interested can read the following book.
Hughes Chen, backend development engineer at UCloud
Blog: ulyc.github.io/