Siamese Network (twin neural network): a simple and magical structure
1. Origin of the name
The word "Siamese" is formed the same way as "Chinese". Siam is the old name of Thailand (暹罗, Xianluo, in Chinese), so a Siamese is a person from Siam, that is, a Thai person. In English, "Siamese" also means "twin" or "joined together". Why?
In the 19th century a pair of conjoined twins, Chang and Eng, was born in Siam (Thailand). The medical technology of the time could not separate them, so the two lived on tenaciously, joined together. In 1829 they were discovered by a British businessman and went on to perform with circuses in many parts of the world. In 1839 they visited North Carolina in the United States, where they became a circus mainstay and eventually US citizens. On April 13, 1843, they married a pair of sisters. Between them they fathered 22 children, 10 for one brother and 12 for the other. When the sisters quarreled, the brothers would take turns living at each wife's house for three days. One brother died of a lung ailment in 1874 and the other followed soon after, both at the age of 62. Their conjoined liver is still preserved at the Mütter Museum in Philadelphia. "Siamese twins" has since become synonymous with conjoined twins, and the brothers helped bring this condition to the world's attention.
image
How conjoined twins managed to father 22 children with the two sisters is left to the imagination.
To put it simply, a Siamese network is a "conjoined neural network". The "conjoining" is realized by sharing weights between the two networks, as shown in the figure below.
image
You might be wondering: what does sharing weights mean? Do the left and right neural networks have exactly the same weights?
A: Yes. In code you can even use a single network for both branches instead of implementing a second one, because the weights are identical. In a Siamese network, both sides can be LSTMs or both can be CNNs.
You might also wonder: what do you call it when the two networks do not share weights?
A: A pseudo-siamese network (pseudo-twin neural network), shown in the figure below. In a pseudo-siamese network the two sides can be different kinds of network (such as an LSTM and a CNN), or the same kind of network with separate weights.
image
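The difference between the two structures can be sketched in a few lines of plain Python. This is only a toy illustration: the one-layer tanh encoder, the function names (`make_weights`, `encode`, `siamese_forward`, `pseudo_siamese_forward`) and the dimensions are all assumptions made up for this example, standing in for a real LSTM or CNN.

```python
import math
import random


def make_weights(in_dim, out_dim, seed):
    """Random weights for a toy one-layer encoder (hypothetical stand-in for an LSTM or CNN)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def encode(weights, x):
    """Map an input vector to its representation in the new space."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def siamese_forward(shared, x1, x2):
    """Siamese: one set of weights, used by both branches."""
    return encode(shared, x1), encode(shared, x2)


def pseudo_siamese_forward(left, right, x1, x2):
    """Pseudo-siamese: each branch has its own weights."""
    return encode(left, x1), encode(right, x2)


shared = make_weights(in_dim=3, out_dim=2, seed=0)
other = make_weights(in_dim=3, out_dim=2, seed=1)
x = [0.5, -1.0, 2.0]

a, b = siamese_forward(shared, x, x)
c, d = pseudo_siamese_forward(shared, other, x, x)
print(a == b)  # identical inputs give identical embeddings when weights are shared
print(c == d)  # with separate weights the two embeddings generally differ
```

Note that `siamese_forward` never creates a second network: "sharing weights" here simply means calling the same encoder twice.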
2. What is the use of twin neural networks?
In simple terms, it measures how similar two inputs are. A twin neural network takes two inputs (Input1 and Input2) and feeds them into two neural networks (Network1 and Network2), each of which maps its input into a new space and produces a representation of the input in that space. The similarity of the two inputs is then evaluated by computing a loss over the two representations.
According to the information I found, Yann LeCun and colleagues published the paper “Signature Verification using a ‘Siamese’ Time Delay Neural Network” at NIPS 1993, for verifying signatures on American checks, that is, checking that the signature on a check matches the signature the bank has on file. Back in 1993 LeCun was already using two convolutional neural networks for signature verification, while I had only just been born: a brand-new neural network myself, a few years old and still being trained by my parents.
image
With the rise of SVMs and other algorithms, neural networks were gradually forgotten. Fortunately, some persistent people held their ground in neural network research. In 2010, Hinton published an article titled “Rectified Linear Units Improve Restricted Boltzmann Machines” at ICML, which applied the idea to face verification and achieved good results. The principle is simple: feed two faces into the convolutional neural network and output "same" or "different".
image
What? A Siamese network can only do binary classification?
No, no, no. There’s a whole bunch of other things it can do, and we’ll get to that later.
3. Which scenarios suit twin neural networks and pseudo-twin neural networks, respectively?
Conclusion: a twin neural network is used when the two inputs are "similar" in kind, and a pseudo-twin neural network is suitable when there is a "certain difference" between the two inputs. For example, a Siamese network is suitable for calculating the semantic similarity of two sentences or two words. If you want to verify that a title matches the body text it describes (the title and the text differ greatly in length), or that a piece of text describes an image (one input is an image, the other is text), you should use a pseudo-siamese network. In other words, the structure and the loss should be chosen according to the specific application.
4. Which loss function is generally used for a Siamese network?
Softmax is certainly a good choice, but not necessarily the best one, even for classification problems. The traditional Siamese network uses contrastive loss, and there are further options. The original purpose of a Siamese network is to calculate the similarity of two inputs: the left and right networks each convert their input into a vector, and the similarity is then computed in the new space, for example with cosine distance. Cosine is one option, an exp function is another, and Euclidean distance also works. The goal of training is to make the distance between two similar inputs as small as possible and the distance between inputs from different categories as large as possible. I don’t have much experience with other distance measures, so here is a quick word on the difference between cosine and exp in NLP.
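As a rough sketch of the contrastive loss mentioned above, here is one common formulation on top of Euclidean distance. The exact form (squared terms without a 1/2 factor), the `same` flag, the default `margin=1.0`, and the function names are assumptions for illustration, not something taken from a specific implementation.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, same, margin=1.0):
    """Contrastive loss for one pair of embeddings (assumed formulation).

    Similar pairs are penalized by their squared distance, so training pulls
    them together. Dissimilar pairs are penalized only while they are closer
    than `margin`, so training pushes them apart up to that margin.
    """
    d = euclidean_distance(u, v)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=True))    # similar pair far apart: large loss
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=False))   # dissimilar pair beyond the margin: no loss
print(contrastive_loss([0.0, 0.0], [0.5, 0.0], same=False))   # dissimilar pair inside the margin: some loss
```

The margin keeps the network from being rewarded for pushing already well-separated pairs ever further apart.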
According to experimental analysis, cosine is more suitable for measuring semantic similarity at the word level, while exp is more suitable for measuring text similarity at the sentence and paragraph level. The reason may be that cosine only considers the angle between two vectors, whereas exp also retains the length information of the two vectors, and sentences carry more information (of course, no experiments were done to verify this explanation).
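The point about angle versus length can be checked with a tiny example. The text does not define "exp" precisely, so this sketch assumes it means exp(-L1 distance) between the two vectors, a form used in some Siamese LSTM models; the function names and the sample vectors are likewise made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; vector length is ignored."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def exp_similarity(u, v):
    """exp(-L1 distance): an assumed reading of the 'exp' measure, sensitive to length."""
    l1 = sum(abs(a - b) for a, b in zip(u, v))
    return math.exp(-l1)


u = [1.0, 2.0]
v = [2.0, 4.0]  # same direction as u, but twice as long

print(cosine_similarity(u, v))  # close to 1: cosine treats u and v as the same
print(exp_similarity(u, v))     # well below 1: exp sees the difference in length
print(exp_similarity(u, u))     # exactly 1 only when the vectors coincide
```

Under this assumed definition, two vectors pointing the same way look identical to cosine but not to exp, which is consistent with the explanation above.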
In this work, we used the exp distance to classify long texts, addressing the problem of measuring whether a title and its body text take a consistent position.
5. A Siamese network is a pair of conjoined twins. Would conjoined triplets work too?
Sorry, someone has already done it. It is called the Triplet Network, from the paper “Deep Metric Learning using Triplet Network”. The input is a triple: one positive example plus two negative examples, or one negative example plus two positive examples. The goal of training is to make the distance between examples of the same category as small as possible and the distance between examples of different categories as large as possible. The Triplet Network performs very well on the CIFAR and MNIST data sets, surpassing the Siamese network. Quadruplets? Wouldn't quintuplets be even cooler? Not so far…
image
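The training goal for triplets can be sketched with a margin-based triplet loss. This is a widely used formulation and not necessarily the exact loss in the cited paper; the names `anchor`, `positive`, `negative` and the default `margin=1.0` are assumptions for illustration. Here the anchor and the positive share a category, and the negative comes from a different one.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss for one triple of embeddings (assumed formulation).

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is; otherwise it grows with the violation.
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]

print(triplet_margin_loss(anchor, positive, [3.0, 4.0]))  # negative is far away: no loss
print(triplet_margin_loss(anchor, positive, [0.0, 1.5]))  # negative is too close: positive loss
```

Unlike the pairwise case, the loss only cares about the relative ordering of the two distances, not their absolute values.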
6. What are the uses of Siamese Network?
There are a lot of applications in NLP and CV.
- The word- and sentence-level semantic similarity analysis mentioned above, question-answer matching in QA, and signature/face verification.
- Handwritten script recognition can also be done with a Siamese network, and code is available on GitHub.
- There is also the Quora question pairs competition on Kaggle, where the task is to judge whether two questions are the same question. The winning team used a large number of features plus a Siamese network. The Zhihu team could also try this model.
- On the image side, visual tracking based on Siamese networks has also become a hot topic, for example “Fully-convolutional Siamese Networks for Object Tracking”.