
This article is part of the NLP Study Notes series.

What are the disadvantages of the one-hot representation?

Weakness 1: It cannot express word similarity directly. Any two distinct one-hot vectors have the same Euclidean distance, and their cosine similarity is always 0 (see the sketch after this list).

Weakness 2: Sparsity. Each vector is as long as the dictionary, with only a single non-zero entry.
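To make both weaknesses concrete, here is a minimal sketch (the toy vocabulary is invented for illustration):

```python
import numpy as np

# Toy vocabulary of 5 words; each one-hot vector has length |V| = 5.
vocab = ["we", "like", "nlp", "cats", "dogs"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Weakness 1: every pair of distinct words looks equally unrelated.
print(np.linalg.norm(one_hot["cats"] - one_hot["dogs"]))  # sqrt(2) ~ 1.414
print(np.linalg.norm(one_hot["cats"] - one_hot["like"]))  # sqrt(2) ~ 1.414
print(cosine(one_hot["cats"], one_hot["dogs"]))           # 0.0
print(cosine(one_hot["cats"], one_hot["like"]))           # 0.0

# Weakness 2: sparsity -- only 1 of |V| entries is non-zero.
print(one_hot["cats"])  # [0. 0. 0. 1. 0.]
```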

Two kinds of word vectors

Distributed Representation

1. The length of a distributed representation is independent of the dictionary size and can be chosen freely, which solves the sparsity problem.

2. The vector's entries are essentially all non-zero, so the representation is dense.

For now, just note the difference between these two representations; we do not yet care where the vector values come from.

Let's verify: can this representation solve the word-similarity problem?

Computing cosine similarity on distributed vectors gives results that match our intuition: semantically similar words score higher.
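A minimal sketch of that verification, with hand-crafted vectors invented purely for illustration:

```python
import numpy as np

# Hand-crafted dense vectors, invented purely for illustration.
vec = {
    "cats": np.array([0.9, 0.1, 0.8]),
    "dogs": np.array([0.85, 0.15, 0.75]),
    "bank": np.array([0.1, 0.9, 0.2]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(vec["cats"], vec["dogs"]))  # ~0.999: similar animals
print(cosine(vec["cats"], vec["bank"]))  # ~0.30: unrelated words
```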

The distributed representations used here are also called word vectors.

Consider:

1. How many words can a 100-dimensional one-hot representation express at most?

100 words.

2. How many words can a 100-dimensional distributed representation express at most?

Infinitely many, since each dimension takes real values. For ease of understanding, restrict each dimension to just 0 or 1: even then, 100 dimensions can express 2^100 words.
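The 2^100 figure is easy to check:

```python
# Capacity of 100 dimensions under each representation.
one_hot_capacity = 100      # one-hot: one word per dimension
binary_capacity = 2 ** 100  # distributed, entries restricted to {0, 1}
print(one_hot_capacity)     # 100
print(binary_capacity)      # 1267650600228229401496703205376 (~1.27e30)
# With real-valued entries the capacity is unbounded.
```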

Next, we learn a distributed vector for each word.

Feed in enough data (on the order of 10^9 tokens) to train the word vectors with a model for which we define the dimension dim; the remaining parameters are model-specific.

Training requires large amounts of data and compute, so in practice you can use word vectors already trained by big tech companies. Vertical domains such as finance and medicine, however, need self-trained vectors.
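For self-training, a minimal sketch with gensim's Word2Vec (assuming gensim 4.x; the toy corpus and hyperparameters below are placeholders, and a real run needs a domain corpus on the order of 10^9 tokens):

```python
from gensim.models import Word2Vec

# Placeholder corpus: a real run needs a large, tokenized,
# domain-specific corpus (e.g. medical text), not three toy sentences.
sentences = [
    ["the", "patient", "was", "given", "aspirin"],
    ["aspirin", "reduces", "fever", "and", "pain"],
    ["the", "doctor", "prescribed", "a", "low", "dose"],
]

model = Word2Vec(
    sentences,
    vector_size=100,  # dim: the word-vector dimension we choose
    window=5,         # context window size (model-specific parameter)
    min_count=1,      # keep all words in this tiny demo corpus
    workers=4,        # parallel training threads
)

print(model.wv["aspirin"].shape)         # (100,)
print(model.wv.most_similar("aspirin"))  # nearest words by cosine similarity
```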

A word vector represents the meaning of a word.

To visualize word2vec vectors, the dim-dimensional data is projected into two or three dimensions.

We want the word vector to express the meaning of the word more accurately.

Words with similar semantics are clustered together.
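A minimal sketch of such a projection using PCA from scikit-learn (the random vectors here are stand-ins for trained embeddings):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for trained embeddings: 10 words, dim = 100.
rng = np.random.default_rng(0)
words = [f"word{i}" for i in range(10)]
vectors = rng.normal(size=(10, 100))

# Project the dim-dimensional vectors down to 2D for plotting;
# semantically similar words should land near each other.
coords = PCA(n_components=2).fit_transform(vectors)
for word, (x, y) in zip(words, coords):
    print(f"{word}: ({x:.2f}, {y:.2f})")
```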

Building a sentence vector from word vectors

1. The averaging method

Average the word vectors of the words in the sentence to get the sentence vector. For similarity between sentences, you can use cosine similarity.
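A minimal sketch of the averaging method, assuming a word-vector lookup is already available (the toy dict below is invented for illustration):

```python
import numpy as np

# Toy word-vector lookup; in practice this comes from a trained model.
vec = {
    "i":    np.array([0.2, 0.8, 0.1]),
    "like": np.array([0.7, 0.3, 0.5]),
    "love": np.array([0.75, 0.25, 0.55]),
    "nlp":  np.array([0.1, 0.6, 0.9]),
}

def sentence_vector(words):
    # Average the word vectors to get the sentence vector.
    return np.mean([vec[w] for w in words], axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

s1 = sentence_vector(["i", "like", "nlp"])
s2 = sentence_vector(["i", "love", "nlp"])
print(cosine(s1, s2))  # close to 1.0: nearly identical sentences
```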