The interview questions

1. How to exchange the values of two variables?

Method 1: we usually use, the whole intermediate variable:


		let mid 

		mid = x 

		x = y 

		y = mid

Copy the code

Method 2: With bit operations, no auxiliary space is required

But this one is flawed!! You can only convert numbers

Method 3: No need for any auxiliary space highly recommended!!

This method can convert any type


x = [y,y=x][0]

Copy the code

Compare the size

other

Javascript to judge array element | classic interview algorithm bit operations


Bloom filter

Introduce hash functions

A hash function is also called a hash function. The input field of a hash function can be very large, but the output field is fixed.

Properties of hash functions:

1. A typical hash function has an infinite input range

2. If the input value is the same, the return value is the same.

3. If the input value is different, the return value may be the same or different.

If all the blacklists were stored in the database, [insert picture description here] (img – blog. Csdnimg. Cn / 20200404144… =200x)

I’ve calculated that I need 5.8 terabytes of storage space.

But!! They say you have no more than 30GB of space!! So you can’t just store it in the database, you have to use your knowledge of Bloom filters.

I will only systematically talk about what is bloom filter, want to know more about their own to check

Bloom filter can accurately represent a set, can accurately determine whether an element is in the set, the degree of accuracy depends on the user’s specific design, to achieve 100% accuracy is correct is not possible. The advantage of the Bloom filter is that it can achieve high accuracy with very little space.

The principle of

  • When your url is added, a bitarray is a collection of all urls.

    • Creates a binary array of bits, each of which is 1 bit in size and starts with zeros

    • A URL, and we compute it with k hash functions. I get k results. If the result is 0 in the hash table, the position in the hash table is set to 1

  • When you want to find out if a URL is in there, you just run the URL through k hash functions that you set up,

Look at the position of the result in the hash table. If all k positions are 1, then the URL you want to check is in bitarray. Anyway, one of the positions is 0, which means the page is not in bitarray.

So we can sense the flaw, because we can make a mistake, so maybe this page is not in the blacklist, but after k hash functions are calculated, all k positions are 1, and it is considered to be in the blacklist.

! [insert picture description here] (img – blog. Csdnimg. Cn / 20200404145… =600x)

How to set bloom filter?

If bloom’s filter can make a miscalculation, how do you know the probability of miscalculation? Let’s go back to this:

Let’s look at the data in this question:

Number of blacklist urls: 10 billion

Url size: 64B

Error rate allowed: 0.01%

Space limit: 30 GB

  • Bitarray: the value is affected by the number of blacklist urls and the allowed error rate.

    • Formula:

-m: bitarray size (the size of a bitarray item is 1 bit) -n: number of samples -p: allowed error rate - in this problem, n = 10 billion, p=0.01%, find m = 19.19n, round up 20n, 20N bits =! [insert picture description here] = 200 x (https://img-blog.csdnimg.cn/20200404152319845.png) - < font color = the sienna > 23 g less than required 30 g. Compared to the 6,000GB required for the database, it is much less. </font>Copy the code
  • Hash functions: The number is affected by the size of the bitarray and the number of samples; Function design is influenced by URL size

    • Quantity formula:

    -k: indicates the number of hash functions. -m: indicates the bitarray size (the size of a bitarray item is 1 bit). -n: indicates the number of samplesCopy the code
    • In this case, m is equal to 20n, and k is equal to 14, which means you need to set up at least 14 hash functions
  • accurate

    • Formula:

    <font color=sienna> <font color=sienna> <font color=sienna> <font color=sienna> </font>Copy the code

applicable

Tolerance of a certain degree of error rate, more stringent space requirements

  • Web blacklisting System

  • Spam filtering system

  • Crawler’s URL recognition repeat system