Interviewer: Why don’t you tell me about something you’ve been reading lately? We can dig into it and talk about it.

Candidate: I’ve been reading about “deduplication” and “idempotence”.

Interviewer: Why don’t you start by talking about your understanding of deduplication and idempotence?

Candidate: I think idempotence and deduplication are very similar, and I can’t tell you the exact difference between them

Candidate: Let me give you my personal understanding; I don’t know if it’s right

Candidate: “deduplication” means the same request or message is not processed more than N times “within a certain period of time”

Candidate: “idempotence” guarantees that no matter how many times, or when, a request or message is processed, the result stays the same

Candidate: Whether it’s “deduplication” or “idempotence”, you need a pair of things: a “unique Key”, and somewhere to “store” that unique Key

Candidate: Take my project as an example. The message management platform I maintain has “deduplication” features: “the same message content is deduplicated within 5 minutes”, “the same template is deduplicated within 1 hour”, “a channel is deduplicated after N hits within a day”…

Candidate: Again, the essence of both idempotence and deduplication: a “unique Key” + a “store”

Interviewer: How did you implement that?

Candidate: The unique Key differs from one business scenario to another; it depends on the business

Candidate: There are a lot of storage options too, such as a “local cache”, “Redis”, “MySQL”, “HBase”, etc. Which one to choose also depends on the business

Candidate: For example, in the “message management platform” scenario, I chose “Redis” as the store (excellent read/write performance), and Redis also has an “expiration time”, which solves the “within a certain period of time” part

Candidate: And the unique Key, of course, is built differently for different businesses.

Candidate: For example, for “deduplicating the same message content within 5 minutes”, I take the MD5 of the request parameters directly as the unique Key. For the 1-hour template deduplication, “templateId + userId” is the unique Key. For the one-day channel deduplication, “channelId + userId” is the unique Key.

Interviewer: Now that we’re talking about deduplication, have you heard of Bloom filters?

Candidate: Of course I have

Interviewer: Tell me about Bloom filters, then. And why don’t you use them?

Candidate: The underlying data structure of a Bloom filter can be understood as a bitmap: simply put, an array whose elements store only 0s and 1s, so it takes up very little space

Candidate: When an element is “stored” into the bitmap, what actually happens is that hash functions compute its positions in the bitmap, and those positions are set to 1

Candidate: A position marked 1 indicates presence; a position marked 0 indicates absence

Candidate: So a Bloom filter can judge whether an element exists at a very low memory cost, which makes it useful for deduplication, but it has its drawbacks

Candidate: Anything built on hashing can’t escape “hash collisions”, and collisions lead to “false positives”

Candidate: In a Bloom filter, if an element is judged to exist, it does “not necessarily” really exist. If an element is judged not to exist, it certainly does not exist

Candidate: I don’t have to spell that out, do I? (It follows directly from combining “hashing” with “1 means present, 0 means absent”.)

Candidate: Bloom filters also cannot “delete” elements (another consequence of hashing: several elements may share the same bit positions, so clearing a bit for one element could wrongly erase others)

Candidate: If you need one, Guava ships a ready-made Bloom filter implementation, but it’s single-machine only

Candidate: For a distributed Bloom filter, the usual choice these days is Redis, but not every company deploys the Redis Bloom filter module (it has operational constraints; my previous company didn’t, for instance).

Candidate: So the projects I’m currently working on don’t use Bloom filters.

Candidate: If deduplication is expensive, you can consider setting up multi-layer filtering

Candidate: For example, let a “local cache” filter out part of the traffic first, and send whatever remains to “remote storage” (commonly Redis or a DB) for a second, “strong” check, as sketched below

Interviewer: Well, that reminds me of your earlier answer about Kafka

Interviewer: You said you handled orders with “at-least-once delivery + idempotent processing”

Interviewer: For the idempotent processing, Redis was used for pre-filtering (to improve performance), and a DB unique index was the strong consistency check, right?

Interviewer: The unique Key, if I remember, was “order number + order status”

Candidate: You have a good memory, interviewer!

Candidate: Generally, when we need to check data for consistency, we go straight to MySQL (the DB); after all, it has transaction support

Candidate: A “local cache” can serve as a “front-line” check, if it suits the business

Candidate: Redis has high read/write performance, so it suits both pre-checks and post-checks (:

Candidate: HBase is generally used when the data volume is very large (Redis memory gets too expensive, and a single DB table isn’t flexible enough to hold that much)

Candidate: As for idempotence, the usual stores are still “Redis” and the “database”

Candidate: The very, very common approach is a database “unique index” to implement idempotence (several projects I’ve worked on have used this)

Candidate: Building the “unique Key” is a business matter (usually you concatenate your own business IDs to produce a “meaningful” unique Key), as in the sketch below

Candidate: Of course, you can also implement idempotence with distributed locks built on “Redis” or “MySQL” (:

Candidate: But a Redis distributed lock is not completely safe, and with MySQL you’d implement the lock via optimistic or pessimistic locking (which one depends on the business; I haven’t used that myself)

Candidate: There are many idempotence solutions out there, but essentially they’re all variations on “storage” + “unique Key”, plus a name for the combination…

Candidate: Generally speaking, they all boil down to the same thing

Interviewer: Hmm… I see.
