Short Link Solution (Hash)
“This is the 9th day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021.”
Short link service scenario of e-commerce
Scene 1: Ali Health Pharmacy SMS
The browser opens the link c.tb.cn/d5.dJPad
Jump to detail.tmall.com/item.htm?id…
Scenario 2: JINGdong SMS
The browser opens the link 3.cn/1iGyAa-x to the following: pro.m.jd.com/mall/active…
Scenario 3: Meituan prefers SMS
There are taobao commodity links to share and so on…
What is a short link
It’s to convert a regular url into a shorter url.
Let’s go back to the generated short chain c.tb.cn/d5.dJPad. Although the link looks a little strange, it is still a link. We can distinguish it from the characteristics of the URL:
c.tb.cn
Is the domain named5.dJPad
Is the parameter
Why do you need short links
Save url length and facilitate social communication
.Control cost and improve experience
For example, if the text is more than 70 characters long, it will be charged for two text messages. If the text is more than 140 characters long, it will be charged for three text messagesAnonymous access
: Hides the access source. In the final page, the Refer field of HTTP cannot be used to obtain the source before the jump.Password access
: For some links to set the password to access, but can’t realize the direct link of the original password control, can through to the original link converted short link access control to realize the purpose (original link at this time there is no access control, but the link generated no discipline, can think that no one can access to the original link), as with password access, Access can also be controlled by limiting access times, access periods, and access blacklists to a short link.Access statistics
: The page view of the original link can be obtained by counting the number of short links. On the mobile Internet, different short links can be generated for different clients (Android, IOS, PC, and MAC) to count the page view of different clients.Simplified QR code
: The complexity of two-dimensional code pattern is positively correlated with the size of the original information, too long link will lead to the generated two-dimensional code picture is too complex, reduce the success rate of two-dimensional code recognition.Reduce weight transfer
: In the search engine field, there is the reality of link weight transfer. If links to A have A lot of people visit, through search engines search link A related content, will put the link to A in the front of the search results, if in A didn’t have A lot of traffic link B to introduce A, let the search engine when scanning link relations give link B slightly higher weight, lead to search link link related content, they’ll put B B on the front of the search results. Use short links to break the chain of dependencies (fromLink A> Link B
becomeLink A> Short link > Link B
).
Short link principle
- Will long link through certain
methods
Generate a short link - When accessing the short link, you actually visit the short link server, and then retrieve the corresponding long link according to the parameters of the short link
- The server returns a 302 status code, setting Location in the response header to: Long link
- The browser
redirect
: a long link - The long-link server returns a response
Graph TD client accesses short link c.tb.cn/d5.dJPad --> C.T.N server C.T.N server --> find original link according to d5.djpad --> Long link back to browser Long link back to browser --> Browser redirect
Core parts of short link service include:
- Long link to short link algorithm
- The corresponding relationship is persistent, which facilitates long links and short links to query each other
Short link service provider
At present, there are many services that provide short link generation. When you need to use short link, you can directly call related services for transformation.
- Sina short link
- SUO short link
- MRW short link
- Baidu short link
Design the short link generation service
First and foremost, apply for a domain name that is as short as possible. The subsequent process is essentially a number issuing process. Whenever a short link is generated for a long link, a unique identifier is generated, attached to the short domain name, and the corresponding relationship is recorded and the generated short link is returned.
If there is a subsequent request for the original link, query the corresponding table. If there is a query, return 301 or 302 and return the original link. As you can see, two key points in the entire system are the unique identification of the generation and corresponding table storage.
As for the service of storing corresponding tables, since it only stores simple K-V relationships, k-V databases like Redis can be used.
Short link algorithm
Algorithm features
- Highly available, services can be deployed to multiple servers without a single point of failure.
- 1:1 mapping between long links and short links. On the one hand, it can solve storage resources, and it can also be convenient to click statistics and other functions.
- Less impact. Short link generation performance does not increase linearly with the number of generated short links increasing.
There are three algorithms commonly used in the industry
1. Transmitter & base conversion
Implementation:
Transmitter (ID increment)+62 base code
Algorithm idea:
- First request generator generation
Non-repeating decimal
digital - Decimal digits are converted to
62 into the system
Why use base 62 conversion? I’ve heard a lot about 64 conversion
- Base 62 conversion because base 62 conversion
Contains only digits, lowercase letters, and uppercase letters
. The 64-bit conversion will contain/
.+
Such symbols (characters that do not match normal URLS) - We can convert base 10 to base 62
To shorten the
Character, if we want 6 characters,62 ^ 6 = 56800235584
There are 56 billion combinations.
Algorithm features:
- The algorithm can support 6-bit character combinations 62^6= more than 50 billion long links.
- The sender can choose database self-increment ID, Snake algorithm, Redis and many other implementations.
For example:
https://juejin.cn/user/474627279698541/posts long link to its id number is the only 10000, and then will get the results of the 10000 to 62 hexadecimal code is: 2 bi
Then my short chain URL can be changed into https://3.cn/2Bi, where 3.cn is the domain name and 2Bi is the parameter converted into base 62. Let’s preserve the relationship between 2Bi and the long link, and the next time we use the short link request we’ll find the long link.
Base Conversion – online tool
Conclusion:
Run the redis autoincrement command to obtain the ID, convert it to base 62, and save the mapping between short links and long links using hash in Redis
Graph of TD https://juejin.cn/user/474627279698541/posts - > get the id value of 10000 obtained from toughening id value of 10000 - > 10000 of 62 into the system for 2 10000 62 base for 2 bi bi Persistence - > relation between 2 bi map at https://juejin.cn/user/474627279698541/posts
2. Random Algorithm & Hash Algorithm (MD5)
Algorithm idea:
The principle of randomization algorithm is to generate strings randomly from long links and then map the strings to the set [0-9][A-z][A-z].
Algorithm is a
- Md5 is used to generate 32-bit signature strings for long urls, which are divided into 4 segments with 8 bytes each.
- Loop through these four segments, take 8 bytes, convert it to hexadecimal string and
0x3FFFFFFF
(1) 30with
Operations, i.e., ignore processing of more than 30 bits; - The 30 bits are divided into six segments, each of which takes a specific character as an index of the alphabet, followed by a 6-bit string;
- The total MD5 string can obtain 4 6-bit strings; I’m going to take any one of these and use it as the short URL of this long URL;
This algorithm, although it produces four, still has a repetition rate.
Algorithm 2
The 62 optional elements are then randomly selected to form a character array of length 64.
- Randomly generate 36 bits of 0/1 random number
- The 36 bits are divided into six parts and the lower six bits are obtained by randomValue>>6.
- (randomValue>>6) &0x3f gets the corresponding character
Algorithm two has a collision probability of 1/64^6=1/2^36.
3. Pre-build & Access allocation
Algorithm idea:
[0-9][A-z][A-Z] Set These 62 bits take 6-bit combination, can produce more than 50 billion combination number.
Mapping numeric and character combinations (short links) produces a unique string
For example, the 62nd combination is AAAAa9, and the 63rd combination is AAaaba, and the shuffling algorithm is used to disarrange the original string and save it, then the combination string at the corresponding position will be an out-of-order combination.
Put the long url into the database, take the returned ID, find the corresponding string, for example, return id 1, then the corresponding string combination is BBB,
Similarly, when ID is 2, the string combination is BBa, and so on, until 64 combinations are reached, the possibility of repetition will appear.
Algorithm features:
- If you use the above 62 characters, take any six characters and combine them into a string, you’ll have more than 50 billion pieces of data before any repetition occurs.
Redirect 301 or 302?
- 301: Redirection is yes
permanent
Redirection, the search engine in the crawl new content, but also the old site to replace, replace the redirect after the address. Search engines simply add new links to the index. The browser cache is used, and the new link will be directly accessed when the old link is accessed next time. Because the short link is not resolved by the server, the short link access statistics cannot be done. - 302: Redirection is just
For the time being
The search engine will consider the redirected address to be temporary and will save the new address. Search engines index both old and new links. The browser will assume a temporary jump, and both new and old links will be accessed.
Therefore, in order to count the number of short link access times, the 302 status code is generally used to jump.
The SMS link jumps directly to the APP
All for operation! How to evoke an App from a promotional SMS link
Actual combat :SpringBoot+Redis high concurrency “Short link converter”
How short Link Converter works:
- Long links are converted to short links
Implementation principle: Long link is converted into short link encryption string key, and then stored in redis hash structure.
- Redirect to the original URL
Implementation principle: by encrypting string key to Redis to find the original URL, and then redirect out
This example is based on a random algorithm to achieve a short link converter
Short link converter
import org.apache.commons.codec.digest.DigestUtils;
/** * md5 generates 32-bit signature string, divided into 4 segments, 8 bytes each * loop through these 4 segments, take 8 bytes, treat it as hexadecimal string and 0x3fffffff(30 bits 1) and operation, * The 30 bits are divided into six segments, each of which takes a particular character as an index of the alphabet. The total MD5 string can be 4 6-bit strings, any one of which can be used as the short URL address of this long URL */
public class ShortUrlGenerator {
/ / 26 + + 10 = 26 62
public static final String[] chars = new String[]{"a"."b"."c"."d"."e"."f"."g"."h"."i"."j"."k"."l"."m"."n"."o"."p"."q"."r"."s"."t"."u"."v"."w"."x"."y"."z"."0"."1"."2"."3"."4"."5"."6"."Seven"."8"."9"."A"."B"."C"."D"."E"."F"."G"."H"."I"."J"."K"."L"."M"."N"."O"."P"."Q"."R"."S"."T"."U"."V"."W"."X"."Y"."Z"};
/** * a long link URL is converted to 4 short keys */
public static String[] shortUrl(String url) {
String key = "";
// Perform MD5 on the address
String sMD5EncryptResult = DigestUtils.md5Hex(key + url);
System.out.println(sMD5EncryptResult);
String hex = sMD5EncryptResult;
String[] resUrl = new String[4];
for (int i = 0; i < 4; i++) {
// Take an 8-bit string, md5 32-bit, cut into 4 groups of 8 characters each
String sTempSubString = hex.substring(i * 8, i * 8 + 8);
0x3FFFFFFF is used to format the first 30 bits
long lHexLong = 0x3FFFFFFF & Long.parseLong(sTempSubString, 16);
String outChars = "";
for (int j = 0; j < 6; j++) {
// what does 0x0000003D stand for? His base 10 is 61, 61 represents the coordinates from 0 to 61 of the chars array length 62.
// 0x0000003D&lhexLong does the bit and operation, that is, formatted to 6 bits, i.e., within 61 digits
// Ensure that index is within 61
long index = 0x0000003D & lHexLong;
outChars += chars[(int) index];
// Each loop moves 5 bits, because of 30 bits of binary, in 6 cycles, i.e., 5 bits to the right each time
lHexLong = lHexLong >> 5;
}
// Store the string into the output array of the corresponding index
resUrl[i] = outChars;
}
return resUrl;
}
/**
* 566ab90fc1bb2c6544ac4353f0d0a364
* niIvAj
* B7jQza
* ryqyia
* AzE7ny
* @param args
*/
public static void main(String[] args) {
/ / long connection
String longUrl = "https://juejin.cn/user/474627279698541/posts";
Return 4 short links after converting 6 bits of code to short links
String[] shortCodeArray = shortUrl(longUrl);
for (int i = 0; i < shortCodeArray.length; i++) {
// Either one can be used as a short link codeSystem.out.println(shortCodeArray[i]); }}}Copy the code
The controller
@RestController
@Slf4j
public class ShortUrlController {
@Autowired
private HttpServletResponse response;
@Autowired
private RedisTemplate redisTemplate;
private final static String SHORT_URL_KEY="short:url";
/** * Long links are converted to short encrypted keys, which are then stored in redis' hash structure. * /
@GetMapping(value = "/encode")
public String encode(String url) {
// A long link URL is converted to 4 short encrypted keys
String [] keys= ShortUrlGenerator.shortUrl(url);
// Take any one of them, we take the first one
String key=keys[0];
// Hash with key= encrypted string, value= original URL
this.redisTemplate.opsForHash().put(SHORT_URL_KEY,key,url);
log.info("Long link ={}, transform ={}",url,key);
return "http://127.0.0.1:9090/"+key;
}
/** * Redirects to the original URL */
@GetMapping(value = "/{key}")
public void decode(@PathVariable String key) {
// Go to Redis to find the original URL
String url=(String) this.redisTemplate.opsForHash().get(SHORT_URL_KEY,key);
try {
// Redirect to the original URL
response.sendRedirect(url);
} catch(IOException e) { e.printStackTrace(); }}}Copy the code
Gleaning-md5 algorithm features
- Compressibility: The length of the CALCULATED MD5 value is fixed for any length of data.
- Easy to calculate: It is easy to calculate the MD5 value from the raw data.
- Modifiability: Any change to the original data, even if it is only 1 byte, will result in a significant difference in the MD5 value.
- Strong collision resistance: Given the original data and its MD5 value, it is very difficult to find a data with the same MD5 value (i.e. forged data).
Redis distributed cache family
- Redis Distributed Cache (1) – Redis Installation (Linux and Docker)
- Redis Distributed cache (ii) – RDB and AOF
- SpringBoot integrates Mybatis-Plus,Redis and Swagger
- Redis distributed cache (4) – SpringCache integrates redis
- Redis Distributed Cache (5) — Common command (String)
- Redis Distributed Cache (6) — Article read volume PV Solution (String)
- Redis Distributed Cache (7) — Distributed Global ID Solution (String)
- Redis Distributed Cache (8) — Highly Concurrent atomic Operations (Redis+Lua)
- Redis Distributed Cache (9) — Hacker Anti-brush Attack Solution (String)
- Redis Distributed Cache (10) common Commands (Hash)
- Redis Distributed Cache (11) store Java Bean object Hash
- The article continues to be updated…