One, foreword

A few days ago when I was sorting out the interview questions, one of the questions was “How to convert a very long URL into a short URL, and realize the conversion between them?” , now think of this is an absolutely not simple problem, need to consider a lot of aspects, today and we study together!

Short url: As the name suggests, shortening a long url into a very short url that can be redirected to the original long url (i.e. the process of restoration). This can achieve the purpose of easy memory, conversion, often used in the word limit of weibo, TWO-DIMENSIONAL code and other scenes.

About the use of short URL scenarios, take a simple example to illustrate the importance of using short URLS in the business!

2. Short address usage scenarios

1. Sina Weibo

When we publish the url on Sina Weibo, Weibo automatically identifies the url and converts it, for example: t.cn/RuPKzRW. Why…

This is because the weibo limit of 140 words a, so if we need to send some link up, but this link is very long, that almost takes up half of our content pages, this is definitely not allowed, or user experience is very poor, so short url arises at the historic moment, this short url service is popular in weibo appeared before! Look down:

(1) First, I will send a microblog with a URL address:

Blog.csdn.net/xlgen157387…
t.cn/RuPKzRW, at this time you…

2, short url QR code

Url in the conversion into a short url, can also generate the corresponding short URL TWO-DIMENSIONAL code, short URL two-dimensional code application, two-dimensional code core is to solve the cross-platform, cross-reality data transmission problem; And two-dimensional code combined with application scenarios, can solve more and more problems.

(1) short URL TWO-DIMENSIONAL code is more convenient than short links, can input less, as little as possible input, even if just a little click on the keyboard, is meaningful.

(2) TWO-DIMENSIONAL code just scan a simple link, open is a world. Imagine buying something from a vending machine with your phone and scanning a QR code is slightly faster than finding the machine and finding the item from your phone, and isn’t that more elegant than searching/finding?

(3) All the commodities in the supermarket use bar codes to determine the uniqueness of commodities, and scan bar codes when paying the bill. Imagine adding more information about production dates, manufacturers, distribution routes, raw materials, etc., wouldn’t it be awesome? Especially for food information traceability, two-dimensional code application scenarios are more extensive.

Three, the benefits of short address

In addition to the advantages of converting long addresses to short addresses (shortening URL length) in the above scenario, short addresses also have many practical advantages, such as:

(1) Save the length of the URL and facilitate social communication. One is to make the URL shorter and more convenient to spread, especially the URL has Chinese and special characters. The short URL solves the problem that the long URL is difficult to remember and not conducive to communication;

(2) Short urls can be well opened and MANAGED in our project. A part of the URL can cover sex, violence, advertising and other information, so that we can through the user’s report, complete management of the connection will not appear in our application, the same URL through encryption algorithm, the address is the same;

(3) Convenient background tracking click volume, geographical distribution and other user statistics. We can carry out traffic and click statistics on a series of websites to dig out the concerns of most users, which is conducive to us making better decisions on the follow-up work of the project;

(4) Avoid keywords, domain name shielding means, hide the real address, suitable for paid promotion links;

(5) When you see a taobao baby link behind 200 “e7x8BV7C8BISDJ” such characters, you will feel comfortable? What’s more, the word count is only 140 characters. If the word count is not enough in microblog or SMS, you can make a lot of space with a short URL.

Iv. Short URL service providing platform

At present, there are many platforms that provide short address service, such as:

  • Sina: sina. Lt /
  • Baidu: DWZ. Cn /
  • 0 x3:0 x3. Me /
  • MRW: MRW. So /

And so on there are a lot of, this can search there will be a lot of! But a note is that if you use a platform of short address service, we must ensure long-term reliable service, otherwise a period of time failure, we have been converted before the URL is over!

Here, take Baidu as an example to convert the address of our above blog into a short address as follows:

Of course, for our business, if they can provide their own short URL service that is better, do not need to be disciplined by others! (Chinese chips need to rise!!)

How to generate short address URL discussion

As for how to generate short address URL, there are many ways on the Internet, some are based on mapping, some are based on Hash, some are based on signature, but in general, it cannot meet the use of most scenarios, or it is a wrong design way. No more wheels here! The following is the discussion on this issue by zhihu user Iammutex. Here are the screenshots to study with you:

Source: author: iammutex links: https://www.zhihu.com/question/29270034/answer/46446911 zhihu copyright owned by the author. Commercial reprint please contact the author for authorization, non-commercial reprint please indicate the source.Copy the code

Generate short address URL need to pay attention to

From the discussion of zhihu user Iammutex on how to correctly generate short address URL, we know that short address can be correctly generated through the number transmitter. Key points of generation algorithm design are as follows:

(1) The initial value is 0. For each short link generated request, increase the value of the transmitter, and then convert the value to base 62 (A-ZA-Z0-9). For example, the value of the transmitter is 0 in the first request, which corresponds to base 62 a. The value of the number allocation unit for the 10001st request is 10000, which corresponds to sBc in base 62.

(2) Connect the domain name of the short link server with the base 62 value of the number allocator, that is, the URL of the short link, such as t.cn/sBc.

(3) Redirection process: after the short link is generated, the mapping relationship between the short link and the long link needs to be stored, that is, sBc -> URL. When the browser visits the short link server, it takes the original link according to the URL Path, and then redirects 302. Mappings can be stored using K-V, such as Redis or Memcache.

7. How to jump to where after generating short address?

For the discussion in this part, we can consider it as the whole interactive process, and the details are as follows:

(1) Users access short links: t.cn/RuPKzRW;

Tc n (2) the short link server receives the request, according to the URL path RuPKzRW long to get into the original link (KV cache database lookup) : blog.csdn.net/xlgen157387…

(3) the server returned a status code 302, set the Location in the response headers to: blog.csdn.net/xlgen157387…

(4) the browser to send the request to the https://blog.csdn.net/xlgen157387/article/details/79863301;

(5) Return the response;

8. Optimization scheme of short address transmitter

1. Algorithm optimization

Using the above algorithm, without judgment, so even for the same original URL, each generation of short links are also different, this will waste storage space (because of the need to store multiple short link to the same URL mapping), if the same short URL mapping into the same link, so you can save the storage space. The main ideas are as follows:

Scheme 1: Query the table

Each time a short link is generated, the mapping table is first searched to see if there is a mapping of the original URL. If there is, the result is directly returned. Obviously, this approach is inefficient.

Scheme 2: Use LRU local cache, space for time

The fixed-size LRU cache is used to store the results of the last N mappings, so that if a link is generated very frequently, the results can be found in the LRU cache and returned directly. This is a compromise between storage space and performance.

2. Scalable and highly available

If the short link generation service is deployed in a single machine, it has the disadvantages of insufficient performance to withstand massive concurrent access, and it becomes a single point of the system. If the machine breaks down, the whole service is unavailable. To solve this problem, the system can be clustered and “sharded”.

In the system architecture described above, if the transmitter is implemented by Redis, Redis is the bottleneck and single point of the system. Therefore, using the design idea of database sharding, multiple transmitter instances can be deployed, and each instance is responsible for the sending number of a specific number segment. For example, 10 Redis can be deployed, and each one is responsible for the sending number ending 0-9 of the number segment. Note that the step size of the transmitter should be set to 10 (number of instances).

In addition, the storage of the mapping between long links and short links can also be fragmented. Since there is no centralized storage location, additional services need to be developed to find the storage node of the original link corresponding to the short link, so that the mapping can be found on the correct node.

How to use code to achieve short address

1. Use random sequences to generate short addresses

Here finally comes to the point, many partners have been unable to resist, sorry to let you down, this is just a simple article, and can not put such a complex system to demonstrate clearly! In keeping with the principle of not reinventing the wheel, here’s one of the few reasonably viable open source projects to implement short addresses: urlshorter

Note: Urlshorter itself is based on random way to generate short address, not a short address sender, so there will be performance problems and conflicts, and zhihu user iammutex description of the implementation is still different! And there is no better open source project for reference about the way of short address sender!

Project address: gitee.com/tinyframewo…

2. Use the SnowFlake sender to generate short addresses

Implementation reference: github.com/beyondfengy… www.wolfbe.com/detail/2016…

Twitter’s SnowFlake algorithm, SnowFlake, is implemented in the Java language.

The SnowFlake algorithm is used to generate 64-bit ids, which can be stored as long integers, and can be used to produce unique ids in distributed systems, with ids generated in rough order. In this implementation, the generated 64-bit ID can be divided into five parts:

0-41 bit timestamp - 5-bit data center ID - 5-bit machine ID - 12-bit serial numberCopy the code

The 5-bit DATA center id and 5-bit machine ID are only allocated in the current implementation. If services have actual requirements, other allocation ratios can be used. For example, the 10-bit machine ID does not need the data center ID.

Java code implementation is as follows:

/** * The decimal conversion tool supports the maximum number of decimal and 62 conversion * 1, the decimal number to the specified string; * @author xuliugen * @date 2018/04/23 */ public class NumericConvertUtils {/** ** */ private static final char[] digits = {private static final char[] digits = {'0'.'1'.'2'.'3'.'4'.'5'.'6'.'7'.'8'.'9'.'a'.'b'.'c'.'d'.'e'.'f'.'g'.'h'.'i'.'j'.'k'.'l'.'m'.'n'.'o'.'p'.'q'.'r'.'s'.'t'.'u'.'v'.'w'.'x'.'y'.'z'.'A'.'B'.'C'.'D'.'E'.'F'.'G'.'H'.'I'.'J'.'K'.'L'.'M'.'N'.'O'.'P'.'Q'.'R'.'S'.'T'.'U'.'V'.'W'.'X'.'Y'.'Z'}; /** * converts a decimal number to a string in the specified base * @param number Decimal number * @param seed Specifies the base * @returnPublic static String toOtherNumberSystem(long number, int seed) {static String toOtherNumberSystem(long number, int seed) {if (number < 0) {
            number = ((long) 2 * 0x7fffffff) + number + 2;
        }
        char[] buf = new char[32];
        int charPos = 32;
        while ((number / seed) > 0) {
            buf[--charPos] = digits[(int) (number % seed)];
            number /= seed;
        }
        buf[--charPos] = digits[(int) (number % seed)];
        returnnew String(buf, charPos, (32 - charPos)); } /** * converts a number of other bases (in string form) to a decimal number * @param number A number of other bases (in string form) * @param seed specifies the base, which is the original base of the argument STR * @return*/ Public static long toDecimalNumber(String number, int seed) {char[] charBuf = number.tochararray ();if (seed == 10) {
            return Long.parseLong(number);
        }

        long result = 0, base = 1;

        for (int i = charBuf.length - 1; i >= 0; i--) {
            int index = 0;
            for(int j = 0, length = digits.length; j < length; J++) {// find the subscript of the corresponding character, which is the specific valueif (digits[j] == charBuf[i]) {
                    index = j;
                }
            }
            result += index * base;
            base *= seed;
        }
        returnresult; }}Copy the code
/** * Twitter's SnowFlake algorithm uses the SnowFlake algorithm to generate an integer, URL * @author beyond * @author xuliugen * @date 2018/04/23 */ public class SnowFlakeShortUrl {/** * */ private final static long START_TIMESTAMP = 1480166465631L; Private final static long SEQUENCE_BIT = 1; private final static long SEQUENCE_BIT = 1; Private final static long MACHINE_BIT = 5; Private final static long DATA_CENTER_BIT = 5; Private final static long MAX_SEQUENCE = -1l ^ (-1l << SEQUENCE_BIT); private final static long MAX_MACHINE_NUM = -1L ^ (-1L << MACHINE_BIT); private final static long MAX_DATA_CENTER_NUM = -1L ^ (-1L << DATA_CENTER_BIT); Private final static long MACHINE_LEFT = SEQUENCE_BIT; private final static long MACHINE_LEFT = SEQUENCE_BIT; private final static long DATA_CENTER_LEFT = SEQUENCE_BIT + MACHINE_BIT; private final static long TIMESTAMP_LEFT = DATA_CENTER_LEFT + DATA_CENTER_BIT; private long dataCenterId; // Private long machineId; Private long sequence = 0L; Private long lastTimeStamp = -1l; * @param dataCenterId dataCenterId * @param machineId machineId */ public SnowFlake(long dataCenterId, long machineId) {if (dataCenterId > MAX_DATA_CENTER_NUM || dataCenterId < 0) {
            throw new IllegalArgumentException("DtaCenterId can't be greater than MAX_DATA_CENTER_NUM or less than 0!");
        }
        if (machineId > MAX_MACHINE_NUM || machineId < 0) {
            throw new IllegalArgumentException("MachineId can't be greater than MAX_MACHINE_NUM or less than 0!"); } this.dataCenterId = dataCenterId; this.machineId = machineId; } /** * produces the next ID * @return
     */
    public synchronized long nextId() {
        long currTimeStamp = getNewTimeStamp();
        if (currTimeStamp < lastTimeStamp) {
            throw new RuntimeException("Clock moved backwards. Refusing to generate id");
        }

        if(currTimeStamp == lastTimeStamp) {// In the same millisecond, sequence = (sequence + 1) &max_sequence; // The number of sequences in the same millisecond has reached the maximumif(sequence == 0L) { currTimeStamp = getNextMill(); }}else{// Set the sequence number to 0 sequence = 0L in different milliseconds; } lastTimeStamp = currTimeStamp;return(currTimeStamp - START_TIMESTAMP) < < TIMESTAMP_LEFT / / part timestamp | dataCenterId < < DATA_CENTER_LEFT/part/data center | machineId < < MACHINE_LEFT / / machine identifier portion | sequence; } private longgetNextMill() {
        long mill = getNewTimeStamp();
        while (mill <= lastTimeStamp) {
            mill = getNewTimeStamp();
        }
        return mill;
    }

    private long getNewTimeStamp() {
        return System.currentTimeMillis();
    }

    public static void main(String[] args) {
        SnowFlake snowFlake = new SnowFlake(2, 3);

        for(int i = 0; i < (1 << 4); I++) {//10 base id = snowfla.nextid (); . / / 62 hexadecimal String convertedNumStr = NumericConvertUtils toOtherNumberSystem (id, 62); // Convert base 10 to base 62 system.out.println ("Base 10:" + id + "Base 62 :"+ convertedNumStr); //TODO performs specific storage operations, which can be stored in Redis, etc. // Base 62 to base 10 system.out.println ("Base 62:" + convertedNumStr + "Base 10 :"+ NumericConvertUtils.toDecimalNumber(convertedNumStr, 62)); System.out.println(); Base 10 :185784275776581632 base 62 :dITqmhW2He base 62 :dITqmhW2He base 10 :185784275776581632 base 10: 185784284689477632 62 base :dITqw17E6k 62 base :dITqw17E6k 10 base :185784284689477632 62 base :dITqw17E6l 62 base: 185784284689477633 DITqw17E6l 10 base :185784284689477633 10 base :185784284689477634 62 base :dITqw17E6m 62 base :dITqw17E6m 10 base :185784284689477634 10 base :185784284689477634 185784284689477635 62 base :dITqw17E6n 62 base :dITqw17E6n 10 base :185784284689477635 62 base :dITqw17E6o 62 base: 185784284689477636 DITqw17E6o 10 base :185784284689477636 10 base :185784284689477637 62 base :dITqw17E6p 62 base :dITqw17E6p 10 base :185784284689477637 10 base :185784284689477637 185784284693671936 62 base :dITqw1pfeo 62 base :dITqw1pfeo 10 base :185784284693671936 10 base: 185784284693671937 62 base :dITqw1pfep 62 base: DITqw1pfep 10 base :185784284693671937 10 base :185784284693671938 62 base :dITqw1pfeq 62 base :dITqw1pfeq 10 base :185784284693671938 10 base :dITqw1pfeq 10 base :185784284693671938 10 base :dITqw1pfeq 10 base :185784284693671938 Base :dITqw1pfer 62 base :dITqw1pfer 10 base: 185784284693671940 base :dITqw1pfes 62 base :dITqw1pfer 62 base :dITqw1pfer 10 base: 185784284693671940 base :dITqw1pfes 62 base: DITqw1pfes base 10 :185784284693671940 base 10 :185784284693671941 Base 62 :dITqw1pfet base 10 :185784284693671941 base 10 :dITqw1pfet base 10 :185784284693671941 base 10 :dITqw1pfet base 10 :185784284693671941 base 10: 185784284693671942 62 base :dITqw1pfeu 62 base :dITqw1pfeu 10 base: 185784284693671943 62 base :dITqw1pfev 62 base :dITqw1pfeu DITqw1pfev base 10 :185784284693671943 Base 10 :185784284693671944 Base 62 :dITqw1pfew base 62 :dITqw1pfew 10 :185784284693671944Copy the code

Last code address: gitee.com/xuliugen/co…

3. A universal ID transmitter is recommended

Code cloud address: gitee.com/robertleepe…

Here directly to you address, not in the introduction, want to know can move to view the document.

Ten,

So far, we learned together what is short address, the advantages of short address, how to choose a right way to achieve our short address, as well as in the code cloud found a fair short address generation project, I believe that at this time you can have a better understanding!


Reference article:

1, www.2cto.com/kf/201601/4…

2, blog.csdn.net/lz0426001/a…

3, blog.sina.com.cn/s/blog_16aa…

4, www.zhihu.com/question/29…

5, github.com/beyondfengy…

Java Backend Technology (ID: JavaITWork)1024, you can get it for free! Includes SSM, Spring family bucket, microservices, MySQL, MyCat, cluster, distributed, middleware, Linux, network, multi-threading, Jenkins, Nexus, Docker, ELK and so on free learning video, continue to update!