Short url in something that is a very popular at present, provide short url service is also quite a few, short url on weibo application more widely, because weibo have a limit to the length of the url, so will be a long url into a short url, is a great idea, its benefits are many, such as characters, less beautiful, Easy to publish and spread, etc.

Short url principle description

Here we take Baidu short url service as an example to explain. https://dwz.cn/XrxeqVqy this URL is my website homepage after the shortened effect. The specific steps are as follows:

  • Enter a short url in your browser’s address bar, or click on a short url in your browser
  • Do DNS resolution, get the IP address of dz.cn, and send a GET request to that address
  • The server gets XrxeqVqy and gets the corresponding long URL based on the short code
  • Redirect to the long URL.

Redirection can use 301 redirection can also use 302 redirection, the difference is that the former is permanent redirection, the latter is temporary redirection, under normal circumstances, short url once generated, will not change, so the use of 301 redirection will be better, can reduce the pressure on the server. If you do not need to count the number of times the link has been visited, or any other information, you can use 302 redirects.

How to shorten web addresses

Shorten the URL actually is to use a certain algorithm will be long URL processing, and then get the only short code, the short code and the long URL is one to one, can not repeat, and then the short code stored, when using the short code access, query the corresponding long URL, redirection can be.

Generally, there are two kinds of algorithms: self-increasing ID method and summary algorithm.

Since the ID method

The method of auto-increment ID is also called never repeat method, which adopts the principle of transmitter to realize, each URL corresponds to a number, and then auto-increment, which can be understood as ID, and then transform THE ID accordingly (such as base conversion), because THE ID is unique, so the converted result is also unique. Short urls are usually set to be six digits long, and each digit is made up of [A-z, A-z, 0-9], a total of 62^6 ~= 56.8 billion combinations, which is almost enough.

Finished with the theory, let’s take a look at the specific implementation algorithm steps:

First of all, obtain the long URL, calculate the long URL into MD5 value, determine whether there is a short code corresponding to the MD5 value in the library (this library can be redis or mysql to obtain noSql database), if so, directly return; If not, store the URL, MD5 to the database, and return the id value of the record, which is used as a basis for generating the short chain. The reason why THE URL is converted to MD5 is that the long URL may be a very long string, and it is time-consuming to query in the database, especially as the key value of Redis, it is not desirable.

And then returns the ID of the conversion of 61 into the system, remove one of the letters or Numbers used as a connector, here we use lowercase letters a, then after joining together to convert hexadecimal string, with random characters make up less than six, random character also played away with the connector to the corresponding character, to ensure that only six short code.

The short code has been generated, so just return it. After that is to input the short code to redirect, we can query the long URL corresponding to the short code in the library, and then redirect to the long URL address.

The flow chart is as follows

Here I will post the algorithm to generate short code, sample code using NodeJS, other business logic is easy to implement, not post:

function EncodeStr(number) { if(! ParseInt (number)){let codemsg = "Please pass in the number type "; return { success:false, msg:codemsg } } let randomInsertStr = 'a'; var chars = '0123456789bcdefghigklmnopqrstuvwxyzABCDEFGHIGKLMNOPQRSTUVWXYZ'.split(''), radix = chars.length, qutient = +number, arr = []; do { mod = qutient % radix; qutient = (qutient - mod) / radix; arr.unshift(chars[mod]); } while (qutient); let codeStr = arr.join(''); codeStr = codeStr.length < 6 ? (codeStr+randomInsertStr + randomWord(false,5-codeStr.length,0,[randomInsertStr])):codeStr; return codeStr; } let randomWord = (randomFlag, min, max,ruledOutStr)=>{ var str = "", range = min, pos='', arr = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']; If (ruledOutStr){ruledOutstr.map (item=>{arr = arr.join("").split(item).join(").split(")})} if(randomFlag){ range = Math.round(Math.random() * (max-min)) + min; } for(var i=0; i<range; i++){ pos = Math.round(Math.random() * (arr.length-1)); str += arr[pos]; } return str; };Copy the code

The algorithm

There is a probability of repetition in algorithms, and the more the probability of repetition increases. First, let’s talk about specific ideas: First, md5 generates 32-bit signature string for long URL, which is divided into 4 segments, 8 bytes for each segment. The four segments are cyclically processed, 8 bytes are taken and treated as hexadecimal string and 0x3FFFFFFf (30 bits 1) and operation, that is, the processing over 30 bits is ignored. The 30 bits are divided into six segments, each of which takes a specific character as an index of the alphabet, followed by a 6-bit string. The total MD5 string can obtain four 6-bit strings, any of which can be used as the short URL address of the long URL. Query whether the short URL exists in the library, if it does, then try again, there is no direct deposit can be.

You are welcome to correct me if there is anything wrong