1. Basic principle of MD5
MD5 is message-digest Algorithm 5, which is used to ensure information consistency. The principle of MD5 is briefly described. Input information is processed in 512-bit packets, and each packet is divided into 16 32-bit packets. After a series of processing, the output of the algorithm consists of four 32-bit packets, which are cascaded to generate a 128-bit hash value.
2. Algorithm flow
The following is an example of an SO library that invokes the standard MD5 algorithm for encryption
F5 is converted to pseudo-C code as follows
A standard MD5 algorithm first evaluates MD5Init(), MD5Update(), MD5Final() in that order, and finally computs a 32-bit hexadecimal string. It then outputs the string in hexadecimal form if necessary (sprintf(str_tmp, “%02x”, decrypt[v1]), otherwise it looks like garbled.
- MD5Init()
This step is the initialization variable. The initial 128-bit value is the initial link variable. These parameters are used for the first round of operation, and are represented in big-enode order: A=0x01234567, B=0x89ABCDEF, C=0xFEDCBA98, D=0x76543210. Each variable gives the value high byte stored at the low memory address and low byte stored at the high memory address, i.e., big-endian byte order. In the program, the values of variables A, B, C and D are 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476 respectively)
F5 is converted to pseudo-C code as follows
Basically, in the general SO, we can see that the values are initialized to 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
- MD5Update (), MD5Final ()
The algorithm flow of each group is as follows:
The first group needs to copy the above four link variables into the other four variables: A to A, B to B, C to C, and D to D. The variables starting from the second group are the results of the previous group, i.e., A = A, B = B, C = C, D = D.
The main cycle has four rounds (MD4 only has three), and each cycle is similar. The first round of 16 operations. Each operation performs a nonlinear function operation on three of a, B, C, and D, and then adds the result to a fourth variable, a subgroup of the text, and a constant. Shift the result to the left by an indeterminate number and add one of a, B, C, or D. Finally replace one of A, B, C or D with the result.
Here are the four nonlinear functions used in each operation (one for each round).
F( X ,Y ,Z ) = ( X & Y ) | ( (~X) & Z )
G( X ,Y ,Z ) = ( X & Z ) | ( Y & (~Z) )
H( X ,Y ,Z ) =X ^ Y ^ Z
I( X ,Y ,Z ) =Y ^ ( X | (~Z) )
(& is And (And), | And (Or), ~ is (Not), ^ is exclusive Or (Xor))
Description of these four functions: If the corresponding bits of X, Y and Z are independent and uniform, then each bit of the result should also be independent and uniform.
F is a bitwise operation. That is, if X, then Y, otherwise Z. The function H is the bitwise parity operator.
Suppose Mj represents the JTH subgroup of the message (from 0 to 15), the constant Ti is the integer part of 4294967296*abs(sin(I)), and I takes the value from 1 to 64 in radians. (4294967296=232)
FF (a, b, c, d, Mj, s, ti) operation for a = b + ((a + F (b, c, d) + Mj + ti) < < s)
GG (a, b, c, d, Mj, s, ti) operation for a = b + ((a + G (b, c, d) + Mj + ti) < < s)
HH (a, b, c, d, Mj, s, ti) operation for a = b + ((a + H (b, c, d) + Mj + ti) < < s)
II (a, b, c, d, Mj, s, ti) operation for a = b + ((a + I (b, c, d) + Mj + ti) < < s)
Note: “<<” indicates left shift of the loop, not left shift.
The four rounds (64 steps in total) are:
The first round
FF(a ,b ,c ,d ,M0 ,7 ,0xd76aa478 )
FF(d ,a ,b ,c ,M1 ,12 ,0xe8c7b756 )
FF(c ,d ,a ,b ,M2 ,17 ,0x242070db )
FF(b ,c ,d ,a ,M3 ,22 ,0xc1bdceee )
FF(a ,b ,c ,d ,M4 ,7 ,0xf57c0faf )
FF(d ,a ,b ,c ,M5 ,12 ,0x4787c62a )
FF(c ,d ,a ,b ,M6 ,17 ,0xa8304613 )
FF(b ,c ,d ,a ,M7 ,22 ,0xfd469501)
FF(a ,b ,c ,d ,M8 ,7 ,0x698098d8 )
FF(d ,a ,b ,c ,M9 ,12 ,0x8b44f7af )
FF(c ,d ,a ,b ,M10 ,17 ,0xffff5bb1 )
FF(b ,c ,d ,a ,M11 ,22 ,0x895cd7be )
FF(a ,b ,c ,d ,M12 ,7 ,0x6b901122 )
FF(d ,a ,b ,c ,M13 ,12 ,0xfd987193 )
FF(c ,d ,a ,b ,M14 ,17 ,0xa679438e )
FF(b ,c ,d ,a ,M15 ,22 ,0x49b40821 )
The second round
GG(a ,b ,c ,d ,M1 ,5 ,0xf61e2562 )
GG(d ,a ,b ,c ,M6 ,9 ,0xc040b340 )
GG(c ,d ,a ,b ,M11 ,14 ,0x265e5a51 )
GG(b ,c ,d ,a ,M0 ,20 ,0xe9b6c7aa )
GG(a ,b ,c ,d ,M5 ,5 ,0xd62f105d )
GG(d ,a ,b ,c ,M10 ,9 ,0x02441453 )
GG(c ,d ,a ,b ,M15 ,14 ,0xd8a1e681 )
GG(b ,c ,d ,a ,M4 ,20 ,0xe7d3fbc8 )
GG(a ,b ,c ,d ,M9 ,5 ,0x21e1cde6 )
GG(d ,a ,b ,c ,M14 ,9 ,0xc33707d6 )
GG(c ,d ,a ,b ,M3 ,14 ,0xf4d50d87 )
GG(b ,c ,d ,a ,M8 ,20 ,0x455a14ed )
GG(a ,b ,c ,d ,M13 ,5 ,0xa9e3e905 )
GG(d ,a ,b ,c ,M2 ,9 ,0xfcefa3f8 )
GG(c ,d ,a ,b ,M7 ,14 ,0x676f02d9 )
GG(b ,c ,d ,a ,M12 ,20 ,0x8d2a4c8a )
In the third round
HH(a ,b ,c ,d ,M5 ,4 ,0xfffa3942 )
HH(d ,a ,b ,c ,M8 ,11 ,0x8771f681 )
HH(c ,d ,a ,b ,M11 ,16 ,0x6d9d6122 )
HH(b ,c ,d ,a ,M14 ,23 ,0xfde5380c )
HH(a ,b ,c ,d ,M1 ,4 ,0xa4beea44 )
HH(d ,a ,b ,c ,M4 ,11 ,0x4bdecfa9 )
HH(c ,d ,a ,b ,M7 ,16 ,0xf6bb4b60 )
HH(b ,c ,d ,a ,M10 ,23 ,0xbebfbc70 )
HH(a ,b ,c ,d ,M13 ,4 ,0x289b7ec6 )
HH(d ,a ,b ,c ,M0 ,11 ,0xeaa127fa )
HH(c ,d ,a ,b ,M3 ,16 ,0xd4ef3085 )
HH(b ,c ,d ,a ,M6 ,23 ,0x04881d05 )
HH(a ,b ,c ,d ,M9 ,4 ,0xd9d4d039 )
HH(d ,a ,b ,c ,M12 ,11 ,0xe6db99e5 )
HH(c ,d ,a ,b ,M15 ,16 ,0x1fa27cf8 )
HH(b ,c ,d ,a ,M2 ,23 ,0xc4ac5665 )
The fourth round of
II(a ,b ,c ,d ,M0 ,6 ,0xf4292244 )
II(d ,a ,b ,c ,M7 ,10 ,0x432aff97 )
II(c ,d ,a ,b ,M14 ,15 ,0xab9423a7 )
II(b ,c ,d ,a ,M5 ,21 ,0xfc93a039 )
II(a ,b ,c ,d ,M12 ,6 ,0x655b59c3 )
II(d ,a ,b ,c ,M3 ,10 ,0x8f0ccc92 )
II(c ,d ,a ,b ,M10 ,15 ,0xffeff47d )
II(b ,c ,d ,a ,M1 ,21 ,0x85845dd1 )
II(a ,b ,c ,d ,M8 ,6 ,0x6fa87e4f )
II(d ,a ,b ,c ,M15 ,10 ,0xfe2ce6e0 )
II(c ,d ,a ,b ,M6 ,15 ,0xa3014314 )
II(b ,c ,d ,a ,M13 ,21 ,0x4e0811a1 )
II(a ,b ,c ,d ,M4 ,6 ,0xf7537e82 )
II(d ,a ,b ,c ,M11 ,10 ,0xbd3af235 )
II(c ,d ,a ,b ,M2 ,15 ,0x2ad7d2bb )
II(b ,c ,d ,a ,M9 ,21 ,0xeb86d391 )
After all this is done, add a, B, C and D to the original, respectively. A = a + A, b = b + b, C = c + c, d = d + D
The key pseudocode is as follows:
void __fastcall MD5Transform(unsigned int *state, unsigned __int8 *block)
{
unsigned int *v2; // r4@1
unsigned int v3; // r5@1
unsigned int v4; // r6@1
unsigned int v5; // ST14_4@1
unsigned int v6; // r7@1
int v7; // r1@1
int v8; // r1@1
int v9; // r2@1
------skip--------
int v132; // r0@1
int v133; // r3@1
unsigned int v134; // r5@1
unsigned int v135; // r3@1
unsigned int x[64]; // [sp+48h] [bp-118h]@1
v2 = state;
v3 = state[1];
v4 = *state;
v5 = state[2];
v6 = state[3];
MD5Decode(x, block, 0x40u);
v7 = __ROR4__(x[0] - 680876936 + v4 + (v6 & ~v3 | v5 & v3), 25);
v8 = v7 + v3;
v9 = __ROR4__(x[1] - 389564586 + v6 + (v5 & ~v8 | v3 & v8), 20);
v10 = v9 + v8;
v11 = __ROR4__(x[2] + 606105819 + v5 + (v8 & v10 | v3 & ~v10), 15);
v12 = v11 + v10;
v13 = __ROR4__(v3 + x[3] - 1044525330 + (v10 & v12 | v8 & ~v12), 10);
v14 = v13 + v12;
v15 = __ROR4__(v8 + x[4] - 176418897 + (v12 & v14 | v10 & ~v14), 25);
v16 = v15 + v14;
v17 = __ROR4__(v10 + x[5] + 1200080426 + (v14 & v16 | v12 & ~v16), 20);
v18 = v17 + v16;
v19 = __ROR4__(v12 + x[6] - 1473231341 + (v16 & v18 | v14 & ~v18), 15);
v20 = v19 + v18;
v21 = __ROR4__(v14 + x[7] - 45705983 + (v18 & v20 | v16 & ~v20), 10);
v22 = v21 + v20;
v23 = __ROR4__(v16 + x[8] + 1770035416 + (v20 & v22 | v18 & ~v22), 25);
v24 = v23 + v22;
v25 = __ROR4__(v18 + x[9] - 1958414417 + (v22 & v24 | v20 & ~v24), 20);
v26 = v25 + v24;
v27 = __ROR4__(v20 + x[10] - 42063 + (v24 & v26 | v22 & ~v26), 15);
v28 = v27 + v26;
v29 = __ROR4__(v22 + x[11] - 1990404162 + (v26 & v28 | v24 & ~v28), 10);
v30 = v29 + v28;
v31 = __ROR4__(v24 + x[12] + 1804603682 + (v28 & v30 | v26 & ~v30), 25);
v32 = v31 + v30;
v33 = __ROR4__(x[13] - 40341101 + v26 + (v30 & v32 | v28 & ~v32), 20);
v34 = v33 + v31 + v30;
v35 = __ROR4__(x[14] - 1502002290 + v28 + ((v31 + v30) & v34 | ~v34 & v30), 15);
v36 = v35 + v34;
v37 = __ROR4__(x[15] + 1236535329 + v30 + (v34 & v36 | ~v36 & (v31 + v30)), 10);
v38 = v37 + v36;
v39 = __ROR4__(x[1] - 165796510 + v32 + (~v34 & v36 | v34 & v38), 27);
v40 = v39 + v38;
v41 = __ROR4__(x[6] - 1069501632 + v34 + (~v36 & v38 | v36 & v40), 23);
v42 = v41 + v40;
v43 = __ROR4__(v36 + x[11] + 643717713 + (v40 & ~v38 | v38 & v42), 18);
v44 = v43 + v42;
v45 = __ROR4__(v38 + x[0] - 373897302 + (v42 & ~v40 | v40 & v44), 12);
v46 = v45 + v44;
v47 = __ROR4__(v40 + x[5] - 701558691 + (v44 & ~v42 | v42 & v46), 27);
v48 = v47 + v46;
v49 = __ROR4__(v42 + x[10] + 38016083 + (v46 & ~v44 | v44 & v48), 23);
v50 = v49 + v48;
v51 = __ROR4__(v44 + x[15] - 660478335 + (v48 & ~v46 | v46 & v50), 18);
v52 = v51 + v50;
v53 = __ROR4__(v46 + x[4] - 405537848 + (v50 & ~v48 | v48 & v52), 12);
v54 = v53 + v52;
v55 = __ROR4__(v48 + x[9] + 568446438 + (v52 & ~v50 | v50 & v54), 27);
v56 = v55 + v54;
v57 = __ROR4__(v50 + x[14] - 1019803690 + (v54 & ~v52 | v52 & v56), 23);
v58 = v57 + v56;
v59 = __ROR4__(v52 + x[3] - 187363961 + (v56 & ~v54 | v54 & v58), 18);
v60 = v59 + v58;
v61 = __ROR4__(v54 + x[8] + 1163531501 + (v58 & ~v56 | v56 & v60), 12);
v62 = v61 + v60;
v63 = __ROR4__(v56 + x[13] - 1444681467 + (v60 & ~v58 | v58 & v62), 27);
v64 = v63 + v62;
v65 = __ROR4__(v58 + x[2] - 51403784 + (v62 & ~v60 | v60 & v64), 23);
v66 = v65 + v64;
v67 = __ROR4__(x[7] + 1735328473 + v60 + (v64 & ~v62 | v62 & v66), 18);
v68 = v67 + v66;
v69 = __ROR4__(x[12] - 1926607734 + v62 + (v66 & ~v64 | v64 & v68), 12);
v70 = v69 + v68;
v71 = __ROR4__(x[5] - 378558 + v64 + (v68 ^ v66 ^ v70), 28);
v72 = v71 + v70;
v73 = __ROR4__(x[8] - 2022574463 + v66 + (v70 ^ v68 ^ v72), 21);
v74 = v73 + v72;
v75 = __ROR4__(x[11] + 1839030562 + v68 + (v72 ^ v70 ^ v74), 16);
v76 = v75 + v74;
v77 = __ROR4__(x[14] - 35309556 + v70 + (v74 ^ v72 ^ v76), 9);
v78 = v77 + v76;
v79 = __ROR4__((v76 ^ v74 ^ v78) + x[1] - 1530992060 + v72, 28);
v80 = v79 + v78;
v81 = __ROR4__((v78 ^ v76 ^ v80) + x[4] + 1272893353 + v74, 21);
v82 = v81 + v80;
v83 = __ROR4__((v80 ^ v78 ^ v82) + x[7] - 155497632 + v76, 16);
v84 = v83 + v82;
v85 = __ROR4__((v82 ^ v80 ^ v84) + x[10] - 1094730640 + v78, 9);
v86 = v85 + v84;
v87 = __ROR4__((v84 ^ v82 ^ v86) + x[13] + 681279174 + v80, 28);
v88 = v87 + v86;
v89 = __ROR4__((v86 ^ v84 ^ v88) + x[0] - 358537222 + v82, 21);
v90 = v89 + v88;
v91 = __ROR4__((v88 ^ v86 ^ v90) + x[3] - 722521979 + v84, 16);
v92 = v91 + v90;
v93 = __ROR4__((v90 ^ v88 ^ v92) + x[6] + 76029189 + v86, 9);
v94 = v93 + v92;
v95 = __ROR4__((v92 ^ v90 ^ v94) + x[9] - 640364487 + v88, 28);
v96 = v95 + v94;
v97 = __ROR4__((v94 ^ v92 ^ v96) + x[12] - 421815835 + v90, 21);
v98 = v97 + v96;
v99 = __ROR4__(v92 + x[15] + 530742520 + (v96 ^ v94 ^ v98), 16);
v100 = v99 + v98;
v101 = __ROR4__(v94 + x[2] - 995338651 + (v98 ^ v96 ^ v100), 9);
v102 = v101 + v100;
v103 = __ROR4__(x[0] - 198630844 + v96 + ((~v98 | v102) ^ v100), 26);
v104 = v103 + v102;
v105 = __ROR4__(x[7] + 1126891415 + v98 + ((~v100 | v104) ^ v102), 22);
v106 = v105 + v104;
v107 = __ROR4__(x[14] - 1416354905 + v100 + ((~v102 | v106) ^ v104), 17);
v108 = v107 + v106;
v109 = __ROR4__(v102 + x[5] - 57434055 + ((~v104 | v108) ^ v106), 11);
v110 = v109 + v108;
v111 = __ROR4__(x[12] + 1700485571 + v104 + ((~v106 | v110) ^ v108), 26);
v112 = v111 + v110;
v113 = __ROR4__(x[3] - 1894986606 + v106 + ((~v108 | v112) ^ v110), 22);
v114 = v113 + v112;
v115 = __ROR4__(x[10] - 1051523 + v108 + ((~v110 | v114) ^ v112), 17);
v116 = v115 + v114;
v117 = __ROR4__(x[1] - 2054922799 + v110 + ((~v112 | v116) ^ v114), 11);
v118 = v117 + v116;
v119 = __ROR4__(x[8] + 1873313359 + v112 + ((~v114 | v118) ^ v116), 26);
v120 = v119 + v118;
v121 = __ROR4__(x[15] - 30611744 + v114 + ((~v116 | v120) ^ v118), 22);
v122 = v121 + v120;
v123 = __ROR4__(x[6] - 1560198380 + v116 + ((~v118 | v122) ^ v120), 17);
v124 = v123 + v122;
v125 = __ROR4__(x[13] + 1309151649 + v118 + ((~v120 | v124) ^ v122), 11);
v126 = v125 + v124;
v127 = __ROR4__(x[4] - 145523070 + v120 + ((~v122 | v126) ^ v124), 26);
v128 = v127 + v126;
v129 = __ROR4__(x[11] - 1120210379 + v122 + ((~v124 | v128) ^ v126), 22);
v130 = v129 + v128;
v131 = __ROR4__(x[2] + 718787259 + v124 + ((~v126 | v130) ^ v128), 17);
v132 = v131 + v130;
v133 = __ROR4__(x[9] - 343485551 + v126 + ((~v128 | v132) ^ v130), 11);
*v2 += v128;
v134 = v2[3];
v2[1] += v132 + v133;
v135 = v2[2];
v2[3] = v134 + v130;
v2[2] = v135 + v132;
}
Copy the code
The algorithm is then continued with the next block of data.
The final output is a cascade of A, B, C, and D
3. IDA dynamic debugging
At MD5Update(), the breakpoint is found before entering
MD5(“[hbzFZ9dew0]@YWROdW09MWJkdXNzPWJpcnRoZGF5PTIwMTgtMDUtMTFjaGFubmVsPWJhaWR1emh1c2hvdWN1aWQ9OEY3ODcxMzNERTdDOTQ1NUZEMk I5MTc3MTFDNzEwMzN8NDAzNTU2NzYwMDk0MzUzZGlnRmxhZz0xaG90PTBpc0FJPTBvdnVsYXRpb25UaW1lPTBwZXJpb2Q9MnBuPTEwcHJlZ1N0PTNxdWVyeT 3lrp3lrp3og47orrBybj0xMHRva2VuPTFfODA1ZmE3MWI2NzU3MDZjYjNmYmNiZjBjYTFjMzc3OTB0eXBlPTB2Yz0yMzZ2b2ljZT0w”) = d7166568860b42b3dcb13c695e0fc5b0
When I came out, it became
And then finally after sprintf formatting the output will become
4. MD5 encryption of Python
import hashlib
m = hashlib.md5()
m.update("123".encode("utf-8"))
result = m.hexdigest() # m.digest() returns a hexadecimal format similar to the one above, before sprintf formatting
print (result)
Copy the code
Note: MD5Update() executed twice is actually a splicing effect (such as md5.update(md5.update(“123″.encode(” utF-8 “))) is actually equal to md5.update(“123123″.encode(” utF-8 “)), So to avoid that we should re-instantiate every time, MD5Init()
5. Reveal the hash function in MD5 encryption & Inspeckage in Java layer
Here is an example of an APK decompilation:
In reverse exists in the use of Java. Java layer security. The MessageDigest class MD5 encryption, and can be used directly Inspeckage to check MD5 encrypted string before and after the difference
- http://repo.xposed.info/module/mobi.acpm.inspeckage
The MD5 string hook is implemented in HashHook. Java
First hook Java. Security. MessageDigest getInstance, for what is a hash type
And then hook Java. Security. MessageDigest. Update, get the string before the update
Finally hook Java. Security. MessageDigest. Digest, a character before formatting output
conclusion
We can check whether there is a value that needs to be initialized in MD5Init, then check whether there is MD5 encryption, and then hook MD5Update() before and after the input and output string.
In addition, you can use the following parameters to quickly test whether it is MD5 algorithm:
MD5(“”) = d41d8cd98f00b204e9800998ecf8427e
MD5(“a”) = 0cc175b9c0f1b6a831c399e269772661
MD5(“abc”) = 900150983cd24fb0d6963f7d28e17f72
reference
- https://baike.baidu.com/item/MD5/212708?fr=aladdin