Candy · 2015/07/06 15:17

Translated book: Reverse Engineering for Beginners

Author: Dennis Yurichev

Translator: Candy

54.1 Introduction


As you may know, there are a number of well-known decompilers for Java (or for JVM bytecode in general). The reason is that JVM bytecode is considerably easier to decompile than lower-level x86 code:

A). It carries much more information about data types.

B). The JVM (Java Virtual Machine) memory model is much stricter and more clearly defined.

C). The Java compiler does not do any optimization (the JVM JIT does that at run time), so the bytecode in class files is usually quite clear and readable.

When is JVM bytecode knowledge useful?

A). Quick-and-dirty patching of class files, with no need to recompile the decompiler’s output.

B). Analyzing obfuscated code.

C). Creating your own obfuscator.

D). Creating a compiler code generator (back end) that targets the JVM.

We’ll start with some short pieces of code. Unless otherwise stated, JDK 1.7 is used throughout.

The command used everywhere to decompile class files is: javap -c -verbose.

Many of the examples provided in this book use this.

54.2 Returning a value

Probably the simplest Java function is one that returns some value. Oh, and we must keep in mind that there are no “free” functions in Java in the usual sense; they are “methods”. Each method belongs to some class, so a method cannot be defined outside a class. I’ll call them “functions” anyway, for simplicity.

#!java
public class ret
{
    public static int main(String[] args)
    {
        return 0;
    }
}

Compile it.

javac ret.java

Decompile using Java standard tools.

javap -c -verbose ret.class

You get the result:

#!java
public static int main(java.lang.String[]);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=1, locals=1, args_size=1
0: iconst_0
1: ireturn

The Java developers decided that 0 is one of the most frequently used constants in programming, so there is a separate short one-byte iconst_0 instruction that pushes 0. There are also iconst_1 (which pushes 1), iconst_2, and so on, up to iconst_5. There is also iconst_m1, which pushes -1.

The same approach is used in MIPS, where a separate register is dedicated to the zero constant (see section 3.5.2).

The stack is used in the JVM both for passing data to called functions and for returning values. So iconst_0 pushes 0 onto the stack, and ireturn (the i stands for integer) returns the integer value at the top of the stack.

Let’s change the example slightly so that it returns 1234:

#!java
public class ret
{
    public static int main(String[] args)
    {
        return 1234;
    }
}

We get:

public static int main(java.lang.String[]);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=1, locals=1, args_size=1
0: sipush 1234
3: ireturn

sipush (short integer) pushes 1234 onto the stack. The “short” in the name implies that a 16-bit value is pushed, and the number 1234 does indeed fit into 16 bits.
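The offsets in the listing jump from 0 to 3 because sipush occupies three bytes: one opcode byte followed by a 16-bit operand. The following is only a sketch of that encoding. It assumes the opcode values 0x11 (sipush) and 0xAC (ireturn) from the JVM specification, and the class and method names are invented for illustration.

```java
// Hypothetical helper, not part of any JDK tool: hand-assembles
// "sipush <value>; ireturn" as the bytes would sit in a method body.
class SipushSketch {
    static final int SIPUSH = 0x11;   // opcode value per the JVM specification
    static final int IRETURN = 0xAC;  // opcode value per the JVM specification

    static byte[] assemble(int value) {
        if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
            throw new IllegalArgumentException("sipush takes a 16-bit operand");
        }
        return new byte[] {
            (byte) SIPUSH,        // offset 0: opcode
            (byte) (value >> 8),  // offset 1: high byte of the operand (big-endian)
            (byte) value,         // offset 2: low byte of the operand
            (byte) IRETURN        // offset 3: the next instruction
        };
    }

    public static void main(String[] args) {
        // Print the assembled bytes in hexadecimal.
        for (byte b : assemble(1234)) {
            System.out.printf("%02x ", b & 0xFF);
        }
        System.out.println();
    }
}
```

Since 1234 is 0x04D2, the operand bytes are 04 and d2, and ireturn lands at offset 3 as in the javap output.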

What about larger values?

#!java
public class ret
{
    public static int main(String[] args)
    {
        return 12345678;
    }
}

Listing 54.3 Constant section

...
#2 = Integer 12345678
...


public static int main(java.lang.String[]);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=1, locals=1, args_size=1
0: ldc #2 // int 12345678
2: ireturn

It is not possible to encode a 32-bit number inside a JVM instruction opcode; the developers did not leave that possibility. So the 32-bit number 12345678 is stored in the so-called constant pool, which is, let’s say, a library of the most used constants (including strings, objects, etc.).

This way of passing constants is not unique to the JVM. MIPS, ARM and other RISC CPUs cannot encode a 32-bit number in a 32-bit opcode either, so RISC code (including MIPS and ARM) has to construct the value in several steps or keep it in the data segment (see sections 28.3 and 29.1).

MIPS code also traditionally has a constant pool, called the “literal pool”. Its segments are called “.lit4” (for 32-bit single-precision floating-point constants) and “.lit8” (for 64-bit double-precision floating-point constants).

Let’s try some other data types, starting with boolean:

#!java
public class ret
{
    public static boolean main(String[] args)
    {
        return true;
    }
}

public static boolean main(java.lang.String[]);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=1, locals=1, args_size=1
0: iconst_1
1: ireturn

This JVM bytecode is no different from the one that returns the integer 1. The 32-bit data slots on the stack are also used for boolean values, as in C/C++. But a returned boolean cannot be used as an integer or vice versa: the type information is stored in the class file and checked at run time.

The same is true for 16-bit short integers.

#!java
public class ret
{

    public static short main(String[] args)
    {
        return 1234;
    }
}

public static short main(java.lang.String[]);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=1, locals=1, args_size=1
0: sipush 1234
3: ireturn

And char?

#!java
public class ret
{
    public static char main(String[] args)
    {
        return 'A';
    }
}

public static char main(java.lang.String[]);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=1, locals=1, args_size=1
0: bipush 65
2: ireturn

bipush means “push byte”. Needless to say, a Java char is a 16-bit UTF-16 character, equivalent to a short, but the ASCII code of the character “A” is 65, so the instruction that pushes a byte onto the stack can be used.
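Together with iconst_0, sipush and ldc from the earlier listings, bipush completes the set of instructions used so far to push an int constant. The sketch below picks an instruction by value range in the same way the listings do. It is an illustration of the pattern, not the compiler’s actual code, and the class and method names are invented.

```java
// Hypothetical helper: chooses a push instruction for an int constant
// by value range, mirroring what the javap listings in this section show.
class PushChoiceSketch {
    static String pushFor(int v) {
        if (v >= -1 && v <= 5) {
            // One-byte instructions with the constant built into the opcode.
            return v == -1 ? "iconst_m1" : "iconst_" + v;
        }
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) {
            return "bipush " + v;   // 8-bit operand
        }
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
            return "sipush " + v;   // 16-bit operand
        }
        // Anything wider has to live in the constant pool.
        return "ldc <constant pool entry for " + v + ">";
    }

    public static void main(String[] args) {
        int[] samples = {0, 65, 1234, 12345678};
        for (int v : samples) {
            System.out.println(v + " -> " + pushFor(v));
        }
    }
}
```

The four sample values are the constants returned by the examples in this section, so the output can be compared with the listings directly.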

Let’s also try byte:

#!java
public class retc
{
    public static byte main(String[] args)
    {
        return 123;
    }
}

public static byte main(java.lang.String[]);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=1, locals=1, args_size=1
0: bipush 123
2: ireturn

Why, you might ask, bother with a 16-bit short data type that internally works as a 32-bit integer? And why use a char data type if it is the same as short?

The answer is simple: data type control and source code readability. A char may be essentially the same as a short, but we quickly grasp that it is a placeholder for a 16-bit UTF-16 character and not for some other integer value. Using short shows everyone that the variable’s range is limited to 16 bits. It is also a very good idea to use the boolean type where needed, instead of a C-style int.
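One way to observe from the Java side that the narrow types are handled as 32-bit integers is that arithmetic on them yields an int. This is language-level behaviour (numeric promotion), shown here only as a small sketch that matches what the bytecode above suggests. The class and method names are invented.

```java
// Hypothetical demo class: shows that short and char values widen to int.
class NarrowTypesSketch {
    // Returns the boxed type of (short + short).
    static String typeOfSum(short a, short b) {
        Object boxed = a + b;   // the sum is an int, so it boxes to Integer
        return boxed.getClass().getSimpleName();
    }

    // A char converts to int without a cast; the result is its UTF-16 code unit.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(typeOfSum((short) 1000, (short) 234));
        System.out.println(codeOf('A'));
    }
}
```

Assigning the sum back to a short would need an explicit cast, which is the type control the paragraph above refers to.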

Java also has a 64-bit integer data type:

#!java
public class ret3
{
    public static long main(String[] args)
    {
        return 1234567890123456789L;
    }
}

Listing 54.4 Constant section

...
#2 = Long 1234567890123456789l
...
public static long main(java.lang.String[]);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=1, args_size=1
0: ldc2_w #2 // long 1234567890123456789l
3: lreturn

The 64-bit number is also stored in the constant pool; ldc2_w loads it and lreturn (long return) returns it. The ldc2_w instruction is also used to load double-precision floating-point numbers (which also occupy 64 bits) from the constant pool:

#!java
public class ret
{
    public static double main(String[] args)
    {
        return 123.456d;
    }
}

Listing 54.5 Constant section

...
#2 = Double 123.456d
...
public static double main(java.lang.String[]);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=1, args_size=1
0: ldc2_w #2 // double 123.456d
3: dreturn

dreturn stands for “return double”.

Finally, single-precision floating point numbers:

#!java
public class ret
{
    public static float main(String[] args)
    {
        return 123.456f;
    }
}

Listing 54.6 Constant section

...
#2 = Float 123.456f
...
public static float main(java.lang.String[]);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=1, locals=1, args_size=1
0: ldc #2 // float 123.456f
2: freturn

The ldc instruction used here is the same one that loads 32-bit integer constants from the constant pool. freturn stands for “return float”.

Now, what about a function that returns nothing?

#!java
public class ret
{
    public static void main(String[] args)
    {
        return;
    }
}

public static void main(java.lang.String[]);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=0, locals=1, args_size=1
0: return

This means that the return instruction returns control without returning an actual value. Knowing this, it is easy to deduce a function’s (or method’s) return type from its last instruction.
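The deduction described above can be written down as a lookup table. The sketch below covers the return instructions seen in this section, plus areturn (reference return), which is part of the JVM instruction set but does not appear in these listings. The class and method names are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps a method's final return instruction
// to the kind of value the method returns.
class ReturnTypeSketch {
    private static final Map<String, String> KINDS = new LinkedHashMap<>();
    static {
        KINDS.put("ireturn", "int, short, char, byte or boolean");
        KINDS.put("lreturn", "long");
        KINDS.put("freturn", "float");
        KINDS.put("dreturn", "double");
        KINDS.put("areturn", "object or array reference"); // not shown in this section
        KINDS.put("return", "void");
    }

    static String returnKind(String lastInstruction) {
        String kind = KINDS.get(lastInstruction);
        if (kind == null) {
            throw new IllegalArgumentException("not a return instruction: " + lastInstruction);
        }
        return kind;
    }

    public static void main(String[] args) {
        for (String insn : KINDS.keySet()) {
            System.out.println(insn + " -> " + returnKind(insn));
        }
    }
}
```

Note that ireturn alone cannot distinguish int from the narrower types; for that the method signature stored in the class file is needed.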

54.3 Simple calculation function


Let’s move on to a simple function.

#!java
public class calc
{
    public static int half(int a)
    {
        return a/2;
    }
}

Here iconst_2 is used:

public static int half(int);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=1, args_size=1
0: iload_0
1: iconst_2
2: idiv
3: ireturn

iload_0 takes the zeroth function argument and pushes it onto the stack. iconst_2 pushes 2 onto the stack. After these two instructions have executed, the stack looks like this:

      +---+
TOS ->| 2 |
      +---+
      | a |
      +---+

idiv takes the two values at the top of the stack, divides one by the other, and leaves the result at the top of the stack:

      +--------+
TOS ->| result |
      +--------+

ireturn takes it and returns it. Let’s move on to double-precision floating-point numbers:

#!java
public class calc
{
    public static double half_double(double a)
    {
        return a/2.0;
    }
}

Listing 54.7 Constant section

...
#2 = Double 2.0d
...
public static double half_double(double);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=4, locals=2, args_size=1
0: dload_0
1: ldc2_w #2 // double 2.0d
4: ddiv
5: dreturn

It is the same, except that the ldc2_w instruction is used to load the constant 2.0 from the constant pool. Also, the other three instructions have the d prefix, meaning that they work with values of the double data type.

We now use a function with two arguments.

#!java
public class calc
{
    public static int sum(int a, int b)
    {
        return a+b;
    }
}

#!bash
public static int sum(int, int);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=2, args_size=2
0: iload_0
1: iload_1
2: iadd
3: ireturn

iload_0 loads the first function argument (a), and iload_1 loads the second (b). After both instructions have executed, the stack looks like this:

      +---+
TOS ->| b |
      +---+
      | a |
      +---+

iadd adds the two values and leaves the result at the top of the stack:

      +--------+
TOS ->| result |
      +--------+

Let’s extend this example to long integer data types.

#!java
public static long lsum(long a, long b)
{
    return a+b;
}

Here’s what we get:

#!java
public static long lsum(long, long);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=4, locals=4, args_size=2
0: lload_0
1: lload_2
2: ladd
3: lreturn

The second lload instruction takes the second argument from slot 2. That is because a 64-bit long value occupies exactly two 32-bit slots.

A slightly more complicated example

#!java
public class calc
{
    public static int mult_add(int a, int b, int c)
    {
        return a*b+c;
    }
}

public static int mult_add(int, int, int);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=3, args_size=3
0: iload_0
1: iload_1
2: imul
3: iload_2
4: iadd
5: ireturn

The first step is the multiplication. The product is left at the top of the stack:

      +---------+
TOS ->| product |
      +---------+

iload_2 loads the third argument (c) onto the stack:

      +---------+
TOS ->|    c    |
      +---------+
      | product |
      +---------+

Now the iadd instruction can add the two values.
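The stack behaviour traced in this section can be replayed with a toy interpreter. The sketch below handles only the handful of integer instructions that appear in the half() and mult_add() listings, and it is in no way how a real JVM is implemented. The class and method names are invented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreter for a few integer instructions.
class StackMachineSketch {
    // Executes the instructions in order; locals holds the method arguments.
    static int run(String[] code, int... locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String insn : code) {
            switch (insn) {
                case "iload_0": stack.push(locals[0]); break;
                case "iload_1": stack.push(locals[1]); break;
                case "iload_2": stack.push(locals[2]); break;
                case "iconst_2": stack.push(2); break;
                case "iadd": { int b = stack.pop(); int a = stack.pop(); stack.push(a + b); break; }
                case "imul": { int b = stack.pop(); int a = stack.pop(); stack.push(a * b); break; }
                case "idiv": { int b = stack.pop(); int a = stack.pop(); stack.push(a / b); break; }
                case "ireturn": return stack.pop();   // value at the top of the stack
                default: throw new IllegalArgumentException("unsupported: " + insn);
            }
        }
        throw new IllegalStateException("no ireturn reached");
    }

    public static void main(String[] args) {
        String[] half = {"iload_0", "iconst_2", "idiv", "ireturn"};
        String[] multAdd = {"iload_0", "iload_1", "imul", "iload_2", "iadd", "ireturn"};
        System.out.println(run(half, 10));          // 10 / 2
        System.out.println(run(multAdd, 3, 4, 5));  // 3 * 4 + 5
    }
}
```

For the binary operations the right-hand operand is popped first, because it was pushed last; that order matters for idiv.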

54.4 JVM memory model


x86 and other low-level environments use the stack both for passing arguments and for storing local variables. The JVM is slightly different.

The local variable array (LVA) is used to store incoming function arguments and local variables. Instructions like iload_0 load values from it, and istore stores values into it. The function arguments come first, starting at slot 0, or at slot 1 if slot 0 is occupied by the this pointer. The local variables are allocated after them.

Each slot is 32 bits in size, so values of the long and double data types occupy two slots.

The operand stack (or just “stack”) is used for computations and for passing arguments when calling other functions. Unlike low-level environments such as x86, the stack cannot be accessed without instructions that explicitly push values onto it or pop values from it.
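The slot numbering rule for the local variable array can be sketched as a small calculation. The helper below is hypothetical; it takes Java type names as plain strings and returns the first slot of each parameter, assuming only the rules stated above (slot 0 is taken by this in non-static methods, and long and double take two slots).

```java
import java.util.Arrays;

// Hypothetical helper: computes where each parameter starts in the LVA.
class SlotLayoutSketch {
    // paramTypes uses Java type names such as "int", "long", "double".
    static int[] firstSlots(boolean isStatic, String... paramTypes) {
        int[] slots = new int[paramTypes.length];
        int next = isStatic ? 0 : 1;   // slot 0 holds `this` in instance methods
        for (int i = 0; i < paramTypes.length; i++) {
            slots[i] = next;
            boolean wide = paramTypes[i].equals("long") || paramTypes[i].equals("double");
            next += wide ? 2 : 1;      // 64-bit values occupy two 32-bit slots
        }
        return slots;
    }

    public static void main(String[] args) {
        // static long lsum(long a, long b)
        System.out.println(Arrays.toString(firstSlots(true, "long", "long")));
        // a made-up instance method taking (int, double, int)
        System.out.println(Arrays.toString(firstSlots(false, "int", "double", "int")));
    }
}
```

For lsum(long, long) this gives slots 0 and 2, which agrees with the lload_0 and lload_2 instructions in its listing.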

54.5 Simple function calls


Math.random() returns a pseudorandom number in the range [0.0 … 1.0), but let’s say that for some reason we need a function that returns a number in the range [0.0 … 0.5):

#!java
public class HalfRandom
{
    public static double f()
    {
        return Math.random()/2;
    }
}

Constant section

...
#2 = Methodref #18.#19 // java/lang/Math.random:()D
#3 = Double 2.0d
...
#12 = Utf8 ()D
...
#18 = Class #22 // java/lang/Math
#19 = NameAndType #23:#12 // random:()D
#22 = Utf8 java/lang/Math
#23 = Utf8 random

public static double f();
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=4, locals=0, args_size=0
0: invokestatic #2 // Method java/lang/Math.random:()D
3: ldc2_w #3 // double 2.0d
6: ddiv
7: dreturn

invokestatic calls the Math.random() function and leaves the result at the top of the stack. The result is then divided by 2.0 and returned. But how is the function name encoded? It is encoded in the constant pool using a Methodref expression, which defines the class and method names. The first field of the Methodref points to a Class expression, which in turn points to an ordinary text string (“java/lang/Math”). The second field of the Methodref points to a NameAndType expression, which also links to two strings. The first string is “random”, the name of the method. The second string is “()D”, which encodes the function’s type: it returns a double value (hence the D in the string). This way 1) the JVM can check data types for correctness, and 2) Java decompilers can restore data types from a compiled class file.
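Type strings such as “()D” can be produced from Java itself, which helps when reading them. The sketch below uses java.lang.invoke.MethodType from the standard library (available since Java 7) to print the descriptor for a given return type and parameter list. The wrapper class and method names are invented for illustration.

```java
import java.lang.invoke.MethodType;

// Hypothetical wrapper around the standard MethodType class.
class DescriptorSketch {
    // Builds the JVM descriptor string for a method with the given types.
    static String descriptor(Class<?> returnType, Class<?>... paramTypes) {
        return MethodType.methodType(returnType, paramTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // No parameters, returns double: the type of Math.random()
        System.out.println(descriptor(double.class));
        // Takes a String, returns nothing
        System.out.println(descriptor(void.class, String.class));
        // Takes two longs, returns long (J is the letter for long)
        System.out.println(descriptor(long.class, long.class, long.class));
    }
}
```

The parameter types go inside the parentheses and the return type follows them, so a method with no parameters that returns double is written “()D”.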

Finally, let’s try the “Hello, World” example:

#!java
public class HelloWorld
{
    public static void main(String[] args)
    {
        System.out.println("Hello, World");
    }
}

Constant section

...
#2 = Fieldref #16.#17 // java/lang/System.out:Ljava/io/PrintStream;
#3 = String #18 // Hello, World
#4 = Methodref #19.#20 // java/io/PrintStream.println:(Ljava/lang/String;)V
...
#16 = Class #23 // java/lang/System
#17 = NameAndType #24:#25 // out:Ljava/io/PrintStream;
#18 = Utf8 Hello, World
#19 = Class #26 // java/io/PrintStream
#20 = NameAndType #27:#28 // println:(Ljava/lang/String;)V
...
#23 = Utf8 java/lang/System
#24 = Utf8 out
#25 = Utf8 Ljava/io/PrintStream;
#26 = Utf8 java/io/PrintStream
#27 = Utf8 println
#28 = Utf8 (Ljava/lang/String;)V
...
public static void main(java.lang.String[]);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=1, args_size=1
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #3 // String Hello, World
5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return

ldc at offset 3 takes a pointer to the “Hello, World” string in the constant pool and pushes it onto the stack. In the Java world this is called a reference, but it is really just a pointer, or an address.

The familiar invokevirtual instruction takes the information about the println() function (or method) from the constant pool and calls it. As we may know, there are several println() methods, one for each data type. In our case it is the version of println() intended for the String data type.

But what does the first getstatic instruction do? It takes a reference (or address) to the static field System.out and pushes it onto the stack. This value acts as the this pointer for the println() method. So internally println() takes two arguments: 1) this, a pointer to the object, and 2) the address of the “Hello, World” string. Indeed, println() is called as a method of the already initialized System.out object. For convenience, the javap utility writes all this information in the comments.
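The two-argument view of println() can be made visible in source form by splitting the call into its steps. This is only an illustrative sketch; the class and the printVia() helper are invented, and the compiler is not claimed to emit exactly the same bytecode for it as for the one-line version.

```java
import java.io.PrintStream;

// Hypothetical demo: the receiver object is passed around explicitly.
class PrintlnStepsSketch {
    // Calls println on the given receiver; inside println the receiver is `this`.
    static void printVia(PrintStream receiver, String text) {
        receiver.println(text);   // virtual call: receiver plus one String argument
    }

    public static void main(String[] args) {
        PrintStream out = System.out;     // reading the static field System.out
        printVia(out, "Hello, World");    // same effect as System.out.println(...)
    }
}
```

Any other PrintStream can be passed as the receiver, which shows that System.out is just an ordinary object reference.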

54.6 Calling beep()


This is probably the simplest case of calling two functions without arguments:

#!java
public static void main(String[] args)
{
    java.awt.Toolkit.getDefaultToolkit().beep();
};

#!bash
public static void main(java.lang.String[]);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=1, locals=1, args_size=1
0: invokestatic #2 // Method java/awt/Toolkit.getDefaultToolkit:()Ljava/awt/Toolkit;
3: invokevirtual #3 // Method java/awt/Toolkit.beep:()V
6: return

First, invokestatic at offset 0 calls the java.awt.Toolkit.getDefaultToolkit() function, which returns a reference to an object of the Toolkit class. The invokevirtual instruction at offset 3 then calls the beep() method of that object.