Lessons learned from porting 50,000 lines of Java code to Go

Original address:

Lessons learned porting 50k loc from Java to Go
Originally written by Krzysztof Kowalczyk
Source: https://blog.kowalczyk.info
Permanent links to this article:

https://github.com/gocn/translator/blob/master/2019/w15_lessions_learned_porting_50k_loc_from_java_to_go_translation.md
Translator: cvley
Proofreading: Ryan,

I once signed a work contract to migrate a large Java code base to Go.

The code was the Java client for RavenDB, a NoSQL JSON document database. It was about 50,000 lines of code, including tests.

The result of the migration is a Go client.

This article describes what I learned during this migration process.

Testing, code coverage

Large projects can benefit greatly from automated testing and code coverage tracking.

I used TravisCI and AppVeyor for testing and Codecov.io for code coverage. There are many other similar services.

I used both AppVeyor and TravisCI because, a year ago, Travis did not support Windows and AppVeyor did not support Linux.

If I were choosing these tools today, I would use only AppVeyor, because it now supports testing on both Linux and Windows, while TravisCI’s future is uncertain after it was acquired by a private equity firm that reportedly laid off the original development team.

Codecov is barely adequate for code coverage. For Go, it counts non-code lines (such as comments) as lines that were not executed, so you can’t reach 100% coverage with this tool. Coveralls seems to have the same problem.

It is better than nothing, but there is room for these tools to do better, especially for Go programs.

Go’s race detection is great

Part of the code uses concurrency, which is error-prone.

Go provides a race detector, which is enabled by building or testing with the -race flag.

It slows the program down, but the additional checks detect when goroutines access the same memory location concurrently and at least one of them writes.

I always ran the tests with -race enabled, and its reports let me fix races very quickly.
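As a rough illustration (not code from the port), the sketch below shows the kind of shared state the race detector checks. The counter type and the countConcurrently function are hypothetical names. With the mutex in place, running under -race stays silent; removing the Lock and Unlock calls should make the detector report a data race on the field n.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical type used only for this illustration.
type counter struct {
	mu sync.Mutex
	n  int
}

// inc increments the counter while holding the mutex.
func (c *counter) inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// countConcurrently starts the given number of goroutines, each of which
// calls inc once, and returns the final value after all of them finish.
func countConcurrently(workers int) int {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc()
		}()
	}
	wg.Wait() // all increments happen before this returns
	return c.n
}

func main() {
	fmt.Println(countConcurrently(1000))
}
```

To try it, run the program with `go run -race .` or put the function under a test and use `go test -race ./...`.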

Build specific tools for testing

Large projects are difficult to verify through visual inspection. There’s too much code for your brain to remember all at once.

When a test fails, it is also a challenge to find the cause only from the information about the test failure.

The database client driver talks to the RavenDB server over HTTP, with commands and responses encoded as JSON.

When porting Java tests to Go, it is very useful to capture the HTTP traffic between the Java client and the server and compare it with the traffic generated by the Go port.

I built specific tools to help me do this.

To capture HTTP traffic from the Java client, I built a logging HTTP proxy in Go and had the Java client talk to the server through it.
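The article does not show the proxy itself, so the following is only a minimal sketch of one way to build such a tool with the standard library, assuming a reverse-proxy setup. The names newLoggingProxy, lockedLog and roundTripThroughProxy are hypothetical, and a test server stands in for the database server.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"sync"
)

// lockedLog is a mutex-guarded buffer, since handlers run on their own goroutines.
type lockedLog struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

func (l *lockedLog) Write(p []byte) (int, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.buf.Write(p)
}

func (l *lockedLog) String() string {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.buf.String()
}

// newLoggingProxy forwards every request to target and writes a dump of each
// request and response to log.
func newLoggingProxy(target *url.URL, log io.Writer) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.ModifyResponse = func(resp *http.Response) error {
		dump, err := httputil.DumpResponse(resp, true)
		if err != nil {
			return err
		}
		fmt.Fprintf(log, "<<< response\n%s\n", dump)
		return nil
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// DumpRequest with body=true buffers the body and restores r.Body.
		if dump, err := httputil.DumpRequest(r, true); err == nil {
			fmt.Fprintf(log, ">>> request\n%s\n", dump)
		}
		proxy.ServeHTTP(w, r)
	})
}

// roundTripThroughProxy starts a stand-in backend, puts the proxy in front of
// it, sends one request, and returns the response body plus the captured log.
func roundTripThroughProxy() (body, captured string, err error) {
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "pong")
	}))
	defer backend.Close()

	target, err := url.Parse(backend.URL)
	if err != nil {
		return "", "", err
	}
	log := &lockedLog{}
	front := httptest.NewServer(newLoggingProxy(target, log))
	defer front.Close()

	resp, err := http.Get(front.URL + "/ping")
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", "", err
	}
	return string(data), log.String(), nil
}

func main() {
	body, captured, err := roundTripThroughProxy()
	if err != nil {
		panic(err)
	}
	fmt.Printf("body=%q\n%s", body, captured)
}
```

In real use the handler would be served with http.ListenAndServe on a local port and the log would go to a file instead of an in-memory buffer.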

For the Go client, I built a hook that intercepts HTTP requests. I use it to record traffic in a file.
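One common way to build this kind of hook in Go is to wrap the client’s http.RoundTripper. The sketch below assumes that approach; the article does not say how the real hook was implemented. The names recordingTransport and fetchWithRecording are hypothetical, and a test server again stands in for the database.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"os"
)

// recordingTransport wraps another RoundTripper and writes a dump of every
// request and response to out.
type recordingTransport struct {
	next http.RoundTripper
	out  io.Writer
}

func (t *recordingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// DumpRequestOut buffers and restores the request body, which is good
	// enough for a debugging tool like this sketch.
	if dump, err := httputil.DumpRequestOut(req, true); err == nil {
		fmt.Fprintf(t.out, ">>> request\n%s\n", dump)
	}
	resp, err := t.next.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	if dump, err := httputil.DumpResponse(resp, true); err == nil {
		fmt.Fprintf(t.out, "<<< response\n%s\n", dump)
	}
	return resp, nil
}

// fetchWithRecording starts a stand-in server, performs one GET through a
// client that uses recordingTransport, and records the traffic in a temporary
// file. It returns the response body and the size of the recorded log.
func fetchWithRecording() (body string, logSize int64, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, `{"ok":true}`)
	}))
	defer srv.Close()

	f, err := os.CreateTemp("", "traffic-*.log")
	if err != nil {
		return "", 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	client := &http.Client{
		Transport: &recordingTransport{next: http.DefaultTransport, out: f},
	}
	resp, err := client.Get(srv.URL + "/docs")
	if err != nil {
		return "", 0, err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", 0, err
	}
	info, err := f.Stat()
	if err != nil {
		return "", 0, err
	}
	return string(data), info.Size(), nil
}

func main() {
	body, size, err := fetchWithRecording()
	if err != nil {
		panic(err)
	}
	fmt.Printf("body=%s, recorded %d bytes\n", body, size)
}
```

Because the hook lives in the transport, the rest of the client code does not need to know that traffic is being recorded.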

I can then compare the HTTP traffic generated by a Java client with that generated by a Go ported client.

The process of migration

You can’t start migrating 50,000 lines of code at random. I was sure that if I didn’t test and verify each small step, I would be overwhelmed by the complexity of the overall code.

I was new to RavenDB and the Java code base. So my first step was to get a deeper understanding of how the Java code works.

The core of the client is to interact with the server through HTTP. I captured and studied the traffic and wrote the simplest Go code to interact with the server.

Once this worked, I was confident I could replicate the functionality.

My first milestone was to port enough code to port the simplest Java test and get it to pass.

I used a combination of bottom-up and top-down approaches.

The bottom-up part: I identified and ported the code at the bottom of the call chain, which sends commands to the server and parses the responses.

The top-down part: I stepped through the test being ported to identify which parts of the functional code I needed to port.

After completing that first step, the rest of the work was porting one test at a time, along with all the code needed to make it pass.

Once a test was ported and passing, I made improvements to make the code more idiomatic Go.

I believe that this step by step approach is important for completing the migration.

From a psychological point of view, it is important to set short intermediate milestones when dealing with a long project. Achieving these milestones keeps me motivated.

It’s also good to keep the code compiling, running, and passing tests at all times. If you let bugs accumulate, they can be very hard to fix when you finally get to them.

The challenges of porting Java to Go

The goal of the port was to stay as close to the Java code base as possible, because the Go code needs to be kept in sync with future changes to the Java code.

At times I was surprised by how much code I could port line by line. The most time-consuming part of the port was reversing the order of variable declarations: Java declares variables as Type name, while Go declares them as name Type. I really wish there had been a tool to do that for me.

String vs. string

In Java, String is an object, so a variable of that type is really a reference (pointer). As a result, a string can be null.

In Go, string is a value type. It can’t be nil, only empty.

This was not a big deal; in most cases I could mechanically replace null with "".

Errors vs. exceptions

Java uses exceptions to pass errors.

Go returns values of the error interface.

Porting wasn’t difficult, but a large number of function signatures had to change to return error values and propagate them up the call stack.
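A small sketch of what that signature change looks like in practice. The functions parseID and loadDocument, and the sentinel errEmptyID, are hypothetical names rather than code from the port. In Java, only the lowest-level method would mention the exception; in Go, every caller on the path has to return the error too.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// errEmptyID is a hypothetical sentinel error standing in for a Java exception type.
var errEmptyID = errors.New("document id is empty")

// parseID is the lowest level: in Java it would throw, in Go it returns an error.
func parseID(s string) (int, error) {
	if s == "" {
		return 0, errEmptyID
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("parseID(%q): %w", s, err)
	}
	return n, nil
}

// loadDocument calls parseID, so its signature must also gain an error result.
func loadDocument(id string) (string, error) {
	n, err := parseID(id)
	if err != nil {
		return "", err // propagate up the call stack
	}
	return fmt.Sprintf("document #%d", n), nil
}

func main() {
	for _, id := range []string{"42", "", "abc"} {
		doc, err := loadDocument(id)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(doc)
	}
}
```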

Generics

Go (currently) does not support generics.

Porting generic interfaces is the biggest challenge.

Here is an example of a generic method in Java:

    public <T> T load(Class<T> clazz, String id) {

Caller:

    Foo foo = load(Foo.class, "id");

In Go, I use two strategies.

One is to use interface{}, which combines a value and its type, similar to Object in Java. This is not the preferred way. It works, but operating on interface{} is clumsy for users of the library.

In some cases I was able to use reflection, and the above code can be ported as:

    func Load(result interface{}, id string) error

I can use reflection to get the type of result and create the value of that type from the JSON document.

Caller’s code:

    var result *Foo
    err := Load(&result, "id")
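To make the reflection step concrete, here is a minimal sketch of how such a Load function could work. It is not the actual RavenDB client code: the Foo type and the fakeStore map (which stands in for the server) are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"reflect"
)

// Foo is a hypothetical document type.
type Foo struct {
	Name string `json:"name"`
}

// fakeStore stands in for the server: it maps document IDs to raw JSON.
var fakeStore = map[string][]byte{
	"id": []byte(`{"name":"Frank"}`),
}

// Load expects result to be a pointer to a pointer to a struct (e.g. **Foo).
// It allocates a new struct of the pointed-to type, decodes the JSON document
// into it, and stores the new pointer in *result.
func Load(result interface{}, id string) error {
	rv := reflect.ValueOf(result)
	if rv.Kind() != reflect.Ptr || rv.IsNil() ||
		rv.Elem().Kind() != reflect.Ptr ||
		rv.Elem().Type().Elem().Kind() != reflect.Struct {
		return errors.New("result must be a non-nil pointer to a pointer to a struct")
	}
	doc, ok := fakeStore[id]
	if !ok {
		return fmt.Errorf("document %q not found", id)
	}
	structType := rv.Elem().Type().Elem() // e.g. Foo
	newVal := reflect.New(structType)     // e.g. *Foo
	if err := json.Unmarshal(doc, newVal.Interface()); err != nil {
		return err
	}
	rv.Elem().Set(newVal)
	return nil
}

func main() {
	var result *Foo
	if err := Load(&result, "id"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(result.Name)
}
```

The up-front kind checks turn a wrongly typed argument into an ordinary error instead of a panic inside the reflect package.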

Function overloading

Go does not (and most likely never will) support function overloading.

I’m not sure I’ve found the right way to port this code.

In some cases, overloading is used to create shorter helper functions:

    void foo(int a, String b) {}
    void foo(int a) { foo(a, null); }

Sometimes I just dropped the shorter helper function.

Sometimes I wrote two functions:

    func foo(a int) {}
    func fooWithB(a int, b string) {}

When the number of potential parameters is large, I sometimes did this:

    type FooArgs struct {
        A int
        B string
    }

    func foo(args *FooArgs) { }

Inheritance

Go is not especially object-oriented and has no inheritance.

Simple cases of inheritance can be ported using embedding.

    class B extends A { }

Sometimes it can be ported to:

    type A struct { }

    type B struct {
        A
    }

We embed A into B, so B inherits all of A’s methods and fields.
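A slightly fuller sketch of the same idea, with a hypothetical Name field and Describe method added to A to show what promotion looks like from the caller’s side:

```go
package main

import "fmt"

// A and B mirror the snippet above; Name, Describe and Extra are
// hypothetical additions for illustration.
type A struct {
	Name string
}

func (a *A) Describe() string {
	return "A named " + a.Name
}

type B struct {
	A     // embedded: A's fields and methods are promoted to B
	Extra int
}

func main() {
	b := &B{A: A{Name: "base"}, Extra: 1}
	fmt.Println(b.Name)       // field promoted from A
	fmt.Println(b.Describe()) // method promoted from A
}
```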

This approach does not work for virtual functions.

There is no good way to directly port code that uses virtual functions.

One way to simulate virtual functions is to combine struct embedding with function pointers. This essentially reimplements the virtual table that Java gives you for free as part of its object implementation.

Another way is to write a stand-alone function that uses a type switch to dispatch to the right function for a given type.
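A minimal sketch of the function-pointer approach, using hypothetical Base and Derived types. The greet field acts as a one-entry virtual table: Hello is defined once on Base, yet calls whichever function the concrete type installed.

```go
package main

import "fmt"

// Base plays the role of a Java base class. The greet field is a function
// pointer acting as a one-entry virtual table.
type Base struct {
	greet func() string
}

// Hello is a "template method": it calls whatever greet currently points to.
func (b *Base) Hello() string {
	return "Hello, " + b.greet()
}

// NewBase installs the default implementation of greet.
func NewBase() *Base {
	b := &Base{}
	b.greet = func() string { return "base" }
	return b
}

// Derived embeds Base and overrides greet by replacing the function pointer.
type Derived struct {
	Base
	name string
}

func NewDerived(name string) *Derived {
	d := &Derived{name: name}
	d.greet = func() string { return "derived " + d.name }
	return d
}

func main() {
	fmt.Println(NewBase().Hello())
	fmt.Println(NewDerived("x").Hello())
}
```

The cost is that every constructor must remember to fill in the function pointers, which is exactly the bookkeeping a language with built-in virtual dispatch does for you.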
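The type-switch variant could look roughly like this. The command types and the requestLine function are hypothetical and are not the real RavenDB client API.

```go
package main

import "fmt"

// Hypothetical command types; in Java these might share a base class with a
// virtual method that builds the request.
type GetDocumentCommand struct{ ID string }
type PutDocumentCommand struct{ ID, Body string }

// requestLine is a stand-alone function that picks the behavior based on the
// dynamic type of cmd, replacing a virtual method call.
func requestLine(cmd interface{}) (string, error) {
	switch c := cmd.(type) {
	case *GetDocumentCommand:
		return "GET /docs/" + c.ID, nil
	case *PutDocumentCommand:
		return "PUT /docs/" + c.ID, nil
	default:
		return "", fmt.Errorf("unsupported command type %T", cmd)
	}
}

func main() {
	cmds := []interface{}{
		&GetDocumentCommand{ID: "1"},
		&PutDocumentCommand{ID: "2", Body: "{}"},
		"not a command",
	}
	for _, cmd := range cmds {
		line, err := requestLine(cmd)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(line)
	}
}
```

Unlike a virtual method, the switch has to be updated by hand whenever a new type is added, so the default branch that returns an error is worth keeping.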

Interfaces

Java and Go both have interfaces, but they are different things, like apples and salami.

On rare occasions, I did create Go interface types to replicate Java interfaces.

In most cases, I dropped the interfaces and exposed concrete structs in the API.

Circular imports between packages

Java allows circular imports between packages.

Go does not.

As a result, I couldn’t replicate the package structure of the Java code in the port.

For simplicity, I used a single package. This approach is not ideal because the package ends up being bloated. In fact, the package was so bloated that, on Windows, Go 1.10 could not handle that many source files within a single package. Fortunately, Go 1.11 fixed this problem.

Private, public, protected

The designers of Go are underappreciated. Their ability to simplify concepts is unmatched, and access control is one example.

Other languages gravitate toward fine-grained access control: public, private, and protected, specified at the smallest possible granularity (per class field and method).

As a result, code inside a library has the same access to the other classes of that library as external code that uses the library.

Go simplified this by having only public and private, with access scoped at the package level.

That makes more sense.

When I write a library to, say, parse markdown, I don’t want to expose the internals of the implementation to users of the library. But hiding those internals from myself is counterproductive.

Java programmers have noticed this problem and sometimes use an interface as a hack to fix over-exposed classes. By returning an interface rather than the concrete class, users of the class don’t see some of its public API.

Concurrency

Simply put, Go’s concurrency is the best, and the built-in race detector is a great help in tracking down concurrency bugs.

That said, in my first pass I emulated the Java APIs. For example, I implemented a facsimile of Java’s CompletableFuture class.

Only after the code was working did I restructure it to be more idiomatic Go.
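The article does not show that facsimile, so the following is only a guess at its general shape: a hypothetical, minimal Future type built on a goroutine and a channel, with Async and Get as assumed names.

```go
package main

import (
	"errors"
	"fmt"
)

// errFailed is a hypothetical error used only in this example.
var errFailed = errors.New("computation failed")

// Future is a hypothetical, minimal stand-in for Java's CompletableFuture.
type Future struct {
	done   chan struct{}
	result interface{}
	err    error
}

// Async runs fn in a new goroutine and returns a Future for its result.
func Async(fn func() (interface{}, error)) *Future {
	f := &Future{done: make(chan struct{})}
	go func() {
		f.result, f.err = fn()
		close(f.done) // closing the channel publishes result and err to waiters
	}()
	return f
}

// Get blocks until the computation finishes, then returns its outcome.
// It may be called any number of times, from any goroutine.
func (f *Future) Get() (interface{}, error) {
	<-f.done
	return f.result, f.err
}

func main() {
	ok := Async(func() (interface{}, error) { return 42, nil })
	bad := Async(func() (interface{}, error) { return nil, errFailed })

	v, err := ok.Get()
	fmt.Println(v, err)
	v, err = bad.Get()
	fmt.Println(v, err)
}
```

In more idiomatic Go the wrapper type often disappears: the caller starts a goroutine directly and receives the result over a channel, or simply calls a blocking function.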

Fluent function chaining

RavenDB has sophisticated query capabilities. The Java client builds queries using method chaining:

    List<ReduceResult> results = session.query(User.class)
        .groupBy("name")
        .selectKey()
        .selectCount()
        .orderByDescending("count")
        .ofType(ReduceResult.class)
        .toList();

Chaining only works in languages that communicate errors via exceptions. When a function additionally returns an error, it cannot be chained as above.

To replicate chaining in Go, I used a "stateful error" approach:

    type Query struct {
        err error
    }

    func (q *Query) WhereEquals(field string, val interface{}) *Query {
        if q.err != nil {
            return q
        }
        // logic that might set q.err
        return q
    }

    func (q *Query) GroupBy(field string) *Query {
        if q.err != nil {
            return q
        }
        // logic that might set q.err
        return q
    }

    func (q *Query) Execute(result interface{}) error {
        if q.err != nil {
            return q.err
        }
        // do logic
        return nil
    }

The chained call can then be written like this:

    var result *Foo
    err := NewQuery().WhereEquals("Name", "Frank").GroupBy("Age").Execute(&result)

JSON parsing

Java has no built-in JSON parsing, and the client uses the Jackson JSON library.

Go has JSON support in the standard library, but it does not provide as many hooks for customizing the marshaling process.

I didn’t try to match all the Java functionality, because Go’s built-in JSON support seems flexible enough.
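For reference, the main hooks the standard library does offer are the json.Marshaler and json.Unmarshaler interfaces. The sketch below uses hypothetical Timeout and Options types (not taken from the RavenDB client) to show a value stored as a duration string.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Timeout is a hypothetical type that is stored in JSON as a string such as "1m30s".
type Timeout time.Duration

// MarshalJSON implements json.Marshaler.
func (t Timeout) MarshalJSON() ([]byte, error) {
	return json.Marshal(time.Duration(t).String())
}

// UnmarshalJSON implements json.Unmarshaler.
func (t *Timeout) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	d, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	*t = Timeout(d)
	return nil
}

// Options is a hypothetical struct that uses the custom type.
type Options struct {
	Wait Timeout `json:"wait"`
}

// roundTrip encodes opts to JSON and decodes the result back into a new value.
func roundTrip(opts Options) (string, Options, error) {
	data, err := json.Marshal(opts)
	if err != nil {
		return "", Options{}, err
	}
	var back Options
	if err := json.Unmarshal(data, &back); err != nil {
		return "", Options{}, err
	}
	return string(data), back, nil
}

func main() {
	encoded, back, err := roundTrip(Options{Wait: Timeout(90 * time.Second)})
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded, time.Duration(back.Wait))
}
```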

Go code is shorter

The verbosity is not so much a property of Java as of the culture around what is considered idiomatic code.

Setter and getter methods are common in Java. For example, this Java code:

    class Foo {
        private int bar;

        public void setBar(int bar) {
            this.bar = bar;
        }

        public int getBar() {
            return this.bar;
        }
    }

The Go version:

    type Foo struct {
        Bar int
    }

3 lines versus 11 lines. It adds up when you have a lot of classes with a lot of members.

Most of the other code ends up being about the same length.

Use Notion to organize work

I am a heavy user of Notion.so. At its simplest, Notion is a hierarchical note-taking application. Think of it as a cross between Evernote and a wiki, carefully designed and implemented by top software designers.

Here is how I used Notion to organize my work on the Go port:

Here are the details:

I have a page with a calendar view, not shown above, for taking short notes of what was done and how much time was spent at a particular time. Since this contract is billed by the hour, the number of hours worked is important information. Thanks to these notes, I know I spent 601 hours on this development over 11 months.
Customers like to know what’s going on. I have a page with a summary of my monthly work that looks like this:

These pages are shared with customers.

A short todo list is useful when starting your day.

I even manage invoices with Notion pages, using the ‘Export to PDF’ function to generate a PDF version of the invoice.

Go programmers for hire

Does your company need an additional Go developer? You can hire me.

Additional resources

I provided some additional commentary in response to questions:

Hacker News discussion
/r/golang discussion

Other information:

If you need a NoSQL, JSON document database, try RavenDB. It has a complete set of advanced features.
If you program with Go, you can read the Essential Go programming book for free.

I reverse-engineered the Notion API
I wrote an unofficial Go library for the Notion API
All content on this site is written in Notion and published using my custom toolchain.