The Top 100 Java Libraries in 2017, Based on 259,885 Source Files. Original author: Henn Idan
A year has passed, though it feels like only yesterday that we analyzed the top Java libraries on GitHub for 2016. This year, we used Google’s BigQuery to retrieve the data, which gave us more accurate results.
BigQuery is a fully managed, petabyte-scale, low-cost enterprise data warehouse that Google designed specifically for data analytics. It lets developers run SQL queries against very large datasets on Google’s infrastructure. BigQuery can scan a terabyte of data in seconds and a petabyte in minutes.
First, we pulled the top 1,000 Java repositories from GitHub, ranked by number of stars. We then filtered out Android projects, which left 477 pure Java projects.
Our analysis is based on these 477 pure Java projects. We counted every library they import, and each library was counted only once per project. The methodology is described in more detail at the end of the article.
Without further ado, let’s take a look at the most popular Java libraries of 2017. Who holds the No. 1 spot this year?
Top 20 Most popular Java class libraries
As last year, the No. 1 library is still JUnit. JUnit Runner, which is built on it, takes second place, and the even older junit.framework comes third. In other words, JUnit holds all of the top three spots.
Mockito, the open source mock testing framework, ranks fourth.
Mockito is a powerful mocking framework for Java. It lets you create and configure mock objects, which simplifies testing classes that have external dependencies.
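Mockito is an external dependency, so the sketch below deliberately does not use its API. As a rough illustration of what a mock object does, it builds a hand-rolled stand-in with the JDK’s java.lang.reflect.Proxy. Every call on the interface is recorded and answered with a canned value, so the class under test never touches the real dependency. PriceService, CheckoutCalculator and HandRolledMock are hypothetical names. Mockito works differently internally and offers far more, such as per-argument stubbing and call verification.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical external dependency we do not want to call in a unit test.
interface PriceService {
    double priceOf(String sku);
}

// Hypothetical class under test: it depends on PriceService.
class CheckoutCalculator {
    private final PriceService prices;

    CheckoutCalculator(PriceService prices) {
        this.prices = prices;
    }

    double total(List<String> skus) {
        double sum = 0;
        for (String sku : skus) {
            sum += prices.priceOf(sku);
        }
        return sum;
    }
}

class HandRolledMock {
    // Records every interface call made on the fake, for later inspection.
    static final List<String> calls = new ArrayList<>();

    // Builds a fake PriceService that answers from a fixed table instead of a real backend.
    static PriceService fakePriceService(Map<String, Double> cannedPrices) {
        return (PriceService) Proxy.newProxyInstance(
                PriceService.class.getClassLoader(),
                new Class<?>[] {PriceService.class},
                (proxy, method, args) -> {
                    if (method.getDeclaringClass() == Object.class) {
                        // equals/hashCode/toString: give the fake simple identity semantics.
                        switch (method.getName()) {
                            case "equals":
                                return proxy == args[0];
                            case "hashCode":
                                return System.identityHashCode(proxy);
                            default:
                                return "FakePriceService";
                        }
                    }
                    String sku = (String) args[0];
                    calls.add(method.getName() + "(" + sku + ")");
                    return cannedPrices.getOrDefault(sku, 0.0);
                });
    }

    public static void main(String[] args) {
        PriceService fake = fakePriceService(Map.of("apple", 1.5, "pear", 2.0));
        CheckoutCalculator calc = new CheckoutCalculator(fake);
        System.out.println(calc.total(List.of("apple", "pear", "apple"))); // 5.0
        System.out.println(calls); // [priceOf(apple), priceOf(pear), priceOf(apple)]
    }
}
```

A mocking library automates exactly this boilerplate: it generates the fake, wires in the canned answers and keeps the call log for you.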
Slf4j, the logging facade for Java, ranks fifth. This is a sign that today’s developers care a great deal about logging. It also suggests that Java developers make little use of java.util.logging. We’ve looked more closely at Java developers’ logging habits and preferences and compiled the findings into an eBook.
The rise of the Hamcrest library shows that developers really need a better testing environment.
Hamcrest is a framework that assists in writing software tests in Java. It supports the creation of custom assertion matchers (the name “Hamcrest” is an anagram of “matchers”), so that matching rules can be defined declaratively. These matchers are useful in unit testing frameworks such as JUnit and jMock.
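To show what declarative matching rules look like without pulling in Hamcrest, here is a minimal, hypothetical matcher abstraction in plain Java. The method names greaterThan, allOf and assertThat echo Hamcrest’s vocabulary, but SimpleMatcher and SimpleMatchers are not Hamcrest’s API. The real library’s matchers also describe mismatches, and it ships with many ready-made ones.

```java
import java.util.function.Predicate;

// Hypothetical minimal matcher: a predicate plus a human-readable description.
interface SimpleMatcher<T> {
    boolean matches(T actual);

    String describe();

    static <U> SimpleMatcher<U> of(String description, Predicate<U> test) {
        return new SimpleMatcher<U>() {
            public boolean matches(U actual) { return test.test(actual); }
            public String describe() { return description; }
        };
    }
}

class SimpleMatchers {
    static SimpleMatcher<Integer> greaterThan(int bound) {
        return SimpleMatcher.of("a value greater than " + bound, x -> x > bound);
    }

    static SimpleMatcher<Integer> lessThan(int bound) {
        return SimpleMatcher.of("a value less than " + bound, x -> x < bound);
    }

    // Combines matchers declaratively: every one of them must match.
    @SafeVarargs
    static <T> SimpleMatcher<T> allOf(SimpleMatcher<T>... matchers) {
        StringBuilder desc = new StringBuilder();
        for (SimpleMatcher<T> m : matchers) {
            if (desc.length() > 0) desc.append(" and ");
            desc.append(m.describe());
        }
        return SimpleMatcher.of(desc.toString(), actual -> {
            for (SimpleMatcher<T> m : matchers) {
                if (!m.matches(actual)) return false;
            }
            return true;
        });
    }

    // Fails with a readable message built from the matcher's description.
    static <T> void assertThat(T actual, SimpleMatcher<T> matcher) {
        if (!matcher.matches(actual)) {
            throw new AssertionError("Expected " + matcher.describe() + " but was " + actual);
        }
    }

    public static void main(String[] args) {
        assertThat(42, allOf(greaterThan(0), lessThan(100))); // passes silently
        try {
            assertThat(150, allOf(greaterThan(0), lessThan(100)));
        } catch (AssertionError e) {
            // Expected a value greater than 0 and a value less than 100 but was 150
            System.out.println(e.getMessage());
        }
    }
}
```

The test states *what* an acceptable value looks like, and the failure message is generated from that same description. This is the main appeal of matcher libraries over hand-written if/throw checks.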
Looking at the top libraries, we can see how much importance developers place on testing as a way to write better code. Production problems are the last thing developers want, so they do everything they can to avoid them.
Google’s Guava library ranks seventh, and the most popular JSON library is Jackson. A newcomer, org.w3c.dom, enters the list at No. 20. It provides a set of interfaces for manipulating the Document Object Model (DOM).
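org.w3c.dom ships with the JDK, so a self-contained example is possible. The sketch below parses a small XML string with javax.xml.parsers.DocumentBuilderFactory. It then reads and modifies the resulting tree through the org.w3c.dom interfaces Document, Element and NodeList. The XML content and the DomDemo class are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomDemo {
    // Parses an XML string into a DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Collects the "name" attribute of every <library> element, in document order.
    static List<String> libraryNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("library");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            names.add(e.getAttribute("name"));
        }
        return names;
    }

    // Manipulates the tree: appends a new <library> element under the root element.
    static void addLibrary(Document doc, String name) {
        Element lib = doc.createElement("library");
        lib.setAttribute("name", name);
        doc.getDocumentElement().appendChild(lib);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<libraries><library name=\"junit\"/><library name=\"slf4j\"/></libraries>");
        addLibrary(doc, "guava");
        System.out.println(libraryNames(doc)); // [junit, slf4j, guava]
    }
}
```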
Other libraries that deserve our attention
Looking at the top 100, we found that Spring did very well, with the following eight packages on the list:
#57 - org.springframework.beans.factory.annotation
#60 - org.springframework.context
#65 - org.springframework.context.annotation
#66 - org.springframework.stereotype
#68 - org.springframework.util
#81 - org.springframework.test.context.junit4
#85 - org.springframework.beans.factory
#91 - org.springframework.web.bind.annotation
Besides Spring, Apache libraries are also widely used:
#16 - org.apache.commons.io
#22 - org.apache.http
#24 - org.apache.commons.lang
#25 - org.apache.http.impl.client
#30 - org.apache.http.client
#33 - org.apache.http.client.methods
#34 - org.apache.log4j
#35 - org.apache.commons.codec.binary
#45 - org.apache.commons.lang3
#53 - org.apache.http.entity
#61 - org.apache.http.util
#64 - org.apache.commons.logging
org.apache.http.message
#88 - org.apache.zookeeper
#95 - org.apache.hadoop.conf
#98 - org.apache.http.client.config
#100 - org.apache.http.client.utils
I’m glad to see the Apache libraries doing so well. I’m a big believer in not reinventing the wheel. Apache’s libraries already provide solid implementations of many things we need in everyday development, such as handling I/O streams and working with collections.
AssertJ, which provides fluent assertions for Java, improved significantly on last year’s ranking and climbed to No. 50 this year.
We also found javax.mail, javax.script and org.apache.http.client.utils on the list. javax.script is the Java scripting API. It is used by application programmers who want to run programs written in a scripting language from within their Java applications.
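As a minimal sketch of the javax.script API, the code below asks ScriptEngineManager which script engines are installed. If an engine is registered under a given short name, it evaluates a tiny expression with that engine. Which engines exist depends on the JDK. Older JDKs bundled the Nashorn JavaScript engine, but it was removed in JDK 15, so on a modern JDK the list may be empty unless a third-party engine is on the classpath. The ScriptingDemo class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

class ScriptingDemo {
    // Lists "engine name (language name)" for every engine the manager can discover.
    static List<String> availableEngines() {
        List<String> result = new ArrayList<>();
        for (ScriptEngineFactory f : new ScriptEngineManager().getEngineFactories()) {
            result.add(f.getEngineName() + " (" + f.getLanguageName() + ")");
        }
        return result;
    }

    // Evaluates a script if an engine with the given short name is installed; returns null otherwise.
    static Object evalIfAvailable(String engineName, String script) throws ScriptException {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        return engine == null ? null : engine.eval(script);
    }

    public static void main(String[] args) throws ScriptException {
        System.out.println("Installed engines: " + availableEngines());
        Object value = evalIfAvailable("js", "6 * 7");
        System.out.println(value == null ? "No JavaScript engine installed" : "6 * 7 = " + value);
    }
}
```

Looking engines up by name, rather than hard-coding one implementation, keeps the host application independent of whichever scripting language happens to be installed.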
The Top 100 Java libraries in 2017
Analysis method
As mentioned at the beginning of this article, we used Google’s BigQuery to process the data this year. We pulled the top 1,000 repositories through the GitHub API. After filtering out Android, Arduino and deprecated repositories, we were left with 259,885 Java source files. After de-duplicating the libraries used within each repository, 25,788 libraries remained.
How did we actually do it?
First, we created a table named java_top_repos_filtered to hold the most-starred repositories. It leaves out any repository whose name or description mentions Android or Arduino, or whose description mentions “deprecated”:
SELECT
full_name
FROM
java_top_repos_1000
WHERE
NOT ((LOWER(full_name) CONTAINS 'android') OR (LOWER(full_name) CONTAINS 'arduino'))
AND ((description IS null) OR
(NOT ((LOWER(description) CONTAINS 'android') OR
(LOWER(description) CONTAINS 'arduino') OR
(LOWER(description) CONTAINS 'deprecated'))));
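The WHERE clause above boils down to a simple predicate. Here is the same rule expressed in Java over a hypothetical Repo record with fields fullName and description, for readers who want to check the logic outside BigQuery. A null description passes, just like the IS null branch. This mirrors the query's behavior; it is not code the authors ran.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical row of the java_top_repos_1000 table.
record Repo(String fullName, String description) {}

class RepoFilter {
    // Mirrors the WHERE clause: drop repos whose name mentions android/arduino,
    // or whose (non-null) description mentions android/arduino/deprecated.
    static boolean keep(Repo repo) {
        String name = repo.fullName().toLowerCase(Locale.ROOT);
        if (name.contains("android") || name.contains("arduino")) {
            return false;
        }
        if (repo.description() == null) {
            return true;
        }
        String desc = repo.description().toLowerCase(Locale.ROOT);
        return !(desc.contains("android") || desc.contains("arduino") || desc.contains("deprecated"));
    }

    static List<String> filteredNames(List<Repo> repos) {
        return repos.stream().filter(RepoFilter::keep).map(Repo::fullName).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up repositories, for illustration only.
        List<Repo> repos = List.of(
                new Repo("example/http-client", "An HTTP client"),
                new Repo("example/android-utils", null),
                new Repo("example/old-lib", "DEPRECATED, use v2"),
                new Repo("example/no-description", null));
        System.out.println(filteredNames(repos)); // [example/http-client, example/no-description]
    }
}
```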
Now that we have the names of the top-ranked repositories, we pull the contents of all their .java files:
SELECT
repo_name,
content
FROM
[bigquery-public-data:github_repos.contents] AS contents
INNER JOIN
(
SELECT
id,
repo_name
FROM
[bigquery-public-data:github_repos.files] AS files
INNER JOIN
java_top_repos_filtered AS top_repos
ON
files.repo_name = top_repos.full_name
WHERE
path LIKE '%.java'
) AS files_filtered
ON
contents.id = files_filtered.id;
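Conceptually, this query performs two inner joins. First, the files table is restricted to paths ending in .java within the repositories we kept. Second, the surviving file ids are matched against the contents table. A small in-memory Java version of the same logic might look like this. FileRow, ContentRow and RepoSource are hypothetical stand-ins for the public github_repos tables and the query result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-ins for rows of github_repos.files, github_repos.contents and the result.
record FileRow(String id, String repoName, String path) {}
record ContentRow(String id, String content) {}
record RepoSource(String repoName, String content) {}

class JavaSourceJoin {
    static List<RepoSource> javaSources(Set<String> keptRepos, List<FileRow> files, List<ContentRow> contents) {
        // Inner join #1 plus path LIKE '%.java': file id -> repos that contain that .java file.
        // A list is kept per id because one content id can appear in several repositories.
        Map<String, List<String>> reposById = files.stream()
                .filter(f -> keptRepos.contains(f.repoName()) && f.path().endsWith(".java"))
                .collect(Collectors.groupingBy(FileRow::id,
                        Collectors.mapping(FileRow::repoName, Collectors.toList())));
        // Inner join #2: attach the content to every matching file row.
        List<RepoSource> result = new ArrayList<>();
        for (ContentRow c : contents) {
            for (String repo : reposById.getOrDefault(c.id(), List.of())) {
                result.add(new RepoSource(repo, c.content()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<FileRow> files = List.of(
                new FileRow("f1", "example/app", "src/Main.java"),
                new FileRow("f2", "example/app", "README.md"),
                new FileRow("f3", "other/repo", "src/Other.java"));
        List<ContentRow> contents = List.of(
                new ContentRow("f1", "import org.junit.Test;"),
                new ContentRow("f2", "# readme"),
                new ContentRow("f3", "import java.util.List;"));
        for (RepoSource s : javaSources(Set.of("example/app"), files, contents)) {
            System.out.println(s.repoName() + " -> " + s.content()); // example/app -> import org.junit.Test;
        }
    }
}
```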
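Now that we have the source code of each project, we read it from java_relevant_data, the table holding the results of the previous query. We then extract the import statements, pull the package name out of each one, and count each package only once per repository: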
SELECT
package,
COUNT(*) count
FROM
( -- extract the package name (without the trailing class name) and group by repo name, so each package is counted once per repo
SELECT
REGEXP_EXTRACT(import_line, r' ([a-z0-9\._]*)\.') package,
repo_name
FROM
( -- keep only the 'import' lines of the *.java files
SELECT
SPLIT(content, '\n') import_line,
repo_name
FROM
java_relevant_data
HAVING
LEFT(import_line, 6) = 'import'
)
GROUP BY
package,
repo_name
)
GROUP BY
package
ORDER BY
count DESC;
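To make the extraction step concrete, here is a sketch of the same logic in plain Java. It keeps lines that start with import and applies the query's regular expression ` ([a-z0-9\._]*)\.`, which in practice strips the trailing class name or `*`. It collects packages per repository in a Set so that each package counts once per repo, then sorts by count. The PackageCounter class and its input map (repository name to file contents) are hypothetical. This mirrors the query; it is not the authors' pipeline.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PackageCounter {
    // Same pattern as REGEXP_EXTRACT(import_line, r' ([a-z0-9\._]*)\.').
    private static final Pattern PACKAGE = Pattern.compile(" ([a-z0-9._]*)\\.");

    // Returns the package captured from one import line, or null if the pattern does not match.
    static String extractPackage(String importLine) {
        Matcher m = PACKAGE.matcher(importLine);
        return m.find() ? m.group(1) : null;
    }

    // sourcesByRepo: repo name -> contents of its .java files.
    // A package is counted at most once per repository; lines with no match are skipped.
    static Map<String, Integer> countPackages(Map<String, List<String>> sourcesByRepo) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> files : sourcesByRepo.values()) {
            Set<String> seenInRepo = new HashSet<>();
            for (String content : files) {
                for (String line : content.split("\n")) {
                    if (!line.startsWith("import")) continue;
                    String pkg = extractPackage(line);
                    if (pkg != null) seenInRepo.add(pkg);
                }
            }
            for (String pkg : seenInRepo) counts.merge(pkg, 1, Integer::sum);
        }
        return counts;
    }

    // Orders packages by descending count, like ORDER BY count DESC.
    static List<Map.Entry<String, Integer>> ranked(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        Map<String, List<String>> repos = Map.of(
                "example/a", List.of("import org.junit.Test;\nimport org.junit.Before;\nclass A {}"),
                "example/b", List.of("import static org.junit.Assert.assertEquals;\nimport java.util.*;"));
        System.out.println(extractPackage("import org.slf4j.LoggerFactory;")); // org.slf4j
        System.out.println(countPackages(repos).get("org.junit"));            // 2
    }
}
```

Because the character class only allows lowercase letters, digits, dots and underscores, the greedy match stops at the first capitalized class name. For static imports the leading word static never matches (no dot follows it), so the capture starts at the package.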
Finally, we filter the package counts (stored in java_top_package_count) once more. This drops Java standard library packages (those starting with java.) and any Android packages:
SELECT
*
FROM
java_top_package_count
WHERE
NOT ((LEFT(package, 5) = 'java.') OR
(LOWER(package) CONTAINS 'android'))
ORDER BY
count DESC;
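The final WHERE clause is again a plain predicate. It drops packages whose name starts with java. (the standard library) and any package whose name contains android, case-insensitively. Note that javax.* packages survive this filter, which is why javax.mail and javax.script can appear on the list. Here is a Java sketch of the filter over a hypothetical package-to-count map, keeping the count DESC order. The FinalFilter class is an illustrative name.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class FinalFilter {
    // Mirrors: NOT ((LEFT(package, 5) = 'java.') OR (LOWER(package) CONTAINS 'android'))
    static boolean keep(String pkg) {
        return !(pkg.startsWith("java.") || pkg.toLowerCase(Locale.ROOT).contains("android"));
    }

    // Keeps the surviving packages, ordered by descending count.
    static Map<String, Integer> filterAndRank(Map<String, Integer> counts) {
        Map<String, Integer> result = new LinkedHashMap<>();
        counts.entrySet().stream()
                .filter(e -> keep(e.getKey()))
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEachOrdered(e -> result.put(e.getKey(), e.getValue()));
        return result;
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only.
        Map<String, Integer> counts = Map.of(
                "org.junit", 300, "java.util", 450, "javax.script", 12, "android.os", 90);
        System.out.println(filterAndRank(counts)); // {org.junit=300, javax.script=12}
    }
}
```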
At this point, you have a list of the Top 100 Java libraries of 2017.
One last thought
One main takeaway: the libraries that were popular in 2016 remained popular in 2017. This suggests that the developers, teams and companies behind these libraries are working hard to keep improving them.
It also means that our spreadsheet can offer useful guidance, whether you’re starting a new Java project or just going about your daily development work. These top-ranked libraries are a safe choice.