Let’s look at the results
Take July 1 as an example
Top 10 by total likes gained on July 1

| User | Total likes gained |
|---|---|
| Carmelo Anthony | 234 |
| chokcoco | 185 |
| Hand tore red black tree | 148 |
| The sea has | 114 |
| Little Jay the Programmer | 104 |
| Month with the flying fish | 103 |
| LBJ | 100 |
| Tao department front end team | 92 |
| Code farmer on the island | 89 |
| The cloud of the world | 80 |
Top 10 by total views gained on July 1

| User | Total views gained |
|---|---|
| Hand tore red black tree | 6037 |
| chokcoco | 5689 |
| Los bamboo | 5495 |
| Small Y's of dumb code | 5254 |
| Huawei Developer Forum | 4829 |
| Carmelo Anthony | 4445 |
| An old liu | 3967 |
| The sea has | 3784 |
| alphardex | 3459 |
| Tao department front end team | 3448 |
Top 10 likes gained in a single time window on July 1

| User | Time window | Likes gained |
|---|---|---|
| Month with the flying fish | 07-01 00:41-07-01 00:46 | 45 |
| Little Jay the Programmer | 07-01 09:59-07-01 10:04 | 37 |
| The world of mortals refined heart | 07-01 21:42-07-01 21:47 | 27 |
| LBJ | 07-01 21:32-07-01 21:37 | 26 |
| Fat DA | 07-01 17:41-07-01 17:46 | 20 |
| The village son | 07-01 01:31-07-01 01:36 | 17 |
| Code farmer on the island | 07-01 08:43-07-01 08:49 | 15 |
| Xiao ray | 07-01 22:07-07-01 22:12 | 13 |
| Peeling my shell | 07-01 17:31-07-01 17:36 | 10 |
| An old liu | 07-01 19:01-07-01 19:06 | 10 |
Top 10 views gained in a single time window on July 1

| User | Time window | Views gained |
|---|---|---|
| Huawei Developer Forum | 07-01 11:14-07-01 11:19 | 874 |
| Fishing experts | 07-01 15:25-07-01 20:52 | 210 |
| Honest1y | 07-01 16:45-07-01 16:50 | 157 |
| mPaaS | 07-01 11:44-07-01 20:06 | 103 |
| An old liu | 07-01 13:15-07-01 13:20 | 101 |
| Los bamboo | 07-01 09:44-07-01 09:49 | 89 |
| chokcoco | 07-01 13:47-07-01 13:55 | 89 |
| Hand tore red black tree | 07-01 09:34-07-01 09:39 | 89 |
| alphardex | 07-01 10:24-07-01 10:29 | 85 |
| Love tinkering program ape | 07-01 11:29-07-01 11:34 | 84 |
Is the data accurate? Answer: very accurate.

Curious how it was done? The approach:

- Monitor a fixed set of authors and pull their data periodically. Where do the authors come from? I use Juejin's recommended-author list.
- The downside is that authors who aren't on that list are missed.
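Since like and view counters on an author's profile only ever grow, the day's gain falls out of a simple diff between the first and last snapshot. Here is a minimal sketch of that idea, using hypothetical numbers rather than the real crawled data:

```java
import java.util.Arrays;
import java.util.List;

public class DeltaSketch {
    // The day's gain is the last cumulative snapshot minus the first,
    // because the counter is monotonically non-decreasing.
    public static int dailyGain(List<Integer> snapshots) {
        return snapshots.get(snapshots.size() - 1) - snapshots.get(0);
    }

    public static void main(String[] args) {
        // Hypothetical cumulative like counts sampled over one day
        List<Integer> likes = Arrays.asList(1000, 1010, 1032, 1100);
        System.out.println(dailyGain(likes)); // 100
    }
}
```

This is exactly why the crawler only needs to store timestamped snapshots; no per-event data from the site is required.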
Fetching the data

- There are many categories, so fetch the category list first
- I won't explain this in detail; just read the code

```java
// Uses Hutool's JSONUtil/JSONArray and a simple Http helper
private static List<Category> getAllCategory() {
    // Fetch all categories
    String res = Http.get("https://api.juejin.cn/tag_api/v1/query_category_briefs?show_type=1");
    JSONArray data = JSONUtil.parseObj(res).getJSONArray("data");
    return JSONUtil.toList(data, Category.class);
}

// Use an inner class
static class Category implements Serializable {
    private String category_id;
    private String category_name;
    private String category_url;
    // Getters and setters omitted
}
```
- Pull all authors

```java
public static void run() {
    System.out.println("Fetching categories");
    List<Category> categoryList = getAllCategory();
    while (true) {
        String now = LocalDateTime.now().format(dateFormat);
        HashMap<String, Author> authorHashMap = new HashMap<>();
        System.out.println("Start pulling data: " + now);
        // Get all authors
        for (Category category : categoryList) {
            try {
                List<Author> authorList = getAllAuthor(category);
                for (Author author : authorList) {
                    author.setTime(now);
                    authorHashMap.put(author.getUser_id(), author);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        // Save the data in authorHashMap.
        // Append to the end of the file, otherwise memory runs out.
        try {
            String path = "./j-" + LocalDate.now().format(dayFormat) + ".json";
            initFile(path);
            FileWriter fw = new FileWriter(path, true);
            PrintWriter pw = new PrintWriter(fw);
            // One JSON object per line; println supplies the newline
            pw.println(JSONUtil.toJsonStr(MapUtil.of(now, authorHashMap.values())));
            pw.close();
            fw.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Wait a while, then pull again
        System.out.println("Finished pulling data: " + now);
        try {
            Thread.sleep(pullTime * 1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

// Get all authors
private static String getUrl(String categoryId) {
    return "https://api.juejin.cn/user_api/v1/author/recommend?category_id=" + categoryId + "&cursor=0&limit=100";
}

private static List<Author> getAllAuthor(Category category) {
    try {
        String res = Http.get(getUrl(category.getCategory_id()));
        JSONArray data = JSONUtil.parseObj(res).getJSONArray("data");
        return JSONUtil.toList(data, Author.class);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return Collections.emptyList();
}

static class Author implements Serializable {
    private String user_id;
    private String user_name;
    private String got_digg_count;
    private String got_view_count;
    private String avatar_large;
    private String company;
    private String job_title;
    private String level;
    private String description;
    private String author_desc;
    private String time;
    // Getters and setters omitted
}
```
- This is the end of pulling data
Analyze the data
The analysis uses Scala; Java is too clumsy for this kind of work.
The data is about 60 MB per day, and the results come out in seconds.
- The first step is to read the data file
val map: mutable.Map[String.List[Author]] = mutable.ListMap(a)def load() :Unit = {
val lineList = new util.ArrayList[String] ()IoUtil.readLines(new FileInputStream("./j-20210630.json"), StandardCharsets.UTF_8, lineList)
lineList.forEach(line => {
val type1: Type = new TypeReference[util.Map[String, util.List[Author]]] {}.getType
val bean: util.Map[String, util.List[Author]] = JSONUtil.toBean(line, type1, true)
bean.asScala.foreach(entry => map.put(entry._1, entry._2.asScala.toList))
})
}
Copy the code
- Then analyze the data according to each requirement

```scala
// There are plenty of places to optimize, but the data set is small
def main(args: Array[String]): Unit = {
  // load() is the method shown above
  load()
  // 1. Take all values (every time period, all authors)
  // 2. Flatten
  // 3. Group by user
  // 4. Sort each user's records by time
  // 5. Compute the stats
  val map1 = map.values.flatten.groupBy(_.getUser_id).map(m => {
    (m._1, m._2.toList.sortBy(_.getTime))
  }).map(m => {
    val value: List[Author] = m._2
    // Total likes and total views gained over the day
    val day_got_digg_count = value.last.getGot_digg_count.toInt - value.head.getGot_digg_count.toInt
    val day_got_view_count = value.last.getGot_view_count.toInt - value.head.getGot_view_count.toInt
    // Time window with the most likes
    var max_got_digg_count = 0
    var max_got_digg_count_time = ""
    value.sliding(2, 2).foreach(l => {
      val head = l.head
      val last = l.last
      val value1 = last.getGot_digg_count.toInt - head.getGot_digg_count.toInt
      if (value1 > max_got_digg_count) {
        max_got_digg_count = value1
        max_got_digg_count_time = s"${getOutTime(head.getTime)} - ${getOutTime(last.getTime)}"
      }
    })
    // Time window with the most views
    var max_got_view_count = 0
    var max_got_view_count_time = ""
    value.sliding(2, 2).foreach(l => {
      val head = l.head
      val last = l.last
      val value1 = last.getGot_view_count.toInt - head.getGot_view_count.toInt
      if (value1 > max_got_view_count) {
        max_got_view_count = value1
        max_got_view_count_time = s"${getOutTime(head.getTime)} - ${getOutTime(last.getTime)}"
      }
    })
    // Package the results
    val head = value.head
    (m._1, Map(
      "user_name" -> head.getUser_name,
      "user_id" -> head.getUser_id,
      "day_got_digg_count" -> day_got_digg_count,
      "day_got_view_count" -> day_got_view_count,
      "max_got_digg_count" -> max_got_digg_count,
      "max_got_digg_count_time" -> max_got_digg_count_time,
      "max_got_view_count" -> max_got_view_count,
      "max_got_view_count_time" -> max_got_view_count_time,
    ))
  })
  // With all the results in hand, sort descending per requirement and take the Top 10
  println("\n----------------- Top 10 likes gained today ------------------")
  printf("|%-12s\t|%-5s|\n", "User", "Total likes")
  printf("|%-12s\t|%-5s|\n", "-" * 12, "-" * 5)
  map1.values.toList.sortBy(value => value("day_got_digg_count").asInstanceOf[Int])(Ordering.Int.reverse).take(10).foreach(value => {
    printf("|%-12s\t|%-5s|\n", value("user_name"), value("day_got_digg_count"))
  })
  println("\n----------------- Top 10 views gained today ------------------")
  printf("|%-12s\t|%-5s|\n", "User", "Total views")
  printf("|%-12s\t|%-5s|\n", "-" * 12, "-" * 5)
  map1.values.toList.sortBy(value => value("day_got_view_count").asInstanceOf[Int])(Ordering.Int.reverse).take(10).foreach(value => {
    printf("|%-12s\t|%-5s|\n", value("user_name"), value("day_got_view_count"))
  })
  println("\n----------------- Top 10 likes gained in a single time window today ------------------")
  printf("|%-12s\t|%-25s\t|%-5s|\n", "User", "Time window", "Likes")
  printf("|%-12s\t|%-25s\t|%-5s|\n", "-" * 12, "-" * 25, "-" * 5)
  map1.values.toList.sortBy(value => value("max_got_digg_count").asInstanceOf[Int])(Ordering.Int.reverse).take(10).foreach(value => {
    printf("|%-12s\t|%-25s\t|%-5s|\n", value("user_name"), value("max_got_digg_count_time"), value("max_got_digg_count"))
  })
  println("\n----------------- Top 10 views gained in a single time window today ------------------")
  printf("|%-12s\t|%-25s\t|%-5s|\n", "User", "Time window", "Views")
  printf("|%-12s\t|%-25s\t|%-5s|\n", "-" * 12, "-" * 25, "-" * 5)
  map1.values.toList.sortBy(value => value("max_got_view_count").asInstanceOf[Int])(Ordering.Int.reverse).take(10).foreach(value => {
    printf("|%-12s\t|%-25s\t|%-5s|\n", value("user_name"), value("max_got_view_count_time"), value("max_got_view_count"))
  })
}
```
That's the final result.

If you're interested, feel free to discuss it together 🥰, and pointers from the pros are welcome. I'd like to build a real-time display; any front-end folks willing to help, feel free to DM me.