
Let’s look at the results

Take July 1 as an example

Top10 total likes for July 1

| User | Total likes |
| --- | --- |
| Carmelo Anthony | 234 |
| chokcoco | 185 |
| Hand tore red black tree | 148 |
| The sea has | 114 |
| Little Jay the Programmer | 104 |
| Month with the flying fish | 103 |
| LBJ | 100 |
| Tao department front end team | 92 |
| Code farmer on the island | 89 |
| The cloud of the world | 80 |

Top10 most viewed times for July 1

| User | Total views |
| --- | --- |
| Hand tore red black tree | 6037 |
| chokcoco | 5689 |
| Los bamboo | 5495 |
| Small Y’s of dumb code | 5254 |
| Huawei Developer Forum | 4829 |
| Carmelo Anthony | 4445 |
| An old liu | 3967 |
| The sea has | 3784 |
| alphardex | 3459 |
| Tao department front end team | 3448 |

Top10 likes in a single time segment on July 1

| User | Time range | Likes |
| --- | --- | --- |
| Month with the flying fish | 07-01 00:41-07-01 00:46 | 45 |
| Little Jay the Programmer | 07-01 09:59-07-01 10:04 | 37 |
| The world of mortals refined heart | 07-01 21:42-07-01 21:47 | 27 |
| LBJ | 07-01 21:32-07-01 21:37 | 26 |
| Fat DA | 07-01 17:41-07-01 17:46 | 20 |
| The village son | 07-01 01:31-07-01 01:36 | 17 |
| Code farmer on the island | 07-01 08:43-07-01 08:49 | 15 |
| Xiao ray | 07-01 22:07-07-01 22:12 | 13 |
| Peeling my shell | 07-01 17:31-07-01 17:36 | 10 |
| An old liu | 07-01 19:01-07-01 19:06 | 10 |

Top10 views in a single time range on July 1

| User | Time range | Views |
| --- | --- | --- |
| Huawei Developer Forum | 07-01 11:14-07-01 11:19 | 874 |
| Fishing experts | 07-01 15:25-07-01 20:52 | 210 |
| Honest1y | 07-01 16:45-07-01 16:50 | 157 |
| mPaaS | 07-01 11:44-07-01 20:06 | 103 |
| An old liu | 07-01 13:15-07-01 13:20 | 101 |
| Los bamboo | 07-01 09:44-07-01 09:49 | 89 |
| chokcoco | 07-01 13:47-07-01 13:55 | 89 |
| Hand tore red black tree | 07-01 09:34-07-01 09:39 | 89 |
| alphardex | 07-01 10:24-07-01 10:29 | 85 |
| Love tinkering program ape | 07-01 11:29-07-01 11:34 | 84 |

Is the data accurate? A: Very accurate

Curious how it was done?

Let's get started.

  1. Monitor the authors and pull their data on a schedule. Where do the authors come from? I use the recommended-author list here.
    • The downside is that authors who aren't on that list are missed.

  2. Fetch the data.

    • There are a lot of categories, so fetch the category list first.
    • The code speaks for itself:
        private static List<Category> getAllCategory() {
            // Fetch all categories
            String res = Http.get("https://api.juejin.cn/tag_api/v1/query_category_briefs?show_type=1");
            JSONArray data = JSONUtil.parseObj(res).getJSONArray("data");
            return JSONUtil.toList(data, Category.class);
        }
    
        // Use an inner class
        static class Category implements Serializable {
    
            private String category_id;
            private String category_name;
            private String category_url;
    
            // Getters and setters omitted
        }
    
    • Pull all authors

    public static void run() {
        System.out.println("Pull categories");
        List<Category> categoryList = getAllCategory();
        while (true) {
            String now = LocalDateTime.now().format(dateFormat);
            HashMap<String, Author> authorHashMap = new HashMap<>();
    
            System.out.println("Start pulling data: " + now);
    
            // Fetch all authors in every category
            for (Category category : categoryList) {
                try {
                    List<Author> authorList = getAllAuthor(category);
                    for (Author author : authorList) {
                        author.setTime(now);
                        authorHashMap.put(author.getUser_id(), author);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
    
            // Save authorHashMap: append to the end of the file,
            // otherwise memory would eventually run out
            try {
                String path = "./j-" + LocalDate.now().format(dayFormat) + ".json";
                initFile(path);
                FileWriter fw = new FileWriter(path, true);
                PrintWriter pw = new PrintWriter(fw);
                // println appends the newline for us
                pw.println(JSONUtil.toJsonStr(MapUtil.of(now, authorHashMap.values())));
                pw.close();
                fw.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
    
            // Wait a while, then pull again
            System.out.println("Data pull finished: " + now);
            try {
                Thread.sleep(pullTime * 1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
    
    // Fetch all authors in one category
    private static String getUrl(String categoryId) {
        return "https://api.juejin.cn/user_api/v1/author/recommend?category_id=" + categoryId + "&cursor=0&limit=100";
    }
    private static List<Author> getAllAuthor(Category category) {
        try {
            String res = Http.get(getUrl(category.getCategory_id()));
            JSONArray data = JSONUtil.parseObj(res).getJSONArray("data");
            return JSONUtil.toList(data, Author.class);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return Collections.emptyList();
    }
    
    static class Author implements Serializable {
        private String user_id;
        private String user_name;
        private String got_digg_count;
        private String got_view_count;
        private String avatar_large;
        private String company;
        private String job_title;
        private String level;
        private String description;
        private String author_desc;
        private String time;
    
        // Get set...
    }
    
    • And that's it for pulling the data.
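For reference, each line of the output file holds one snapshot: a JSON object whose single key is the pull time and whose value is the author list (that is the shape `run()` writes via `MapUtil.of(now, authorHashMap.values())`). The field values below are made up for illustration, and the exact timestamp format depends on `dateFormat`, which isn't shown above; the real file has one object per line, pretty-printed here for readability:

```json
{"2021-07-01 00:41": [
  {"user_id": "10001", "user_name": "chokcoco",
   "got_digg_count": "185", "got_view_count": "5689",
   "time": "2021-07-01 00:41"}
]}
```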

Analyze the data

The analysis is done in Scala; Java felt too clumsy and inconvenient for this kind of work.

The data comes to about 60 MB a day, and the results are produced in seconds.

  • The first step is to read the data file
    val map: mutable.Map[String, List[Author]] = mutable.ListMap()

    def load(): Unit = {
        val lineList = new util.ArrayList[String]()
        IoUtil.readLines(new FileInputStream("./j-20210630.json"), StandardCharsets.UTF_8, lineList)
        lineList.forEach(line => {
            val type1: Type = new TypeReference[util.Map[String, util.List[Author]]] {}.getType
            val bean: util.Map[String, util.List[Author]] = JSONUtil.toBean(line, type1, true)
            bean.asScala.foreach(entry => map.put(entry._1, entry._2.asScala.toList))
        })
    }
  • Analyze the data for each requirement
    // There is plenty of room for optimization, but the data set is small
    def main(args: Array[String]): Unit = {
        // Load the data (the method above)
        load()
        // 1. Take all values (all authors in every time slot)
        // 2. Flatten
        // 3. Group by user
        // 4. Sort each user's records by time
        // 5. Compute the statistics
        val map1 = map.values.flatten.groupBy(_.getUser_id).map(m => {
            (m._1, m._2.toList.sortBy(_.getTime))
        }).map(m => {
            val value: List[Author] = m._2
            // Total likes and total views for the day
            val day_got_digg_count = value.last.getGot_digg_count.toInt - value.head.getGot_digg_count.toInt
            val day_got_view_count = value.last.getGot_view_count.toInt - value.head.getGot_view_count.toInt
            // Find the time slot with the most likes
            var max_got_digg_count = 0
            var max_got_digg_count_time = ""
            value.sliding(2, 2).foreach(l => {
                val head = l.head
                val last = l.last
                val value1 = last.getGot_digg_count.toInt - head.getGot_digg_count.toInt
                if (value1 > max_got_digg_count) {
                    max_got_digg_count = value1
                    max_got_digg_count_time = s"${getOutTime(head.getTime)} - ${getOutTime(last.getTime)}"
                }
            })
            // ... and the time slot with the most views
            var max_got_view_count = 0
            var max_got_view_count_time = ""
            value.sliding(2, 2).foreach(l => {
                val head = l.head
                val last = l.last
                val value1 = last.getGot_view_count.toInt - head.getGot_view_count.toInt
                if (value1 > max_got_view_count) {
                    max_got_view_count = value1
                    max_got_view_count_time = s"${getOutTime(head.getTime)} - ${getOutTime(last.getTime)}"
                }
            })
            // Package the results
            val head = value.head
            (m._1, Map(
                "user_name" -> head.getUser_name,
                "user_id" -> head.getUser_id,
                "day_got_digg_count" -> day_got_digg_count,
                "day_got_view_count" -> day_got_view_count,
                "max_got_digg_count" -> max_got_digg_count,
                "max_got_digg_count_time" -> max_got_digg_count_time,
                "max_got_view_count" -> max_got_view_count,
                "max_got_view_count_time" -> max_got_view_count_time,
            ))
        })

        // From here on, sort the results by each requirement and take the Top 10

        println("\n----------------- Top10 likes of the day ------------------")
        printf("|%-12s\t|%-5s|\n", "User", "Total likes")
        printf("|%-12s\t|%-5s|\n", "-" * 12, "-" * 5)
        map1.values.toList.sortBy(value => value("day_got_digg_count").asInstanceOf[Int])(Ordering.Int.reverse).take(10).foreach(value => {
            printf("|%-12s\t|%-5s|\n", value("user_name"), value("day_got_digg_count"))
        })

        println("\n----------------- Top10 views of the day ------------------")
        printf("|%-12s\t|%-5s|\n", "User", "Total views")
        printf("|%-12s\t|%-5s|\n", "-" * 12, "-" * 5)
        map1.values.toList.sortBy(value => value("day_got_view_count").asInstanceOf[Int])(Ordering.Int.reverse).take(10).foreach(value => {
            printf("|%-12s\t|%-5s|\n", value("user_name"), value("day_got_view_count"))
        })

        println("\n----------------- Top10 likes in a single time slot ------------------")
        printf("|%-12s\t|%-25s\t|%-5s|\n", "User", "Time range", "Likes")
        printf("|%-12s\t|%-25s\t|%-5s|\n", "-" * 12, "-" * 25, "-" * 5)
        map1.values.toList.sortBy(value => value("max_got_digg_count").asInstanceOf[Int])(Ordering.Int.reverse).take(10).foreach(value => {
            printf("|%-12s\t|%-25s\t|%-5s|\n", value("user_name"), value("max_got_digg_count_time"), value("max_got_digg_count"))
        })

        println("\n----------------- Top10 views in a single time slot ------------------")
        printf("|%-12s\t|%-25s\t|%-5s|\n", "User", "Time range", "Views")
        printf("|%-12s\t|%-25s\t|%-5s|\n", "-" * 12, "-" * 25, "-" * 5)
        map1.values.toList.sortBy(value => value("max_got_view_count").asInstanceOf[Int])(Ordering.Int.reverse).take(10).foreach(value => {
            printf("|%-12s\t|%-25s\t|%-5s|\n", value("user_name"), value("max_got_view_count_time"), value("max_got_view_count"))
        })
    }
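To make the windowing trick concrete, here is a minimal, self-contained sketch on a plain list of cumulative like counts (hypothetical numbers, no `Author` class): the day total is simply last sample minus first, and `sliding(2, 2)` chops the samples into non-overlapping pairs, so each pair's difference is the gain in that time slot.

```scala
object StatsDemo {
  // Total gain over the whole day: last cumulative sample minus the first
  def dayGain(counts: List[Int]): Int = counts.last - counts.head

  // Largest gain inside one window, using the same sliding(2, 2)
  // call as the analysis above (non-overlapping pairs of samples;
  // a trailing single-element window contributes a gain of 0)
  def maxWindowGain(counts: List[Int]): Int = {
    var best = 0
    counts.sliding(2, 2).foreach { w =>
      val gain = w.last - w.head
      if (gain > best) best = gain
    }
    best
  }

  def main(args: Array[String]): Unit = {
    val samples = List(100, 110, 112, 145, 150) // cumulative likes, one per pull
    println(dayGain(samples))       // 50
    println(maxWindowGain(samples)) // 33: the 112 -> 145 window
  }
}
```

One caveat of step 2: the change between one pair's end and the next pair's start (110 to 112 above) is never attributed to any slot; `sliding(2)` with the default step of 1 would cover every adjacent interval.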

The final result

Happy to discuss this with anyone who's interested 🥰, and pointers from the experts are welcome. I'd like to turn this into a real-time display next, so if any front-end pros would like to help, feel free to DM me.