Planning

Recently, a few friends in our group chat and I were discussing how GitHub users in China rank by follower count. You Yuxi (Evan You) is first with 74K followers, Ruan Yifeng second with 67K, and Liao Xuefeng third with 34K.

Yuxi is not only first in China but also second in the world. First place worldwide goes to Linus Torvalds, the creator of Linux.

There are already plenty of websites that publish this ranking, but I always felt something was missing. For example:

  • Historical data

If it runs long enough, we could even reconstruct a map of the milestones in China's open source community. To get that, I have to crawl the data myself rather than wait for someone else to do it.

  • Viewing your own ranking and generating a poster

If I want to share my ranking, all I can do is take a screenshot. There is no way to generate a nice poster.

Design

Search for leaderboard designs on Huaban or Dribbble.

Find a style that works and borrow its layout. As an experienced front-end developer, you already picture the CSS when you look at a design, right?

Choosing where to run the crawler

As usual, I don't want to run this crawler on my own server. I want it to run on a schedule, as reliably as possible, on a free and stable cloud service.

  • Option 1: uniCloud cloud functions
    Pros: supports scheduled tasks and connects directly to the cloud database and cloud storage.
    Cons: the timeout is only 60 seconds, but crawling the 1,000-odd records from GitHub takes about 2 minutes. Calling the GitHub API from a server in mainland China is also unreliable.
  • Option 2: GitHub Actions
    Pros: supports scheduled tasks, and because it runs on GitHub's own infrastructure, crawling GitHub data is very fast.
    Cons: cannot connect directly to my uniCloud cloud database.

Option 1's timeout is fixed and fairly short. A crawl could be designed around it, but that would be too cumbersome. So I took the opportunity to learn GitHub Actions. The workflow crawls the data and sends it to a uniCloud cloud function, and the cloud function writes it into the database.

Crawling the data

Calling the GitHub API from Node.js is fairly simple. There are only two things to watch out for:

  1. The search API returns at most 1,000 results per query.
  2. Rate limits.

Unauthenticated requests have a very low rate limit (the Search API allows only 10 per minute), which is not nearly enough to crawl the top 1,000. The fix is to generate a personal access token on GitHub. With a token, the core REST API allows 5,000 requests per hour and the Search API allows 30 per minute, which is plenty.
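Because of the 1,000-result cap, the crawl can be planned as ten search pages of 100 users each. The sketch below builds the query parameters for those pages. The query string "location:china" and the sort options are assumptions for illustration, and planSearchPages is a hypothetical helper, not code from the repository.

```javascript
// Hypothetical helper: plan the search requests needed to cover the
// Search API's 1,000-result cap. The query and sort options are assumptions.
const SEARCH_URL = "https://api.github.com/search/users";
const SEARCH_RESULT_CAP = 1000; // GitHub returns at most 1,000 results per search
const MAX_PER_PAGE = 100;       // largest page size the API accepts

function planSearchPages(query, wanted = SEARCH_RESULT_CAP, perPage = MAX_PER_PAGE) {
  const total = Math.min(wanted, SEARCH_RESULT_CAP);
  const size = Math.min(perPage, MAX_PER_PAGE);
  const pages = [];
  // Keep adding pages until they cover `total` results.
  for (let page = 1; (page - 1) * size < total; page++) {
    pages.push({ q: query, sort: "followers", order: "desc", per_page: size, page });
  }
  return pages;
}
```

Each object can be passed as the data argument of githubApiGet(SEARCH_URL, params). Ten sequential search calls fit comfortably within the authenticated limit of 30 per minute.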

Put your own token in the request header:

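const axios = require('axios');

// Token passed in by the workflow's "Run" step (MYTOKEN=${{secrets.MYTOKEN}})
const token = process.env.MYTOKEN;

// GET a GitHub API endpoint with the personal access token attached.
// Returns the axios response, or undefined if the request failed.
async function githubApiGet(url, data = {}) {
	let res;
	try {
		res = await axios.get(url, {
			headers: {
				"Authorization": "token " + token,
				"Accept": "application/vnd.github.v3+json"
			},
			params: data
		});
	} catch (err) {
		// Log only the message: the full error object includes the request headers
		console.log(err.message);
	}
	return res;
}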
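GitHub reports how much quota is left in the X-RateLimit-Remaining and X-RateLimit-Reset response headers, where the reset value is a Unix timestamp in seconds. The sketch below assumes a lower-cased headers object like the one axios exposes on res.headers, and uses it to decide how long to pause before the next call. rateLimitDelayMs and sleep are hypothetical helpers.

```javascript
// Sketch: decide how long to pause, based on GitHub's rate-limit headers.
// Assumes lower-cased header names, as on an axios response.
function rateLimitDelayMs(headers, nowMs = Date.now()) {
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const resetSec = Number(headers["x-ratelimit-reset"]); // Unix time, in seconds
  // Quota left, or headers missing: no need to wait.
  if (!Number.isFinite(remaining) || remaining > 0) return 0;
  if (!Number.isFinite(resetSec)) return 0;
  // Wait until the window resets, plus a one-second safety margin.
  return Math.max(0, resetSec * 1000 - nowMs) + 1000;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
```

A crawl loop might call `const res = await githubApiGet(url, params); if (res) await sleep(rateLimitDelayMs(res.headers));` between requests. With a token and only about a thousand calls, the delay will almost always be zero.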

Since I want to keep a history of the China ranking, and a new file is generated every day, the number of files in this repository could grow very large over time. So the directory structure needs some planning.

Each year and each month get their own directory, so a single directory holds at most 31 files.
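One way to express this year/month/day layout in Node.js is shown below. The root folder name "data" and the file naming (data/2021/09/11.json) are assumptions, not necessarily the repository's actual paths, and recordPath and saveRecord are hypothetical helpers.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical layout: <root>/<year>/<month>/<day>.json, e.g. data/2021/09/11.json
function recordPath(dateStr, root = "data") {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(dateStr);
  if (!match) throw new Error(`expected YYYY-MM-DD, got ${dateStr}`);
  const [, year, month, day] = match;
  return path.posix.join(root, year, month, `${day}.json`);
}

// Create the year/month folders if needed, then write the day's file.
function saveRecord(dateStr, record, root = "data") {
  const file = recordPath(dateStr, root);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify(record));
  return file;
}
```

Because each month gets its own folder, no directory ever holds more than 31 day files.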

GitHub Actions

The comments in the workflow file explain what each configuration item does. I only show the parameters I actually use.

# The name of this workflow
name: ROCSchedule

on:
  # Scheduled run. Cron uses UTC: '0 18 * * *' is 18:00 UTC, i.e. 2 a.m. China time (UTC+8)
  schedule:
    - cron: '0 18 * * *'
  # Also allow the workflow to be triggered manually from the repository's Actions tab
  workflow_dispatch:

jobs:
  build:
    # Run on a Linux runner
    runs-on: ubuntu-latest

    steps:
      # Check out the repository code
      - uses: actions/checkout@v2
      # Set up the Node environment
      - name: Setup Node.js environment
        uses: actions/setup-node@v2
      # Install dependencies
      - name: Install NPM dependencies
        run: npm install axios
      # Run the crawler
      - name: Run
        # MYTOKEN raises the GitHub API rate limit; POSTURL is the URL the data is sent to after the crawl
        run: MYTOKEN=${{secrets.MYTOKEN}} POSTURL=${{secrets.POSTURL}} node index.js

      - name: Add & Commit
        # The crawl writes JSON files into the repository, so commit and push them
        uses: EndBug/add-and-commit@v7
        with:
          github_token: ${{secrets.MYTOKEN}}
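The cron fires at 18:00 UTC, which is already the next day in China. A script that stamps records with today's date therefore has to use China time, not UTC. Below is a sketch of how index.js might assemble the data it POSTs to POSTURL. The top-level field names match the cloud function that stores the record (record_date, total_users, rank_list). The per-user fields (login, followers, avatar_url), the sort, and the helper names chinaDateString and buildPayload are assumptions.

```javascript
// China is UTC+8 with no daylight saving time, so a fixed offset works.
function chinaDateString(date = new Date()) {
  const shifted = new Date(date.getTime() + 8 * 60 * 60 * 1000);
  return shifted.toISOString().slice(0, 10); // YYYY-MM-DD
}

// users: [{ login, followers, avatar_url }], e.g. collected from /users/{login}.
// totalUsers: for example, total_count from the search response (an assumption).
function buildPayload(users, totalUsers, date = new Date()) {
  const rankList = [...users]
    .sort((a, b) => b.followers - a.followers) // most followers first
    .map((u, i) => ({ rank: i + 1, login: u.login, followers: u.followers, avatar_url: u.avatar_url }));
  return { record_date: chinaDateString(date), total_users: totalUsers, rank_list: rankList };
}
```

The payload could then be sent with `await axios.post(process.env.POSTURL, payload)`. axios serializes it as JSON, and the URL-enabled cloud function receives it as a string in event.body.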

Secrets

MYTOKEN and POSTURL must stay private, yet my project depends on them. So how can I open source it? GitHub Actions secrets handle this neatly. I can open source the entire project without exposing any private data, and anyone who forks it can run the service with their own tokens.
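The workflow exposes the secrets to the script as environment variables, so a fork that has not configured them would otherwise fail halfway through a crawl. The sketch below reads MYTOKEN and POSTURL and fails fast with a clear message if either is missing. readConfig is a hypothetical helper.

```javascript
// Sketch: read the two secrets passed in by the "Run" step and stop early,
// with a readable error, if a fork has not configured them.
function readConfig(env = process.env) {
  const missing = ["MYTOKEN", "POSTURL"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing secrets: ${missing.join(", ")}. Add them as repository secrets.`);
  }
  return { token: env.MYTOKEN, postUrl: env.POSTURL };
}
```

Calling readConfig() at the top of index.js means a misconfigured fork fails in the first second of the run rather than after the crawl.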

Receiving the POST in a uniCloud cloud function

Create a uniCloud cloud function and enable URL access for it (URLization).

With URL access enabled, the POST body arrives as a string in event.body. Parse it with JSON.parse before use:

if (event.body) {
	// POST from the GitHub Action
	const data = JSON.parse(event.body);

	await db.collection('githubroc').add({
		record_date: data.record_date,
		total_users: data.total_users,
		rank_list: data.rank_list
	});

	return;
}
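Because the URL is reachable by anyone, it can help to validate the body before writing it to the database. The sketch below is hypothetical (parseRocBody is not from the original project). It also assumes the gateway may deliver a base64-encoded body flagged by event.isBase64Encoded, so check your platform's event format before relying on that.

```javascript
// Sketch: parse and sanity-check the POSTed ranking before storing it.
// Returns the cleaned record, or null if the body is missing or malformed.
function parseRocBody(event) {
  if (!event || typeof event.body !== "string") return null;
  // Assumption: some gateways base64-encode the body and set isBase64Encoded.
  const raw = event.isBase64Encoded
    ? Buffer.from(event.body, "base64").toString("utf8")
    : event.body;
  let data;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return null; // not JSON: ignore rather than crash the function
  }
  const valid =
    data !== null && typeof data === "object" &&
    /^\d{4}-\d{2}-\d{2}$/.test(data.record_date) &&
    Number.isInteger(data.total_users) &&
    Array.isArray(data.rank_list);
  return valid
    ? { record_date: data.record_date, total_users: data.total_users, rank_list: data.rank_list }
    : null;
}
```

Inside the handler, `const data = parseRocBody(event); if (data) await db.collection('githubroc').add(data);` would replace the bare JSON.parse.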

Fetching the ranking data from the cloud database

The query looks for the record whose date matches the requested date, in the format 2021-09-11. If no record matches, it falls back to the most recent one. There is always at least one record, so I don't handle the empty case.

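// Assumes db = uniCloud.database() and dbCmd = db.command are defined earlier in the function
const date = event.date; // e.g. "2021-09-11"

let dbRes = await db.collection('githubroc').where({
	record_date: dbCmd.eq(date)
}).get();

// No record for that date: fall back to the most recent one
if (dbRes.data.length === 0) {
	dbRes = await db.collection('githubroc').orderBy('record_date', 'desc').limit(1).get();
}

return utils.responseData(0, "", dbRes.data[0]);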
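The fallback only returns the latest record if the query sorts by record_date, and sorting date strings works because zero-padded YYYY-MM-DD strings sort chronologically. The hypothetical in-memory model below applies the same rule: an exact match first, otherwise the newest date. It is handy for unit-testing the logic without a database.

```javascript
// In-memory model of the lookup: exact record_date match first,
// otherwise the newest record. Relies on zero-padded YYYY-MM-DD strings.
function pickRecord(records, date) {
  if (records.length === 0) return null;
  const exact = records.find((r) => r.record_date === date);
  if (exact) return exact;
  return records.reduce((latest, r) => (r.record_date > latest.record_date ? r : latest));
}
```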

Rendering a long list in the mini program

Each day is stored as a single database record, which doesn't lend itself to paging, and there are only 1,000 entries in total. So I send the whole list to the front end. However, calling setData with 1,000 items at once is still slow. The list therefore shows only the first 100 entries, and each time the user scrolls to the bottom the next 100 are appended with concat.

this.ranklist = this.ranklist.concat(this.orginlist.slice(this.rankpage*100, (this.rankpage+1) *100));
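The paging rule in the line above can be factored into a small pure function that also reports whether any entries are left. This is a sketch: nextChunk and the hasMore flag are hypothetical, while ranklist, orginlist and rankpage mirror the names used in the original line.

```javascript
// Page n covers [n * size, (n + 1) * size) of the full list.
// Returns the slice to append and whether more entries remain.
function nextChunk(all, page, size = 100) {
  const start = page * size;
  const chunk = all.slice(start, start + size);
  return { chunk, hasMore: start + size < all.length };
}
```

In a scroll-to-bottom handler such as onReachBottom, you could use `if (!this.hasMore) return; const { chunk, hasMore } = nextChunk(this.orginlist, this.rankpage); this.ranklist = this.ranklist.concat(chunk); this.rankpage++; this.hasMore = hasMore;`. This stops empty appends once all 1,000 entries are shown.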

Generating the poster in the mini program

The poster is the most tedious part of the whole project. Drawing it entirely on the front end with canvas isn't difficult, just fiddly: every part of the layout has to be drawn piece by piece with the Canvas API.

wx.downloadFile also requires the domain to be added to the download whitelist in the mini program admin console, and a domain can only be added there if it has an ICP filing in China...

A GitHub avatar URL looks like this:

https://avatars2.githubusercontent.com/u/499550?s=140

When I tried to add it to the whitelist, I got this message:

Are you kidding me?!

Where am I supposed to file an ICP record for githubusercontent.com?

I did everything myself in one day and night, from planning and design to development, and I hadn't checked the URL whitelist before starting, so I didn't catch the problem early. Was the poster going to fail after all? Would I be unable to publish? I was absolutely furious!

It had to work!

Relaying avatars through a URL-enabled cloud function

I considered two approaches:

  1. When crawling the data from GitHub, copy all the avatars to cloud storage and replace the original URLs.

  2. Have the cloud function fetch the avatar and return the image directly, without storing anything, so that the cloud function acts as a relay.

I chose option 2. Let's do it!

Option 2 sends a GET request to the cloud function. The relay URL has this form:

<cloud function URL>?avatar_url=<GitHub avatar URL>
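The avatar URL carries its own query string (?s=140), so it should be percent-encoded when placed inside another URL. The sketch below builds the relay URL with the standard URL and URLSearchParams APIs. The function URL is a placeholder for your own URL-enabled cloud function, and avatarProxyUrl is a hypothetical helper.

```javascript
// Sketch: build the relay URL. URLSearchParams percent-encodes the avatar URL,
// so its own "?s=140" stays inside the avatar_url value.
function avatarProxyUrl(functionUrl, avatarUrl) {
  const url = new URL(functionUrl);
  url.searchParams.set("avatar_url", avatarUrl);
  return url.toString();
}
```

Because the relay URL points at your own (ICP-filed) cloud function domain, you can whitelist that domain and pass the result to wx.downloadFile({ url: ... }) before drawing the image on the canvas.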

The cloud function reads the avatar URL from the GET query parameters:

if (event.queryStringParameters) {
	// Relay the GitHub avatar (avatar_url fix)
	const qs = event.queryStringParameters;

	// Download the image server-side; imageRes.data holds the raw bytes
	const imageRes = await uniCloud.httpclient.request(qs.avatar_url);

	// Encode the bytes as base64 for the integrated response body
	const base64data = Buffer.from(imageRes.data).toString('base64');

	return {
		mpserverlessComposedResponse: true, // required to return an integrated response on Alibaba Cloud
		isBase64Encoded: true,
		statusCode: 200,
		headers: {
			'content-type': 'image/jpeg'
		},
		body: base64data
	};
}
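As written, the function relays any URL it is given, so anyone could use it as a general-purpose proxy. A simple safeguard is to accept only GitHub avatar hosts. The sketch below is an assumption built on avatar URLs like avatars2.githubusercontent.com, and isAllowedAvatarUrl is a hypothetical helper.

```javascript
// Sketch: only relay HTTPS images from GitHub's avatar hosts
// (avatars.githubusercontent.com, avatars2.githubusercontent.com, ...).
function isAllowedAvatarUrl(candidate) {
  let url;
  try {
    url = new URL(candidate);
  } catch (err) {
    return false; // not a valid absolute URL
  }
  return url.protocol === "https:" && /^avatars\d*\.githubusercontent\.com$/.test(url.hostname);
}
```

Call it on qs.avatar_url before uniCloud.httpclient.request, and return an error response such as a 400 when it fails.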

Afterword

Building this small feature single-handedly, from planning and design through development, meant falling into quite a few pits and working through some tricky problems. It was very satisfying.

Open source

Github.com/ezshine/git…

Video tutorial for the repository

The mini program's technical implementation isn't hard, and every difficult part is explained in this article, so you can build it yourselves. What I mainly open source here is the GitHub Action and the ranking JSON files. Friends, please click follow~

I want to be ranked, too! I want to be ranked, too! I want to be ranked, too!


Designing and building this took less time than writing up this article. Writing and sharing isn't easy, so please give me some encouragement!
