Preface
Happy New Year, fellow Juejin users! Today is the first day of 2022. Voting for Juejin’s annual popular author list has ended, which of course had nothing to do with me. My New Year’s flag is to reach level V4 on Juejin. For the vast majority of readers, I suspect “learning” is somewhere on your New Year’s flag list; it is on mine, too. So I had an idea: tally up Juejin’s articles of the year, for two purposes:
- Bookmark the articles you like and study them at your own pace.
- Learn from these articles: which kinds of articles suit readers, what are their strengths, and how should we write our own?
Annual active authors statistics
We can count this year’s active authors through the year-end voting page. It is an infinite-scroll page, and the has_more field in each response tells us whether there is a next page, so we can collect all the author ids with Node.js.
const axios = require("axios");
const _ = require("lodash");
const fs = require("fs");

const url = "https://api.juejin.cn/list_api/v1/annual/list";
const headers = {
  "content-type": "application/json; charset=utf-8",
  "user-agent":
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
  // Paste the cookie copied from the browser here.
  cookie: "xxx",
};

let userId = [];

// Fetch author ids page by page, advancing the cursor by 10 on each request.
const fetchUserId = (cursor = 0) => {
  console.log("Requesting page " + cursor);
  axios
    .post(
      url,
      { annual_id: "2021", list_type: 0, cursor: cursor + "", keyword: "" },
      { headers }
    )
    .then((res) => {
      const data = res.data;
      userId = userId.concat(_.map(data.data, "user_id"));
      // Stop after 1000 ids per run, or when there are no more pages.
      if (data.has_more && userId.length < 1000) {
        fetchUserId(cursor + 10);
      } else {
        fs.writeFileSync("./0-1000.json", JSON.stringify(userId));
      }
    });
};

fetchUserId();
The cookie can be copied from the browser. Each run collects at most 1000 authors, to avoid hitting the Juejin backend’s request limits. After three runs, the results show that 2035 authors signed up this time. Of course, this figure may not be exact. Next, we can fetch each author’s articles using the collected user ids.
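The id lists from the separate runs presumably need to be combined before the next step. A minimal sketch of that merge, assuming each run's output has already been loaded into an array; the helper name `mergeUserIds` and the sample ids are hypothetical and not part of the original script:

```typescript
// Hypothetical helper: merge the id lists from several runs and drop
// duplicates, keeping the order in which each id was first seen.
function mergeUserIds(runs: string[][]): string[] {
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const run of runs) {
    for (const id of run) {
      if (!seen.has(id)) {
        seen.add(id);
        merged.push(id);
      }
    }
  }
  return merged;
}

// Sample data only; real ids would come from files such as ./0-1000.json.
const allIds = mergeUserIds([
  ["a1", "b2"],
  ["b2", "c3"],
  ["c3", "d4"],
]);
console.log(allIds.length); // prints 4
```

Deduplicating matters if two runs overlap at a page boundary, which would otherwise inflate the author count.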
Getting each author’s article list
We can get a list of each author’s articles from the voting details page. I have to poke fun at the Juejin interface here: the front end only shows 3 articles, yet the back end returns all the data… 😅
Let’s take a look at each one:
The articles here are sorted by popularity by default, but we don’t know whether that means likes or favorites.
Fortunately, we can get each Juejin user’s articles from the user’s page, as shown below:
Again, the user_info data is repeated N times. This interface does return the numbers of likes, comments, and favorites. But what does digg_count mean? Which word is “digg” short for?
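Judging from the fields that the insert code later in this article reads, each item returned by this interface looks roughly like the shape below. This is only a sketch: the interface names are made up for illustration, the real response contains more fields, and reading `digg_count` as likes and `collect_count` as favorites is an assumption.

```typescript
// Field names come from the code in this article; interface names are hypothetical.
interface ArticleInfo {
  article_id: string;
  title: string;
  digg_count: number; // assumed to be the number of likes
  view_count: number;
  comment_count: number;
  collect_count: number; // assumed to be the number of favorites
}

interface ArticleItem {
  article_info: ArticleInfo;
  author_user_info: { user_id: string; user_name: string };
  category: { category_id: string; category_name: string };
  tags: { tag_id: string; tag_name: string }[];
}

// Pull out just the engagement numbers for one item.
function engagement(item: ArticleItem) {
  const { title, digg_count, comment_count, collect_count } = item.article_info;
  return { title, likes: digg_count, comments: comment_count, favorites: collect_count };
}

// Made-up sample item, not real data.
const sample: ArticleItem = {
  article_info: {
    article_id: "1",
    title: "Sample article",
    digg_count: 12,
    view_count: 340,
    comment_count: 3,
    collect_count: 5,
  },
  author_user_info: { user_id: "u1", user_name: "someone" },
  category: { category_id: "c1", category_name: "Frontend" },
  tags: [{ tag_id: "t1", tag_name: "JavaScript" }],
};
console.log(engagement(sample));
```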
Building tables for the statistics
A JSON file is not practical for storing this much data, so I use PostgreSQL with Prisma as the ORM. If you are not familiar with them, please refer to my earlier translated article “Complete ORM prisma for Node.js and TypeScript”.
Defining the schema
datasource db {
  provider = "postgresql"
  url = env("DATABASE_URL")
}

generator client {
  provider = "prisma-client-js"
}

model Article {
  article_id String @id
  title String
  brief_content String
  content String?
  cover_image String
  user_id String
  ctime String
  digg_count Int
  view_count Int
  comment_count Int
  collect_count Int
  author_id String
  author Author @relation(fields: [author_id], references: [id])
  category_id String
  category Category @relation(fields: [category_id], references: [id])
  tags TagsOnArticles[]
}

model Author {
  id String @id
  name String
  avatar_large String
  articles Article[]
}

model Category {
  id String @id
  name String
  articles Article[]
}

model Tag {
  id String @id
  name String
  articles TagsOnArticles[]
}

model TagsOnArticles {
  article Article @relation(fields: [article_id], references: [article_id])
  article_id String
  tag Tag @relation(fields: [tag_id], references: [id])
  tag_id String
  @@id([article_id, tag_id])
}
Table relationships
- Articles and authors: many to one
- Articles and categories: many to one
- Articles and tags: many to many
Code for fetching a user’s article list
import axios from "axios";

// `headers` is the same object as in the first script; `insert` is defined below.

/**
 * Fetch a user's article list and save it to the database.
 * @param userId
 * @returns a promise that resolves once the articles have been handled
 */
const fetchList = async (userId: string) => {
  console.log("Start collecting " + userId);
  return new Promise((resolve) => {
    // Wait 2 seconds before sending the request.
    setTimeout(async () => {
      await axios
        .post(
          "https://api.juejin.cn/content_api/v1/article/query_list?aid=2608&uuid=6899676175061648910",
          {
            user_id: userId,
            sort_type: 1,
            cursor: "0",
          },
          { headers }
        )
        .then((res: any) => {
          const data = res.data.data;
          if (data && data.length) {
            // Insert into the database
            insert(data)
              .catch((e) => {
                console.error(e);
                process.exit(1);
              })
              .finally(() => {
                resolve("");
              });
          } else {
            resolve("");
          }
        });
    }, 2000);
  });
};
To avoid sending requests too frequently, I add a 2-second delay before each one.
Code for inserting into the database
import _ from "lodash";
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

/**
 * Insert articles into the database.
 * @param data
 */
async function insert(data: any) {
  for (const item of data) {
    const article_info = _.pick(item.article_info, [
      "article_id",
      "title",
      "brief_content",
      "cover_image",
      "user_id",
      "ctime",
      "digg_count",
      "view_count",
      "comment_count",
      "collect_count",
    ]);
    // Create the author if it is not in the database yet.
    const author_user_info = await prisma.author.findUnique({
      where: {
        id: item.author_user_info.user_id,
      },
    });
    if (!author_user_info) {
      await prisma.author.create({
        data: {
          id: item.author_user_info.user_id,
          name: item.author_user_info.user_name,
          avatar_large: item.author_user_info.avatar_large,
        },
      });
    }
    // Create the category if it is not in the database yet.
    const category = await prisma.category.findUnique({
      where: {
        id: item.category.category_id,
      },
    });
    if (!category) {
      await prisma.category.create({
        data: {
          id: item.category.category_id,
          name: item.category.category_name,
        },
      });
    }
    const article = await prisma.article.findUnique({
      where: {
        article_id: article_info.article_id,
      },
    });
    // Link each tag, creating it first when it does not exist.
    const creates_tags = _.map(item.tags, (tag: any) => {
      return {
        tag: {
          connectOrCreate: {
            create: {
              id: tag.tag_id,
              name: tag.tag_name,
            },
            where: {
              id: tag.tag_id,
            },
          },
        },
      };
    });
    // Only create articles that have not been stored already.
    if (!article) {
      console.log("create---" + article_info.title);
      await prisma.article.create({
        data: {
          ...article_info,
          author_id: item.article_info.user_id,
          category_id: item.category.category_id,
          tags: {
            create: creates_tags,
          },
        },
      });
    }
  }
}
Calling fetchList for every author
We can’t do this with Promise.all, because all the requests would be started at once, and the back end will reject them to protect itself from overload. We need to send the requests one at a time, one every 2 seconds, and save each result to the database. What approach should we use? (This is a standard interview question: how do you run multiple promises one after another?) If you have read this far, leave your answer in the comments section.
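One possible answer, as a minimal sketch rather than the author's actual solution: await each call inside a `for...of` loop, so the next request starts only after the previous one has finished. The helper `runSequentially` and the stand-in worker `fakeFetch` are hypothetical; in the real script the worker would be `fetchList`, which already waits 2 seconds before each request.

```typescript
// Run an async worker over the items one at a time, in order.
async function runSequentially<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (const item of items) {
    // The next iteration does not start until this promise settles.
    results.push(await worker(item));
  }
  return results;
}

// Stand-in for fetchList so the sketch runs without network access.
const started: string[] = [];
const fakeFetch = async (userId: string): Promise<string> => {
  started.push(userId);
  await new Promise((resolve) => setTimeout(resolve, 10));
  return "done:" + userId;
};

const sequentialRun = runSequentially(["u1", "u2", "u3"], fakeFetch);
sequentialRun.then((results) => console.log(results));
```

With `Promise.all(ids.map(fakeFetch))` all three workers would start immediately; here each one waits for the previous one to finish.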
The result
Once everything has finished running, all the articles by this year’s authors are saved in the database. Run the following command to browse the data in Prisma Studio:
npx prisma studio
Filter for articles whose creation time is later than 2021-01-01:
new Date("2021/01/01").getTime() // 1609430400000 (in the UTC+8 time zone)
Sort by likes in descending order, and we have our list of the most-liked articles.
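The same filter and sort can be sketched in plain TypeScript over rows already loaded from the database. The row type `ArticleRow` and the function `topLiked` are hypothetical, and the article does not say whether `ctime` is stored in seconds or milliseconds, so the cutoff is simply assumed to be in the same unit as `ctime`.

```typescript
// Hypothetical row type with only the columns this sketch needs.
interface ArticleRow {
  title: string;
  ctime: string; // stored as a string in the schema
  digg_count: number;
}

// Keep rows created after the cutoff, then sort by likes, highest first.
// `cutoff` is assumed to use the same unit as `ctime`.
function topLiked(rows: ArticleRow[], cutoff: number): ArticleRow[] {
  return rows
    .filter((row) => Number(row.ctime) > cutoff)
    .sort((a, b) => b.digg_count - a.digg_count);
}

// Made-up rows for illustration, with ctime in milliseconds.
const rows: ArticleRow[] = [
  { title: "new and quiet", ctime: "1630000000000", digg_count: 20 },
  { title: "old but popular", ctime: "1500000000000", digg_count: 900 },
  { title: "new and popular", ctime: "1620000000000", digg_count: 500 },
];
const ranked = topLiked(rows, new Date("2021/01/01").getTime());
console.log(ranked.map((row) => row.title));
```

Because `filter` returns a new array, the in-place `sort` does not reorder the original `rows`.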
Summary
Based on these results, I also drew a few conclusions about how to write a great article:
- Have a broad readership.
  In terms of audience size, ES6 > Vue > React. Take my previous article “How to Test React Asynchronous Components”: you can imagine how few reads it got. Those who already know how don’t need to read it, and those who don’t know have no such need.
- The article must be easy to understand, so that readers actually grasp the knowledge points.
  As the author Lin Sanxin says: “Explaining the hardest things in the plainest words is my motto.”
Finally
Dear friends, did my article make sense to you? If so, please give it a thumbs-up; your thumbs-up is the biggest support for me.
I hope this article was helpful to you. You can also read my previous articles or share your thoughts and insights in the comments section.