This article is participating in Python Theme Month. See the link to the event for more details.
Preface
As shown in the picture, a friend asked for a way to remove the activity link from their articles once the activity is over.
It is an interesting request, and I am sure many people share it, since they would like their articles to stay clean (Juejin, please don't hit me). So today's article is about automating one of our boring tasks.
Beyond meeting this requirement, I hope the article also teaches you the following:
- If you want to design a CMS, I hope you can see how Juejin articles get published.
- If you are a novice, I hope my way of analyzing the problem gives you a taste of a programmer's everyday work.
- If you want to learn Python but don't know what to do with it, I hope you can see Python's strengths in crawling and automation.
- If you are an experienced back-end developer, I hope you will give me advice; your advice is how I improve most.
Dependencies
- Requests: HTTP for Humans
- re: Python regular expression operations
Analysis
Here is the general flow of a Juejin article:
- Draft box
  - New articles are stored in the draft box by default
  - Updating an article updates the content of its draft directly
- Article
  - When a draft is published and passes the operations review, the article becomes public
  - Editing a published article updates its draft again, and the cycle repeats
So to meet the requirement, we need to:
- Get the article details
- Update the article to delete the activity information
- Publish it again
1. Article publishing
Operating the page shows that Juejin does two things: it updates the article, then publishes it.
- Updating an article needs little explanation: it updates the title, content, tags, and so on.
- When publishing, note that if the article is bound to a column, the column ID must be submitted as well.
So to publish an article we need its draft_id, and we need to update the draft first.
2. Draft details
This interface returns the detailed data of a draft, including the columns bound to it. It returns a lot of data; here we only need the draft details.
3. Article details
This interface returns the draft_id that corresponds to an article ID.
Now the remaining problem is how to get the article ID.
4. List of articles
The interface for getting a user's list of Juejin articles is shown below. The data returned is as follows:
The article list gives us the ID of each article.
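The article list is paginated, so collecting every article ID means following the cursor from page to page. The sketch below illustrates that loop under some assumptions: the field names `data`, `cursor`, and `has_more` are the ones the script later in this article reads from the response, while `iter_articles`, `fetch_page`, and `FAKE_PAGES` are hypothetical names used only for illustration, with an in-memory fake standing in for the real request.

```python
from typing import Callable, Dict, Iterator


def iter_articles(fetch_page: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every article, following the cursor until has_more is false."""
    cursor = "0"
    while True:
        page = fetch_page(cursor)
        # "data" may be missing or null on the last page, so fall back to [].
        for article in page.get("data") or []:
            yield article
        if not page.get("has_more"):
            break
        cursor = page.get("cursor")


# Hypothetical two-page response, used only to exercise the loop.
FAKE_PAGES = {
    "0": {"data": [{"article_id": "a1"}, {"article_id": "a2"}], "cursor": "2", "has_more": True},
    "2": {"data": [{"article_id": "a3"}], "cursor": "3", "has_more": False},
}

article_ids = [a["article_id"] for a in iter_articles(lambda c: FAKE_PAGES[c])]
print(article_ids)  # ['a1', 'a2', 'a3']
```

With the real client, `fetch_page` would wrap the article-list request and pass the cursor through.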
To sum up, what we need to do is:
- Get all article IDs
- Get the draft ID that corresponds to each article ID
- Get the draft details and update the draft
- Publish the updated draft
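The last three steps can be sketched for a single article as follows. This is only an illustration of the call order, not the real implementation: `FakeClient` is a hypothetical in-memory stand-in whose method names mirror the client class defined later in this article, and the response shapes are reduced to the fields the steps actually read.

```python
import re


class FakeClient:
    """Hypothetical in-memory stand-in for the real API client."""

    def __init__(self):
        self.drafts = {"d1": {"id": "d1", "mark_content": "BANNER\nHello"}}
        self.published = []

    def get_article_detail(self, article_id):
        return {"data": {"article_info": {"draft_id": "d1"}}}

    def get_draft_detail(self, draft_id):
        return {"data": {"article_draft": dict(self.drafts[draft_id]), "columns": []}}

    def draft_update(self, article_info):
        self.drafts[article_info["id"]] = article_info
        return {"err_no": 0}

    def draft_publish(self, draft_id, column_ids=None):
        self.published.append(draft_id)
        return {"err_no": 0}


def clean_and_republish(client, article_id, pattern):
    # Article ID -> draft ID
    detail = client.get_article_detail(article_id)
    draft_id = detail["data"]["article_info"]["draft_id"]
    # Load the draft, strip the banner, save the draft
    data = client.get_draft_detail(draft_id)["data"]
    draft = data["article_draft"]
    draft["mark_content"] = re.sub(pattern, "", draft["mark_content"])
    client.draft_update(draft)
    # Publish again, passing along any bound columns
    column_ids = [c["column_id"] for c in data["columns"]]
    client.draft_publish(draft_id, column_ids)
    return draft_id


client = FakeClient()
clean_and_republish(client, "a1", re.compile(r"BANNER\n"))
print(client.drafts["d1"]["mark_content"], client.published)  # Hello ['d1']
```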
Code implementation
The code builds on my earlier article about automatically posting articles to Juejin.
1. Wrapping the interfaces
The interfaces requested here are exactly the ones identified in the analysis above.
import requests
from requests import cookies


class Juejin(object):
    # Publish an article
    publish_url = "https://api.juejin.cn/content_api/v1/article/publish"
    # List the articles in the draft box
    article_draft_url = "https://api.juejin.cn/content_api/v1/article_draft/query_list"
    # Draft details
    article_draft_detail_url = "https://api.juejin.cn/content_api/v1/article_draft/detail"
    # Update a draft
    article_draft_update_url = "https://api.juejin.cn/content_api/v1/article_draft/update"
    # Published article details
    article_detail_url = "https://api.juejin.cn/content_api/v1/article/detail"
    # List a user's published articles
    article_list_url = "https://api.juejin.cn/content_api/v1/article/query_list"
    # Current user information
    user_url = "https://api.juejin.cn/user_api/v1/user/get"

    def __init__(self, driver_cookies=None, cookie_obj=None):
        self.session = requests.session()
        # NOTE: the body of __init__ is cut off in the source after
        # "if driver_cookies:". The cookie handling below is an assumed
        # reconstruction based on how the class is used later.
        if driver_cookies:
            # Assumed: a list of Selenium-style dicts with "name" and "value"
            for item in driver_cookies:
                self.session.cookies.set(item["name"], item["value"])
        if cookie_obj:
            self.session.cookies.set_cookie(cookie_obj)

    def push_draft_last_one(self):
        """Publish the first draft returned by the draft box list."""
        article_draft = self.get_draft().get("data", [])
        if not article_draft:
            raise Exception("The article draft is empty")
        draft_id = article_draft[0].get("id")
        result = self.draft_publish(draft_id)
        print(result)
        if result.get("err_no", "") != 0:
            err_msg = result.get("err_msg", "")
            raise Exception(f"Juejin push article error, error message is {err_msg}")
        return result.get("data", {})

    def request(self, *args, **kwargs):
        """Send a request through the session and return the decoded JSON."""
        response = self.session.request(*args, **kwargs)
        if response.status_code != 200:
            raise Exception("Request error")
        return response.json()

    def get_user(self):
        return self.request("get", self.user_url)

    def get_article_list(self, user_id, cursor="0"):
        data = {
            "user_id": user_id,
            "sort_type": 2,
            "cursor": cursor
        }
        return self.request("post", self.article_list_url, json=data)

    def get_draft(self):
        return self.request("post", self.article_draft_url)

    def get_draft_detail(self, draft_id):
        return self.request("post", self.article_draft_detail_url, json={"draft_id": draft_id})

    def get_article_detail(self, article_id):
        return self.request("post", self.article_detail_url, json={"article_id": article_id})

    def draft_update(self, article_info):
        return self.request("post", self.article_draft_update_url, json=article_info)

    def draft_publish(self, draft_id, column_ids=None):
        if column_ids is None:
            column_ids = []
        payload = {
            "draft_id": draft_id,
            "sync_to_org": False,
            "column_ids": column_ids
        }
        return self.request("post", self.publish_url, json=payload)
2. Run the republish task
The script's logic is shown below. There is nothing difficult here: it implements the analysis above in reverse order. A few things to note:
- Do not change the activity time arbitrarily: the script uses it to filter articles and to decide when to stop.
- If your articles use the official activity link, no change is needed. If they contain links in several formats, add a regular expression for each.
- The script sleeps briefly between requests in case Juejin rate-limits its interfaces.
- For simplicity, the script does not implement login; copy the cookie from your browser instead. The following figure shows where to find the cookie.
For automatic login, see my other article on implementing automatic login for Juejin.
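The link-removal step can be tried in isolation before touching any real article. The sketch below is a minimal illustration: the banner wording, the `example.com` URL, and the names `BANNER_PATTERNS` and `strip_banner` are hypothetical, so replace the pattern text with the exact banner used in your own articles.

```python
import re

# Hypothetical banner wording and URL; adapt both to your own articles.
BANNER_PATTERNS = [
    # Banner followed by a line break (tried first so the line break goes too)
    re.compile(r"Day \d+ of my challenge, details: \[challenge\]\(https://example\.com/post/\d+\)\n"),
    # Same banner without a trailing line break, e.g. at the end of the text
    re.compile(r"Day \d+ of my challenge, details: \[challenge\]\(https://example\.com/post/\d+\)"),
]


def strip_banner(markdown: str) -> str:
    """Remove every match of every banner pattern from the Markdown source."""
    for pattern in BANNER_PATTERNS:
        markdown = pattern.sub("", markdown)
    return markdown


sample = "Day 3 of my challenge, details: [challenge](https://example.com/post/123)\n# Title\nBody"
print(repr(strip_banner(sample)))  # '# Title\nBody'
```

Characters such as `[`, `(`, and `.` are escaped because they are regex metacharacters. A single pattern ending in `\n?` would likely behave the same as the two-pattern list; keeping a list makes it easy to add further link formats.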
import re
import time

import requests


def update_and_republish():
    # Activity time window (local time)
    act_start_datetime = "2021-06-02 00:00:00"
    act_end_datetime = "2021-06-30 23:59:59"
    # Regular expressions for the activity banner, with and without a trailing newline.
    # The wording below is an English rendering; replace it with the exact banner text
    # used in your articles.
    pattern1 = re.compile(r"This is day \d* of my participation in the writing challenge; for details see: \[Writing Challenge\]\(https://juejin\.cn/post/6967194882926444557\)\n")
    pattern2 = re.compile(r"This is day \d* of my participation in the writing challenge; for details see: \[Writing Challenge\]\(https://juejin\.cn/post/6967194882926444557\)")
    # Fill in your own session ID
    session_id = ""
    cookie = requests.cookies.create_cookie(
        domain=".juejin.cn",
        name="sessionid",
        value=session_id
    )
    juejin = Juejin(cookie_obj=cookie)
    user_id = juejin.get_user().get("data", {}).get("user_id")
    start_flag = True
    cursor = "0"
    has_more = True
    act_start_time = time.mktime(time.strptime(act_start_datetime, '%Y-%m-%d %H:%M:%S'))
    act_end_time = time.mktime(time.strptime(act_end_datetime, '%Y-%m-%d %H:%M:%S'))
    patterns = [pattern1, pattern2]

    # Fetch one page of the article list and remember the paging state
    def art_info():
        nonlocal cursor, has_more
        response = juejin.get_article_list(user_id, cursor)
        time.sleep(1)
        has_more = response.get("has_more")
        cursor = response.get("cursor")
        return response.get("data") or []

    # Remove the activity link, update the draft, and publish again
    def do_update_and_republish(article_id):
        # To test with a single article first, uncomment and set its ID:
        # if article_id != '6969119163293892639':
        #     return
        draft_id = juejin.get_article_detail(article_id).get("data", {}).get("article_info", {}).get("draft_id")
        if not draft_id:
            return False
        data = juejin.get_draft_detail(draft_id).get("data", {})
        article_draft = data.get("article_draft")
        columns = data.get("columns") or []
        column_ids = [column.get("column_id") for column in columns]

        def mark_content_replace(mark_content):
            for pattern in patterns:
                mark_content = re.sub(pattern, "", mark_content)
            return mark_content

        article = {
            "brief_content": article_draft.get("brief_content"),
            "category_id": article_draft.get("category_id"),
            "cover_image": article_draft.get("cover_image"),
            "edit_type": article_draft.get("edit_type"),
            "html_content": article_draft.get("html_content"),
            "is_english": article_draft.get("is_english"),
            "is_gfw": article_draft.get("is_gfw"),
            "link_url": article_draft.get("link_url"),
            "mark_content": mark_content_replace(article_draft.get("mark_content")),
            "tag_ids": [str(tag_id) for tag_id in article_draft.get("tag_ids")],
            "title": article_draft.get("title"),
            "id": article_draft.get("id"),
        }
        print(article)
        # Dry run by default: uncomment the two calls below to apply the changes.
        # The first call is assumed to be draft_update (save the draft before republishing).
        # juejin.draft_update(article)
        time.sleep(1)
        # juejin.draft_publish(draft_id, column_ids)
        time.sleep(1)

    # Process one page: skip newer articles, stop at the first article older than the activity
    def do(data):
        nonlocal start_flag
        for art in data:
            ctime = int(art.get("article_info", {}).get("ctime"))
            if ctime and act_end_time < ctime:
                continue
            elif ctime and act_start_time > ctime:
                start_flag = False
                break
            a_id = art.get("article_id")
            do_update_and_republish(a_id)

    while start_flag and has_more:
        do(art_info())
3. Log in to Juejin to check the result
With good luck, the article passes review within seconds; with bad luck, you may have to wait.
4. Go straight to the source code
juejin.py
If you find my project helpful, please give it a ❤️❤️❤️.
5. Why deployment is not supported
This is a one-off task, and the official activity link and time can change, so it is better for readers to run the script themselves.
Of course, if you don't have a local environment to run it in, you can fork my code and modify the script to run the task with GitHub Actions.
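When the script runs in CI rather than on your own machine, the session ID should not be hard-coded in the repository. One common approach, sketched here under assumptions, is to store it as a repository secret, expose it to the job as an environment variable, and read it at startup. The variable name `JUEJIN_SESSION_ID` and the helper `load_session_id` are hypothetical and not part of the original script.

```python
import os


def load_session_id(env=None, var_name="JUEJIN_SESSION_ID"):
    """Return the session ID from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    session_id = env.get(var_name, "").strip()
    if not session_id:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return session_id


# Demonstration with an explicit mapping instead of the real environment.
print(load_session_id({"JUEJIN_SESSION_ID": " abc123 "}))  # abc123
```

Failing early with an explicit error avoids sending requests with an empty cookie, which would otherwise surface as a less obvious API error.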
Notes
- 1. If you want to remove a user-defined activity link, fill in the corresponding activity time and activity link yourself.
- 2. If you changed the wording of the activity text, adjust the corresponding regular expression.
- 3. Pay attention to the activity time. Articles published outside the activity window must be adjusted manually.
- 4. Test with one article before running in batch. Success is not guaranteed, since Juejin may change some interfaces.
- 5. Confirm that you have received your prize before running the script, to avoid any complications.
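The effect of the activity window in note 3 can be checked without calling any interface. The sketch below classifies a creation time against the window; `to_timestamp` and `classify` are hypothetical helper names, and the comments assume the article list is returned newest first, which is what stopping at the first older article relies on.

```python
import time

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_timestamp(text: str) -> float:
    """Convert a local-time string to a Unix timestamp."""
    return time.mktime(time.strptime(text, TIME_FORMAT))


def classify(ctime: float, start: float, end: float) -> str:
    """Return where an article's creation time falls relative to the window."""
    if ctime > end:
        return "after"   # newer than the activity: skip it and keep scanning
    if ctime < start:
        return "before"  # older than the activity: scanning can stop here
    return "inside"      # created during the activity: process it


start = to_timestamp("2021-06-02 00:00:00")
end = to_timestamp("2021-06-30 23:59:59")
for created in ("2021-07-05 10:00:00", "2021-06-15 10:00:00", "2021-05-20 10:00:00"):
    print(created, classify(to_timestamp(created), start, end))
```

An article created outside the window is never touched, which is why such articles have to be handled manually.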
Finally
If you find my articles helpful, you can follow my column about me and Juejin. The column will keep publishing interesting Juejin-related code 🤔.
Of course, you can also check out my GitHub, where I keep all the code for this column 🙏.