Bulk-removing activity links from Nuggets (Juejin) articles | Python Theme Month

This article is participating in Python Theme Month. See the event link for details.

Preface

As shown in the picture, a friend asked for a way to remove the activity link from articles after taking part in an activity.

It is an interesting request, and I am sure many people share it, since they want their articles to stay clean (Nuggets staff, please don't hit me). So today's article is about automating one of our boring tasks.

Beyond meeting the requirement, I hope this article also leaves you with the following:

If you want to design a CMS, I hope you can see how Nuggets handles article publishing.
If you are a novice, I hope my way of analyzing the problem gives you a taste of a programmer's everyday work.
If you want to learn Python but don't know what to use it for, I hope you can see Python's strengths in crawling and automation.
If you are a back-end expert, I hope you will give me some advice; your advice is how I improve.

Dependencies

Requests: HTTP for Humans
re: Python regular expression operations

Analysis

Here is the general flow of a Nuggets article:

1. Draft box
- New articles are stored in the draft box by default.
- Updating an article updates the content of its draft directly.
2. Article
- When a draft is published and passes review, the article becomes public.
- Editing a published article goes back through the draft box: the draft is updated and then published again.

So to meet the requirement, we need to:

Get the article details
Update the article to delete the activity information
Publish it again

1. Article publishing

From the page operations we can also see that Nuggets does two things: it updates the article, then publishes it.

There is not much to say about updating an article: it updates the title, content, tags, and so on.

The important thing to note when publishing is that if the article is bound to a column, you need to submit the column ID as well.

So to publish an article we need its draft_id, and we need to update the article first.

2. Draft details

This interface returns the detailed data of a draft, including the columns bound to it. It returns too much data to show in full.

All we need here is the draft details.
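As a rough illustration of how the draft details feed the publish step, the sketch below pulls the column IDs out of a draft-detail response and builds a publish payload. The field names (`article_draft`, `columns`, `column_id`, `draft_id`, `sync_to_org`, `column_ids`) are the ones used by the script later in this article; the helper names and the sample values are hypothetical.

```python
def column_ids_from_draft_detail(response):
    """Collect the IDs of the columns bound to a draft (empty list if none)."""
    data = response.get("data") or {}
    columns = data.get("columns") or []
    return [c["column_id"] for c in columns if c.get("column_id")]


def build_publish_payload(draft_id, column_ids):
    """Build the JSON body for the publish request."""
    return {
        "draft_id": draft_id,
        "sync_to_org": False,
        "column_ids": list(column_ids),
    }


# Made-up response, shaped like the fields the script reads
sample = {
    "data": {
        "article_draft": {"id": "200", "title": "Demo"},
        "columns": [{"column_id": "c1"}, {"column_id": "c2"}],
    }
}

payload = build_publish_payload(
    sample["data"]["article_draft"]["id"],
    column_ids_from_draft_detail(sample),
)
print(payload)
```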

3. Article details

This interface returns the draft_id for a given article ID. The remaining problem is how to get the article ID.
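A minimal sketch of reading the draft_id from an article-detail response, assuming the nesting `data -> article_info -> draft_id` that the script later in this article relies on. The helper name and the sample values are hypothetical.

```python
def extract_draft_id(article_detail_response):
    """Return the draft_id nested in an article-detail response, or None."""
    data = article_detail_response.get("data") or {}
    info = data.get("article_info") or {}
    return info.get("draft_id") or None


# Made-up response for illustration
sample_detail = {
    "err_no": 0,
    "data": {"article_info": {"article_id": "100", "draft_id": "200"}},
}

print(extract_draft_id(sample_detail))
```

Returning None for a missing field lets the caller skip an article instead of failing halfway through a batch.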

4. List of articles

The interface for getting a user's article list, and the data it returns, are as follows:

We can call the article list interface to get the article IDs.
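The list is paged, so collecting every article ID means following the cursor until no more pages remain. The sketch below assumes the response fields `data`, `has_more`, `cursor`, and `article_id` that the script later in this article reads; `iter_article_ids`, `fake_fetch`, and the sample pages are hypothetical and replace the real HTTP call.

```python
def iter_article_ids(fetch_page, start_cursor="0"):
    """Yield article IDs page by page until the API reports no more data."""
    cursor = start_cursor
    while True:
        page = fetch_page(cursor)
        for item in page.get("data") or []:
            yield item["article_id"]
        if not page.get("has_more"):
            break
        cursor = page.get("cursor")


# Made-up pages keyed by cursor, standing in for the list interface
PAGES = {
    "0": {"data": [{"article_id": "a1"}, {"article_id": "a2"}], "has_more": True, "cursor": "2"},
    "2": {"data": [{"article_id": "a3"}], "has_more": False, "cursor": "3"},
}


def fake_fetch(cursor):
    return PAGES[cursor]


ids = list(iter_article_ids(fake_fetch))
print(ids)
```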

To sum up, what we need to do is:

Get all article IDs
Get the draft ID for each article from its article ID
Get the draft details and update them
Publish the updated article
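The four steps above can be sketched end to end as follows. This is a minimal illustration, not the author's script: `FakeClient` and its method names are hypothetical in-memory stand-ins for the real interfaces, and `ACTIVITY-LINK` is a placeholder for the real activity copy.

```python
import re


class FakeClient:
    """In-memory stand-in for the real API wrapper (hypothetical)."""

    def __init__(self):
        self.article_to_draft = {"a1": "d1", "a2": "d2"}
        self.drafts = {
            "d1": "ACTIVITY-LINK\nFirst post body",
            "d2": "Second post body",
        }
        self.published = []

    def list_article_ids(self):
        return list(self.article_to_draft)

    def draft_id_for(self, article_id):
        return self.article_to_draft[article_id]

    def draft_content(self, draft_id):
        return self.drafts[draft_id]

    def update_draft(self, draft_id, content):
        self.drafts[draft_id] = content

    def publish(self, draft_id):
        self.published.append(draft_id)


def strip_and_republish(client, pattern):
    """Run the four steps for every article; return the IDs that changed."""
    changed = []
    for article_id in client.list_article_ids():    # 1. all article IDs
        draft_id = client.draft_id_for(article_id)  # 2. article ID -> draft ID
        old = client.draft_content(draft_id)        # 3. draft details
        new = pattern.sub("", old)
        if new == old:
            continue  # nothing to remove, so skip the republish
        client.update_draft(draft_id, new)
        client.publish(draft_id)                    # 4. publish again
        changed.append(article_id)
    return changed


client = FakeClient()
changed = strip_and_republish(client, re.compile(r"ACTIVITY-LINK\n"))
print(changed, client.published)
```

Skipping unchanged drafts is a choice made in this sketch so that articles without the link are not sent back for review.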

Code implementation

The code builds on my earlier article about automatically posting articles to Nuggets.

1. Encapsulating the interfaces

The interfaces to request are the ones identified in the analysis above.

import re
import time

import requests
from requests import cookies


class Juejin:

    # Nugget post URL
    publish_url = "https://api.juejin.cn/content_api/v1/article/publish"

    # Nugget draft box article URL
    article_draft_url = "https://api.juejin.cn/content_api/v1/article_draft/query_list"

    # Nugget draft box article details
    article_draft_detail_url = "https://api.juejin.cn/content_api/v1/article_draft/detail"

    # Nugget draft box article update
    article_draft_update_url = "https://api.juejin.cn/content_api/v1/article_draft/update"

    # Nugget published article details
    article_detail_url = "https://api.juejin.cn/content_api/v1/article/detail"

    # Post list
    article_list_url = "https://api.juejin.cn/content_api/v1/article/query_list"

    # Obtain user information
    user_url = "https://api.juejin.cn/user_api/v1/user/get"


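    def __init__(self, driver_cookies=None, cookie_obj=None):
        self.session = requests.session()
        # NOTE: the body of this branch is truncated in the source. The lines
        # below are an assumed reconstruction: copy cookie dicts with
        # domain/name/value keys into the session.
        if driver_cookies:
            for item in driver_cookies:
                self.session.cookies.set_cookie(cookies.create_cookie(
                    domain=item.get("domain"),
                    name=item.get("name"),
                    value=item.get("value"),
                ))
        # Also assumed: attach a ready-made cookie object to the session
        if cookie_obj:
            self.session.cookies.set_cookie(cookie_obj)

    def push_draft_last_one(self):
        # Publish the first draft returned by the draft box list
        article_draft = self.get_draft().get("data", [])
        if not article_draft:
            raise Exception("The article draft is empty")
        draft_id = article_draft[0].get("id")

        result = self.draft_publish(draft_id)
        print(result)
        if result.get("err_no", "") != 0:
            err_msg = result.get("err_msg", "")
            raise Exception(f"Juejin push article error, error message is {err_msg}")
        return result.get("data", {})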

    def request(self, *args, **kwargs):
        # Send the request through the shared session and return the JSON body
        response = self.session.request(*args, **kwargs)
        if response.status_code != 200:
            raise Exception("Request error")
        return response.json()

    def get_user(self):
        return self.request("get", self.user_url)

    def get_article_list(self, user_id, cursor="0"):
        data = {
            "user_id": user_id,
            "sort_type": 2,
            "cursor": cursor
        }
        return self.request("post", self.article_list_url, json=data)

    def get_draft(self):
        return self.request("post", self.article_draft_url)

    def get_draft_detail(self, draft_id):
        return self.request("post", self.article_draft_detail_url, json={"draft_id": draft_id})

    def get_article_detail(self, article_id):
        return self.request("post", self.article_detail_url, json={"article_id": article_id})

    def draft_update(self, article_info):
        return self.request("post", self.article_draft_update_url, json=article_info)

    def draft_publish(self, draft_id, column_ids=None):
        # Articles bound to a column must submit the column IDs when publishing
        if column_ids is None:
            column_ids = []

        payload = {
            "draft_id": draft_id,
            "sync_to_org": False,
            "column_ids": column_ids
        }
        return self.request("post", self.publish_url, json=payload)

2. Execute the release task

The following is the execution logic of the script. There is nothing difficult here: it simply implements the analysis above in reverse order. A few things to note:

The activity time must not be changed arbitrarily, because the later logic uses it to filter articles and to end the script.
If the activity link in your articles is the official one, no changes are needed. If your links come in several formats, configure a regular expression for each.
The script sleeps briefly between requests in case Nuggets rate-limits its interfaces.
For simplicity, the script does not implement login; you can copy the cookie from your browser. The following figure shows where to find the cookie.
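To illustrate the point about configuring one regular expression per link format, here is a small sketch. The activity URL is the one used in this article; the two English link wordings, `PATTERNS`, and `strip_activity_links` are hypothetical and only show the mechanism.

```python
import re

ACTIVITY_URL = "https://juejin.cn/post/6967194882926444557"

# One pattern per link format; the trailing "\n?" also removes the line break
PATTERNS = [
    re.compile(
        r"This is day \d+ of my challenge, details: \[challenge\]\("
        + re.escape(ACTIVITY_URL) + r"\)\n?"
    ),
    re.compile(
        r"Joining the challenge: \[challenge\]\("
        + re.escape(ACTIVITY_URL) + r"\)\n?"
    ),
]


def strip_activity_links(markdown):
    """Remove every configured activity-link format from the Markdown text."""
    for pattern in PATTERNS:
        markdown = pattern.sub("", markdown)
    return markdown


text = (
    "This is day 12 of my challenge, details: "
    "[challenge](https://juejin.cn/post/6967194882926444557)\n"
    "# Title\nBody"
)
print(strip_activity_links(text))
```

Escaping the URL with `re.escape` keeps the dots in the domain from matching arbitrary characters.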

For automatic login to Nuggets, see my other article on implementing it.

def update_and_republish():

    # Activity time window
    act_start_datetime = "2021-06-02 00:00:00"
    act_end_datetime = "2021-06-30 23:59:59"

    # Regular expressions for the activity link. The Chinese activity copy was
    # garbled by translation in the source and is reconstructed here (an
    # assumption): check it against the text in your own articles.
    pattern1 = re.compile(r"这是我参与更文挑战的第\d*天，活动详情查看：\s*\[更文挑战\]\(https://juejin\.cn/post/6967194882926444557\)\n")
    pattern2 = re.compile(r"这是我参与更文挑战的第\d*天，活动详情查看：\s*\[更文挑战\]\(https://juejin\.cn/post/6967194882926444557\)")

    # Set session ID by yourself
    session_id = ""
    
    cookie = requests.cookies.create_cookie(
        domain=".juejin.cn",
        name="sessionid",
        value=session_id
    )
    juejin = Juejin(cookie_obj=cookie)

    user_id = juejin.get_user().get("data", {}).get("user_id")
    start_flag = True
    cursor = "0"
    has_more = True

    act_start_time = time.mktime(time.strptime(act_start_datetime, '%Y-%m-%d %H:%M:%S'))
    act_end_time = time.mktime(time.strptime(act_end_datetime, '%Y-%m-%d %H:%M:%S'))

    patterns = [pattern1, pattern2]

    # Fetch one page of the article list and advance the cursor
    def art_info():
        nonlocal cursor, has_more
        response = juejin.get_article_list(user_id, cursor)
        time.sleep(1)
        has_more = response.get("has_more")
        cursor = response.get("cursor")
        return response.get("data") or []

    # Remove the activity link, update the draft, and publish again
    def do_update_and_republish(article_id):
        # Uncomment to test against a single article first
        # if article_id != '6969119163293892639':
        #     return
        draft_id = juejin.get_article_detail(article_id).get("data", {}).get("article_info", {}).get("draft_id")
        if not draft_id:
            return False
        data = juejin.get_draft_detail(draft_id).get("data", {})
        article_draft = data.get("article_draft")
        columns = data.get("columns") or []
        column_ids = [column.get("column_id") for column in columns]

        def mark_content_replace(mark_content):
            for pattern in patterns:
                mark_content = re.sub(pattern, "", mark_content)
            return mark_content

        article = {
            "brief_content": article_draft.get("brief_content"),
            "category_id": article_draft.get("category_id"),
            "cover_image": article_draft.get("cover_image"),
            "edit_type": article_draft.get("edit_type"),
            "html_content": article_draft.get("html_content"),
            "is_english": article_draft.get("is_english"),
            "is_gfw": article_draft.get("is_gfw"),
            "link_url": article_draft.get("link_url"),
            "mark_content": mark_content_replace(article_draft.get("mark_content")),
            "tag_ids": [str(tag_id) for tag_id in article_draft.get("tag_ids")],
            "title": article_draft.get("title"),
            "id": article_draft.get("id"),
        }
        print(article)
        # Both calls are commented out in the source, so this is a dry run.
        # The first call is assumed to be draft_update, matching the steps
        # described in the analysis (the source shows draft_publish twice).
        # juejin.draft_update(article)
        time.sleep(1)
        # juejin.draft_publish(draft_id, column_ids)
        time.sleep(1)

    # Main scheduling function: filter articles by activity time and process them
    def do(data):
        nonlocal start_flag
        for art in data:
            ctime = int(art.get("article_info", {}).get("ctime") or 0)
            if ctime and act_end_time < ctime:
                # Published after the activity ended: skip it
                continue
            elif ctime and act_start_time > ctime:
                # Published before the activity started: end the script
                start_flag = False
                break

            a_id = art.get("article_id")
            do_update_and_republish(a_id)

    while start_flag and has_more:
        do(art_info())

3. Log in to Nuggets to see the results

With good luck the article passes review within seconds; with bad luck you may have to wait.

4. Go straight to the source code

juejin.py

If you think my project is helpful to you, please click .

5. Why deployment is not supported

Since this is a one-off task and the official activity link and time can change, it is better for each person to run it locally themselves.

Of course, if you don't have a local environment, you can fork my code, modify the script, and run the task with GitHub Actions.

Notes

1. If you want to remove a custom activity link, fill in the corresponding activity time and link yourself.
2. If you changed the activity copy, adjust the regular expression accordingly.
3. Pay attention to the activity time. Articles published outside the activity window need to be handled manually.
4. Test with one article before running the batch; success is not guaranteed (Nuggets may change its interfaces).
5. Confirm that you have received your prize before running the script, to avoid any problems.
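The note about the activity time can be made concrete with a small sketch of the window check. It uses the same local-time conversion as the script (`time.mktime` with `time.strptime`) and the June 2021 window from this article; `to_timestamp` and `classify` are hypothetical helper names. The script skips newer articles and stops at the first older one, which appears to assume the list is returned newest first.

```python
import time

FMT = "%Y-%m-%d %H:%M:%S"


def to_timestamp(text):
    """Convert a local-time string to a Unix timestamp."""
    return time.mktime(time.strptime(text, FMT))


ACT_START = to_timestamp("2021-06-02 00:00:00")
ACT_END = to_timestamp("2021-06-30 23:59:59")


def classify(ctime):
    """Return 'newer', 'inside', or 'older' relative to the activity window."""
    if ctime > ACT_END:
        return "newer"   # published after the activity: skip
    if ctime < ACT_START:
        return "older"   # published before the activity: stop scanning
    return "inside"      # published during the activity: process


print(classify(to_timestamp("2021-06-15 12:00:00")))
```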

Finally

If you find my articles helpful, you can follow my column, where I will keep writing interesting code for Nuggets.

Of course, you can also check out my GitHub, where I keep all the code for this column.