Juejin automatic article publishing: a collection of optimizations | Python Theme Month

This article is participating in Python Theme Month. See the event link for details.

Background

The following two articles cover everything from logging in to posting. There are two of them because the later features were not finished when the first one was written. Today's article optimizes and rounds out the overall project.

Implementing automatic login to Juejin
Implementing automatic article publishing on Juejin

Main text

At present, this project supports automatic deployment, email reminders, switch configuration, and other features.

The project home page provides detailed usage documentation. If you need this, you are welcome to fork the Juejin automatic article publishing project.

Slider drag optimization

Before and after comparison

As mentioned in the previous article, dragging the slider was slow and sometimes even timed out. Let's compare the behavior before and after optimization:

Before optimization

After optimization

Optimization analysis

We use the curve of the normal distribution, which models acceleration followed by deceleration well. Here is a comparison of the trajectories produced by the two approaches:

The total distance is 100 px
The abscissa is the displacement step, and the ordinate is the displacement distance

Overall, the former method is slower and has a larger error. The latter matches the process of accelerating and then decelerating, so it should be closer to how a real drag behaves. (This is not a judgment of good or bad, only a report of the results. If the latter is ever detected, you can still fall back to the former.)

Code implementation

See track.py for the full code

Here is the code for the optimized generated trajectory:

import numpy as np


def gen_normal_track(distance):
    def norm_fun(x, mu, sigma):
        # Probability density of the normal distribution N(mu, sigma^2) at x
        pdf = np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
        return pdf

    result = []
    # The number of moves can be flexibly configured according to distance
    for i in range(-10, 10, 1):
        result.append(norm_fun(i, 0, 1) * distance)
    # To reduce errors, add the remaining offset so the moves sum to distance
    result.append(distance - sum(result))
    return result
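To check that a bell-shaped track really covers the full distance, here is a minimal sketch that uses only the standard library. It is an illustration, not the project's track.py: the names normal_pdf and normal_track are hypothetical, and math replaces numpy so the snippet runs without extra packages.

```python
import math
from itertools import accumulate


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Probability density of the normal distribution N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def normal_track(distance: float) -> list:
    """Per-step offsets shaped like a bell curve that sum to `distance`."""
    steps = [normal_pdf(i) * distance for i in range(-10, 10)]
    # Put the small residue in a final step so the total matches distance.
    steps.append(distance - sum(steps))
    return steps


track = normal_track(100)
# Slider position after each move
positions = list(accumulate(track))
# Index of the largest single move (the fastest point of the drag)
largest = max(range(len(track)), key=lambda i: track[i])
```

The offsets start near zero, peak in the middle of the list, and fall back toward zero, which is the accelerate-then-decelerate shape described above. The last entry of positions ends at the requested distance.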

Optimize page waiting

The project uses time.sleep() in many places. Isn't there a more elegant way to wait?

There is; I skipped it out of laziness. The project itself is relatively simple, publishing is a one-off task, and there is no strict limit on execution time, so I ignored these details. The code below waits up to 10 seconds: it returns immediately if the element is found within 10 seconds, and otherwise raises a TimeoutException. By default, WebDriverWait calls the ExpectedCondition every 500 milliseconds until it returns successfully.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://juejin.cn/")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, '''//div[@class="sc-kkGfuU bujTgx"]''')))
finally:
    driver.quit()

Of course, there is an easier way: configure a global implicit wait. The line below sets the default waiting time to 10 seconds.

driver.implicitly_wait(10) # seconds
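The polling behavior of an explicit wait can be sketched in plain Python. This only illustrates the idea and is not Selenium's implementation: wait_until, fake_lookup, and the parameter names are hypothetical, and the built-in TimeoutError stands in for Selenium's TimeoutException.

```python
import time


def wait_until(condition, timeout=10.0, poll_frequency=0.5):
    """Call `condition` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            # Success: return at once instead of sleeping for the full timeout.
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_frequency)


# Usage: a fake element lookup that only succeeds on the third poll.
attempts = []


def fake_lookup():
    attempts.append(1)
    return "element" if len(attempts) >= 3 else None


found = wait_until(fake_lookup, timeout=2.0, poll_frequency=0.01)
```

Compared with a fixed time.sleep(10), this style of waiting costs only as much time as the page actually needs, and it fails loudly when the element never appears.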

Error retry mechanism

Slider recognition and dragging both have some probability of failing, so login can fail as well. The code that handles login failures is in Juejin.py:

Added successful login judgment
Added the retry mechanism for login failures

The core code is as follows:


from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

# Determine whether the login succeeded
JUEJIN_NICKNAME = "Fried rice with tomatoes and Eggs."
# The suffix must match the alt text of the avatar image on the page
juejin_avatar_alt = JUEJIN_NICKNAME + "Head of"
driver.find_element(By.XPATH, f'//img[@alt="{juejin_avatar_alt}"]')


# Retry
for retry in range(self.retry):
    # ...
    get_cookies()
    # ...
    try:
        avatar = self.driver.find_element(By.XPATH, f'//img[@alt="{juejin_avatar_alt}"]')
        if avatar:
            break
    except NoSuchElementException:
        pass
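The retry loop can be tried out without a browser. The sketch below is a simplified stand-in, not the code from Juejin.py: login_with_retry and fake_login are hypothetical names, and LookupError plays the role of Selenium's NoSuchElementException.

```python
def login_with_retry(attempt_login, max_retries=3):
    """Run attempt_login up to max_retries times.

    Returns the 1-based number of the attempt that succeeded, or None
    if every attempt failed.
    """
    for attempt in range(1, max_retries + 1):
        try:
            if attempt_login():
                return attempt
        except LookupError:
            # Stand-in for NoSuchElementException: the avatar was not found,
            # so this attempt counts as a failed login.
            pass
    return None


# Usage: a fake login that fails twice (once quietly, once with an
# exception) and succeeds on the third attempt.
outcomes = iter([False, LookupError("avatar not found"), True])


def fake_login():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


succeeded_on = login_with_retry(fake_login, max_retries=5)
```

Returning None after the last attempt lets the caller decide what to do when every retry fails, for example send a failure notification.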

New email function

See mail.py for the full code

An email notification is now sent once the article has been published. The email code is as follows:

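import smtplib
import traceback
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


# Please set the mailbox configuration (MAIL_HOST, MAIL_PORT, MAIL_USER,
# MAIL_PASSWORD, MAIL_ADDRESS) by yourself
def send(content: str, subject: str, mail_from: str, mail_to: list):
    msg_root = MIMEMultipart('related')
    msg_text = MIMEText(content, 'html', 'utf-8')
    msg_root.attach(msg_text)
    msg_root['Subject'] = subject
    msg_root['From'] = mail_from
    # Address lists in mail headers are comma-separated
    msg_root['To'] = ", ".join(mail_to)

    try:
        stp = smtplib.SMTP_SSL(MAIL_HOST, MAIL_PORT)
        # stp.set_debuglevel(1)
        stp.ehlo()
        stp.login(MAIL_USER, MAIL_PASSWORD)
        stp.sendmail(MAIL_ADDRESS, mail_to, msg_root.as_string())
        stp.quit()
    except Exception:
        # Print the full traceback of the current exception
        print(traceback.format_exc())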
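The message-building half of this function can be checked without connecting to a mail server. The following is a minimal sketch under that assumption: build_message is a hypothetical helper, the addresses are placeholders, and recipients are joined with commas, the separator used for address lists in mail headers.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(content: str, subject: str, mail_from: str, mail_to: list) -> MIMEMultipart:
    """Assemble an HTML email without sending it."""
    msg_root = MIMEMultipart('related')
    msg_root.attach(MIMEText(content, 'html', 'utf-8'))
    msg_root['Subject'] = subject
    msg_root['From'] = mail_from
    msg_root['To'] = ", ".join(mail_to)
    return msg_root


# Usage with placeholder addresses
msg = build_message(
    "<p>Article published.</p>",
    "Publish result",
    "sender@example.com",
    ["a@example.com", "b@example.com"],
)
# The serialized form is what would be handed to sendmail()
raw = msg.as_string()
```

Separating message construction from sending makes it possible to inspect the headers and body locally before any credentials are involved.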

Note that the POP3/SMTP/IMAP service must be enabled for the mailbox. This article uses the POP3/SMTP service. Set the email account and password according to your mailbox's configuration.

Variable	Description	Example
MAIL_USER	Sender email user name	xxx@qq.com
MAIL_ADDRESS	Sender email address	xxx@qq.com
MAIL_HOST	Sender email server	smtp.qq.com
MAIL_PASSWORD	Sender email password	xxxxxx
MAIL_PORT	Mail server port	465
MAIL_TO	Recipient email address	xxx@qq.com

Note: the configuration has been tested with NetEase Mail and QQ Mail, and both send emails normally.
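One way to supply these values is through environment variables. The sketch below assumes that approach; the project may load its settings differently. MailConfig and load_mail_config are hypothetical names, and treating MAIL_TO as a comma-separated list is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class MailConfig:
    user: str
    address: str
    host: str
    password: str
    port: int
    to: List[str]


def load_mail_config(env: Mapping[str, str] = os.environ) -> MailConfig:
    """Build a MailConfig from a mapping such as os.environ."""
    return MailConfig(
        user=env["MAIL_USER"],
        address=env["MAIL_ADDRESS"],
        host=env["MAIL_HOST"],
        password=env["MAIL_PASSWORD"],
        # Assumed default: 465, the port shown in the table
        port=int(env.get("MAIL_PORT", "465")),
        # Assumption: several recipients are separated by commas
        to=[addr.strip() for addr in env["MAIL_TO"].split(",") if addr.strip()],
    )


# Usage with placeholder values instead of real credentials
sample_env = {
    "MAIL_USER": "sender@example.com",
    "MAIL_ADDRESS": "sender@example.com",
    "MAIL_HOST": "smtp.example.com",
    "MAIL_PASSWORD": "not-a-real-password",
    "MAIL_TO": "a@example.com, b@example.com",
}
config = load_mail_config(sample_env)
```

Keeping the password out of the source code this way also fits automated runs, where secrets are usually injected by the runner.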

The email result is as follows:

Final notes

How to deal with the low success rate of slider detection?

There are many algorithms for slider detection, such as edge detection based on cv2 and object detection based on YOLO. I won't go into them here. The success rate of machine-learning-based object detection can reach 99%. If you are interested, you can look these up yourself.

How are unknown errors in the project handled?

You can directly view the results of the GitHub Action, which prints all exceptions.

If you think my project is helpful to you, please click ❤️❤️❤️.

See all the code for this project in my GitHub repository. Stars and forks are welcome.