This article is part of the Python Theme Month event. See the event page for more details.
Background
The two articles below cover the project from logging in to publishing a post. It took two articles because the later features were not finished when the first one was written. Today's article optimizes and extends the overall project.
- Implementing automatic login to Juejin
- Implementing automatic article publishing on Juejin
Main text
The project currently supports automatic deployment, email notifications, configurable feature switches, and more.
The project home page has detailed usage documentation. If you need to publish Juejin articles automatically, you are welcome to fork it.
Slider drag optimization
Before and after
As mentioned in the previous article, dragging the slider was slow (it could even time out). Here is a comparison before and after the optimization:
- Before optimization
- After optimization
Optimization analysis
The optimized trajectory is based on the normal (Gaussian) distribution curve, which models a motion that first accelerates and then decelerates very well. Here is a comparison of the trajectories produced by the two approaches:
- The total distance is 100 px
- The x-axis is the move index (the number of moves), and the y-axis is the distance moved
Overall, the former approach is slower and has a larger error. The latter follows an accelerate-then-decelerate pattern, which is closer to how a person actually drags a slider. (Neither is inherently better; these are just the observed results. If the latter approach gets detected one day, you can still switch back to the former.)
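Written out, the optimized trajectory splits the total distance $D$ into per-step offsets proportional to the normal density. Assuming $\mu = 0$, $\sigma = 1$ and step indices $i = -10, \dots, 9$ (our reading of the parameters in track.py), step $i$ and the final correction step are:

$$
\Delta_i = D \cdot \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(i-\mu)^2}{2\sigma^2}\right), \qquad i = -10, \dots, 9
$$

$$
\Delta_{\text{last}} = D - \sum_{i=-10}^{9} \Delta_i
$$

The density peaks at $i = \mu$ and falls off symmetrically on both sides, so the steps first grow and then shrink, which is the accelerate-then-decelerate motion. With $\sigma = 1$, the densities at these 20 integer points already sum to almost exactly 1, so the correction step is very small; for $D = 100$ the largest single step ($i = 0$) is about 39.9 px.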
Code implementation
See track.py for the full code
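Here is the code that generates the optimized trajectory:
```python
import numpy as np


def gen_normal_track(distance):
    """Split `distance` into per-step offsets shaped like a normal curve."""

    def norm_fun(x, mu, sigma):
        # Probability density of the normal distribution N(mu, sigma^2) at x
        pdf = np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
        return pdf

    result = []
    # The number of moves can be configured flexibly according to distance
    for i in range(-10, 10, 1):
        result.append(norm_fun(i, 0, 1) * distance)
    # To reduce errors: append the remainder so the steps add up exactly to distance
    result.append(distance - sum(result))
    return result
```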
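The generated steps are floats. If the code that performs the drag needs whole-pixel offsets (an assumption here; check what your Selenium version accepts for `ActionChains.move_by_offset`), rounding each step on its own would make the total drift away from `distance`. A minimal sketch, using hypothetical helpers `normal_track` (a standard-library re-implementation of the generator, so the sketch needs no NumPy) and `to_pixel_steps`, rounds the cumulative position instead:

```python
import math


def normal_track(distance, steps=range(-10, 10), mu=0.0, sigma=1.0):
    """Hypothetical stdlib version: per-step offsets proportional to the normal density."""
    track = [
        distance * math.exp(-((i - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        for i in steps
    ]
    # Remainder step so that the offsets add up to distance
    track.append(distance - sum(track))
    return track


def to_pixel_steps(track):
    """Convert float steps to integer steps by rounding the running position.

    Rounding positions (not individual steps) keeps rounding errors from
    accumulating, so the integer steps sum to round(sum(track)).
    """
    pixel_steps = []
    position = 0.0  # exact (float) position reached so far
    placed = 0      # integer position reached so far
    for step in track:
        position += step
        target = round(position)
        pixel_steps.append(target - placed)
        placed = target
    return pixel_steps


offsets = to_pixel_steps(normal_track(100))
print(offsets, sum(offsets))
```

For a 100 px track the non-zero offsets come out as 6, 24, 40, 24, 6, still summing to 100. Zero-length offsets could be filtered out before driving the mouse.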
Optimize page waiting
The project uses time.sleep() in many places. Is there really no more elegant way to wait?
There is; I was just lazy. The project is fairly simple, publishing is a one-off task, and there is no strict limit on the program's running time, so I skipped these details. The code below waits up to 10 seconds: it returns as soon as the element is found, and raises a TimeoutException if the element has not appeared after 10 seconds. By default, WebDriverWait calls the ExpectedCondition every 500 milliseconds until it returns successfully.
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://juejin.cn/")
try:
    # Wait up to 10 seconds; return as soon as the element is present in the DOM
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, '//div[@class="sc-kkGfuU bujTgx"]'))
    )
finally:
    driver.quit()
```
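To make the polling behaviour concrete, here is a simplified pure-Python sketch of the loop an explicit wait performs. It is not Selenium's source code: `wait_until` and `WaitTimeout` are hypothetical names, and details such as ignored exceptions are left out. It only illustrates "call the condition every `poll_frequency` seconds, return the first truthy result, give up after `timeout`":

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never returns a truthy value in time."""


def wait_until(condition, timeout=10.0, poll_frequency=0.5):
    """Call condition() every poll_frequency seconds until it returns a truthy value.

    Returns that value immediately on success; raises WaitTimeout once
    more than `timeout` seconds have passed.
    """
    end_time = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() > end_time:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_frequency)


# Usage: a fake condition that "finds" the element on the third poll
attempts = []


def element_present():
    attempts.append(1)
    return "element" if len(attempts) >= 3 else None


found = wait_until(element_present, timeout=5, poll_frequency=0.01)
print(found)
```

Compared with a fixed time.sleep(), the wait ends as soon as the condition holds, so the fast case costs only one or two polls.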
There is an even simpler option: set an implicit wait globally on the driver, here with a default timeout of 10 seconds.
```python
driver.implicitly_wait(10)  # seconds
```
Error retry mechanism
Because recognizing and dragging the slider can fail, login sometimes fails too. The following changes were made (see Juejin.py for the code):
- Added a check for whether login succeeded
- Added a retry mechanism for failed logins
The core code is as follows:
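```python
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

# Determine whether the login succeeded: once logged in, the page shows the
# user's avatar, whose alt text is "<nickname>的头像" ("<nickname>'s avatar")
JUEJIN_NICKNAME = "Fried rice with tomatoes and Eggs."  # replace with your own nickname
juejin_avatar_alt = JUEJIN_NICKNAME + "的头像"
driver.find_element(By.XPATH, f'//img[@alt="{juejin_avatar_alt}"]')

# Retry: repeat the login flow until the avatar is found or retries run out
for retry in range(self.retry):
    # ...
    get_cookies()
    # ...
    try:
        avatar = self.driver.find_element(By.XPATH, f'//img[@alt="{juejin_avatar_alt}"]')
        if avatar:
            break
    except NoSuchElementException:
        # Avatar not found, so login failed: try again
        pass
```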
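The same retry idea can be sketched independently of Selenium. This is a minimal sketch, not the project's code: `login_with_retry`, `attempt_login` and `is_logged_in` are hypothetical names standing in for "run the login flow" and "look for the avatar". Unlike a bare `for` loop that simply ends, it reports which attempt succeeded, or 0 if all attempts failed:

```python
from typing import Callable


def login_with_retry(attempt_login: Callable[[], None],
                     is_logged_in: Callable[[], bool],
                     retries: int = 3) -> int:
    """Try to log in up to `retries` times.

    Returns the 1-based number of the attempt that succeeded, or 0 if all failed.
    """
    for attempt in range(1, retries + 1):
        attempt_login()     # e.g. open the login dialog and drag the slider
        if is_logged_in():  # e.g. check that the avatar image is present
            return attempt
    return 0


# Usage with fakes: the slider drag "fails" twice, then succeeds
outcomes = iter([False, False, True])
state = {"logged_in": False}


def fake_attempt():
    state["logged_in"] = next(outcomes)


result = login_with_retry(fake_attempt, lambda: state["logged_in"], retries=5)
print(result)
```

Returning the attempt number (or 0) lets the caller decide what to do when every attempt fails, instead of continuing as if logged in.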
New email feature
See mail.py for the full code
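An email notification is now sent once the article has been published. The email code is as follows:
```python
import smtplib
import traceback
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Please set the mailbox configuration yourself (placeholders; see the table below)
MAIL_USER = "xxx@qq.com"
MAIL_ADDRESS = "xxx@qq.com"
MAIL_HOST = "smtp.qq.com"
MAIL_PASSWORD = "xxxxxx"
MAIL_PORT = 465


def send(content: str, subject: str, mail_from: str, mail_to: list):
    msg_root = MIMEMultipart('related')
    # HTML body encoded as UTF-8
    msg_text = MIMEText(content, 'html', 'utf-8')
    msg_root.attach(msg_text)
    msg_root['Subject'] = subject
    msg_root['From'] = mail_from
    # Multiple recipients in the To header are separated by commas
    msg_root['To'] = ", ".join(mail_to)
    try:
        # Port 465 uses implicit SSL, hence SMTP_SSL
        stp = smtplib.SMTP_SSL(MAIL_HOST, MAIL_PORT)
        # stp.set_debuglevel(1)
        stp.ehlo()
        stp.login(MAIL_USER, MAIL_PASSWORD)
        stp.sendmail(MAIL_ADDRESS, mail_to, msg_root.as_string())
        stp.quit()
    except Exception:
        # format_exc() formats the exception currently being handled
        print(traceback.format_exc())
```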
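Before entering real credentials, it can help to build the message offline and inspect it. A minimal sketch, assuming a hypothetical `build_message` helper that reproduces the MIME structure used for the notification (a multipart/related container holding one UTF-8 HTML part) without connecting to any server:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(content: str, subject: str, mail_from: str, mail_to: list) -> MIMEMultipart:
    """Hypothetical helper: build a multipart/related HTML message without sending it."""
    msg_root = MIMEMultipart('related')
    msg_root.attach(MIMEText(content, 'html', 'utf-8'))
    msg_root['Subject'] = subject
    msg_root['From'] = mail_from
    msg_root['To'] = ", ".join(mail_to)  # comma-separated recipient list
    return msg_root


msg = build_message("<p>Published!</p>", "Juejin article published",
                    "sender@example.com", ["a@example.com", "b@example.com"])
print(msg.as_string())  # raw message, including the base64-encoded HTML part
```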
Note that the sender mailbox must have the POP3/SMTP/IMAP service enabled; this article uses the POP3/SMTP service. Then fill in the email account and password according to the configuration below.
Variable | Description | Example |
---|---|---|
MAIL_USER | Sender mailbox user name | xxx@qq.com |
MAIL_ADDRESS | Sender email address | xxx@qq.com |
MAIL_HOST | Sender SMTP server | smtp.qq.com |
MAIL_PASSWORD | Sender mailbox password | xxxxxx |
MAIL_PORT | Mail server port | 465 |
MAIL_TO | Recipient email address | xxx@qq.com |
Note: both NetEase Mail and QQ Mail configurations have been tested and send email normally.
Here is the resulting email:
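Since the project runs on GitHub Actions, these values would typically be supplied as repository secrets exposed as environment variables. A minimal sketch of loading them, assuming a hypothetical `load_mail_config` function and a comma-separated `MAIL_TO` for several recipients (the project's actual configuration mechanism may differ; check its repository):

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass
class MailConfig:
    user: str
    address: str
    host: str
    password: str
    port: int
    to: list


def load_mail_config(env: Mapping = os.environ) -> MailConfig:
    """Hypothetical loader: read the MAIL_* variables from the environment."""
    return MailConfig(
        user=env["MAIL_USER"],
        address=env["MAIL_ADDRESS"],
        host=env["MAIL_HOST"],
        password=env["MAIL_PASSWORD"],
        port=int(env.get("MAIL_PORT", "465")),  # 465 = SMTP over implicit SSL
        # Assumption: several recipients are given as a comma-separated list
        to=[addr.strip() for addr in env["MAIL_TO"].split(",") if addr.strip()],
    )


# Usage with a plain dict instead of the real environment
cfg = load_mail_config({
    "MAIL_USER": "xxx@qq.com", "MAIL_ADDRESS": "xxx@qq.com",
    "MAIL_HOST": "smtp.qq.com", "MAIL_PASSWORD": "xxxxxx",
    "MAIL_TO": "a@qq.com, b@163.com",
})
print(cfg)
```

Accepting any mapping keeps the loader testable without touching the real environment.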
Final words
- How can the low success rate of slider recognition be improved?
There are many algorithms for slider recognition, such as edge detection with OpenCV (cv2) and object detection with YOLO. I won't go into them here. Object detection based on machine learning can reach a success rate of 99%; interested readers can look into it themselves.
- How are unexpected errors in the project handled?
Check the GitHub Actions run output directly; it prints all exceptions.
If you find this project helpful, please give it a ❤️❤️❤️.
All the code for this project is in my GitHub repository. Stars and forks are welcome.