Preface

Lagou Education is currently running a promotion, and VIP membership costs 19 per month (strongly recommended). I found that there are 89 columns available to subscribe to, but one month is far from enough time to read them all. So, after wrestling with the problem for a month, I decided to write a plug-in to help Lagou Education increase its VIP sales.

Prerequisites

You must have a Lagou Education VIP membership!

Usage

  1. Log in to Lagou Education

  2. Copy the request headers from a request made after a successful login

  3. Replace the headers in crawl/crawl_list.py and crawl/crawl_content.py

  4. Crawl:

    • One-click subscription: run the lessions_subscription() method in crawl/crawl_list.py
    • Full crawl: run the spider.crawl_all() method in crawl/crawl_content.py
    • Incremental crawl: run the spider.cral_increase() method in crawl/crawl_content.py
    • Convert to PDF: run the main method in htmltopdf.py
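
Steps 2 and 3 can be sketched as follows. This is only an illustration, not the project's own code: the helper parse_raw_headers and the variable names RAW and HEADERS are hypothetical, and the header values are placeholders. It assumes the headers were copied from the browser's developer tools as "Name: value" lines and that the crawl scripts expect a plain Python dict.

```python
def parse_raw_headers(raw: str) -> dict:
    """Turn 'Name: value' lines copied from the browser into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ':authority'.
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, so values containing ':' stay intact.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


# Placeholder values only; paste your own copied headers here.
RAW = """
Cookie: <your cookie>
User-Agent: Mozilla/5.0
Referer: https://kaiwu.lagou.com/
"""

HEADERS = parse_raw_headers(RAW)
```

The resulting dict can then be pasted over the headers in the two crawl scripts. Parsing the copied text avoids quoting each header by hand.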

Project description

  1. The first run uses a full crawl. When columns are updated, the project logs which columns have not been downloaded and which have not been updated.
  2. An incremental update fetches new content for columns that are not yet finished.
  3. At present, the PDF files on the Baidu cloud disk must be maintained manually.
  4. After an incremental update, check the log and edit the list of folders to convert to PDF, pdf_paths = []. For each updated ID in the log, open https://kaiwu.lagou.com/course/courseInfo.htm?courseId={id} (with {id} replaced by the updated ID) to identify the column, then put the folder of that column into pdf_paths.

  5. downloads.txt lists the columns that need to be downloaded after a new subscription, and unreleased.txt lists the columns that have not finished updating. Do not edit these two files by hand, or the crawl will go wrong!
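
Looking up the updated columns by ID can be sketched like this. It is a minimal illustration, not part of the project: the helper course_urls is hypothetical, and it assumes the course ID simply follows courseId= in the address above. The IDs themselves still have to be read from the log by hand.

```python
# Assumed URL pattern: the course ID is substituted for {id}.
COURSE_URL = "https://kaiwu.lagou.com/course/courseInfo.htm?courseId={id}"


def course_urls(updated_ids):
    """Map each updated course ID to the page that identifies the column."""
    return {str(i): COURSE_URL.format(id=i) for i in updated_ids}


if __name__ == "__main__":
    # Example IDs only; use the IDs reported in your own log.
    for course_id, url in course_urls([1, 2]).items():
        print(course_id, url)
```

Once each column has been identified, its folder can be added to pdf_paths before converting to PDF.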

Notes

  1. The first run must be a full crawl.
  2. At present, some files must be deleted manually: after a successful PDF conversion, delete the updated/unupdated folders. Otherwise, after the next update, columns that have not been updated will appear in both the unupdated and updated folders. This is a known bug; since I am still looking for a job, it may not be fixed for a long time.
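
Until the bug is fixed, the manual cleanup could be scripted along these lines. This is a sketch under assumptions, not the project's code: the helper remove_stale_folders is hypothetical, and the folder names "updated" and "unupdated" are taken from the description above and may differ from the names the project actually uses on disk.

```python
import shutil
from pathlib import Path


def remove_stale_folders(base_dir, names=("updated", "unupdated")):
    """Delete the given folders under base_dir and return the names removed."""
    removed = []
    for name in names:
        folder = Path(base_dir) / name
        # Only remove real directories; missing folders are skipped silently.
        if folder.is_dir():
            shutil.rmtree(folder)
            removed.append(folder.name)
    return removed
```

Run it only after the PDF conversion has succeeded, since shutil.rmtree deletes the folders and everything in them permanently.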

Flow chart

Project completion

  • Crawl Lagou courses
  • Generate PDFs
  • One-click subscription to all VIP columns
  • One-click download of all columns
  • Multi-threaded column crawling
  • Full column crawl
  • Incremental column crawl
  • Update unupdated columns and record the columns that move from unupdated to updated
  • Login function

Project address

Code

Project demo

VIP purchase link