ScrapydManage
GitHub: github.com/kanadebliss… Gitee: gitee.com/kanadebliss…
ScrapydManage is a Windows client for managing Scrapyd; the software is simply the Scrapyd API wrapped into an EXE file. It is written in Aardio. The source code is on GitHub and can be compiled yourself, or you can download the precompiled EXE from the GitHub releases page.
Host Management Page
Right-click menu:
Add host
Adding a host means adding a Scrapyd API address, such as 127.0.0.1:6800. If you are not familiar with Scrapyd, refer to the official documentation: scrapyd.readthedocs.io/en/stable/i… Install Scrapyd and type `scrapyd` on the command line, or create a scrapyd.conf in the current directory, adjust the configuration, and then run `scrapyd`. Reference configuration:
```ini
[scrapyd]
eggs_dir = D:/scrapyd/eggs
logs_dir = D:/scrapyd/logs
items_dir = D:/scrapyd/items
jobs_to_keep = 5
dbs_dir = D:/scrapyd/dbs
max_proc = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
http_port = 6800
debug = off
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root
node_name = localhost

[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
```
At minimum, change the three directories eggs_dir, logs_dir, and dbs_dir to suit your machine; adjust the other options as needed.
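Once a host is configured, the client only needs its `ip:port` to talk to it. A minimal sketch of verifying a host is reachable, using Scrapyd's daemonstatus.json endpoint and only the Python standard library (the helper names `status_url` and `check_host` are illustrative, not part of the client):

```python
import json
import urllib.request


def status_url(host: str) -> str:
    # Build the daemonstatus.json URL for a host given as "ip:port".
    return f"http://{host}/daemonstatus.json"


def check_host(host: str, timeout: float = 5.0) -> dict:
    # Fetch and decode the daemon status, e.g.
    # {"status": "ok", "pending": 0, "running": 0, "finished": 0, "node_name": "localhost"}
    with urllib.request.urlopen(status_url(host), timeout=timeout) as resp:
        return json.loads(resp.read())
```

With a local Scrapyd running as configured above, `check_host("127.0.0.1:6800")` would return the status dict used to fill the status and node name columns.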
Refresh list status
This sends a request to every host to update the status and node name columns, which should be clear from the first screenshot.
Synchronize all projects to all hosts
As the name implies; the default version number is the current timestamp.
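Under the hood, uploading a project to a host goes through Scrapyd's addversion.json endpoint, which takes the project name, a version, and the packaged egg file as multipart form data. A sketch using only the standard library (the function names and the timestamp-as-version default mirror the behavior described above; they are not the client's actual code):

```python
import time
import urllib.request
import uuid


def default_version() -> str:
    # Default version number: the current Unix timestamp, as described above.
    return str(int(time.time()))


def add_version(host: str, project: str, egg_path: str, version: str = "") -> bytes:
    # POST the project egg to addversion.json as multipart/form-data.
    version = version or default_version()
    boundary = uuid.uuid4().hex
    with open(egg_path, "rb") as f:
        egg = f.read()
    parts = []
    for name, value in (("project", project), ("version", version)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="egg"; '
        f'filename="{project}.egg"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
        + egg + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    req = urllib.request.Request(
        f"http://{host}/addversion.json",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()
```

"Synchronize all projects to all hosts" then amounts to calling `add_version` for every project egg against every configured host.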
Viewing the Task Queue
This calls listjobs.json, which returns information about pending, running, and finished crawl jobs on the Scrapyd server.
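A minimal sketch of querying the task queue via listjobs.json and grouping the result the way the queue view does (the `summarize` helper is illustrative, not part of the client):

```python
import json
import urllib.request


def list_jobs(host: str, project: str) -> dict:
    # listjobs.json returns {"pending": [...], "running": [...], "finished": [...]}.
    url = f"http://{host}/listjobs.json?project={project}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read())


def summarize(jobs: dict) -> str:
    # Count jobs per state, the three states the queue view distinguishes.
    return ", ".join(
        f"{state}: {len(jobs.get(state, []))}"
        for state in ("pending", "running", "finished")
    )
```

For example, `summarize(list_jobs("127.0.0.1:6800", "myproject"))` would produce a line like `pending: 0, running: 1, finished: 2`.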
Delete the host
As the name implies
Project Management Interface
Project management essentially reads the projects folder in the same directory as the EXE file. The right-click menu has three functions: refresh the project list, synchronize all projects to a host, and synchronize a single project to a host (right-click on a project first).
Creating a Task Page
The right-click menu has two functions: create a task and cancel a task. Note that the software works in these steps: select a host -> the software asks the server for all projects on that host -> select a project -> the software asks the server for all spiders in that project.
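The host -> project -> spider cascade above maps onto two Scrapyd endpoints, listprojects.json and listspiders.json. A sketch of the same two-step flow (helper names are illustrative):

```python
import json
import urllib.request


def get_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read())


def spiders_url(host: str, project: str) -> str:
    return f"http://{host}/listspiders.json?project={project}"


def spiders_by_project(host: str) -> dict:
    # Step 1: ask the selected host for all of its projects.
    projects = get_json(f"http://{host}/listprojects.json")["projects"]
    # Step 2: ask for the spiders of each project.
    return {p: get_json(spiders_url(host, p))["spiders"] for p in projects}
```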
The run time can be a string in the format shown in the figure, which schedules the crawler for that specific time, or a number (in seconds), which runs it after that delay. The time interval controls whether the crawler runs repeatedly; it only accepts a number of seconds, for example 86400 to run the crawler once a day.
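Scrapyd itself only starts a crawl immediately via schedule.json; delayed and repeated runs as described above therefore have to be driven by the client. A sketch of both pieces (the `run_repeatedly` loop is an illustrative stand-in for the client's scheduling, not its actual code):

```python
import time
import urllib.parse
import urllib.request


def schedule_payload(project: str, spider: str) -> bytes:
    # Form body for schedule.json.
    return urllib.parse.urlencode({"project": project, "spider": spider}).encode()


def schedule(host: str, project: str, spider: str) -> bytes:
    # POST schedule.json; the response contains the new job's id.
    req = urllib.request.Request(
        f"http://{host}/schedule.json", data=schedule_payload(project, spider)
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read()


def run_repeatedly(host: str, project: str, spider: str,
                   delay: float, interval: float) -> None:
    # Client-side scheduling: wait `delay` seconds before the first run,
    # then fire the spider again every `interval` seconds (e.g. 86400 for daily).
    time.sleep(delay)
    while True:
        schedule(host, project, spider)
        time.sleep(interval)
```

Cancelling a task correspondingly goes through the cancel.json endpoint with the project name and job id.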
Because the software must wait for the server's response (even with multithreading the return value is still needed), there is a brief freeze after selecting a host or project. How long it lasts depends on the response delay; on a local machine it is very fast.
If you need other features, you can develop them yourself; the source code is available and secondary development should not be difficult. Aardio's syntax is similar to other languages, so it is easy to pick up quickly.