Overview

Crawlab is a Golang-based distributed crawler management platform that supports multiple programming languages and crawler frameworks. This is a beta of the next official release, V0.6.0. It is not recommended for production use because it has not been fully tested and is not yet stable enough. In addition, some utility features (such as Git, Scrapy, and notifications) are not included in this beta; they will be integrated into the official release as plug-ins.

Upgrades and optimizations

As a major release, Crawlab V0.6 (including the beta) brings a number of significant feature upgrades, along with many optimizations for performance, stability, robustness, and ease of use. In theory, this beta should be more robust than the previous version, especially in task execution, file synchronization, and node communication. Even so, we recommend that users test a variety of crawler tasks thoroughly on the new version of Crawlab.

Back end

  • File synchronization. Moved file synchronization from MongoDB GridFS to SeaweedFS to improve the stability and robustness of file synchronization and crawler deployment.
  • Node communication. Migrated node communication from Redis-based RPC to gRPC. Worker nodes interact with the MongoDB database indirectly by sending gRPC requests to the master node.
  • Task queues. Migrated task queues from Redis lists to MongoDB collections for more flexibility, such as priority queues.
  • Logs. Migrated log storage to SeaweedFS to address performance issues in MongoDB.
  • SDK integration. Moved result data storage from the native SDK to the task processor side, which imports the results into the database.
  • Task-related logic. Abstracted task-related logic into a task scheduler, a task processor, and a task executor to reduce coupling and improve scalability and maintainability.
  • Componentization. Introduced a dependency injection framework to componentize modules, services, and subsystems.
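
To illustrate why moving the task queue off a plain FIFO list makes priority queues possible, here is a minimal in-memory sketch in Go. It is not Crawlab's implementation: the `Task` fields, the `Queue` type, and the rule that a lower `Priority` value runs first are all assumptions made for this example, and the heap merely stands in for a MongoDB collection queried in priority order.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Task is a hypothetical queued task; the field names are illustrative only.
type Task struct {
	ID       string
	Priority int // assumption of this sketch: a lower value runs first
	seq      int // insertion order, keeps FIFO order among equal priorities
}

// taskHeap implements heap.Interface, ordered by (Priority, seq).
type taskHeap []*Task

func (h taskHeap) Len() int { return len(h) }
func (h taskHeap) Less(i, j int) bool {
	if h[i].Priority != h[j].Priority {
		return h[i].Priority < h[j].Priority
	}
	return h[i].seq < h[j].seq
}
func (h taskHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *taskHeap) Push(x interface{}) { *h = append(*h, x.(*Task)) }
func (h *taskHeap) Pop() interface{} {
	old := *h
	n := len(old)
	t := old[n-1]
	*h = old[:n-1]
	return t
}

// Queue is an in-memory stand-in for a task collection sorted by priority.
type Queue struct {
	h    taskHeap
	next int
}

// Enqueue adds a task with the given priority.
func (q *Queue) Enqueue(id string, priority int) {
	heap.Push(&q.h, &Task{ID: id, Priority: priority, seq: q.next})
	q.next++
}

// Dequeue removes and returns the most urgent task, or false if the queue is empty.
func (q *Queue) Dequeue() (*Task, bool) {
	if q.h.Len() == 0 {
		return nil, false
	}
	return heap.Pop(&q.h).(*Task), true
}

func main() {
	q := &Queue{}
	q.Enqueue("nightly-crawl", 5)
	q.Enqueue("urgent-recrawl", 1)
	q.Enqueue("weekly-report", 5)
	for {
		t, ok := q.Dequeue()
		if !ok {
			break
		}
		fmt.Println(t.ID, t.Priority)
	}
}
```

With a plain list, tasks can only be taken in the order they were pushed; here the task enqueued second is dequeued first because of its priority, while tasks of equal priority keep their insertion order.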

Front end

  • Vue 3. Moved to the latest front-end framework, Vue 3, to support more advanced features such as the Composition API and TypeScript.
  • UI framework. Moved from vue-element-admin to Element Plus, a Vue 3-based UI framework, for more flexibility and functionality.
  • Advanced file editor. Supports more advanced file editor features, including drag and drop, copy, move, rename, delete, file editing, code highlighting, navigation tabs, and more.
  • Customizable tables. More advanced features are built in, including custom columns, batch operations, search, filter, sort, and more.
  • Navigation tabs. You can use multiple navigation tabs to view different pages.
  • Batch creation. Objects, including crawlers, projects, and scheduled tasks, can be created in batches.
  • Detail navigation. Sidebar navigation on detail pages.
  • Improved dashboard. More data charts on the home page dashboard.

Pending

Because this is a beta release, some existing utility features, such as Git and Scrapy integration, are not yet supported. Some of the groundwork for them is already in the code, and we are working to include them in the official V0.6.0 release. They will be added to the stable release only after they have been fully tested.

  • Plug-in framework. Advanced functions will be integrated into Crawlab as plug-ins.
  • Git integration. Will be provided as a plug-in.
  • Scrapy integration. Will be provided as a plug-in.
  • Message notifications. Will be provided as a plug-in.
  • Main tasks and subtasks. If the task execution mode is “All nodes” or “Specify nodes”, there will be a main task and subtasks.
  • Crontab editor. A visual front-end component for editing crontab expressions.
  • Result deduplication.
  • Environment variables.
  • Internationalization. Chinese language support.
  • Front-end usability optimization. More advanced features, such as saving table settings.
  • Automatic log cleanup.
  • Documentation.
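
As a rough illustration of the main task and subtask idea in the list above, the following Go sketch expands one main task into one subtask per target node. This feature is still pending, so everything here is hypothetical: the `Mode` values, the `Task` fields, the `Plan` function, and the choice of one subtask per node are assumptions for the example, not Crawlab's actual design.

```go
package main

import "fmt"

// Mode names mirror the two execution modes mentioned above;
// the string values are made up for this sketch.
type Mode string

const (
	ModeAllNodes     Mode = "all-nodes"
	ModeSpecifyNodes Mode = "specify-nodes"
)

// Task is a hypothetical task record.
type Task struct {
	ID       string
	ParentID string // empty for the main task
	NodeID   string // empty for the main task
}

// Plan returns the main task followed by one subtask per target node.
// allNodes is every known node; selected is used only in "Specify nodes" mode.
func Plan(mainID string, mode Mode, allNodes, selected []string) ([]Task, error) {
	var targets []string
	switch mode {
	case ModeAllNodes:
		targets = allNodes
	case ModeSpecifyNodes:
		targets = selected
	default:
		return nil, fmt.Errorf("unsupported mode %q", mode)
	}

	tasks := []Task{{ID: mainID}}
	for i, node := range targets {
		tasks = append(tasks, Task{
			ID:       fmt.Sprintf("%s-%d", mainID, i+1),
			ParentID: mainID,
			NodeID:   node,
		})
	}
	return tasks, nil
}

func main() {
	nodes := []string{"master", "worker-1", "worker-2"}
	tasks, err := Plan("t1", ModeSpecifyNodes, nodes, []string{"worker-1", "worker-2"})
	if err != nil {
		panic(err)
	}
	for _, t := range tasks {
		fmt.Printf("%+v\n", t)
	}
}
```

Keeping a parent ID on each subtask is one simple way to let a UI group per-node runs under the task the user actually triggered.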

Future plans

This beta release is intended as a preview for testing the core functionality of Crawlab V0.6. We sincerely hope that you will download it, install it, and run plenty of test crawler tasks. Once the major issues found in the beta are resolved, and the plug-in framework and other important features are completed and tested, we will publish the official release. There will probably be a second, more complete beta before then.

References

  • Website: www.crawlab.cn
  • GitHub: github.com/crawlab-tea…
  • Demo: crawlab.cn/demo

Community

If you find Crawlab useful for your daily development or for your company, feel free to star it on GitHub, and if you run into any problems, feel free to open an issue there. Development contributions to Crawlab are also welcome. You can also add the WeChat account TikazyQ1 to join the Crawlab technical discussion group and talk with other developers about development and deployment.