Recently I needed a large amount of data to train a model. Many websites have anti-crawling mechanisms, so a crawler has to crawl a little and then pause for a while, which makes collection very slow. Leaving a computer running 24 hours a day just to crawl uses too much electricity, so I thought of using mobile phones instead.

Then it occurred to me that with Flutter, old Android and Apple phones could both be used for data collection.

So I got started and spent a whole day building a general framework. In the end, adding a crawler task became easy: the framework integrates basic features such as configuration, logging, and data persistence, and it wraps network requests and web page parsing. That part worked well enough.

Dart is a single-threaded language. It provides isolates, but isolates do not share memory. Flutter ties your hands and feet, and on top of that it disables reflection! The Google team is really something. After an afternoon of fighting this garbage I got so mad that I gave up.
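To make "isolates do not share memory" concrete, here is a minimal sketch in plain Dart. It assumes Dart 2.19 or later (for `Isolate.run`), and `crawledPages`, `crawlBatch`, and `demoIsolateMemory` are hypothetical names made up for illustration; none of them come from the framework.

```dart
import 'dart:isolate';

// Hypothetical global counter, used only to show that isolates do not share memory.
int crawledPages = 0;

// Pretends to crawl [urls] and bumps the counter in whichever isolate runs it.
int crawlBatch(List<String> urls) {
  crawledPages += urls.length;
  return crawledPages;
}

// Runs one batch in a worker isolate and compares the two views of the counter.
Future<void> demoIsolateMemory() async {
  final urls = ['http://news.cnr.cn/'];
  // Isolate.run starts a new isolate, runs the closure there,
  // and copies the return value back to the calling isolate.
  final countInWorker = await Isolate.run(() => crawlBatch(urls));
  print('worker isolate counted: $countInWorker');
  print('main isolate still sees: $crawledPages');
}
```

Called from `main`, `demoIsolateMemory()` should report 1 for the worker and 0 for the main isolate: each isolate gets its own copy of top-level variables, and only the returned value is copied back. Any state a crawler wants to share across isolates therefore has to be passed as messages.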

Implementing a multithreaded crawler seems to require writing some native code, in which case I might as well write the whole thing natively.

PS: Why pass up perfectly good native Android and build a crawler with garbage like Flutter? That's ridiculous. I'm so mad.

App screenshots

System architecture

I drew a couple of diagrams.

SpiderTask is the base class from which all crawlers derive. Each SpiderTask maintains its own TaskConfig task configuration object and a log object. The diagram below shows this.
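In code, that structure might look roughly like the sketch below. This is not the framework's actual source: the real SpiderTask takes a Flutter BuildContext, and the fields of TaskConfig are not shown in this post, so every name ending in `Sketch`, plus `EchoSpider`, is a hypothetical stand-in written in plain Dart.

```dart
// Hypothetical configuration holder; the real TaskConfig fields are not shown here.
class TaskConfigSketch {
  final String taskName;
  final Map<String, String> values = {};
  TaskConfigSketch(this.taskName);
}

// Hypothetical log object that keeps its lines in memory.
class LogSketch {
  final List<String> lines = [];
  void debug(String? message) => lines.add('[DEBUG] $message');
  void info(String? message) => lines.add('[INFO] $message');
}

// Base class: every crawler owns one config object and one log object.
abstract class SpiderTaskSketch {
  final String name;
  final TaskConfigSketch config;
  final LogSketch logging = LogSketch();

  SpiderTaskSketch(this.name) : config = TaskConfigSketch(name);

  // Subclasses override start() and call super.start() first.
  Future<void> start() async {
    logging.info('task $name started');
  }
}

// A trivial crawler that only writes to its own log.
class EchoSpider extends SpiderTaskSketch {
  EchoSpider() : super('EchoSpider');

  @override
  Future<void> start() async {
    await super.start();
    logging.debug('nothing to crawl in this sketch');
  }
}
```

Keeping the config and log on the base class means a concrete crawler only has to supply a name and a `start()` body.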

For data persistence I use a separate class. It follows the singleton pattern and is initialized at app startup.
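A singleton that is initialized once at startup can be sketched in Dart with a private constructor and a factory. The class below is an assumption for illustration only: `DataStoreSketch`, `init`, `save`, and `recordsFor` are hypothetical names, and it keeps records in memory rather than in whatever storage the real class uses.

```dart
// Hypothetical singleton store; the real persistence class is not shown in this post.
class DataStoreSketch {
  // The single shared instance, created lazily on first access.
  static final DataStoreSketch _instance = DataStoreSketch._internal();

  // The factory constructor always hands back the same instance.
  factory DataStoreSketch() => _instance;

  DataStoreSketch._internal();

  bool _ready = false;
  final Map<String, List<String>> _records = {};

  // Meant to be called once at app startup, before any task saves data.
  void init() {
    _ready = true;
  }

  // Appends one record under the given task name.
  void save(String task, String record) {
    if (!_ready) {
      throw StateError('call init() at app startup first');
    }
    _records.putIfAbsent(task, () => []).add(record);
  }

  // Returns a read-only copy of the records saved for a task.
  List<String> recordsFor(String task) =>
      List.unmodifiable(_records[task] ?? const []);
}
```

Because `DataStoreSketch()` returns the same object everywhere, any crawler task can call `DataStoreSketch().save(...)` without having the store passed in.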

The working process

  1. Register the crawler task in the task list in home
  2. After the app starts, manage each crawler task from the main page
  3. Select a task to start
  4. Opening the details page automatically binds it to the crawler task's logging object, so you can see the log output
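The steps above can be sketched without any UI. In the block below, a registry plays the role of the task list, and a broadcast stream plays the role of the log binding that a details page would subscribe to. All class and method names are hypothetical; the framework's own registration and logging APIs may look quite different.

```dart
import 'dart:async';

// Hypothetical log object: keeps a history and also exposes a live stream.
class TaskLogSketch {
  final _controller = StreamController<String>.broadcast();
  final List<String> history = [];

  // A details page would listen to this stream to show new lines as they arrive.
  Stream<String> get stream => _controller.stream;

  void info(String message) {
    history.add(message);
    _controller.add(message);
  }
}

// Hypothetical task entry with its own log.
class RegisteredTask {
  final String name;
  final TaskLogSketch log = TaskLogSketch();
  bool running = false;

  RegisteredTask(this.name);

  void start() {
    running = true;
    log.info('$name started');
  }
}

// Hypothetical registry standing in for the task list on the home page.
class TaskRegistrySketch {
  final Map<String, RegisteredTask> _tasks = {};

  // Step 1: register a task.
  void register(RegisteredTask task) => _tasks[task.name] = task;

  // Step 2: the main page lists every registered task.
  List<String> get names => _tasks.keys.toList();

  // Step 3: pick one task by name.
  RegisteredTask select(String name) {
    final task = _tasks[name];
    if (task == null) {
      throw ArgumentError('unknown task: $name');
    }
    return task;
  }
}
```

For step 4, a details page would call something like `task.log.stream.listen(...)` when it opens and read `task.log.history` to show lines that were logged before it subscribed.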

Simple example

Here is a simple example. When run, it crawls news from the CN Radio website.

import 'package:flutter/widgets.dart';
import 'package:flutter_spider_fx/framework/index.dart';

class CNRadioNewsSpider extends SpiderTask {
  // Front page of the CN Radio news site.
  var url = 'http://news.cnr.cn/';

  CNRadioNewsSpider(BuildContext context) : super(context, 'CNRadioNewsSpider');

  @override
  start() async {
    super.start();
    // Fetch and parse the page, decoding it as GB2312.
    var dom = await CatHttp.getDocument(url, encoding: 'gb2312');
    // Select the headline links inside the content panel.
    var links = dom.querySelectorAll('.contentPanel .lh30 a');
    for (var link in links) {
      logging.debug(link.attributes['href']); // link target
      logging.info(link.text); // headline text
    }
  }
}

The framework code

See the project code on GitHub for more details

Project address: github.com/Deali-Axy/f…

One final note: don't put blind faith in Google's Flutter. It is still pretty weak. It is fine for writing a simple CRUD GUI, but forget about anything more advanced.

Get in touch

Please leave a message on my WeChat official account. I reply to every message.

  • WeChat official account: star painting master
  • Coding live stream: live.bilibili.com/11883038
  • Zhihu: www.zhihu.com/people/deal…
  • Zhihu column: zhuanlan.zhihu.com/deali
  • Jianshu: www.jianshu.com/u/965b95853…