Here's the situation: yesterday Party A (the client) asked our leadership to help scrape some data, saying their recent fund losses had been severe and they wanted data to analyze the moves the big players use to fleece retail investors. Our leader said it was easy and handed the job to our department manager. The department manager took one look at my hair and assigned the job to me. To be honest, I really didn't want this job: no resources were provided, no working hours were budgeted, and it would seriously cut into my slacking-off time. Counting on the fact that I'd brought the manager a few bottles of fat-boy happy water (Coke) before, I came up with what I thought was a solid excuse and tried to hint to him that I'd rather pass.

The manager said it was fine; Party A has connections up top.

Looks like there's no getting out of it.

Solution

For this kind of one-off request from Party A, the first idea is usually to look for ready-made alternatives. Don't start by hand-rolling Scrapy crawlers, IP proxy pools, targeted spiders, and so on. There are plenty of open-source wheels out there: find a few, reassemble them, solve the problem quickly, and get back to slacking off and watching my shows. Doesn't that smell great? A wage worker should have a wage worker's self-awareness.

I will do three things:

  • Find a tool that automatically opens a given URL and copies data from the page into a specified storage medium
  • Structure and merge the data from each run into a form Party A can understand (my guess is he only knows CSV)
  • Can't spend money. Can't spend money in this lifetime

After some searching, I settled on two open-source tools:

  • webscraper.io: record the crawl steps without writing code; a bit of web knowledge is enough
  • www.nocodb.com: no code required; with a bit of web knowledge you can set it up and use it

Specific steps

Customizing webscraper

After a webscraper crawl I can export a CSV or JSON file, but I'd have to store it on my own machine every time and send it to my manager, who would then send it on to the Party A "daddy". So the idea was to add a feature to webscraper that stores the data directly in a free cloud database, where it can be looked up whenever it's needed. Oh, and the Party A "daddy" can't make sense of an English-only plugin, so the webscraper was customized accordingly. Download the plugin at github.com/wh0197m/dac…

  • Open Chrome's extension developer mode, unzip the plugin, and load the unpacked package

Open Chrome DevTools. If you see the dac-worker panel at the end, the installation succeeded.
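To make the customization concrete, here is a rough, hypothetical sketch of the extension-side upload: each scraped record gets POSTed to the local forwarding service introduced in the "Start the API" step below. The endpoint, payload shape, and function name are my assumptions for illustration, not the actual plugin code.

```typescript
// Hypothetical sketch of the customized extension's upload step.
// Assumption: the plugin sends each scraped record to a local forwarding
// service (see "Start the API" below) instead of only exporting CSV/JSON.
async function uploadRecord(sitemapId: string, record: Record<string, unknown>): Promise<void> {
  const resp = await fetch("http://localhost:3000/rows", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ sitemap: sitemapId, record }),
  });
  if (!resp.ok) {
    // Surface failures in the dac-worker panel rather than silently dropping data.
    console.error(`upload failed: ${resp.status} ${resp.statusText}`);
  }
}
```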

Configuring the Cloud Database

On the Internet I found MemfireDB.com, a free cloud database I had already been using (I have no idea how the company makes money; in any case I've used it for more than half a year and it's been quite stable, and above all it's convenient). Don't expect me to install a database locally: if the data gets lost there, that's on me; with a third-party service, if the data is lost I can at least push back, and maybe even get compensated.

  • Get a free cloud database

  • Open the plugin's options page and select the database as the storage medium

Start the API

Chrome extensions currently can't use raw TCP sockets, so there is no way for an extension to reach an online database over the Postgres protocol directly. The simplest workaround is to start a local web service that forwards to the database. The code is simple; repository: github.com/wh0197m/dac… Then run npm install && npm start.
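As a rough illustration of what such a forwarding service can look like, here is a minimal sketch using Express and node-postgres. The /rows route, the scraped_rows table, and the DATABASE_URL variable are assumptions made up for this sketch; the actual repository may be organized differently.

```typescript
// forward-server.ts: minimal sketch of a local HTTP service that forwards
// rows from the browser extension into the Postgres-compatible cloud database.
import express from "express";
import { Pool } from "pg";

// Assumed table (not the actual repo's schema):
//   CREATE TABLE scraped_rows (
//     id serial PRIMARY KEY,
//     sitemap text,
//     record jsonb,
//     created_at timestamptz DEFAULT now()
//   );
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const app = express();
app.use(express.json());

// The customized extension POSTs each scraped record here; we forward it
// into the cloud database through the pg driver.
app.post("/rows", async (req, res) => {
  try {
    const { sitemap, record } = req.body;
    await pool.query(
      "INSERT INTO scraped_rows (sitemap, record) VALUES ($1, $2)",
      [sitemap, JSON.stringify(record)]
    );
    res.status(201).json({ ok: true });
  } catch (err) {
    res.status(500).json({ ok: false, error: String(err) });
  }
});

app.listen(3000, () => console.log("forwarder listening on http://localhost:3000"));
```

The point of the split is that the extension only ever speaks HTTP to localhost; the pg driver is what actually speaks the Postgres protocol to the cloud database.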

Starting a Desktop Application

A desktop application isn't strictly necessary, but the Party A "daddy" has no idea how to deploy a service, so the service gets packaged as a desktop application; running the Node.js service on a browser's V8 engine is one way to do it. In this case, all that's needed is a working desktop app that starts the service for him. Through it he can directly query the database into which the data has already been inserted; as long as he ends up with what he wants, everyone is happy. Desktop download address: github.com/wh0197m/dac…
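The article doesn't say which packaging tool is used, so purely as one hedged example, here is what the desktop wrapper could look like if it were built with Electron: the main process starts the forwarding service and opens a single window for Party A.

```typescript
// main.ts: sketch of a desktop wrapper. The packaging tool is not named in
// the article; Electron is assumed here purely for illustration.
import { app, BrowserWindow } from "electron";
import "./forward-server"; // assumed module name: starts the local forwarding service on launch

app.whenReady().then(() => {
  // One window where Party A enters the MemfireDB connection info and
  // browses the rows the crawler has already written.
  const win = new BrowserWindow({ width: 1024, height: 768 });
  win.loadFile("index.html"); // assumed bundled UI page
});

app.on("window-all-closed", () => app.quit());
```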

  • Fill in the connection info for the free database you just got from MemfireDB

  • After entering the correct connection information, you can see the data that has already been crawled into the database (a sketch of that lookup follows)
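For illustration only, the lookup behind that screen could be as simple as a Postgres query like the one below; the table and column names follow the assumptions in the earlier sketches, not the real schema.

```typescript
// Hypothetical read-back of the crawled rows, using node-postgres.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function listRecentRows(limit = 50): Promise<void> {
  const { rows } = await pool.query(
    "SELECT sitemap, record, created_at FROM scraped_rows ORDER BY created_at DESC LIMIT $1",
    [limit]
  );
  console.table(rows); // quick look at what the crawler has stored so far
  await pool.end();
}

listRecentRows().catch(console.error);
```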

Conclusion

To be honest, it used to be Google-and-StackOverflow-oriented programming; now it's open-source-oriented programming. Times have changed. Tabnine, an AI code-completion plugin, has taken over much of my coding, and third-party services keep getting cheaper, some even free. There's no need to build and maintain the database ourselves; if something goes wrong, the vendor compensates. Even the crawler doesn't need to be written by hand: you record a stream of operations and the machine handles the rest. I'm starting to wonder whether I'll have to change careers before I turn 35. It's not easy being a wage worker.

~ Folks, I still have a few spare invitation codes for that cloud database; if you need one, ping me in the comments. ~