Source: Vivo Fast Application

Author: @Author Dadong


Because of the epidemic I had been stuck at home for a long time, so I combined Node.js, Puppeteer, Cheerio and a few other tools to build a serverless epidemic hot-search app. Using Tencent Cloud SCF as the example, this article walks through developing a project with a quick app as the front end and Serverless as the back end.


Origin

This year the epidemic has grown more and more serious and has become a worldwide problem, and its progress weighs on everyone's mind. Precisely because of the epidemic, I, an overtime-working "dog", suddenly got to relive the feeling of a "winter vacation". After staying home for too long I wanted to build something, and came up with the idea of an epidemic hot-search app. After two days of design and development, and plenty of pitfalls along the way, the app was born.


Concept

Let's start with the technology stack:

Back end: Node.js, Puppeteer, Cheerio

Front end: quick app (a mini program would work just as well)


Here are the reasons for choosing these technologies:

  • Node.js: as a front-end developer, writing the server side in Node.js is the natural choice for me.


  • Puppeteer: why this library? First of all, of course, to crawl data. Some of you will ask: there are other crawling libraries, so why this one? Indeed, I started with Crawler, but it cannot crawl single-page applications. That was the first pitfall I stepped into; more on it later. Puppeteer is Google's official Node library for controlling headless Chrome through the DevTools protocol. It can do anything a browser can do, so crawling a single-page application is no problem.


  • Cheerio: a lightweight implementation of core jQuery designed for the server, used here to filter and select data from the page.


  • Quick app: a special member of the mini-program family and the only framework that renders with native components, so its performance is naturally far ahead of other mini programs. It runs on basically any Android phone from the major Chinese manufacturers.


Then there’s Serverless

Serverless technology lets developers focus on business logic without worrying about operations or about scaling system performance. In the past, to develop and deploy an application we generally had to prepare a server, configure the project environment and deploy the project. Many steps in that process need attention, and a problem in any one of them can make the whole application unavailable. With a Serverless architecture, we simply upload the core code to the provider and leave the rest to it. Billing is pay-per-execution, and the service scales automatically, so a sudden traffic spike will not make it unavailable. This makes it well suited to developers who are unfamiliar with operations but want to deploy their own projects.


Finally, here is the architecture and implementation approach of the whole project:

  • Crawl and parse Baidu's epidemic hot-search data with Node.js and Puppeteer
  • Deploy the project to a function computing provider (here I use Tencent Cloud SCF, which offers the same free quota as Alibaba Cloud Function Compute)
  • Expose the service by configuring API Gateway
  • Develop a quick app that calls the service and displays the data
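The quick app and the cloud function share nothing but a JSON payload, so it helps to pin down its shape. Judging from the crawler code later in this article, the function returns an array of categories, each holding a list of { url, title, rank } entries. The validator below is a hypothetical helper (isHotSearchList is not part of any library), a sketch of how either side could check that shape before using it.

```javascript
// Hypothetical helper: checks one entry built by the crawler, { url, title, rank }.
function isHotSearchItem(item) {
  return (
    item !== null &&
    typeof item === "object" &&
    typeof item.title === "string" &&
    typeof item.rank === "string" &&
    // cheerio's attr() yields undefined when href is missing, and JSON drops it
    (typeof item.url === "string" || item.url === undefined)
  );
}

// Hypothetical helper: checks the whole payload, [{ category, data: [...] }].
function isHotSearchList(value) {
  return (
    Array.isArray(value) &&
    value.every(
      (group) =>
        group !== null &&
        typeof group === "object" &&
        typeof group.category === "string" &&
        Array.isArray(group.data) &&
        group.data.every(isHotSearchItem)
    )
  );
}

// Made-up sample in the shape the crawler produces
const sample = [
  { category: "Hot searches", data: [{ url: "https://example.com/a", title: "Example", rank: "1" }] },
];
console.log(isHotSearchList(sample)); // true
console.log(isHotSearchList([{ category: "x", data: [{ title: 1 }] }])); // false
```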


Practice

With the architecture and approach covered, let's walk through the development process:


Preparing the development environment

Tencent Cloud SCF is used as the example here; other cloud platforms work in much the same way.


1. Install the Tencent Cloud SCF CLI tool

pip install scf  # The tool is written in Python, so the development machine needs Python 2.7 or later


2. Install the Tencent Cloud Serverless VS Code plug-in

If you don't want to set up a Python environment and prefer a visual workflow, you can install the VS Code plug-in instead:

Install Tencent Serverless Toolkit for VS Code



Installing either of the two tools above is enough.


Initializing the project

Here the project is initialized with the SCF command line; if you use the VS Code plug-in instead, just follow its visual prompts.

scf init -r nodejs8.9 --name virus-search  # initialize a project named virus-search on the nodejs8.9 runtime


After initializing the project, the project structure looks like this:

virus-search
├── index.js        // entry file
└── template.yaml   // project config file


At the time of writing, both the SCF command line and the VS Code plug-in can only create projects for Node.js versions up to 8.9. Tencent Cloud itself already supports Node 10.15, but the development tools do not support it yet.


Installing project dependencies

Next, install the dependencies the project uses:

npm install puppeteer cheerio --save


Puppeteer downloads Chromium during installation, which makes the package over 130 MB. Switching from npm to cnpm, or pointing npm at the Taobao mirror, makes this much faster.


With the dependencies installed, the project structure looks like this:

virus-search
├── README.md
├── node_modules/
├── package.json
├── index.js        // entry file
└── template.yaml   // function config file



Writing the crawler logic

The data source is Baidu's epidemic hot-search page, which looks like this:




The data I want is the hot-search list on this page. Open it in Chrome and use DevTools to inspect the page structure:



A quick look at the page elements and network requests shows that this is a single-page application built with React, which is why I used Puppeteer. I first tried Crawler, but what it fetched was a pile of JS from which I could not parse the elements or data, so I switched to Puppeteer. The Puppeteer code to fetch the page looks like this:

const puppeteer = require('puppeteer');

async function getPage() {
  // --no-sandbox lets Chrome start in restricted environments such as containers
  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  const page = await browser.newPage();
  await page.goto('https://voice.baidu.com/act/virussearch/virussearch?from=osari_map&tab=0&infomore=1');
  const content = await page.content(); // HTML of the page after it has rendered
  console.log('page content', content);
  await browser.close();
}


After the function runs, the printed content matches what we saw in the Elements panel of DevTools. Next we need to parse the data we want out of the page. For this I use Cheerio, a "fast, flexible, and lean implementation of core jQuery designed specifically for the server". The filtering code is as follows:

const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

async function getPage() {
  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  const page = await browser.newPage();
  await page.goto('https://voice.baidu.com/act/virussearch/virussearch?from=osari_map&tab=0&infomore=1');
  const content = await page.content(); // Get the HTML of the page
  const $ = cheerio.load(content); // Load the page HTML into cheerio

  const list = []; // Save the filtered data
  $('#ptab-0 .VirusHot_1-5-5_32AY4F').each((idx, elem) => { // traverse each category block
    const arr = [];
    $(elem).find('a').each((idx, item) => {
      // keep only the span's direct text nodes (nodeType 3)
      const title = $(item).find('span.VirusHot_1-5-5_24HB43').contents().filter((idx, content) => { return content.nodeType === 3; }).text();
      const rank = $(item).find('span.VirusHot_1-5-5_3BslNU').text();
      arr.push({ url: $(item).attr('href'), title: title, rank: rank });
    });
    list.push({ category: $($(elem).children('header')[0]).text(), data: arr });
  });

  await browser.close();
  return list;
}
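The title extraction keeps only text nodes (nodeType === 3) among the span's children, presumably so that text inside nested child elements does not end up in the title. The plain-JavaScript sketch below reproduces that filter on hand-made node objects, without Cheerio, to show what it does; the objects are simplified stand-ins (only nodeType, data and children), not real DOM or Cheerio nodes.

```javascript
const TEXT_NODE = 3; // DOM nodeType for text nodes
const ELEMENT_NODE = 1; // DOM nodeType for elements

// Concatenates only the element's direct text-node children, mirroring
// $(el).contents().filter((i, n) => n.nodeType === 3).text()
function ownText(element) {
  return element.children
    .filter((child) => child.nodeType === TEXT_NODE)
    .map((child) => child.data)
    .join("");
}

// Full text for comparison: includes text from nested elements as well
function allText(node) {
  if (node.nodeType === TEXT_NODE) return node.data;
  return node.children.map(allText).join("");
}

// Hypothetical span: a title followed by a nested label element
const span = {
  nodeType: ELEMENT_NODE,
  children: [
    { nodeType: TEXT_NODE, data: "Example title" },
    { nodeType: ELEMENT_NODE, children: [{ nodeType: TEXT_NODE, data: "new" }] },
  ],
};

console.log(allText(span)); // Example titlenew
console.log(ownText(span)); // Example title
```

A related caveat: class names such as VirusHot_1-5-5_32AY4F look build-generated, so they will probably change whenever the page is redeployed, and the selectors will need updating when that happens.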


That's all the code for filtering data. Now let's add the Serverless entry point. The complete index.js looks like this:

const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

async function getPage() {
  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  const page = await browser.newPage();
  await page.goto('https://voice.baidu.com/act/virussearch/virussearch?from=osari_map&tab=0&infomore=1');
  const content = await page.content();
  const $ = cheerio.load(content);

  const list = [];
  $('#ptab-0 .VirusHot_1-5-5_32AY4F').each((idx, elem) => {
    const arr = [];
    $(elem).find('a').each((idx, item) => {
      const title = $(item).find('span.VirusHot_1-5-5_24HB43').contents().filter((idx, content) => { return content.nodeType === 3; }).text();
      const rank = $(item).find('span.VirusHot_1-5-5_3BslNU').text();
      arr.push({ url: $(item).attr('href'), title: title, rank: rank });
    });
    list.push({ category: $($(elem).children('header')[0]).text(), data: arr });
  });

  await browser.close();
  return list;
}

// SCF entry point (Handler: index.main_handler in template.yaml)
exports.main_handler = async (event, context, callback) => {
  console.log('%j', event);
  const list = await getPage();
  return list;
};
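One thing to watch in getPage: if page.goto or the parsing throws, browser.close() is never reached, and a warm function instance may be left with a stray Chrome process. A common remedy is try/finally. The sketch below uses a hypothetical helper, withBrowser (not a Puppeteer API), that takes the launch function as a parameter so it can be exercised with a fake browser and no Chrome at all.

```javascript
// Hypothetical helper: launches a browser, runs `task` with it, and always
// closes the browser afterwards, even when the task throws.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close();
  }
}

// Fake browser for local testing; with Puppeteer you would pass
// () => puppeteer.launch({ args: ['--no-sandbox'] }) as `launch` instead.
function makeFakeBrowser() {
  return {
    closed: false,
    async close() {
      this.closed = true;
    },
  };
}

// Shows that the browser is closed even when the task fails
async function demo() {
  const fake = makeFakeBrowser();
  try {
    await withBrowser(async () => fake, async () => {
      throw new Error("navigation failed");
    });
  } catch (err) {
    console.log(err.message, fake.closed); // navigation failed true
  }
  return fake.closed;
}

demo();
```

In getPage, everything from browser.newPage() through the Cheerio parsing would move into the task callback, and the explicit browser.close() call would go away.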


Now you can try debugging the code locally:

scf native invoke --no-event  # run the function locally without an event


The console reports an error:


The function's configuration lives in template.yaml. Let's change the default configuration:

Resources:
  default:
    Type: TencentCloud::Serverless::Namespace
    virus-search:
      Type: TencentCloud::Serverless::Function
      Properties:
        CodeUri: ./
        Type: Event
        Description: This is a template function
        Environment:
          Variables:
            ENV_FIRST: env1
            ENV_SECOND: env2
        Handler: index.main_handler
        MemorySize: 128
        Runtime: Nodejs8.9
# change the function Timeout to 10 seconds
Globals:
  Function:
    Timeout: 10
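Raising Timeout fixes the local run, but the function still has a hard limit. If you would rather return a clear error than have SCF cut the invocation off, one option is to race the crawl against a timer set a little below the configured limit. A minimal sketch; withTimeout and the 9000 ms budget are illustrative choices, not SCF settings.

```javascript
// Rejects with a descriptive error if `promise` has not settled within `ms`.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer whichever side wins, so it does not keep the process alive
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Illustrative use inside main_handler, with getPage as defined in index.js:
// const list = await withTimeout(getPage(), 9000, "crawl");

const sleep = (ms, value) => new Promise((resolve) => setTimeout(() => resolve(value), ms));

withTimeout(sleep(10, "done"), 1000).then((v) => console.log(v)); // done
withTimeout(sleep(1000, "late"), 20, "crawl").catch((e) => console.log(e.message)); // crawl timed out after 20 ms
```

The race only stops waiting; it does not cancel the crawl itself, so the browser keeps running until it is closed.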


Now run it again:

OK, the function runs locally and returns the data normally. Now it can be deployed to the cloud.


Deploying the function

Configure your Tencent Cloud account


Upload and deploy to the cloud:

$ scf deploy

 Package name: default-virus-search-latest.zip, package size: 130 mb

.

 [o] Deploy function ‘virus-search’ success

 [o] Deploy trigger ‘api’ success

 [+] Function Base Information: 

 Name: virus-search

.

 [+] Trigger Information: 

 > APIGW – virus-search_apigw: 

 ModTime: 2020-03-01 12:01:13 

 Type: apigw 

.

 service: 

 serviceId: 

service-qnwxxxxxx 

 serviceName: SCF_API_SERVICE 

 subDomain: https://service-qnw3irqg-xxxxxxxxxxx.gz.apigw.tencentcs.com/release/virus-search

.

[o] Deploy success


Here you can see that SCF packages the function together with its dependencies and uploads them for you. Because the package is over 130 MB, the upload takes a long time. You can enable COS upload to speed things up, but in practice I still waited a long while. Tencent Cloud currently has online dependency installation in closed beta; once it opens up, it should greatly improve the upload and deployment experience.
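Before running scf deploy, it can be useful to see how much of that 130+ MB comes from dependencies. A short Node script using only the fs and path modules can total the file sizes under a directory; dirSizeBytes and formatMB below are hypothetical helpers, and the paths in the usage lines assume the project layout shown earlier.

```javascript
const fs = require("fs");
const path = require("path");

// Recursively sums the sizes (in bytes) of all regular files under `dir`.
function dirSizeBytes(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += dirSizeBytes(full);
    } else if (entry.isFile()) {
      total += fs.statSync(full).size;
    }
    // Symlinks and other entry types are skipped.
  }
  return total;
}

// Formats a byte count as megabytes with one decimal place.
function formatMB(bytes) {
  return `${(bytes / (1024 * 1024)).toFixed(1)} MB`;
}

// Usage, run from the project root:
// console.log("node_modules:", formatMB(dirSizeBytes("node_modules")));
// console.log("puppeteer:", formatMB(dirSizeBytes("node_modules/puppeteer")));
```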


I stepped into a whole series of pitfalls here and spent several times the coding time climbing out of them. Rather than narrating the process, I list the pitfalls and their solutions below:


The first pitfall: after uploading, the function failed to execute because it ran out of memory. This did not show up in my local test, where the SCF local run reported only 50+ MB of memory use. The solution is to change the function's execution configuration and give it more memory:




The second pitfall: the runtime configured in template.yaml is Node.js 8.9, on which Puppeteer will not run without a lot of extra configuration; for details, see the article on running Puppeteer in SCF listed in the references. That setup is painful: besides installing various dependencies, it makes the function package even larger, so every upload means a long wait, and Tencent Cloud's package upload has no progress bar (a gripe worth mentioning), so all you can do is wait. Checking the Puppeteer documentation, I found that on Node 10 or later these extra dependencies are not needed, so I decided to solve the problem by switching the Node runtime. However, neither the SCF CLI nor the VS Code plug-in supports uploading a Node.js 10.15 project; both report an error directly, even though you can create a Node.js 10.15 function in the web console. Another gripe.
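Because the fix for this pitfall hinges on the runtime version, a cheap safeguard is to check process.version when the module loads and fail with an explicit message instead of an obscure Puppeteer error. A minimal sketch; the Node 10 threshold comes from the experience described above, and assertNodeVersion and majorVersion are hypothetical helpers.

```javascript
// Parses a version string like "v10.15.0" and returns its major number.
function majorVersion(version) {
  const match = /^v?(\d+)\./.exec(version);
  if (!match) throw new Error(`Unrecognized Node.js version: ${version}`);
  return Number(match[1]);
}

// Throws early if the runtime is older than `minMajor`; returns the major version otherwise.
function assertNodeVersion(version, minMajor) {
  const major = majorVersion(version);
  if (major < minMajor) {
    throw new Error(
      `This function needs Node.js ${minMajor} or later for Puppeteer, but is running ${version}`
    );
  }
  return major;
}

// At the top of index.js:
// assertNodeVersion(process.version, 10);
console.log(assertNodeVersion("v10.15.0", 10)); // 10
```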


  • Create a Node.js 10.15 function in the web console



In the web console, choose to upload a local code package


  • Repackage the local function project. This step is very important: Tencent Cloud's Node.js 10.15 environment comes with the Puppeteer environment built in, so the local project does not need to ship Puppeteer in node_modules, which greatly reduces the package size. I removed the puppeteer dependency from node_modules, repackaged the project and re-uploaded it. Very fast!
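Removing Puppeteer by hand before every upload is easy to forget. One option is to generate the list of files to package with the puppeteer directory excluded, then feed that list to your zip tool. A sketch, assuming (as described above) that the runtime already provides Puppeteer; listPackageFiles and its default exclusion list are illustrative, and the zip step itself is left out.

```javascript
const fs = require("fs");
const path = require("path");

// Returns file paths (relative to `root`, "/"-separated) to include in the
// deployment package, skipping any directory listed in `excludeDirs`.
function listPackageFiles(root, excludeDirs = ["node_modules/puppeteer"]) {
  const excluded = new Set(excludeDirs);
  const files = [];
  (function walk(relDir) {
    for (const entry of fs.readdirSync(path.join(root, relDir), { withFileTypes: true })) {
      const rel = relDir ? `${relDir}/${entry.name}` : entry.name;
      if (entry.isDirectory()) {
        if (!excluded.has(rel)) walk(rel);
      } else if (entry.isFile()) {
        files.push(rel);
      }
    }
  })("");
  return files.sort();
}

// Usage from the project root:
// console.log(listPackageFiles(".").join("\n"));
```

This skips only the puppeteer directory itself; any of its own dependencies that npm hoisted to the top level of node_modules are still included.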


After getting past these pitfalls, give the newly created function the same environment configuration as before and run the online test:



Finally made it! Tears of joy!!


Configuring the API service

Once the function passes the online test, we expose it through an API so that other clients can call it. This configuration is much simpler: a few clicks in the web console and it's done.
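By default the gateway passes the function's return value through, which is what the handler in index.js relies on. If you instead enable integration response on the API, my understanding is that the function must return an object with isBase64Encoded, statusCode, headers and body; check the SCF API Gateway trigger documentation before relying on this. A small helper under that assumption (toApiResponse is hypothetical):

```javascript
// Wraps a JSON-serializable value in the response shape assumed for
// API Gateway integration response: { isBase64Encoded, statusCode, headers, body }.
function toApiResponse(payload, statusCode = 200) {
  return {
    isBase64Encoded: false,
    statusCode,
    headers: { "Content-Type": "application/json; charset=utf-8" },
    body: JSON.stringify(payload),
  };
}

// Illustrative handler using it, with getPage as defined in index.js:
// exports.main_handler = async () => toApiResponse(await getPage());

const res = toApiResponse([{ category: "demo", data: [] }]);
console.log(res.statusCode, res.body); // 200 [{"category":"demo","data":[]}]
```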



Then go to the Tencent Cloud API Gateway console, where you can see the API service created above.



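The function we just developed is now reachable from the Internet at the API service's default domain name followed by the function name (the subDomain in the deploy output above shows the full form, with the release stage in between).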
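Going by that deploy output (…/release/virus-search), the public URL is the service's default domain, then the stage, then the function name. A tiny helper for assembling it, for example in a client config; buildApiUrl and the placeholder domains are hypothetical.

```javascript
// Joins the API Gateway default domain, stage and function name into a URL,
// tolerating a trailing slash on the domain.
function buildApiUrl(domain, stage, functionName) {
  const base = domain.replace(/\/+$/, "");
  return `${base}/${encodeURIComponent(stage)}/${encodeURIComponent(functionName)}`;
}

console.log(buildApiUrl("https://service-xxxxxx.gz.apigw.tencentcs.com/", "release", "virus-search"));
// https://service-xxxxxx.gz.apigw.tencentcs.com/release/virus-search
```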


With that, the back-end service is configured. If you have your own domain name, you can also change the public address by binding a custom domain.


Developing the quick app

With the server-side data in place, we can now think about presenting it in a quick app. If you are not familiar with quick app development, have a look at the official quick app documentation; if you are interested, the Apex-UI quick app component library can help you build a quick app faster. I won't go into development details here; here is the page code directly:


<template>
  <div class="wrap">
    <div class="cover">
      <text> </text>
    </div>
    <div class="tabs">
      <text for="{{list}}" class="{{active === $idx ? 'active' : ''}}" onclick="gotoIndex($idx)">{{$item.category}}</text>
    </div>
    <list class="list" id="list">
      <list-item class="module" for="{{(index, item) in list}}" type="module">
        <text class="category" onappear="appearHandler(index)">{{item.category}}</text>
        <div class="content {{$idx ? 'bt' : ''}}" for="{{title in item.data}}" onclick="{{routeDetail(title.url)}}">
          <div>
            <text class="index top-{{$idx+1}}">{{$idx+1}}</text>
            <text class="rumor" show="{{index === 3}}">rumor</text>
            <text class="title">{{title.title}}</text>
            <text class="rank">{{title.rank}}</text>
          </div>
          <text class="hot" show="{{!$idx}}">hot</text>
        </div>
      </list-item>
    </list>
  </div>
</template>


<script>
  import router from '@system.router'
  import fetch from '@system.fetch'

  export default {
    data() {
      return {
        active: 0,
        list: []
      }
    },
    async onInit() {
      this.list = await this.getListData();
    },
    getListData() {
      return new Promise((resolve, reject) => {
        fetch.fetch({
          url: 'http://your.domain.name/puppeteer'
        }).then((res) => {
          console.log(res);
          // res.data.data holds the response body string returned by the API
          resolve(JSON.parse(res.data.data));
        }).catch((err) => { reject(err) })
      })
    },
    routeDetail(url) {
      router.push({
        uri: url
      })
    },
    gotoIndex(index) {
      // scroll the list to the tapped category and highlight its tab
      this.$element('list').scrollTo({ index: index })
      this.active = index
    },
    appearHandler(index) {
      // highlight the tab of the category that has scrolled into view
      this.active = index
    }
  }
</script>

 <style lang="less">

 .wrap {

 flex-direction: column; 

 .cover {

 height: 100px;

 width: 750px; 

 background-color: #0ba6af;

 color: #ffffff; 

 padding: 0 20px; 

 text {

 color: #FFFFFF; 

 font-size: 40px;

 } 

 }

 .tabs {

 height: 100px;

 justify-content: space-around; 

 align-items: center;

 text { 

 font-weight: bold;

 height: 80px;

 }

 } 

 .list {

 padding: 0 20px; 

 .module {

 background: linear-gradient(#b5f2f3 0%, #ffffff 20%);

 padding: 20px; 

 border: 1px solid #DCDCDC;

 border-radius: 30px;

 margin-bottom: 20px;

 margin-top: 20px;

 flex-direction: column; 

 .category { 

 align-self: center; 

 font-size: 40px; 

 color: #000000; 

 font-weight: bold; 

 line-height: 80px;

 } 

 .content { 

 height: 80px; 

 justify-content: space-between;

 .rumor { 

 height: 24px; 

 padding: 0 10px; 

 margin-right: 10px; 

 background-color: #ff1845; 

 color: #FFFFFF; 

 border-radius: 12px; 

 font-size: 16px;

 align-self: center; 

 }

 .hot { 

 background-color: #ff792f; 

 font-size: 20px; 

 padding: 0 10px; 

 height: 28px; 

 line-height: 28px;

 border-radius: 14px;

 text-align: center;

 align-self: center; 

 color: #FFFFFF;

 }

 .index {

 width: 50px; 

 font-weight: bold;

 }

 .top-1 { 

 color: red;

 } 

 .top-2 { 

 color: coral;

 } 

 .top-3 {

 color: sandybrown; 

 } 

 .title {

 margin-right: 20px; 

 color: #000000;

 font-weight: bold; 

 } 

 .rank {

 color: #9c9c9c;

 font-size: 20px;

 } 

 }

 .content:active { 

 background-color: rgba(11, 168, 175, 0.1);

 } 

 }

 }

 } 

 .bt { 

 border-top: 1px solid #DCDCDC;

 } 

 .active { 

 border-bottom: 4px solid #0ba6af; 

 color: #0ba6af; 

 }

 </style>
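getListData assumes the request succeeds and the body is valid JSON. A slightly more defensive parser could check the status code first. This sketch assumes the resolved value has the { data: { code, data } } shape implied by res.data.data in the page code; parseListResponse is a hypothetical helper, written as plain JavaScript so it can be tested outside the quick app runtime.

```javascript
// Extracts the hot-search list from a fetch result shaped like
// { data: { code, data } }, where data.data is the raw response body string.
function parseListResponse(res) {
  const response = res && res.data;
  if (!response || response.code !== 200) {
    throw new Error(`Unexpected response status: ${response ? response.code : "none"}`);
  }
  const list = JSON.parse(response.data);
  if (!Array.isArray(list)) {
    throw new Error("Response body is not a list");
  }
  return list;
}

// Inside getListData, the .then callback could become:
// .then((res) => resolve(parseListResponse(res)))

const ok = { data: { code: 200, data: '[{"category":"demo","data":[]}]' } };
console.log(parseListResponse(ok).length); // 1
```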


After running, the app looks like this:




Scan the QR code with the Quick App debugger to try it out.

So, after stepping into plenty of pitfalls, we have finally finished this epidemic hot-search quick app. Here is a technical summary.

Technical summary

1. For Puppeteer, the Serverless Node.js runtime needs to be Node.js 10 or later; otherwise a bunch of dependencies are missing and the online function will not run.

2. When uploading a Puppeteer function, leave the puppeteer dependency out of the package; otherwise the package is too large and takes a long time to upload.

3. If your development machine has no Python environment, use the VS Code plug-in instead; it avoids many environment configuration problems and saves a lot of time.

4. Serverless is still a new technology and should be adopted with care. It has some open problems, such as long cold-start response times and provider-specific features and standards that make migrating a project between providers inconvenient.
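On response times: every request here launches a full Chrome, which is slow. One mitigation, assuming SCF reuses a warm instance between invocations as FaaS platforms typically do, is to cache the last result in module scope for a short time so that repeated requests skip the crawl. A sketch with an injectable clock for testing; createCachedFetcher and the 60-second TTL are illustrative, and this does nothing for the cold start itself.

```javascript
// Wraps an async `fetcher` so its result is reused for `ttlMs` milliseconds.
// Concurrent callers during a refresh share the same in-flight promise;
// a failed fetch is not cached, so the next call retries.
function createCachedFetcher(fetcher, ttlMs, now = Date.now) {
  let value;
  let expiresAt = 0;
  let inFlight = null;

  return async function cached() {
    if (now() < expiresAt) return value;
    if (!inFlight) {
      inFlight = fetcher()
        .then((result) => {
          value = result;
          expiresAt = now() + ttlMs;
          return result;
        })
        .finally(() => {
          inFlight = null;
        });
    }
    return inFlight;
  };
}

// In index.js, module scope survives across warm invocations:
// const getCachedPage = createCachedFetcher(getPage, 60 * 1000);
// exports.main_handler = async () => getCachedPage();
```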


References and resource links:

Tencent Cloud SCF documentation (https://cloud.tencent.com/product/scf/developer)

Puppeteer documentation (https://github.com/puppeteer/puppeteer/blob/v2.1.1/docs/api.md)

Cheerio documentation (https://cheerio.js.org/)

Quick App development documentation (https://doc.quickapp.cn/)

Apex-UI quick app component library (https://vivoquickapp.github.io/apex-ui-docs/)

Running Puppeteer in SCF (https://cloud.tencent.com/developer/article/1410471)

