Use Google Cloud Scheduler, Pub/Sub, Functions, Storage, and other Cloud services to build a CI job workflow that benchmarks the quality and performance of front-end websites with PageSpeed Insights.

1. PageSpeed Insights

1.1 Introduction

PageSpeed Insights (hereinafter PSI) is a web performance monitoring and optimization tool from Google. It generates performance reports on web pages for mobile and desktop devices and provides suggestions on how to improve them. It uses the best practices provided by Google Lighthouse as its benchmark, and uses Blink (Google Chrome's rendering engine) to fetch the target web page for optimization analysis, emulating mobile and desktop devices.

1.2 Version History

| Version | Release date | Notes |
|---|---|---|
| V5 | Q4 2018 | Current, latest version. Updated on 2019.05.08 to use Lighthouse 5.0 as its analysis engine. |
| V4 | January 2018 | To be discontinued by Q3 2019 |
| V2 | January 2015 | Discontinued |
| V1 | Earlier | Discontinued |

1.3 Analysis report composition

1.3.1 Overall speed score

Scores and ratings:

  • Fast: 90 points or more
  • Medium: 50-90 points
  • Slow: less than 50 points
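
As a minimal sketch of the rating bands above, the function below maps a 0-100 score to a rating. The name `rate_score` is hypothetical, and treating exactly 90 as Fast and exactly 50 as Medium is an assumption, since the list does not say which band owns the boundary values.

```python
def rate_score(score):
    """Map a 0-100 PSI speed score to a rating band (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:  # assumption: 90 itself counts as Fast
        return "Fast"
    if score >= 50:  # assumption: 50 itself counts as Medium
        return "Medium"
    return "Slow"


if __name__ == "__main__":
    for s in (95, 72, 31):
        print(s, rate_score(s))
```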

V5 uses Lighthouse to calculate a composite weighted score from a number of performance metrics. V4 and earlier versions combine real-user speed data from the Chrome User Experience Report database to calculate scores and ratings, mainly based on the following two metrics:

  • FCP (First Contentful Paint): measures when a user first sees a visible response from the page. The shorter the time, the more likely you are to retain users.
  • DCL (DOMContentLoaded): measures when an HTML document has finished loading and parsing. The shorter the time, the lower the bounce rate.

1.3.2 Measured data

Field data for the page, combined with measured data from other web pages in the Chrome User Experience Report over the past 30 days.

1.3.3 Laboratory data

Absolute timings are given for the following metrics:

  • First Contentful Paint: when the first content is painted
  • First Meaningful Paint: when the first meaningful content is painted
  • Speed Index: the speed index
  • First CPU Idle: when the CPU first becomes idle
  • Time to Interactive: the time until the page becomes interactive
  • Estimated Input Latency: the longest potential FID (First Input Delay)
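
The lab metrics listed above can be pulled out of a PSI response with a few dictionary lookups. This is a sketch under assumptions: it assumes the V5 response nests audits under `lighthouseResult.audits`, that each audit carries a `displayValue`, and that the audit IDs below match those used by Lighthouse 5. The helper name `lab_metrics` and the sample values are made up for illustration, so check them against a real response.

```python
# Audit IDs as used by Lighthouse 5 (assumption; verify against a real response).
LAB_AUDITS = {
    "first-contentful-paint": "First Contentful Paint",
    "first-meaningful-paint": "First Meaningful Paint",
    "speed-index": "Speed Index",
    "first-cpu-idle": "First CPU Idle",
    "interactive": "Time to Interactive",
    "estimated-input-latency": "Estimated Input Latency",
}


def lab_metrics(response):
    """Return {metric title: display value} for the lab metrics that are present."""
    audits = response.get("lighthouseResult", {}).get("audits", {})
    return {
        title: audits[audit_id].get("displayValue")
        for audit_id, title in LAB_AUDITS.items()
        if audit_id in audits
    }


if __name__ == "__main__":
    # Made-up sample in the assumed response shape, for illustration only.
    sample = {"lighthouseResult": {"audits": {
        "first-contentful-paint": {"displayValue": "1.8 s"},
        "interactive": {"displayValue": "4.2 s"},
    }}}
    print(lab_metrics(sample))
```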

1.3.4 Optimization suggestions on how to speed up web page loading

1.3.5 Detailed diagnostic recommendations for Web development best practices.

1.3.6 Passed audits that conform to best practices

1.4 Actual Cases

For an intuitive look at an analysis report, take a production version of the Ctrip H5 flight status home page as an example: m.ctrip.com/webapp/flig…

1.5 Usage

The PSI API is one of Google's RESTful APIs. It requires only one HTTP request, and the response returns a JSON object, so it is extremely simple to use.

HTTP Request

GET www.googleapis.com/pagespeedon…

One parameter is mandatory:

  • url: link to the page to analyze

Six parameters are optional:

  • category: accessibility, best-practices, performance, pwa, or seo. The default is performance.
  • locale: the language of the localized result text. Currently, 40 locales are supported. The default is English (en).
  • strategy: desktop optimizes the analysis for desktop browsers; mobile optimizes it for mobile browsers.
  • utm_campaign: campaign name
  • utm_source: campaign source
  • fields: customizes which fields the response contains.
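
To make the parameter list concrete, here is a small sketch that assembles a request URL from the mandatory `url` parameter and any optional ones. The endpoint is the one used in section 3.1.2 of this article. The helper name `build_psi_url` is hypothetical, and the sketch only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Endpoint as used in section 3.1.2 of this article.
PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(url, **optional):
    """Build a PSI request URL from `url` plus optional query parameters."""
    params = {"url": url}
    # Skip parameters explicitly passed as None so they are left to the API default.
    params.update({k: v for k, v in optional.items() if v is not None})
    return f"{PSI_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_psi_url("https://example.com/", strategy="mobile", locale="en"))
```

Using `urlencode` percent-encodes the target link, which matters because the analyzed URL itself contains characters such as `:` and `/`.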

HTTP Response

The response is a JSON object. For details, see the documentation on the official website.

Simplest command-line call

curl www.googleapis.com/pagespeedon…

2. Google Cloud Platform (GCP)

2.1 System Flow Chart

2.2 Cloud Scheduler

Cloud Scheduler is GCP's fully managed, enterprise-grade cron job scheduling service. It supports App Engine, Cloud Pub/Sub, and arbitrary HTTP endpoints, allowing jobs to trigger Compute Engine, Google Kubernetes Engine, and on-premises resources. Create a job using the Google Cloud Console. There are three target types: HTTP, Pub/Sub, and App Engine HTTP. Select Pub/Sub here, and set the job to trigger automatically at 22:00 every day.

After the job is created, check its deployment status. Once it is deployed successfully, click Run Now and check the logs to verify that it runs properly.
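
In the unix-cron syntax that Cloud Scheduler accepts, a daily 22:00 trigger is written `0 22 * * *`. The sketch below only illustrates what that schedule means by computing the next trigger time locally. It does not call Cloud Scheduler, and the helper name `next_daily_run` is hypothetical.

```python
from datetime import datetime, timedelta

# Fields: minute hour day-of-month month day-of-week
DAILY_22H_CRON = "0 22 * * *"


def next_daily_run(now, hour=22, minute=0):
    """Return the next occurrence of hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_daily_run(datetime.now()))
```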

2.3 Cloud Pub/Sub

Cloud Pub/Sub is GCP's simple, reliable, scalable messaging service, which can serve as the foundation for stream analytics and event-driven computing systems. Two topics are created here: psi-job, for transferring event data from the Cloud Scheduler job, and psi-single, for transferring event data for the concurrent HTTP requests made by Cloud Functions.

2.4 Cloud Functions

There are several ways to run concurrent PageSpeed Insights checks against a large number of web pages, for example Google App Engine or Google Compute Engine. Since the PSI API is a simple, stateless HTTP RESTful API, serverless Cloud Functions is the best and simplest implementation. Cloud Functions is GCP's event-driven serverless computing platform. It enables rapid development and deployment by building small, independent units of functionality that each do one thing well, and then combining those units into a system. Services are built and deployed at the level of a single function rather than an entire application, container, or virtual machine.

2.4.1 Writing the Function

Currently, the following languages are supported:

| Language | JavaScript | Python | Go |
|---|---|---|---|
| Runtime | Node.js 6 (deprecated), 8, 10 (beta) | 3.7.1 | Go 1.11 |
| HTTP framework | Express | Flask | http.HandlerFunc standard interface |
| HTTP functions | Express Request & Response context | Flask Request object. Return value: any object accepted by flask.make_response(). | request: *http.Request; response: http.ResponseWriter |
| Background functions | (data, context, callback) | (data, context) | (ctx, Event) |
| Dependency management | npm/yarn + package.json | pip + requirements.txt | go.mod/vendor |
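
For Python, the two function shapes listed above might look like the following sketch. The names `psi_http` and `psi_background` are hypothetical. The HTTP function relies only on `request.args`, which the Flask Request object provides, so the sketch does not need to import Flask itself.

```python
def psi_http(request):
    """HTTP function: `request` is the Flask Request object passed in by the runtime."""
    url = request.args.get("url")
    if not url:
        # A (body, status) tuple is one of the forms flask.make_response() accepts.
        return ("missing 'url' query parameter", 400)
    return f"queued {url}"


def psi_background(data, context):
    """Background function: `data` is the event payload, `context` its metadata."""
    print(f"received event with keys: {sorted(data)}")
```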

2.4.2 Deploying the Function

Currently, the following modes are supported:

  • Deploy from the local machine. Use the gcloud command-line tool.
  • Deploy from source control. Use Google Cloud Source Repositories to connect a source repository (such as GitHub or Bitbucket) via OAuth.
  • Deploy using the GCP Console.
    • Inline web editor. Write the function code directly online.
    • Upload a local ZIP file. The folder structure must match the source project structure for dependency management described above.
    • Import a ZIP file from Cloud Storage. Same as above.
    • Reference a Google Cloud Source Repositories source project.
  • Deploy through CI/CD. Build continuous integration and deployment systems using Cloud Build.

2.4.3 Monitoring the Function

Stackdriver provides a suite of service monitoring tools, including Debugger, Monitoring, Trace, Logging, Error Reporting, and Profiler.

3. PSI Functions implementation

After creating a Scheduler Job and two Pub/Sub topics, implement the two corresponding Functions.

3.1 psi-single function

psi-single() is responsible for calling the PSI API to get the JSON result for a single URL. Google APIs support multiple invocation methods.

3.1.1 Use the Google API Client. The Discovery API is used to obtain the wrapped service object and call the specific method.

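import os

from googleapiclient.discovery import build

# The API key is read from an environment variable (see section 3.3).
API_KEY = os.getenv('API_KEY')

def run(url):
    # Build the PSI v5 service client via the Discovery API.
    pagespeedonline = build(
        serviceName = 'pagespeedonline',
        version = 'v5',
        developerKey = API_KEY
    )
    response = pagespeedonline.pagespeedapi().runpagespeed(url = url).execute()
    print(response)
    return 'OK'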

3.1.2 For simple interfaces, call the HTTP RESTful API directly

import os

import requests

GAPI_PSI = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
# The API key is read from an environment variable (see section 3.3).
API_KEY = os.getenv("API_KEY")

def run(url):
    try:
        payload = {"url": url,
                   "key": API_KEY
                   }
        with requests.Session() as session:
            response = session.get(url=GAPI_PSI, params=payload)
            print(response.status_code)
            print(response.json())
    except requests.RequestException as _e:
        # Network-level failures are logged rather than raised.
        print(_e)
    return 'OK'

psi-single() is triggered by Pub/Sub events. The format of the event message is described in detail in the official documentation. The data attribute is a base64-encoded byte array that carries the actual payload.

import base64

def run_pubsub(event, context):
    # The payload (here, the target URL) arrives base64-encoded in event['data'].
    pubsub_message = base64.b64decode(event['data']).decode('utf-8')
    return run(pubsub_message)
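
To try the decoding step without deploying anything, a fake event can be built locally. This sketch assumes the event is a dict whose `data` field holds the base64-encoded payload, as described above. `make_event` and `decode_event` are hypothetical helpers, and `decode_event` performs only the decoding, so no PSI call is made.

```python
import base64


def make_event(message):
    """Build a fake Pub/Sub event dict (hypothetical helper for local tests)."""
    return {"data": base64.b64encode(message.encode("utf-8")).decode("ascii")}


def decode_event(event):
    """Decode the base64 payload of a Pub/Sub event back into text."""
    return base64.b64decode(event["data"]).decode("utf-8")


if __name__ == "__main__":
    event = make_event("https://example.com/")
    print(event)
    print(decode_event(event))
```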

3.2 psi-job function

psi-job() is triggered by the Scheduler job. It dispatches all URLs to be audited, in parallel, to psi-single() in the form of Pub/Sub events.

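import os

from google.cloud import pubsub_v1

# Read from environment variables (see section 3.3). URL_DICT is defined elsewhere.
PROJECT_ID = os.getenv('PROJECT_ID')
TOPIC_NAME = os.getenv('TOPIC_NAME')

def run(event, context):
    publisher = pubsub_v1.PublisherClient()
    topic = publisher.topic_path(PROJECT_ID, TOPIC_NAME)
    # Pub/Sub payloads must be bytes; publish() returns a future per message.
    futures = [publisher.publish(topic, url.encode('utf-8')) for url in URL_DICT]
    # Wait for every publish to complete before the function returns.
    for future in futures:
        future.result()
    return 'OK'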

3.3 Environment Variables and Dependencies

To avoid leaking security-sensitive information, keys and similar values can be stored in Functions environment variables, and in local environment variables for local development and debugging. API_KEY, PROJECT_ID, and other such values in the code above are read with os.getenv(). Cloud Functions has a number of common dependency libraries built in, as described in the documentation. To add more dependencies, configure the project file for the language in use. The code above references two dependency libraries.

# requirements.txt
# Function dependencies
requests==2.21.0
google-cloud-pubsub==0.40.0
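
Because os.getenv() returns None when a variable is missing, a misconfigured deployment can fail much later with a confusing error. One possible guard is sketched below. The helper name `require_env` is hypothetical and is not part of the original project.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or fail loudly if unset."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


if __name__ == "__main__":
    os.environ.setdefault("API_KEY", "dummy-key-for-local-run")
    print(require_env("API_KEY"))
```

Reading the values once at import time, for example `API_KEY = require_env("API_KEY")`, makes a missing variable show up as soon as the function is loaded.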

4. Storage

The print() calls in the code above write to Stackdriver Logging for subsequent filtering and analysis. Since the result of each URL audit is a JSON object string, you could go further and write it to Bigtable, query and analyze it with BigQuery, and then import it into Google Data Studio for visual reports. Here, Cloud Storage is used to store each JSON string as a single file.

from urllib import parse
from google.cloud import storage
from google.cloud.storage import Blob

def save(url, report):
    '''Save to https://console.cloud.google.com/storage/browser/[bucket-id]/'''
    client = storage.Client()
    bucket = client.get_bucket("psi-report")
    # Percent-encode the URL so it can serve as a flat object name.
    blob = Blob(f"{parse.quote_plus(url)}.json", bucket)
    blob.upload_from_string(report, "application/json")

Add the dependency.

# requirements.txt
# Function dependencies
google-cloud-storage==1.15.0
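
The two inputs to the storage step, the object name and the JSON string, can be prepared and checked locally without touching Cloud Storage. The sketch below uses the helper names `report_blob_name` and `report_body`, which are hypothetical. It assumes the PSI response is available as a Python dict, as returned by response.json().

```python
import json
from urllib import parse


def report_blob_name(url):
    """Derive a flat object name from a URL by percent-encoding it."""
    return f"{parse.quote_plus(url)}.json"


def report_body(response):
    """Serialize a PSI response dict to a JSON string suitable for upload."""
    return json.dumps(response, ensure_ascii=False)


if __name__ == "__main__":
    print(report_blob_name("https://example.com/a?b=1"))
    print(report_body({"score": 0.9}))
```

Percent-encoding keeps characters such as `/` out of the object name, so each report stays a single top-level file in the bucket.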

5. Source code

github.com/9468305/pyt…

6. Document links

  1. PageSpeed Insights developers.google.com/speed/pages…
  2. Google Lighthouse developers.google.com/web/tools/l…
  3. Google Cloud Scheduler cloud.google.com/scheduler/
  4. Google Cloud Pub/Sub cloud.google.com/pubsub/
  5. Google Cloud Functions cloud.google.com/functions/
  6. Google Cloud Storage cloud.google.com/storage/
  7. Google Cloud Build cloud.google.com/cloud-build…
  8. Google Stackdriver cloud.google.com/stackdriver…