With this beginner-friendly guide, you can build your own custom Python scripts to automatically measure your site’s key speed and performance metrics.

Over the past month, Google has announced a number of ways to measure user experience through key speed and performance metrics.

Coincidentally, I’ve been trying to write a Python script that uses the Google PageSpeed Insights (PSI) API to collect metrics for multiple pages at once, without having to run a test for each individual URL.

After Google’s announcement, I thought it would be a good time to share the script and explain how to create it in a beginner-friendly way.

The best thing about the script is that once you’ve established the foundation, you can extract a number of different metrics that can be found in page speed tests as well as Lighthouse analysis.

An Overview of Web Vitals

In early May, Google introduced Core Web Vitals, a subset of its broader Web Vitals metrics.

These metrics are used to provide guidance on the quality of the user experience on the site.

Google describes them as a way to “help quantify your site experience and identify opportunities for improvement,” further underscoring their shift to a focus on user experience.

Core Web Vitals are real-world, user-centric metrics that measure key aspects of the user experience: loading, interactivity, and visual stability.

In addition, Google announced last week that it will introduce a new search ranking signal combining these metrics with existing page experience signals, such as mobile-friendliness and HTTPS security, to ensure it continues to serve high-quality sites to users.

Monitoring Performance Metrics

This update is expected to be available in 2021, and Google has confirmed that no immediate action is required.

However, to help us prepare for these changes, Google has updated the tools used to measure page speed, including PSI, Google Lighthouse, and the Google Search Console Speed report.

Getting Started With the PageSpeed Insights API

Google’s PageSpeed Insights is a useful tool for viewing a summary of a page’s performance, using both field and lab data to generate its results.

It’s a good way to get a summary for a handful of URLs, since it is used on a page-by-page basis.

However, if you work on a large site and want large-scale insights, the API makes it easier to analyze multiple pages at once, without having to plug in URLs one at a time.

A Python Script to Measure Performance

I created the following Python script to measure key performance metrics on a large scale to save time spent manually testing each URL.

The script uses Python to send requests to the Google PSI API to collect and extract metrics displayed in PSI and Lighthouse.

I decided to write this script in Google Colab because it’s a great way to start writing Python and it allows easy sharing, so this article will walk through the setup using Google Colab.

However, it can also run locally, with some tweaks to data upload and download.

It is important to note that some steps may take a while to complete, especially while each URL is run through the API, since requests are spaced out so as not to overload it.

You can therefore run the script in the background and come back to it once the steps are complete.

Let’s walk through the steps required to get this script up and running.

Step 1: Install required software packages

Before we start writing any code, we need to bring in a few Python packages that the script relies on. These are easy to pull in with import statements (a sketch of the imports follows the list below).

The packages we need are:

  • urllib: Used to request, open, read, and parse URLs.
  • json: Allows you to convert JSON to Python objects and back.
  • requests: An HTTP library for sending various HTTP requests.
  • pandas: Primarily used for data analysis and manipulation; we use it to create DataFrames.
  • time: A module for working with time; we use it to add an interval between requests.
  • files: From google.colab, used to upload and download files.
  • io: The default interface for accessing files.
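
Assuming you’re following along in Google Colab (where requests and pandas come preinstalled), a minimal sketch of the imports might look like this:

```python
# Standard library modules
import urllib.request
import json
import time
import io

# Third-party libraries (preinstalled in Colab; otherwise pip install requests pandas)
import requests
import pandas as pd

# Colab-specific helpers for uploading and downloading files
from google.colab import files
```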

Step 2: Set up the API request

The next step is to set up the API request. The full instructions can be found here, but essentially the command will look like this:

  • www.googleapis.com/pagespeedon…

This lets you append the URL, the strategy (desktop or mobile), and your API key.

To use it in Python, we’ll call urllib.request.urlopen and assign the response to a variable named result, so we can store the output and reuse it later in the script.
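
As a rough sketch, using the PSI v5 runPagespeed endpoint: the example URL and the mobile strategy below are placeholders you would swap for your own.

```python
# The page to test; replace with one of your own URLs
url = 'https://www.example.com'

# Build the API request and store the raw response in a variable named result.
# strategy can be 'mobile' or 'desktop'; an API key can be appended with '&key=...'
result = urllib.request.urlopen(
    'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={}&strategy=mobile'.format(url)
).read().decode('UTF-8')
```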

Step 3: Test the API

To check that the API is set up correctly, and to understand what is generated during a test, I ran a single URL through the API using a simple urllib.request call.

After doing this, I converted the result into a JSON file and downloaded it so I could view the results.

(Note that this method is used to convert and download JSON files in Google Colab.)
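
A minimal sketch of that test, reusing the result variable from the previous step (the result.json filename is arbitrary):

```python
# Parse the raw response string into a Python dictionary
data = json.loads(result)

# Write it to a JSON file and download it from Colab so it can be inspected locally
with open('result.json', 'w') as f:
    json.dump(data, f, indent=2)

files.download('result.json')
```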

Step 4: Read the JSON file

The JSON file shows field data (stored under loadingExperience) and lab data (found under lighthouseResult).

To find the metrics we want to extract, we can use the JSON structure to see where each metric sits under those two sections.
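
For example, assuming the v5 response structure described above, the field and lab sections can be pulled out like this (the audit keys, such as 'largest-contentful-paint', are worth confirming against the JSON file you downloaded):

```python
# Field (real-user) data
field_data = data['loadingExperience']

# Lab data, where each Lighthouse metric is stored as a named audit
lab_data = data['lighthouseResult']['audits']

# e.g. the displayed Largest Contentful Paint value from the lab data
print(lab_data['largest-contentful-paint']['displayValue'])
```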

Step 5: Upload the CSV and store it as a Pandas DataFrame

The next step is to upload a CSV file of the URLs we want to run through the PSI API. You can generate a list of a site’s URLs with a crawling tool, such as DeepCrawl.

We recommend that you use a smaller sample set of URLs here when using the API, especially if you have a large site.

For example, you could use the pages with the most visits or the pages that generate the most revenue. Also, if your site uses templates, it’s a good idea to test those too.

You can also add a column-header variable here, which we’ll use when looping through the list. Make sure its name matches the column header in the CSV file you upload:

(Note that this method is used to upload CSV files in Google Colab.)

Once uploaded, we will use the Pandas library to convert the CSV to a DataFrame, which we can iterate through in the following steps.

The DataFrame looks like this, starting with a zero index.
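
A sketch of the upload and conversion, assuming the URL column in your CSV is called url (adjust column_header to match your own file):

```python
# Upload the CSV of URLs from your machine (Colab-specific)
uploaded = files.upload()
filename = next(iter(uploaded))  # the name of whichever file was uploaded

# Name of the column in the CSV that holds the URLs
column_header = 'url'

# Convert the uploaded file into a Pandas DataFrame we can iterate over
df = pd.read_csv(io.BytesIO(uploaded[filename]))
df.head()
```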

Step 6: Save the result to the response object

The next step involves using a for loop to run the DataFrame of URLs we just created through the PSI API.

The for loop allows us to traverse the list of uploads and execute commands for each item. We can then save the result to the response object and convert it to a JSON file.

We use x in range here, where x represents each URL’s position as the loop runs, and range(0, len(df)) allows the loop to cover every URL in the DataFrame, no matter how many there are.

The response object keeps each URL’s results from overwriting the previous ones as the loop runs, so we can save the data for later use.

This is also where the URL request parameters are defined, using the column header variable, before the response is converted into a JSON object.

I also set the sleep time here to 30 seconds to reduce the number of consecutive API calls.

Alternatively, if you want to make requests more quickly, you can append an API key to the end of the request URL.

Indentation is also important here: because each step is part of the for loop, it must be indented within the loop.
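
Putting that together, a rough sketch of the loop might look like the following, reusing df and column_header from the Step 5 sketch (the mobile strategy is just an example, and the 30-second pause can be shortened or removed if you append an API key):

```python
# Dictionary to hold each URL's API response so results aren't overwritten
response_object = {}

for x in range(0, len(df)):
    # Pull the URL for this row using the column header defined earlier
    page_url = df.loc[x, column_header]

    # Space out the calls so we don't overload the API with requests
    time.sleep(30)

    # Request the PSI results for this URL and store the parsed JSON
    raw = urllib.request.urlopen(
        'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={}&strategy=mobile'.format(page_url)
    ).read().decode('UTF-8')

    response_object[page_url] = json.loads(raw)

    print('Completed: {}'.format(page_url))
```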

Step 7: Create a DataFrame to store the results

We also need to create a DataFrame to store the metrics we want to extract from the response object.

A DataFrame is a table-like data structure with columns and rows for storing data. We just need to add a column for each metric and name it appropriately, as follows:
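
A minimal sketch of that DataFrame; the column names below are illustrative and match the metrics listed next, so rename them however suits your analysis:

```python
# Empty DataFrame with one column per metric we plan to extract
df_results = pd.DataFrame(columns=[
    'URL',
    'Overall Category',
    'Largest Contentful Paint',
    'First Input Delay',
    'Cumulative Layout Shift',
    'First Contentful Paint',
    'Time to Interactive',
    'Total Blocking Time',
    'Speed Index',
])
```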

For the purposes of this script, I used the Core Web Vitals metrics, along with the other loading and interactivity metrics used in the current release of Lighthouse.

Each of these metrics has a different weight, which is then used in the overall performance score:

  • LCP (Largest Contentful Paint)
  • FID (First Input Delay)
  • CLS (Cumulative Layout Shift)
  • FCP (First Contentful Paint)
  • TTI (Time to Interactive)
  • TBT (Total Blocking Time)
You can find more information about each metric, and how to interpret the scores, in Google’s documentation for each.

I also chose to include Speed Index and the overall category, which provides a slow, average, or fast rating.

Step 8: Extract metrics from the response object

With the response object saved, we can now filter it and extract only the desired metrics.

Here, we’ll use a for loop again to iterate through the response object and use a series of keys and indexes to return only the specific metrics we want.

To do this, we’ll define the DataFrame column names, as well as the specific part of the response object each metric is extracted from, for every URL.

I’ve set this script up to extract the key metrics mentioned above, so you can use it immediately to collect this data.

However, you can extract many other useful metrics that can be found in both PSI tests and Lighthouse analysis.

You can use this JSON file to see where each metric is in the list.

For example, when extracting metrics from the Lighthouse audits, such as the displayed value for Time to Interactive, the following will be used:
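
A sketch of the extraction loop, assuming the response structure hasn’t changed since this was written and reusing df_results and response_object from the earlier sketches; the audit keys (for example 'interactive' for Time to Interactive) should be confirmed against the JSON file from Step 3:

```python
for (page_url, x) in zip(response_object.keys(), range(0, len(response_object))):
    audits = response_object[page_url]['lighthouseResult']['audits']
    field = response_object[page_url]['loadingExperience']

    df_results.loc[x, 'URL'] = page_url
    df_results.loc[x, 'Overall Category'] = field['overall_category']

    # Field metric: First Input Delay is only available from real-user data
    df_results.loc[x, 'First Input Delay'] = field['metrics']['FIRST_INPUT_DELAY_MS']['percentile']

    # Lab metrics taken from the Lighthouse audits, using their displayed values
    df_results.loc[x, 'Largest Contentful Paint'] = audits['largest-contentful-paint']['displayValue']
    df_results.loc[x, 'Cumulative Layout Shift'] = audits['cumulative-layout-shift']['displayValue']
    df_results.loc[x, 'First Contentful Paint'] = audits['first-contentful-paint']['displayValue']
    df_results.loc[x, 'Time to Interactive'] = audits['interactive']['displayValue']
    df_results.loc[x, 'Total Blocking Time'] = audits['total-blocking-time']['displayValue']
    df_results.loc[x, 'Speed Index'] = audits['speed-index']['displayValue']
```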

Again, make sure each of these lines sits inside the loop; otherwise they won’t be included in the iteration and you’ll only get a result for a single URL.

Step 9: Convert DataFrame to CSV file

The final step is to create a summary file collecting all the results and convert it into a format that’s easy to analyze, such as a CSV file.

(Note that this method is used to convert and download CSV files in Google Colab.)
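
A short sketch of that final step in Colab (the filename is arbitrary):

```python
# Save the summary DataFrame as a CSV and download it from Colab
df_results.to_csv('pagespeed_results.csv', index=False)
files.download('pagespeed_results.csv')
```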

Explore the data further

Currently, all the metrics we extract are stored as strings, which is the Python data type for text and characters.

Since some of the metrics we extracted are actually numeric values, you might want to convert strings to numeric data types, such as integers and floating point numbers.

Integers, also known as ints, are whole numbers, such as 1 and 10.

Floating-point numbers, also known as floats, are numbers with a decimal point, such as 1.0 and 10.1.

To convert a string into a number, we need to perform two steps, the first of which is to strip out the ‘s’ character (which stands for seconds).

We do this by using the .str.replace method on each column.

We will then use the .astype() method to convert the string into an integer or a float:
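
A sketch of that conversion, using the illustrative column names from Step 7 (Cumulative Layout Shift has no unit, so it only needs the type cast):

```python
# Strip the trailing 's' (seconds) from the lab metrics, then cast them to floats
seconds_columns = ['Largest Contentful Paint', 'First Contentful Paint',
                   'Time to Interactive', 'Speed Index']

for col in seconds_columns:
    df_results[col] = df_results[col].str.replace('s', '', regex=False).astype(float)

# Cumulative Layout Shift is unitless, so a plain cast is enough
df_results['Cumulative Layout Shift'] = df_results['Cumulative Layout Shift'].astype(float)
```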

Once you have done this, you can further evaluate the data using a number of different methods.

For example, you can use data visualization libraries (such as Matplotlib or Seaborn) to visualize metrics and how measurements change over time and group the results into slow, medium, and fast buckets.

Since we’ve already covered a lot, I won’t go into that in this article, but feel free to reach out if you’d like more information.

Conclusion

The script ultimately helped me measure key page speed and performance metrics for a set of URLs, and visualize the results to identify pages that need improvement.

It also allows you to monitor results over time and quantify the improvements that have been made.

I also created a script specifically to measure the percentages and categories of the three Core Web Vitals.

I hope this will be helpful to those who wish to automate their performance testing and further explore the PSI API.

Please feel free to save a copy of this Colab file and use it to help measure and monitor your page speed, or adapt the steps to your own needs. You can access all the code snippets I’ve shared in this article here.
