The target website

  • wordpress-edu-3autumn.localprod.forc.work/
  • Crawl each article's title, creation date, and content

Steps

  • Analyze the elements and organize your thoughts

  • Write the code
  • Chrome trick: quickly generate a CSS selector (in DevTools, right-click the element, then Copy > Copy selector)

  • Save to markdown file
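
To make the "analyze the elements" step concrete, here is a minimal sketch that pulls a title, link, and date out of a small HTML snippet. It uses PHP's bundled DOMDocument and DOMXPath instead of QueryList so that it runs without Composer. The sample markup is made up and only assumed to resemble one post on the target page, and the XPath expressions are rough equivalents of the CSS selectors, not the selectors the article uses.

```php
<?php
// Hypothetical sample markup, assumed to be shaped like one post on the target page.
$html = <<<HTML
<html><body>
<article>
  <header>
    <h2><a href="https://example.com/post-1/">First post</a></h2>
    <div><a href="https://example.com/post-1/"><time class="entry-date published">January 1, 2020</time></a></div>
  </header>
</article>
</body></html>
HTML;

// Collect libxml warnings internally; it does not know HTML5 tags such as <article>.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$posts = [];
foreach ($xpath->query('//article/header') as $header) {
    // Relative queries, evaluated against the current <header> node.
    $link = $xpath->query('h2/a', $header)->item(0);
    $time = $xpath->query('.//time[contains(@class, "entry-date")]', $header)->item(0);
    $posts[] = [
        'title' => trim($link->textContent),
        'url'   => $link->getAttribute('href'),
        'date'  => trim($time->textContent),
    ];
}
print_r($posts);
```

The result is a two-dimensional array with one row per post, which is the same kind of structure the crawler works with later on.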

1. Code Runner plug-in setup: how to send output to the terminal

  • Press Ctrl+Shift+P
  • Type "settings"
  • Choose "Open User Settings"

Set up the default terminal for VS Code

The PHP code


      
<?php
require 'vendor/autoload.php';

use QL\QueryList;

// Create a QueryList object
$ql = new QueryList();
/**
 * @Description: Get the titles, dates, and links of the four posts
 * @param: Home page URL
 * @return: Two-dimensional array containing title, date, and URL
 */
function get_tilte_date($url){
    // Reuse the global QueryList object to avoid memory overflow errors
    global $ql;
    return $ql->get($url)->rules([  // Set the collection rules
        // In each rule, the first element is the CSS selector and the second is the attribute name
        'date'  => ['header > div > a > time.entry-date.published', 'text'],
        'title' => ['header > h2 > a', 'text'],
        'url'   => ['header > h2 > a', 'href']
    ])->queryData(); // queryData() returns an array
}
/**
 * @Description: Get the full content of the article
 * @param: The URL of the article
 * @return: The content of the article, as a string
 */
function get_content($url){
    global $ql;
    // find() takes a CSS selector and selects the matching element; text() returns its text
    return $ql->get($url)->find('article.post.type-post.status-publish.format-standard.hentry.category-uncategorized')->text();
}

/**
 * @Description: Create the markdown document and write the crawled data to it
 * @param: An array containing the data
 * @return: No return value
 */
function make_markdown($content_array){
    // Open mymd.md for writing; create it if it does not exist
    $md_obj = fopen('mymd.md', 'w+');
    foreach ($content_array as $key => $value) {
      fwrite($md_obj, "## {$value['title']}\n");
      fwrite($md_obj, "`{$value['date']}`\n");
      fwrite($md_obj, "```\n");
      fwrite($md_obj, "{$value['content']}\n");
      fwrite($md_obj, "```\n");
    }
    fclose($md_obj);
}

/**
 * @Description: Start function; call it to start the crawler
 * @param: Starting URL (the home page URL)
 * @return: No return value; writes the contents to the markdown file
 */
function start($url){
    $data = get_tilte_date($url);
    // Iterate through the array and add a content element to the array
    foreach ($data as $key => $value) {
        $data[$key]['content'] = get_content($value['url']);
    }
    make_markdown($data);
}
// Call start(), pass in the starting URL/home url, and start crawling
start("https://wordpress-edu-3autumn.localprod.forc.work/");
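
make_markdown() above formats the text and writes the file in one go. As a sketch of an alternative, and not the article's method, the formatting can be split into a function that only builds the markdown string, which makes the output easy to inspect before anything is written to disk. The name render_markdown and the sample row are hypothetical, and the snippet writes to a temporary file rather than mymd.md.

```php
<?php
/**
 * Hypothetical helper: turn the crawled rows into one markdown string.
 * Each row is assumed to have 'title', 'date', and 'content' keys.
 */
function render_markdown(array $rows): string
{
    $md = '';
    foreach ($rows as $row) {
        $md .= "## {$row['title']}\n";                  // heading with the post title
        $md .= "`{$row['date']}`\n";                    // date as inline code
        $md .= "```\n{$row['content']}\n```\n";         // content inside a fenced block
    }
    return $md;
}

// Made-up sample data in the shape that start() passes to make_markdown().
$rows = [
    ['title' => 'First post', 'date' => 'January 1, 2020', 'content' => 'Hello world'],
];
$markdown = render_markdown($rows);
echo $markdown;

// Write to a temporary file here; the article writes to mymd.md instead.
$path = tempnam(sys_get_temp_dir(), 'md');
file_put_contents($path, $markdown);
```

One thing to keep in mind with this layout: if an article's content itself contains a line of three backticks, the fenced block ends early, so real content may need escaping or a longer fence.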

A little homework for you

  • Scrape each book's name and its price and save them to books.txt
  • books.toscrape.com
  • The end result…

Easter egg (Click to go to the function definition)

Next section

  • PHP crawler – 010 homework analysis