After searching for information on and off and trying several implementation approaches, I finally more or less solved full-page screenshots. The process had enough twists and turns to cost me a lot of time, so I am sharing it to help you get screenshots working quickly.

Why PhantomJS for screenshots

Screenshots can be done in many different ways, such as:

  • Selenium
  • HtmlUnit
  • Html2Image, and so on

But the results of these tools are not good. Selenium can only capture the visible screen, not the entire page, while HtmlUnit and Html2Image support JS poorly, so the screenshot ends up with a lot of blank space. PhantomJS is the best of them: it can capture entire pages and supports JS well.

Preparation

Install PhantomJS. On macOS:

brew install phantomjs

Command line interface (CLI) screenshot

Once it is installed, we can try it out.

  • Open the terminal and enter the following command:
/Users/hetiantian/SoftWares/phantomjs/bin/phantomjs \
    /Users/hetiantian/SoftWares/phantomjs/examples/rasterize.js \
    https://juejin.cn/post/6844903686821396487 \
    /Users/hetiantian/Desktop/juejin-command.png
  • Check the result:

    The images in the page did not load properly.

Explanation of the command: /Users/hetiantian/SoftWares/phantomjs/bin/phantomjs is the path of the PhantomJS executable, and /Users/hetiantian/SoftWares/phantomjs/examples/rasterize.js is the path of the rasterize.js file. The last two arguments are the URL to capture and the output file. The command can be read as running rasterize.js with PhantomJS, so to solve the problem of blank images we need to look at the rasterize.js file.

"use strict";
var page = require('webpage').create(),
    system = require('system'),
    address, output, size, pageWidth, pageHeight;

if (system.args.length < 3 || system.args.length > 5) {
    console.log('Usage: rasterize.js URL filename [paperwidth*paperheight|paperformat] [zoom]');
    console.log('  paper (pdf output) examples: "5in*7.5in", "10cm*20cm", "A4", "Letter"');
    console.log(' image (png/jpg output) examples: "1920px" entire page, window width 1920px');
    console.log(' "800px*600px" window, clipped to 800x600');
    phantom.exit(1);
} else {
    address = system.args[1];
    output = system.args[2];
    page.viewportSize = { width: 600, height: 600 };
    if (system.args.length > 3 && system.args[2].substr(-4) === ".pdf") {
        size = system.args[3].split('*');
        page.paperSize = size.length === 2 ? { width: size[0], height: size[1], margin: '0px' }
                                           : { format: system.args[3], orientation: 'portrait', margin: '1cm' };
    } else if (system.args.length > 3 && system.args[3].substr(-2) === "px") {
        size = system.args[3].split('*');
        if (size.length === 2) {
            pageWidth = parseInt(size[0], 10);
            pageHeight = parseInt(size[1], 10);
            page.viewportSize = { width: pageWidth, height: pageHeight };
            page.clipRect = { top: 0, left: 0, width: pageWidth, height: pageHeight };
        } else {
            console.log("size:", system.args[3]);
            pageWidth = parseInt(system.args[3], 10);
            pageHeight = parseInt(pageWidth * 3/4, 10); // it's as good an assumption as any
            console.log("pageHeight:", pageHeight);
            page.viewportSize = { width: pageWidth, height: pageHeight };
        }
    }
    if (system.args.length > 4) {
        page.zoomFactor = system.args[4];
    }
    page.open(address, function (status) {
        if (status !== 'success') {
            console.log('Unable to load the address!');
            phantom.exit(1);
        } else {
            window.setTimeout(function () {
                page.render(output);
                phantom.exit();
            }, 200);
        }
    });
}

My first attempt was to change page.viewportSize = { width: 600, height: 600 }; 🤔️ I raised the height tenfold and got a nearly perfect screenshot. However, if the page is too short, the result is flawed, with a large blank area left at the bottom. page.viewportSize sets the size of the browser window when it is first opened, and increasing it lets more of the JS-loaded content load. Ideally we would read the actual page height and set the viewport height to that, but I could not find a way to do so.

Setting a very large height, such as 30000, is not acceptable either, because short pages are left with blank space at the bottom. I also tried scrolling the page inside page.evaluate:

 page.evaluate(function(){
     scrollBy(0, 18000); 
});

I could not get a for loop to work inside evaluate, and I really did not know how else to change the script, so I gave up on this approach.
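One possible way around the for-loop problem is to keep the loop outside page.evaluate and chain the steps with timers, so that each evaluate call performs a single scroll and the page has time to load in between. The sketch below is an assumption, not something this article tested: scrollInSteps, fakePage, and the step values are hypothetical, and fakePage only stands in for a PhantomJS page so that the scheduling logic can be run under Node.

```javascript
// Hypothetical helper: scroll `steps` times by `distance` px, waiting
// `delayMs` between steps, then call `done`. The loop lives outside
// page.evaluate; each evaluate call performs exactly one scroll.
function scrollInSteps(page, steps, distance, delayMs, done) {
    var completed = 0;
    function next() {
        if (completed >= steps) {
            done();
            return;
        }
        // Extra arguments to evaluate are passed on to the function,
        // which runs inside the page and cannot see outer variables.
        page.evaluate(function (d) {
            window.scrollBy(0, d);
        }, distance);
        completed += 1;
        setTimeout(next, delayMs);
    }
    next();
}

// Stand-in for a PhantomJS page (assumption, for running under Node):
// it records the distance of each requested scroll instead of scrolling.
var recordedScrolls = [];
var fakePage = {
    evaluate: function (fn, d) {
        recordedScrolls.push(d);
    }
};

scrollInSteps(fakePage, 3, 1000, 10, function () {
    console.log('scrolled ' + recordedScrolls.length + ' times');
});
```

In rasterize.js, a call like this could replace the window.setTimeout block inside page.open, passing the real page and a done callback that calls page.render(output) and phantom.exit(). Whether the scrolling actually triggers lazy loading on a given site would still need to be checked.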

Taking the screenshot from Java code

  • Required dependency
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>2.45.0</version>
</dependency>
<dependency>
    <groupId>com.codeborne</groupId>
    <artifactId>phantomjsdriver</artifactId>
    <version>1.2.1</version>
    <!-- this will _always_ be behind -->
    <exclusions>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-java</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-remote-driver</artifactId>
        </exclusion>
    </exclusions>
</dependency>
  • Code implementation
import java.io.File;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.phantomjs.PhantomJSDriver;
import org.openqa.selenium.phantomjs.PhantomJSDriverService;
import org.openqa.selenium.remote.DesiredCapabilities;

public class PhantomjsTest2 {
    public static void main(String[] args) throws InterruptedException, IOException {
        // Set the necessary capabilities
        DesiredCapabilities dcaps = new DesiredCapabilities();
        // SSL certificate support
        dcaps.setCapability("acceptSslCerts", true);
        // Screenshot support
        dcaps.setCapability("takesScreenshot", true);
        // CSS selector support
        dcaps.setCapability("cssSelectorsEnabled", true);
        // JS support
        dcaps.setJavascriptEnabled(true);
        // Driver support (the second argument is the path to your PhantomJS executable)
        dcaps.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY,
                "/Users/hetiantian/SoftWares/phantomjs/bin/phantomjs");
        // Create the headless browser driver
        PhantomJSDriver driver = new PhantomJSDriver(dcaps);
        // Implicit wait for element lookups
        driver.manage().timeouts().implicitlyWait(1, TimeUnit.SECONDS);
        long start = System.currentTimeMillis();
        // Open the page
        driver.get("https://juejin.cn/post/6844903686821396487");
        Thread.sleep(30 * 1000);
        JavascriptExecutor js = driver;
        for (int i = 0; i < 33; i++) {
            // Scroll down by 1000px, then wait for the content to load
            js.executeScript("window.scrollBy(0,1000)");
            Thread.sleep(5 * 1000);
        }
        // Passing OutputType.FILE to getScreenshotAs() returns the capture as a File
        File srcFile = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
        Thread.sleep(3000);
        // Use FileUtils.copyFile() to save the File returned by getScreenshotAs()
        FileUtils.copyFile(srcFile, new File("/Users/hetiantian/Desktop/juejin-01.png"));
        System.out.println("Time: " + (System.currentTimeMillis() - start) + " ms");
    }
}

I won't repeat what the comments already explain. The one thing worth pointing out is that the page is scrolled by executing JS code, and after every scroll the thread sleeps so that the JS content has time to load. Because PhantomJS captures a maximum height of 32767px, 33 scrolls of 1000px guarantee that the largest part of the page that can be captured is already loaded. The difference between window.scrollBy(0,1000) and window.scrollTo(0,1000):

Window.scrollby (0,1000) window.scrollby (0,1000) when executed here, the page slides 1000+1000px window.scrollto (0,1000) window.scrollto (0,1000) At this point the page slides to 1000pxCopy the code

Window. ScrollTo (0, document. Body. ScrollHeight can slide to the bottom of the page, not to choose for two reasons: 1) the slide to the bottom at a draught js will be too late, some at the bottom of the page doesn’t load 2), can slide has been loaded note: The disadvantage of this method is that the JS that wants to capture the page cannot load properly. Sure enough, a bear and a fish can’t have it both ways, so it takes more than four minutes to capture a picture
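The numbers above can be checked with a little arithmetic. The sketch below uses the values from the Java code (a 30 s initial wait, 5 s after each 1000px scroll, 3 s after the capture); the constant names are made up for illustration.

```javascript
// Values taken from the article; the names are hypothetical.
var MAX_CAPTURE_HEIGHT_PX = 32767; // PhantomJS capture limit stated above
var SCROLL_STEP_PX = 1000;         // window.scrollBy(0,1000)

// Number of scrolls needed to cover the maximum capturable height
var scrollCount = Math.ceil(MAX_CAPTURE_HEIGHT_PX / SCROLL_STEP_PX);

// Time spent in Thread.sleep alone: initial wait + per-scroll waits + final wait
var sleepSeconds = 30 + scrollCount * 5 + 3;

console.log(scrollCount);                     // 33
console.log(sleepSeconds);                    // 198
console.log((sleepSeconds / 60).toFixed(1));  // 3.3 (minutes)
```

The sleeps alone account for about 3.3 minutes, so most of the four-plus minutes is deliberate waiting; the remainder is presumably page loading and rendering.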

=================== Update 2018.10.15 ====================

Disadvantages of PhantomJS

  • The maximum screenshot height is 32767px
  • There may be problems with cross-origin (cross-domain) requests
  • You need to install a browser driver

Puppeteer can be used for screenshots instead.

const puppeteer = require('/usr/local/lib/node_modules/puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        headless: false
    });
    const page = await browser.newPage();
    // await page.goto('https://www.zhihu.com/question/22263777');
    await page.goto('http://www.iqiyi.com');
    await page.setViewport({
        width: 1200,
        height: 800
    });

    await autoScroll(page);

    await page.screenshot({
        path: 'jd.png',
        fullPage: true
    });

    await browser.close();
})();

function autoScroll(page) {
    return page.evaluate(() => {
        return new Promise((resolve, reject) => {
            var totalHeight = 0;
            var distance = 100;
            var timer = setInterval(() => {
                var scrollHeight = document.body.scrollHeight;
                window.scrollBy(0, distance);
                totalHeight += distance;

                if (totalHeight >= scrollHeight) {
                    clearInterval(timer);
                    resolve();
                }
            }, 100);
        });
    });
}
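The stop condition in autoScroll re-reads document.body.scrollHeight on every tick, which is what lets it keep going when lazy loading makes the page taller. The sketch below simulates only that loop, without a browser: ticksUntilDone and the heightAt page models are hypothetical, and each loop iteration corresponds to one 100 ms interval tick.

```javascript
// Simulates the stop condition of autoScroll without a browser.
// heightAt(scrolled) is a hypothetical model of document.body.scrollHeight
// after `scrolled` pixels have been scrolled.
function ticksUntilDone(heightAt, distance) {
    let totalHeight = 0;
    let ticks = 0;
    for (;;) {
        const scrollHeight = heightAt(totalHeight); // re-read on every tick
        totalHeight += distance;
        ticks += 1;
        if (totalHeight >= scrollHeight) {
            return { ticks: ticks, totalHeight: totalHeight };
        }
    }
}

// A static page that is always 1000px tall
const staticPage = ticksUntilDone(() => 1000, 100);

// A lazy page: 1000px at first, 3000px once 500px have been scrolled
const lazyPage = ticksUntilDone(
    (scrolled) => (scrolled >= 500 ? 3000 : 1000),
    100
);

console.log(staticPage.ticks); // 10
console.log(lazyPage.ticks);   // 30
```

One consequence of this condition is that a page which keeps growing as you scroll (infinite scroll) would never satisfy it, so a cap on the number of ticks may be worth adding in that case.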