Preface
This article summarizes the main optimization measures for PhantomJS.
PhantomJS
PhantomJS is a headless browser (it runs in the background with no GUI) with a small Jetty-like web server built in. It is usually used for automated testing or crawlers.
Optimization points
-
Use pooling to avoid repeated startup
When PhantomJS is invoked as a separate process from another language, starting it on every call is expensive: process creation, context switches, and repeated object creation all take time. Keeping a pool of running instances and reusing them, as a connection pool does, avoids that cost.
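As a rough illustration of the pooling idea, here is a minimal sketch of a fixed-size pool. The names `BrowserPool`, `factory`, and `borrow` are hypothetical and not from the original article; `factory` stands for whatever call starts one PhantomJS instance in your setup.

```python
import queue
from contextlib import contextmanager


class BrowserPool:
    """Fixed-size pool: each browser is started once and then reused."""

    def __init__(self, factory, size):
        # `factory` is an assumed callable that starts one browser instance.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # pay the startup cost up front

    @contextmanager
    def borrow(self, timeout=None):
        # Blocks while every instance is busy; raises queue.Empty on timeout.
        browser = self._idle.get(timeout=timeout)
        try:
            yield browser
        finally:
            self._idle.put(browser)  # hand it back for the next caller
```

With a Selenium release that still ships the PhantomJS driver, `factory` could be something like `webdriver.PhantomJS`; callers then write `with pool.borrow() as browser: ...` and never start a process themselves.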
-
Load about:blank to avoid bugs caused by state that is not cleared
The problem is similar to using a Java ThreadLocal with Tomcat's thread pool: if the previous request does not clear it, the next request that reuses the thread reads dirty data.
PhantomJS does not seem to have a reset interface, so a workaround is to open about:blank every time an instance is taken from the pool, and only then issue the real request.
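The workaround can be sketched as follows, assuming the pooled instance exposes a Selenium-style WebDriver interface (`get` and `page_source`). The helper name `fetch_clean` is hypothetical.

```python
def fetch_clean(browser, url):
    """Load `url` in a reused browser after first navigating to a blank page."""
    # Navigating away first means the previous borrower's page is no longer
    # loaded if the real request below fails or returns nothing.
    browser.get("about:blank")
    browser.get(url)
    return browser.page_source
```

This only replaces the loaded page. Whether cookies or other session state also need clearing depends on the use case and is not covered here.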
-
Enable the disk cache
If the same pages are accessed frequently, enable the disk cache so that static resources are cached and not requested again.
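PhantomJS documents the command-line options `--disk-cache=[true|false]` and `--max-disk-cache-size=<size in KB>`. The sketch below only builds that argument list; the helper name `phantomjs_cache_args` and its parameters are hypothetical.

```python
def phantomjs_cache_args(enabled=True, max_size_kb=None):
    """Build the PhantomJS command-line flags that control the disk cache."""
    args = ["--disk-cache=" + ("true" if enabled else "false")]
    if enabled and max_size_kb is not None:
        # PhantomJS interprets this limit in kilobytes.
        args.append("--max-disk-cache-size=%d" % max_size_kb)
    return args
```

The resulting list can be placed on the `phantomjs` command line, or passed as `service_args` when the instance is created through Selenium's PhantomJS driver.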
-
Drop Selenium and use the API directly
If you drive PhantomJS through a Selenium wrapper, consider using PhantomJS's native API instead, which is more direct.
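One way to skip the WebDriver layer is to run the `phantomjs` binary with a script that uses its native `webpage` module. The sketch below assumes `phantomjs` is on the PATH; the function `render`, its `run` parameter (there only so the process call can be swapped out), and the embedded script are illustrative, not the article's own code.

```python
import os
import subprocess
import tempfile

# Script executed inside PhantomJS: native webpage module, no WebDriver.
RENDER_JS = """
var system = require('system');
var page = require('webpage').create();
page.open(system.args[1], function (status) {
    if (status === 'success') {
        console.log(page.content);
        phantom.exit(0);
    } else {
        phantom.exit(1);
    }
});
"""


def render(url, phantomjs="phantomjs", run=subprocess.run, timeout=30):
    """Return the rendered HTML of `url` as printed by the script above."""
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as f:
        f.write(RENDER_JS)
        script = f.name
    try:
        # check=True raises CalledProcessError when the page fails to load.
        result = run([phantomjs, script, url], capture_output=True,
                     text=True, timeout=timeout, check=True)
        return result.stdout
    finally:
        os.remove(script)
```

This version starts one process per call, so on its own it gives up the benefit of pooling. It is meant only to show what the native API looks like.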
-
Build a distributed REST API service
Requests for network resources are slow and unreliable, so the throughput of a single instance is limited and bottlenecks appear quickly under high concurrency. Deploying the service across multiple nodes is therefore necessary.
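A single node of such a service might look like the sketch below, which exposes a `GET /render?url=...` endpoint using only the Python standard library. The endpoint path, the JSON shape, and the injected `render` callable are all assumptions made for illustration. Distribution would come from running several of these nodes behind a load balancer, which is not shown.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def make_handler(render):
    """Build a handler class that delegates page rendering to `render(url)`."""

    class RenderHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            parsed = urlparse(self.path)
            target = parse_qs(parsed.query).get("url", [None])[0]
            if parsed.path != "/render" or target is None:
                self.send_error(400, "expected /render?url=...")
                return
            try:
                html = render(target)
            except Exception:
                # Upstream pages are slow and unreliable; report, don't crash.
                self.send_error(502, "render failed")
                return
            body = json.dumps({"url": target, "html": html}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return RenderHandler


def serve(render, host="127.0.0.1", port=8080):
    """Create (but do not start) a threaded HTTP server for one node."""
    return ThreadingHTTPServer((host, port), make_handler(render))
```

Calling `serve(render).serve_forever()` starts the node. In practice `render` would borrow a browser from a pool so that each request does not start a new process.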
Summary
Besides PhantomJS, Chrome and Firefox also offer headless modes, so there are other options worth trying.