A web crawler for Honor of Kings
"Because there is a need, we create." (an open source community)
Friends who like mobile games have probably played Honor of Kings (nicknamed "King Pesticide"). I am no good at mobile games, but I have played it a few times, usually with the heroes Arthur, Angela, and Luban. After a few games I was drawn in by the beautiful UI design of each hero (I mostly play shooting games, though; if you like those too, send me a private message), but I knew very little about the heroes themselves. So, to learn the backstory of each hero, I hacked together this open source program between 10 o'clock yesterday and 2 o'clock the next day (because there is a need, we create).
About the program
Modules and technology stack
The program has two main parts: data fetching and processing, and data display. The main technologies used are:
- Java 8
- OkHttp (application layer, HTTP requests)
- jsoup (data parsing)
- JSP + CSS (ugly interface, ha ha)
Here is what it actually looks like (haha, ugly)
A look at the core implementation code
Interfaces
import java.util.Map;
import java.util.concurrent.ExecutionException;
import okhttp3.Request; // assuming OkHttp 3.x or later

// Parsing
public interface Parser {
    void parser() throws ExecutionException, InterruptedException;
}

// Fetching
public interface Crawler<T, R> {
    String doGet(String uri, Map<T, R> headers);

    // Copy every header entry onto the request builder; a null or empty map is a no-op
    default void setHttpHeaders(Request.Builder builder, Map<T, R> headers) {
        if (headers == null || headers.isEmpty()) {
            return;
        }
        for (Map.Entry<T, R> entry : headers.entrySet()) {
            builder.addHeader(String.valueOf(entry.getKey()), String.valueOf(entry.getValue()));
        }
    }
}
The shared fetching method
public class HttpCrawler implements Crawler<String, String> {
    private OkHttpClient httpClient = new OkHttpClient();
    private static HttpCrawler instance = new HttpCrawler();

    @Override
    public String doGet(String uri, Map<String, String> headers) {
        assert uri != null;
        Request.Builder httpBuilder = new Request.Builder();
        // Set the request headers
        setHttpHeaders(httpBuilder, headers);
        Request request = httpBuilder.url(uri).build();
        String page = "";
        // try-with-resources closes the response even when the status check throws
        // (Response is Closeable in OkHttp 3 and later)
        try (Response response = httpClient.newCall(request).execute()) {
            if (!response.isSuccessful()) {
                throw new HttpStatusException(http_error.getMsg(), response.code(), uri);
            }
            ResponseBody responseBody = response.body();
            if (Objects.nonNull(responseBody)) {
                // Decode explicitly as GB2312, the charset this crawler expects
                byte[] bytes = responseBody.bytes();
                page = new String(bytes, Charsets.GB2312.name());
            }
        } catch (IOException e) {
            // Log I/O failures and fall through to return the empty page
            e.printStackTrace();
        }
        return page;
    }

    public static HttpCrawler getInstance() {
        return instance;
    }
}
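The `doGet` method above decodes the response bytes with an explicit GB2312 charset name instead of relying on a default. Here is a small sketch of why that matters, using only the JDK. The class name `CharsetDemo` and the sample text are made up for illustration, and it assumes the running JDK ships the GB2312 charset (full JDK builds normally do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decode raw bytes with an explicitly named charset
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file encoding does not matter
        String original = "\u738B\u8005";
        // Simulate a response body that was encoded as GB2312 by the server
        byte[] gb2312Bytes = original.getBytes(Charset.forName("GB2312"));

        String decodedWithGb2312 = decode(gb2312Bytes, "GB2312");
        String decodedWithUtf8 = new String(gb2312Bytes, StandardCharsets.UTF_8);

        System.out.println("GB2312 round trip matches: " + decodedWithGb2312.equals(original));
        System.out.println("UTF-8 decoding matches:    " + decodedWithUtf8.equals(original));
    }
}
```

Decoding the same bytes as UTF-8 cannot reproduce the original text, which is the kind of garbling an explicit charset avoids.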
Parsing the heroes
public class KingParser implements Parser {
    private static KingParser kingParser = new KingParser();
    private StoryParser storyParser = StoryParser.getInstance();
    private String page;
    private List<Hero> heros = new ArrayList<>();
    // Cached thread pool whose threads are named parser-thread-N
    private ExecutorService executors = Executors.newCachedThreadPool(new ThreadFactory() {
        AtomicInteger integer = new AtomicInteger();

        @Override
        public Thread newThread(@NotNull Runnable r) {
            return new Thread(r, "parser-thread-" + integer.getAndIncrement());
        }
    });

    @Override
    public void parser() throws ExecutionException, InterruptedException {
        Document document = Jsoup.parse(page);
        if (document == null || StringUtils.isEmpty(document.body().html())) {
            return;
        }
        // Each <li> inside the hero box describes one hero
        Elements heroBox = document.getElementsByClass(WebAppConfig.kingClassName);
        Elements heroLists = heroBox.get(0).getElementsByTag(li.name());
        long start;
        System.out.println("Start time ===" + (start = System.currentTimeMillis()));
        AtomicInteger count = new AtomicInteger();
        for (Element element : heroLists) {
            Hero hero = new Hero();
            count.getAndIncrement();
            Future<Hero> submit = executors.submit(() -> {
                Elements aTag = element.getElementsByTag(a.name());
                String uri = WebAppConfig.baseUri + aTag.attr(href.name());
                hero.setDetail(parserStory(uri));
                hero.setHero(aTag.get(0).getElementsByTag(img.name()).get(0).attr(alt.name()));
                hero.setPicture("http:" + aTag.get(0).getElementsByTag(img.name()).get(0).attr(src.name()));
                return hero;
            });
            // get() blocks until this task finishes, so heroes are processed one at a time
            heros.add(submit.get());
        }
        // 4922
        System.out.println("Elapsed ms ===" + (System.currentTimeMillis() - start));
        System.out.println("Total fetched: " + count.get());
    }

    public static KingParser getInstance() {
        return kingParser;
    }

    // Fetch and parse the story page for one hero
    private String parserStory(String uri) {
        storyParser.setUri(uri);
        storyParser.parser();
        return storyParser.getStory();
    }

    public void setPage(String page) {
        this.page = page;
    }

    // Parse lazily on first access
    public List<Hero> getHeros() {
        if (CollectionUtils.isEmpty(heros)) {
            try {
                parser();
            } catch (ExecutionException | InterruptedException e) {
                e.printStackTrace();
            }
        }
        return heros;
    }
}
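In the loop above, `submit.get()` is called right after each `submit`, so the loop waits for one hero before starting the next and the thread pool never runs tasks in parallel. A common alternative is to submit every task first and collect the results afterwards. The sketch below shows that pattern with the JDK only; `ParallelFetchDemo`, `fetchAll`, and the fake fetch function are hypothetical stand-ins for the real page download. It assumes each task is independent, which would not hold for the shared `StoryParser` singleton (it keeps `uri` and `story` in mutable fields) without changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelFetchDemo {
    // Submit one task per input, then wait for all results in input order
    static <T, R> List<R> fetchAll(List<T> inputs, Function<T, R> fetch, ExecutorService pool)
            throws ExecutionException, InterruptedException {
        List<Future<R>> futures = new ArrayList<>();
        for (T input : inputs) {
            Callable<R> task = () -> fetch.apply(input);
            futures.add(pool.submit(task)); // returns immediately; the task runs in the pool
        }
        List<R> results = new ArrayList<>();
        for (Future<R> future : futures) {
            results.add(future.get()); // blocks only until this particular task is done
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Hypothetical URIs; the lambda stands in for a real HTTP fetch
            List<String> uris = Arrays.asList("hero/1", "hero/2", "hero/3");
            List<String> stories = fetchAll(uris, uri -> "story of " + uri, pool);
            System.out.println(stories);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the futures are read back in the order they were submitted, the result list keeps the order of the input list even though the tasks may finish in any order.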
Parsing the story
public class StoryParser implements Parser {
    // Note: uri and story are mutable state on a shared singleton, so this class is not thread-safe
    private String uri;
    private String story;
    private HttpCrawler httpCrawler = HttpCrawler.getInstance();
    private static StoryParser storyParser = new StoryParser();

    @Override
    public void parser() {
        String detailPage = httpCrawler.doGet(uri, null);
        Document parse = Jsoup.parse(detailPage);
        // The story lives in the first "pop-bd" element inside #hero-story
        Element heroStory = parse.getElementById("hero-story");
        Element element = heroStory.getElementsByClass("pop-bd").get(0);
        story = element.html();
    }

    public String getStory() { return story; }

    public void setUri(String uri) { this.uri = uri; }

    public static StoryParser getInstance() { return storyParser; }
}
Where the code still needs work
- Caching: every run currently sends and parses many requests again; a cache could be used instead.
- Interface: the interface is not pretty; JavaScript and CSS3 could be used to make the page dynamic.
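For the caching item, one simple option is an in-memory map keyed by URI, so that each page is fetched once and then reused. A minimal sketch, assuming an in-process cache is enough; `PageCache` and its loader function are hypothetical, with the loader standing in for a call such as `HttpCrawler.getInstance().doGet(uri, null)`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class PageCache {
    private final Map<String, String> pages = new ConcurrentHashMap<>();
    private final Function<String, String> loader;

    public PageCache(Function<String, String> loader) {
        this.loader = loader;
    }

    // Return the cached page for this URI, calling the loader only on the first request
    public String get(String uri) {
        return pages.computeIfAbsent(uri, loader);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        // The lambda is a stand-in for a real HTTP fetch
        PageCache cache = new PageCache(uri -> {
            loads.incrementAndGet();
            return "<html>" + uri + "</html>";
        });
        cache.get("hero/1");
        cache.get("hero/1"); // served from the map, loader is not called again
        System.out.println("loads = " + loads.get()); // prints "loads = 1"
    }
}
```

One thing to watch: `doGet` returns an empty string on failure, and a cache like this would store that empty result too, so a real version might skip caching empty pages.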
Issues are welcome on GitHub
GitHub address
Follow me
Personal WeChat official account: "Even while watching crosstalk, I still want to write code"