Series catalog:
Java Weibo Crawler, Basics: a simple Weibo crawler (manually supplied cookie)
Java Weibo Crawler, Intermediate: the business interface (no cookie needed)
Java Weibo Crawler, Advanced: obtaining Weibo cookies automatically (no account needed, on the order of a million per day)
I. Foreword
Writing articles is a real pain. My writing is not great, so awkward sentences and passages that don't quite make sense are common; please don't mind them (minding won't help anyway). This is my first time writing in Markdown and I'm testing the waters, so the layout is a bit messy.
I'm no good at writing and no good at typography, and while typing these words I keep asking myself, "Why are you writing this? Wouldn't it be easier to just paste the code?" But you have to try things. How do you know you can't do it until you try?
Finally, this article covers the intermediate implementation of the Weibo crawler. Applicable scope: medium-scale use, production use.
Second, the introduction
Micro-blog is going further and further on the road of anti-crawler, anti-crawler mechanism is always updated, if the maintenance is too tired. Can consider the regular channel. Weibo business Interface (API). Official website home page: open.weibo.com/wiki/ Microblogging API
III. Example
Batch-fetch the repost and comment counts for specified Weibo posts: open.weibo.com/wiki/2/stat…
Take this interface as an example.
What follows is close to pseudocode. In short: use HttpClient to send the request (the code calls client.get, so it is a GET rather than a POST), then lightly process the return value. For how to obtain the token, look for the tutorial on the official site.
    // Assumes java.util.ArrayList and java.util.List are imported. PostParameter, Response,
    // JSONObject, client and token come from the surrounding class and its HTTP/JSON libraries.
    /**
     * @Title: getCount
     * @Description: Get the number of comments, reposts and likes for each post
     * @param mids one or more post mids, separated by ","
     * @return List<JSONObject>
     */
    public List<JSONObject> getCount(String mids) {
        List<JSONObject> list = new ArrayList<>();
        String url = "https://api.weibo.com/2/statuses/count.json";
        List<PostParameter> nameValuePair = new ArrayList<>();
        nameValuePair.add(new PostParameter("access_token", token));
        nameValuePair.add(new PostParameter("ids", mids));
        PostParameter[] pp = nameValuePair.toArray(new PostParameter[0]);
        try {
            Response str = client.get(url, pp, token);
            // Strip the array brackets, then split the body into one chunk per object.
            String[] source = str.getResponseAsString()
                    .replace("[", "").replace("]", "").split("},");
            for (int i = 0; i < source.length; i++) {
                // split() removed the closing brace from every chunk except the last one.
                String temp = source[i].endsWith("}") ? source[i] : source[i] + "}";
                JSONObject json = new JSONObject(temp);
                list.add(json);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return list;
    }
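The replace/split step above only works while the response is a flat array of objects with no brackets inside the values. As a slightly more defensive sketch, the body can be cut into its top-level objects by tracking brace depth. The class and method names here are hypothetical, and the sample body in the usage note uses made-up field names, not the real response fields.

```java
import java.util.ArrayList;
import java.util.List;

class ResponseSplitter {
    /** Returns the text of every top-level {...} object found in a JSON array body. */
    static List<String> splitObjects(String body) {
        List<String> objects = new ArrayList<>();
        int depth = 0;            // current nesting level of braces
        int start = -1;           // index where the current top-level object began
        boolean inString = false; // true while scanning inside a JSON string literal
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0 && start >= 0) {
                    objects.add(body.substring(start, i + 1));
                    start = -1;
                }
            }
        }
        return objects;
    }
}
```

For example, `ResponseSplitter.splitObjects("[{\"a\":1},{\"a\":2}]")` yields two strings, `{"a":1}` and `{"a":2}`, each of which can then be handed to `new JSONObject(...)` as in the method above. If your JSON library can parse an array directly, that is simpler still.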
The rest of the API works the same way, so I won't go through it here; the details are on the official site. If anything is unclear, the official site offers customer service and an SDK, so you can ask them for help.
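Since the other endpoints follow the same pattern, the request step can be sketched once with only the JDK (Java 11 or later) instead of the SDK wrapper. This is a minimal sketch under assumptions: the class WeiboApiRequest and its methods are hypothetical, and it assumes the endpoint accepts access_token and the other parameters in the query string of a GET, as the getCount example suggests. Check each endpoint's page on the official site for its actual method and parameters.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;

class WeiboApiRequest {
    /** Builds endpoint?access_token=...&key=value..., URL-encoding every name and value. */
    static String buildUrl(String endpoint, String token, Map<String, String> params) {
        StringBuilder url = new StringBuilder(endpoint)
                .append("?access_token=")
                .append(URLEncoder.encode(token, StandardCharsets.UTF_8));
        for (Map.Entry<String, String> e : params.entrySet()) {
            url.append('&')
               .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    /** Sends the GET request and returns the raw response body as text. */
    static String fetch(String endpoint, String token, Map<String, String> params)
            throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(buildUrl(endpoint, token, params)))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

With this helper, the count call from the example would look like `WeiboApiRequest.fetch("https://api.weibo.com/2/statuses/count.json", token, Map.of("ids", mids))`, and other endpoints would differ only in the URL and the parameter map.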
IV. Limitations
1. Call frequency is rate-limited, so it cannot be used at large scale
2. Sometimes it does not meet the requirements (few fields are returned)
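One way to live with the frequency limit is to send fewer, larger requests and to space them out. The sketch below groups mids into comma-separated batches (the format getCount expects) and pauses between calls. MidBatcher, the batch size and the pause length are all hypothetical; the real per-request and per-hour limits are whatever the official site states for your account.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class MidBatcher {
    /** Joins mids into comma-separated batches of at most batchSize entries each. */
    static List<String> chunk(List<String> mids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> batches = new ArrayList<>();
        for (int i = 0; i < mids.size(); i += batchSize) {
            int end = Math.min(i + batchSize, mids.size());
            batches.add(String.join(",", mids.subList(i, end)));
        }
        return batches;
    }

    /** Calls the given action once per batch, sleeping pauseMillis between calls. */
    static void runPaced(List<String> batches, long pauseMillis, Consumer<String> call) {
        for (int i = 0; i < batches.size(); i++) {
            if (i > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop early.
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            call.accept(batches.get(i));
        }
    }
}
```

For example, `MidBatcher.chunk(List.of("1", "2", "3", "4", "5"), 2)` gives the batches `1,2`, `3,4` and `5`, and `MidBatcher.runPaced(batches, 1000, b -> getCount(b))` would call getCount once per batch with a one-second pause in between. Fixed pauses are the simplest choice; a token bucket would use the quota more evenly but is more code.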
V. Summary
There is not much more to say about this approach; all the dos and don'ts are on the official site. If you do not need a large volume of Weibo data and have no special requirements, you can use this method.