A web crawler is a program that requests a website's pages or API endpoints and collects the data they return. A crawler can extract data from a web page and store it in a new document. Web crawlers can collect many kinds of data, including files, pictures, and video, but they must not be used to collect data for illegal purposes. In the era of Internet big data, web crawlers mainly supply search engines with comprehensive, up-to-date data, and they are also a general-purpose tool for collecting data from the Internet.
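As a rough illustration of the extract-and-store step, the sketch below pulls link targets out of an HTML string with a regular expression and writes them to a text file. The class name `LinkExtractor`, the sample HTML, and the output file `links.txt` are assumptions made for this example; a real crawler would more likely use a proper HTML parser than a regex.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of any library.
public static class LinkExtractor
{
    // Returns the value of every double-quoted href attribute found in the HTML.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        var pattern = @"href\s*=\s*""([^""]+)""";
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            links.Add(m.Groups[1].Value);
        }
        return links;
    }

    public static void Main()
    {
        // Sample page content; in practice this would be a downloaded response body.
        string html = "<a href=\"https://example.com/a\">A</a> <a href=\"https://example.com/b\">B</a>";

        // Store the extracted data in a new document, one link per line.
        File.WriteAllLines("links.txt", ExtractLinks(html));
    }
}
```

The regex only handles double-quoted `href` values, which is enough for a sketch but not for arbitrary real-world markup.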
Web crawlers can also collect public opinion data such as news, social media posts, forum threads, and blog articles, and this is one of the common ways to acquire such data. Typically, the crawler collects data from the relevant websites through crawler proxy IPs. Public opinion data can also be purchased on a data trading market or obtained from a professional public opinion analysis team. Generally speaking, though, those teams also collect the data with crawlers that use proxy IPs and then analyze it.
Because short videos are so popular, we can also use crawler programs to collect data from Douyin and Kuaishou, the two mainstream short video apps, for public opinion analysis. The resulting statistics are compiled into tables and shared as data reports. The following sample code shows one collection scheme that sends requests through a crawler proxy:
    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    // Page to fetch
    string targetUrl = "http://httpbin.org/ip";

    // Proxy server (www.16yun.cn)
    string proxyHost = "http://t.16yun.cn";
    string proxyPort = "31111";

    // Proxy authentication information
    string proxyUser = "username";
    string proxyPass = "password";

    WebProxy proxy = new WebProxy(string.Format("{0}:{1}", proxyHost, proxyPort), true);

    ServicePointManager.Expect100Continue = false;

    var request = WebRequest.Create(targetUrl) as HttpWebRequest;
    request.AllowAutoRedirect = true;
    request.KeepAlive = true;
    request.Method = "GET";
    request.Proxy = proxy;
    //request.Proxy.Credentials = CredentialCache.DefaultCredentials;
    request.Proxy.Credentials = new System.Net.NetworkCredential(proxyUser, proxyPass);

    // Set Proxy Tunnel
    // Random ran = new Random();
    // int tunnel = ran.Next(1, 10000);
    // request.Headers.Add("Proxy-Tunnel", tunnel.ToString());

    //request.Timeout = 20000;
    //request.ServicePoint.ConnectionLimit = 512;
    //request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.82 Safari/537.36";
    //request.Headers.Add("Cache-Control", "max-age=0");
    //request.Headers.Add("DNT", "1");

    //String encoded = System.Convert.ToBase64String(System.Text.Encoding.GetEncoding("ISO-8859-1").GetBytes(proxyUser + ":" + proxyPass));
    //request.Headers.Add("Proxy-Authorization", "Basic " + encoded);

    // Send the request through the proxy and read the response body as UTF-8 text
    using (var response = request.GetResponse() as HttpWebResponse)
    using (var sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
    {
        string htmlStr = sr.ReadToEnd();
    }
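The sample above relies on `HttpWebRequest`, which recent .NET versions mark as obsolete in favor of `HttpClient`. As a minimal sketch of the same idea, assuming a runtime where `HttpClient` is available, the code below routes a request through an authenticated proxy. The class name `ProxyFetcher` and its `CreateClient` method are hypothetical names introduced here, and the host, port, and credentials are the placeholders from the sample, so the request will only succeed once real proxy details are filled in.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of the proxy vendor's sample.
public static class ProxyFetcher
{
    // Builds an HttpClient whose requests go through an authenticated proxy.
    public static HttpClient CreateClient(string proxyHost, int proxyPort, string user, string pass)
    {
        var proxy = new WebProxy(new Uri($"{proxyHost}:{proxyPort}"))
        {
            Credentials = new NetworkCredential(user, pass)
        };

        var handler = new HttpClientHandler
        {
            Proxy = proxy,
            UseProxy = true,
            AllowAutoRedirect = true
        };

        // 20-second timeout, mirroring the commented-out request.Timeout = 20000.
        return new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(20) };
    }

    public static async Task Main()
    {
        // Placeholder values taken from the sample; replace them with real proxy details.
        using var client = CreateClient("http://t.16yun.cn", 31111, "username", "password");
        try
        {
            string body = await client.GetStringAsync("http://httpbin.org/ip");
            Console.WriteLine(body);
        }
        catch (HttpRequestException ex)
        {
            // Raised, for example, when the proxy rejects the placeholder credentials.
            Console.WriteLine("Request failed: " + ex.Message);
        }
    }
}
```

Creating one client and reusing it for many requests is generally preferable to building a new client per request, since each handler manages its own connection pool.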