Introduction: This article is best suited to readers with some Python and front-end (HTML, JS) background; if you are starting from zero, see my previous articles first. Ahem, I can't keep writing only beginner posts, or I'll start to look unprofessional (hee).
Golden September, the scent of osmanthus in the air. In this crisp, sunny harvest season we have sent off the kids who used up their summer-holiday balance and went back to campus in tears, and we are about to welcome the annual birthday party of our great motherland (I'm in no mood to work and can't wait to celebrate!).
Which raises the question: where to go? I searched Baidu for "National Day", and the first suggestion was actually "where are the fewest tourists"... Emmmmmmm, interesting.
So I came up with the (very dangerous) idea of judging which tourist sites are popular right now by their ticket sales.
So the goal this time is to crawl Qunar's attraction pages and extract the information for each attraction. Let's think about the steps this requires.
1. Baidu Map API and Echarts
My previous crawlers all scraped text and turned it into word clouds and the like, and I thought: this! has! no! meaning!! This time I happen to be crawling data, so I decided to present it with data's best buddy, the chart. In other words, I want to use the sales volume and the exact location of each scenic spot to generate some visualizations.
Take a look at the Baidu Map API and Echarts. The former is a dedicated map API that I hear many apps use; the latter is a handy companion for data processing, a must-have at home and on the road. Use it and everyone is happy (something feels off about that sentence).
What is an API? An API is an application programming interface. Think of a plug and a socket: our program needs electricity (what kind of program is this?), and the socket provides it. We only need to write a plug in our program that fits the socket, and then we can use the electricity for whatever we want without knowing how it is generated.
Baidu heat map after importing the data
Following the pattern from my last article: Mi-chan has finished her novel! Now she wants to turn it into a book, but how? Mi-chan has no idea. Then she finds a publisher that offers this service and says it only needs the text of the novel and a cover design. So Mi-chan saves the novel as a Word file, draws a JPG cover, and hands both to the publisher. Before long she receives a nicely bound book (this part is made up; professional publishers are welcome to beat me up, I won't admit to anything).
While the book is being produced, Mi-chan does not need to know how the publisher prints it or how it is bound. She only needs to provide what the publisher asks for.
APIs connect developers and service providers
2. Determine the output file
Some people might say: I now know what an API is, but how do I use it? On this point Mi-chan can tell you, very responsibly: I don't know either.
But!
Baidu Map provides many API usage examples. With some HTML background you can roughly follow them, and with some JS background you can try changing the functions (if not, just quietly copy the source code). After looking carefully at the source, we can see that the main data for generating the heat map is stored in the variable points.
[{"lng": x, "lat": x, "count": x}, {"lng": x, "lat": x, "count": x}] is data in JSON format. Because JSON is self-describing it is easy to read: each point has three values, the first two being the longitude and latitude and the last presumably the weight (my guess).
That is to say, to turn the popularity of scenic spots into a heat map, I need the longitude and latitude of each scenic spot plus a weight. The sales volume of the scenic spot can serve as the weight, and the data has to be presented in JSON format.
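To make the target format concrete, here is a minimal sketch of turning (longitude, latitude, sales) records into heat-map points. It assumes the keys are lng, lat and count, the names that the geocoding code later in this article writes; the sample coordinates and sales figures are made up for illustration.

```python
import json

# Hypothetical sample records: (longitude, latitude, tickets sold).
# The values are invented for illustration only.
sights = [
    (116.40, 39.92, 5000),
    (121.51, 31.25, 3200),
]


def to_heatmap_points(records):
    """Convert (lng, lat, sales) tuples into a list of heat-map point dicts."""
    return [{"lng": lng, "lat": lat, "count": sales} for lng, lat, sales in records]


# Serialise the points so a web page can load them as JSON.
points_json = json.dumps(to_heatmap_points(sights))
print(points_json)
```

Using the raw sales figure as count is only one option; the article's own code later divides it by 100 before writing it out.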
Echarts does the same (*^__^*).
3. Crawl data
In fact, the crawler part is relatively easy (if you have crawled a site or two by following my earlier articles).
Analyze the URL (Qunar attractions) → crawl the information on each page (longitude and latitude of scenic spots, sales volume) → convert it to a JSON file.
Analyzing the Qunar attractions page shows that the URL has this structure: piao.qunar.com/ticket/list…<search location>&region=&from=mpl_search_suggest&page=<page number>
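As a rough sketch of that URL structure, the helper below builds the listing URL for a keyword and a page number. The base URL and query parameters are the ones that appear in the crawler code in this article; the function name build_list_url and the explicit percent-encoding of the keyword are my own additions.

```python
from urllib.parse import quote

# Base URL as it appears in the crawler code in this article.
BASE_URL = "http://piao.qunar.com/ticket/list.htm"


def build_list_url(keyword, page):
    """Return the listing URL for a search keyword and a 1-based page number."""
    # quote() percent-encodes non-ASCII keywords such as Chinese place names.
    return "{}?keyword={}&region=&from=mpl_search_suggest&page={}".format(
        BASE_URL, quote(keyword), page
    )


print(build_list_url("Beijing", 1))
```

Keeping the page number as the only moving part is what lets the crawler walk through the result pages with a simple counter.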
This time the content is matched with XPath instead of regular expressions.
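For readers who have not used XPath before, here is a small self-contained sketch. Note the swap: the crawler in this article uses lxml, whereas this example uses the standard library's xml.etree.ElementTree, which only supports a subset of XPath and needs well-formed markup. The class names name and hot_num come from the article's code; the sample markup, names and links are made up.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample markup that imitates a result list.
SAMPLE_PAGE = """
<div class="result_list">
  <div>
    <h3><a class="name" href="/ticket/detail_1.html">Sample Park</a></h3>
    <span class="hot_num">1200</span>
  </div>
  <div>
    <h3><a class="name" href="/ticket/detail_2.html">Sample Tower</a></h3>
    <span class="hot_num">800</span>
  </div>
</div>
"""


def parse_sights(markup):
    """Return (name, link, tickets sold) for every item in the result list."""
    root = ET.fromstring(markup.strip())
    sights = []
    for item in root.findall("./div"):  # one child <div> per attraction
        link = item.find(".//a[@class='name']")
        sold = item.find(".//span[@class='hot_num']")
        sights.append((link.text, link.get("href"), int(sold.text)))
    return sights


print(parse_sights(SAMPLE_PAGE))
```

Real pages are rarely well-formed XML, which is one reason a forgiving HTML parser such as lxml is the more practical choice for the actual crawl.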
    import re
    import time

    from lxml import etree

    # getPage(url) is the page-download helper; it is not shown in this section.
    def getList():
        place = input('Enter a search region or type (e.g. Beijing, popular attractions): ')
        url = ('http://piao.qunar.com/ticket/list.htm?keyword=' + str(place)
               + '&region=&from=mpl_search_suggest&page={}')
        i = 1
        sightlist = []
        while i:
            page = getPage(url.format(i))
            selector = etree.HTML(page)
            print('Crawling page ' + str(i))
            i += 1
            informations = selector.xpath('//div[@class="result_list"]/div')
            for inf in informations:  # get the necessary information
                sight_name = inf.xpath('./div/div/h3/a/text()')[0]
                sight_level = inf.xpath('.//span[@class="level"]/text()')
                if len(sight_level):
                    # The page text is Chinese; 景区 means "scenic area".
                    sight_level = sight_level[0].replace('景区', '')
                else:
                    sight_level = 0
                sight_area = inf.xpath('.//span[@class="area"]/a/text()')[0]
                # 热度 is the "popularity" label in front of the score.
                sight_hot = inf.xpath('.//span[@class="product_star_level"]//span/text()')[0].replace('热度 ', '')
                sight_add = inf.xpath('.//p[@class="address color999"]/span/text()')[0]
                # Strip the 地址： ("Address:") label, parenthesised notes (full- and
                # half-width), and anything after a comma or a slash.
                sight_add = re.sub(r'地址：|（.*?）|\(.*?\)|，.*?$|\/.*?$', '', str(sight_add))
                sight_slogen = inf.xpath('.//div[@class="intro color999"]/text()')[0]
                sight_price = inf.xpath('.//span[@class="sight_item_price"]/em/text()')
                if len(sight_price):
                    sight_price = sight_price[0]
                else:
                    i = 0  # no price found: end the while loop as well
                    break
                sight_soldnum = inf.xpath('.//span[@class="hot_num"]/text()')[0]
                sight_url = inf.xpath('.//h3/a[@class="name"]/@href')[0]
                sightlist.append([sight_name, sight_level, sight_area, float(sight_price),
                                  int(sight_soldnum), float(sight_hot),
                                  sight_add.replace('地址：', ''), sight_slogen, sight_url])
            time.sleep(3)  # pause between pages
        return sightlist, place
1. All the information for each attraction is scraped here (just to practice XPath...). 2. A while loop wraps the for loop. When an item with no price is found, the for loop sets i to zero and breaks, so the while loop ends at the same time. 3. Address matching uses re.sub() to strip out the more complicated information, for reasons explained later.
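The stop condition in point 2 can be sketched in isolation. The version below replaces the real page download with a made-up dictionary of pages, where a missing price (None) marks the end of the useful data; the extra check for an empty page is my own addition so the loop cannot run forever.

```python
# Hypothetical stand-in for the real pages: page number -> list of (name, price).
# A price of None marks the first item that is no longer on sale.
FAKE_PAGES = {
    1: [("A", "10"), ("B", "20")],
    2: [("C", "30"), ("D", None), ("E", "50")],
    3: [("F", "60")],
}


def crawl(pages):
    """Collect (name, price) pairs until the first item without a price."""
    results = []
    i = 1
    while i:
        items = pages.get(i, [])
        if not items:  # added guard: stop when a page has no items at all
            break
        i += 1
        for name, price in items:
            if price is None:
                i = 0  # makes the while condition false
                break  # leaves the for loop
            results.append((name, float(price)))
    return results


print(crawl(FAKE_PAGES))
```

Setting the page counter to zero doubles as the exit flag, which saves a separate boolean at the cost of being a little less obvious to read.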
4. Save the text locally
To guard against the code failing later on, and to keep the world at peace, the scraped list is saved to an Excel file, which also makes it easy to look things up in the future.
    import pandas as pd

    def listToExcel(sightlist, name):
        # One column per field collected by getList()
        df = pd.DataFrame(sightlist, columns=['name', 'level', 'area', 'starting price',
                                              'sales', 'popularity', 'address', 'slogan',
                                              'details URL'])
        df.to_excel(name + ' attractions.xlsx')
5. Baidu Latitude and Longitude API
Sadly, I couldn't find the longitude and latitude of the scenic spots anywhere on Qunar, and I thought my plan to show off was about to be aborted. (If anyone knows where the longitude and latitude of the scenic spots can be found, please let me know.)
Enhahhahahaha, but how could I possibly give up? I went and found Baidu's longitude and latitude API. URL: api.map.baidu.com/geocoder/v2…address&output=json&ak=Baidu key. Replace "address" and "Baidu key" in the URL, open it in a browser, and you will see the longitude and latitude as JSON.
    # Latitude and longitude information of the Shanghai Oriental Pearl
    {"status": 0, "result": {"location": {"lng": 121.5064701060957, "lat": 31.245341811634675}, "precise": 1, "confidence": 70, "level": "UNKNOWN"}}
How to apply for a Baidu key
So now I can look up the longitude and latitude for the address of every scenic spot I scraped! The Python code for getting the longitude and latitude JSON is as follows.
    import requests

    def getBaiduGeo(sightlist, name):
        ak = 'your Baidu key'
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
        address = 'address'  # the address to look up
        url = 'http://api.map.baidu.com/geocoder/v2/?address=' + address + '&output=json&ak=' + ak
        json_data = requests.get(url=url).json()
        json_geo = json_data['result']['location']
Looking at the returned JSON, the data in location is essentially the format the Baidu API needs; only the sales volume of the scenic spot has to be added to it. This is a good place to read up on shallow and deep copies of JSON objects. Finally, the assembled JSON is written to local files.
    import json

    import requests

    def getBaiduGeo(sightlist, name):
        ak = 'your Baidu key'
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
        bjsonlist = []    # heat-map points for the Baidu Map API
        ejsonlist1 = {}   # Echarts coordinates: name -> [lng, lat]
        ejsonlist2 = []   # Echarts values: [{'name': ..., 'value': ...}]
        num = 1
        for l in sightlist:
            try:
                try:
                    try:
                        address = l[6]  # first try the cleaned address
                        url = 'http://api.map.baidu.com/geocoder/v2/?address=' + address + '&output=json&ak=' + ak
                        json_data = requests.get(url=url).json()
                        json_geo = json_data['result']['location']
                    except KeyError:
                        address = l[0]  # then the name of the scenic spot
                        url = 'http://api.map.baidu.com/geocoder/v2/?address=' + address + '&output=json&ak=' + ak
                        json_data = requests.get(url=url).json()
                        json_geo = json_data['result']['location']
                except KeyError:
                    address = l[2]  # then the area it is located in
                    url = 'http://api.map.baidu.com/geocoder/v2/?address=' + address + '&output=json&ak=' + ak
                    json_data = requests.get(url=url).json()
                    json_geo = json_data['result']['location']
            except KeyError:
                continue  # nothing matched: skip this scenic spot
            json_geo['count'] = l[4] / 100
            bjsonlist.append(json_geo)
            ejson1 = {l[0]: [json_geo['lng'], json_geo['lat']]}
            ejsonlist1 = dict(ejsonlist1, **ejson1)
            ejson2 = {'name': l[0], 'value': l[4] / 100}
            ejsonlist2.append(ejson2)
            print('Generating coordinates for scenic spot ' + str(num))
            num += 1
        bjsonlist = json.dumps(bjsonlist)
        ejsonlist1 = json.dumps(ejsonlist1, ensure_ascii=False)
        ejsonlist2 = json.dumps(ejsonlist2, ensure_ascii=False)
        with open('./points.json', 'w', encoding='utf-8') as f:
            f.write(bjsonlist)
        with open('./geoCoordMap.json', 'w', encoding='utf-8') as f:
            f.write(ejsonlist1)
        with open('./data.json', 'w', encoding='utf-8') as f:
            f.write(ejsonlist2)
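On the shallow versus deep copy point, here is a tiny demonstration using Python's standard copy module. It matters because the code adds a count key to the same location dictionary that came out of the response, so anything else holding a reference to that dictionary sees the change. The name and the count value below are made up; the coordinates are the Oriental Pearl ones from the sample response.

```python
import copy

location = {"lng": 121.5064701060957, "lat": 31.245341811634675}
record = {"name": "Oriental Pearl", "geo": location}

shallow = copy.copy(record)    # new outer dict, but the inner dict is shared
deep = copy.deepcopy(record)   # the inner dict is duplicated as well

location["count"] = 42         # mutate the original inner dict (made-up value)

print("count" in shallow["geo"])  # the shallow copy sees the new key
print("count" in deep["geo"])     # the deep copy does not
```

If the original response needs to stay untouched, taking a deep copy before adding count is the safer route.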
(╯ ‘-‘) ╯ ┻ ━ ┻
When looking up the longitude and latitude, I chose to match on the address of the scenic spot in order to get a more accurate result. But alas, the addresses come in all sorts of magical forms: parenthesised notes saying it is across from XX, a pile of turn-left, turn-right, turn-every-which-way directions, and even English... Hence the removal of complex information back in section 3 (the foreshadowing finally pays off!).
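A minimal sketch of that kind of clean-up, using English sample addresses that I made up. The pattern is an assumption modelled on the description above: drop an "Address:" label, parenthesised notes, and everything after the first comma or slash. The real data is in Chinese, so the actual pattern would presumably need the Chinese label and full-width punctuation as well.

```python
import re

# Assumed pattern: label, parenthesised notes, and trailing directions.
NOISE = re.compile(r"Address:\s*|\(.*?\)|,.*$|/.*$")


def clean_address(raw):
    """Remove labels, parenthesised notes and trailing directions from an address."""
    return NOISE.sub("", raw).strip()


print(clean_address("Address: 1 Century Avenue (opposite the mall), turn left at the gate"))
print(clean_address("88 Lake Road/North Gate"))
```

Cutting at the first comma is deliberately aggressive: it throws away some real information, but a shorter address has a better chance of being recognised by the geocoder.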
Even with the complex information removed, some scenic spots still fail to match, so I use nested try blocks: if the address does not match, match on the name of the scenic spot; if the name does not match, match on the area of the scenic spot; and if that still fails, then I... then I can only... then I'll skip it and move on to the rest... How can a tourist destination be this hard to find! I don't want you!
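The same fallback idea can also be written as a loop over candidate queries instead of nested try blocks. This is only an alternative sketch, not the code used in this article: first_location and fake_lookup are hypothetical names, and fake_lookup stands in for the real geocoding request with made-up data and a made-up failure response.

```python
def first_location(candidates, lookup):
    """Return the location for the first candidate that can be geocoded, else None."""
    for query in candidates:
        try:
            return lookup(query)["result"]["location"]
        except KeyError:
            continue  # no 'result' in this response: try the next candidate
    return None


def fake_lookup(query):
    """Hypothetical stand-in for the geocoding request (invented data)."""
    known = {"Shanghai": {"lng": 121.47, "lat": 31.23}}
    if query in known:
        return {"status": 0, "result": {"location": known[query]}}
    return {"status": 1}  # invented shape for a failed lookup


# Try the address first, then the name, then the area.
print(first_location(["odd address", "Unknown Tower", "Shanghai"], fake_lookup))
```

With a list of candidates, adding a fourth fallback is one more list element rather than another level of nesting.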
Three JSON files are generated here: one for the Baidu Map API and the other two for Echarts.
6. The web page reads the JSON file
Copy the source code of the Baidu Map API example mentioned in section 2 into your editor, add your key, save it as an HTML file, and open it to see the same result as on the official site. For Echarts, click EN in the upper right corner of the example page to switch to the English version, then click Download Demo to download the full source code.
Then modify the page source so that the HTML loads the JSON files.
    <!-- Changes to the Baidu Map API example code -->
    <head>
        <script src="http://libs.baidu.com/jquery/2.0.0/jquery.js"></script>
    </head>
    <script type="text/javascript">
        $.getJSON("points.json", function(data){
            var points = data;
            // the existing code from the example's script goes here
        });
    </script>
Because the page loads the JSON through jQuery, opening the HTML file directly in the browser will not display the result: most browsers block this kind of request for local files, so the page has to be served by a web server.
To start a server, go to the folder containing the HTML file in a terminal, run python -m SimpleHTTPServer (python -m http.server on Python 3), and open http://127.0.0.1:8000/ in the browser. Remember to name the HTML file index.html.
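If you would rather start the server from a script, here is a rough Python 3 equivalent built on the standard library's http.server module. The helper name start_server is my own, and the directory argument of SimpleHTTPRequestHandler requires Python 3.7 or later.

```python
import functools
import http.server
import threading


def start_server(directory, port=8000):
    """Serve `directory` over HTTP on 127.0.0.1 in a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # A daemon thread stops together with the main program.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling start_server(".") from the folder that holds index.html and the JSON files should make the page available at http://127.0.0.1:8000/ for as long as the script keeps running; call shutdown() on the returned server to stop it.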
7. Afterword.
Because my developer account is registered but not verified, I can only call the longitude and latitude API 6K times per day (a good excuse to be lazy), so I took the first 400 pages of popular attractions (15 per page). As expected, more data meant more bugs to debug, and in the end I obtained data for about 4.5K scenic spots (crawled on September 10, 2017 with the keyword "popular attractions"; the figures only represent sales at that time).
Hot spots heat map
Hot Spots list
These are the popular attractions on the map, and I reckon that on National Day they will look like this
like this
and like this
ヾ(*φωφ) Have fun, everyone.
PS: I wrote a web page that shows the Baidu Map heat map and the Echarts attractions list for everyone to check out: easyinfo.online
Finally, I would like to remember WePhone founder Su Xiangmao with awe and regret. I didn’t know him before his suicide, and I don’t want to know him this way. I hope the world of programmers will always be pure and free of fraud.