Introduction:

Having posted quite a few crawler tutorials, I keep running into the same question: how do you crawl Taobao? Its anti-crawling mechanism is so hard to deal with! Can you write a tutorial on crawling Taobao?

Today we will show you how to collect Taobao data and analyze it!

So today's project is a Taobao product data crawler, with the usual data visualization at the end. Without further ado, let's happily begin.

Development tools

**Python version:** 3.6.4

Related modules:

DecryptLogin module;

Pyecharts module;

And some modules that come with Python.

Since this is a crawler example built on simulated login, the first step is naturally to implement the simulated Taobao login. As usual, we use our open-source DecryptLogin library, which gets it done in three lines:

@staticmethod
def login():
    lg = login.Login()
    infos_return, session = lg.taobao()
    return session
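For context, here is roughly how that static method sits inside the crawler class used in this article, together with the DecryptLogin import. This is only a sketch: the import style follows DecryptLogin's documentation, and the constructor shown here is a simplified assumption:

from DecryptLogin import login

class TBGoodsCrawler():
    def __init__(self, **kwargs):
        # keep a logged-in requests session around for all later requests
        self.session = TBGoodsCrawler.login()

    @staticmethod
    def login():
        lg = login.Login()
        infos_return, session = lg.taobao()
        return session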

Also, incidentally, I'm often asked to add persistent cookie support to the DecryptLogin library. You can do that yourself with just a few extra lines of code:

# reuse a previously saved session if one exists, otherwise log in and
# cache the session (with its cookies) to disk for next time
# (requires the os and pickle modules)
if os.path.isfile('session.pkl'):
    self.session = pickle.load(open('session.pkl', 'rb'))
else:
    self.session = TBGoodsCrawler.login()
    f = open('session.pkl', 'wb')
    pickle.dump(self.session, f)
    f.close()
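If you go down the cached-cookies route, note that saved cookies eventually expire. Below is a minimal sketch of a loader with a crude validity check before reusing the cache; the check URL and the redirect-based logic are assumptions you may need to adjust:

import os
import pickle

def load_session():
    '''restore a cached session if it still works, otherwise log in again'''
    if os.path.isfile('session.pkl'):
        with open('session.pkl', 'rb') as f:
            session = pickle.load(f)
        # crude validity check (an assumption): an expired session usually gets
        # redirected away from the "my taobao" page instead of returning 200
        resp = session.get('https://i.taobao.com/my_taobao.htm', allow_redirects=False)
        if resp.status_code == 200:
            return session
    session = TBGoodsCrawler.login()
    with open('session.pkl', 'wb') as f:
        pickle.dump(session, f)
    return session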

I don't really want to build this feature into the library itself, though I do plan to add some other crawling-related features later; more on that another time. Okay, enough of the digression, back to the point. Next, let's go to the web version of Taobao and capture some packets. Press F12 to open the developer tools, then type something into Taobao's product search bar, like this:

A global search of the captured requests for a keyword like "search" turns up a link like this:

Let’s see what it returns:

That should be the right one. If you can't find this interface, try clicking the next-page button in the upper right corner of the product list:

That will definitely capture the request to this interface. A quick test shows that, although the interface seems to take a lot of parameters, only two of them actually have to be submitted:


q: the product name (the search keyword)
s: the offset of the current page
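As a quick sanity check, you can hit the interface from the logged-in session with just these parameters. This is only a sketch: the ajax and ie fields are included so the response comes back as JSON, and the path to the goods list matches what the full code below uses:

# quick sanity check of the search interface (run after logging in)
session = TBGoodsCrawler.login()
params = {'q': '奶茶', 's': '0', 'ajax': 'true', 'ie': 'utf8'}
response = session.get('https://s.taobao.com/search?', params=params)
print(response.status_code)
# the goods list sits under mods -> itemlist -> data -> auctions in the JSON
items = response.json().get('mods', {}).get('itemlist', {}).get('data', {}).get('auctions', [])
print(len(items))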

With this interface and our test results in hand, we can now happily start implementing the Taobao product data crawler. The main code looks like this:

"' external call" 'def run (self) : search_url =' https://s.taobao.com/search? 'while True: goods_name = input (' please type in the name of the commodity information to grab: ') offset = 0 page_size = 44 goods_infos_dict = {} page_interval = random.randint(1, 5) page_pointer = 0 while True: params = { 'q': goods_name, 'ajax': 'true', 'ie': 'utf8', 's': str(offset) } response = self.session.get(search_url, params=params) if (response.status_code ! = 200). break response_json = response.json() all_items = response_json.get('mods', {}).get('itemlist', {}).get('data', {}).get('auctions', []) if len(all_items) == 0: break for item in all_items: if not item['category']: continue goods_infos_dict.update({len(goods_infos_dict)+1: { 'shope_name': item.get('nick', ''), 'title': item.get('raw_title', ''), 'pic_url': item.get('pic_url', ''), 'detail_url': item.get('detail_url', ''), 'price': item.get('view_price', ''), 'location': item.get('item_loc', ''), 'fee': item.get('view_fee', ''), 'num_comments': item.get('comment_count', ''), 'num_sells': item.get('view_sales', '') } }) print(goods_infos_dict) self.__save(goods_infos_dict, goods_name+'.pkl') offset += page_size if offset // page_size > 100: break page_pointer += 1 if page_pointer == page_interval: time.sleep(random.randint(30, 60)+random.random()*10) page_interval = random.randint(1, 5) page_pointer = 0 else: Time.sleep (random.random()+2) print('[INFO]: print('[INFO]: print('[INFO]: print('[INFO]: print('[INFO]: print('[INFO]: print('[INFO]: print('[INFO]: print('... ' % (goods_name, len(goods_infos_dict)))Copy the code

It's as simple as that; we're done. Finally, let's take a look at the code in action:

Data visualization

Here we visualize the milk tea data we collected. First, let's look at how milk tea merchants on Taobao are distributed across the country:
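For reference, here is a rough sketch of the kind of pyecharts code behind a map like this (it assumes the pyecharts v1-style API and that the crawled data is loaded back from the .pkl file; the location parsing is simplified):

import pickle
from collections import Counter

from pyecharts import options as opts
from pyecharts.charts import Map

with open('奶茶.pkl', 'rb') as f:
    goods = pickle.load(f)

# item_loc looks like "广东 广州"; keep only the province part
provinces = Counter(info['location'].split(' ')[0] for info in goods.values() if info['location'])
china_map = (
    Map()
    .add('商家数量', list(provinces.items()), 'china')
    .set_global_opts(
        title_opts=opts.TitleOpts(title='淘宝奶茶商家分布'),
        visualmap_opts=opts.VisualMapOpts(max_=max(provinces.values())),
    )
)
china_map.render('milktea_map.html')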

Unexpectedly, Guangdong has the most milk tea shops. T_T

Let's take a look at the top 10 milk tea shops on Taobao by sales:

And the top 10 milk tea shops on Taobao by number of comments:
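Both of these rankings can be drawn the same way; here is a rough sketch for the sales chart (swap num_sells for num_comments to get the comment-count version). The numeric parsing of the "xxx人付款" sales text is a simplification and, for example, undercounts "1万+" entries:

import pickle
import re
from collections import defaultdict

from pyecharts import options as opts
from pyecharts.charts import Bar

with open('奶茶.pkl', 'rb') as f:
    goods = pickle.load(f)

# sum the (roughly parsed) sales numbers per shop
sales_per_shop = defaultdict(int)
for info in goods.values():
    match = re.search(r'\d+', info['num_sells'])
    if match and info['shope_name']:
        sales_per_shop[info['shope_name']] += int(match.group())

top10 = sorted(sales_per_shop.items(), key=lambda x: x[1], reverse=True)[:10]
bar = (
    Bar()
    .add_xaxis([name for name, _ in top10])
    .add_yaxis('销量', [num for _, num in top10])
    .set_global_opts(
        title_opts=opts.TitleOpts(title='淘宝奶茶店铺销量Top10'),
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=45)),
    )
)
bar.render('milktea_sales_top10.html')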

Now take a look at the proportion of products in these shops that charge shipping versus those that ship free:
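A rough sketch of how such a pie chart can be drawn (again pyecharts v1-style API; treating a view_fee of '0.00' as free shipping is an assumption about that field):

import pickle

from pyecharts import options as opts
from pyecharts.charts import Pie

with open('奶茶.pkl', 'rb') as f:
    goods = pickle.load(f)

free = sum(1 for info in goods.values() if float(info['fee'] or 0) == 0)
paid = len(goods) - free
pie = (
    Pie()
    .add('', [('包邮', free), ('不包邮', paid)])
    .set_global_opts(title_opts=opts.TitleOpts(title='是否包邮占比'))
    .set_series_opts(label_opts=opts.LabelOpts(formatter='{b}: {d}%'))
)
pie.render('milktea_fee_pie.html')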

Finally, take a look at the price range of milk tea related products:
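And a rough sketch for the price-range chart; the bin edges here are arbitrary choices just for illustration:

import pickle
from collections import Counter

from pyecharts import options as opts
from pyecharts.charts import Bar

with open('奶茶.pkl', 'rb') as f:
    goods = pickle.load(f)

def price_bin(price):
    '''map a view_price string to a coarse price range (bins chosen arbitrarily)'''
    price = float(price)
    if price < 10: return '0-10元'
    if price < 30: return '10-30元'
    if price < 50: return '30-50元'
    if price < 100: return '50-100元'
    return '100元以上'

bins = ['0-10元', '10-30元', '30-50元', '50-100元', '100元以上']
counts = Counter(price_bin(info['price']) for info in goods.values() if info['price'])
bar = (
    Bar()
    .add_xaxis(bins)
    .add_yaxis('商品数量', [counts.get(b, 0) for b in bins])
    .set_global_opts(title_opts=opts.TitleOpts(title='奶茶相关商品价格区间分布'))
)
bar.render('milktea_price_bins.html')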

That's about it for today. The related files include all the source code in this article along with the crawled data; if you need them, you can take them directly, click to get. If you can't open the link, join the group: 948351247