An overview of this article:
- Preface
- Statistical results
- Crawler analysis
- Crawler code implementation
- Crawler analysis implementation
- Afterword
- Trailer
Preface
**Image-heavy warning.** It's autumn recruiting season, full of fresh graduates and job-hoppers. Career development has to track market demand, so what does demand look like for each programming language in Shenzhen, and what about the salaries? After the last article, Zone wanted to keep using Python, this time to analyze the current job market in Shenzhen by crawling some recruitment data from Lagou. Here is a sample of the crawled data:
This run collected 4,658 postings in total. Lagou displays at most 30 pages of results per search, with 15 postings per page, so the per-language ceiling is:
30 x 15 = 450
Even allowing for one missed page, that still leaves 435 entries per language, so the data is essentially fully crawled; languages with fewer samples simply don't have more postings in Shenzhen.
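The page-count reasoning above is simple arithmetic; as a quick sanity check (the numbers 30 and 15 come from the article):

```python
# Lagou caps each search at 30 result pages of 15 postings each,
# so one language can yield at most 30 x 15 postings.
MAX_PAGES = 30
POSTINGS_PER_PAGE = 15

ceiling = MAX_PAGES * POSTINGS_PER_PAGE
print(ceiling)                      # 450
print(ceiling - POSTINGS_PER_PAGE)  # 435, even if one page is missed
```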
Statistical results
Average salary by language:
- Accurate recommendation
- Natural language processing
- Machine learning
- Go
- Image recognition
These lead the pack, with quite high average salaries. Blockchain is hot, yet its average salary is not as high as you might expect. When I finished the statistics, I felt I was the one dragging the average down. Oh my! Delete the database and run!
How the average salary is calculated: each posting gives a range, and I take its midpoint. For a posting offering 10k-20k, for example:
(10k + 20k) / 2 = 15k
The per-language figure is then the average of all these midpoints. The benefits on offer are quite generous, such as paid leave, afternoon tea, snacks, and holiday perks.
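The statistic described above can be sketched in a few lines: take the midpoint of each posting's range, then average the midpoints per language (a minimal illustration with made-up numbers):

```python
def salary_mid(salary_min: float, salary_max: float) -> float:
    """Midpoint of one posting's salary range, e.g. (10 + 20) / 2 = 15 (k)."""
    return (salary_min + salary_max) / 2

def average_salary(mids):
    """Average of all midpoints for one language, rounded to 2 decimals."""
    return round(sum(mids) / len(mids), 2)

mids = [salary_mid(10, 20), salary_mid(15, 25), salary_mid(8, 12)]
print(average_salary(mids))  # average of 15.0, 20.0 and 10.0 -> 15.0
```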
Demand declines steadily from Series A to Series D. Most postings come from companies that "don't need financing"; well, perhaps they can't raise it, but their founders have money.
What is the market demand for your language, and do you measure up? Positions asking for three to five years of experience are especially numerous, so experienced engineers need not worry about finding a job. Another trend: the higher the salary, the higher the degree requirement. It seems education still matters after all.
The categories analyzed:
- Java
- Python
- C
- Machine learning
- Image recognition
- Natural language processing
- Blockchain
- Go
- PHP
- Android
- iOS
- Web front end
- Accurate recommendation
- Node.js
- Hadoop
Crawler analysis
- Page fetching: Selenium (Chrome WebDriver)
- HTML parsing: BeautifulSoup, XPath (lxml)
- Word cloud: wordcloud
- Data visualization: pyecharts
- Database: MongoDB
- Database driver: PyMongo
Crawler code implementation
Now that you have seen the statistics, are you itching to implement this yourself? The code follows. Right-click the page and choose Inspect to locate a single item of data:
```
/* 1 */
{
    "_id" : ObjectId("5b8b89328ffaed60a308bacd"),
    "education" : "Undergraduate",           # education requirement
    "companySize" : "Over 2,000 people",     # company headcount
    "name" : "Python Development Engineer",  # job title
    "welfare" : "9 to 5, big platform, many development opportunities, six insurances and one housing fund",  # benefits
    "salaryMid" : 12.5,                      # midpoint of salary floor and cap
    "companyType" : "Mobile Internet",       # company sector
    "salaryMin" : "10",                      # salary floor
    "salaryMax" : "15",                      # salary cap
    "experience" : "3-5 years of experience",  # required experience
    "companyLevel" : "No financing required",  # financing stage
    "company" : "XXX Technology Co., Ltd."     # company name
}
```
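The salaryMin, salaryMax, and salaryMid fields above are derived from the salary text Lagou displays (e.g. "10k-15k"). A minimal sketch of that derivation; the parse_salary helper and the exact text format are assumptions, not the article's code:

```python
import re

def parse_salary(text):
    """Parse a Lagou-style salary string like '10k-15k' into
    (salaryMin, salaryMax, salaryMid). The format is an assumption."""
    m = re.match(r"(\d+)k-(\d+)k", text.lower())
    if not m:
        return None
    lo, hi = int(m.group(1)), int(m.group(2))
    # Min/max are stored as strings in the sample document above.
    return str(lo), str(hi), (lo + hi) / 2

print(parse_salary("10k-15k"))  # ('10', '15', 12.5)
```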
For space reasons, only the main code is shown below:
Getting the page source:

```python
# language       => programming language being crawled
# city           => city being crawled
# collectionType => True: collection named after the language; False: named after the city
def main(self, language, city, collectionType):
    print("Current language => " + language + ", current city => " + city)
    url = self.getUrl(language, city)
    browser = webdriver.Chrome()
    browser.get(url)
    browser.implicitly_wait(10)
    for i in range(30):
        selector = etree.HTML(browser.page_source)  # page source, for XPath parsing
        soup = BeautifulSoup(browser.page_source, "html.parser")
        span = soup.find("div", attrs={"class": "pager_container"}).find("span", attrs={"action": "next"})
        print(span)  # the "next page" button
        classArr = span['class']
        # e.g. ['pager_next'] normally, ['pager_next', 'pager_next_disabled'] on the last page
        print(classArr)
        if "pager_next_disabled" in classArr:  # the next button can no longer be clicked
            print("Reached the last page; the crawler is done.")
            break
        else:
            print("There is a next page; the crawler continues.")
            browser.find_element_by_xpath('//*[@id="order"]/li/div[4]/div[2]').click()  # click "next"
        time.sleep(5)
        print('Page {} captured'.format(i + 1))
        self.getItemData(selector, language, city, collectionType)  # parse postings and save to the database
    browser.close()
```
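The stop condition in the loop above hinges entirely on the next-page button's class attribute. Pulled out as a pure function (a sketch; the class names come from the page markup used in the code):

```python
def has_next_page(class_attrs):
    """Lagou marks the disabled "next" button with the extra class
    'pager_next_disabled'; any other class list means we can keep paging."""
    return "pager_next_disabled" not in class_attrs

print(has_next_page(["pager_next"]))                         # True
print(has_next_page(["pager_next", "pager_next_disabled"]))  # False
```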
Crawler analysis implementation
```python
# Get the sample count for each language
def getLanguageNum(self):
    analysisList = []
    for index, language in enumerate(self.getLanguage()):
        collection = self.zfdb["z_" + language]
        totalNum = collection.aggregate([{'$group': {'_id': '', 'total_num': {'$sum': 1}}}])
        totalNum2 = list(totalNum)[0]["total_num"]
        analysisList.append(totalNum2)
    return (self.getLanguage(), analysisList)

# Get the average salary for each language
def getLanguageAvgSalary(self):
    analysisList = []
    for index, language in enumerate(self.getLanguage()):
        collection = self.zfdb["z_" + language]
        totalSalary = collection.aggregate([{'$group': {'_id': '', 'total_salary': {'$sum': '$salaryMid'}}}])
        totalNum = collection.aggregate([{'$group': {'_id': '', 'total_num': {'$sum': 1}}}])
        totalNum2 = list(totalNum)[0]["total_num"]
        totalSalary2 = list(totalSalary)[0]["total_salary"]
        analysisList.append(round(totalSalary2 / totalNum2, 2))
    return (self.getLanguage(), analysisList)

# Education requirements for a language (for a pyecharts word cloud)
def getEducation(self, language):
    results = self.zfdb["z_" + language].aggregate([{'$group': {'_id': '$education', 'weight': {'$sum': 1}}}])
    educationList = []
    weightList = []
    for result in results:
        educationList.append(result["_id"])
        weightList.append(result["weight"])
    return (educationList, weightList)

# Years of experience required for a language (for a pyecharts word cloud)
def getExperience(self, language):
    results = self.zfdb["z_" + language].aggregate([{'$group': {'_id': '$experience', 'weight': {'$sum': 1}}}])
    totalAvgPriceDirList = []
    for result in results:
        totalAvgPriceDirList.append(
            {"value": result["weight"], "name": result["_id"] + " " + str(result["weight"])})
    return totalAvgPriceDirList

# Collect welfare text for the welfare word cloud
def getWelfare(self):
    content = ''
    queryArgs = {}
    projectionFields = {'_id': False, 'welfare': True}  # only return the welfare field
    for language in self.getLanguage():
        collection = self.zfdb["z_" + language]
        searchRes = collection.find(queryArgs, projection=projectionFields).limit(1000)
        for result in searchRes:
            print(result["welfare"])
            content += result["welfare"]
    return content

# Get company financing stage (for a bar chart)
def getAllCompanyLevel(self):
    levelList = []
    weightList = []
    newWeightList = []
    attrList = ["A round", "B round", "C round", "D round and above",
                "No financing required", "Public company"]
    for language in self.getLanguage():
        collection = self.zfdb["z_" + language]
        results = collection.aggregate([{'$group': {'_id': '$companyLevel', 'weight': {'$sum': 1}}}])
        for result in results:
            levelList.append(result["_id"])
            weightList.append(result["weight"])
    # Merge counts for the same financing stage across languages
    for index, attr in enumerate(attrList):
        newWeight = 0
        for index2, level in enumerate(levelList):
            if attr == level:
                newWeight += weightList[index2]
        newWeightList.append(newWeight)
    return (attrList, newWeightList)
```
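The MongoDB $group/$sum pipelines above just compute grouped counts and sums. In plain Python the same aggregation looks like this (illustrative data, standard library only):

```python
from collections import Counter

# Illustrative postings, shaped like the documents the crawler stores.
postings = [
    {"education": "Undergraduate", "salaryMid": 12.5},
    {"education": "Undergraduate", "salaryMid": 20.0},
    {"education": "Junior college", "salaryMid": 9.0},
]

# Equivalent of {'$group': {'_id': '$education', 'weight': {'$sum': 1}}}
weights = Counter(p["education"] for p in postings)
print(weights)  # Counter({'Undergraduate': 2, 'Junior college': 1})

# Equivalent of summing '$salaryMid' over the whole collection,
# then dividing by the document count, as getLanguageAvgSalary does.
total_salary = sum(p["salaryMid"] for p in postings)
avg_salary = round(total_salary / len(postings), 2)
print(avg_salary)  # 13.83
```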
Afterword
That's all for the overall analysis. If you would also like to see the salary levels and market demand in your own city, feel free to message me in the background. If enough people ask, I'll write a dedicated analysis for your city.
Trailer
I've written a lot about Python recently, but this is a backend-focused account, so I'll be getting back to backend topics. Microservices are a hot concept lately, yet there seem to be few real projects online to learn from. A few days ago I built microservices with both Python and Node.js, so I'll be writing articles about that. Stay tuned!
This article was first published on the official account "zone7". Follow the account for the latest posts; reply [Shenzhen job search] in the background to get the source code.