This article processes the data with the pandas package and plots the results with the Matplotlib package.

1 Data Overview

Total quantity: 34,829
Quantity after deduplication: 970
Most common type: Internet/e-commerce
Number of Internet/e-commerce records: 3,476

2 Company type

The private enterprise

import matplotlib.pyplot as plt
from numpy import arange

def autolabel(rects):
    # Write each bar's height at the top of the bar, centered horizontally.
    for rect in rects:
        height = rect.get_height()
        plt.text(rect.get_x() + rect.get_width() / 2, height, '%s' % height, ha='center', va='bottom')

def company_type_desc(data, file_name="Company type and quantity distribution"):
    # Group the company_type column by its own values.
    companyTypeGroup = data['company_type'].groupby(data['company_type'])
    # Count each group and sort by count, largest first.
    companyTypeCount = companyTypeGroup.count().sort_values(ascending=False)
    plt.figure(figsize=(22, 12))
    # Plot against integer positions, then attach the labels, so the bars keep the sorted order.
    rects = plt.bar(x=arange(len(companyTypeCount.index)), height=companyTypeCount.values)
    plt.xticks(arange(len(companyTypeCount.index)), companyTypeCount.index, rotation=360)
    autolabel(rects)
    plt.title("Company type and quantity distribution")
    plt.xlabel('Type of Company')
    plt.ylabel('number')
    plt.savefig("data/" + file_name + ".jpg")

company_type_desc first selects the company_type column and groups it by its own values. It then counts each group with count and orders the result with sort_values. plt.figure creates the Matplotlib figure and sets its size. The plt.bar and plt.xticks calls work around a display problem: when the data is plotted directly, Matplotlib does not show the X-axis data in the order of the data, so the bars are drawn at integer positions and the labels are attached afterwards. autolabel then writes the quantity at the top of each bar. It uses the text method, which can place text anywhere on the graph, and computes the coordinates from the position and size of each bar.
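The group-count-sort step can be tried on its own, without plotting. The following is a minimal sketch on made-up rows; only the column name company_type comes from the article, and the sample values are assumptions. It also shows that value_counts yields the same totals in one call.

```python
import pandas as pd

# Hypothetical sample rows; only the column name company_type is from the article.
sample = pd.DataFrame({
    "company_type": ["private", "state-owned", "private", "foreign", "private", "foreign"]
})

# Same pattern as in company_type_desc: group the column by itself, count, sort descending.
grouped = sample["company_type"].groupby(sample["company_type"])
counts = grouped.count().sort_values(ascending=False)

# value_counts produces the same per-category totals in a single call.
shortcut = sample["company_type"].value_counts()

print(counts.to_dict())
```

Either form gives a Series indexed by category, which is what the bar chart code expects.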

3 Education processing

college

# Keep only the rows whose education value does not contain "Move".
data = data[data["education"].apply(lambda x: str(x).find("Move") == -1)]
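The same kind of substring filter can be sketched on a few made-up rows. The sample values below are assumptions for illustration, and str.contains is used here as an alternative to the apply/find form, not as the article's own code.

```python
import pandas as pd

# Hypothetical rows; "Move 3 people" stands in for a value that does not belong in this column.
sample = pd.DataFrame({"education": ["college", "Move 3 people", "bachelor"]})

# Build a boolean mask that is True where the marker text is absent.
mask = ~sample["education"].astype(str).str.contains("Move", regex=False)

# Boolean indexing keeps only the rows where the mask is True.
cleaned = sample[mask]

print(cleaned["education"].tolist())
```

Passing regex=False treats the marker as plain text, which matches the behavior of str.find.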

The data after cleaning is as follows

4 Recruitment number processing

headcountCount = headcountGroup.count().sort_values(ascending=False)[0:20]

Some invalid values appeared in the top 20 chart. Because they make up only a small share of the data, they were filtered out directly. The code is as follows:

data = data[data["headcount"].apply(lambda x: str(x).find("publish") == -1)]

The filtered data is displayed

5 Handling the release time

import datetime

import matplotlib.pyplot as plt
from numpy import arange

def deal_publish_data(value):
    # Return True only if value parses as a month-day date.
    try:
        datetime.datetime.strptime(value, '%m-%d')
    except ValueError:
        return False
    return True

def publish_date_plot_desc(data):
    # Strip the "Release" text so that only the month-day part remains.
    data['publish_date'] = data['publish_date'].apply(lambda x: str(x).replace("Release", ""))
    # Keep only the rows whose date converts successfully.
    data = data[data['publish_date'].apply(lambda x: deal_publish_data(x))]
    publishDateGroup = data['publish_date'].groupby(data['publish_date'])
    # Count the postings per date and sort by the index, that is, by date.
    publishDateCount = publishDateGroup.count().sort_index(ascending=False)
    plt.figure(figsize=(22, 12))
    plt.plot(arange(len(publishDateCount.index)), publishDateCount.values)
    plt.xticks(arange(len(publishDateCount.index)), publishDateCount.index, rotation=90)
    plt.title("Release Time Distribution")
    plt.xlabel('Release time')
    plt.ylabel('number')
    plt.savefig("data/Release time distribution -- by time.jpg")

The replace call strips the "Release" text from each value, leaving the date in month-day format. The next statement calls deal_publish_data on each value and keeps only the rows whose date converts successfully. After grouping and counting, sort_index sorts the index column, that is, it sorts by date (in descending order here, because ascending=False).
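The validation step can be sketched in isolation with the standard library alone. The function name is_month_day and the raw sample strings are assumptions made for this illustration; the '%m-%d' format and the "Release" marker come from the code above.

```python
import datetime

def is_month_day(value):
    """Return True if value parses as a month-day date such as '05-17'."""
    try:
        datetime.datetime.strptime(value, "%m-%d")
    except ValueError:
        return False
    return True

# Hypothetical raw values: two valid dates, one impossible month, one non-date.
raw = ["05-17Release", "04-30Release", "13-01Release", "nan"]

# Strip the marker text, then keep only the values that parse.
cleaned = [v.replace("Release", "") for v in raw]
valid = [v for v in cleaned if is_month_day(v)]

# Zero-padded month-day strings sort correctly as plain text.
latest_first = sorted(valid, reverse=True)

print(latest_first)
```

Because the values are zero-padded, sorting them as strings gives the same order as sorting them as dates within a single year.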

6 Industry processing

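# Split multi-valued business entries on '/' so that each industry gets its own row.
data=data['business'].str.split('/', expand=True).stack().reset_index(level=0).set_index('level_0').rename(columns={0:'business'}).join(data.drop('business', axis=1))
# Reset the index first: joining on the duplicated index would multiply the split rows. Then split on ', ' the same way.
data=data.reset_index(drop=True)
data=data['business'].str.split(', ', expand=True).stack().reset_index(level=0).set_index('level_0').rename(columns={0:'business'}).join(data.drop('business', axis=1))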
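For readers on a recent pandas version, the one-row-per-industry result can also be sketched with str.split followed by explode. This is an alternative technique, not the article's code, and the job column and sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; only the column name business is from the article.
sample = pd.DataFrame({
    "job": ["analyst", "engineer"],
    "business": ["Internet/e-commerce", "Finance"],
})

# Split each entry into a list, then give every list element its own row.
exploded = (
    sample.assign(business=sample["business"].str.split("/"))
    .explode("business")
    .reset_index(drop=True)
)

print(exploded)
```

The other columns are repeated for every industry split out of a row, which mirrors what the join in the original statement does.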

Follow the public account and reply “51Job” to get the project code

Is there a future in data analysis? — Data preprocessing (II)
