Most of us have run into trouble renting at one point or another. Living alone in a big city, it is not easy to find a comfortable place to live, so renting somewhere pleasant and affordable matters.
The author of this article, Lu Xing, an engineer at Ali, is one of the many "north drifters" who have moved to Beijing for work. How do you pick the most desirable home out of all the listings on the major rental sites? Today, Lu Xing shows whether data can help make the best choice. The code is at the end.
The trouble
It has been more than two years since I graduated, and I have lived in Ziroom shared apartments the whole time, although I have moved four times for various reasons. Picking a room I like out of a vast sea of listings is laborious work. I am the kind of shopper who heads straight for what I came for, so comparing every attribute across so many rooms to find the best one is a kind of torture. (I have to complain about Ziroom's room search here: the list view does not show exactly where a room is, and there are too few filters for my needs.) So each time I just made do with whatever room I found.
Recently, after yet another move, I had the idea of crawling all of Ziroom's room listings and finding the room that best matches my expectations. Once the whole procedure is in place, I can run it without thinking the next time I want to move.
Crawling the data
The analysis needs the listing data crawled from the Ziroom website. I started with Scrapy, the Python crawling framework, but the first run collected fewer rooms than the site itself shows. The cause turned out to be that some entries on the room list pages are generated dynamically by JavaScript. Scrapy has no JavaScript engine and can only crawl static pages, so that part of the data was never collected. Using scrapy-splash to provide a JavaScript rendering service, I finally collected every room Ziroom had for rent, 7,907 records in total. The sample data collected is as follows. Each line is a string in JSON format.
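Reading such a file back for analysis can be sketched as follows. This is a minimal sketch using only the standard library. The field names (room_id, price, area) and the sample lines are illustrative assumptions, not the crawler's actual schema.

```python
import json

# Hypothetical sample lines: the field names are illustrative assumptions,
# not the schema the crawler actually produced.
sample_lines = [
    '{"room_id": "r1", "price": 2130, "area": 12.5}',
    '{"room_id": "r2", "price": 1890, "area": 9.8}',
    'not valid json',
    '',
]

def load_rooms(lines):
    """Parse one JSON object per line, skipping blank or malformed lines."""
    rooms = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            rooms.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # treat unparseable lines as dirty data
    return rooms

rooms = load_rooms(sample_lines)
print(len(rooms), "rooms loaded")
```

Skipping malformed lines instead of raising keeps one bad record from aborting the whole load, which suits data that still has to be cleaned afterwards.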
An overall impression of rents
I only care about shared housing, so after filtering out dirty data I was left with 4,762 shared-room records. The average and median rents for shared rooms are very close, so the data as a whole shows little skew, with roughly as many rooms at the low end of the price range as at the high end.
The distribution of the number of rooms across price ranges is shown in Figure 1. It is roughly a normal distribution.
FIG. 1 Distribution of the number of rooms in different price ranges
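The two checks behind this impression, comparing the mean with the median and counting rooms per price range, can be sketched as below. The rents are made-up values for illustration, and the 500-yuan bucket width is an assumption, not the bin size used for Figure 1.

```python
from collections import Counter
from statistics import mean, median

# Made-up monthly rents in yuan, for illustration only.
rents = [1830, 2090, 2290, 2390, 2460, 2590, 2790, 3190, 6060]

def price_bucket(rent, width=500):
    """Return the lower edge of the price bucket that a rent falls into."""
    return (rent // width) * width

# Number of rooms per price bucket, i.e. the data behind a histogram.
histogram = Counter(price_bucket(r) for r in rents)
for lower in sorted(histogram):
    print(f"{lower}-{lower + 499} yuan: {histogram[lower]} rooms")

# A mean well above the median would point to a long tail of expensive rooms.
print("mean:", round(mean(rents)), "median:", median(rents))
```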
The mystery of the most expensive room
The figure above shows one room that costs more than 6,000 yuan, which made me curious about what kind of room could be so expensive. It is shown below. Apart from being next to the Xidan shopping district, none of its attributes stand out. I looked up the community, the West Huangchenggen No. 45 courtyard, on Lianjia: the average sale price there is 146,000 yuan per square meter. Well, that seems to explain why the room is so expensive.
To get a better picture of the West Huangchenggen No. 45 courtyard, I searched Ziroom for every room for rent in that community, shown below. It turns out that only this one room is so expensive. The other rooms are not cheap either, but nowhere near as outrageous, and some of them even have better attributes. The room still feels mispriced to me, or maybe it has some hidden attribute (daily spirit MAX).
The rent map
The distribution of room prices on the map is shown in Figure 2. Red marks rooms above 3,000 yuan/month, green marks rooms at 2,000-3,000 yuan/month, and purple marks rooms below 2,000 yuan/month. The darker the color, the more rooms there are at the same location. The north of Beijing is more expensive than the south, and the east is more expensive than the west. To rent a room for less than 2,000 yuan a month, you have to consider going beyond the Fifth Ring Road.
FIG. 2 Distribution of shared rental housing prices on the map
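The colouring rule for the map can be sketched as follows. The coordinates and rents are hypothetical, and counting identical coordinates is only one assumed way to get the "darker means more rooms" effect; the actual plotting tool is not stated.

```python
from collections import Counter

def price_colour(rent):
    """Map a monthly rent in yuan to the colour band described for Figure 2."""
    if rent > 3000:
        return "red"
    if rent >= 2000:
        return "green"
    return "purple"

# Hypothetical (latitude, longitude, rent) records.
rooms = [
    (39.99, 116.48, 3290),
    (39.99, 116.48, 3390),
    (39.93, 116.35, 2590),
    (39.85, 116.30, 1890),
]

# Rooms per (location, colour) cell; a higher count would be drawn darker.
density = Counter((lat, lng, price_colour(rent)) for lat, lng, rent in rooms)
for cell, count in density.items():
    print(cell, count)
```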
What matters most?
Let's take a look at which factors Ziroom weighs most heavily when pricing a room. I used a random forest to predict each room's monthly rent from the following 14 features: room area, Ziroom configuration version (1.0, 2.0, etc.), configuration style (Pudding, Latte, etc.), orientation, the floor the room is on, the total number of floors in the building, distance to the nearest subway station, whether there is a private balcony, whether there is a private bathroom, number of bedrooms, number of living rooms, Beijing district or county, bearing (azimuth angle) relative to Tiananmen Square, and distance from Tiananmen Square. The four categorical features (configuration version, configuration style, orientation, and district or county) are one-hot encoded, which expands the feature set to 41 features. Two-thirds of the data was used to train the model and the remaining third to test it. The model reached a goodness of fit of R² = 0.86 on the test set, and the 10 features with the greatest influence on rent are as follows:
It can be seen that location, room size, transport convenience, and whether there is a private bathroom are the main factors affecting rent. I had always assumed a south-facing room would cost more, but apparently I was wrong.
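The two location features measured against Tiananmen Square, distance and bearing, can be derived from a room's coordinates. The sketch below uses the haversine formula and the standard initial-bearing formula. The Tiananmen coordinates are approximate, and whether the original analysis used great-circle distance is an assumption.

```python
import math

# Approximate coordinates of Tiananmen Square (assumed reference point).
TIANANMEN = (39.9087, 116.3975)
EARTH_RADIUS_KM = 6371.0

def distance_km(origin, point):
    """Great-circle (haversine) distance between two (lat, lng) pairs, in km."""
    lat1, lng1 = map(math.radians, origin)
    lat2, lng2 = map(math.radians, point)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def bearing_deg(origin, point):
    """Initial bearing from origin to point, in degrees clockwise from north."""
    lat1, lng1 = map(math.radians, origin)
    lat2, lng2 = map(math.radians, point)
    dlng = lng2 - lng1
    y = math.sin(dlng) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlng))
    return math.degrees(math.atan2(y, x)) % 360

# Hypothetical room roughly north-east of the square.
room = (39.99, 116.48)
print(round(distance_km(TIANANMEN, room), 1), "km at bearing",
      round(bearing_deg(TIANANMEN, room)), "degrees")
```

Encoding location as a distance plus a bearing lets a tree-based model separate "north versus south" and "east versus west" without needing district boundaries.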
The ultimate goal
Finally, back to the ultimate goal of this analysis: finding the room that best meets my expectations. To do that, I need to rank the rooms by the attributes I care about, which are room area, rent, and distance to my company. I score the rooms with grey relational analysis. The detailed calculation is easy to find online, so I will not reproduce it here.
First, I filtered out the rooms whose attributes fall outside what I am willing to accept, that is, rooms costing 2,200 yuan/month or more and rooms of 8 m² or less. Five records from the filtered data set are shown below:
To make the three attributes dimensionless, I apply deviation (min-max) normalization, x_i^* = (x_i - min(x)) / (max(x) - min(x)), so that each normalized value x_i^* lies in the range [0, 1].
After normalization, the data look like this:
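The filtering and normalization steps can be sketched as follows. The rooms are hypothetical, and the filter assumes that a room is dropped if it fails either threshold (rent of 2,200 yuan or more, or area of 8 m² or less).

```python
# Hypothetical rooms: (name, area in m^2, rent in yuan/month, distance to company in km)
rooms = [
    ("A", 12.0, 1890, 6.5),
    ("B", 7.5, 1690, 3.0),   # dropped: area <= 8
    ("C", 15.0, 2290, 2.0),  # dropped: rent >= 2200
    ("D", 9.0, 1990, 9.0),
    ("E", 10.5, 2090, 4.0),
]

# Keep only rooms inside the acceptable range (assumed: both conditions must hold).
kept = [r for r in rooms if r[2] < 2200 and r[1] > 8]

def min_max(values):
    """Deviation (min-max) normalization of a column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]

# Normalize each attribute column, then turn the columns back into rows.
columns = list(zip(*[r[1:] for r in kept]))
normalised = list(zip(*[min_max(col) for col in columns]))

for room, row in zip(kept, normalised):
    print(room[0], row)
```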
Next, set the optimal (reference) sequence. The ideal room has the largest area, the lowest rent, and the shortest distance to the company, so the optimal sequence is [1, 0, 0]. The correlation (grey relational) coefficient between each attribute of a room and the corresponding attribute of the optimal sequence is then calculated as follows:
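As a sketch of this step, the code below uses the textbook grey relational coefficient, (Δmin + ρ·Δmax) / (Δ + ρ·Δmax), where Δ is the absolute difference from the optimal sequence. The distinguishing coefficient ρ = 0.5 is the common default and an assumption here, as are the normalized rows.

```python
def grey_coefficients(rows, reference, rho=0.5):
    """Grey relational coefficient of every attribute of every row
    against the reference (optimal) sequence."""
    deltas = [[abs(ref - x) for ref, x in zip(reference, row)] for row in rows]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every row equals the reference, so every coefficient is 1.
        return [[1.0 for _ in row] for row in rows]
    return [[(d_min + rho * d_max) / (d + rho * d_max) for d in row]
            for row in deltas]

# Hypothetical normalized rows: (area, rent, distance to the company).
normalised = [
    (1.0, 0.0, 0.5),
    (0.0, 0.5, 1.0),
    (0.5, 1.0, 0.0),
]
optimal = (1, 0, 0)  # largest area, lowest rent, shortest distance

coefficients = grey_coefficients(normalised, optimal)
for row in coefficients:
    print([round(c, 3) for c in row])
```

A coefficient of 1 means the attribute matches the ideal exactly. The value falls toward ρ / (1 + ρ) when the attribute is as far from the ideal as any value in the data set and the smallest difference is zero.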
Since I care about the attributes to different degrees, each attribute needs a weight. The weights are determined with an objective optimization matrix.
Therefore, the weight of room area is 1/6, the weight of rent is 1/3, and the weight of distance to the company is 1/2. The overall correlation coefficient of each room = (correlation coefficient of room area)/6 + (correlation coefficient of rent)/3 + (correlation coefficient of distance to the company)/2. The results are as follows:
The overall correlation coefficients of all rooms were calculated and sorted from largest to smallest, giving the following Top 10:
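The weighting and ranking can be sketched as below. The matrix construction is an assumption chosen because it reproduces the stated weights of 1/6, 1/3 and 1/2: entry (i, j) is 1 when attribute i is at least as important as attribute j, and each weight is a row sum divided by the total. The per-attribute coefficients are hypothetical.

```python
from fractions import Fraction

attributes = ["area", "rent", "distance"]
# Assumed importance order (higher means more important).
importance = {"area": 1, "rent": 2, "distance": 3}

# Assumed construction of the objective optimization matrix: 1 when the row
# attribute is at least as important as the column attribute, else 0.
matrix = [[1 if importance[a] >= importance[b] else 0 for b in attributes]
          for a in attributes]
row_sums = [sum(row) for row in matrix]
weights = [Fraction(s, sum(row_sums)) for s in row_sums]

# Hypothetical per-attribute coefficients (area, rent, distance) per room.
coefficients = {
    "A": (1.0, 1.0, 0.5),
    "D": (1 / 3, 0.5, 1 / 3),
    "E": (0.5, 1 / 3, 1.0),
}

def grade(coeffs):
    """Weighted sum of a room's per-attribute coefficients."""
    return sum(float(w) * c for w, c in zip(weights, coeffs))

# Sort rooms from the highest overall coefficient to the lowest, keep the top 10.
ranking = sorted(coefficients, key=lambda name: grade(coefficients[name]),
                 reverse=True)
top10 = ranking[:10]
print([str(w) for w in weights], top10)
```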
This greatly narrows down the rooms to choose from, and should make renting much less trouble in future. Of course, online listings change quickly and someone else may commit to a room at any moment, so the process has to be rerun each time it is needed. If you screen the rooms and only go to see them two or three days later, the room may already have been taken.
Source: Ali Technology