Preface

Buying a house is a big deal, and we cannot afford to be careless. If we are, we may end up joining the ranks of buyers fighting for their rights, wearing ourselves out, wasting time, effort and money, and suffering no end.

Last year, I went house hunting with friends and toured all three towns of Wuhan. Along the way, it got me thinking:

  • 1. Can we build an evaluation system for real estate?
  • 2. Can we use a good visualization to choose a house?
  • 3. Can we recommend properties intelligently?

1. Real estate evaluation system

1.1. The original intention of buying a house

Everyone has their own reasons for buying a house.

Some office workers said: “I just want to buy a small place next to the company so that getting to work is easy, ideally a second-hand one I can move into right away. I don’t want to wait, and I don’t want to hand my money to a landlord to pay off his mortgage.”

Some unmarried or newly married couples said: “We want a two-bedroom apartment of about 80 square meters to get married in, somewhere convenient for work. We couldn’t afford anything bigger anyway.”

Some couples said: “In a few years we may have a second child, so we want to trade up to a bigger place, ideally in a good school district so the children can get to school easily.”

A young woman with petit-bourgeois tastes said: “A loft is enough for me, but it has to be in a business district, so that movies, window-shopping and shopping are all close at hand.”

A steel-business owner from my hometown told me: “I want to buy somewhere close to a big hospital, with a good environment, good for retirement.”

Of course, there are speculators who don’t care: “I’ll buy the one that appreciates fast!”

However, most people’s wallets are limited, so they need to compare and reconsider again and again. After all, buying a house these days takes “six wallets”, so you cannot be careless.

1.2. Quantitative dimensions of real estate

Everyone’s needs are different, and the key is how to find the house that satisfies you. After listening to these motivations, we can sort out some key dimensions:

  • 0. Location
  • 1. Unit price
  • 2. Area
  • 3. Total price
  • 4. Number of bedrooms
  • 5. Whether it is fully decorated
  • 6. Whether it is a new house
  • 7. Property fee
  • 8. Property services
  • 9. Medical care
  • 10. Education
  • 11. Business
  • 12. Transportation
  • ... (you can add your own)

Dimensions 0 to 7 can be treated as objective, while 8 to 12 are somewhat subjective. For the subjective ones, it is better to set up an expert rating system to score them (for example, out of 10).

Therefore, a set of values can describe one property, and many properties can be described with data like the following:

    // Fabricated sample data; fields whose values were lost are left as comments.
    export const HOUSE_DATA = [
      {
        name: 'Sinohouse',
        lng: 114.411094,
        lat: 30.471274,
        unitPrice: 18000,
        area: 100,
        totalPrice: 1800000, // unitPrice * area
        bedroomNum: 3,
        isRenovation: 1, // 1 = fully decorated
        isNewHouse: 0, // 0 = second-hand
        propertyFee: 2.2,
        useFor: 'on sale',
        // location: (value missing in the source)
        medicalScore: 7,
        educationScore: 8,
        businessScore: 10,
        trafficScore: 9,
        propertyScore: 4,
        environmentScore: 5,
      },
      {
        name: 'Qingriver landscape',
        lng: 114.401809,
        lat: 30.475233,
        unitPrice: 19600,
        area: 121,
        totalPrice: 2371600,
        bedroomNum: 3,
        isRenovation: 1,
        isNewHouse: 0,
        propertyFee: 2.8,
        useFor: 'rental',
        // location: (value missing in the source)
        medicalScore: 7,
        educationScore: 8,
        businessScore: 9,
        // trafficScore, propertyScore, environmentScore: (values missing in the source)
      },
      {
        name: 'Optical Valley Coordinate City',
        lng: 114.412862,
        lat: 30.474282,
        unitPrice: 22000,
        area: 90,
        totalPrice: 1980000,
        bedroomNum: 2,
        isRenovation: 1,
        isNewHouse: 0,
        propertyFee: 3.1,
        // useFor, location: (values missing in the source)
        medicalScore: 6,
        educationScore: 7.8,
        businessScore: 9,
        trafficScore: 7.1,
        propertyScore: 6.2,
        environmentScore: 7.4,
      },
      // ...
    ];

Note: The above data are completely fabricated. The hard part here is building the expert scoring system, that is, how to score each subjective dimension of a house objectively. For example, a house close to the subway can be considered to have good transportation, but the specific score should be assessed by experts based on the actual situation. Since this article is purely a thought experiment, the data is mocked.

2. Multidimensional data visualization

Traditional visualization methods, such as line charts, bar charts and pie charts, can only be used to analyze low-dimensional data (two or three dimensions). When the data has more than three dimensions, it is hard to understand it directly with these methods. Since human vision can only perceive two- or three-dimensional space, multi-dimensional data is usually mapped to a low-dimensional space. In this article, parallel coordinates and maps are used to visualize real estate data.

2.1. What are parallel coordinates

A more formal definition:

Parallel coordinates represent m-dimensional data on a two-dimensional plane using m equidistant parallel axes. Each axis represents one attribute dimension, and its value range runs from the minimum to the maximum of that attribute. Each m-dimensional data item therefore yields m points, one on each axis according to its attribute values, and connecting these m points in order forms a polyline. Every data item is thus represented by one polyline, and similar data items have polylines with similar shapes.
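The definition above can be sketched in a few lines. This is only an illustration of the mapping, not how ECharts or any plugin implements it; the function names `axisExtents` and `toPolyline`, the sample values and the plot size are all made up here.

```javascript
// Compute the [min, max] range of every dimension across the data set.
function axisExtents(items, dimensions) {
  const extents = {};
  for (const dim of dimensions) {
    const values = items.map((item) => item[dim]);
    extents[dim] = [Math.min(...values), Math.max(...values)];
  }
  return extents;
}

// Map one item to m points, one per parallel axis.
// Axes are spaced evenly along x; y encodes the value (maximum at y = 0, the top).
function toPolyline(item, dimensions, extents, width, height) {
  const gap = dimensions.length > 1 ? width / (dimensions.length - 1) : 0;
  return dimensions.map((dim, i) => {
    const [min, max] = extents[dim];
    const t = max === min ? 0.5 : (item[dim] - min) / (max - min);
    return { x: i * gap, y: height * (1 - t) };
  });
}

// Hypothetical sample with three dimensions.
const sample = [
  { unitPrice: 18000, area: 100, bedroomNum: 3 },
  { unitPrice: 22000, area: 90, bedroomNum: 2 },
];
const dims = ['unitPrice', 'area', 'bedroomNum'];
const extents = axisExtents(sample, dims);

// Three points for the first item; joining them in order gives its polyline.
console.log(toPolyline(sample[0], dims, extents, 200, 100));
```

Drawing the chart then comes down to rendering one such polyline per data item.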

An official ECharts example chart is referenced here:

Click inside and you’ll see what’s going on.

2.2. Parallel-coordinates plugin based on D3.js

Since the parallel coordinates offered by ECharts are relatively basic, this article uses parallel-coordinates (its GitHub page has many tutorials and papers on parallel coordinates, so I won’t repeat the links here). Its ES6 version is parcoords-es.

Here’s a basic example of it. Its main features:

  • 1. Several brush modes (for filtering data)
  • 2. Axis reordering
  • 3. Different colors for different lines
  • 4. Curved lines (reduces visual clutter)
  • 5. Bundling of lines (also reduces visual clutter)
  • 6. Other features you can explore on your own.

2.3. Visual representation of real estate data based on parallel coordinates

The visualization page designed for this article is shown above. It is divided into three parts (the parallel coordinates area, a table and a map), all of which are linked with one another.

2.3.1. Brush operation

If you want houses with a unit price below 20,000, an area of around 100 square meters, three bedrooms and a high business score, you can brush the axes like this:

The list and map display the filtered results simultaneously.
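Conceptually, brushing several axes just keeps the rows whose values fall inside the selected range on every brushed axis. A minimal sketch of that idea follows; `applyBrushes` is a hypothetical helper rather than the plugin's API, and the ranges chosen for "around 100 square meters" (90 to 110) and "high business score" (9 or more) are my own assumptions.

```javascript
// Keep only the items whose value lies inside [min, max] on every brushed axis.
function applyBrushes(items, brushes) {
  return items.filter((item) =>
    Object.entries(brushes).every(
      ([dim, [min, max]]) => item[dim] >= min && item[dim] <= max
    )
  );
}

// A subset of the fabricated fields from the sample data.
const houses = [
  { name: 'Sinohouse', unitPrice: 18000, area: 100, bedroomNum: 3, businessScore: 10 },
  { name: 'Qingriver landscape', unitPrice: 19600, area: 121, bedroomNum: 3, businessScore: 9 },
  { name: 'Optical Valley Coordinate City', unitPrice: 22000, area: 90, bedroomNum: 2, businessScore: 9 },
];

// One [min, max] range per brushed axis (assumed thresholds).
const brushes = {
  unitPrice: [0, 20000],
  area: [90, 110],
  bedroomNum: [3, 3],
  businessScore: [9, 10],
};

const selected = applyBrushes(houses, brushes);
console.log(selected.map((h) => h.name));
```

The same filtered array can then feed both the table and the map, which is what keeps the three views in sync.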

2.3.2. Positioning

After filtering, you can view the details of each result, and click its location to jump to the property's position on the map.

In fact, a supplementary feature could be added here: clicking a property's icon on the map would show the hospitals, bus stops, schools and other facilities within a few kilometers (the Homelink app already has this feature). It is not in the demo because I was too lazy to create the data.

2.3.3. Switch axes

Swapping axes changes the order of the parallel axes and produces a different view. Typically, axes for closely related dimensions are moved next to each other so that the relationship between them shows up more clearly.

For example, if I want to know the relationship between property fee and property service quality, I can drag one of the two axes next to the other, as shown in the figure below:

A high property fee does not necessarily mean a high level of property service. If you are after value for money, you want houses with a low property fee but a high property service score. For this you can use the second kind of brush, angle filtering, as shown in the picture. It is like picking out the houses whose lines fall within a certain slope range.
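One way to picture angle filtering is as a test on the slope of each line segment between two adjacent axes. The sketch below is a simplification under my own assumptions (axis gap of 1 unit, values normalized to 0..1, higher values drawn higher), not the plugin's actual angular brush; `segmentAngle` and `angleFilter` are hypothetical names and the three houses are invented.

```javascript
// Normalize a value to [0, 1] within an axis range.
function normalize(value, [min, max]) {
  return max === min ? 0.5 : (value - min) / (max - min);
}

// Angle in degrees of the segment between two adjacent axes, taking the gap
// between the axes as 1 unit. Positive means the line rises from left to right.
function segmentAngle(item, leftDim, rightDim, extents) {
  const left = normalize(item[leftDim], extents[leftDim]);
  const right = normalize(item[rightDim], extents[rightDim]);
  return (Math.atan2(right - left, 1) * 180) / Math.PI;
}

// Keep the items whose segment angle lies inside [minDeg, maxDeg].
function angleFilter(items, leftDim, rightDim, extents, minDeg, maxDeg) {
  return items.filter((item) => {
    const angle = segmentAngle(item, leftDim, rightDim, extents);
    return angle >= minDeg && angle <= maxDeg;
  });
}

// Invented houses: fee on the left axis, service score on the right axis.
const candidates = [
  { name: 'A', propertyFee: 2.0, propertyScore: 9 },
  { name: 'B', propertyFee: 3.0, propertyScore: 5 },
  { name: 'C', propertyFee: 4.0, propertyScore: 4 },
];
const feeScoreExtents = { propertyFee: [2, 4], propertyScore: [0, 10] };

// Steeply rising lines mean a low fee paired with a high service score.
const goodValue = angleFilter(
  candidates, 'propertyFee', 'propertyScore', feeScoreExtents, 30, 45
);
console.log(goodValue.map((h) => h.name));
```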

2.3.4. Other functions

You can also choose which dimensions to hide, the brush mode, the colors, whether to use curves, and whether to allow axis swapping.

3. Intelligent property recommendation and similarity search

Smart recommendation is scarily effective. I often get sucked into Douyin’s recommended videos, so apps like Douyin keep getting installed and uninstalled on my phone. Since I don’t know the fancy machine-learning recommendation algorithms, I cobbled together my own approach.

3.1. Similarity distance measurement

Here is a plain-language introduction to similarity measures:

Birds of a feather flock together, but there are so many people in the world. In non-mathematical terms: fair-weather friends are one category, people who like playing cards are one category, and people who like playing games are, at a stretch, another. In other words, we want to find the similarities between people. So the question is: how do you measure them? If I have n people, the first thing to do is measure the similarity of every pair. This is a combinations problem: n points are joined by n(n-1)/2 line segments, so that is how many comparisons are needed. The question then becomes how to measure the similarity between two people, which makes things easier.

Suppose there is a fool A and another fool B. How do you calculate how similar they are? We can only measure the similarity between A and B in certain respects, such as gender, liking for card games, liking for Dota, drinking capacity and musical taste. All right, let’s keep it to these few. So fool A has a vector A: [male, likes cards, plays Dota, can drink a catty and a half, medium musical taste], and fool B also has a vector: [female, does not like cards, does not play Dota, does not drink, medium-to-high musical taste]. Next we need to turn these two vectors into numbers. For example, 1 is male and 0 is female; 1 means likes playing cards and 0 means does not. For drinking, first set an upper limit, say two catties, and let 1 represent two catties, so A’s catty and a half scores 0.75 and B scores 0. For musical taste, A is 0.5 and B is 0.75. So the numeric vectors are:

A: [1, 1, 1, 0.75, 0.5]

B: [0, 0, 0, 0, 0.75]

Now let’s figure out how similar A is to B in these respects. One of the simplest algorithms is the well-known Euclidean distance:

Dissimilarity AB = $\sqrt{(1-0)^2 + (1-0)^2 + (1-0)^2 + (0.75-0)^2 + (0.5-0.75)^2} \approx 1.9039$. The larger the value, the greater the difference.

Now along comes another fool C: male, does not like cards, does not play Dota, does not drink, high musical taste. So C: [1, 0, 0, 0, 1]

Computing the other two pairs in the same way:

Dissimilarity AC = 1.6771

Dissimilarity BC = 1.0308

If we are forced to divide these three people into two categories, it is obvious that A is one category, and B and C together form the other.
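The calculation above is easy to reproduce. The sketch below uses the three numeric vectors from the text, with B written out with all five components (gender, cards, Dota, drinking, music); the function name `euclideanDistance` is just a name picked for this example.

```javascript
// Euclidean distance between two numeric vectors of equal length.
function euclideanDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// [gender, cards, Dota, drinking, music]
const A = [1, 1, 1, 0.75, 0.5];
const B = [0, 0, 0, 0, 0.75];
const C = [1, 0, 0, 0, 1];

console.log('AB', euclideanDistance(A, B).toFixed(4));
console.log('AC', euclideanDistance(A, C).toFixed(4));
console.log('BC', euclideanDistance(B, C).toFixed(4));
```

B and C come out as the closest pair, which is why they end up in the same category.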

For the commonly used cosine similarity, see Ruan Yifeng’s article.
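For comparison, here is a minimal sketch of cosine similarity, which measures the angle between two vectors rather than the distance between them. The function name and the choice to return 0 for an all-zero vector (where the measure is undefined) are assumptions made for this example.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// For vectors with non-negative components the result lies in [0, 1].
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Undefined for a zero vector; treated as 0 here by assumption.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 2, 3], [2, 4, 6])); // same direction
console.log(cosineSimilarity([1, 0], [0, 1])); // perpendicular
```

Unlike Euclidean distance, a larger value here means the two vectors are more alike.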

There are many other similarity and distance measures; you can look them up.

With a similarity measure, we can therefore obtain the degree of similarity between any two properties, and recommend to users the properties most similar to the ones they have saved or liked.
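A rough sketch of that recommendation step might look like the following. It is an assumption-laden illustration rather than the demo's code: `recommendSimilar` is a hypothetical helper, the listings are invented, and min-max normalization is added so that a large-valued field such as unit price does not dominate the distance.

```javascript
// Return the topN houses closest to `favorite` over the given dimensions.
function recommendSimilar(houses, favorite, dimensions, topN) {
  // Range of each dimension, used for min-max normalization.
  const ranges = dimensions.map((dim) => {
    const values = houses.map((h) => h[dim]);
    return [Math.min(...values), Math.max(...values)];
  });
  const toVector = (house) =>
    dimensions.map((dim, i) => {
      const [min, max] = ranges[i];
      return max === min ? 0 : (house[dim] - min) / (max - min);
    });

  const favVector = toVector(favorite);
  return houses
    .filter((house) => house !== favorite)
    .map((house) => {
      const v = toVector(house);
      const distance = Math.sqrt(
        v.reduce((sum, x, i) => sum + (x - favVector[i]) ** 2, 0)
      );
      return { house, distance };
    })
    .sort((p, q) => p.distance - q.distance) // smallest distance first
    .slice(0, topN)
    .map((entry) => entry.house);
}

// Invented listings for illustration only.
const listings = [
  { name: 'P1', unitPrice: 18000, area: 100, trafficScore: 9 },
  { name: 'P2', unitPrice: 18500, area: 98, trafficScore: 8 },
  { name: 'P3', unitPrice: 26000, area: 140, trafficScore: 5 },
  { name: 'P4', unitPrice: 21000, area: 110, trafficScore: 7 },
];

const recommended = recommendSimilar(
  listings, listings[0], ['unitPrice', 'area', 'trafficScore'], 2
);
console.log(recommended.map((h) => h.name));
```

Which dimensions to include is a design choice: a user who only cares about price and commute could pass just those two.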

3.2. Similarity search of Kmeans clustering algorithm

The idea of K-means is to divide a set of data into K classes, using a similarity measure as the criterion. The steps are as follows:

1. First, choose a value for K, the number of sets we want to obtain by clustering the data set.

2. Randomly select K data points from the data set as the initial centroids.

3. For each point in the data set, calculate its distance (for example, the Euclidean distance) to each centroid, and assign the point to the set of its nearest centroid.

4. Once every point has been assigned, there are K sets in total. Recalculate the centroid of each set.

5. If the distance between each new centroid and its previous position is less than a set threshold (meaning the centroids have barely moved and the result is stable, or converged), we can consider the clustering done and the algorithm terminates.

6. If the new centroids have moved significantly from their previous positions, repeat steps 3 to 5.
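The six steps above can be sketched as follows. This is a bare-bones illustration, not a production implementation: the names `kmeans` and `pickRandom`, the `initialCentroids` option (handy for reproducible runs), the iteration cap and the handling of empty clusters are all assumptions of this example.

```javascript
function distance(a, b) {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// Step 2: choose k distinct points at random.
function pickRandom(points, k) {
  const pool = [...points];
  const picked = [];
  for (let i = 0; i < k; i++) {
    const index = Math.floor(Math.random() * pool.length);
    picked.push(pool.splice(index, 1)[0]);
  }
  return picked;
}

function kmeans(points, k, options = {}) {
  const { initialCentroids, threshold = 1e-6, maxIterations = 100 } = options;
  if (k < 1 || k > points.length) throw new Error('k must be between 1 and points.length');

  let centroids = (initialCentroids || pickRandom(points, k)).map((c) => [...c]);
  let labels = [];

  for (let iter = 0; iter < maxIterations; iter++) {
    // Step 3: assign every point to its nearest centroid.
    labels = points.map((p) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (distance(p, centroids[c]) < distance(p, centroids[best])) best = c;
      }
      return best;
    });

    // Step 4: move each centroid to the mean of its members.
    const next = centroids.map((old, c) => {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old; // keep an empty cluster where it is
      return old.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });

    // Steps 5 and 6: stop once no centroid moved more than the threshold.
    const moved = Math.max(...next.map((c, i) => distance(c, centroids[i])));
    centroids = next;
    if (moved < threshold) break;
  }
  return { centroids, labels };
}

// Two obvious groups; fixed starting centroids make the run repeatable.
const points = [[1, 1], [1.2, 0.8], [0.8, 1.2], [8, 8], [8.2, 7.8], [7.8, 8.2]];
const result = kmeans(points, 2, { initialCentroids: [[1, 1], [8, 8]] });
console.log(result.labels, result.centroids);
```

For real estate data, each point would be a property's vector over the chosen dimensions, ideally normalized first so that no single dimension dominates the distance.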

For details, see the article.

Therefore, we can divide the real estate data into K categories according to the desired dimensions, and each category is represented by a different color to facilitate user screening.

Conclusion

This article is purely my personal thinking about house hunting. It would be hard to put into practice; it is just one way of looking at the problem.

The code is too messy to post.

It’s my first time writing here, so please go easy on me.

Update: Demo address, repository address

The code is very messy, even buggy, so please be forgiving when you read it.
