I also saw the report of data fraud at Uxin. As a programmer with a habit of rigor, I like to settle questions with technology, so I wrote a simple crawler to check in detail how authentic the data on Uxin is.
First, open the Uxin Beijing page. It reports 62,475 vehicles, spread over 300 pages. Each page shows 40 cars, so 40 × 300 = 12,000 cars are displayed in total, which means 12,000 URLs. I wrote a simple Python crawler that scanned all 300 Beijing pages and found only 1,921 distinct URLs among those 12,000.
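A minimal sketch of that counting step, not the crawler I actually ran: it takes the HTML of each list page, pulls out the detail-page links, and compares the number of displayed slots with the number of distinct URLs. The `/car/` link marker and the function names are assumptions made for illustration (the real Uxin URL scheme may differ), and page downloading is left out so the example runs offline.

```python
from html.parser import HTMLParser

# Assumption: detail-page links contain this marker; the real site may differ.
LISTING_MARKER = "/car/"


class ListingLinkParser(HTMLParser):
    """Collects href values of <a> tags that look like listing detail links."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and LISTING_MARKER in href:
            self.urls.append(href)


def extract_listing_urls(html):
    """Return every listing URL found in one list page, in page order."""
    parser = ListingLinkParser()
    parser.feed(html)
    return parser.urls


def count_displayed_and_unique(pages_html):
    """Return (displayed slots, distinct URLs) over all list pages."""
    displayed = 0
    unique = set()
    for html in pages_html:
        urls = extract_listing_urls(html)
        displayed += len(urls)
        unique.update(urls)
    return displayed, len(unique)


# Two tiny fake pages standing in for the 300 real ones.
page1 = '<a href="/car/1.html">A</a><a href="/car/2.html">B</a>'
page2 = '<a href="/car/2.html">B</a><a href="/about">About</a>'
displayed, unique = count_displayed_and_unique([page1, page2])
print(displayed, unique)
```

On the real Beijing pages, the same comparison is what gave 12,000 displayed slots against 1,921 distinct URLs.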
The specific data is shown in the table. Of the 12,000 displayed vehicles, nearly 10,000 come from the repeated appearance of just 40 vehicles.
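One way to produce this kind of breakdown, sketched under the assumption that the crawl yields a flat list with one URL per displayed slot: count how often each URL appears, then see how many slots the most repeated vehicles occupy. The function name `repeat_breakdown` and the sample data are hypothetical.

```python
from collections import Counter


def repeat_breakdown(displayed_urls, top_n=40):
    """Summarize how many display slots are taken up by repeated vehicles."""
    counts = Counter(displayed_urls)
    total = len(displayed_urls)
    unique = len(counts)
    # Slots occupied by the top_n most frequently repeated URLs.
    top_slots = sum(count for _, count in counts.most_common(top_n))
    return {
        "total_slots": total,
        "unique_vehicles": unique,
        "duplicate_slots": total - unique,
        "top_slots": top_slots,
    }


# Toy data: "a" appears 5 times, "b" 3 times, "c" and "d" once each.
sample = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
print(repeat_breakdown(sample, top_n=2))
```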
The national page shows the same pattern: 300 pages of cars and a lot of duplicate data. The page claims more than 1.8 million vehicles, but the 12,000 display slots across the 300 national pages contain only 1,637 distinct vehicles. The specific data are as follows:
The article also said that at Uxin one car often corresponds to multiple listing numbers. After some observation, I found that the listing IDs of all Uxin vehicles are auto-incrementing, so I started from ID 1000001 and scanned upward. After more than ten hours of concurrent scanning on multiple virtual machines, I found 14422633. The result is as follows:
The deduplication rule is that two listings count as the same car when all the main information on the Uxin page (brand, model, car series, model year, color, licensing time, mileage, price, city) is identical. Also, the crawl took a long time overall and the data is slow to refresh, so the figures are not up-to-date real-time data, but they do indicate the magnitude.
I don't know whether listing fraud is common in used-car e-commerce. To be fair, I also looked at Guazi and Renrenche, both of which advertise very aggressively. First, I looked at what their pages display:
The total number of Uxin vehicles in China is 1,840,187
Renrenche does not show the total number of vehicles available
The total number of Guazi used cars in China is 98,019
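For reference, the ID scan and deduplication rule described earlier could be sketched roughly as below. This is an illustration under stated assumptions, not my actual crawler: `fetch_listing` is a hypothetical callable that returns a dict of fields for an ID, or `None` when no listing exists; the field names in `KEY_FIELDS` are my own labels for the nine fields in the rule; and a thread pool stands in for the multiple virtual machines. A fake in-memory site replaces real requests so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical field names for the nine fields in the deduplication rule.
KEY_FIELDS = (
    "brand", "model", "series", "model_year", "color",
    "licensing_time", "mileage", "price", "city",
)


def scan_ids(start_id, end_id, fetch_listing, workers=8):
    """Fetch every ID in [start_id, end_id] concurrently, skipping empty IDs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(fetch_listing, range(start_id, end_id + 1))
        return [listing for listing in results if listing is not None]


def dedup(listings):
    """Keep one listing per distinct combination of the key fields."""
    seen = {}
    for listing in listings:
        key = tuple(listing.get(field) for field in KEY_FIELDS)
        seen.setdefault(key, listing)  # first listing with this key wins
    return list(seen.values())


def _car(**overrides):
    """Build a fake listing; used only for this demonstration."""
    base = dict(
        brand="VW", model="Golf 1.6", series="Golf", model_year=2014,
        color="white", licensing_time="2014-06", mileage=42000,
        price=8.5, city="Beijing",
    )
    base.update(overrides)
    return base


# Fake site: IDs 1000001 and 1000002 are the same car under two numbers.
FAKE_SITE = {
    1000001: _car(),
    1000002: _car(),
    1000004: _car(color="black"),
}


def fake_fetch(listing_id):
    car = FAKE_SITE.get(listing_id)
    return None if car is None else {"id": listing_id, **car}


found = scan_ids(1000001, 1000005, fake_fetch)
unique_cars = dedup(found)
print(len(found), len(unique_cars))
```

Keying on a tuple of all nine fields means two listings merge only when every field matches, which is the strict rule stated above; a looser rule would merge more listings and lower the count of real cars further.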
I crawled the national list pages of Renrenche and Guazi used cars, and neither has the repeated-display problem. I counted their listings and compared them with the real vehicle count of Uxin (time: February 10, 2017). The results are as follows:
All three sites update their listings in real time, and there is a time lag between any two crawls. During that gap each site may add new listings or take sold ones down, so data you capture yourself will deviate somewhat from the data I captured. I think a deviation below two digits is reasonable, but there would not be a deviation on the order of millions…
The numbers are in, so judge for yourself.