A few days ago, the State Film Administration released data on China’s film market for 2019. Total box office last year reached 64.266 billion yuan, up 5.4% year on year. Domestic films took 41.175 billion yuan, up 8.65% year on year and accounting for 64.07% of the market. Urban cinemas recorded 1.727 billion admissions, up 0.64% year on year.

That looks like a thriving market, doesn’t it? But as a serious, realistic data analyst, I see a warning sign in the official figures: if domestic box office grew 8.65%, why did the number of moviegoers grow by less than 1%?
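One way to make that question concrete is to divide the figures quoted above. Box office divided by admissions gives an average ticket price, and the ratio of the two growth factors gives the implied price growth. This is only a rough sketch: the variable names are my own, and it assumes the box office and admissions figures cover the same screenings.

```python
# Figures quoted from the official 2019 release.
total_box_office = 64.266e9   # yuan
admissions = 1.727e9          # moviegoers in urban cinemas
box_office_growth = 0.054     # +5.4% year on year
admissions_growth = 0.0064    # +0.64% year on year

# Average ticket price = box office / admissions.
avg_price = total_box_office / admissions

# If revenue = price * admissions, price growth is the ratio of the
# two growth factors minus one.
price_growth = (1 + box_office_growth) / (1 + admissions_growth) - 1

print(f"average ticket price: {avg_price:.2f} yuan")
print(f"implied ticket price growth: {price_growth:.2%}")
```

Under those assumptions the average ticket comes out at about 37 yuan, and almost all of the 5.4% growth is price rather than audience.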

The best way to find out is to look at the data. We’ll do what we always do: crawl the data with Python, analyze it in a BI tool, and let the truth come out.

I. Analysis objectives and indicators

First, the goal: analyze the domestic film market using data on films released in 2019, mainly to find the relationship between box office and audience size.

How do you measure a film? Anyone who follows the industry knows a handful of indicators: “box office”, “box office share”, “attendance rate”, “screening share”, “rating” and so on. Our data comes from Maoyan Movies (“Cat’s Eye”). Because the bar for rating a film on Maoyan is low and the scores may be inflated by paid reviewers, I leave out the “rating” indicator this time.

II. Python crawling

Now we crawl the data. Maoyan’s page structure is fairly simple, so the crawl is too. I won’t show it in full here; I’ll just point out the steps that need attention.

Note: to get the source code, send me a private message with the word “movie”.

1. Look at the structure first

Open the Maoyan page we want to crawl and work out what information it holds. Once you have the general picture, right-click, choose to view the page source, and find where the data we need sits in the source.


2. Request data disguised as a browser

This trick is well worn, so I won’t go into detail: just pass the headers parameter when sending the request.
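For readers who have not seen the trick, here is a minimal sketch using only the standard library. The URL is a hypothetical placeholder and the User-Agent string is just an example value. My own script may use a different HTTP library, but the idea is the same: attach browser-like headers to the request.

```python
import urllib.request

# Hypothetical placeholder URL; substitute the page you actually crawl.
URL = "https://example.com/films"

# A browser-like User-Agent string (an example value, not a required one).
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/79.0 Safari/537.36"
    )
}

def build_request(url: str) -> urllib.request.Request:
    """Return a request object that carries browser-like headers."""
    return urllib.request.Request(url, headers=HEADERS)

def fetch(url: str) -> str:
    """Send the request and return the page source as text."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Build the request without sending it, and check the header is attached.
request = build_request(URL)
print(request.get_header("User-agent"))
```

Calling fetch(URL) would send the request; it is left uncalled here so the sketch runs without network access.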

3. Extract data

Box office and similar figures on Maoyan are rendered with an obfuscated font, so we need to decode the font. Although the character codes change, the glyph objects do not. First download one font file as a base TTF file and write out which character each of its codes stands for. When we later download another font file, online_base64.ttf, we compare the glyph objects: if a glyph is the same, assign the character recorded for the base code to the new code, and that’s it.
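The matching step can be sketched with plain dictionaries. Everything here is illustrative: the glyph codes and outlines are made up, and the outlines are stand-in tuples of points. In a real crawler they would be read from the two TTF files with a font library, and the comparison might need some tolerance rather than exact equality.

```python
# Glyph outlines are stood in for by tuples of (x, y) points.
base_glyphs = {
    "uniE137": ((0, 0), (10, 0), (10, 20)),
    "uniF4A2": ((0, 0), (5, 15), (10, 0)),
}
# Written out by hand once, after inspecting the base font.
base_code_to_digit = {"uniE137": "7", "uniF4A2": "4"}

# Same shapes under different codes: a freshly downloaded font.
new_glyphs = {
    "uniC0DE": ((0, 0), (5, 15), (10, 0)),
    "uniB33F": ((0, 0), (10, 0), (10, 20)),
}

def map_new_codes(base_glyphs, base_code_to_digit, new_glyphs):
    """Match each new code to a base glyph by outline and copy its digit."""
    outline_to_digit = {
        outline: base_code_to_digit[code]
        for code, outline in base_glyphs.items()
    }
    return {
        code: outline_to_digit[outline]
        for code, outline in new_glyphs.items()
        if outline in outline_to_digit
    }

def decode(codes, mapping):
    """Turn a sequence of obfuscated codes into a readable digit string."""
    return "".join(mapping[c] for c in codes)

new_code_to_digit = map_new_codes(base_glyphs, base_code_to_digit, new_glyphs)
print(decode(["uniC0DE", "uniB33F"], new_code_to_digit))
```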

4. Call the main program and save to Excel

First create an empty list and add all the data to it. In the earlier extract function, change print(data) to yield data; once every record has been added to the list, the data can be saved.
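The pattern looks roughly like this. The records and field names are hypothetical. To stay within the standard library, the sketch writes CSV (which Excel opens directly) into an in-memory buffer instead of producing a real .xlsx file.

```python
import csv
import io

def extract(pages):
    """Yield one record per film instead of printing it."""
    for page in pages:
        for film in page:
            yield film

# Hypothetical parsed pages; real records would come from the crawler.
pages = [
    [{"title": "Film A", "box_office": 100}],
    [{"title": "Film B", "box_office": 50},
     {"title": "Film C", "box_office": 25}],
]

rows = []                     # start with an empty list
rows.extend(extract(pages))   # collect everything the generator yields

def save(rows, handle):
    """Write the collected records as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=["title", "box_office"])
    writer.writeheader()
    writer.writerows(rows)

# The buffer stands in for an open file on disk.
buffer = io.StringIO()
save(rows, buffer)
print(buffer.getvalue())
```

Switching from print to yield is what lets the caller decide what to do with each record, whether that is collecting it in a list or writing it straight to a file.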

5. Points to note

  • Download a base font file and work out which digit each of its codes stands for.
  • Every time you crawl a page, download that page’s font file first, then compare it with the base font file to get the digit for each code on the page being crawled.

III. BI analysis

With the source file in hand, we can start the BI analysis. Why not use Python? Because building something like an 80/20 analysis model takes more code than is practical in day-to-day work.

So I generally use a professional BI tool for data analysis. There are many BI tools on the market and their quality is uneven. Here I use FineBI, a strong representative of Chinese-made BI tools, as the example.

Note: to get the FineBI download address, send me a private message with the word “movie”.

1. Data connection

First, import the data to analyze. FineBI can connect to Excel, CSV and XML files and to all kinds of databases. Since we already have the Excel table produced by the Python crawl, we can simply choose Excel and import it.

2. Data processing

Crawled data may need a second round of processing, such as cleaning dirty data, merging and filtering. FineBI handles this through self-service datasets: you process the original data as required and create a new dataset for analysis. The available operations include selecting fields, filtering, grouping and summarizing, adding columns, field settings, sorting and merging.

3. Data visualization

Because the indicators involved this time are fairly simple, most of the visualizations can be built in FineBI by dragging and dropping data fields.

IV. Conclusions

Without further ado, the conclusions:

  • The domestic film market is close to saturation, and the 2019 results are a false prosperity.
  • The head effect is intensifying: most films had dismal box office and performed poorly in the market.
  • Box office growth is driven mainly by ticket prices, while the number of moviegoers is barely increasing.

1. Top 20 grossing movies

More than half of the top 20 box office earners of 2019 are domestic films, which looks prosperous. But the bar chart shows clear tiers. Ne Zha, The Wandering Earth and Avengers: Endgame are in the first tier, each taking more than 4 billion yuan. My Country and I, The Captain, Crazy Alien and Aquaman are in the second tier at around 2 to 3 billion yuan. The rest are below 2 billion yuan, with Looking Up in 20th place at 800 million yuan.

Overall, last year’s domestic market had plenty of blockbusters, but the distribution is a staircase with steep drop-offs: most of the box office is concentrated in the top five, broadly in line with the 80/20 rule.

2. The Pareto model of box office

To check whether the Pareto rule holds, I used FineBI to add a cumulative percentage of box office.

The result is clear: the top 20% of films took more than 80% of the total box office. In other words, last year’s domestic market was carried almost entirely by a few big hits. Box office that is more and more concentrated is definitely not a good thing: it means most films performed weakly and have little room to survive.
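For anyone who wants to reproduce the cumulative percentage outside a BI tool, the calculation is short. This is a sketch with made-up numbers, not the real 2019 data, and the function names are my own.

```python
from itertools import accumulate

# Hypothetical box office figures (any unit); not the real 2019 data.
box_office = [50, 30, 5, 4, 3, 3, 2, 1, 1, 1]

def cumulative_share(values):
    """Sort descending and return the running share of the total."""
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]

def top_share(values, fraction=0.2):
    """Share of the total held by the top `fraction` of items."""
    shares = cumulative_share(values)
    count = max(1, round(len(values) * fraction))
    return shares[count - 1]

print(f"top 20% of films hold {top_share(box_office):.0%} of the box office")
```

If the share held by the top 20% of films is at or above 80%, the data follows the 80/20 pattern.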

3. The relationship between box office ratio, screening rate and box office

  • Box office ratio: a film’s box office as a percentage of the total box office. The higher the ratio, the more people wanted to see the film, which usually points to better quality.
  • Screening rate: the share of all screenings given to a film. Films with a high screening rate and low box office are flops, while films with a low screening rate and high box office are dark horses.
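One rough way to put the two indicators side by side is to divide one by the other. The ratio below is my own illustration, not a built-in FineBI measure, and the figures are invented. A value above 1 means a film earned more box office than its share of screenings would suggest.

```python
# Hypothetical shares in percent; not the real 2019 figures.
films = {
    "Film A": {"box_office_share": 30.0, "screening_share": 20.0},
    "Film B": {"box_office_share": 10.0, "screening_share": 25.0},
    "Film C": {"box_office_share": 15.0, "screening_share": 15.0},
}

def efficiency(film):
    """Box office share earned per unit of screening share."""
    return film["box_office_share"] / film["screening_share"]

def label(film):
    """Describe how a film did relative to the screenings it was given."""
    ratio = efficiency(film)
    if ratio > 1:
        return "outperforms its screenings"
    if ratio < 1:
        return "underperforms its screenings"
    return "in line with its screenings"

for title, film in films.items():
    print(title, round(efficiency(film), 2), label(film))
```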

Compare this chart with the top 20 bar chart. Which film is a dark horse in the true sense, with high box office, a high box office ratio and a low screening rate? The answer is The Wandering Earth.

Ne Zha’s high screening rate came from the lack of quality films competing with it during its release window, so its success owes partly to popular support (renhe) and partly to good timing (tianshi). “Crazy Alien” performed as expected, “Aquaman” is a typical commercial film, and “My Country and I” is a special case that cannot be generalized.

4. Relationship between attendance and box office

  • Attendance: how full a film’s screenings are. Good films have high attendance, and vice versa.

For comparison, I’ve added a reference line for average attendance. “My Country and I” and “I Sacrifice For You” top the chart. The odd ones are “Pegasus”, “The New King of Comedy” and “The Climbers”: their attendance rates are very high but their box office is unsatisfying, which is probably down to the drawing power of their directors and stars.

“The Wandering Earth” has an above-average attendance rate and is a good film from every angle. It is hard to fault.

Encouragingly, the films with the highest attendance rates are mostly domestic, which suggests foreign films alone cannot satisfy most viewers’ tastes.

5. Some additional analysis

The relationship between movie genre and attendance

Comedies are far ahead of the pack, animated films are emerging, science fiction is on the rise, and niche genres such as thriller, suspense and history remain bleak.


The darker the color, the higher the attendance, and the larger the font, the higher the box office

Although Chen Kaige is often criticized as the king of bad films, it has to be said that his results are still very good, and others such as Ning Hao, Han Han, Guo Fan and Chen Guohui are the hope of domestic films.

Finally, don’t forget: both the Python source code and the BI download address are available if you send me a private message with the word “movie”.