The overall approach to building a big data platform

What about other people’s big data platforms?

If you have attended technology sharing forums or conferences, large or small, you will have noticed a pattern. In the many new slide decks with titles like “XXX company big data platform practice: invincible hands-on sharing”, whenever the technical components of the big data platform come up, most of them present a more or less identical system architecture diagram.

This architecture diagram shows data collection components for logs and databases, storage and computing engines, and monitoring and scheduling systems. However these are actually applied in practice, none of them is ever missing from the diagram. Apart from a few components being swapped, added, or removed, one company’s big data platform does not look very different from another’s.

So if you are asked what a big data platform infrastructure diagram looks like, you could do worse than use the diagram of Hortonworks’ HDP distribution suite instead of drawing your own, as shown below.

Cultivating a service mindset and product thinking

Functional positioning of the big data platform team

To discuss how a big data platform can be run as a service, and to evaluate its service level, we must first discuss the platform’s functional positioning and service scope. Unfortunately, this is not a question with a single answer.

● In some companies, the big data team is only responsible for developing and operating the basic components, providing services to the business side in the form of SDKs, component suites, or clusters.

● In some companies, the tools and platforms built on top of the basic components are the responsibility of a dedicated tools team, with a layered division of labor and teams serving one another.

● In some companies, different business units build their own business systems vertically on top of the underlying platform components, with the developers of the basic platform components serving the developers of the upper-layer business systems.

● In some companies, the big data team works from the bottom up, from cluster operation and maintenance to the final output of specific end-user business data, and is accountable to the users of that data.

Workflow (job) scheduling system

The workflow scheduler is undoubtedly one of the most important core components of a big data platform. It is an essential part of any non-trivial big data development platform of even modest scale.

Workflow scheduling systems are also commonly called job schedulers, task scheduling systems, workflow job scheduling systems, or simply scheduling systems, depending on the context, the naming convention, and the scope of functionality being referred to. Where no ambiguity arises, we may use these terms interchangeably below.

As a system serving a relatively complex business environment, a workflow scheduling system covers a lot of ground. It targets a variety of scenarios, implementation approaches differ widely, and building one takes both theory and practice.
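To make the core idea concrete, here is a minimal sketch of one thing such a system has to do: run jobs only after the jobs they depend on have finished. It assumes the workflow can be described as a set of named jobs plus their upstream dependencies. The function `run_workflow` and the job names are hypothetical, chosen only for illustration, and a real scheduler also has to handle timing, retries, resources, and failures, none of which is shown here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(jobs, dependencies):
    """Run each job after all of its upstream jobs have finished.

    jobs: dict mapping job name -> zero-argument callable (hypothetical jobs)
    dependencies: dict mapping job name -> set of upstream job names
    Returns the job names in the order they were executed.
    """
    # Every job appears in the graph, even if it has no upstream jobs.
    graph = {name: set(dependencies.get(name, ())) for name in jobs}
    executed = []
    # static_order() yields upstream jobs first and raises
    # graphlib.CycleError if the dependencies form a cycle.
    for name in TopologicalSorter(graph).static_order():
        jobs[name]()
        executed.append(name)
    return executed


# Hypothetical three-step workflow: import logs, clean them, build a report.
log = []
jobs = {
    "daily_report": lambda: log.append("daily_report"),
    "clean_logs": lambda: log.append("clean_logs"),
    "import_logs": lambda: log.append("import_logs"),
}
dependencies = {
    "clean_logs": {"import_logs"},
    "daily_report": {"clean_logs"},
}
executed = run_workflow(jobs, dependencies)
```

Because each job here depends on the previous one, the jobs run in dependency order regardless of the order in which they were declared.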

Security and permission control

Permission management on a big data platform: doesn’t it sound like plain user and password management? Find a database to store the mapping between the two, find a place to keep track of what everyone is allowed to do, and then verify it when needed. Leaving aside the principles and algorithms of encryption and decryption, is there anything worth discussing about this topic?

In fact, once you actually work in this area, you will soon find that, at both the technical level and the product level, permission management in a big data platform environment is a vexing hot potato. It is not only a technical problem but also a business problem, and perhaps even a philosophical problem of interpersonal communication and trade-offs.
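For reference, the naive view described above (a store of users and passwords, a record of what everyone can do, and a check when needed) amounts to roughly the following sketch. The class `NaivePermissionStore`, its method names, and the sample user and resource are all hypothetical, and the in-memory dictionaries stand in for the database. It is meant to show how little the naive view covers, not how a real platform should do it.

```python
import hashlib
import hmac
import os


class NaivePermissionStore:
    """In-memory stand-in for the 'database' in the naive view."""

    def __init__(self):
        self._credentials = {}  # user -> (salt, password hash)
        self._grants = set()    # (user, resource, action) triples

    @staticmethod
    def _hash(password, salt):
        # Store a salted hash rather than the password itself.
        return hashlib.pbkdf2_hmac(
            "sha256", password.encode("utf-8"), salt, 100_000
        )

    def add_user(self, user, password):
        salt = os.urandom(16)
        self._credentials[user] = (salt, self._hash(password, salt))

    def grant(self, user, resource, action):
        self._grants.add((user, resource, action))

    def authenticate(self, user, password):
        record = self._credentials.get(user)
        if record is None:
            return False
        salt, expected = record
        # Constant-time comparison of the stored and computed hashes.
        return hmac.compare_digest(expected, self._hash(password, salt))

    def is_allowed(self, user, resource, action):
        return (user, resource, action) in self._grants


# Hypothetical user and resource names.
store = NaivePermissionStore()
store.add_user("alice", "s3cret")
store.grant("alice", "warehouse.orders", "read")
can_read = store.authenticate("alice", "s3cret") and store.is_allowed(
    "alice", "warehouse.orders", "read"
)
```

The sketch says nothing about who decides that a grant is appropriate or how it is reviewed and revoked, which is where the business and communication problems mentioned above begin.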

How to be a bad big data platform engineer

All happy families are alike; each unhappy family is unhappy in its own way.

I originally wanted to approach this from the angle of how to become an excellent big data platform development engineer, but on reflection, that topic is too easy! I am not Jeff Dean incarnate, and I would be embarrassed to claim I am in a position to tell people what to do. But who can’t state the principles?

Take stock trading: isn’t it just buying low and selling high? Running an Internet business: isn’t it just acquiring traffic and monetizing it? And to become an excellent big data platform development engineer, you only need both depth and breadth: study the technology, understand the product, be able to design architecture and fix bugs, and you are set.

If the truth were that simple, would it need any more explanation? By now we would presumably either be effortlessly outperforming Buffett or already have taken off on the latest tailwind!

Yes, excellent people are all alike, which makes them too boring to talk about.

This book has 10 chapters, ranging from the basic ideas behind building a big data platform to the career development of big data programmers. Without a big-picture view, it is hard to grasp what the book is really about!

Space is limited, so the remaining chapters are not introduced here; the rest is left for readers to discover for themselves.


In short, life is a journey of problem solving, and anxiety is inevitable, but a moderate sense of crisis is not necessarily a bad thing. It’s important not to let anxiety cloud your judgment too much and to empower yourself to actively choose your future.

Whether or not you go on to work on big data platform development, we sincerely hope that you can face the problems in your work and life squarely, look for solutions objectively and calmly, and build a full and worthwhile life for yourself with sound methods and values.