The company's own project data is limited, and industry-wide online and offline consumption data is hard to obtain. Without enough data, how do you apply big data at all? And how do you get onto real high-concurrency big data projects given your company's current situation?

You want hands-on experience with high-concurrency architecture, or to work on internet-scale projects, but you don't want to leave your current job?

How can small and medium-sized enterprises break the deadlock

Big data naturally tends toward technological monopoly. That makes it hard for long-tail small and medium-sized enterprises to build up their own big data methods and experience, hard for small companies to change their fate, and easy for them to lose their best engineers. How can this situation be broken?

Small and medium-sized enterprises (SMEs) usually sit in the long tail, with small amounts of data. They are basically unable to generate large-scale data themselves, let alone build a data-sharing platform. Moreover, there is a natural trust gap between enterprises, so sharing corporate data is difficult. This underlying logic leads to one essential problem: SMEs can hardly pool enough data together, and therefore can hardly enjoy the advantages of big data technology and big data platforms.

But if an SME gets the opportunity, it must step into big data, because the era of the intelligent internet has arrived. Paving the way now is how you avoid being easily eliminated later: in this era, the strong get stronger and the weak get weaker.

Is there a good way? A feasible strategy for SMEs is to "feed off big customers": find large customers to cooperate with, serve them, and then use those customers' platform data to exercise your own big data capabilities. This requires excellent business and technical cooperation, an open mind, and lasting patience!

For example, I once knew an architect at a small software company. Through its business relationships, the company won a provincial government big data project that, by any normal logic, it lacked the capacity to deliver. But the key requirement of the project was data cleaning for its core business domain, and my friend's company happened to be the business expert in exactly that field. That expertise gave the company the confidence to go all in and take the project. Working with the project's big data consultants, my friend designed the technical architecture and co-developed and implemented the data pipeline. After about two years, the company had quickly acquired real capacity to deliver big data projects, and it has since advertised itself everywhere as a big data technology enterprise. My friend has also become a well-known big data architect.

Of course, many SMEs are not software companies, but they can still set up big data application departments to attract excellent talent, or form strategic alliances with professional, entrepreneurial big data technology teams. Focus closely on the data business of your own industry and develop a strategy that fits your company's characteristics. Then go to the sectors that have the capacity to produce large-scale data (government, telecommunications, banking, energy, manufacturing, and so on) and provide big data service solutions, reviving their dead seas of data into data assets. By becoming a big data strategy and service provider, you gain full access to large-scale data. This not only gives large enterprises the capability to collect, analyze, and operate their data, but also quickly builds your own company's strength and brand in the big data era.

For example: suppose you are a small manufacturer of medical devices. You are strong in your vertical, but your future commercial ceiling is low. If you change your nameplate to a "medical devices + big data + AI" provider, treat the devices as data-collection entry points, go further and offer medical big data cleaning services to the government, seize one or a few key highlights of medical demand analysis, build an AI analysis platform, and get onto the big data procurement lists, then this small company's prospects will certainly improve greatly.

Another example: you are a traffic video systems integrator. If you keep positioning yourself as a systems integrator, you will never escape the fiercely competitive, low-margin red sea. But if you cooperate with an innovative AI technology company and then, leveraging your industry's business resources and your ability to transform flexibly, reposition yourself as a "traffic big data + AI" provider, your business landscape becomes a completely different story. This not only forms a viable business model but also lays a good foundation for the company in the intelligent internet era.

How can engineers at small and medium-sized enterprises break the deadlock

The hardest choice for engineers today is between personal development and attachment to the company. If the company is developing well but you feel no attachment, staying is painful; if the company's development is limited but you are attached to it, you lose confidence in your own future.

In essence, the question is: how do you get access to better projects? On high-concurrency and big data projects, for example, engineers not only get real practice and improvement, but can also offer valuable suggestions for the company's rapid growth, and they become harder to eliminate as the industry evolves. So here are some analysis and methods.

First, for high-concurrency and big data problems, almost every ultimate solution converges on distributed storage plus real-time stream processing: convert bursts of concurrent requests into a high-throughput, ordered queue, which fundamentally reduces contention for CPU, memory, and I/O under high concurrency; then shard the data and land it on different distributed database nodes to achieve horizontal scalability for upserts.
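A minimal in-process sketch of this idea, using Python's standard `queue` module as a stand-in for a message broker such as Kafka or RocketMQ: concurrent producers are absorbed into an ordered queue, and a single worker drains it at its own pace, so the downstream store sees serial writes instead of contended concurrent ones.

```python
import queue
import threading

def serialize_requests(requests, handle):
    """Absorb concurrent requests into an ordered queue; one worker drains it."""
    q = queue.Queue()          # stand-in for Kafka / RocketMQ
    results = []

    def worker():
        while True:
            item = q.get()
            if item is None:   # sentinel: no more requests
                break
            results.append(handle(item))
            q.task_done()

    t = threading.Thread(target=worker)
    t.start()
    # In production, many producers would call q.put() concurrently;
    # the queue preserves arrival order either way.
    for r in requests:
        q.put(r)
    q.put(None)
    t.join()
    return results

# The "database write" now runs serially, with no lock contention.
writes = serialize_requests(range(5), lambda x: x * x)
```

The point is not the queue itself but the shape of the solution: the resource-contending step is pulled out of the request path and executed at a controlled, serial rate.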

Take Nginx, for example: no matter how large and high-performance the load-balancing tier is, once requests are dispatched to business processing, database queries and writes can still become the bottleneck. At that point you need to redesign the requests as data queues, for example via Kafka or RocketMQ, let the current node process them quickly, and for the storage connection use engines such as HBase, MongoDB, or Kudu that implement row-level transactions.

Finally, there is the exercise. For business requirements and business models, refer directly to the business systems of your own company's projects; business complexity usually has little to do with big data or high concurrency. For data sets, download one similar to your business from Kaggle, then clean and transform it; this process is also good practice for data cleaning and ETL. Clean out the data you need and turn it into a big data source. If Kaggle really has nothing similar to your business, write your own simulation client program to generate data, and let it run for three days to see the results.
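If no public data set fits, the simulation-client approach can be sketched as below. The schema (`user_id`, `amount`, `city`) is purely hypothetical; the point is generating deliberately messy records and then writing the ETL pass that cleans them.

```python
import random

def generate_events(n, seed=42):
    """Simulate a client producing raw, messy events (hypothetical schema)."""
    random.seed(seed)
    events = []
    for i in range(n):
        events.append({
            "user_id": i % 10,
            # amounts are sometimes missing or invalid, on purpose
            "amount": random.choice([round(random.uniform(1, 100), 2), None, -1]),
            # city names arrive in inconsistent casing/whitespace
            "city": random.choice(["beijing", " Beijing ", "SHANGHAI", ""]),
        })
    return events

def clean(events):
    """A basic ETL pass: drop invalid amounts, normalize the city field."""
    out = []
    for e in events:
        if e["amount"] is None or e["amount"] < 0:
            continue                      # drop records with bad amounts
        e["city"] = e["city"].strip().lower() or "unknown"
        out.append(e)
    return out

raw = generate_events(1000)
cleaned = clean(raw)
```

Scale the generator up and let it run for days, and you have a realistic, continuously growing data source to feed the rest of the pipeline.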

While building your own data sources, try different combinations of Kafka, Redis, HBase, TiDB, MySQL, ELK, MongoDB, and Hadoop, aiming for three goals: 1. ACID transactions over the data set; 2. distributed storage of the data; 3. sub-second response for data processing and queries. In practice, under high concurrency, both big data transaction processing (OLTP) and data analysis (OLAP) architectures rarely step outside this solution pattern.
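The second goal, distributed storage, can be prototyped without any cluster at all. A sketch of hash-based sharding, with three in-memory dicts standing in for storage nodes (the node names and key format are made up for illustration):

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]      # stand-ins for HBase/TiDB nodes
stores = {n: {} for n in NODES}

def node_for(key):
    """Pick a node by hashing the key; the mapping is stable across runs."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

def upsert(key, value):
    """Route each row to its shard; adding nodes scales writes horizontally."""
    stores[node_for(key)][key] = value

for i in range(100):
    upsert(f"user:{i}", {"visits": i})
```

Simple modulo hashing reshuffles most keys when the node count changes; real systems use consistent hashing or range partitioning for that reason, but the routing idea is the same.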

Most importantly, this method suits engineers at small-data companies who want to sharpen their own skills. They will certainly gain a more comprehensive grasp of the technology stack than engineers at large companies, though there is no denying that large-company engineers have much deeper hands-on experience at scale.

Finally, after stream processing, the multiple streams are aggregated and written into the library-table business model. At this stage, choose MySQL Cluster or the NewSQL database TiDB to form the relational data model and land the data in a distributed transactional environment.
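The aggregate-then-land step can be sketched with `sqlite3` standing in for MySQL Cluster / TiDB (the table and the two tiny streams are hypothetical): several streams are merged in memory, then committed to the relational model in a single transaction.

```python
import sqlite3

# sqlite3 stands in for MySQL Cluster / TiDB; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_stats (page TEXT PRIMARY KEY, views INTEGER)")

click_stream = [("home", 1), ("about", 1), ("home", 1)]
search_stream = [("home", 1)]

# Multi-stream aggregation: merge counts from both streams per page.
agg = {}
for page, n in click_stream + search_stream:
    agg[page] = agg.get(page, 0) + n

with conn:  # one atomic commit, mimicking the distributed-transaction step
    for page, views in agg.items():
        conn.execute(
            "INSERT INTO page_stats VALUES (?, ?) "
            "ON CONFLICT(page) DO UPDATE SET views = views + excluded.views",
            (page, views),
        )

rows = dict(conn.execute("SELECT page, views FROM page_stats"))
```

The upsert (`ON CONFLICT ... DO UPDATE`) is what makes repeated batch landings idempotent-friendly, which matters when streams are replayed.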

Nginx + Redis + MySQL + TiDB (distributed database), with MongoDB (which can replace MySQL for some services) and Elasticsearch (search engine): this pattern can support the vast majority of query workloads.

Second, let's take a blog-post architecture as an example, as shown in the figure below: how does the architecture operate under extremely high concurrency, with over 100 million data records generated every day?

Anyone who regularly writes for online media knows that editing means frequent modifications, with the cloud continuously saving drafts. For a top-traffic posting site, this scale of frequent content writes would crush any relational database. You need the K-V database capabilities of a big data platform to support high-speed, high-volume article editing and frequent updates.
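A toy sketch of the draft-autosave path, with a plain dict standing in for a K-V store such as Redis or HBase (the key format is made up): every keystroke-level save is a cheap overwrite of one key, and the relational database is never touched until publication.

```python
# A plain dict stands in for a K-V store such as Redis or HBase.
kv = {}

def autosave_draft(article_id, content):
    """Frequent draft writes hit only the K-V store, never the relational DB."""
    kv[f"draft:{article_id}"] = content

def load_draft(article_id):
    """Reopening the editor reads the latest draft straight from the K-V store."""
    return kv.get(f"draft:{article_id}")

# An editor saving on every change: repeated cheap overwrites of one key.
for text in ["H", "He", "Hel", "Hell", "Hello"]:
    autosave_draft(42, text)
```

Because each article maps to one key, a million concurrent editors produce a million independent single-key writes, exactly the workload K-V stores are built for.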

Then, after submission, the system always enters a review process, and we always wait seconds, minutes, or even tens of minutes. Why? Because the submitted content is being queued, and the waiting time is determined by how fast the queue of published and modified articles can be processed within a given period. During queue processing, various machine/AI algorithms perform streaming filtering of the article content: sensitive words, prohibited content, duplicate content, and so on.
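The review pipeline can be sketched as a queue plus a filter stage (the blocklist and the simple word-match check are hypothetical; real platforms use ML models for this step):

```python
from collections import deque

SENSITIVE = {"spamword", "banned"}   # hypothetical blocklist

review_queue = deque()               # stand-in for the platform's review queue

def submit(article_id, text):
    """Submission only enqueues; the author waits while the queue drains."""
    review_queue.append((article_id, text))

def process_queue():
    """Drain the queue; only articles that pass filtering get published."""
    published, rejected = [], []
    while review_queue:
        article_id, text = review_queue.popleft()
        words = set(text.lower().split())
        if words & SENSITIVE:
            rejected.append(article_id)
        else:
            published.append(article_id)  # only now touch the relational DB
    return published, rejected

submit(1, "a normal article")
submit(2, "contains spamword here")
published, rejected = process_queue()
```

The user-visible wait is exactly the queue's drain rate, which is why review times stretch during traffic spikes.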

Only after review passes does the article officially land in the real business relational database. At that point true business consistency is formed, and by then the high-concurrency pressure on the relational database has already been greatly reduced.

Read another detailed article on high-concurrency and big data technology:

How a blog site's architecture is gradually upgraded and optimized: what does an architecture handling 100 million daily writes look like?
