What is a big data architect? A big data architect is a system-level developer of big data platforms who knows the core frameworks of mainstream platforms such as Hadoop, Spark, and Storm. The role calls for in-depth knowledge of writing MapReduce jobs and managing job flows to carry out data computation, the ability to use the general-purpose algorithms that Hadoop provides, and enough command of the Hadoop ecosystem's components, such as YARN, HBase, Hive, and Pig, to develop platform monitoring and auxiliary operations and maintenance systems. Below are the four points you must master to become a big data architect.
1. Why do you need to build a data architecture?
Inconsistent data standards (the same column name with different data types or different lengths, no unified standard for column names, different definitions for the same column name, the same Chinese name with different English abbreviations, the same English abbreviation with different Chinese names)
Data standardization management (building a dynamic term dictionary, automatic detection of standard compliance, automatic application of standards, a standards management process, building the basic content of the knowledge base, providing unified standards for big data applications)
Results of standardized management and the verification mechanism (application, verification, standards knowledge base, inspection result report)
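To make the idea of automatic compliance detection concrete, here is a minimal sketch. It assumes the standards knowledge base can be reduced to a dictionary that maps each column name to an expected type and length; the names `STANDARD`, `COLUMNS`, and `check_compliance` are hypothetical and exist only for this illustration.

```python
# Hypothetical standard dictionary: column name -> (data type, length).
STANDARD = {
    "customer_id": ("VARCHAR", 20),
    "order_amount": ("DECIMAL", 18),
}

# Hypothetical table definitions: (table, column, data type, length).
COLUMNS = [
    ("orders", "customer_id", "VARCHAR", 20),
    ("invoices", "customer_id", "CHAR", 10),
    ("orders", "order_amount", "DECIMAL", 18),
    ("refunds", "refund_amt", "DECIMAL", 18),
]


def check_compliance(columns, standard):
    """Return an inspection report as a list of (table, column, problem)."""
    report = []
    for table, column, dtype, length in columns:
        expected = standard.get(column)
        if expected is None:
            # The column name itself is not a registered standard term.
            report.append((table, column, "not in standard dictionary"))
        elif (dtype, length) != expected:
            # Same name, but a different type or length than the standard.
            problem = (
                f"expected {expected[0]}({expected[1]}), "
                f"found {dtype}({length})"
            )
            report.append((table, column, problem))
    return report


report = check_compliance(COLUMNS, STANDARD)
for row in report:
    print(row)
```

In this sketch the report flags `invoices.customer_id` (same name, different type and length) and `refunds.refund_amt` (a name the dictionary does not know), which are the two kinds of inconsistency described above.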
Automatic application of standard objects (automatically converting the logical data model into the corresponding physical model based on the underlying knowledge base)
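The logical-to-physical conversion can be sketched roughly as follows. This assumes the knowledge base maps each business term to a standard physical column name and SQL type; `KNOWLEDGE_BASE`, `to_ddl`, and the sample terms are hypothetical, not part of any particular tool.

```python
# Hypothetical knowledge base: business term -> (physical column, SQL type).
KNOWLEDGE_BASE = {
    "Customer Number": ("cust_no", "VARCHAR(20)"),
    "Order Date": ("ord_dt", "DATE"),
    "Order Amount": ("ord_amt", "DECIMAL(18,2)"),
}


def to_ddl(table_name, logical_attributes, knowledge_base):
    """Build a CREATE TABLE statement from logical attribute names."""
    missing = [a for a in logical_attributes if a not in knowledge_base]
    if missing:
        # Refuse to generate DDL for terms that have no standard yet.
        raise KeyError(f"no standard defined for: {missing}")
    columns = [
        f"    {knowledge_base[a][0]} {knowledge_base[a][1]}"
        for a in logical_attributes
    ]
    return f"CREATE TABLE {table_name} (\n" + ",\n".join(columns) + "\n);"


ddl = to_ddl(
    "orders",
    ["Customer Number", "Order Date", "Order Amount"],
    KNOWLEDGE_BASE,
)
print(ddl)
```

Rejecting unknown terms instead of guessing a column name is a deliberate choice in this sketch: it forces new terms through the standards process before they reach a physical table.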
Messy data model management (poor validation, differing logical structures, the same table structure added repeatedly, fields in actual tables left without comments), which calls for design, validation, and extension
Data model skeleton
Data model optimization (database parameter optimization 10%, Hint optimization 30%, index and SQL optimization 50%, data model optimization 80%);
Poorly written SQL statements lead to serious performance issues (developers are unfamiliar with execution plans);
Audit system before launch (carried out in parallel with pre-launch testing, capturing SQL statements and their execution plans)
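As a rough illustration of capturing SQL and execution plans during a pre-launch audit, the sketch below uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`. The table, index, and `audit` helper are hypothetical; a production database would have its own plan format and its own criteria for a bad plan.

```python
import sqlite3

# Hypothetical schema used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id TEXT, amount REAL)"
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")


def audit(conn, statements):
    """Capture each statement's plan and flag plans that scan a table."""
    findings = []
    for sql in statements:
        # Each plan row is (id, parent, notused, detail); keep the detail text.
        plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
        full_scan = any(step.startswith("SCAN") for step in plan)
        findings.append((sql, plan, full_scan))
    return findings


findings = audit(
    conn,
    [
        "SELECT * FROM orders WHERE customer_id = 'C1'",  # can use the index
        "SELECT * FROM orders WHERE amount > 100",        # no index on amount
    ],
)
for sql, plan, full_scan in findings:
    print("REVIEW" if full_scan else "OK", "|", sql, "|", plan)
```

Here the second statement is flagged for review because no index covers `amount`, which is the kind of finding an audit step would hand back to developers before launch.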
Lack of the ability to handle relatively complex data processing
Data quality inspection requires implementing data quality management (defining quality standards and diagnostic objects, analysis, business rule (BR) definition, data quality diagnosis, data quality improvement);
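A data quality diagnosis driven by business rules might look like the minimal sketch below. It assumes each rule can be written as a predicate over a single record; the rule names, sample rows, and `diagnose` function are hypothetical.

```python
# Hypothetical business rules: rule name -> predicate over one record.
RULES = {
    "customer_id is present": lambda r: bool(r["customer_id"]),
    "amount is non-negative": lambda r: r["amount"] is not None
    and r["amount"] >= 0,
}

# Hypothetical sample data to diagnose.
ROWS = [
    {"customer_id": "C1", "amount": 10.0},
    {"customer_id": "", "amount": 5.0},
    {"customer_id": "C3", "amount": -2.0},
    {"customer_id": "C4", "amount": None},
]


def diagnose(rows, rules):
    """Return, for each rule, the indexes of the rows that violate it."""
    return {
        name: [i for i, row in enumerate(rows) if not rule(row)]
        for name, rule in rules.items()
    }


diagnosis = diagnose(ROWS, RULES)
for name, failed in diagnosis.items():
    print(f"{name}: {len(failed)} violation(s) at rows {failed}")
```

Keeping the rules as data rather than hard-coding them into the check makes it easier to review and extend them as quality standards evolve.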
Former Big Data Architect of Alibaba: How to quickly grow into an excellent Big data architect
Development requirements, application architecture, operations and maintenance architecture, technical planning: data architecture, business architecture, technical architecture, application architecture;
Data architecture management object, data architecture management process, data architecture management organization, management system (data quality management system, configuration management system);
Data architecture roles and people
Roles of the data architecture department: data architecture (building the data structure and management system, standardization); data modeling (conceptual, logical, and physical model design, training); application development (development and technical support, shape management, writing the core SQL); data migration (migration technical support); testing and optimization (tuning, optimal index design, proposing solutions to problems)
Norms, policies, management, standard management, structure management, audit and management, enterprise information system;
Personnel training, building the organization, tool procurement, management buy-in (attention from the top);
Resistance from development (design disputes), operation and maintenance (technical objections), leadership (short-term results are difficult to see);
Data architecture is an important part of enterprise architecture (development, application, technology, data);
Internal work process (registering a requirement, explaining the requirement and the technology appropriate to it, understanding the model correctly, redesigning and changing the data model to meet data requirements, model audit and approval by data architects, automatic DDL generation (DBA), checking the impact on data quality against business rules, error analysis and data cleansing, analysis of related programs): people + rules + technology;
Cognition (definition, job, ability, position and career);
Learning (pathways, training and books, experiential learning, related activities);
Practice (putting theory into practice, expanding influence, spark); model audits and SQL optimization are good places to start;
Maturity (the data architecture position holds steady alongside enterprise system design, development, and operations and maintenance, forming a four-way balance);
Skills to learn (enterprise architecture, data quality management, data requirements analysis, data standardization, data modeling, database design and application)