This is the 27th day of my participation in Gwen Challenge
What is Spring Cloud Data Flow?
Spring Cloud Data Flow (SCDF) is a toolkit for building data integration and real-time data processing pipelines. It is a cloud-native redesign of Spring XD.
Purpose
Simplify the development of big data applications
Spring XD (eXtreme Data)
Spring XD is a big data product from Pivotal. Together with Spring Boot and Grails, it forms the execution part of the Spring IO platform.
Compare Spring Cloud Data Flow to Spring XD
The ZooKeeper-based runtime environment of Spring XD is gone, replaced by a service delivery bus. SCDF is a hybrid computing model that combines stream processing and batch processing, and it serves as a toolkit for building data integration and real-time data processing pipelines.
Spring Cloud Data Flow features
- Develop using a DSL, REST APIs, a dashboard, and a drag-and-drop GUI
- Create, unit test, troubleshoot, and manage microservices independently
- Build data pipelines quickly using out-of-the-box stream and task/batch applications
- Consume microservices as Maven or Docker artifacts
- Scale data pipelines without interrupting the data flow
- Orchestrate data-centric applications on modern runtime platforms
- Use metrics, health checks, and remote management for each microservice application
Spring Cloud Data Flow core function
The core function of SCDF is ETL (Extract, Transform, Load). Each step maps to a type of stream application:
- Extract -> Source
- Transform -> Processor
- Load -> Sink
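The mapping above can be sketched in plain Java. Spring Cloud Stream's functional programming model expresses a source as a `java.util.function.Supplier`, a processor as a `Function`, and a sink as a `Consumer`. The sketch below uses only those JDK interfaces, with no Spring dependency and no message broker, so it shows the shape of a pipeline rather than a deployable SCDF stream. The class name `EtlSketch`, the `runPipeline` helper, and the sample records are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

final class EtlSketch {

    // Extract -> Source: hands out one record per call, null once exhausted.
    static Supplier<String> source(List<String> records) {
        Iterator<String> it = records.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Transform -> Processor: trims and upper-cases each record.
    static Function<String, String> processor() {
        return record -> record.trim().toUpperCase(Locale.ROOT);
    }

    // Load -> Sink: appends each record to the target list.
    static Consumer<String> sink(List<String> target) {
        return target::add;
    }

    // Hypothetical helper that wires the three stages together in-process.
    // In a real stream, a message broker sits between the stages instead.
    static List<String> runPipeline(List<String> input) {
        List<String> loaded = new ArrayList<>();
        Supplier<String> source = source(input);
        Function<String, String> processor = processor();
        Consumer<String> sink = sink(loaded);
        for (String r = source.get(); r != null; r = source.get()) {
            sink.accept(processor.apply(r));
        }
        return loaded;
    }

    public static void main(String[] args) {
        System.out.println(runPipeline(List.of(" alpha ", "beta"))); // prints [ALPHA, BETA]
    }
}
```

Each stage only knows its own input or output type, which is what lets the stages be packaged and deployed as separate applications.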
SCDF builds on Spring Cloud Stream. Spring Cloud Stream is used to create and run messaging microservices as Spring Boot applications, so that they can be deployed on different platforms, run independently, and interact with each other.
SCDF acts as the glue when building data pipelines from Spring Cloud Stream applications. It aims to provide a managed service model that streamlines the engineering of data projects and lets developers focus on the specific problem and its analysis.
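To make the "glue" idea concrete: in the SCDF stream DSL, a pipeline is written as application names joined by the pipe symbol, for example `http | transform | log`. The sketch below is a simplified illustration, not SCDF's own parser: it splits such a definition and labels each application by position (first is the source, last is the sink, anything in between is a processor). The class `StreamDefinitionSketch` and the method `roles` are hypothetical, and the naive split would break on a pipe character inside a quoted option.

```java
import java.util.ArrayList;
import java.util.List;

final class StreamDefinitionSketch {

    // Returns entries such as "source=http" for each application in the definition.
    static List<String> roles(String definition) {
        String[] segments = definition.split("\\|");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < segments.length; i++) {
            // The application name is the first token; later tokens would be options.
            String app = segments[i].trim().split("\\s+")[0];
            String role;
            if (i == 0) {
                role = "source";
            } else if (i == segments.length - 1) {
                role = "sink";
            } else {
                role = "processor";
            }
            result.add(role + "=" + app);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [source=http, processor=transform, sink=log]
        System.out.println(roles("http | transform | log"));
    }
}
```

SCDF itself resolves each name to a registered application and deploys it. The point of the sketch is only that the definition describes the wiring, while the applications stay independent.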
The major components
The main runtime components of SCDF are the Data Flow Server and the Skipper Server.
- Runtime data
Stored in a mainstream relational database such as MySQL, PostgreSQL, Oracle, DB2, or SQL Server.
- Stream processing
Relies on a message broker, either RabbitMQ or Kafka.
Today’s summary
I need to go to the hospital in the next few days, so I don't expect to have much time to study. Today I learned something new, Spring Cloud Data Flow. What I covered here is only a general overview and is far from complete. I will keep working to improve it, and corrections are welcome.