This is the 27th day of my participation in Gwen Challenge

What is Spring Cloud Data Flow?

Spring Cloud Data Flow is a big data tool: a toolkit for building data integration and real-time data processing pipelines. It is a cloud-native redesign of Spring XD.

Purpose

Simplify the development of big data applications

Spring XD (eXtreme Data)

Spring XD is a big data product from Pivotal. Together with Spring Boot and Grails, it forms the execution part of the Spring IO platform.

Compare Spring Cloud Data Flow to Spring XD

The ZooKeeper-based runtime environment of Spring XD is gone, replaced by a service delivery bus. Spring Cloud Data Flow is a hybrid computing model that combines stream processing and batch processing, and it remains a toolkit for building data integration and real-time data processing pipelines.

Spring Cloud Data Flow features

  • Develop using the DSL, the REST APIs, the dashboard, and a drag-and-drop GUI
  • Create, unit test, troubleshoot, and manage each microservice independently
  • Build data pipelines quickly using out-of-the-box stream and task/batch applications
  • Use microservices packaged as Maven or Docker artifacts as building blocks
  • Scale a data pipeline without interrupting the data stream
  • Orchestrate data-centric applications on a modern runtime platform
  • Manage each microservice application remotely using metrics and health checks

Spring Cloud Data Flow core function

The core function of Spring Cloud Data Flow (SCDF) is Extract, Transform, Load (ETL). Each step maps to one type of application:

  • Extract –> Source
  • Transform –> Processor
  • Load –> Sink
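The three roles above can be sketched in plain Java. Spring Cloud Stream's functional programming model expresses a source, a processor, and a sink as `java.util.function` `Supplier`, `Function`, and `Consumer` respectively. The sketch below wires them together in a single process without Spring, purely for illustration; in SCDF each role would be a separate Spring Boot application connected through messaging middleware. The class and method names (`EtlSketch`, `runOnce`) are hypothetical and not part of any SCDF API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Plain-Java sketch of the Source -> Processor -> Sink roles.
// EtlSketch and runOnce are hypothetical names, not SCDF APIs.
final class EtlSketch {

    // Pull one item from the source, transform it, and hand it to the sink.
    static <I, O> void runOnce(Supplier<I> source, Function<I, O> processor, Consumer<O> sink) {
        sink.accept(processor.apply(source.get()));
    }

    public static void main(String[] args) {
        Supplier<String> source = () -> "hello";                  // Extract
        Function<String, String> processor = String::toUpperCase; // Transform
        List<String> stored = new ArrayList<>();
        Consumer<String> sink = stored::add;                      // Load

        runOnce(source, processor, sink);
        System.out.println(stored); // prints [HELLO]
    }
}
```

Keeping each role behind its own functional interface is what lets the three pieces be developed and tested separately before they are joined into a pipeline.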

SCDF builds on Spring Cloud Stream. Spring Cloud Stream creates and runs messaging microservices as Spring Boot applications, so they can be deployed on different platforms, run independently, and interact with each other.

SCDF acts as the glue when data pipelines are assembled from Spring Cloud Stream applications. It provides a management service model that streamlines the engineering of data projects, so developers can focus on the specific problem and its analysis.
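As a rough illustration of that glue role, a pipeline is described to SCDF as a stream definition in which application names are joined by a pipe symbol, and the definition is registered with the Data Flow Server over REST. The sketch below only builds such a request and does not send it. The endpoint path `/streams/definitions`, the port 9393, the form parameters `name` and `definition`, and the app names `http`, `transform`, and `log` are assumptions from my recollection of SCDF defaults, not details from this post, so check them against the documentation for your version. `StreamDefinitionSketch` and its methods are hypothetical names.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Sketch only: builds a request for registering a stream definition.
// Endpoint, port, and parameter names are assumptions, not verified here.
final class StreamDefinitionSketch {

    // Join app names with the pipe symbol used by the stream DSL.
    static String dsl(List<String> apps) {
        return String.join(" | ", apps);
    }

    // URL-encode the stream name and definition as a form body.
    static String formBody(String name, String definition) {
        return "name=" + URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "&definition=" + URLEncoder.encode(definition, StandardCharsets.UTF_8);
    }

    // Build (but do not send) a POST carrying the form body.
    static HttpRequest createRequest(String serverUrl, String name, String definition) {
        return HttpRequest.newBuilder(URI.create(serverUrl + "/streams/definitions"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(name, definition)))
                .build();
    }

    public static void main(String[] args) {
        String definition = dsl(List.of("http", "transform", "log")); // source | processor | sink
        HttpRequest request = createRequest("http://localhost:9393", "demo", definition);
        System.out.println(definition);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(formBody("demo", definition));
    }
}
```

To actually send the request, pass it to `java.net.http.HttpClient` (Java 11 or later) while a Data Flow Server is running.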

The major components

The main runtime components of SCDF are the Data Flow Server and the Skipper Server.

  • Runtime data

Stored in a mainstream relational database such as MySQL, PostgreSQL, Oracle, DB2, or SQL Server.

  • Stream processing mode

Relies on RabbitMQ or Kafka as the messaging middleware.

Today’s summary

I need to go to the hospital in the next few days, so I don't think I will have much time to study. Today I learned something new, Spring Cloud Data Flow. It is only a general overview and far from complete. I will keep working to improve it later, and please point out any mistakes.