This is the second day of my participation in Gwen Challenge
StreamSets Data Collector™ is a lightweight, powerful design and execution engine that streams data in real time. Use Data Collector to route and process data in your data streams.
1. Low code: seeing is believing
- Simple visualization
- Configuration-based components
Component parameters are set through configuration in the UI rather than code.
- WYSIWYG debugging
While debugging, you can inspect the data entering and leaving each component directly in the UI.
- Runtime monitoring
Runtime statistics can be browsed at a glance in the monitoring view.
- Fault snapshot
- Automatic error collection
2. Download
Would you like to try it? Installation is very simple, but downloading from the overseas servers is slow, so I have prepared the resources here for you to save and use.
Link: https://pan.baidu.com/s/1Jh8fgZV7hUCpHV0LqGNn_A Extraction code: 2lpd
The package also contains setup tutorials for a variety of use cases.
3. Installation procedure
Current version: 3.22.2 (released May 4, 2021). According to official news, 4.0 is already in development. At present, production-grade versions must be installed on Linux; if you want to try it on Windows, you can download the beta from the official site.
- Prepare the environment after downloading.
- Download and install OpenJDK 8 or the Java 8 JDK. (You must have a Java 8 JDK, not a Java 8 JRE.)
- Open a terminal and set the file descriptor limit to at least 32768.
- Extract the archive by running the following command: `tar xvzf streamsets-datacollector-common-3.22.2.tgz`
- After the archive is extracted successfully, change into the root directory of the installation: `cd streamsets-datacollector-3.22.2`
- To start Data Collector, use the following command: `bin/streamsets dc`
- In the browser, enter the URL displayed in the terminal window (for example, http://10.0.0.100:18630).
- If you have not already logged in to your StreamSets account, log in.
- You will be asked to link the Data Collector to your account.
- After linking, the StreamSets Data Collector installation is complete.
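The steps above can be condensed into a short shell session. This is a sketch, not an official script: it assumes the archive sits in the current directory, and the `SDC_VERSION`/`ARCHIVE` variable names are my own.

```shell
# Assumption: streamsets-datacollector-common-3.22.2.tgz was downloaded
# into the current directory.
SDC_VERSION="3.22.2"
ARCHIVE="streamsets-datacollector-common-${SDC_VERSION}.tgz"

# Raise the open-file limit for this session (SDC needs at least 32768);
# this may fail without sufficient privileges, hence the guard.
ulimit -n 32768 2>/dev/null || true

# Unpack, enter the installation root, and start Data Collector.
if [ -f "${ARCHIVE}" ]; then
  tar xvzf "${ARCHIVE}"
  cd "streamsets-datacollector-${SDC_VERSION}" && bin/streamsets dc
fi
```

When `bin/streamsets dc` starts, the terminal prints the URL of the web UI; open it in a browser to continue.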
4. Have fun
By building pipelines, you can have fun with a wide variety of data sources and destinations.
Origins: Amazon S3, Amazon SQS Consumer, Azure Data Lake Storage Gen1 (deprecated), Azure Data Lake Storage Gen2, Azure IoT/Event Hub Consumer, CoAP Server, Cron Scheduler, Directory, Elasticsearch, File Tail, Google BigQuery, Google Cloud Storage, Google Pub/Sub Subscriber, Groovy Scripting, gRPC Client, Hadoop FS Standalone, HTTP Client, HTTP Server, Jython Scripting, Kafka Multitopic Consumer, Kinesis Consumer, MapR DB CDC, MapR DB JSON, MapR FS Standalone, MapR Multitopic Streams Consumer, MapR Streams Consumer, MongoDB, MongoDB Oplog, MQTT Subscriber, MySQL Binary Log, OPC UA Client, Oracle Bulkload, Oracle CDC Client, PostgreSQL CDC Client, Pulsar Consumer, RabbitMQ Consumer, Redis Consumer, REST Service, Salesforce, SAP HANA Query Consumer, SFTP/FTP/FTPS Client, SQL Server 2019 BDC Multitable Consumer, SQL Server CDC Client, SQL Server Change Tracking, Start, System Metrics, TCP Server, UDP Multithreaded Source, UDP Source, WebSocket Client, WebSocket Server, Windows Event Log

Destinations: Amazon S3, Azure Data Lake Storage Gen2, Azure Event Hub Producer, Azure IoT Hub Producer, Azure Synapse SQL, Cassandra, CoAP Client, Couchbase, Databricks Delta Lake, Einstein Analytics, Elasticsearch, Flume (deprecated), Google BigQuery, Google Bigtable, Google Cloud Storage, Google Pub/Sub Publisher, Hadoop FS, HBase, Hive Metastore, HTTP Client, InfluxDB, JDBC Producer, JMS Producer, Kafka Producer, Kinesis Firehose, Kinesis Producer, Kudu, Local FS, MapR DB, MapR DB JSON, MapR FS, MapR Streams Producer, MongoDB, MQTT Publisher, Named Pipe, Pulsar Producer, RabbitMQ Producer, Redis, Salesforce, Send Response to Origin, SFTP/FTP/FTPS Client, Snowflake, Solr, Splunk, SQL Server 2019 BDC Bulk Loader, Syslog, To Error, Trash, WebSocket Client
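Before building pipelines in the UI, you can sanity-check a running instance from the command line. The sketch below is an assumption-heavy example: it presumes the default port 18630, the default `admin`/`admin` credentials, and uses the `GET /rest/v1/pipelines` REST endpoint for listing pipelines.

```shell
# Assumption: Data Collector is running locally on the default port
# with the default admin/admin credentials.
SDC_URL="http://localhost:18630"

# Only query if the instance is actually reachable.
if curl -s -o /dev/null --connect-timeout 2 "${SDC_URL}" 2>/dev/null; then
  # List the pipelines defined on this Data Collector instance (JSON).
  curl -s -u admin:admin "${SDC_URL}/rest/v1/pipelines"
fi
```

If this returns a JSON array, the instance is up and ready for you to start wiring origins to destinations.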
5. Summary
If you have any usage problems, check out my previous tutorial series. You can always ask me!