Iterating Quickly, Greenplum 6 Continues to Surprise You

Since the release of Greenplum 6.0, Greenplum has kept up a pace of roughly one minor release per month, bringing users new features and fixes. The current release is 6.7.1. This series gives a rundown of what’s new in each release, helping you review what has shipped and preview what’s coming in Greenplum.

Summary of New Features

Greenplum 6.7.1 was released on April 30, 2020. The new features added cumulatively since 6.0 are listed below:

Greenplum 6.7

Added the gp_resource_group_queuing_timeout parameter

When resource groups are in use, this parameter specifies how long a transaction may wait in the queue to acquire resources. By default, the wait is unlimited.
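The effect of the timeout can be pictured with a toy model. The sketch below is not Greenplum code: the function name and return strings are made up for illustration, and it assumes the value is expressed in milliseconds, that 0 means "no limit", and that a transaction whose wait exceeds the limit is cancelled.

```python
from typing import Optional


def queue_outcome(slot_free_after_ms: Optional[int], timeout_ms: int = 0) -> str:
    """Toy model of a transaction queued for a resource group slot.

    slot_free_after_ms: when a slot becomes available (None means never).
    timeout_ms: stand-in for gp_resource_group_queuing_timeout
                (assumed: milliseconds, 0 means wait without a limit).
    """
    if timeout_ms < 0:
        raise ValueError("timeout_ms must be >= 0")
    unlimited = timeout_ms == 0
    if slot_free_after_ms is None:
        # No slot ever frees up: only a finite timeout ends the wait.
        return "waits indefinitely" if unlimited else "cancelled"
    if unlimited or slot_free_after_ms <= timeout_ms:
        return "runs"
    return "cancelled"


print(queue_outcome(5000))                   # default: keeps waiting, then runs
print(queue_outcome(5000, timeout_ms=1000))  # limit reached before a slot frees up
```

With the default of 0 the transaction simply waits until a slot is free; with a finite limit it is cancelled once the limit is exceeded.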

The built-in MADlib version was upgraded to 1.17

k-means support
Enhanced deep learning

For more information, see the MADlib website:

madlib.apache.org/

Greenplum 6.6

Error logs for external tables can now be persisted

If LOG ERRORS PERSISTENTLY is specified when an external table is created, the data error log is not deleted when the external table is dropped. The error log can be emptied with the new function gp_truncate_persistent_error_log.
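As a rough illustration, the statements involved might look like the ones built by the Python helpers below. This is a sketch under assumptions: the table name ext_sales, its columns, and the gpfdist location are hypothetical, the PERSISTENTLY keyword placement follows the CREATE EXTERNAL TABLE syntax as I understand it, and gp_truncate_persistent_error_log is assumed to take the table name as a text argument. The helpers only build SQL strings; they do not connect to a database.

```python
def create_external_table_sql(table: str, columns: str,
                              location: str, reject_limit: int) -> str:
    """Build a CREATE EXTERNAL TABLE statement that keeps its error log.

    Inputs are interpolated directly, so use trusted values only.
    """
    if not table.isidentifier():
        raise ValueError("table must be a simple identifier")
    return (
        f"CREATE EXTERNAL TABLE {table} ({columns})\n"
        f"LOCATION ('{location}')\n"
        "FORMAT 'CSV'\n"
        f"LOG ERRORS PERSISTENTLY SEGMENT REJECT LIMIT {int(reject_limit)} ROWS;"
    )


def truncate_error_log_sql(table: str) -> str:
    """Build the call that empties the persisted error log for a table."""
    if not table.isidentifier():
        raise ValueError("table must be a simple identifier")
    return f"SELECT gp_truncate_persistent_error_log('{table}');"


# Hypothetical table and gpfdist endpoint, for illustration only.
print(create_external_table_sql(
    "ext_sales", "id int, amount numeric",
    "gpfdist://etl-host:8081/sales.csv", 10))
print(truncate_error_log_sql("ext_sales"))
```

The generated statements can be pasted into psql or passed to whatever database driver you already use.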

Upgrade PXF to 5.11.2

PXF no longer checks the BATCH_SIZE write option when performing read operations
Updated the jackson-databind and Tomcat dependencies

Greenplum 6.5

User-defined functions support the EXECUTE ON INITPLAN option

If you create a user-defined function with the EXECUTE ON INITPLAN option, the function is executed on the master node and its result is saved. That saved result is then returned to each segment when the segment needs it.
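The evaluate-once-then-share idea can be sketched in a few lines of Python. This is a conceptual toy, not how Greenplum implements init plans: run_as_initplan and expensive_lookup are hypothetical names, and the "segments" are just list entries.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_as_initplan(func: Callable[[], T], segment_count: int) -> List[T]:
    """Evaluate func once (the 'master' step) and give every segment the saved result."""
    saved = func()  # executed a single time, standing in for the master node
    # Each segment receives the saved result instead of calling func itself.
    return [saved for _ in range(segment_count)]


calls = 0


def expensive_lookup() -> int:
    """Hypothetical function body; counts how often it is actually executed."""
    global calls
    calls += 1
    return 42


results = run_as_initplan(expensive_lookup, segment_count=4)
print(results, "function executed", calls, "time(s)")
```

However many segments consume the value, the function body runs only once.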

ORCA supports a new bitmap index cost calculation model

When optimizer_cost_model = experimental is set, ORCA selects the faster bitmap nested loop join when appropriate.

Upgrade PL/Container to 3.0

Greenplum R is supported
The number of PL/Container processes is reduced
More containers can be executed concurrently
Improved logging

gpload added the max_retries option

If a network exception occurs while gpload is running, gpload retries according to the value of this option.
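The retry behavior can be sketched generically in Python. This is not gpload's source code: run_with_retries and flaky_load are hypothetical names, ConnectionError stands in for whatever network exception gpload reacts to, and the sketch handles only non-negative retry counts.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(operation: Callable[[], T], max_retries: int = 0) -> T:
    """Call operation, retrying up to max_retries times after a ConnectionError."""
    if max_retries < 0:
        raise ValueError("this sketch only handles max_retries >= 0")
    retries_used = 0
    while True:
        try:
            return operation()
        except ConnectionError:
            if retries_used >= max_retries:
                raise  # retry budget exhausted: surface the error
            retries_used += 1


failures_left = 2


def flaky_load() -> str:
    """Hypothetical load step that fails twice before succeeding."""
    global failures_left
    if failures_left > 0:
        failures_left -= 1
        raise ConnectionError("simulated network exception")
    return "loaded"


outcome = run_with_retries(flaky_load, max_retries=3)
print(outcome)
```

With max_retries=0 the first network error is fatal; a larger value lets a load ride out short interruptions.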

Upgrade PXF to 5.11.1

Added a restart command for restarting the PXF service
The PXF sync command supports the -d option, which deletes data on a node that is not in the PXF configuration
PXF supports filter push-down for the Parquet format
Updated the built-in Guava and Hadoop2 dependency libraries

The Greenplum R client is supported

S3 external tables support deflate compression

Greenplum 6.4

The DISCARD ALL command is no longer supported

Resource groups can terminate queries that consume too much shared memory

Standby and mirror nodes on different subnets are supported

Greenplum 6.3

Added the wait_for_replication_threshold parameter

This parameter specifies the maximum amount of WAL that can be written on the primary node before it is synchronized to the mirror. It can help improve synchronization performance when mirrors are configured.
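A toy model makes the threshold easier to picture. The sketch below is illustrative only: sync_points is a hypothetical helper, the sizes are assumed to be in KB, and the real replication logic in Greenplum is more involved than a running total.

```python
from typing import Iterable, List


def sync_points(wal_record_sizes_kb: Iterable[int], threshold_kb: int) -> List[int]:
    """Return the indexes of the records after which the primary waits for the mirror."""
    if threshold_kb <= 0:
        raise ValueError("this toy model only handles positive thresholds")
    pending_kb = 0
    waits: List[int] = []
    for index, size_kb in enumerate(wal_record_sizes_kb):
        pending_kb += size_kb
        if pending_kb >= threshold_kb:
            waits.append(index)  # threshold reached: wait for the mirror to catch up
            pending_kb = 0       # everything written so far is now replicated
    return waits


# Five WAL writes of varying size against a 1024 KB threshold.
print(sync_points([400, 400, 400, 100, 1200], threshold_kb=1024))
```

A lower threshold makes the primary wait more often but keeps the mirror closer to the primary; a higher one does the opposite.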

Upgrade PL/Container to 2.1.0

Python3 containers are supported
GluonTS support has been added to the data science module

Added the enable_query_profiling parameter for GPCC

When turned on, this parameter collects queries run by the gpmon user in the gpperfmon database, as well as queries that run for less than 10 seconds.

Upgrade PXF to 5.10.1

Greenplum 6.2

Materialized views are supported

Details can be found here:

gpdb.docs.pivotal.io/6-2/admin_g…

gpinitsystem supports the new --ignore-warnings option

Upgrade PXF to 5.10

Updated the Tomcat and Jackson dependency libraries
The JDBC connector supports push-down of OR and NOT filters
Support for writing Avro data to Hadoop
Hadoop 2.x and 3.1.x and Hive 2.x and 3.1 are supported
Supports different user configurations for different servers
Supports concurrent connection of multiple Kerberos-authenticated Hadoop clusters

GPSS updated to version 1.3.1

Greenplum 6.1

Updated the DataDirect JDBC and ODBC driver versions

The JDBC version is 5.1.4.000270 (F000450.u000214), and the ODBC version is 07.16.0334 (B0510, U0363).

Greenplum Stream Server (GPSS) upgraded to 1.3.0

Support for log rotation
Conditional filtering of Kafka messages is supported
Allows resetting a load to start from a specific point in time (force-reset-timestamp)
Update and merge operations are supported
Kerberos authentication is supported for Kafka and Greenplum
SSL encryption between Kafka, GPSS and Greenplum is supported

About Greenplum

Greenplum is a database product based on the MPP architecture that meets the needs of next-generation big data warehouses and large-scale analytics workloads. By automatically partitioning data and executing queries in parallel across multiple nodes, it lets a database cluster of hundreds of nodes run as easily and reliably as a traditional stand-alone database, while delivering performance improvements of tens or even hundreds of times. In addition to traditional SQL, Greenplum supports MapReduce, text indexing, stored procedures, and many other analysis tools, and it handles data sizes ranging from gigabytes to petabytes.

Greenplum 6.0, released on September 4, 2019, is a kernel-level enhancement: it upgrades the underlying Postgres kernel to version 9.4, gaining more Postgres-compatible features. Its capacity for OLTP workloads is greatly improved, which makes it better suited to stream computing and HTAP scenarios. Other important updates in Greenplum 6.0 include replicated tables, online expansion, disk quotas, support for the Zstandard compression algorithm, and a new high availability mechanism based on streaming replication.

