Since the release of Greenplum 6.0, Greenplum has shipped roughly one minor release per month, bringing users new features and fixes. The current release is 6.7.1. This series gives a rundown of what is new in each release, so you can review the changes already shipped and preview what is coming in Greenplum.

Summary of New Features

Greenplum 6.7.1 was released on April 30, 2020. The releases since 6.0 have cumulatively added the following new features:

Greenplum 6.7

Added the gp_resource_group_queuing_timeout parameter

When resource groups are in use, this parameter specifies how long a transaction may wait in the queue to acquire resources. By default, transactions wait indefinitely.
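As a small illustration, the Python sketch below only assembles the SQL text for setting the parameter; it does not connect to a database. It assumes the value is expressed in milliseconds, that 0 means "wait indefinitely", and that the parameter can be changed at the session level. The helper name queuing_timeout_sql is hypothetical.

```python
def queuing_timeout_sql(timeout_ms: int) -> str:
    """Build a SET statement for gp_resource_group_queuing_timeout.

    Assumptions: the value is in milliseconds and 0 means
    "wait indefinitely" (the default described above).
    """
    if timeout_ms < 0:
        raise ValueError("timeout must be zero or a positive number of milliseconds")
    return f"SET gp_resource_group_queuing_timeout = {timeout_ms};"


# Example: give up on a queued transaction after 30 seconds.
statement = queuing_timeout_sql(30_000)
print(statement)
```

The resulting string can be sent through whatever client or driver you already use to talk to Greenplum.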

The built-in MADlib was upgraded to version 1.17

  • k-means support

  • Enhanced deep learning

For more information, see the MADlib website: madlib.apache.org/

Greenplum 6.6

Error logs for external tables can be persisted

If LOG ERRORS PERSISTENTLY is specified when an external table is created, its data error log is not deleted when the external table is dropped. The error log can be emptied with the new function gp_truncate_persistent_error_log.
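The statements involved might look like the following Python sketch, which only assembles SQL text. The table name ext_sales, its columns, and the gpfdist location are hypothetical. The placement of the PERSISTENTLY keyword and the single table-name argument passed to gp_truncate_persistent_error_log are assumptions about the syntax, so check them against the reference documentation for your version.

```python
# Hypothetical external table; only the error-logging clause matters here.
CREATE_EXTERNAL_TABLE = """\
CREATE EXTERNAL TABLE ext_sales (id int, amount numeric, sold_on date)
LOCATION ('gpfdist://etl-host:8081/sales.csv')
FORMAT 'CSV'
LOG ERRORS PERSISTENTLY SEGMENT REJECT LIMIT 10 ROWS;
"""

# Dropping the table is assumed to leave the persisted error log in place.
DROP_EXTERNAL_TABLE = "DROP EXTERNAL TABLE ext_sales;"

# Assumed call shape: the function takes the table name as text.
TRUNCATE_ERROR_LOG = "SELECT gp_truncate_persistent_error_log('ext_sales');"

for sql in (CREATE_EXTERNAL_TABLE, DROP_EXTERNAL_TABLE, TRUNCATE_ERROR_LOG):
    print(sql)
```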

PXF upgraded to 5.11.2

  • PXF no longer checks the BATCH_SIZE write option when performing read operations

  • Updated the jackson-databind and Tomcat dependencies

Greenplum 6.5

User-defined functions support the EXECUTE ON INITPLAN option

If you create a user-defined function with the EXECUTE ON INITPLAN attribute, the function is executed on the master node and its result is saved. That saved result is then returned to each segment when the segment needs it.
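A minimal sketch of how such a function might be declared and used is shown below. The Python code only holds the SQL text. The table events, the function latest_events, and the target table events_snapshot are hypothetical, and placing EXECUTE ON INITPLAN after the LANGUAGE clause is an assumption about the attribute syntax.

```python
# Hypothetical function that reads from a distributed table.
CREATE_FUNCTION = """\
CREATE FUNCTION latest_events() RETURNS SETOF events AS $$
    SELECT * FROM events WHERE created_at > now() - interval '1 day';
$$ LANGUAGE sql EXECUTE ON INITPLAN;
"""

# Typical use: the function feeds a CREATE TABLE AS statement, so its
# result is computed once on the master and then handed to the segments.
USE_FUNCTION = "CREATE TABLE events_snapshot AS SELECT * FROM latest_events();"

for sql in (CREATE_FUNCTION, USE_FUNCTION):
    print(sql)
```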

ORCA supports a new bitmap index cost calculation model

When optimizer_cost_model is set to experimental, ORCA chooses the faster bitmap nested loop join where appropriate.

PL/Container upgraded to 3.0

  • Support Greenplum R

  • The number of PL/Container processes is reduced

  • More containers can be executed concurrently

  • Improved logging

gpload adds the max_retries option

If a network exception occurs while gpload is running, gpload retries up to the number of times given by this option.
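The retry behaviour described above can be pictured with a short, generic Python sketch. This is not gpload's implementation. The names run_with_retries and flaky_load are hypothetical, and treating a negative max_retries as "retry without limit" is an assumption made for this sketch.

```python
import time


def run_with_retries(task, max_retries=0, delay_seconds=0.0):
    """Call task(); on ConnectionError, retry up to max_retries more times.

    Assumption for this sketch: a negative max_retries retries forever.
    """
    attempt = 0
    while True:
        try:
            return task()
        except ConnectionError:
            # Stop once the allowed number of retries has been used up.
            if 0 <= max_retries <= attempt:
                raise
            attempt += 1
            time.sleep(delay_seconds)


# Simulated load that fails twice with a network error, then succeeds.
calls = {"n": 0}


def flaky_load():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network failure")
    return "loaded"


result = run_with_retries(flaky_load, max_retries=2)
print(result, calls["n"])
```

With max_retries=0 the first network error is raised immediately, which matches the idea of not retrying unless the option is set.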

PXF upgraded to 5.11.1

  • A new restart command restarts the PXF service

  • The PXF sync command supports the -d option, which deletes node data that is not in the PXF configuration

  • PXF supports filter push-down for the Parquet format

  • Updated the built-in Guava and Hadoop2 dependency libraries

The Greenplum R client is supported

S3 external tables support deflate compression

Greenplum 6.4

The DISCARD ALL command is no longer supported

Resource groups can terminate queries that consume too much shared memory

Standby and mirror nodes on different subnets are supported

Greenplum 6.3

Added the wait_for_replication_threshold parameter

This parameter specifies the maximum amount of WAL that can be written on the primary segment before it is synchronized to the mirror. It can help improve synchronization performance when mirroring is enabled.

PL/Container upgraded to 2.1.0

  • Python3 containers are supported

  • GluonTS has been added to the data science module

Added the GPCC enable_query_profiling parameter

When this parameter is turned on, GPCC also collects queries run by the gpmon user against the gpperfmon database, as well as queries that run for less than 10 seconds.

PXF upgraded to 5.10.1

Greenplum 6.2

Materialized views are supported

Details can be found here:

gpdb.docs.pivotal.io/6-2/admin_g…
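As a quick illustration of the feature, the Python sketch below builds the two statements most commonly used with materialized views: one to create the view and one to refresh its stored data. The helper name materialized_view_sql, the view name daily_sales, and the underlying sales table are hypothetical.

```python
def materialized_view_sql(view_name: str, query: str) -> tuple:
    """Return (create, refresh) SQL statements for a materialized view."""
    create = f"CREATE MATERIALIZED VIEW {view_name} AS\n{query};"
    # The stored result is a snapshot; refresh it after the base data changes.
    refresh = f"REFRESH MATERIALIZED VIEW {view_name};"
    return create, refresh


create_sql, refresh_sql = materialized_view_sql(
    "daily_sales",
    "SELECT sold_on, sum(amount) AS total FROM sales GROUP BY sold_on",
)
print(create_sql)
print(refresh_sql)
```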

gpinitsystem supports the new --ignore-warnings option

PXF upgraded to 5.10

  • Updated the Tomcat and Jackson dependency libraries

  • The JDBC connector supports push-down of OR and NOT filters

  • Support for writing Avro data to Hadoop

  • Hadoop 2.x and 3.1.x and Hive 2.x and 3.1 are supported

  • Supports different user configurations for different servers

  • Supports concurrent connection of multiple Kerberos-authenticated Hadoop clusters

GPSS updated to version 1.3.1

Greenplum 6.1

Updated the DataDirect JDBC and ODBC driver versions

The JDBC version is 5.1.4.000270 (F000450.u000214), and the ODBC version is 07.16.0334 (B0510, U0363).

Greenplum Stream Server (GPSS) upgraded to 1.3.0

  • Log rotation is supported

  • Conditional filtering of Kafka messages is supported

  • Allows resetting to load at a specific point in time (force-reset-timestamp)

  • Update and merge operations are supported

  • Kafka and Greenplum support Kerberos authentication

  • SSL encryption between Kafka, GPSS and Greenplum is supported

About Greenplum

Greenplum is a database product based on the MPP architecture, built for next-generation big data warehouses and large-scale analytics workloads. By automatically partitioning data and executing queries in parallel across multiple nodes, it lets a database cluster of hundreds of nodes run as simply and reliably as a traditional single-node database, while delivering performance improvements of tens or even hundreds of times. In addition to traditional SQL, Greenplum supports MapReduce, text indexing, stored procedures, and many other analysis tools, and it handles data sizes from gigabytes to petabytes.

Greenplum 6.0, released on September 4, 2019, is a kernel enhancement release that upgrades the underlying PostgreSQL kernel to version 9.4, gaining more PostgreSQL compatibility features. Its capacity for OLTP workloads is greatly improved, which makes it better suited to stream computing and HTAP scenarios. Other important updates in Greenplum 6.0 include replicated tables, online expansion, disk quotas, support for the Zstandard compression algorithm, and a new high availability mechanism based on streaming replication.