Greenplum released version 6.2.1 on December 12, 2019. Since this coincided with the company's GP cluster expansion and upgrade, GP6 was installed and evaluated against GP5. This document follows the official documentation step by step and notes the differences between GP6 and older-version installations.

Time: 2019-12-17
Installed version: Greenplum 6.2.1
Download address: https://network.pivotal.io/products/pivotal-gpdb/#/releases/526878
Official installation documentation: https://gpdb.docs.pivotal.io/6-2/install_guide/platform-requirements.html
Chinese community installation documentation: https://greenplum.cn/2019/11/30/how-to-set-up-greenplum-6-1-cluster/

1. Software and hardware description and necessary installation

1.1 Software and Hardware Description

  1. System version: Red Hat 6.8
  2. Hardware: 3 VMs, each with 2 cores, 16 GB memory, and a 50 GB disk
  3. Plan: one master node, four segments, four mirrors, and no standby node

Host IP          Hostname   Node planning
172.28.25.201    mdw        master
172.28.25.202    sdw1       seg1, seg2, mirror3, mirror4
172.28.25.203    sdw2       seg3, seg4, mirror1, mirror2

1.2 Required Installation

Differences from earlier versions:
  GP4.x — no installation dependency check
  GP5.x — RPM installation with dependency check
  GP6.2 — RPM installation with dependency check
Before installing the GP6.x RPM, the software dependencies must be satisfied. The installation needs Internet access; if you are installing on an intranet machine, download the required packages in advance.

1.2.1 Installing Dependency Packages in Batches (Networking Required)

Greenplum 5 uses the RPM command, while Greenplum 6 installs dependencies directly with yum install.

sudo yum install -y apr apr-util bash bzip2 curl krb5 libcurl libevent libxml2 libyaml zlib  openldap openssh openssl openssl-libs perl readline rsync R sed tar zip krb5-devel 

1.2.2 Downloading Dependencies Manually and Uploading Them to the Server (Intranet)

Note: check the OS version and architecture bit width, for example el6.x86_64:

[root@mdw ~]# uname -a
Linux mdw 2.6.32-642.el6.x86_64 #1 SMP Wed Apr 13 00:51:26 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

Download: rpmfind.net/linux/rpm2h…

1.2.3 Offline Download in Linux

Prerequisites: 1. an operating system of the same version as the GP cluster to be installed; 2. Internet access on that machine (yum-utils provides yumdownloader).

yumdownloader --destdir ./ --resolve libyaml 
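The same approach can be looped over the dependency list from section 1.2.1; a minimal sketch (the destination directory is illustrative):

# download the GP6 dependencies and everything they pull in to a local directory
mkdir -p ./gp6-deps
for pkg in apr apr-util bzip2 krb5-devel libcurl libevent libxml2 libyaml openldap openssl-libs perl readline rsync zip; do
    yumdownloader --destdir ./gp6-deps --resolve "$pkg"
done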

2 Set system parameters

## Differences from older versions
GP6 no longer ships the gpcheck tool; system parameters are instead checked during the gpinitsystem step. If you do not apply the recommended parameters, cluster installation is not affected, but cluster performance will be.
  • Modify the system parameters as user root and restart the system afterwards (you can also defer the reboot until all parameters have been adjusted).
  • It is recommended to modify the parameters on the master host first; after GP is installed on the master and passwordless SSH is set up, use gpscp and gpssh to modify the system parameters of the other nodes in batches.
  • Reference: gpdb.docs.pivotal.io/6-2/install…

2.1 Disabling the Firewall

2.1.1 Checking Security-Enhanced Linux (SELinux)

View information as user root

[root@mdw ~]# sestatus
SELinux status:                 disabled

If SELinux status is not disabled, set SELINUX=disabled in /etc/selinux/config and restart the system:

SELINUX=disabled
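If you prefer a one-liner over editing the file by hand, a small sketch that makes the same change:

sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config   # persistent, takes effect after reboot
setenforce 0                                                   # immediate, runs permissive until the reboot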

2.1.2 Checking the Iptables status

[root@mdw ~]# /sbin/chkconfig --list iptables
iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off

If iptables is not off, disable it and then restart the system (you can defer the reboot until all parameters have been adjusted):

/sbin/chkconfig iptables off

2.1.3 Checking firewalld (usually not present on CentOS 6)

[root@mdw ~]# systemctl status firewalld

If firewalld is off, the output is:

* firewalld.service - firewalld - dynamic firewall daemon
  Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; 
vendor preset: enabled)
  Active: inactive (dead)

If firewalld is not disabled, stop and disable it, then restart the system (the reboot can be deferred until all parameters are adjusted):

[root@mdw ~]# systemctl stop firewalld.service
[root@mdw ~]# systemctl disable firewalld.service

2.2 Configuring the Hosts

2.2.1 Configuring the Hostname of Each Host

Set the master hostname to mdw. The segment hostnames are not mandatory, but the convention below is recommended. Change the hostname on each host.

The general naming rules are as follows:

  • Master: mdw
  • Standby Master: smdw
  • Segment Hosts: sdw1, sdw2 … sdwN

Modify operation:

# temporary change
hostname mdw
# permanent change (set HOSTNAME=mdw)
vi /etc/sysconfig/network

2.2.2 Configuring /etc/hosts

Add the IP address and alias of each machine:

[root@mdw ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.28.25.201 mdw
172.28.25.202 sdw1
172.28.25.203 sdw2

Change the hosts file on all hosts in the cluster: log in to each host and execute the following statement:

cat >> /etc/hosts << EOF
172.28.25.201 mdw
172.28.25.202 sdw1
172.28.25.203 sdw2
EOF

2.3 Configuring sysctl.conf

Modify the system parameters according to the actual situation of the system (before GP 5.0 the official documentation gave fixed default values; since 5.0 it also provides calculation formulas for some of them).

Edit /etc/sysctl.conf and reload the parameters with sysctl -p:

# kernel.shmall = _PHYS_PAGES / 2           # See Shared Memory Pages
kernel.shmall = 4000000000
# kernel.shmmax = kernel.shmall * PAGE_SIZE
kernel.shmmax = 500000000
kernel.shmmni = 4096
vm.overcommit_memory = 2                    # See Segment Host Memory
vm.overcommit_ratio = 95                    # See Segment Host Memory

net.ipv4.ip_local_port_range = 10000 65535  # See Port Settings
kernel.sem = 500 2048000 200 40960
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.swappiness = 10
vm.zone_reclaim_mode = 0
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.dirty_background_ratio = 0               # See System Memory
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736
vm.dirty_bytes = 4294967296

2.3.1 Shared Memory

  • kernel.shmall = _PHYS_PAGES / 2
  • kernel.shmmax = kernel.shmall * PAGE_SIZE
[root@mdw ~]# echo $(expr $(getconf _PHYS_PAGES) / 2)
2041774 
[root@mdw ~]# echo $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
8363106304
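A small sketch that writes the two computed values into /etc/sysctl.conf and reloads them (assuming the static shmall/shmmax entries above have been removed first so the keys are not duplicated):

echo "kernel.shmall = $(expr $(getconf _PHYS_PAGES) / 2)" >> /etc/sysctl.conf
echo "kernel.shmmax = $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))" >> /etc/sysctl.conf
sysctl -p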

2.3.2 Host Memory

  • vm.overcommit_memory: the OS uses this parameter to determine how much memory can be allocated to processes. For GP databases it should be set to 2.
  • vm.overcommit_ratio: the percentage of RAM that may be allocated to processes, the rest being reserved for the operating system. The Red Hat default is 50; 95 is recommended.

# calculating vm.overcommit_ratio
vm.overcommit_ratio = (RAM - 0.026 * gp_vmem) / RAM
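A rough shell sketch of that formula, expressed as the percentage the kernel parameter expects; the gp_vmem value here is an assumed placeholder, not one computed for this cluster:

gp_vmem_gb=8        # illustrative estimate of gp_vmem in GB
ram_gb=$(free -g | awk '/Mem:/ {print $2}')
awk -v ram="$ram_gb" -v gpv="$gp_vmem_gb" 'BEGIN { printf "vm.overcommit_ratio = %.0f\n", (ram - 0.026 * gpv) / ram * 100 }'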

2.3.3 Port Settings

To avoid port conflicts with other applications during Greenplum initialization, specify the port range net.ipv4.ip_local_port_range.

When initializing Greenplum using gpinitSystem, do not specify the Greenplum database port in this scope.

For example, if net.ipv4.ip_local_port_range = 10000 65535, set the Greenplum database base port numbers outside that range:

PORT_BASE = 6000
MIRROR_PORT_BASE = 7000

2.3.4 System Memory

If the system memory is greater than 64 GB, the following configuration is recommended:

vm.dirty_background_ratio = 0
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736 # 1.5 GB
vm.dirty_bytes = 4294967296 # 4GB

If the system memory is 64 GB or less, remove vm.dirty_background_bytes and vm.dirty_bytes and set the following parameters:

vm.dirty_background_ratio = 3
vm.dirty_ratio = 10

Set vm.min_free_kbytes to ensure that PF_MEMALLOC allocation requests from network and storage drivers can be satisfied. This is especially important on systems with large amounts of memory, where the default value is usually too low. The awk command below sets vm.min_free_kbytes to the recommended 3% of system physical memory:

awk 'BEGIN {OFMT = "%.0f"; } /MemTotal/ {print "vm.min_free_kbytes =", $2 * .03; } ' /proc/meminfo >> /etc/sysctl.conf 

Do not set vm.min_free_kbytes to more than 5% of system memory, as doing so may cause out-of-memory conditions.

This experiment uses Red Hat 6.8 with 16 GB of memory. The configuration is as follows:

[root@mdw ~]# vi /etc/sysctl.conf
[root@mdw ~]# sysctl -p
kernel.shmall = 2041774
kernel.shmmax = 8363106304
kernel.shmmni = 4096
vm.overcommit_memory = 2
vm.overcommit_ratio = 95
net.ipv4.ip_local_port_range = 10000 65535
kernel.sem = 500 2048000 200 40960
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.swappiness = 10
vm.zone_reclaim_mode = 0
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.dirty_background_ratio = 3
vm.dirty_ratio = 10

2.4 System Resource Restrictions

Modify /etc/security/limits.conf and add the following parameters:

* soft nofile 524288
* hard nofile 524288
* soft nproc 131072
* hard nproc 131072
  • The asterisk (*) indicates all users
  • nproc is the maximum number of processes
  • nofile is the maximum number of open files
  • On RHEL/CentOS 6, also change nproc to 131072 in /etc/security/limits.d/90-nproc.conf
  • On RHEL/CentOS 7, also change nproc to 131072 in /etc/security/limits.d/20-nproc.conf
[root@mdw ~]# cat /etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.

*          soft    nproc     131072
root       soft    nproc     unlimited

The Linux pam_limits module sets user limits by reading the limits.conf file. The ulimit -u command displays the maximum number of user processes (max user processes); verify that it returns 131072.
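A quick check after logging in again as a non-root user:

ulimit -u    # expect 131072 (max user processes)
ulimit -n    # expect 524288 (open files)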

2.5 XFS mount options

Compared with ext4, XFS has the following advantages:

  1. XFS scales significantly better than ext4; ext4 performance degrades noticeably once a single directory holds more than 2 million files
  2. ext4 is very stable as a traditional file system, but it no longer fits well as storage requirements keep growing
  3. For historical on-disk format reasons, ext4 is limited to about 4 billion inodes (32 bits) and a maximum single file size of 16 TB
  4. XFS uses 64-bit space management with file system sizes up to the EB range, and manages metadata with B+trees

GP is recommended to run on the XFS file system. RHEL/CentOS 7 and Oracle Linux use XFS as the default file system, and SUSE/openSUSE has long supported it. Because each VM here has only one disk, which is the system disk, the file system cannot be changed, so mounting XFS is skipped in this installation.


For example, mount a new XFS:

1. Partitioning and formatting:

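A minimal sketch of this step; the device name /dev/sda3 and mount point /data are illustrative and must match the actual disk layout:

# partition the disk (e.g. with fdisk or parted), then format and mount the new partition as XFS
mkfs.xfs -f /dev/sda3
mkdir -p /data
mount -t xfs -o rw,noatime,inode64,allocsize=16m /dev/sda3 /data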

2. Add it to the /etc/fstab file

/dev/sda3 /data xfs rw,noatime,inode64,allocsize=16m 1 1

For more information on XFS:

  • Blog.csdn.net/marxyong/ar…
  • Access.redhat.com/documentati…

2.6 Disk I/O Settings

Set the disk read-ahead value to 16384. Disk device names differ between systems; use lsblk to check how the disks are attached and mounted.

[root@mdw ~]# lsblk
NAME                      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                         8:0    0   50G  0 disk
├─VolGroup-lv_swap (dm-0) 253:0    0    4G  0 lvm  [SWAP]
└─VolGroup-lv_root (dm-1) 253:1    0 45.8G  0 lvm  /
sr0                        11:0    1 1024M  0 rom

# takes effect immediately (not persistent across reboots)
[root@mdw ~]# /sbin/blockdev --setra 16384 /dev/sda
[root@mdw ~]# /sbin/blockdev --getra /dev/sda
16384
# add the blockdev --setra command to /etc/rc.d/rc.local so it is applied on every boot

2.7 Modifying the rc.local Permission

The rc.local file must be executable for it to run at startup. For example, on RHEL/CentOS 7 set the execute permission on the file:

chmod +x /etc/rc.d/rc.local

2.8 Disk I/O Scheduling Algorithm

[root@mdw ~]# more /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
[root@mdw ~]# echo deadline > /sys/block/sda/queue/scheduler

This change is not permanent; it must be reapplied after every reboot unless it is set on the kernel command line.

On RHEL 6.x or CentOS 6.x, add elevator=deadline to the kernel line in /boot/grub/grub.conf, as sketched below.
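A sketch of such a kernel line; the kernel version and root device are illustrative, the point is the elevator=deadline argument appended at the end:

kernel /vmlinuz-2.6.32-642.el6.x86_64 ro root=/dev/mapper/VolGroup-lv_root quiet elevator=deadline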

On RHEL 7.x or CentOS 7.x, which use GRUB2, change the value with the grubby system tool:

grubby --update-kernel=ALL --args="elevator=deadline"
# check after the reboot with:
grubby --info=ALL

2.9 Transparent Huge Pages (THP)

Disable THP because it degrades Greenplum database performance. On RHEL/CentOS 6.x and later, THP is enabled by default.

One way to disable THP on RHEL 6.x is to add the parameter transparent_hugepage=never to the kernel line in /boot/grub/grub.conf:

kernel /vmlinuz-2.6.18-274.3.1.el5 ro root=LABEL=/ elevator=deadline crashkernel=128M@16M quiet console=tty1 console=ttyS1,115200 panic=30 transparent_hugepage=never
initrd /initrd-2.6.18-274.3.1.el5.img

On RHEL 7.x or CentOS 7.x (GRUB2), change the value with grubby:

grubby --update-kernel=ALL --args="transparent_hugepage=never"
# after adding the parameter, restart the system
# check the parameter:
cat /sys/kernel/mm/*transparent_hugepage/enabled
always [never]

2.10 IPC Object Removal

Disable IPC Object removal for RHEL 7.2 or CentOS 7.2, or Ubuntu. The default systemd setting RemoveIPC=yes removes IPC connections when non-system user accounts log out. This causes the Greenplum Database utility gpinitsystem to fail with semaphore errors. Perform one of the following to avoid this issue.

  • When you add the gpadmin operating system user account to the master node in Creating the Greenplum Administrative User, create the user as a system account.
  • Disable RemoveIPC. Set this parameter in /etc/systemd/logind.conf on the Greenplum Database host systems.
RemoveIPC=no

The setting takes effect after restarting the systemd-logind service or rebooting the system. To restart the service, run this command as the root user:

service systemd-logind restart

2.11 SSH Connection Threshold

The Greenplum database utilities gpexpand, gpinitsystem, and gpaddmirrors use SSH connections to perform their tasks. In a large Greenplum cluster, the number of SSH connections a utility opens may exceed the host's maximum threshold for unauthenticated connections. When this happens, you receive an error like:

ssh_exchange_identification: Connection closed by remote host

To avoid this, update the MaxStartups and MaxSessions parameters in /etc/ssh/sshd_config (or /etc/sshd_config on some systems):

MaxStartups 200
MaxSessions 200

Restart sshd for the parameters to take effect:

service sshd restart

2.12 Synchronizing Cluster Time (NTP)

Edit /etc/ntp.conf on the master server to point at the clock (NTP) server in the data center. If there is none, set the master server's time correctly, then edit /etc/ntp.conf on the other nodes so that they synchronize with the master.

vi /etc/ntp.conf
# add before the other server entries
server mdw prefer   # prefer the master node
server smdw         # if there is no standby node, this can point to the data-center clock server
service ntpd restart   # restart the NTP service

2.13 Checking character Sets

[root@mdw greenplum-db]# echo $LANG
en_US.UTF-8

If not, modify /etc/sysconfig/language and add RC_LANG=en_US.UTF-8.

2.14 Creating user gpadmin

 # Differences from older versions
In GP4.x/GP5.x the gpadmin user could be created with the -u parameter of gpseginstall.

Create a gpadmin user on each node to manage and run the GP cluster, preferably with sudo permission. After GP is installed on the master node, use gpssh to create the user on the other nodes in batches (see 3.4.1). Example:

[root@mdw ~]# groupadd gpadmin
[root@mdw ~]# useradd gpadmin -r -m -g gpadmin
[root@mdw ~]# passwd gpadmin

3. Install cluster software

Reference: gpdb.docs.pivotal.io/6-2/install…

  # Differences from older versions
GP4.x/GP5.x installation was divided into four parts:
  1. Install the master (usually a .bin executable installer).
  2. gpseginstall installs each segment.
  3. gpcheck verifies the cluster parameters.
  4. gpinitsystem initializes the cluster.
GP6.2:
  1. Install the master (rpm -ivh / yum install -y) under /usr/local/.
  2. GP6 has no gpseginstall, so either package the GP directory installed on the master and upload it to the segments, or yum install it separately on each node (here: package the installation directory of the primary node and distribute it to the segment hosts).
  3. Verify cluster performance.
  4. gpinitsystem initializes the cluster.

3.1 Running the Installation program

By default the software is installed to /usr/local/.

yum install -y ./greenplum-db-6.2.1-rhel6-x86_64.rpm

Or install with rpm:

rpm -ivh greenplum-db-6.2.1-rhel6-x86_64.rpm

This test was done on an intranet machine, so the dependency packages could not be downloaded online, and they had not been downloaded in advance either; whatever the installer reports as missing is downloaded at that point. Here libyaml was missing: download it, upload it to the server, install it, and run the GP installer again. libyaml download address: rpmfind.net/linux/rpm2h…

[root@mdw gp_install_package]# yum install -y ./greenplum-db-6.2.1-rhel6-x86_64.rpm
Loaded plugins: product-id, refresh-packagekit, search-disabled-repos, security, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Setting up Install Process
Examining ./greenplum-db-6.2.1-rhel6-x86_64.rpm: rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package greenplum-db.x86_64 0:6.2.1-1.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package        Arch     Version        Repository                         Size
================================================================================
Installing:
 greenplum-db   x86_64   6.2.1-1.el6    /greenplum-db-6.2.1-rhel6-x86_64  493 M

Transaction Summary
================================================================================
Install       1 Package(s)

Total size: 493 M
Installed size: 493 M
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Warning: RPMDB altered outside of yum.
  Installing : greenplum-db-6.2.1-1.el6.x86_64   1/1
  Verifying  : greenplum-db-6.2.1-1.el6.x86_64   1/1

Installed:
  greenplum-db.x86_64 0:6.2.1-1.el6

Complete!

3.2 Creating hostfile_exkeys

Create two host files (all_host and seg_host) in the $GPHOME directory; they are used later as host-list parameter files for gpssh, gpscp, and other scripts.

  • all_host: the hostnames or IP addresses of all hosts in the cluster, including master, segments, and standby.
  • seg_host: the hostnames or IP addresses of all segment hosts.

If a machine has multiple NICs and they are not bonded (bond0), the IP address or hostname of every NIC must be listed.

[root@mdw ~]# cd /usr/local/
[root@mdw local]# ls
bin  etc  games  greenplum-db  greenplum-db-6.2.1  include  lib  lib64  libexec  openssh-6.5p1  sbin  share  src  ssl
[root@mdw local]# cd greenplum-db
[root@mdw greenplum-db]# ls
bin  docs  etc  ext  greenplum_path.sh  include  lib  open_source_license_pivotal_greenplum.txt  pxf  sbin  share
[root@mdw greenplum-db]# vi all_host
[root@mdw greenplum-db]# vi seg_host
[root@mdw greenplum-db]# cat all_host
mdw
sdw1
sdw2
[root@mdw greenplum-db]# cat seg_host
sdw1
sdw2

Change the owner of the installation directories to gpadmin:

[root@mdw greenplum-db]# chown -R gpadmin:gpadmin /usr/local/greenplum*

3.3 Establishing Cluster Trust (Passwordless Login)

## Differences from older versions
Before GP6.x, steps 3.3.1 (ssh-keygen to generate a key) and 3.3.2 (ssh-copy-id) were not needed; GP5.x simply runs gpssh-exkeys -f all_host.

3.3.1 Generating a Key

# I don't have a public/private key pair for Linux yet, so make one
[root@gjzq-sh-mb greenplum-db]# ssh-keygen

3.3.2 Copy the local public key to the authorized_keys file of each node machine

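A minimal sketch of this step, assuming the key generated above is pushed to every node (each command prompts for that node's root password):

[root@mdw ~]# ssh-copy-id mdw
[root@mdw ~]# ssh-copy-id sdw1
[root@mdw ~]# ssh-copy-id sdw2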

3.3.3 Using gpssh-exkeys to Enable N-to-N Passwordless Login

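A sketch of this step, mirroring the gpadmin-side commands used later in 3.4.2:

[root@mdw ~]# source /usr/local/greenplum-db/greenplum_path.sh
[root@mdw ~]# gpssh-exkeys -f /usr/local/greenplum-db/all_host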

Verify gpssh:

The official documentation performs the verification from the versioned directory /usr/local/greenplum-db-6.2.1.


Since /usr/local/greenplum-db is a symbolic link to /usr/local/greenplum-db-6.2.1, this document uses /usr/local/greenplum-db in the commands below.


3.4 Synchronizing the Master configuration to various hosts

This step is not part of the official tutorial. In the official tutorial, the configuration of all hosts in the cluster is made consistent during the system-parameter step; in this document only the master host's parameters were modified, so the cluster-wide configuration is unified here.

3.4.1 Adding user gpadmin in Batches

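A sketch of this step, run as root on the master; it repeats the commands from 2.14 on the segment hosts, and the password set here is illustrative:

gpssh -f /usr/local/greenplum-db/seg_host -e 'groupadd gpadmin; useradd gpadmin -r -m -g gpadmin'
gpssh -f /usr/local/greenplum-db/seg_host -e 'echo gpadmin:gpadmin | chpasswd'   # illustrative password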

3.4.2 Enabling Passwordless Login for User gpadmin

[root@mdw greenplum-db-6.2.1]# su - gpadmin
[gpadmin@mdw ~]$ source /usr/local/greenplum-db/greenplum_path.sh
[gpadmin@mdw ~]$ ssh-keygen
[gpadmin@mdw ~]$ ssh-copy-id sdw1
[gpadmin@mdw ~]$ ssh-copy-id sdw2
[gpadmin@mdw ~]$ gpssh-exkeys -f /usr/local/greenplum-db/all_host

3.4.3 Batch Setting the environment variable of greenplum in user gpadmin

Add gp’s installation directory and session environment information to the user’s environment variables.

Edge.bash_profil and.bashrc

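A minimal sketch of this step; MASTER_DATA_DIRECTORY and the PG* variables are appended later in 4.3.2:

su - gpadmin
echo "source /usr/local/greenplum-db/greenplum_path.sh" >> /home/gpadmin/.bash_profile
echo "source /usr/local/greenplum-db/greenplum_path.sh" >> /home/gpadmin/.bashrc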

3.4.4 Batch Copying System Parameters to Other Nodes

This step distributes the system parameters previously configured on the master node to the other nodes in the cluster.

# example:
su root
gpscp -f seg_host /etc/hosts root@=:/etc/hosts
gpscp -f seg_host /etc/security/limits.conf root@=:/etc/security/limits.conf
gpscp -f seg_host /etc/sysctl.conf root@=:/etc/sysctl.conf
gpscp -f seg_host /etc/security/limits.d/90-nproc.conf root@=:/etc/security/limits.d/90-nproc.conf
gpssh -f seg_host -e '/sbin/blockdev --setra 16384 /dev/sda'
gpssh -f seg_host -e 'echo deadline > /sys/block/sda/queue/scheduler'
gpssh -f seg_host -e 'sysctl -p'
gpssh -f seg_host -e 'reboot'

3.5 Installing Cluster Nodes

 ## Differences from older versions
This section is not in the official documentation. Prior to GP6 there was a tool, gpseginstall, that installed the GP software on each node. Its installation logs show that its main steps are: 1. create the gp user on each node (skipped here); 2. package the installation directory of the primary node; 3. distribute the package to each segment host and unpack it; 4. create the soft link; 5. grant ownership to gpadmin.

3.5.1 Simulating the gpseginstall script

The following script emulates the main steps of gpseginstall to deploy the GP software to the segment hosts.

# run as user root
# variable settings
link_name='greenplum-db'                      # soft link name
binary_dir_location='/usr/local'              # install path
binary_dir_name='greenplum-db-6.2.1'          # install directory
binary_path='/usr/local/greenplum-db-6.2.1'   # full path

# package on the master node
chown -R gpadmin:gpadmin $binary_path
rm -f ${binary_path}.tar; rm -f ${binary_path}.tar.gz
cd $binary_dir_location; tar cf ${binary_dir_name}.tar ${binary_dir_name}
gzip ${binary_path}.tar

# distribute to the segment hosts
gpssh -f ${binary_path}/seg_host -e "mkdir -p ${binary_dir_location}; rm -rf ${binary_path}; rm -rf ${binary_path}.tar; rm -rf ${binary_path}.tar.gz"
gpscp -f ${binary_path}/seg_host ${binary_path}.tar.gz root@=:${binary_path}.tar.gz
gpssh -f ${binary_path}/seg_host -e "cd ${binary_dir_location}; gzip -f -d ${binary_path}.tar.gz; tar xf ${binary_path}.tar"
gpssh -f ${binary_path}/seg_host -e "rm -rf ${binary_path}.tar; rm -rf ${binary_path}.tar.gz; rm -f ${binary_dir_location}/${link_name}"
gpssh -f ${binary_path}/seg_host -e "ln -fs ${binary_dir_location}/${binary_dir_name} ${binary_dir_location}/${link_name}"
gpssh -f ${binary_path}/seg_host -e "chown -R gpadmin:gpadmin ${binary_dir_location}/${link_name}; chown -R gpadmin:gpadmin ${binary_dir_location}/${binary_dir_name}"
gpssh -f ${binary_path}/seg_host -e "source ${binary_path}/greenplum_path.sh"
gpssh -f ${binary_path}/seg_host -e "cd ${binary_dir_location}; ls -l"

3.5.2 Creating a Cluster Data Directory

3.5.2.1 Creating the Master Data Directory

 mkdir -p /opt/greenplum/data/master
 chown gpadmin:gpadmin /opt/greenplum/data/master

 # standby data directory (this experiment has no standby)
 # use gpssh to create the data directory on the standby:
 # source /usr/local/greenplum-db/greenplum_path.sh
 # gpssh -h smdw -e 'mkdir -p /data/master'
 # gpssh -h smdw -e 'chown gpadmin:gpadmin /data/master'

3.5.2.2 Creating the Segment Data Directory

This time, we plan to install two segments and two mirrors for each host.

source /usr/local/greenplum-db/greenplum_path.sh 
gpssh -f /usr/local/greenplum-db/seg_host -e 'mkdir -p /opt/greenplum/data1/primary'
gpssh -f /usr/local/greenplum-db/seg_host -e 'mkdir -p /opt/greenplum/data1/mirror'
gpssh -f /usr/local/greenplum-db/seg_host -e 'mkdir -p /opt/greenplum/data2/primary'
gpssh -f /usr/local/greenplum-db/seg_host -e 'mkdir -p /opt/greenplum/data2/mirror'
gpssh -f /usr/local/greenplum-db/seg_host -e 'chown -R gpadmin /opt/greenplum/data*'

3.6 Cluster Performance Testing

 ## Differences from older versions
GP6 removed the gpcheck tool, which verified the system parameters and hardware configuration required by GP. What can still be verified at this point is network and disk I/O performance, using gpcheckperf.

Reference: gpdb.docs.pivotal.io/6-2/install…

Further reading: yq.aliyun.com/articles/23…

Personal experience (for reference only; consult the documentation for exact standards):

  • Disk bandwidth should generally reach 2000 MB/s
  • Network bandwidth should be at least 1000 Mbit/s

3.6.1 Network Performance Test

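A sketch of the network test; gpcheckperf with -r N runs a parallel netperf-style pair test between the hosts in the file (the temp directory is illustrative):

[gpadmin@mdw ~]$ source /usr/local/greenplum-db/greenplum_path.sh
[gpadmin@mdw ~]$ gpcheckperf -f /usr/local/greenplum-db/seg_host -r N -d /tmp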

Here netperf failed on sdw2 -> sdw1 because /etc/hosts on sdw2 had not been configured; fix the hosts file on sdw2 and rerun.

3.6.2 Disk I/O Performance Test

[root@mdw greenplum-db]# gpcheckperf -f /usr/local/greenplum-db/seg_host -r ds -D -d /opt/greenplum/data1/primary
/usr/local/greenplum-db/./bin/gpcheckperf -f /usr/local/greenplum-db/seg_host -r ds -D -d /opt/greenplum/data1/primary
--------------------
--  DISK WRITE TEST
--------------------
--------------------
--  DISK READ TEST
--------------------
--------------------
--  STREAM TEST
--------------------
====================
==  RESULT 2019-12-18T19:59:06.969229
==================== 
disk write avg time (sec): 47.34 
disk write tot bytes: 66904850432 
disk write tot bandwidth (MB/s): 1411.59 
disk write min bandwidth (MB/s): 555.60 [sdw2] 
disk write max bandwidth (MB/s): 855.99 [sdw1] 
-- per host bandwidth --    
disk write bandwidth (MB/s): 855.99 [sdw1]    
disk write bandwidth (MB/s): 555.60 [sdw2] 

disk read avg time (sec): 87.33
disk read tot bytes: 66904850432
disk read tot bandwidth (MB/s): 738.54
disk read min bandwidth (MB/s): 331.15 [sdw2]
disk read max bandwidth (MB/s): 407.39 [sdw1]
-- per host bandwidth --
disk read bandwidth (MB/s): 407.39 [sdw1]
disk read bandwidth (MB/s): 331.15 [sdw2]

stream tot bandwidth (MB/s): 12924.30
stream min bandwidth (MB/s): 6451.80 [sdw1]
stream max bandwidth (MB/s): 6472.50 [sdw2]
-- per host bandwidth --
stream bandwidth (MB/s): 6451.80 [sdw1]
stream bandwidth (MB/s): 6472.50 [sdw2]

3.6.3 Verifying Cluster Clock (Unofficial Step)

# verify the cluster time; if it is inconsistent, fix the NTP configuration
gpssh -f /usr/local/greenplum-db/all_host -e 'date'

4. Initialize the cluster

Reference: gpdb.docs.pivotal.io/6-2/install…

4.1 Writing an Initial Configuration File

4.1.1 Copying a Configuration File Template

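A minimal sketch of this step, using the template that ships under $GPHOME/docs/cli_help/gpconfigs and the gpconfigs directory referenced in 4.2.2:

su - gpadmin
mkdir -p /home/gpadmin/gpconfigs
cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpconfigs/gpinitsystem_config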

4.1.2 Modify parameters as required

Note: when specifying PORT_BASE, check the port range set by the net.ipv4.ip_local_port_range parameter in /etc/sysctl.conf and keep the database ports outside it.

Main modified parameters:

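A sketch of the values used in this document, reconstructed from the directories, ports, and database name that appear elsewhere in this install; treat it as illustrative rather than the exact file:

SEG_PREFIX=gpseg
PORT_BASE=6000
DATA_DIRECTORY=(/opt/greenplum/data1/primary /opt/greenplum/data2/primary)
MASTER_HOSTNAME=mdw
MASTER_DIRECTORY=/opt/greenplum/data/master
MASTER_PORT=5432
MIRROR_PORT_BASE=7000
MIRROR_DATA_DIRECTORY=(/opt/greenplum/data1/mirror /opt/greenplum/data2/mirror)
DATABASE_NAME=yjbdw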

4.2 Cluster Initialization

4.2.1 Parameters of the Cluster Initialization command

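A sketch of the command and its main parameters, matching the invocation that appears in the error output below:

gpinitsystem -c /home/gpadmin/gpconfigs/gpinitsystem_config -h /usr/local/greenplum-db/seg_host -D
# -c : cluster initialization configuration file
# -h : host file listing the segment hosts
# -D : debug output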

4.2.2 Execution Errors and Handling

[gpadmin@mdw gpconfigs]$ gpinitsystem -c /home/gpadmin/gpconfigs/gpinitsystem_config -h /usr/local/greenplum-db/seg_host -D
...
/usr/local/greenplum-db/./bin/gpinitsystem: line 244: /tmp/cluster_tmp_file.8070: Permission denied
/bin/mv: cannot stat `/tmp/cluster_tmp_file.8070': Permission denied
...
20191218:20:22:57:008070 gpinitsystem:mdw:gpadmin-[FATAL]:-Unknown host sdw1: ping: icmp open socket: Operation not permitted unknown host  Script Exiting!

4.2.2.1 Permission Denied Error Handling

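The error means gpadmin could not write its temporary file under /tmp. A hypothetical fix, assuming /tmp has lost its default world-writable sticky-bit mode, is to restore it on every host:

gpssh -f /usr/local/greenplum-db/all_host -e 'chmod 1777 /tmp'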

4.2.2.2 ICMP Open Socket: Operation not permitted

gpssh -f /usr/local/greenplum-db/all_host -e 'chmod u+s /bin/ping'

4.2.2.3 Rollback Failed

If the installation fails midway, roll back by executing the generated script bash /home/gpadmin/gpAdminLogs/backout_gpinitsystem_gpadmin_*, for example:


Execute the rollback script

[gpadmin@mdw gpAdminLogs]$ ls
backout_gpinitsystem_gpadmin_20191218_203938  gpinitsystem_20191218.log
[gpadmin@mdw gpAdminLogs]$ bash
backout_gpinitsystem_gpadmin_20191218_203938
Stopping Master instance
waiting for server to shut down.... done
server stopped
Removing Master log file
Removing Master lock files
Removing Master data directory files

If the rollback script does not clean everything up, run statements like the following and then reinstall:

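A hypothetical cleanup sketch using the data directories created in 3.5.2: stop any half-started postgres processes and empty the data directories on every host before re-running gpinitsystem:

gpssh -f /usr/local/greenplum-db/all_host -e 'pkill -u gpadmin postgres'
gpssh -f /usr/local/greenplum-db/seg_host -e 'rm -rf /opt/greenplum/data1/primary/* /opt/greenplum/data1/mirror/* /opt/greenplum/data2/primary/* /opt/greenplum/data2/mirror/*'
rm -rf /opt/greenplum/data/master/*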

4.2.2.4 ping: unknown host gpzq-sh-mb / Unknown host ... Script Exiting! error

Refer to: note.youdao.com/noteshare?i… Edit the /home/gpadmin/.gphostcache file so that its content is:

[gpadmin@mdw ~]$ cat .gphostcache
mdw:mdw
sdw1:sdw1
sdw2:sdw2

4.3 Follow-up Operations After Initialization

If the initialization completes successfully, "Greenplum Database instance successfully created" is printed. The installation log is written to the /home/gpadmin/gpAdminLogs/ directory and named gpinitsystem_${date}.log. The last section of the log is shown below.

...
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[WARN]:-*******************************************************
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[WARN]:-Scan of log file indicates that some warnings or errors
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[WARN]:-were generated during the array creation
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-Please review contents of log file
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-/home/gpadmin/gpAdminLogs/gpinitsystem_20191218.log
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-To determine level of criticality
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-These messages could be from a previous run of the utility
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-that was called today!
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[WARN]:-
*******************************************************
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-End Function SCAN_LOG
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-Greenplum Database instance successfully created
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:---------
----------------------------------------------
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-To complete the environment configuration, please
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-update gpadmin .bashrc file with the following
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-1. Ensure that the greenplum_path.sh file is sourced
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-2. Add "export MASTER_DATA_DIRECTORY=/opt/greenplum/data/master/gpseg-1"
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-   to access the Greenplum scripts for this instance:
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-   or, use -d /opt/greenplum/data/master/gpseg-1 option for the Greenplum scripts
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-   Example gpstate -d /opt/greenplum/data/master/gpseg-1
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-Script log file = /home/gpadmin/gpAdminLogs/gpinitsystem_20191218.log
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-To remove instance, run gpdeletesystem utility
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-To initialize a Standby Master Segment for this Greenplum instance
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-Review options for gpinitstandby
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:---------------------------------------------------------
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-The Master /opt/greenplum/data/master/gpseg-1/pg_hba.conf post gpinitsystem
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-has been configured to allow all hosts within this new
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-array to intercommunicate. Any hosts external to this
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-new array must be explicitly added to this file
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-Refer to the Greenplum Admin support guide which is
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-located in the /usr/local/greenplum-db/./docs directory
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:------
-------------------------------------------------
20191218:20:45:51:013612 gpinitsystem:mdw:gpadmin-[INFO]:-End Main

Read the bottom of the log carefully; there are a few more steps to take.

4.3.1 Checking Log Content

The following message is displayed in the log:
Scan of log file indicates that some warnings or errors
were generated during the array creation
Please review contents of log file
/home/gpadmin/gpAdminLogs/gpinitsystem_20191218.log

Check installation log errors

# scan for warnings or errors:
cat /home/gpadmin/gpAdminLogs/gpinitsystem_20191218.log | grep -E -i 'WARN|ERROR]'

Address the items reported in the log to optimize cluster performance.

4.3.2 Setting Environment Variables

# edit the environment variable for user gpadmin, add
source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/opt/greenplum/data/master/gpseg-1

Environment variables reference: gpdb.docs.pivotal.io/510/install… In addition to sourcing /usr/local/greenplum-db/greenplum_path.sh, add the following to .bash_profile and .bashrc:

su - gpadmin
cat >> /home/gpadmin/.bash_profile << EOF
export MASTER_DATA_DIRECTORY=/opt/greenplum/data/master/gpseg-1
export PGPORT=5432
export PGUSER=gpadmin
export PGDATABASE=yjbdw
EOF

Distribute to each node

gpscp -f /usr/local/greenplum-db/seg_host /home/gpadmin/.bash_profile  gpadmin@=:/home/gpadmin/.bash_profile
gpscp -f /usr/local/greenplum-db/seg_host /home/gpadmin/.bashrc gpadmin@=:/home/gpadmin/.bashrc
gpssh -f /usr/local/greenplum-db/all_host -e 'source /home/gpadmin/.bash_profile; source /home/gpadmin/.bashrc; '

4.3.3 Deleting and Reinstalling with gpdeletesystem

After the installation is complete, if you need to delete the cluster and reinstall it for any reason, use the gpdeletesystem tool.

Reference: gpdb.docs.pivotal.io/6-2/utility…

Use the following command:

gpdeletesystem -d /opt/greenplum/data/master/gpseg-1 -f
  • -d: specify MASTER_DATA_DIRECTORY; deletes the master data directory and all segment data directories.
  • -f: force; terminate all related processes and delete forcibly.

After deleting the cluster, adjust the cluster initialization configuration file as required and initialize the cluster again.

vi /home/gpadmin/gpconfigs/gpinitsystem_config
gpinitsystem -c /home/gpadmin/gpconfigs/gpinitsystem_config -h /usr/local/greenplum-db/seg_host -D

4.3.4 Configuring pg_hba.conf

Configure pg_hba.conf as required:
/opt/greenplum/data/master/gpseg-1/pg_hba.conf
For details, see 5.2.1 Configuring pg_hba.conf.

5. Post-installation configuration

5.1 Logging In to GP with psql and Setting a Password

Use psql to log in to GP. The general command format is:

psql -h hostname -p port -d database -U user -W

  • -h: the hostname of the master (or, in utility mode, a segment)
  • -p: the port of the master or segment
  • -d: the database name
  • -U: the user name; -W forces a password prompt

These parameters can be set as environment variables for the user. On the master host, the gpadmin user can log in locally without a password. Example of logging in with psql and setting the gpadmin password:

[gpadmin@mdw gpconfigs]$ psql
psql (9.4.24)
Type "help" for help.

yjbdw=# ALTER USER gpadmin WITH PASSWORD 'gpadmin';
ALTER ROLE
yjbdw=# \q

5.1.1 Logging In to Different Nodes

# examples:
# log in to the master node
[gpadmin@mdw gpconfigs]$ PGOPTIONS='-c gp_session_role=utility' psql -h mdw -p5432 -d postgres

# to log in to a segment directly, specify the segment port
[gpadmin@mdw gpconfigs]$ PGOPTIONS='-c gp_session_role=utility' psql -h sdw1 -p6000 -d postgres

5.2 Logging In to the GP from a Client

  • Configure pg_hba.conf
  • Configure postgresql.conf

5.2.1 Configuring pg_hba.conf

Configuration reference: blog.csdn.net/yaoqiancuo3…

# example
vi /opt/greenplum/data/master/gpseg-1/pg_hba.conf

# TYPE   DATABASE    USER      ADDRESS                         METHOD
# "local" is for Unix domain socket connections only
# IPv4 local connections:
# IPv6 local connections:
local    all         gpadmin                                   ident
host     all         gpadmin   127.0.0.1/28                    trust
host     all         gpadmin   172.28.25.204/32                trust
host     all         gpadmin   0.0.0.0/0                       md5     # new rule: allow password login from any IP
host     all         gpadmin   ::1/128                         trust
host     all         gpadmin   fe80::250:56ff:fe91:63fc/128    trust
local    replication gpadmin                                   ident
host     replication gpadmin   samenet                         trust

5.2.2 Modifying postgresql.conf

Set listen_addresses = '*' so that connections are accepted on all addresses.

GP 6.0 already sets listen_addresses = '*' by default.

vi /opt/greenplum/data/master/gpseg-1/postgresql.conf

5.2.3 Reloading the Modified Configuration

gpstop -u 

5.2.4 Logging In from a Client

Log in with pgAdmin 4 or Navicat using the connection information from this installation.

The installation ends here.