Filesystem Policy/Placement considerations

FailureGroup: GPFS uses this information during data and metadata placement to assure that no two replicas of the same block are written in such a way as to become unavailable due to a single failure. All disks that are attached to the same NSD server or adapter should be placed in the same failure group.

Example:

– SAN disks can be FailureGroup 1, and CoRAID disks can be FailureGroup 2.

StoragePool: Storage pool names are case sensitive. Only the system pool may contain disks that hold metadata (metadataOnly or dataAndMetadata disks).

We reference pools when we apply file placement policies.

Example:

– We create a fileset called fsdata and link it to the path /gpfs/fsdata (the fsdata folder should not exist yet).

– We want all .ISO and .iso files to go to the CoRAID data LUNs (storage pool data_coraid):
RULE 'isofiles' SET POOL 'data_coraid' WHERE UPPER(NAME) LIKE '%.ISO'

– All non-ISO files placed in /gpfs/fsdata will also go to the data_coraid pool:
RULE 'fsdata' SET POOL 'data_coraid' FOR FILESET ('fsdata')
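
A minimal sketch of how these rules might be installed, assuming the file system device is fosgpfs (the device name used elsewhere on this page) and that the rules are saved in a hypothetical file called placement.pol. It is good practice to end the policy with a default rule so files that match neither rule still get a pool:

# cat placement.pol
RULE 'isofiles' SET POOL 'data_coraid' WHERE UPPER(NAME) LIKE '%.ISO'
RULE 'fsdata' SET POOL 'data_coraid' FOR FILESET ('fsdata')
RULE 'default' SET POOL 'system'

# mmcrfileset fosgpfs fsdata
# mmlinkfileset fosgpfs fsdata -J /gpfs/fsdata
# mmchpolicy fosgpfs placement.pol -I test
# mmchpolicy fosgpfs placement.pol -I yes

mmchpolicy -I test only validates the policy file; rerunning with -I yes installs it.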

Adding tiebreaker disks to a 3-node cluster with node quorum

If you use tiebreaker disks, you can lose 2 of the 3 nodes and the file systems remain mounted on the last node.

# mmshutdown -a

# mmchconfig tiebreakerDisks="c3007tb1;c3008tb2;c3009tb3"

# mmstartup -a 
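
To confirm the tiebreaker configuration and quorum state after the restart:

# mmlsconfig tiebreakerDisks
# mmgetstate -a -L

mmgetstate -L adds quorum information (quorum nodes defined, nodes up) to the usual node state listing.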

More about configuration details: How to configure GPFS tiebreakers

Disks on the CoRAID are used for data as well as for tiebreakers.

Testing for cluster availability

Scenario A

Preparation:

Log in to an Ubuntu desktop with a UPI account and mount the SMB home directory as the local home directory.

While testing, write a 2 GB file (dd1.iso) to the home directory (e.g., with a dd command like the one sketched below).

Find out which GPFS server the client is connected to; in this case it is gpfs1.
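
The 2 GB test file can be generated with an ordinary dd run from the desktop session (the exact path is an assumption; anywhere inside the mounted home directory works):

$ dd if=/dev/zero of=~/dd1.iso bs=1M count=2048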

Outage:

On gpfs1, run ifconfig bond0 down.

Outcome:

mmgetstate shows gpfs1 is still an active node.

The GPFS file system stopped responding on the remaining nodes for about a minute.

The Ubuntu desktop froze, which shows that with CTDB the SMB state cannot be handed over (although the public IP address of the failed node did fail over to one of the remaining nodes).

The file dd1.iso was corrupted (its integrity was not preserved).

File System Replication

If -m (metadata replicas) or -r (data replicas) is set to 2, each storage pool must contain disks in at least two failure groups.

You can set replication when you create a GPFS file system with mmcrfs.

You can also change it later with mmchfs and then re-replicate existing data with mmrestripefs, as sketched below.
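
A minimal sketch of turning on two-way replication for an existing file system, assuming the device is mfs1 (the device used in the examples below) and that its disks already span two failure groups:

# mmchfs mfs1 -m 2 -r 2
# mmrestripefs mfs1 -R

mmchfs changes the default number of metadata (-m) and data (-r) replicas for new files; mmrestripefs -R re-replicates existing files and directories so they match the new defaults.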

Bring a disk up from down status

# mmlsdisk mfs1
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
mdisk1       nsd         512     100 yes      yes   ready         down         system
mdisk2       nsd         512     200 yes      yes   ready         up           system

# mmchdisk mfs1 start -d "mdisk1"

# mmrestripefs mfs1 -r

# mmlsdisk mfs1
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
mdisk1       nsd         512     100 yes      yes   ready         up           system
mdisk2       nsd         512     200 yes      yes   ready         up           system

Add a disk to replicated file system

The -r flag rebalances all existing files in the file system to make use of the new disks.

# mmadddisk mfs1 -F mdisk3.txt -r
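
A sketch of what mdisk3.txt might contain (all values here are illustrative). GPFS 3.x uses the colon-separated descriptor DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup:DesiredName:StoragePool, and the NSD has to be created with mmcrnsd before mmadddisk is run against the rewritten file:

# cat mdisk3.txt
/dev/etherd/e1.3:::dataAndMetadata:300:mdisk3:system

# mmcrnsd -F mdisk3.txt
# mmadddisk mfs1 -F mdisk3.txt -r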

Accessing GPFS file systems from other GPFS clusters

Each node in the GPFS cluster requiring file system access must have a connection to the disks containing the file system data, either directly or through an NSD server.

A remote GPFS file system owned and served by another GPFS cluster can be mounted by using mmauth to provide authorization, as sketched below.
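
A minimal sketch of the multi-cluster setup, assuming a hypothetical owning cluster cluster1.example.com that exports fosgpfs and an accessing cluster cluster2.example.com (cluster names, key file paths, and the remote device name are all placeholders):

On the owning cluster (cluster1.example.com):

# mmauth genkey new
# mmauth update . -l AUTHONLY
# mmauth add cluster2.example.com -k /tmp/cluster2_id_rsa.pub
# mmauth grant cluster2.example.com -f fosgpfs

On the accessing cluster (cluster2.example.com):

# mmauth genkey new
# mmremotecluster add cluster1.example.com -n gpfs1,gpfs2 -k /tmp/cluster1_id_rsa.pub
# mmremotefs add rfosgpfs -f fosgpfs -C cluster1.example.com -T /gpfs/rfosgpfs
# mmmount rfosgpfs -a

The public key files generated by mmauth genkey (id_rsa.pub under /var/mmfs/ssl) must be copied between the clusters before mmauth add and mmremotecluster add are run.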

System autoreboot in case of kernel panic

Due to the lack of proper fencing, it is crucial to have the kernel reboot the system automatically after a kernel panic; by default it hangs forever. I added the following line to /etc/sysctl.conf, which causes the kernel to reboot 10 seconds after a kernel panic.

kernel.panic = 10
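
To apply the setting on the running system without a reboot:

# sysctl -w kernel.panic=10
# sysctl -p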

Loading the AoE driver early at startup

cat > /etc/sysconfig/modules/coraid.modules <<EOF
#!/bin/bash
modprobe aoe
EOF

chmod +x /etc/sysconfig/modules/coraid.modules

To delay the startup of GPFS until the CoRAID (AoE) driver is fully initialized, add

sleep 60s

at the beginning of the startup function in /etc/init.d/gpfs, as sketched below.
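
A rough illustration of where the delay goes; the actual contents of /etc/init.d/gpfs vary between GPFS releases, so only the sleep line (and its comment) is the addition:

startup()
{
    # give the AoE/CoRAID devices time to appear before GPFS scans its NSDs
    sleep 60s
    # ... existing startup commands of the init script follow here ...
}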

GPFS storage pool management with policy rules

Refer to: How to create multiple GPFS pools and define data policy rules

Testing CTDB with smb availability

Scenario A

Preparation:

Log in to an Ubuntu desktop with a UPI account and mount the SMB home directory as the local home directory.

While testing, write a 2 GB file (dd1.iso) to the home directory.

Find out which GPFS server the client is connected to; in this case it is gpfs3.

Outage:

On gpfs3, run service ctdb stop.

Outcome:

The SMB public IP address fails over to one of the remaining nodes.

The Ubuntu desktop froze for about 5 minutes and then started to respond again.

The data should remain intact, provided the client user waits for the session state to come back.

Improvement:

Check whether the network gear can handle this (ARP cache updates) more efficiently.

Testing CTDB with nfs availability

Scenario A

Preparation:

Mount the NFS share.

Find out which NFS server node the client is connected to; in this case it is gpfs3.

Outage:

On gpfs3, stop the NFS service by running service ctdb stop.

Outcome:

The NFS mount point froze; similar to the SMB case, the client waits until the NFS server starts responding again.

Improvement:

Check whether the network gear can handle this (ARP cache updates) more efficiently.

Performance Test Result

Image:Gpfs-test-22-06.png 

Image:Gpfs-test-22-06-read.png

Cluster with NSD server

In a cluster, nodes not directly attached to the disks access data remotely over the local area network (either Ethernet or Myrinet) through the NSD server. A backup NSD server with direct Fibre Channel access to the disks may also be defined. Nodes directly attached to the disks will not access data through the NSD server. This also provides a redundant scheme: if the direct-attached (SAN) link is broken, the node can fall back to accessing the disks through the NSD server. This is the default behavior and can be changed with the useNSDserver file system mount option.

HOWTO example: We have three nodes, gpfs1, gpfs2, and gpfs3, in the cluster. Nodes gpfs1 and gpfs2 have a directly attached link to the SAN storage, but gpfs3 does not. How can we get gpfs3 to mount the GPFS file system (in this case the file system is named fosgpfs)?

Shutdown the cluster

# mmshutdown -a

Add the NSD server list. The first server in the list always has the higher priority; in this case the primary NSD server is gpfs1 and the secondary is gpfs2. You need to do this for every NSD LUN that belongs to the file system device.

# mmchnsd "system_san1:gpfs1,gpfs2"

# mmlsnsd
 File system   Disk name     NSD servers
---------------------------------------------------------------------------
 fosgpfs       system_san1   gpfs1,gpfs2

Change the mount option permanently. In this case the device name we are using is fosgpfs, and we need the acl option as well.

# mmchfs fosgpfs -o acl,useNSDserver

# mmlsfs fosgpfs
flag value            description
---- ---------------- -----------------------------------------------------
 -f  2048             Minimum fragment size in bytes
 -i  512              Inode size in bytes
 -I  8192             Indirect block size in bytes
 -m  2                Default number of metadata replicas
 -M  2                Maximum number of metadata replicas
 -r  1                Default number of data replicas
 -R  2                Maximum number of data replicas
 -j  cluster          Block allocation type
 -D  nfs4             File locking semantics in effect
 -k  all              ACL semantics in effect
 -a  1048576          Estimated average file size
 -n  32               Estimated number of nodes that will mount file system
 -B  65536            Block size
 -Q  none             Quotas enforced
     none             Default quotas enabled
 -F  512006           Maximum number of inodes
 -V  11.03 (3.3.0.0)  File system version
 -u  yes              Support for large LUNs?
 -z  no               Is DMAPI enabled?
 -L  2097152          Logfile size
 -E  yes              Exact mtime mount option
 -S  no               Suppress atime mount option
 -K  whenpossible     Strict replica allocation option
 -P  system;san       Disk storage pools in file system
 -d  system_san1;system_san2;data_san1;data_san2  Disks in file system
 -A  yes              Automatic mount option
 -o  acl,useNSDserver Additional mount options
 -T  /gpfs            Default mount point

Start the cluster; all the nodes should now be able to mount the GPFS file system.

# mmstartup -a
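
To verify from any node that gpfs3 now mounts the file system despite having no direct SAN attachment (a quick sanity check, not part of the original procedure):

# mmlsmount fosgpfs -L
# df -h /gpfs

mmlsmount -L lists every node that currently has the file system mounted.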
