Perform a Complex Restore of a Blockchain Application in Kubernetes with Kasten K10 

When the piano was invented, musicians noticed that producing a sound had become much easier. But pianists were soon expected to play far more notes than on earlier instruments. The complexity had simply moved elsewhere.

Kubernetes makes application deployment easier, yet application features require an increasing number of components and requirements. A good example of this is the IBM Blockchain Platform network, in which trust in transactions can be distributed across organizations.

Four organizations (R1, R2, R3 and R4) have jointly decided and entered an agreement to set up and operate a Hyperledger Fabric network.

Without going into all the details, this application is complex to back up and restore. It creates:

  1. At least 11 PVCs that must be backed up and restored in the right order.
  2. More than 12 deployments, if we include the client application.
  3. Multiple secrets, configmaps, service accounts, services and routes.
  4. Multiple IBP custom resources managed by an operator that creates the deployments and PVCs.

IBM provides an IBP console that helps with deploying peers, orderers and CA nodes on the network. It also helps to manage identities and channels, and to visualize the transactions on the ledger. This UI is very handy, but it constantly creates new PVCs, deployments, routes and services as the network grows, all of which must be backed up regularly. Additionally, the operator itself is managed by the Operator Lifecycle Manager (OLM) on OpenShift, which adds even more complexity.

Without a tool such as Kasten K10 by Veeam to protect and restore such applications, backup and restore processes are challenging. In this post, we’ll demonstrate the power of Kasten K10 using this scenario as an example. 

Assumptions  

In this tutorial, we assume that you have installed Kasten K10 on OpenShift. (If that’s not the case, read this blog post.)

My base domain on OpenShift will be michael-1.aws.kasten.io.   

We will restore the application without the Operator Lifecycle Manager, to demonstrate that restoration is also possible on a cluster where OLM does not exist, such as vanilla Kubernetes.

Step 1: Create a Blockchain Network 

The first step is to create the blockchain namespace: 

oc new-project my-blockchain 

Next, to install the blockchain operator, we browse to the Red Hat Marketplace and log in or create a new account. Then, we register the cluster with the Red Hat Marketplace.

Now, we’ll go to the Operator Hub. In the search bar, type “blockchain” to load the blockchain tile. 
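If you prefer to drive the installation from the command line, the same result can be achieved by creating the OLM objects directly. Here is a sketch based on the Subscription we will find later in the namespace; the sourceNamespace value (openshift-marketplace) is an assumption, and the Red Hat Marketplace entitlement still needs to be in place:

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: my-blockchain
  namespace: my-blockchain
spec:
  targetNamespaces:
  - my-blockchain
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ibm-blockchain
  namespace: my-blockchain
spec:
  channel: v2.5
  name: ibm-blockchain
  source: ibm-operator-catalog
  sourceNamespace: openshift-marketplace
EOF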

Step 2: Apply the Security Context Constraint 

The next step is to apply the following security context constraint object (you can also save it to the local system as ibp-scc.yaml and apply it with oc apply -f ibp-scc.yaml):

cat <<EOF | oc apply -f - 
allowHostDirVolumePlugin: false 
allowHostIPC: false 
allowHostNetwork: false 
allowHostPID: false 
allowHostPorts: false 
allowPrivilegeEscalation: true 
allowPrivilegedContainer: true 
allowedCapabilities: 
- NET_BIND_SERVICE 
- CHOWN 
- DAC_OVERRIDE 
- SETGID 
- SETUID 
- FOWNER 
apiVersion: security.openshift.io/v1 
defaultAddCapabilities: [] 
fsGroup: 
  type: RunAsAny 
groups: 
- system:serviceaccounts:my-blockchain 
kind: SecurityContextConstraints 
metadata: 
  name: my-blockchain 
readOnlyRootFilesystem: false 
requiredDropCapabilities: [] 
runAsUser: 
  type: RunAsAny 
seLinuxContext: 
  type: RunAsAny 
supplementalGroups: 
  type: RunAsAny 
volumes: 
- "*" 
EOF 

Then, we will run the following command to add the constraint to the project’s service accounts:

oc adm policy add-scc-to-user my-blockchain system:serviceaccounts:my-blockchain 

When the commands are successful, we will see responses similar to the following example:

securitycontextconstraints.security.openshift.io/my-blockchain created 
scc "blockchain-project" added to: ["system:serviceaccounts:my-blockchain"] 

Step 3: Deploy the IBM Blockchain Platform console 

Once the operator is installed, four custom resource types are listed under “Provided APIs”: IBPCA, IBPPeer, IBPOrderer and IBPConsole.

Step 4: Create the Instance on the IBPConsole Tile

On the IBPConsole tile, we create an instance with the following spec:

apiVersion: ibp.com/v1beta1 
kind: IBPConsole 
metadata: 
  name: ibpconsole 
  namespace: my-blockchain 
  labels: 
    app.kubernetes.io/name: "ibp" 
    app.kubernetes.io/instance: "ibp" 
    app.kubernetes.io/managed-by: "ibm-ibp" 
spec: 
  email: michael@kasten.io 
  password: ultrasecurepassword 
  imagePullSecrets: 
  - regcred 
  registryURL: cp.icr.io/cp 
  license: 
    accept: true 
  networkinfo: 
    domain: apps.michael-1.aws.kasten.io 
  storage: 
    console: 
      class: '' 
      size: 5Gi 
  serviceAccountName: ibm-blockchain 
  version: 2.5.1 
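If you are working from the command line instead of the tile, the same spec can be saved to a file (for example, ibpconsole.yaml) and applied directly:

oc apply -f ibpconsole.yaml -n my-blockchain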

See https://cloud.ibm.com/docs/blockchain-sw-251?topic=blockchain-sw-251-deploy-ocp-rhm#console-deploy-ocp-rhm-advanced for more advanced options. 

Step 5: Verify the Console Installation and Log In 

oc get deployment -n my-blockchain 
NAME           READY   UP-TO-DATE   AVAILABLE   AGE 
ibp-operator   1/1     1            1           34m 
ibpconsole     1/1     1            1           16m 

All deployments are in the ready state, and the pods are running:

oc get po
NAME                            READY   STATUS    RESTARTS   AGE
ibp-operator-7f896d6644-sh2mx   1/1     Running   0          36m
ibpconsole-6f94ddc6f9-t9khx     4/4     Running   0          18m

oc get route
NAME                 HOST/PORT                                                        PATH   SERVICES     PORT      TERMINATION   WILDCARD
ibpconsole-console   my-blockchain-ibpconsole-console.apps.michael-1.aws.kasten.io           ibpconsole   optools   passthrough   None
ibpconsole-proxy     my-blockchain-ibpconsole-proxy.apps.michael-1.aws.kasten.io             ibpconsole   optools   passthrough   None

We connect to https://my-blockchain-ibpconsole-console.apps.michael-1.aws.kasten.io and log in with the email and password provided in the IBPConsole spec (michael@kasten.io and ultrasecurepassword). We then change the password as requested by the UI and reconnect:

Step 6: Create a Blockchain Network 

We will create a minimal network for testing purposes, following this tutorial.

At this point, we can run backup and recovery on the  first two blocks: 

But it’s more interesting to have transactions to make sure we are able to retrieve them upon restore. 

We can use this tutorial to create a basic Node.js smart contract.

Here’s a summary of the steps:  

All these operations create some transactions on the ledger that we can use now to test the restoration process: 

It could also be interesting to connect from VS Code to the platform and create some transactions, but this is beyond the scope of this blog post. 

What’s Available for Backup and Restore? 

At this point, we have these pods: 

oc get po  
NAME                                                       READY   STATUS    RESTARTS   AGE 
chaincode-execution-8d91fc09-df12-4c91-8b1b-6239b7947e23   1/1     Running   0          64m
ibp-operator-7f896d6644-sh2mx                              1/1     Running   0          22h
ibpconsole-6f94ddc6f9-t9khx                                4/4     Running   0          22h 
orderingserviceca-b64685fc9-2q6dl                          1/1     Running   0          15h 
orderingservicenode1-f7bb5f9f-2w29b                        2/2     Running   0          14h 
org1ca-7fd68c8cc-ggvpq                                     1/1     Running   0          16h 
peerorg1-58b48f49f9-mwp2j                                  4/4     Running   0          15h 

We also have these deployments: 

oc get deployment 
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE 
ibp-operator           1/1     1            1           22h 
ibpconsole             1/1     1            1           22h 
orderingserviceca      1/1     1            1           15h 
orderingservicenode1   1/1     1            1           14h 
org1ca                 1/1     1            1           16h 
peerorg1               1/1     1            1           15h 

Note that the chaincode-execution pod is not linked to any deployment; it is launched directly by the peer when the smart contract is deployed.

Here are the PVCs… 

oc get pvc        
NAME                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE 
ibpconsole-pvc             Bound    pvc-07638d8f-1e9d-4fb8-a4d7-5dc3a7a05840   5Gi        RWO            gp2-csi        22h 
orderingserviceca-pvc      Bound    pvc-b71c1051-d538-4eff-9e0f-937497125dec   20Gi       RWO            gp2-csi        15h 
orderingservicenode1-pvc   Bound    pvc-3748a0cd-995e-4701-bbb9-484634345b5f   100Gi      RWO            gp2-csi        14h 
org1ca-pvc                 Bound    pvc-d46a10b3-f9d7-4bbb-8a71-d4574aa05baa   20Gi       RWO            gp2-csi        16h 
peerorg1-pvc               Bound    pvc-e5575337-064f-4e0c-8e8f-1745655b8529   100Gi      RWO            gp2-csi        15h 
peerorg1-statedb-pvc       Bound    pvc-e18e0783-8608-4668-9640-7f411726c6bd   10Gi       RWO            gp2-csi        15h 

…and the corresponding IBP custom resources managed by the operator:

oc get ibpca,ibpconsole,ibporderer,ibppeer 
NAME                              READY 
ibpca.ibp.com/orderingserviceca    
ibpca.ibp.com/org1ca               
 
NAME                            READY 
ibpconsole.ibp.com/ibpconsole   
 
NAME                                      READY 
ibporderer.ibp.com/orderingservice         
ibporderer.ibp.com/orderingservicenode1    
 
NAME                       READY 
ibppeer.ibp.com/peerorg1  

Here are the Operator Lifecycle Manager (OLM) elements that were created to install the global blockchain application:

oc get operatorgroup,subscription,csv  
NAME                                                     AGE 
operatorgroup.operators.coreos.com/my-blockchain-jh4jr   23h 
 
NAME                                               PACKAGE          SOURCE                 CHANNEL 
subscription.operators.coreos.com/ibm-blockchain   ibm-blockchain   ibm-operator-catalog   v2.5 
 
NAME                                                 DISPLAY          VERSION   REPLACES   PHASE 
clusterserviceversion.operators.coreos.com/ibm-blockchain.v2.5.1   IBM Blockchain   2.5.1     Succeeded 

The Operator Lifecycle Manager created the clusterrole and clusterrolebinding during installation, to let the ibp-operator manage pods, deployments and IBP custom resources. These need to be captured at the cluster resource scope:

oc get clusterrole | grep ibp                     
ibpcas.ibp.com-v1beta1-admin                                                2021-03-29T14:30:16Z 
ibpcas.ibp.com-v1beta1-crdview                                              2021-03-29T14:30:17Z 
ibpcas.ibp.com-v1beta1-edit                                                 2021-03-29T14:30:16Z 
ibpcas.ibp.com-v1beta1-view                                                 2021-03-29T14:30:16Z 
ibpconsoles.ibp.com-v1beta1-admin                                           2021-03-29T14:30:17Z 
ibpconsoles.ibp.com-v1beta1-crdview                                         2021-03-29T14:30:17Z 
ibpconsoles.ibp.com-v1beta1-edit                                            2021-03-29T14:30:17Z 
ibpconsoles.ibp.com-v1beta1-view                                            2021-03-29T14:30:17Z 
ibporderers.ibp.com-v1beta1-admin                                           2021-03-29T14:30:17Z 
ibporderers.ibp.com-v1beta1-crdview                                         2021-03-29T14:30:17Z 
ibporderers.ibp.com-v1beta1-edit                                            2021-03-29T14:30:17Z 
ibporderers.ibp.com-v1beta1-view                                            2021-03-29T14:30:17Z 
ibppeers.ibp.com-v1beta1-admin                                              2021-03-29T14:30:17Z 
ibppeers.ibp.com-v1beta1-crdview                                            2021-03-29T14:30:17Z 
ibppeers.ibp.com-v1beta1-edit                                               2021-03-29T14:30:17Z 
ibppeers.ibp.com-v1beta1-view                                               2021-03-29T14:30:17Z 
 
oc get clusterrole | grep ibm  
ibm-blockchain.v2.5.2-6bdc5f6d8 
 
oc get clusterrolebinding | grep ibm  
ibm-blockchain.v2.5.2-6bdc5f6d8 

Restoration Constraints  

  1. Topology Constraint

To achieve high resiliency, the blockchain operator spreads the pods across zones according to a precise topology. When needed, it adds a zone label on the PVC that records the zone where the underlying volume was created:

oc get pvc -L zone                                        
NAME                       STATUS   VOLUME                                     ZONE 
ibpconsole-pvc             Bound    pvc-a08bd96f-bc41-432f-ac96-26c4850b81ce    
orderingserviceca-pvc      Bound    pvc-f3ca0d28-012a-4a5a-a793-474045bac3ce   eu-west-1a 
orderingservicenode1-pvc   Bound    pvc-161921ca-3fcf-490a-b49b-5ab88ee6bc41    
org1ca-pvc                 Bound    pvc-308f93b1-eb45-439a-8808-1e3d79550b46   eu-west-1a 
peerorg1-pvc               Bound    pvc-d0318054-af5f-44a4-b49c-7f1cd5d4e56e   eu-west-1b 
peerorg1-statedb-pvc       Bound    pvc-2ac27a9c-a571-488a-a28e-d38f90054a42   eu-west-1b 

To reinforce this, an affinity rule is also set up on the deployment itself: 

oc get deploy peerorg1 -o jsonpath='{.spec.template.spec.affinity}' | jq 
{ 
  "nodeAffinity": { 
    "requiredDuringSchedulingIgnoredDuringExecution": { 
      "nodeSelectorTerms": [ 
        { 
          "matchExpressions": [ 
            { 
              "key": "topology.kubernetes.io/zone", 
              "operator": "In", 
              "values": [ 
                "eu-west-1b" 
              ] 
            }, 
            { 
              "key": "topology.kubernetes.io/region", 
              "operator": "In", 
              "values": [ 
                "eu-west-1" 
              ] 
            } 
          ] 
        }, 
        { 
          "matchExpressions": [ 
            { 
              "key": "failure-domain.beta.kubernetes.io/zone", 
              "operator": "In", 
              "values": [ 
                "eu-west-1b" 
              ] 
            }, 
            { 
              "key": "failure-domain.beta.kubernetes.io/region", 
              "operator": "In", 
              "values": [ 
                "eu-west-1" 
              ] 
            } 
          ] 
        } 
      ] 
    } 
  }, 
  "podAntiAffinity": { 
    "preferredDuringSchedulingIgnoredDuringExecution": [ 
      { 
        "podAffinityTerm": { 
          "labelSelector": { 
            "matchExpressions": [ 
              { 
                "key": "orgname", 
                "operator": "In", 
                "values": [ 
                  "org1msp" 
                ] 
              } 
            ] 
          }, 
          "topologyKey": "kubernetes.io/hostname" 
        }, 
        "weight": 100 
      } 
    ] 
  } 
} 

That means that when restoring the PVC, we must make sure we restore it to the right zone.

  2. Data Consistency Constraint

To understand the data consistency constraint, we must first understand the transaction flow: 

Every peer pod holds two PVCs: one for the ledger (peerorg1-pvc) and one for the CouchDB state database (peerorg1-statedb-pvc).

It’s important to make sure that the couchdb PVC does not have validated transactions that the ledger does not have. IBM recommends taking a snapshot of the couchdb PVC at 3:00 a.m. and a snapshot of the ledger PVC at 3:05 a.m. 

Every orderer pod also has a ledger PVC. But unlike a public blockchain, which uses proof of work to validate blocks, a private blockchain relies on the orderers to order the validated transactions into blocks.

We must be sure that peers do not have validated transactions that the orderer does not have. This time, we are not talking about consistency inside a single pod, but consistency across the whole network. IBM recommends taking a snapshot of the orderer PVC at 5:00 a.m. (two hours after the snapshot of the peers).

There are also the CA pods and the IBP console pod, each with a single PVC. These should be backed up each time the network topology changes. To keep it simple, let’s back them up at 5:00 a.m. along with the orderer.

Here’s a summary of the scheduled backups:

  Peer CouchDB PVC: 3:00 a.m.
  Peer ledger PVC: 3:05 a.m.
  Orderer ledger PVC: 5:00 a.m.
  CA and IBPConsole PVCs: 5:00 a.m.

Implementing the Backup Strategy 

We can implement the backup strategy easily by creating:

  1. One daily policy that backs up the entire namespace at 3:00 a.m. and 5:00 a.m.
  2. One daily policy that backs up the entire namespace at 3:05 a.m.
  3. One weekly cluster resource policy (to capture the clusterroles and clusterrolebindings).

With this schedule, we have everything we need to restore the application consistently. 
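For illustration, here is roughly what the 3:00 a.m. policy could look like when expressed as a Kasten K10 Policy resource. This is a sketch: the policy name, retention and schedule are ours, the policies can just as easily be created from the K10 dashboard, and the exact field names should be checked against the K10 API reference for your version. The weekly cluster resource policy follows the same structure but targets cluster-scoped resources instead of the my-blockchain namespace.

cat <<EOF | oc apply -f -
apiVersion: config.kio.kasten.io/v1alpha1
kind: Policy
metadata:
  name: my-blockchain-3am
  namespace: kasten-io
spec:
  comment: Daily 3:00 a.m. snapshot of the my-blockchain namespace
  frequency: '@daily'
  subFrequency:
    hours: [3]
    minutes: [0]
  retention:
    daily: 7
  actions:
  - action: backup
  selector:
    matchLabels:
      k10.kasten.io/appNamespace: my-blockchain
EOF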

Restoring the Application After a Disaster 

Let’s remove the my-blockchain application to emulate a disaster:

oc delete ns my-blockchain 

Step 1: Recreate the PVC from the Different Restore Points 

First, we’ll recreate the empty namespace:  

oc create ns my-blockchain 

We’ll use the partial restore capability of Kasten K10 to pick the different PVCs from the various restore points:

  Restore point at 3:00 a.m.: peerorg1-pvc
  Restore point at 3:05 a.m.: peerorg1-statedb-pvc
  Restore point at 5:00 a.m.: orderingservicenode1-pvc, plus the console and CA PVCs (ibpconsole-pvc, orderingserviceca-pvc and org1ca-pvc)

This image shows how to restore only the peerorg1-pvc from the 3:00 a.m. restore point: 

Step 2: Recreate the PVCs in the Right Zone

Before launching the restore, we need to apply a transform to make sure each PVC is recreated in the right zone. When the blockchain operator pins a PVC to a specific zone, it adds a “zone” label to the PVC and an affinity constraint to the deployment that uses this PVC:

oc get pvc peerorg1-pvc -o yaml  
apiVersion: v1 
kind: PersistentVolumeClaim 
metadata: 
  labels: 
    ... 
    zone: eu-west-1b 
  name: peerorg1-pvc 
...

oc get deploy peerorg1 -o yaml
apiVersion: apps/v1 
kind: Deployment 
metadata: 
  ... 
  name: peerorg1 
  namespace: my-blockchain 
  ...
spec:
  ...
    spec: 
      affinity: 
        nodeAffinity: 
          requiredDuringSchedulingIgnoredDuringExecution: 
            nodeSelectorTerms: 
            - matchExpressions: 
              - key: topology.kubernetes.io/zone 
                operator: In 
                values: 
                - eu-west-1b 
              - key: topology.kubernetes.io/region 
                operator: In 
                values: 
                - eu-west-1 
            - matchExpressions: 
              - key: failure-domain.beta.kubernetes.io/zone 
                operator: In 
                values: 
                - eu-west-1b 
              - key: failure-domain.beta.kubernetes.io/region 
                operator: In 
                values: 
                - eu-west-1 
... 

We will use this label to recreate each PVC in the right zone. To do so, we first create a storage class per zone:

oc get sc eu-west-1a -o yaml  
allowVolumeExpansion: true 
allowedTopologies: 
- matchLabelExpressions: 
  - key: topology.ebs.csi.aws.com/zone 
    values: 
    - eu-west-1a 
apiVersion: storage.k8s.io/v1 
kind: StorageClass 
metadata: 
  name: eu-west-1a   
parameters: 
  encrypted: "true" 
  type: gp2 
provisioner: ebs.csi.aws.com 
reclaimPolicy: Delete 
volumeBindingMode: WaitForFirstConsumer 
... 
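Since only the eu-west-1a class is shown above, here is one way to create the same class for every zone in use. This is a sketch assuming the two zones from this cluster (eu-west-1a and eu-west-1b) and the same gp2 parameters as above:

for z in eu-west-1a eu-west-1b; do
cat <<EOF | oc apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${z}
allowVolumeExpansion: true
allowedTopologies:
- matchLabelExpressions:
  - key: topology.ebs.csi.aws.com/zone
    values:
    - ${z}
parameters:
  encrypted: "true"
  type: gp2
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
EOF
done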

We’ll use a transform to copy the zone name into the storageClassName field of the PVC spec:
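Expressed as a K10 transform, this boils down to a single JSON-patch style copy operation on the PVC resources. Here is a sketch of the operation; the same thing can be configured interactively in the restore dialog, and the exact wrapper schema should be checked in the K10 documentation:

subject:
  resource: persistentvolumeclaims
json:
- op: copy
  from: /metadata/labels/zone
  path: /spec/storageClassName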

Then, we’ll apply this transform for each of the three restore points.  

Step 3: Restore the Rest of the Namespace (with Some Exclusions)

Now we have the namespace with only the PVCs. We must bring back the rest of the namespace, but without:  

  1. ClusterServiceVersion  
  2. Subscription 
  3. OperatorGroup 
  4. InstallPlan 

We also need to scale down all the deployments. This is because Kasten K10 restores the custom resources only after all the deployments are up and running. Since those deployments rely on the custom resources to work, it’s best to scale everything down so that Kasten K10 considers the deployments successfully restored and moves on to restoring the custom resources. We’ll scale back up afterwards; one way to express the scale-down as a restore transform is sketched below.
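A sketch of one possible approach, using the same transform structure as before, is a replace operation on the deployments’ replica count so that every deployment comes back scaled down:

subject:
  resource: deployments
json:
- op: replace
  path: /spec/replicas
  value: 0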

Step 4: Restore the clusterrole and clusterrolebinding from the Cluster Restore Point 

When we deleted the namespace, the Operator Lifecycle Manager automatically deleted the clusterrole and clusterrolebinding created for the IBM Blockchain operator:  

apiVersion: rbac.authorization.k8s.io/v1 
kind: ClusterRole 
metadata: 
  creationTimestamp: "2021-03-31T23:09:32Z" 
  labels: 
    olm.owner: ibm-blockchain.v2.5.1 
    olm.owner.kind: ClusterServiceVersion 
    olm.owner.namespace: my-blockchain 
    operators.coreos.com/ibm-blockchain.my-blockchain: "" 
  name: ibm-blockchain.v2.5.1-6bdc5f6d8 
rules: 
- apiGroups: 
  - apiextensions.k8s.io 
  resources: 
  - persistentvolumeclaims 
  - persistentvolumes 
  verbs: 
  - '*' 
... 

The olm.owner label tells OLM to remove this object when the ClusterServiceVersion is removed, and we have decided to restore without the OLM objects. To achieve our restore, we therefore need to remove the labels section.

We retrieve the clusterrole and clusterrolebinding from the cluster restore point and apply a transform to remove the labels section:  
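The transform itself is essentially a remove operation on the labels path, applied to both the clusterroles and the clusterrolebindings. A sketch, using the same structure as the earlier transforms (in practice you may prefer to remove only the olm.owner* labels rather than the whole section):

subject:
  resource: clusterroles
json:
- op: remove
  path: /metadata/labels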

If we were restoring into a new cluster, we would also have to restore the other my-blockchain clusterroles and clusterrolebindings created when OLM deployed the operator:

Restart the Deployments

We can now restart the deployment and check that everything is recovered and that all the pods are back: 

for dep in $(oc get deploy -o name); do oc scale $dep --replicas=1; done

oc get pods
NAME                                    READY   STATUS    RESTARTS   AGE 
ibp-operator-7dd4bfb76f-s89rx           1/1     Running   0          2d8h 
ibpconsole-cbf76d57d-2xwx9              4/4     Running   0          2d8h 
orderingserviceca-78fb95747c-dvk4k      1/1     Running   0          2d8h 
orderingservicenode1-646dd846bc-94fwm   2/2     Running   0          2d8h 
org1ca-75845db8bf-ts6tf                 1/1     Running   0          2d8h 
peerorg1-7758946784-kqcrp               4/4     Running   0          2d8h 

The best way to check that your data is recovered is to check the block transactions: 

We can verify that we got back all the blocks and transactions. We were also able to restart the whole blockchain application while respecting its data consistency and topology constraints.

Conclusion  

In this article, we’ve shown how Kasten K10 makes a complex backup and restore process much easier. Kasten K10 has all the features necessary to make your backup and restore possible, as long as you understand how your application works.

Try Kasten K10 for free today.  
