Advisory: DRBD Unrotated UUID

Overview

A bug in older versions of DRBD and LINSTOR can cause a storage node to incorrectly treat a freshly provisioned replica as already up to date, skipping the data synchronization that should occur when a node is added or replaced. If this happens, your application could silently read corrupted or stale data without any error being reported.

The root cause is that DRBD tracks which replica holds the most recent data using internal version identifiers (UUIDs). Volumes created with older software may never have had those identifiers updated after the initial write, leaving DRBD unable to tell a new empty replica apart from an existing one. The bug is fixed in DRBD 9.2.18 / 9.3.2 and LINSTOR 1.33.2 (or 1.34+). Upgrading prevents any new volumes from being affected.

Upgrading alone is not enough. Volumes that are already in the affected state need a one-time corrective action to permanently resolve the issue. The steps below walk you through identifying the affected volumes and fixing them.

The remediation procedure consists of four steps:

Identify affected resources using a provided Kubernetes job.
Review the labeled PersistentVolumes.
Trigger a UUID bump for each affected volume.
Verify and clean up.

Step 1: Identify Affected Resources

Apply the detection job below. It spawns a privileged pod on every cluster node, scans all DRBD volumes for the day-0 UUID condition, and labels any matching PersistentVolume with linstor.csi.linbit.com/unrotated-uuid=true.

kubectl apply -n linbit-sds -f https://charts.linstor.io/advisories/drbd-unrotated-uuid/find-unrotated-uuids.yaml

Wait for the control job to finish (it cleans up the per-node jobs automatically):

kubectl wait --for=condition=complete --timeout=600s \
    job/find-unrotated-uuids-ctrl -n linbit-sds

Inspect the job log to see which resources were found and which PVs were labeled:

kubectl logs -n linbit-sds job/find-unrotated-uuids-ctrl

Step 2: Review Affected PersistentVolumes

List every PV that was labeled by the detection job:

kubectl get pv -l linstor.csi.linbit.com/unrotated-uuid=true

If no PVs are listed, your cluster is not affected and no further action is required.
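When several PVs are labeled, it can help to print each PV next to its DRBD resource name before starting Step 3. The following is a minimal sketch in POSIX shell. The helper names `list_affected_pvs` and `count_affected` are hypothetical, and the sketch assumes the resource name is stored in `.spec.csi.volumeHandle`, as in the lookup used in Step 3.

```shell
# Sketch only: helper names are illustrative, not part of LINSTOR or kubectl.
LABEL='linstor.csi.linbit.com/unrotated-uuid=true'

# Print "<pv-name><TAB><drbd-resource>" for every labeled PV.
list_affected_pvs() {
    kubectl get pv -l "$LABEL" \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.csi.volumeHandle}{"\n"}{end}'
}

# Count the non-empty lines of a listing read from stdin.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_affected() {
    grep -c . || true
}
```

For example, `list_affected_pvs | count_affected` prints how many volumes still need a UUID bump, and `list_affected_pvs | cut -f2` prints only the resource names.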

Step 3: Trigger a UUID Bump

Perform the following procedure for each affected PV. The right approach depends on how many diskful replicas the underlying LINSTOR resource has.

Volumes with two or more diskful replicas (most common)

Gracefully disconnect a secondary replica, wait for one write to reach the primary, then reconnect. The write causes DRBD to rotate the UUID. No application downtime is required.

Get the DRBD resource name from the PV:

RESOURCE=$(kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeHandle}')
echo "$RESOURCE"

List the replicas and identify a secondary (non-primary) diskful node:

kubectl exec -n linbit-sds deploy/linstor-controller -- \
    linstor resource list -r "$RESOURCE"
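If you prefer to pick the node from DRBD's own view, the status text reported on the node where the resource is currently Primary can be filtered for peers that are Secondary and hold a disk. This is a minimal sketch, assuming the usual `drbdadm status` layout in which an indented peer line carries `role:` and is followed by a further-indented `peer-disk:` line. The helper name `secondary_diskful_peers` is hypothetical.

```shell
# Sketch only: reads "drbdadm status <resource>" text on stdin and prints
# the names of peers that are Secondary and not Diskless.
secondary_diskful_peers() {
    awk '
        # Indented "<peer-name> role:<Role>" line: remember the peer.
        /^[[:space:]]+[^[:space:]]+[[:space:]]+role:/ {
            peer = $1
            secondary = ($0 ~ /role:Secondary/)
            next
        }
        # The peer-disk line that follows says whether the peer has a disk.
        /peer-disk:/ {
            if (secondary && $0 !~ /peer-disk:Diskless/) print peer
            secondary = 0
        }
    '
}
```

A possible invocation is `kubectl exec -n linbit-sds ds/linstor-satellite.<NODE> -- drbdadm status "$RESOURCE" | secondary_diskful_peers`, where `<NODE>` is the node on which the resource is in use. Cross-check the result against the `linstor resource list` output before disconnecting anything.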

Disconnect the resource on that node via the LINSTOR satellite pod:

kubectl exec -n linbit-sds ds/linstor-satellite.<NODE> -- drbdadm disconnect "$RESOURCE"

Ensure at least one write reaches the primary. If the PVC is actively used by a running workload, any application write is sufficient. Otherwise, exec into a pod that mounts the PVC and write a small amount of data, for example:
```
kubectl exec <pod> -- sh -c 'dd if=/dev/urandom bs=4k count=1 of=<mountpath>/.uuid-bump && sync && rm <mountpath>/.uuid-bump'
```

Reconnect the replica:

kubectl exec -n linbit-sds ds/linstor-satellite.<NODE> -- drbdadm connect "$RESOURCE"

Wait for the resync to complete. All replicas must return to UpToDate before proceeding to the next volume:
```
kubectl exec -n linbit-sds deploy/linstor-controller -- \
    linstor resource list -r "$RESOURCE"
```
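Rather than re-running the listing by hand, the wait can be scripted. The sketch below polls `drbdadm status` on one satellite and succeeds once no peer is disconnected and every disk state is UpToDate (diskless peers are tolerated). The helper names, the `POLL_INTERVAL` variable, and the default of 60 attempts are assumptions made for illustration, as is the status text layout being parsed.

```shell
# Sketch only: succeeds if the status text on stdin shows at least one disk
# state, no peer with a connection state other than Connected, and no disk
# state other than UpToDate or Diskless.
all_uptodate() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^connection:/ && $i != "connection:Connected") bad = 1
                if ($i ~ /^(peer-)?disk:/) {
                    seen = 1
                    if ($i !~ /:(UpToDate|Diskless)$/) bad = 1
                }
            }
        }
        END { exit ((bad || !seen) ? 1 : 0) }
    '
}

# wait_uptodate <node> <resource> [attempts]
wait_uptodate() {
    node=$1 resource=$2 tries=${3:-60}
    while [ "$tries" -gt 0 ]; do
        if kubectl exec -n linbit-sds "ds/linstor-satellite.$node" -- \
               drbdadm status "$resource" | all_uptodate; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "${POLL_INTERVAL:-5}"
    done
    return 1
}
```

Calling `wait_uptodate <NODE> "$RESOURCE"` after the reconnect returns 0 once the resync has finished and non-zero if the attempts run out, in which case inspect the resource manually before moving on to the next volume.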

Volumes with a single diskful replica

When there is only one diskful replica, there is no second diskful replica to disconnect and resync, and toggling the only replica to diskless would remove the only copy of the data. Instead, use drbdadm new-current-uuid directly on the hosting node. This requires briefly quiescing I/O on the resource.

Scale down all workloads (Deployments, StatefulSets, etc.) that use the PVC.

Identify the node hosting the diskful replica (from linstor resource list -r $RESOURCE) and run the following commands against the LINSTOR satellite pod on that node:

kubectl exec -n linbit-sds ds/linstor-satellite.<NODE> -- drbdadm disconnect "$RESOURCE"
kubectl exec -n linbit-sds ds/linstor-satellite.<NODE> -- drbdadm new-current-uuid "$RESOURCE/0"
kubectl exec -n linbit-sds ds/linstor-satellite.<NODE> -- drbdadm connect "$RESOURCE"

Scale the workloads back up. LINSTOR will reconnect the resource automatically.
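If you script the single-replica case, it is worth making sure the resource is reconnected even when the UUID bump itself fails. The sketch below wraps the three commands above in one function. The name `bump_single_replica` is hypothetical, volume number 0 is assumed as in the commands above, and the caller is expected to have scaled down the workloads first.

```shell
# Sketch only: bump_single_replica <node> <resource>
# Assumes I/O on the resource has already been quiesced.
bump_single_replica() {
    node=$1 resource=$2
    sat="ds/linstor-satellite.$node"

    # Stop here if the disconnect fails; nothing has been changed yet.
    kubectl exec -n linbit-sds "$sat" -- drbdadm disconnect "$resource" || return 1

    kubectl exec -n linbit-sds "$sat" -- drbdadm new-current-uuid "$resource/0"
    rc=$?

    # Reconnect regardless of the outcome above, so that the resource is
    # not left disconnected after a failed bump.
    kubectl exec -n linbit-sds "$sat" -- drbdadm connect "$resource" || rc=1
    return "$rc"
}
```

A non-zero return value means either the bump or the reconnect failed. Check the resource state before scaling the workloads back up.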

Step 4: Verify and Clean Up

After remediating all affected volumes, follow these steps to confirm the cluster is clean.

Remove the advisory labels so the detection job starts with a clean slate:

kubectl label pv -l linstor.csi.linbit.com/unrotated-uuid=true \
    linstor.csi.linbit.com/unrotated-uuid-

Delete and re-apply the detection job:

kubectl delete -n linbit-sds -f https://charts.linstor.io/advisories/drbd-unrotated-uuid/find-unrotated-uuids.yaml
kubectl apply -n linbit-sds -f https://charts.linstor.io/advisories/drbd-unrotated-uuid/find-unrotated-uuids.yaml
kubectl wait --for=condition=complete --timeout=600s \
    job/find-unrotated-uuids-ctrl -n linbit-sds

Confirm no PVs were labeled. If any appear, repeat Step 3 for the remaining volumes.
```
kubectl get pv -l linstor.csi.linbit.com/unrotated-uuid=true
```
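For automation, the same check can be turned into a pass/fail gate. This is a minimal sketch; the function name `check_no_affected_pvs` and its return codes are illustrative choices, not part of the advisory tooling.

```shell
# Sketch only: returns 0 if no PV carries the advisory label, 1 if some do,
# and 2 if the kubectl query itself failed.
LABEL='linstor.csi.linbit.com/unrotated-uuid=true'

check_no_affected_pvs() {
    remaining=$(kubectl get pv -l "$LABEL" -o name) || return 2
    if [ -n "$remaining" ]; then
        printf 'still labeled:\n%s\n' "$remaining" >&2
        return 1
    fi
    echo "no affected PVs remain"
}
```

Running `check_no_affected_pvs && echo "safe to remove the detection job"` only prints the confirmation when the re-run detection job labeled nothing.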

Remove the detection job and its RBAC resources:

kubectl delete -n linbit-sds -f https://charts.linstor.io/advisories/drbd-unrotated-uuid/find-unrotated-uuids.yaml

Suspected Past Corruption

The detection job identifies volumes that are currently in the vulnerable state. It cannot determine whether corruption already occurred on a volume whose UUID has since been rotated by normal operation (for example, after a node was replaced). To check whether the replicas of a volume actually hold identical data, you can use the drbd-verify.py tool, which compares every pair of nodes and reports any differences. If you have reason to believe that a node was re-provisioned while running an affected DRBD and LINSTOR version, please contact LINBIT support for guidance.