# OS OTA Upgrades

MonoK8s upgrades are driven through two custom resources:

- `OSUpgrade`: the user-facing upgrade request.
- `OSUpgradeProgress`: the per-node upgrade state watched and executed by the node agent.

The node agent does the actual upgrade work. It watches `OSUpgradeProgress` resources assigned to its node, downloads the selected image, writes it to the inactive rootfs partition, updates status, and reboots when ready.

The controller is optional but strongly recommended. It watches `OSUpgrade` resources and creates the matching `OSUpgradeProgress` resources for the target nodes.

## Install the controller

By default, each managed node only runs the node agent. The node agent does **not** watch `OSUpgrade` directly; it only watches `OSUpgradeProgress`.

You can create `OSUpgradeProgress` resources by hand, but normal users should not need to. Install the controller instead, then create `OSUpgrade` resources.

Install the controller from the existing node-agent image:

```bash
kubectl exec -i -n mono-system ds/node-agent -- \
  ctl create controller --image REPO/IMAGE:TAG | kubectl apply -f -
```

### `--image`

`--image` is optional.

If omitted, the generated Deployment uses the local controller image that is already shipped with managed nodes. In that mode, the controller Deployment is scheduled only onto managed nodes because the image is expected to exist locally.

If provided, the generated Deployment uses that image directly. This is useful when you host the controller image in your own registry.

There is no official public image repository yet, so external controller images must currently be managed by the operator.

## Create an upgrade

Create an `OSUpgrade` resource to request an upgrade:

```bash
kubectl apply -f upgrade.yaml
```

Example:

```yaml
apiVersion: monok8s.io/v1alpha1
kind: OSUpgrade
metadata:
  name: my-upgrade-2
spec:
  version: v1.35.3
  nodeSelector: {}
  catalog:
    inline: |
      stable: v1.35.1
      images:
        - version: v1.34.6
          url: https://example.invalid/images/monok8s-v1.34.6.img.zst
          checksum: sha256:abc
        - version: v1.34.1
          url: https://example.invalid/images/monok8s-v1.34.1.img.zst
          checksum: sha256:abc
        - version: v1.35.0
          url: https://example.invalid/images/monok8s-v1.35.0.img.zst
          checksum: sha256:ghi
        - version: v1.35.4
          url: https://example.invalid/images/monok8s-v1.35.4.img.zst
          checksum: sha256:jkl
        - version: v1.35.1
          url: http://localhost:8000/rootfs.ext4.zst
          checksum: sha256:99af82a263deca44ad91d21d684f0fa944d5d0456a1da540f1c644f8aa59b14b
          size: 1858076672 # expanded image size in bytes; check with: zstd -lv image.zst
      blocked:
        - v1.34.0
```

### `spec.version`

`spec.version` is the requested target version. It may be either:

- an explicit version, such as `v1.35.3`
- `stable`, if the catalog defines a `stable` version

### `spec.nodeSelector`

`spec.nodeSelector` selects the nodes that should receive the upgrade. An empty selector means all eligible managed nodes.

### `spec.catalog`

The catalog tells the agent where to find available OS images.

The catalog can be provided inline:

```yaml
catalog:
  inline: |
    stable: v1.35.1
    images:
      - version: v1.35.1
        url: https://example.invalid/images/monok8s-v1.35.1.img.zst
        checksum: sha256:abc
        size: 1858076672
```

It can also be loaded from a URL:

```yaml
catalog:
  url: https://example.com/images.yaml
```

Or from a ConfigMap:

```yaml
catalog:
  configMap: images-cm
```

ConfigMap catalogs require extra RBAC. This permission is not enabled by default. To use a ConfigMap catalog, edit the relevant ClusterRole and allow `get` on `configmaps`.

Catalog content should look like this:

```yaml
stable: v1.35.1
images:
  - version: v1.34.6
    url: https://example.invalid/images/monok8s-v1.34.6.img.zst
    checksum: sha256:abc
  - version: v1.34.1
    url: https://example.invalid/images/monok8s-v1.34.1.img.zst
    checksum: sha256:abc
  - version: v1.35.0
    url: https://example.invalid/images/monok8s-v1.35.0.img.zst
    checksum: sha256:ghi
  - version: v1.35.4
    url: https://example.invalid/images/monok8s-v1.35.4.img.zst
    checksum: sha256:jkl
  - version: v1.35.1
    url: http://localhost:8000/rootfs.ext4.zst
    checksum: sha256:99af82a263deca44ad91d21d684f0fa944d5d0456a1da540f1c644f8aa59b14b
    size: 1858076672 # expanded image size in bytes; check with: zstd -lv image.zst
blocked:
  - v1.34.0
```

## Monitor upgrades

List upgrade requests:

```bash
kubectl get osupgrades
```

Example output:

```text
NAME             DESIRED   RESOLVED   PHASE
my-upgrade-3     stable    v1.35.4    Pending
my-upgrade-2     v1.35.3   v1.35.3    Accepted
my-downgrade-1   v1.33.2   v1.33.2    Rejected
```

List per-node progress:

```bash
kubectl get osupgradeprogresses
```

Example output:

```text
NAME                NODE     SOURCE         CURRENT   TARGET    STATUS
osupgrade-abc123f   node-1   my-upgrade-2   v1.34.1   v1.35.3   Downloading
osupgrade-cde456g   node-2   my-upgrade-2   v1.35.3   v1.35.3   Completed
```

Inspect one node's progress:

```bash
kubectl describe osupgradeprogress osupgrade-abc123f
```

Example resource:

```yaml
apiVersion: monok8s.io/v1alpha1
kind: OSUpgradeProgress
metadata:
  name: osupgrade-abc123f
spec:
  sourceRef:
    name: my-upgrade-2
  nodeName: node-1
status:
  currentVersion: v1.34.1
  targetVersion: v1.35.3
  phase: Downloading
  startedAt: null
  completedAt: null
  lastUpdatedAt: null
  retryCount: 0
  inactivePartition: B
  failureReason: ""
  message: ""
```

## Retry a failed upgrade

If an upgrade fails, for example because the image download failed, edit `spec.retryNonce` on the affected `OSUpgradeProgress` resource.

Any changed value is enough. The field is only used to tell the node agent that the user intentionally requested a retry.

Example:

```bash
kubectl patch osupgradeprogress osupgrade-abc123f \
  --type merge \
  -p '{"spec":{"retryNonce":"retry-1"}}'
```

If the same node fails again and you want to retry again, change the nonce to a new value:

```bash
kubectl patch osupgradeprogress osupgrade-abc123f \
  --type merge \
  -p '{"spec":{"retryNonce":"retry-2"}}'
```

## Development notes

### Flash an image manually into partition B

Use nmap's `ncat`. Other tools may work, but they are more likely to cause annoying stream or connection behavior.

On the sending machine:

```bash
pv out/rootfs.ext4.zst | ncat 10.0.0.10 1234 --send-only
```

On the receiving machine:

```bash
ncat -l 1234 --recv-only | \
  zstd -d -c | \
  dd of=/dev/sda3 bs=4M status=progress && \
  sync && \
  echo "SUCCESS"
```

Be careful with the target partition. The example writes to `/dev/sda3`, which is assumed to be rootfs B in that setup. Verify the partition layout before running this on real hardware.
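
After a manual flash, it can be worth confirming that the bytes on the partition match the image that was sent. The sketch below is one way to do that and is not part of the MonoK8s tooling: `verify_written_image` is a hypothetical helper that hashes only the first N bytes of the target device, since the partition is normally larger than the expanded image. It assumes `head`, `sha256sum`, and `cut` are available on the receiving machine, and that the expected size and digest were computed from the uncompressed image on the sending machine.

```bash
# verify_written_image DEVICE SIZE_BYTES EXPECTED_SHA256
#
# Hypothetical helper (an assumption, not a MonoK8s command).
# Hashes the first SIZE_BYTES of DEVICE and compares the result with
# EXPECTED_SHA256, the hex digest of the uncompressed image.
verify_written_image() {
  local device="$1" size="$2" expected="$3" actual

  # Read only the image-sized prefix; the rest of the partition is ignored.
  actual="$(head -c "$size" "$device" | sha256sum | cut -d' ' -f1)"

  if [ "$actual" = "$expected" ]; then
    echo "MATCH"
  else
    echo "MISMATCH: got $actual" >&2
    return 1
  fi
}
```

On the sending machine, the two inputs can be obtained with `zstd -d -c out/rootfs.ext4.zst | wc -c` (size in bytes) and `zstd -d -c out/rootfs.ext4.zst | sha256sum` (digest). On the receiving machine, once `sync` has finished, run `verify_written_image /dev/sda3 SIZE DIGEST` with those values, again assuming `/dev/sda3` is rootfs B in your setup.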