OS OTA Upgrades
MonoK8s upgrades are driven through two custom resources:
- OSUpgrade: the user-facing upgrade request.
- OSUpgradeProgress: the per-node upgrade state watched and executed by the node agent.
The node agent does the actual upgrade work. It watches OSUpgradeProgress resources assigned to its node, downloads the selected image, writes it to the inactive rootfs partition, updates status, and reboots when ready.
The controller is optional but strongly recommended. It watches OSUpgrade resources and creates the matching OSUpgradeProgress resources for the target nodes.
Install the controller
By default, each managed node only runs the node agent. The node agent does not watch OSUpgrade directly; it only watches OSUpgradeProgress.
You can create OSUpgradeProgress resources by hand, but normal users should not need to. Install the controller instead, then create OSUpgrade resources.
Install the controller from the existing node-agent image:
kubectl exec -i -n mono-system ds/node-agent -- \
  ctl create controller --image REPO/IMAGE:TAG | kubectl apply -f -
--image
--image is optional.
If omitted, the generated Deployment uses the local controller image that is already shipped with managed nodes. In that mode, the controller Deployment is scheduled only onto managed nodes because the image is expected to exist locally.
If provided, the generated Deployment uses that image directly. This is useful when you host the controller image in your own registry.
There is no official public image repository yet, so external controller images must currently be managed by the operator.
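It can be useful to review the generated manifest before applying it. The sketch below wraps the command from this section in a small helper so the output can be saved to a file first. The function name `render_controller` and the file name `controller.yaml` are illustrative and not part of MonoK8s.

```shell
# Render the controller manifest via the node agent. Extra arguments
# (for example --image REPO/IMAGE:TAG) are passed through unchanged.
render_controller() {
  kubectl exec -i -n mono-system ds/node-agent -- ctl create controller "$@"
}

# Typical use (not run here):
#   render_controller > controller.yaml               # inspect the file first
#   kubectl apply --dry-run=client -f controller.yaml # client-side validation
#   kubectl apply -f controller.yaml
```

Calling `render_controller` without arguments corresponds to omitting `--image`, so the Deployment would use the image already shipped with managed nodes.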
Create an upgrade
Create an OSUpgrade resource to request an upgrade:
kubectl apply -f upgrade.yaml
Example:
apiVersion: monok8s.io/v1alpha1
kind: OSUpgrade
metadata:
  name: my-upgrade-2
spec:
  version: v1.35.3
  nodeSelector: {}
  catalog:
    inline: |
      stable: v1.35.1
      images:
        - version: v1.34.6
          url: https://example.invalid/images/monok8s-v1.34.6.img.zst
          checksum: sha256:abc
        - version: v1.34.1
          url: https://example.invalid/images/monok8s-v1.34.1.img.zst
          checksum: sha256:abc
        - version: v1.35.0
          url: https://example.invalid/images/monok8s-v1.35.0.img.zst
          checksum: sha256:ghi
        - version: v1.35.4
          url: https://example.invalid/images/monok8s-v1.35.4.img.zst
          checksum: sha256:jkl
        - version: v1.35.1
          url: http://localhost:8000/rootfs.ext4.zst
          checksum: sha256:99af82a263deca44ad91d21d684f0fa944d5d0456a1da540f1c644f8aa59b14b
          size: 1858076672 # expanded image size in bytes; check with: zstd -lv image.zst
      blocked:
        - v1.34.0
spec.version
spec.version is the requested target version.
It may be either:
- an explicit version, such as v1.35.3
- stable, if the catalog defines a stable version
spec.nodeSelector
spec.nodeSelector selects the nodes that should receive the upgrade.
An empty selector means all eligible managed nodes.
spec.catalog
The catalog tells the agent where to find available OS images.
The catalog can be provided inline:
catalog:
  inline: |
    stable: v1.35.1
    images:
      - version: v1.35.1
        url: https://example.invalid/images/monok8s-v1.35.1.img.zst
        checksum: sha256:abc
        size: 1858076672
It can also be loaded from a URL:
catalog:
  url: https://example.com/images.yaml
Or from a ConfigMap:
catalog:
  configMap: images-cm
ConfigMap catalogs require an extra RBAC permission that is not enabled by default. To use a ConfigMap catalog, edit the relevant ClusterRole and allow the get verb on configmaps.
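One way to add that permission without opening an editor is a JSON Patch that appends a rule to the ClusterRole. This is a minimal sketch: the helper names are illustrative, and the ClusterRole name is left as an argument because this document does not name it.

```shell
# JSON Patch that appends one rule granting "get" on core-group ConfigMaps.
configmap_rule_patch() {
  printf '%s' '[{"op":"add","path":"/rules/-","value":{"apiGroups":[""],"resources":["configmaps"],"verbs":["get"]}}]'
}

# $1 is the ClusterRole to patch (placeholder; look up the real name first).
grant_configmap_get() {
  kubectl patch clusterrole "$1" --type json -p "$(configmap_rule_patch)"
}

# Typical use (not run here):
#   grant_configmap_get NAME_OF_THE_RELEVANT_CLUSTERROLE
```

The patch assumes the ClusterRole already has a `rules` list to append to. A rule on a ClusterRole applies cluster-wide, so consider whether that scope is acceptable in your environment.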
Catalog content should look like this:
stable: v1.35.1
images:
  - version: v1.34.6
    url: https://example.invalid/images/monok8s-v1.34.6.img.zst
    checksum: sha256:abc
  - version: v1.34.1
    url: https://example.invalid/images/monok8s-v1.34.1.img.zst
    checksum: sha256:abc
  - version: v1.35.0
    url: https://example.invalid/images/monok8s-v1.35.0.img.zst
    checksum: sha256:ghi
  - version: v1.35.4
    url: https://example.invalid/images/monok8s-v1.35.4.img.zst
    checksum: sha256:jkl
  - version: v1.35.1
    url: http://localhost:8000/rootfs.ext4.zst
    checksum: sha256:99af82a263deca44ad91d21d684f0fa944d5d0456a1da540f1c644f8aa59b14b
    size: 1858076672 # expanded image size in bytes; check with: zstd -lv image.zst
blocked:
  - v1.34.0
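The checksum and size values for a catalog entry can be produced with standard tools. The helpers below are a sketch with illustrative names. They assume GNU `sha256sum` is available and that the checksum covers the compressed file as downloaded. This document does not state which form the agent hashes, so verify that before relying on it.

```shell
# Print a file's SHA-256 in the catalog's "sha256:<hex>" form.
# Assumption: the checksum is taken over the compressed .zst file.
catalog_checksum() {
  printf 'sha256:%s\n' "$(sha256sum "$1" | cut -d' ' -f1)"
}

# Print the expanded size in bytes by decompressing to stdout and counting.
# This is an alternative to reading the value from `zstd -lv`.
expanded_size() {
  zstd -dc "$1" | wc -c | tr -d ' '
}

# Typical use (not run here):
#   catalog_checksum out/rootfs.ext4.zst
#   expanded_size out/rootfs.ext4.zst
```

Counting bytes decompresses the whole image, which takes longer than `zstd -lv` but gives an exact byte count.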
Monitor upgrades
List upgrade requests:
kubectl get osupgrades
Example output:
NAME             DESIRED   RESOLVED   PHASE
my-upgrade-3     stable    v1.35.4    Pending
my-upgrade-2     v1.35.3   v1.35.3    Accepted
my-downgrade-1   v1.33.2   v1.33.2    Rejected
List per-node progress:
kubectl get osupgradeprogresses
Example output:
NAME                NODE     SOURCE         CURRENT   TARGET    STATUS
osupgrade-abc123f   node-1   my-upgrade-2   v1.34.1   v1.35.3   Downloading
osupgrade-cde456g   node-2   my-upgrade-2   v1.35.3   v1.35.3   Completed
Inspect one node's progress:
kubectl describe osupgradeprogress osupgrade-abc123f
Example resource:
apiVersion: monok8s.io/v1alpha1
kind: OSUpgradeProgress
metadata:
  name: osupgrade-abc123f
spec:
  sourceRef:
    name: my-upgrade-2
  nodeName: node-1
status:
  currentVersion: v1.34.1
  targetVersion: v1.35.3
  phase: Downloading
  startedAt: null
  completedAt: null
  lastUpdatedAt: null
  retryCount: 0
  inactivePartition: B
  failureReason: ""
  message: ""
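For scripted rollouts, the `status.phase` field shown above can be polled until it reaches a wanted value. The following is a rough sketch, not a supported tool: the function name is illustrative, and it assumes `Completed` is reported in `status.phase` in the same way it appears in the STATUS column.

```shell
# Poll status.phase of one OSUpgradeProgress until it equals the wanted phase.
# Arguments: NAME [PHASE] [MAX_TRIES] [SLEEP_SECONDS]
# Returns 0 when the phase matches, 1 when the tries run out.
wait_for_phase() {
  name=$1; want=${2:-Completed}; tries=${3:-60}; delay=${4:-10}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # A failed lookup is treated as "no phase yet" rather than a hard error.
    phase=$(kubectl get osupgradeprogress "$name" -o jsonpath='{.status.phase}') || phase=""
    echo "$name: ${phase:-unknown}"
    if [ "$phase" = "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Typical use (not run here):
#   wait_for_phase osupgrade-abc123f Completed 120 15
```

The loop does not stop early on failure because the name of the failure phase is not given in this document. Check `failureReason` and `message` with `kubectl describe` if the wait times out.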
Retry a failed upgrade
If an upgrade fails, for example because the image download failed, edit spec.retryNonce on the affected OSUpgradeProgress resource.
Any changed value is enough. The field is only used to tell the node agent that the user intentionally requested a retry.
Example:
kubectl patch osupgradeprogress osupgrade-abc123f \
  --type merge \
  -p '{"spec":{"retryNonce":"retry-1"}}'
If the same node fails again and you want to retry again, change the nonce to a new value:
kubectl patch osupgradeprogress osupgrade-abc123f \
  --type merge \
  -p '{"spec":{"retryNonce":"retry-2"}}'
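Because any changed value works, a timestamp is a convenient nonce that avoids tracking which values were already used. This is a small sketch with illustrative helper names. It assumes `date +%s` is supported, which holds for GNU and BSD `date`.

```shell
# Build the merge patch for a given nonce value.
retry_patch() {
  printf '{"spec":{"retryNonce":"%s"}}' "$1"
}

# Patch one OSUpgradeProgress with a timestamp-based nonce. The value
# differs on every call as long as calls are at least one second apart.
retry_upgrade() {
  kubectl patch osupgradeprogress "$1" --type merge \
    -p "$(retry_patch "retry-$(date +%s)")"
}

# Typical use (not run here):
#   retry_upgrade osupgrade-abc123f
```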
Development notes
Flash an image manually into partition B
Use ncat from the Nmap project. Other netcat variants may work, but they are more likely to behave unexpectedly around stream handling and connection close.
On the sending machine:
pv out/rootfs.ext4.zst | ncat 10.0.0.10 1234 --send-only
On the receiving machine:
ncat -l 1234 --recv-only | \
  zstd -d -c | \
  dd of=/dev/sda3 bs=4M status=progress && \
  sync && \
  echo "SUCCESS"
Be careful with the target partition. The example writes to /dev/sda3, which is assumed to be rootfs B in that setup. Verify the partition layout before running this on real hardware.
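As one extra guard before running dd, a script can refuse to write to a partition that is currently mounted. The helpers below are a sketch with illustrative names. They only compare device names against the mounts table, and some systems list the root filesystem as /dev/root, so treat this as a complement to inspecting the layout with a tool such as lsblk.

```shell
# Succeed if DEVICE appears as a mount source in the mounts table.
# The second argument defaults to /proc/mounts and exists mainly for testing.
is_mounted() {
  awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Refuse to continue when the target partition is currently mounted.
check_flash_target() {
  if is_mounted "$1" "${2:-/proc/mounts}"; then
    echo "refusing: $1 is mounted" >&2
    return 1
  fi
  echo "$1 is not mounted"
}

# Typical use on the receiving machine (not run here):
#   check_flash_target /dev/sda3 && ncat -l 1234 --recv-only | zstd -d -c | dd of=/dev/sda3 bs=4M
```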