Refine controller template and probe listeners

2026-04-27 00:28:25 +08:00
parent 8fae920fc8
commit d7c2dac944
20 changed files with 780 additions and 217 deletions

# OS OTA Upgrades

MonoK8s upgrades are driven through two custom resources:

- `OSUpgrade`: the user-facing upgrade request.
- `OSUpgradeProgress`: the per-node upgrade state watched and executed by the node agent.
The node agent does the actual upgrade work. It watches `OSUpgradeProgress` resources assigned to its node, downloads the selected image, writes it to the inactive rootfs partition, updates status, and reboots when ready.

The controller is optional but strongly recommended. It watches `OSUpgrade` resources and creates the matching `OSUpgradeProgress` resources for the target nodes.

## Install the controller

By default, each managed node only runs the node agent. The node agent does **not** watch `OSUpgrade` directly; it only watches `OSUpgradeProgress`.

You can create `OSUpgradeProgress` resources by hand, but normal users should not need to. Install the controller instead, then create `OSUpgrade` resources.

Install the controller from the existing node-agent image:
```bash
kubectl exec -i -n mono-system ds/node-agent -- \
ctl create controller --image REPO/IMAGE:TAG | kubectl apply -f -
```
### `--image`

`--image` is optional.

If omitted, the generated Deployment uses the local controller image that is already shipped with managed nodes. In that mode, the controller Deployment is scheduled only onto managed nodes, because the image is expected to exist locally.

If provided, the generated Deployment uses that image directly. This is useful when you host the controller image in your own registry.

There is no official public image repository yet, so external controller images must currently be managed by the operator.
## Create an upgrade

Create an `OSUpgrade` resource to request an upgrade:

```bash
kubectl apply -f upgrade.yaml
```

Example:
```yaml
apiVersion: monok8s.io/v1alpha1
kind: OSUpgrade
metadata:
  name: my-upgrade-2
spec:
  version: v1.35.3
  nodeSelector: {}
  catalog:
    inline: |
      # ...
      images:
        - version: v1.35.1
          url: http://localhost:8000/rootfs.ext4.zst
          checksum: sha256:99af82a263deca44ad91d21d684f0fa944d5d0456a1da540f1c644f8aa59b14b
          size: 1858076672 # expanded image size in bytes; check with: zstd -lv image.zst
      blocked:
        - v1.34.0
```
### `spec.version`

`spec.version` is the requested target version. It may be either:

- an explicit version, such as `v1.35.3`
- `stable`, if the catalog defines a `stable` version
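As a sketch, a request that tracks the catalog's `stable` entry could look like this (the resource name here is hypothetical; the other fields follow the example above):

```yaml
apiVersion: monok8s.io/v1alpha1
kind: OSUpgrade
metadata:
  name: track-stable        # hypothetical name
spec:
  version: stable           # resolved against the catalog's "stable" entry
  nodeSelector: {}
  catalog:
    url: https://example.com/images.yaml
```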
### `spec.nodeSelector`

`spec.nodeSelector` selects the nodes that should receive the upgrade. An empty selector means all eligible managed nodes.
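If the selector follows the usual Kubernetes label-matching convention, targeting a labeled subset of nodes would look like this (the label key is a hypothetical example):

```yaml
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""   # upgrade only nodes carrying this label
```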
### `spec.catalog`

The catalog tells the agent where to find available OS images.

The catalog can be provided inline:
```yaml
catalog:
  inline: |
    stable: v1.35.1
    images:
      - version: v1.35.1
        url: https://example.invalid/images/monok8s-v1.35.1.img.zst
        checksum: sha256:abc
        size: 1858076672
```
It can also be loaded from a URL:

```yaml
catalog:
  url: https://example.com/images.yaml
```

Or from a ConfigMap:

```yaml
catalog:
  configMap: images-cm
```
ConfigMap catalogs require extra RBAC. This permission is not enabled by default. To use a ConfigMap catalog, edit the relevant ClusterRole and allow `get` on `configmaps`.
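The extra rule is standard Kubernetes RBAC. A sketch of the entry to add under the ClusterRole's `rules:` (the role name and its existing rules depend on your install):

```yaml
# Hypothetical addition to the node-agent ClusterRole
- apiGroups: [""]           # core API group, where ConfigMaps live
  resources: ["configmaps"]
  verbs: ["get"]
```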
Catalog content should look like this:
```yaml
stable: v1.35.1
images:
  # ...
  - version: v1.35.1
    url: http://localhost:8000/rootfs.ext4.zst
    checksum: sha256:99af82a263deca44ad91d21d684f0fa944d5d0456a1da540f1c644f8aa59b14b
    size: 1858076672 # expanded image size in bytes; check with: zstd -lv image.zst
blocked:
  - v1.34.0
```
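The `checksum` and `size` fields can be generated from the image archive itself. A minimal sketch, using a throwaway demo file; point the commands at your real `rootfs.ext4.zst` instead:

```shell
# Compute the catalog "checksum" field for an image file.
IMG=/tmp/rootfs-demo.img          # stand-in for the real image archive
printf 'hello\n' > "$IMG"
CHECKSUM="sha256:$(sha256sum "$IMG" | cut -d' ' -f1)"
echo "checksum: $CHECKSUM"

# For the "size" field (expanded size in bytes of a .zst archive), use:
#   zstd -lv rootfs.ext4.zst
```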
## Monitor upgrades

List upgrade requests:

```bash
kubectl get osupgrades
```

Example output:

```text
NAME             DESIRED   RESOLVED   PHASE
my-upgrade-3     stable    v1.35.4    Pending
my-upgrade-2     v1.35.3   v1.35.3    Accepted
my-downgrade-1   v1.33.2   v1.33.2    Rejected
```
List per-node progress:

```bash
kubectl get osupgradeprogresses
```

Example output:

```text
NAME                NODE     SOURCE         CURRENT   TARGET    STATUS
osupgrade-abc123f   node-1   my-upgrade-2   v1.34.1   v1.35.3   Downloading
osupgrade-cde456g   node-2   my-upgrade-2   v1.35.3   v1.35.3   Completed
```
Inspect one node's progress:

```bash
kubectl describe osupgradeprogress osupgrade-abc123f
```
Example resource:

```yaml
apiVersion: monok8s.io/v1alpha1
kind: OSUpgradeProgress
metadata:
  name: osupgrade-abc123f
spec:
  sourceRef:
    name: my-upgrade-2
  nodeName: node-1
status:
  currentVersion: v1.34.1
  targetVersion: v1.35.3
  phase: Downloading
  startedAt: null
  completedAt: null
  lastUpdatedAt: null
  retryCount: 0
  inactivePartition: B
  failureReason: ""
  message: ""
```
## Retry a failed upgrade

If an upgrade fails, for example because the image download failed, edit `spec.retryNonce` on the affected `OSUpgradeProgress` resource.

Any changed value is enough. The field is only used to tell the node agent that the user intentionally requested a retry.

Example:

```bash
kubectl patch osupgradeprogress osupgrade-abc123f \
  --type merge \
  -p '{"spec":{"retryNonce":"retry-1"}}'
```

If the same node fails again and you want to retry again, change the nonce to a new value:

```bash
kubectl patch osupgradeprogress osupgrade-abc123f \
  --type merge \
  -p '{"spec":{"retryNonce":"retry-2"}}'
```
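To avoid having to remember which nonce values were already used, a timestamp-based nonce is always fresh (the nonce format is arbitrary; only a changed value matters):

```shell
# Build a unique retry nonce; any string different from the previous one works.
NONCE="retry-$(date +%s)"
echo "$NONCE"

# Then apply it with the same patch as above:
#   kubectl patch osupgradeprogress osupgrade-abc123f \
#     --type merge -p "{\"spec\":{\"retryNonce\":\"$NONCE\"}}"
```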
## Development notes

### Flash an image manually into partition B

Use nmap's `ncat`. Other tools may work, but they are more likely to cause annoying stream or connection behavior.

On the sending machine:

```bash
pv out/rootfs.ext4.zst | ncat 10.0.0.10 1234 --send-only
```

On the receiving machine:

```bash
ncat -l 1234 --recv-only | \
  zstd -d -c | \
  dd of=/dev/sda3 bs=4M status=progress && \
  sync && \
  echo "SUCCESS"
```

Be careful with the target partition. The example writes to `/dev/sda3`, which is assumed to be rootfs B in that setup. Verify the partition layout before running this on real hardware.
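After flashing, the written partition can be sanity-checked by comparing checksums of the expanded image and the first `size` bytes of the target device. The sketch below demonstrates the idea on regular files; on real hardware you would substitute the decompressed image and the partition device such as `/dev/sda3`:

```shell
# Demo of post-flash verification, using regular files as stand-ins.
IMG=/tmp/demo-rootfs.img        # stands in for the expanded image
DEV=/tmp/demo-partition.img     # stands in for e.g. /dev/sda3
printf 'rootfs-bytes' > "$IMG"
cp "$IMG" "$DEV"                # "flash" the image

SIZE=$(stat -c %s "$IMG")       # on real hardware: the catalog "size" field
WRITTEN=$(head -c "$SIZE" "$DEV" | sha256sum | cut -d' ' -f1)
EXPECTED=$(sha256sum "$IMG" | cut -d' ' -f1)
[ "$WRITTEN" = "$EXPECTED" ] && echo "flash verified"
```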