Refine controller template and probe listeners
198
docs/ota.md
@@ -1,19 +1,54 @@
## Upgrade process
# OS OTA Upgrades

We use an agent to watch the OSUpgrade CRD to handle this. Our image versions follows upstream.
MonoK8s upgrades are driven through two custom resources:

To issue an upgrade. Simply use
- `OSUpgrade`: the user-facing upgrade request.
- `OSUpgradeProgress`: the per-node upgrade state watched and executed by the node agent.

The node agent does the actual upgrade work. It watches `OSUpgradeProgress` resources assigned to its node, downloads the selected image, writes it to the inactive rootfs partition, updates status, and reboots when ready.

The controller is optional but strongly recommended. It watches `OSUpgrade` resources and creates the matching `OSUpgradeProgress` resources for the target nodes.

## Install the controller

By default, each managed node only runs the node agent. The node agent does **not** watch `OSUpgrade` directly; it only watches `OSUpgradeProgress`.

You can create `OSUpgradeProgress` resources by hand, but normal users should not need to. Install the controller instead, then create `OSUpgrade` resources.

Install the controller from the existing node-agent image:

```bash
kubectl exec -i -n mono-system ds/node-agent -- \
  ctl create controller --image REPO/IMAGE:TAG | kubectl apply -f -
```

### `--image`

`--image` is optional.

If omitted, the generated Deployment uses the local controller image that is already shipped with managed nodes. In that mode, the controller Deployment is scheduled only onto managed nodes because the image is expected to exist locally.

If provided, the generated Deployment uses that image directly. This is useful when you host the controller image in your own registry.

There is no official public image repository yet, so external controller images must currently be managed by the operator.
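
The two `--image` modes can be wrapped in a small helper so the generated manifest can be saved and reviewed before it is applied. This is a minimal sketch: `render_controller` and `controller.yaml` are hypothetical names, and only the `kubectl exec ... ctl create controller` command itself comes from this document.

```bash
# Sketch: render the controller manifest with or without --image.
# render_controller is a hypothetical helper name, not part of MonoK8s.
render_controller() {
  image="${1:-}"   # optional; empty means "use the image shipped on managed nodes"
  if [ -n "$image" ]; then
    kubectl exec -i -n mono-system ds/node-agent -- ctl create controller --image "$image"
  else
    kubectl exec -i -n mono-system ds/node-agent -- ctl create controller
  fi
}

# Typical use (not run here):
#   render_controller > controller.yaml                  # local image
#   render_controller REPO/IMAGE:TAG > controller.yaml   # image from your own registry
#   kubectl apply --dry-run=client -f controller.yaml    # client-side validation only
#   kubectl apply -f controller.yaml
```

Writing the manifest to a file first makes it possible to inspect the Deployment before anything changes in the cluster.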

## Create an upgrade

Create an `OSUpgrade` resource to request an upgrade:

```bash
kubectl apply -f upgrade.yaml
```

Example:

Example yaml
```yaml
apiVersion: monok8s.io/v1alpha1
kind: OSUpgrade
metadata:
  name: "my-ugrade-2"
  name: my-upgrade-2
spec:
  version: "v1.35.3"
  version: v1.35.3
  nodeSelector: {}
  catalog:
    inline: |
@@ -34,24 +69,61 @@ spec:
        - version: v1.35.1
          url: http://localhost:8000/rootfs.ext4.zst
          checksum: sha256:99af82a263deca44ad91d21d684f0fa944d5d0456a1da540f1c644f8aa59b14b
          size: 1858076672 # expanded image size in bytes, use "zstd -lv image.zst to check"
          size: 1858076672 # expanded image size in bytes; check with: zstd -lv image.zst
      blocked:
        - v1.34.0
```

catalog also accepts URL or ConfigMap※
### `spec.version`

`spec.version` is the requested target version.

It may be either:

- an explicit version, such as `v1.35.3`
- `stable`, if the catalog defines a `stable` version
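
The two cases above can be illustrated with a small shell function that resolves a requested version against a catalog file. This is only a sketch of the idea, not the controller's implementation: `resolve_version` is a hypothetical helper, it assumes the catalog has a top-level `stable:` key as in the examples in this document, and it passes explicit versions through without checking that the catalog lists them.

```bash
# Sketch: resolve "stable" against a catalog file; pass explicit versions through.
# resolve_version is a hypothetical helper, not part of MonoK8s.
# Usage: resolve_version CATALOG_FILE REQUESTED_VERSION
resolve_version() {
  catalog="$1"
  requested="$2"
  if [ "$requested" = "stable" ]; then
    # Read the top-level "stable:" key; fail if the catalog does not define one.
    resolved="$(awk '/^stable:/ { print $2; exit }' "$catalog")"
    [ -n "$resolved" ] || { echo "catalog defines no stable version" >&2; return 1; }
    printf '%s\n' "$resolved"
  else
    printf '%s\n' "$requested"
  fi
}
```

With a catalog whose first line is `stable: v1.35.1`, `resolve_version catalog.yaml stable` prints `v1.35.1`.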

### `spec.nodeSelector`

`spec.nodeSelector` selects the nodes that should receive the upgrade.

An empty selector means all eligible managed nodes.

### `spec.catalog`

The catalog tells the agent where to find available OS images.

The catalog can be provided inline:

```yaml
catalog:
  URL: https://example.com/images.yaml

catalog:
  ConfigMap: images-cm
  inline: |
    stable: v1.35.1
    images:
      - version: v1.35.1
        url: https://example.invalid/images/monok8s-v1.35.1.img.zst
        checksum: sha256:abc
        size: 1858076672
```

※ ConfigMap requires additional RBAC permissions which is not enabled by default. You can edit
the node-agent's ClusterRole and add `configmaps: get` to allow this.
It can also be loaded from a URL:

```yaml
catalog:
  url: https://example.com/images.yaml
```

Or from a ConfigMap:

```yaml
catalog:
  configMap: images-cm
```

ConfigMap catalogs require extra RBAC. This permission is not enabled by default. To use a ConfigMap catalog, edit the relevant ClusterRole and allow `get` on `configmaps`.
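
One way to add that permission without hand-editing is a JSON patch that appends a rule to the ClusterRole. This is a sketch under assumptions: the helper names are hypothetical, the ClusterRole name is not given in this document and must be supplied by the operator, and the rule grants `get` on all ConfigMaps cluster-wide, which may be broader than you want.

```bash
# Sketch: append a "get configmaps" rule to a ClusterRole via JSON patch.
# configmap_get_rule and allow_configmap_catalog are hypothetical helper names.
configmap_get_rule() {
  printf '%s\n' '[{"op":"add","path":"/rules/-","value":{"apiGroups":[""],"resources":["configmaps"],"verbs":["get"]}}]'
}

# Usage: allow_configmap_catalog CLUSTERROLE_NAME
# The ClusterRole name is an operator-supplied value; this document does not name it.
allow_configmap_catalog() {
  kubectl patch clusterrole "$1" --type json -p "$(configmap_get_rule)"
}
```

Review the role with `kubectl get clusterrole CLUSTERROLE_NAME -o yaml` afterwards to confirm the rule landed where expected.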

Catalog content should look like this:

Contents should look like this
```yaml
stable: v1.35.1
images:
@@ -70,64 +142,114 @@ images:
  - version: v1.35.1
    url: http://localhost:8000/rootfs.ext4.zst
    checksum: sha256:99af82a263deca44ad91d21d684f0fa944d5d0456a1da540f1c644f8aa59b14b
    size: 1858076672 # expanded image size in bytes, use "zstd -lv image.zst to check"
    size: 1858076672 # expanded image size in bytes; check with: zstd -lv image.zst
blocked:
  - v1.34.0
```
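
An `images` entry like the one above can be generated rather than typed by hand. The sketch below is illustrative only: `catalog_entry` is a hypothetical helper, and it assumes the checksum covers the compressed file as it is downloaded, which this document does not state. The expanded size is passed in as an argument; obtain it with `zstd -lv` as the comment in the catalog suggests.

```bash
# Sketch: print one catalog "images" entry for an image file.
# catalog_entry is a hypothetical helper, not part of MonoK8s.
# Assumption: the checksum is the SHA-256 of the compressed file as downloaded.
# Usage: catalog_entry VERSION URL FILE EXPANDED_SIZE_BYTES
catalog_entry() {
  version="$1"; url="$2"; file="$3"; size="$4"
  sum="$(sha256sum "$file" | awk '{ print $1 }')"
  printf '  - version: %s\n    url: %s\n    checksum: sha256:%s\n    size: %s\n' \
    "$version" "$url" "$sum" "$size"
}

# Typical use (not run here):
#   catalog_entry v1.35.1 http://localhost:8000/rootfs.ext4.zst out/rootfs.ext4.zst 1858076672
```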

### Monitoring the upgrades
## Monitor upgrades

kubectl get osugrades
```
NAME             DESIRED   RESOLVED   PHASE        TARGETS   OK   FAIL   AGE
my-upgrade-3     stable    v1.35.4    RollingOut   3         1    0      1m
my-upgrade-2     v1.35.3   v1.35.3    Accepted     2         0    0      1m
my-downgrade-1   v1.33.2   v1.33.2    Rejected     2         0    2      1m
List upgrade requests:

```bash
kubectl get osupgrades
```

kubectl get osupgradeprogress
Example output:

```text
NAME             DESIRED   RESOLVED   PHASE
my-upgrade-3     stable    v1.35.4    Pending
my-upgrade-2     v1.35.3   v1.35.3    Accepted
my-downgrade-1   v1.33.2   v1.33.2    Rejected
```

List per-node progress:

```bash
kubectl get osupgradeprogresses
```

Example output:

```text
NAME                NODE     SOURCE         CURRENT   TARGET    STATUS
osupgrade-abc123f   node-1   my-upgrade-2   v1.34.1   v1.35.3   downloading
osupgrade-cde456g   node-2   my-upgrade-2   v1.35.3   v1.35.3   completed
osupgrade-abc123f   node-1   my-upgrade-2   v1.34.1   v1.35.3   Downloading
osupgrade-cde456g   node-2   my-upgrade-2   v1.35.3   v1.35.3   Completed
```
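
For a rollout across many nodes, the per-node listing can be condensed into a count per status. This is a rough sketch: `summarize_progress` is a hypothetical helper, and it assumes `STATUS` is the last printed column, as in the example output above.

```bash
# Sketch: count OSUpgradeProgress resources per status.
# summarize_progress is a hypothetical helper, not part of MonoK8s.
# Assumption: STATUS is the last column of the default table output.
summarize_progress() {
  kubectl get osupgradeprogresses --no-headers |
    awk '{ count[$NF]++ } END { for (s in count) print s, count[s] }' |
    sort
}
```

Each output line is a status followed by the number of nodes currently in it.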

Inspect one node's progress:

```bash
kubectl describe osupgradeprogress osupgrade-abc123f
```

Example resource:

```yaml
apiVersion: monok8s.io/v1alpha1
kind: OSUpgradeProgress
metadata:
  name: "osupgrade-abc123f"
  name: osupgrade-abc123f
spec:
  sourceRef:
    name: my-upgrade-2
  nodeName: node-1
status:
  currentVersion: "v1.34.1"
  targetVersion: "v1.35.3"
  currentVersion: v1.34.1
  targetVersion: v1.35.3
  phase: Downloading
  startedAt: null
  completedAt: null
  lastUpdatedAt: null
  retryCount: 0
  inactivePartition: "B"
  inactivePartition: B
  failureReason: ""
  message: ""
```
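
Because the phase is exposed at `status.phase`, a script can poll it instead of re-running `kubectl describe`. The following is a minimal sketch: `wait_for_phase` and its retry defaults are hypothetical, and the only phase names it relies on are the ones you pass in.

```bash
# Sketch: poll status.phase of one OSUpgradeProgress until it matches.
# wait_for_phase is a hypothetical helper; the defaults (60 tries, 10 s) are arbitrary.
# Usage: wait_for_phase NAME WANTED_PHASE [TRIES] [DELAY_SECONDS]
wait_for_phase() {
  name="$1"; wanted="$2"; tries="${3:-60}"; delay="${4:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    phase="$(kubectl get osupgradeprogress "$name" -o jsonpath='{.status.phase}')"
    echo "phase: ${phase:-unknown}"
    if [ "$phase" = "$wanted" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1   # gave up before the wanted phase was observed
}

# Typical use (not run here):
#   wait_for_phase osupgrade-abc123f Completed
```

Keep in mind that the node reboots as part of the upgrade, so a long enough timeout is needed to cover the time the node is away.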

## Retry a failed upgrade

If an upgrade fails, for example because the image download failed, edit `spec.retryNonce` on the affected `OSUpgradeProgress` resource.

Any changed value is enough. The field is only used to tell the node agent that the user intentionally requested a retry.

Example:

```bash
kubectl patch osupgradeprogress osupgrade-abc123f \
  --type merge \
  -p '{"spec":{"retryNonce":"retry-1"}}'
```

If the same node fails again and you want to retry again, change the nonce to a new value:

```bash
kubectl patch osupgradeprogress osupgrade-abc123f \
  --type merge \
  -p '{"spec":{"retryNonce":"retry-2"}}'
```
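
Since any new value works, the nonce can be generated instead of numbered by hand. This sketch wraps the patch command above; `retry_patch` and `retry_upgrade` are hypothetical helper names, and the timestamp-based default is just one convenient way to get a value that differs from the previous one.

```bash
# Sketch: retry with an automatically changing nonce.
# retry_patch and retry_upgrade are hypothetical helpers, not part of MonoK8s.
retry_patch() {
  printf '{"spec":{"retryNonce":"%s"}}' "$1"
}

# Usage: retry_upgrade PROGRESS_NAME [NONCE]
retry_upgrade() {
  name="$1"
  nonce="${2:-retry-$(date +%s)}"   # default: seconds since the epoch
  kubectl patch osupgradeprogress "$name" --type merge -p "$(retry_patch "$nonce")"
}

# Typical use (not run here):
#   retry_upgrade osupgrade-abc123f
```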

## Development notes

### Flashing manually into partition B
### Flash an image manually into partition B

**Use nmap ncat**. Otherwise we'll have all kinds of fabulous issues sending it.
Use nmap's `ncat`. Other tools may work, but they are more likely to cause annoying stream or connection behavior.

Sending side
```
pv "out/rootfs.ext4.zst" | ncat 10.0.0.10 1234 --send-only
On the sending machine:

```bash
pv out/rootfs.ext4.zst | ncat 10.0.0.10 1234 --send-only
```

Receiving side
```
ncat -l 1234 --recv-only | zstd -d -c | dd of=/dev/sda3 bs=4M status=progress && sync && echo "SUCCESS"
On the receiving machine:

```bash
ncat -l 1234 --recv-only | \
  zstd -d -c | \
  dd of=/dev/sda3 bs=4M status=progress && \
  sync && \
  echo "SUCCESS"
```

Be careful with the target partition. The example writes to `/dev/sda3`, which is assumed to be rootfs B in that setup. Verify the partition layout before running this on real hardware.
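
One cheap check before running `dd` is to confirm the target partition is not currently mounted. The helper below is a sketch, not a complete safeguard: `is_mounted` is a hypothetical name, it only compares the device column of `/proc/mounts`, and some systems list the root filesystem as `/dev/root` rather than by partition name, so also inspect the layout by eye.

```bash
# Sketch: refuse to flash a partition that is currently mounted.
# is_mounted is a hypothetical helper; it compares the first column of a
# mounts table (default: /proc/mounts) with the given device path.
# Usage: is_mounted DEVICE [MOUNTS_FILE]
is_mounted() {
  dev="$1"; mounts="${2:-/proc/mounts}"
  awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# Typical use on the receiving machine (not run here):
#   lsblk -o NAME,SIZE,FSTYPE,LABEL,MOUNTPOINT /dev/sda
#   if is_mounted /dev/sda3; then echo "refusing: /dev/sda3 is mounted" >&2; exit 1; fi
```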