Refine controller template and probe listeners
docs/installing-ssh-pod.md (new file, 204 lines)
# Installing the recovery SSHD pod

This page explains how to install a temporary SSH server pod for break-glass recovery.

Use this when normal Kubernetes access is degraded, for example after the API server certificate expires or rotates and you need to retrieve updated host-side credentials.

The SSHD pod is intended for recovery and debugging only. Remove it when you are done.

## What this does

The recovery pod starts an SSH server on the selected node and authorizes your local SSH public key.

The pod also mounts selected host paths under `/host`, so you can inspect the host filesystem and run some host-side recovery commands through `chroot`.

For example:

```sh
chroot /host /bin/sh -lc 'rc-status'
chroot /host /bin/sh -lc 'rc-service crio status'
chroot /host /bin/sh -lc 'rc-service kubelet status'
```

## Requirements

You need:

- A working `kubectl` connection to the cluster.
- Access to the `node-agent` DaemonSet in the `mono-system` namespace.
- A local SSH public key, usually `~/.ssh/id_rsa.pub` or `~/.ssh/id_ed25519.pub`.

Use a public key file only. Do not pass your private key.
## Generate the SSHD manifest

To print the recovery SSHD manifest:

```bash
kubectl exec -i -n mono-system ds/node-agent -- \
  ctl create sshd --authkeys /dev/stdin < ~/.ssh/id_rsa.pub
```

This reads your local public key and places it into the generated pod's `authorized_keys`.

If you use Ed25519 keys, use:

```bash
kubectl exec -i -n mono-system ds/node-agent -- \
  ctl create sshd --authkeys /dev/stdin < ~/.ssh/id_ed25519.pub
```
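
The exact output depends on the `ctl` version. As a rough, hypothetical sketch, the generated resources include a ConfigMap carrying the key; the name and labels here are assumed from the cleanup section below, not guaranteed:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sshd-authorized-keys        # assumed name; see "Remove the recovery pod"
  namespace: mono-system
  labels:
    app.kubernetes.io/name: sshd    # assumed label; used by the checks below
data:
  authorized_keys: |
    ssh-ed25519 AAAA... user@host   # your piped public key, embedded verbatim
```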

## Generate and apply the manifest

To create the recovery SSHD resources in one step:

```bash
kubectl exec -i -n mono-system ds/node-agent -- \
  ctl create sshd --authkeys /dev/stdin < ~/.ssh/id_rsa.pub \
  | kubectl apply -f -
```

For Ed25519:

```bash
kubectl exec -i -n mono-system ds/node-agent -- \
  ctl create sshd --authkeys /dev/stdin < ~/.ssh/id_ed25519.pub \
  | kubectl apply -f -
```

## Why `-i` is used instead of `-it`

Use `-i`, not `-it`, when piping the SSH public key.

The `-t` option allocates a pseudo-TTY. A pseudo-TTY can modify piped input, which is not what you want when passing an SSH public key through stdin.

Correct:

```bash
kubectl exec -i -n mono-system ds/node-agent -- \
  ctl create sshd --authkeys /dev/stdin < ~/.ssh/id_rsa.pub
```

Avoid:

```bash
kubectl exec -it -n mono-system ds/node-agent -- \
  ctl create sshd --authkeys /dev/stdin < ~/.ssh/id_rsa.pub
```
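
If you want to confirm the key survived the pipe unmodified, one way is to search the generated manifest for the key blob; this assumes the manifest embeds the key verbatim:

```bash
# The base64 blob (field 2 of the public key file) should appear unchanged
# somewhere in the generated manifest.
kubectl exec -i -n mono-system ds/node-agent -- \
  ctl create sshd --authkeys /dev/stdin < ~/.ssh/id_ed25519.pub \
  | grep -F "$(cut -d' ' -f2 ~/.ssh/id_ed25519.pub)"
```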

## Check that the pod is running

After applying the manifest, check the pod:

```bash
kubectl get pods -n mono-system -l app.kubernetes.io/name=sshd
```

Check the service:

```bash
kubectl get svc -n mono-system -l app.kubernetes.io/name=sshd
```

If the pod does not start, inspect it:

```bash
kubectl describe pod -n mono-system -l app.kubernetes.io/name=sshd
```

## Connect through SSH

The exact SSH command depends on how the generated service exposes the pod.

If the service uses a NodePort such as `30022`, connect with:

```bash
ssh -p 30022 root@<node-ip>
```

Replace `<node-ip>` with the node's reachable IP address.
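
Two standard `kubectl` lookups help here (column names vary slightly by version):

```bash
# Node IPs: use the INTERNAL-IP (or EXTERNAL-IP) column.
kubectl get nodes -o wide

# The actual NodePort assigned to the generated service.
kubectl get svc -n mono-system -l app.kubernetes.io/name=sshd \
  -o jsonpath='{.items[0].spec.ports[0].nodePort}'
```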

## Access the host environment

Inside the SSH session, the host filesystem is available under `/host`.

Useful checks:

```sh
ls -la /host
chroot /host /bin/sh -lc 'rc-status'
chroot /host /bin/sh -lc 'rc-service crio status'
chroot /host /bin/sh -lc 'rc-service kubelet status'
```

Restart CRI-O:

```sh
chroot /host /bin/sh -lc 'rc-service crio restart'
```

Restart kubelet:

```sh
chroot /host /bin/sh -lc 'rc-service kubelet restart'
```

You can also inspect host processes from the pod because the recovery pod uses the host PID namespace:

```sh
ps aux | grep -E 'kubelet|crio'
```

## Notes for monok8s host mounts

The recovery pod does not mount host `/` directly.

On monok8s, `/` and `/var` may be private mounts. Mounting them directly as host paths can fail with errors such as:

```text
path "/" is mounted on "/" but it is not a shared or slave mount
```

or:

```text
path "/var" is mounted on "/var" but it is not a shared or slave mount
```

Instead, the recovery pod assembles a minimal host root under `/host` from individual host paths.

For `/var`, it uses the backing path:

```text
/data/var -> /host/var
```

This avoids the private bind-mount issue.
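
A minimal sketch of what such a volume pair could look like in the generated pod spec; the field values here are illustrative assumptions, not the actual template:

```yaml
volumes:
  - name: host-var
    hostPath:
      path: /data/var          # backing path on the monok8s host
      type: Directory
containers:
  - name: sshd
    volumeMounts:
      - name: host-var
        mountPath: /host/var   # lands under the assembled /host root
```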

## Remove the recovery pod

When recovery is complete, remove the generated resources.

If the resources use the default SSHD labels:

```bash
kubectl delete deployment -n mono-system -l app.kubernetes.io/name=sshd
kubectl delete service -n mono-system -l app.kubernetes.io/name=sshd
kubectl delete configmap -n mono-system -l app.kubernetes.io/name=sshd
```

If your generated manifest uses a fixed resource name, you can also remove them by name:

```bash
kubectl delete deployment -n mono-system sshd
kubectl delete service -n mono-system sshd
kubectl delete configmap -n mono-system sshd-authorized-keys
```

## Security warning

This pod is powerful.

It runs with root-level recovery access and can inspect or modify host files through `/host`. Treat it as a temporary break-glass tool, not a normal service.

Do not leave it running after recovery.
docs/ota.md (198 lines changed)

# OS OTA Upgrades

MonoK8s upgrades are driven through two custom resources:

- `OSUpgrade`: the user-facing upgrade request.
- `OSUpgradeProgress`: the per-node upgrade state watched and executed by the node agent.

The node agent does the actual upgrade work. It watches `OSUpgradeProgress` resources assigned to its node, downloads the selected image, writes it to the inactive rootfs partition, updates status, and reboots when ready.

The controller is optional but strongly recommended. It watches `OSUpgrade` resources and creates the matching `OSUpgradeProgress` resources for the target nodes.

## Install the controller

By default, each managed node only runs the node agent. The node agent does **not** watch `OSUpgrade` directly; it only watches `OSUpgradeProgress`.

You can create `OSUpgradeProgress` resources by hand, but normal users should not need to. Install the controller instead, then create `OSUpgrade` resources.

Install the controller from the existing node-agent image:

```bash
kubectl exec -i -n mono-system ds/node-agent -- \
  ctl create controller --image REPO/IMAGE:TAG | kubectl apply -f -
```

### `--image`

`--image` is optional.

If omitted, the generated Deployment uses the local controller image that is already shipped with managed nodes. In that mode, the controller Deployment is scheduled only onto managed nodes because the image is expected to exist locally.

If provided, the generated Deployment uses that image directly. This is useful when you host the controller image in your own registry.

There is no official public image repository yet, so external controller images must currently be managed by the operator.
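
Assuming the controller Deployment is created in the same `mono-system` namespace as the node agent, a quick check that it came up is:

```bash
kubectl get deployments,pods -n mono-system
```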

## Create an upgrade

Create an `OSUpgrade` resource to request an upgrade:

```bash
kubectl apply -f upgrade.yaml
```

Example:

```yaml
apiVersion: monok8s.io/v1alpha1
kind: OSUpgrade
metadata:
  name: my-upgrade-2
spec:
  version: v1.35.3
  nodeSelector: {}
  catalog:
    inline: |
      # … earlier catalog lines elided in this diff …
      - version: v1.35.1
        url: http://localhost:8000/rootfs.ext4.zst
        checksum: sha256:99af82a263deca44ad91d21d684f0fa944d5d0456a1da540f1c644f8aa59b14b
        size: 1858076672 # expanded image size in bytes; check with: zstd -lv image.zst
      blocked:
        - v1.34.0
```
### `spec.version`

`spec.version` is the requested target version.

It may be either:

- an explicit version, such as `v1.35.3`
- `stable`, if the catalog defines a `stable` version
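
For example, a catalog-tracked request might look like this, assuming the catalog defines a `stable` entry:

```yaml
spec:
  version: stable   # resolved against the catalog's `stable` entry;
                    # the RESOLVED column of `kubectl get osupgrades` shows the result
```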

### `spec.nodeSelector`

`spec.nodeSelector` selects the nodes that should receive the upgrade.

An empty selector means all eligible managed nodes.
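
For example, to limit an upgrade to nodes carrying a specific label (the label key here is an arbitrary illustration):

```yaml
spec:
  nodeSelector:
    monok8s.io/pool: edge   # hypothetical label; {} selects all eligible managed nodes
```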

### `spec.catalog`

The catalog tells the agent where to find available OS images.

The catalog can be provided inline:

```yaml
catalog:
  inline: |
    stable: v1.35.1
    images:
      - version: v1.35.1
        url: https://example.invalid/images/monok8s-v1.35.1.img.zst
        checksum: sha256:abc
        size: 1858076672
```

It can also be loaded from a URL:

```yaml
catalog:
  url: https://example.com/images.yaml
```

Or from a ConfigMap:

```yaml
catalog:
  configMap: images-cm
```

ConfigMap catalogs require extra RBAC. This permission is not enabled by default. To use a ConfigMap catalog, edit the relevant ClusterRole and allow `get` on `configmaps`.
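
A sketch of the extra rule; the ClusterRole to edit is the node-agent's (name assumed), and the rule itself is standard RBAC:

```yaml
# Appended to the node-agent ClusterRole's rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get"]
```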

Catalog content should look like this:

```yaml
stable: v1.35.1
images:
  # … additional entries elided in this diff …
  - version: v1.35.1
    url: http://localhost:8000/rootfs.ext4.zst
    checksum: sha256:99af82a263deca44ad91d21d684f0fa944d5d0456a1da540f1c644f8aa59b14b
    size: 1858076672 # expanded image size in bytes; check with: zstd -lv image.zst
blocked:
  - v1.34.0
```
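
To fill in `checksum` and `size` for a new entry, assuming the checksum covers the compressed artifact as in the examples above:

```bash
sha256sum rootfs.ext4.zst   # value for checksum: (prefix with "sha256:")
zstd -lv rootfs.ext4.zst    # the reported decompressed size is the value for size:
```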

## Monitor upgrades

List upgrade requests:

```bash
kubectl get osupgrades
```

Example output:

```text
NAME             DESIRED   RESOLVED   PHASE
my-upgrade-3     stable    v1.35.4    Pending
my-upgrade-2     v1.35.3   v1.35.3    Accepted
my-downgrade-1   v1.33.2   v1.33.2    Rejected
```

List per-node progress:

```bash
kubectl get osupgradeprogresses
```

Example output:

```text
NAME                NODE     SOURCE         CURRENT   TARGET    STATUS
osupgrade-abc123f   node-1   my-upgrade-2   v1.34.1   v1.35.3   Downloading
osupgrade-cde456g   node-2   my-upgrade-2   v1.35.3   v1.35.3   Completed
```
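
To follow a rollout live, standard `kubectl` watching works:

```bash
kubectl get osupgradeprogresses -w
```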

Inspect one node's progress:

```bash
kubectl describe osupgradeprogress osupgrade-abc123f
```

Example resource:

```yaml
apiVersion: monok8s.io/v1alpha1
kind: OSUpgradeProgress
metadata:
  name: osupgrade-abc123f
spec:
  sourceRef:
    name: my-upgrade-2
  nodeName: node-1
status:
  currentVersion: v1.34.1
  targetVersion: v1.35.3
  phase: Downloading
  startedAt: null
  completedAt: null
  lastUpdatedAt: null
  retryCount: 0
  inactivePartition: B
  failureReason: ""
  message: ""
```

## Retry a failed upgrade

If an upgrade fails, for example because the image download failed, edit `spec.retryNonce` on the affected `OSUpgradeProgress` resource.

Any changed value is enough. The field is only used to tell the node agent that the user intentionally requested a retry.

Example:

```bash
kubectl patch osupgradeprogress osupgrade-abc123f \
  --type merge \
  -p '{"spec":{"retryNonce":"retry-1"}}'
```

If the same node fails again and you want to retry again, change the nonce to a new value:

```bash
kubectl patch osupgradeprogress osupgrade-abc123f \
  --type merge \
  -p '{"spec":{"retryNonce":"retry-2"}}'
```
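
A timestamp makes a convenient always-fresh nonce, so you do not have to remember the previous value:

```bash
kubectl patch osupgradeprogress osupgrade-abc123f \
  --type merge \
  -p "{\"spec\":{\"retryNonce\":\"$(date +%s)\"}}"
```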

## Development notes

### Flash an image manually into partition B

Use nmap's `ncat`. Other tools may work, but they are more likely to cause annoying stream or connection behavior.

On the sending machine:

```bash
pv out/rootfs.ext4.zst | ncat 10.0.0.10 1234 --send-only
```

On the receiving machine:

```bash
ncat -l 1234 --recv-only | \
  zstd -d -c | \
  dd of=/dev/sda3 bs=4M status=progress && \
  sync && \
  echo "SUCCESS"
```

Be careful with the target partition. The example writes to `/dev/sda3`, which is assumed to be rootfs B in that setup. Verify the partition layout before running this on real hardware.
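
For example, to confirm which partition is the inactive rootfs before writing:

```bash
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/sda   # verify sda3 is not the mounted root
```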