PodOpsLifecycle
Kubernetes provides a set of default controllers for workload management, such as StatefulSet, Deployment, and DaemonSet. However, user services outside Kubernetes have difficulty participating in the operation lifecycle of a pod.
PodOpsLifecycle attempts to provide Kubernetes administrators and developers with finer-grained control over the entire lifecycle of a pod. For example, we can develop a controller that does the necessary work in both the PreCheck and PostCheck phases to avoid traffic loss.
Goals
- Provide extensibility that allows users to control the whole lifecycle of pods using the PodOpsLifecycle mechanism.
- Provide some concurrency: multiple controllers can operate on the pod at the same time. For example, when a pod is going to be updated, other controllers may want to delete it.
- All the lifecycle phases of a pod can be traced.
Proposal
User Stories
Story 1
As a developer who focuses on pod traffic, I should remove the endpoint once the readiness gate pod.kusionstack.io/service-ready is set to false, which means traffic to the pod should be turned off. I should add the endpoint once the readiness gate pod.kusionstack.io/service-ready is set to true and the pod is ready, which means traffic to the pod should be turned on.
The finalizer can be added and removed automatically if we implement the ReconcileAdapter interface provided by the resourceconsist controller.
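The traffic decision in this story can be sketched as a small predicate. This is only an illustration, assuming that the readiness gate appears as a pod condition of the same name and that a missing condition counts as false. The Condition type is a simplified stand-in for the Kubernetes pod condition, and shouldServeTraffic is a hypothetical helper, not part of PodOpsLifecycle or the ReconcileAdapter interface.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a pod status condition.
// It is defined here only to keep the sketch self-contained.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

const (
	serviceReadyGate = "pod.kusionstack.io/service-ready"
	podReady         = "Ready"
)

// conditionTrue reports whether the named condition exists with status "True".
// A missing condition is treated as false.
func conditionTrue(conds []Condition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == "True"
		}
	}
	return false
}

// shouldServeTraffic (hypothetical helper) returns true when the endpoint
// should be registered: the service-ready gate is true and the pod is Ready.
func shouldServeTraffic(conds []Condition) bool {
	return conditionTrue(conds, serviceReadyGate) && conditionTrue(conds, podReady)
}

func main() {
	conds := []Condition{
		{Type: serviceReadyGate, Status: "False"},
		{Type: podReady, Status: "True"},
	}
	// The gate is false, so the endpoint should be removed.
	fmt.Println(shouldServeTraffic(conds)) // false
}
```

A traffic controller built this way would add the endpoint when the predicate flips to true and remove it when it flips to false.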
Story 2
- As a developer who maintains a system that provides pod operations like update and scale, I should add the labels operating.podopslifecycle.kusionstack.io/<id>=<time> and operation-type.podopslifecycle.kusionstack.io/<id>=<type> at the same time when I want to operate a pod.
- If the operation is completed, I should remove the labels operating.podopslifecycle.kusionstack.io/<id>=<time> and operation-type.podopslifecycle.kusionstack.io/<id>=<type> at the same time.
- If I want to cancel the operation, I need to add the label undo-operation-type.podopslifecycle.kusionstack.io/<id>=<type>.
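The three label changes in this story can be sketched as helpers over a pod's label map. This is a minimal sketch, not the project's API: beginOperation, finishOperation, and cancelOperation are hypothetical names, the operation ID and type values are made up for illustration, and the timestamp is assumed to be written as a decimal Unix nanosecond value.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

const (
	operatingPrefix     = "operating.podopslifecycle.kusionstack.io/"
	operationTypePrefix = "operation-type.podopslifecycle.kusionstack.io/"
	undoTypePrefix      = "undo-operation-type.podopslifecycle.kusionstack.io/"
)

// beginOperation (hypothetical) adds both labels together. A real controller
// would send them in a single update so they arrive at the same time.
func beginOperation(labels map[string]string, id, opType string, nowUnixNano int64) {
	labels[operatingPrefix+id] = strconv.FormatInt(nowUnixNano, 10)
	labels[operationTypePrefix+id] = opType
}

// finishOperation (hypothetical) removes both labels together.
// Any undo label is left untouched in this sketch.
func finishOperation(labels map[string]string, id string) {
	delete(labels, operatingPrefix+id)
	delete(labels, operationTypePrefix+id)
}

// cancelOperation (hypothetical) requests that the operation be undone.
func cancelOperation(labels map[string]string, id, opType string) {
	labels[undoTypePrefix+id] = opType
}

func main() {
	labels := map[string]string{}

	// "abc123" and "upgrade" are illustrative values only.
	beginOperation(labels, "abc123", "upgrade", time.Now().UnixNano())
	fmt.Println(len(labels)) // 2

	finishOperation(labels, "abc123")
	fmt.Println(len(labels)) // 0
}
```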
The sequence diagram below describes how to update a pod.
Story 3
As a developer who cares about pod operation observability, I can use the <id>=<time> and <id>=<type> values in the labels to trace a pod. The <time> is a Unix time in nanoseconds, the <type> is a string that describes the operation type, and the <id> is a string that is used throughout the whole operation lifecycle.
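As a rough illustration of this kind of tracing, the sketch below reads the operating and operation-type labels from a label map and joins them by <id>. The OperationTrace type and the traceOperations function are hypothetical names introduced for this example, and the timestamp is assumed to be stored as a decimal Unix nanosecond value.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

const (
	operatingPrefix     = "operating.podopslifecycle.kusionstack.io/"
	operationTypePrefix = "operation-type.podopslifecycle.kusionstack.io/"
)

// OperationTrace (hypothetical) is one in-flight operation read from labels.
type OperationTrace struct {
	ID              string
	Type            string
	StartedUnixNano int64
}

// traceOperations (hypothetical) collects every operation that has an
// operating label, looks up its type by the same ID, and sorts by start time.
func traceOperations(labels map[string]string) []OperationTrace {
	var traces []OperationTrace
	for key, value := range labels {
		if !strings.HasPrefix(key, operatingPrefix) {
			continue
		}
		id := strings.TrimPrefix(key, operatingPrefix)
		startedAt, err := strconv.ParseInt(value, 10, 64)
		if err != nil {
			continue // skip values that are not a decimal timestamp
		}
		traces = append(traces, OperationTrace{
			ID:              id,
			Type:            labels[operationTypePrefix+id],
			StartedUnixNano: startedAt,
		})
	}
	sort.Slice(traces, func(i, j int) bool {
		return traces[i].StartedUnixNano < traces[j].StartedUnixNano
	})
	return traces
}

func main() {
	// Illustrative label values only.
	labels := map[string]string{
		operatingPrefix + "abc123":     "1700000000000000000",
		operationTypePrefix + "abc123": "upgrade",
	}
	for _, t := range traceOperations(labels) {
		fmt.Printf("%+v\n", t)
	}
}
```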
Design Details
- The PodOpsLifecycle mechanism is provided by a mutating webhook server and a controller. The mutating webhook server changes the labels at the right time, and the controller sets the readiness gate pod.kusionstack.io/service-ready to true or false if necessary. The controller also changes the labels at certain times.
- The labels operating.podopslifecycle.kusionstack.io/<id>=<time> and operation-type.podopslifecycle.kusionstack.io/<id>=<type> are validated by a validating webhook server. They must be added or removed at the same time by the operation controller.
- The traffic controller should turn the traffic on or off based on the readiness gate pod.kusionstack.io/service-ready and the pod condition Ready.
- Protection finalizer names must have the prefix prot.podopslifecycle.kusionstack.io. They are used to determine whether the traffic has been completely removed or is fully prepared.
- The special label podopslifecycle.kusionstack.io/service-available indicates that a pod is available to serve.
- We can use the <id>=<time> and <id>=<type> values in the labels to trace a pod. The <time> is a Unix time.
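Two of the rules above lend themselves to a short sketch: the pairing rule for the operating and operation-type labels, and the prefix rule for protection finalizers. The code below is an assumption-laden illustration, not the webhook's actual implementation. It checks only that the resulting label set is paired, which is one simple way to approximate "added or removed at the same time". The names validateOperationLabels and protectionFinalizers, and the finalizer suffix used in main, are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	operatingPrefix           = "operating.podopslifecycle.kusionstack.io/"
	operationTypePrefix       = "operation-type.podopslifecycle.kusionstack.io/"
	protectionFinalizerPrefix = "prot.podopslifecycle.kusionstack.io"
)

// validateOperationLabels (hypothetical) rejects a label set in which an
// operating label exists without its operation-type partner, or vice versa.
func validateOperationLabels(labels map[string]string) error {
	for key := range labels {
		if strings.HasPrefix(key, operatingPrefix) {
			id := strings.TrimPrefix(key, operatingPrefix)
			if _, ok := labels[operationTypePrefix+id]; !ok {
				return fmt.Errorf("operation %q has an operating label but no operation-type label", id)
			}
		}
		if strings.HasPrefix(key, operationTypePrefix) {
			id := strings.TrimPrefix(key, operationTypePrefix)
			if _, ok := labels[operatingPrefix+id]; !ok {
				return fmt.Errorf("operation %q has an operation-type label but no operating label", id)
			}
		}
	}
	return nil
}

// protectionFinalizers (hypothetical) returns the finalizers that carry the
// protection prefix, in their original order.
func protectionFinalizers(finalizers []string) []string {
	var matched []string
	for _, f := range finalizers {
		if strings.HasPrefix(f, protectionFinalizerPrefix) {
			matched = append(matched, f)
		}
	}
	return matched
}

func main() {
	// Only one of the two labels is present, so validation fails.
	labels := map[string]string{operatingPrefix + "abc123": "1700000000000000000"}
	fmt.Println(validateOperationLabels(labels) != nil) // true

	// The "/my-service" suffix is an illustrative finalizer name.
	finalizers := []string{protectionFinalizerPrefix + "/my-service", "kubernetes.io/pvc-protection"}
	fmt.Println(len(protectionFinalizers(finalizers))) // 1
}
```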
Below we use a sequence diagram to show how to use the PodOpsLifecycle mechanism to avoid traffic loss. You can also use this mechanism for other purposes, for example, to prevent tasks from being interrupted when they need to run for a long time.