OAM Rollout Controller Design
- Owner: Ryan Zhang (@ryangzhang-oss)
- Date: 01/14/2021
- Status: Draft
Introduction
Rollout (or upgrade) is one of the most essential "day 2" operations on any application. KubeVela, as an application-centric platform, needs to provide a tailored solution to ease this burden on application operators. There are several popular rollout solutions in the open source community, e.g. Flagger, but none of them work directly with our OAM framework. Therefore, we propose to create an OAM-native rollout framework that can address all the application rollout/upgrade needs in KubeVela.
Two flavors of Rollout
After hearing from the OAM community, it is clear to us that there are two flavors of rollout that we need to support.
- One way is through an OAM trait. This flavor works for existing OAM applications that don't have many specialized requirements, such as interaction with other traits. For example, most rollout operations don't work well with scaling operations, so the application operator needs to remove any scaler traits from the component before applying the rollout trait. Another example is that rollout operations usually also involve traffic splitting of some sort, so the application operator might also need to manually adjust the related traffic trait before and after applying the rollout trait.
- The other way is through a new applicationDeployment CR which directly references different versions of applications instead of workloads. This opens up the possibility for the controller to resolve conflicts between traits automatically. This resource, however, is not a trait and requires the application to be immutable.
Design Principles and Goals
We design our controllers with the following principles in mind:
- First, we want all flavors of rollout controllers to share the same core rollout logic. The trait- and application-specific logic can be easily encapsulated into its own package.
- Second, the core rollout logic is easily extensible to support different types of workloads, e.g. Deployment, CloneSet, StatefulSet, DaemonSet, or even customized workloads.
- Third, the core rollout logic has a well-documented state machine that does state transitions explicitly.
- Finally, the controllers can support all the rollout/upgrade needs of an application running in a production environment.
Proposal Layout
Here is the layout of the rest of the proposal:
- First, we will present the exact rollout CRD spec.
- Second, we will give a high-level design on how we plan to implement the controller.
- Third, we will present the state machine and its transition events.
- Finally, we will list common rollout scenarios and their corresponding user experience and implementation details.
Rollout API Design
Let's start with the rollout trait spec definition. The applicationDeployment spec is very similar.
```go
// RolloutTraitSpec defines the desired state of RolloutTrait
type RolloutTraitSpec struct {
	// TargetRef references a target resource that contains the newer version
	// of the software. We assume that the new resource already exists.
	// This is the only resource we work on if the resource is a stateful resource (cloneset/statefulset)
	TargetRef corev1.ObjectReference `json:"targetRef"`

	// SourceRef references the list of resources that contain the older version
	// of the software. We assume that it's the first time to deploy when we cannot find any source.
	// +optional
	SourceRef []corev1.ObjectReference `json:"sourceRef,omitempty"`

	// RolloutPlan is the details on how to rollout the resources
	RolloutPlan RolloutPlan `json:"rolloutPlan"`
}
```
The target and source here are the same as in other OAM traits: they refer to a workload instance by its GVK and name.
We can see that the core part of the rollout logic is encapsulated in the RolloutPlan object. This allows us to write multiple controllers that share the same logic without duplicating code. Here is the definition of the RolloutPlan structure.
```go
// RolloutPlan defines the details of the rollout plan
type RolloutPlan struct {
	// RolloutStrategy defines strategies for the rollout plan
	// +optional
	RolloutStrategy RolloutStrategyType `json:"rolloutStrategy,omitempty"`

	// The size of the target resource. The default is the same
	// as the size of the source resource.
	// +optional
	TargetSize *int32 `json:"targetSize,omitempty"`

	// The number of batches, default = 1
	// mutually exclusive with RolloutBatches
	// +optional
	NumBatches *int32 `json:"numBatches,omitempty"`

	// The exact distribution among batches.
	// mutually exclusive with NumBatches
	// +optional
	RolloutBatches []RolloutBatch `json:"rolloutBatches,omitempty"`

	// All pods in the batches up to the batchPartition (included) will have
	// the target resource specification while the rest still have the source resource.
	// This is designed for the operators to rollout manually.
	// Default is the number of batches, which will rollout all the batches.
	// +optional
	BatchPartition *int32 `json:"lastBatchToRollout,omitempty"`

	// RevertOnDelete reverts the rollout when the rollout CR is deleted, default is false
	// +optional
	RevertOnDelete bool `json:"revertOnDelete,omitempty"`

	// Paused pauses the rollout, default is false
	// +optional
	Paused bool `json:"paused,omitempty"`

	// RolloutWebhooks provides a way for the rollout to interact with an external process
	// +optional
	RolloutWebhooks []RolloutWebhook `json:"rolloutWebhooks,omitempty"`

	// CanaryMetric provides a way for the rollout process to automatically check certain metrics
	// before completing the process
	// +optional
	CanaryMetric []CanaryMetric `json:"canaryMetric,omitempty"`
}
```
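To make the spec concrete, here is a minimal sketch of a populated rollout trait in Go, assuming the RolloutTraitSpec and RolloutPlan types above are in scope; the CloneSet target, the "IncreaseFirst" strategy value, and the chosen batch numbers are illustrative assumptions, not part of the spec.

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/utils/pointer"
)

// exampleRolloutTrait builds an illustrative rollout trait: upgrade to a new
// CloneSet "sample-app-v2" in three batches, pause after the second batch for
// manual approval, and roll back if the rollout CR is deleted.
func exampleRolloutTrait() RolloutTraitSpec {
	return RolloutTraitSpec{
		// the target workload must already exist, per the spec comment above
		TargetRef: corev1.ObjectReference{
			APIVersion: "apps.kruise.io/v1alpha1",
			Kind:       "CloneSet",
			Name:       "sample-app-v2",
		},
		RolloutPlan: RolloutPlan{
			RolloutStrategy: "IncreaseFirst",     // assumed strategy value, for illustration only
			NumBatches:      pointer.Int32Ptr(3), // three roughly equal batches
			BatchPartition:  pointer.Int32Ptr(1), // stop after the second batch (0-indexed)
			RevertOnDelete:  true,                // deleting the CR reverts the rollout
		},
	}
}
```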
User Experience Workflow
The OAM rollout experience differs from flagger in some key areas; here are the implications for the user experience.
- We assume that the resources it refers to are immutable. In contrast, flagger watches over a target resource and reacts whenever the target's specification changes.
- The trait version of the controller refers to componentRevision.
- The application version of the controller refers to an immutable application.
- The rollout logic works only once and stops after it reaches a terminal state. One can still change the rollout plan in the middle of the rollout as long as it does not change the pods that are already updated.
- The applicationDeployment controller only rolls out one component in the application for now.
- Users in general should rely on the rollout CR to do the actual rollout, which means they shall set the `replicas` or `partition` field of the new resources to the starting value indicated in the detailed rollout plan design.
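For instance, under the assumption that the target workload is a Deployment, the preparation step could look like the following sketch; the function name and the choice of zero starting replicas are illustrative, not prescribed.

```go
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/utils/pointer"
)

// prepareTargetDeployment sketches how a user (or tooling acting for them) might
// prepare the new Deployment before creating the rollout CR: start at the value
// the rollout plan expects and let the controller scale it up batch by batch.
func prepareTargetDeployment(d *appsv1.Deployment) {
	d.Spec.Replicas = pointer.Int32Ptr(0) // the rollout controller owns scaling from here on
	d.Spec.Paused = true                  // keep the Deployment controller from rolling on its own
}
```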
Notable implementation level design decisions
Here are some high level implementation design decisions that will impact the user experience of rolling out.
Rollout workflows
As we mentioned in the introduction section, we will implement two rollout controllers that work on different levels. In the end, both emit an in-memory rollout plan object which includes references to the target and source Kubernetes resources that the rollout planner will execute upon. For example, the applicationDeployment controller will get the component from the application and extract the real workload from it before passing it to the rollout plan object.
With that said, the two controllers operate differently to extract the real workload. Here is a high-level description of how each works.
Application inplace upgrade workflow
The most natural way to upgrade an application is to upgrade it in place: the user just changes the application, the system picks up the change and applies it to the runtime. The implementation of this type of upgrade looks like this:
- The application controller computes a hash value of the applicationConfiguration (a sketch of one way to compute such a hash follows this list). The application controller always uses the component revision name in the AC it generates, which guarantees that the AC also changes when the component changes.
- The application controller creates the applicationConfiguration under a new name (with a suffix) whenever its hash value changes, and with a pre-determined annotation "app.oam.dev/appconfig-rollout" set to true.
- The AC controller has special handling logic in its apply phase. The exact logic depends on the workload type, and we list each case in the [rollout with different workloads](#rollout-plan-works-with-different-types-of-workloads) section. This special AC logic is also the real magic that makes the other rollout scenarios work, as the AC controller is the only entity that is directly responsible for emitting the workload to Kubernetes.
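As referenced in the first step above, here is a minimal sketch of how such a hash could be computed; the function name and the sha256-over-JSON approach are assumptions for illustration, not a prescribed implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// hashAppConfigSpec sketches one way the application controller could derive a
// stable hash suffix from the ApplicationConfiguration spec it generates; the
// spec argument stands in for whatever serializable struct the controller emits.
func hashAppConfigSpec(spec interface{}) (string, error) {
	raw, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	// a short prefix is enough for a name suffix, e.g. "myapp-a1b2c3d4e5"
	return fmt.Sprintf("%x", sum)[:10], nil
}
```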
ApplicationDeployment workflow
When an appDeployment is used to do application-level rollout, the target application is not yet reconciled by the application controller. This is to make sure the appDeployment controller has full control of the new application from the beginning. We will use a pre-defined annotation "app.oam.dev/rollout-template" set to "true" to facilitate that. We expect any system, such as the KubeVela apiserver, that utilizes an appDeployment object to follow this rule.
- Upon creation, the appDeployment controller marks itself as the owner of the application. The application controller has built-in logic to ignore any application that has the "app.oam.dev/rollout-template" annotation set to true (see the sketch after this list).
- The appDeployment controller will also add another annotation, "app.oam.dev/creating", to the application, to be passed down to the ApplicationConfiguration CR it generates, marking that the AC is being reconciled for the first time.
- The ApplicationConfiguration controller recognizes this annotation and checks whether there is anything it needs to do before emitting the workload to Kubernetes. The AC controller removes this annotation at the end of a successful reconcile.
- The appDeployment controller can change the target application fields. For example:
  - It might remove all the conflicting traits, such as an HPA, during the upgrade.
  - It might modify the label selector fields in the services to make sure there is a way to differentiate traffic routing between the old and new application resources.
- The appDeployment controller returns control of the new application back to the application controller after it makes the initial adjustments, by removing the annotation.
- We will use a webhook to ensure that the "rollout" annotation cannot be added back once removed.
- Upon a successful rollout, the appDeployment controller leaves no pods running for the old application.
- Upon a failed rollout, the end state is not determined; it could be unstable since both the old and new applications have been modified.
- Thus, we introduce a `revertOnDelete` field so that a user can delete the appDeployment and expect the old application to be left intact, with the new application taking no effect.
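As referenced in the first bullet above, the gating logic in the application controller could be as small as the following sketch; the helper name is hypothetical.

```go
package main

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ignoredByApplicationController sketches the gating check described above: as
// long as the rollout-template annotation is "true", the appDeployment controller
// owns the application and the application controller skips it.
func ignoredByApplicationController(app metav1.Object) bool {
	return app.GetAnnotations()["app.oam.dev/rollout-template"] == "true"
}
```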
Rollout trait workflow
The rollout trait controller only works with componentRevisions.
- The component controller emits the new component revision when a new component is created or updated.
- The application configuration controller emits the new component and assigns the component revisions as the source and target of the rollout trait.
- We assume that there are no other scaling-related traits deployed at the same time. We will use `conflict-with` fields in the traitDefinition and webhooks to enforce that (see the sketch after this list).
- Upon a successful rollout, the rollout trait will keep the old component revision with no pods left.
- Upon a failed rollout, the rollout trait will just stop and leave the resources mixed. This state should mostly still work since the other traits are not touched.
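As referenced above, a webhook backing the proposed conflict-with mechanism could perform a check along these lines; the helper name and the set of conflicting trait kinds are illustrative assumptions.

```go
package main

// conflictsWithRollout sketches the webhook check that could back the proposed
// conflict-with field: given the trait kinds already attached to the component,
// reject the rollout trait if any of them also scale the workload.
func conflictsWithRollout(existingTraitKinds []string) (conflict string, found bool) {
	conflicting := map[string]bool{
		"ManualScalerTrait":       true, // assumed scaler trait kind
		"HorizontalPodAutoscaler": true,
	}
	for _, kind := range existingTraitKinds {
		if conflicting[kind] {
			return kind, true
		}
	}
	return "", false
}
```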
Rollout plan works with different types of workloads
The rollout plan part of the rollout logic is shared between all rollout controllers. It comes with references to the target and source workloads, and the controller is responsible for fetching the different revisions of the resources. Deployment and CloneSet represent the two major types of resources, in that a CloneSet can contain both the new and old revisions in a stable state while a Deployment only contains one version when it's stable.
Rollout plan works with deployment
It's pretty straightforward for the outer controller to create the in-memory target and source Deployment objects since they are just copies of the Kubernetes resources (a sketch of per-batch replica sizing follows the list below).
- The user should set the Deployment workload's `Paused` field to true in the appDeployment case.
- Another option is for the user to leave the `replicas` field at 0 if the rollout does not have access to that field.
- If the rollout is successful, the source Deployment's `replicas` field will be zero and the target Deployment's will be the same as the original source's.
- If the rollout failed, we will leave the state as it is.
- If the rollout failed, `revertOnDelete` is `true`, and the rollout CR is deleted, then the source Deployment's `replicas` field will be restored to its pre-rollout value and the target Deployment's `replicas` field will be set to zero.
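Here is the per-batch replica sizing sketch referenced above, under the assumption that only `numBatches` is specified; when `rolloutBatches` is set, the per-batch sizes would come from that list instead.

```go
package main

// targetReplicasAfterBatch sketches how the rollout plan could size the target
// Deployment after finishing batch i (0-indexed): spread the total evenly and
// let the last batch absorb the rounding remainder.
func targetReplicasAfterBatch(totalSize, numBatches, batch int32) int32 {
	if numBatches <= 0 {
		return totalSize
	}
	perBatch := totalSize / numBatches
	upgraded := perBatch * (batch + 1)
	if batch == numBatches-1 {
		upgraded = totalSize // the last batch picks up any rounding remainder
	}
	return upgraded
}
```

At the same step the source Deployment would be scaled down to the remaining count, so the overall number of replicas stays roughly constant during the rollout.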
Rollout plan works with cloneset
The outer controller creates the in-memory target and source CloneSet objects with different image IDs. The source is only used when we need to do a rollback (a sketch of the partition computation follows the list below).
- The user should set the CloneSet workload's `Paused` field to true in the appDeployment case.
- Another option is for the user to leave the `partition` field at a value that effectively stops the upgrade if the rollout does not have access to that field.
- The rollout plan mostly just adjusts the `partition` field in the CloneSet and leaves the rest of the rollout logic to the CloneSet controller.
- If the rollout is successful, the `partition` field will be zero.
- If the rollout failed, we will leave the `partition` field as we last set it.
- If the rollout failed, `revertOnDelete` is `true`, and the rollout CR is deleted, we will perform a revert on the CloneSet. Note that only the latest CloneSet controller allows a rollback when one increases the `partition` field.
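Here is the partition computation sketch referenced above; it reflects the CloneSet semantics where the partition is the number of pods that must remain on the old revision. The helper name is hypothetical.

```go
package main

// cloneSetPartitionAfterBatch sketches the single knob the rollout plan needs to
// touch for a CloneSet: after upgrading `upgradedSoFar` pods, the partition is
// simply the remainder, and a fully rolled-out CloneSet ends with partition == 0.
func cloneSetPartitionAfterBatch(totalSize, upgradedSoFar int32) int32 {
	partition := totalSize - upgradedSoFar
	if partition < 0 {
		partition = 0
	}
	return partition
}
```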
Operational features
- We will use the same service mesh model as flagger, in the sense that the user needs to provide the service mesh provider type and give us a reference to an ingress object.
- We plan to directly import the various flagger mesh implementations.
- We plan to import the implementation of notifiers from flagger too.
- We may consider adding an alert rule to the rollout plan API in the future.
We will use a webhook to validate whether a change to the rollout CRD is valid.
We will impose strict restrictions on CRD updates: nothing can be updated other than the following fields:
- the BatchPartition field can only increase unless the target ref has changed (see the sketch after this list)
- the RolloutBatches field can only change the part after the BatchPartition field
- the CanaryMetric/Paused/RevertOnDelete can be modified freely.
- the rollout controller will simply replace the existing rollout CR value in its in-memory map, which causes the in-flight execution to stop. The new CR will kick off a new execution loop that resumes the rollout operation based on the rollout and resource status, following the pre-determined state machine transitions.
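As referenced in the restriction list above, the update validation for BatchPartition could be sketched as follows; the function name is hypothetical.

```go
package main

import "fmt"

// validateBatchPartitionUpdate sketches the "can only increase" rule. A nil
// partition means "roll out all batches"; a change of the target reference
// resets the rollout and lifts the restriction.
func validateBatchPartitionUpdate(oldPartition, newPartition *int32, targetChanged bool) error {
	if targetChanged || oldPartition == nil {
		return nil
	}
	if newPartition != nil && *newPartition < *oldPartition {
		return fmt.Errorf("batchPartition cannot decrease from %d to %d", *oldPartition, *newPartition)
	}
	return nil
}
```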
The controller has extension points set up for the following plug-ins:
- workloads. Each workload handler needs to implement the following operations (see the interface sketch after this list):
- scale the resources
- determine the health of the workload
- report how many replicas are upgraded/ready/available
- (future) metrics provider.
- (future) service mesh provider. Each mesh provider needs to implement the following operations:
- direct certain percent of the traffic to the source/target workload
- fetch the current traffic split
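A minimal sketch of the workload extension point could look like the interface below; the interface name and method signatures are illustrative and simply mirror the three operations listed above.

```go
package main

import "context"

// WorkloadController sketches the per-workload plug-in (Deployment, CloneSet, ...)
// that the shared rollout plan logic would call into.
type WorkloadController interface {
	// ScaleBatch moves one batch's worth of pods to the target revision,
	// e.g. by adjusting a Deployment's replicas or a CloneSet's partition.
	ScaleBatch(ctx context.Context, batch int32) error
	// VerifyHealth reports whether the workload is healthy enough to continue.
	VerifyHealth(ctx context.Context) (bool, error)
	// ReplicaStatus reports how many replicas are upgraded, ready, and available.
	ReplicaStatus(ctx context.Context) (upgraded, ready, available int32, err error)
}
```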
State Transition
Here is the state transition graph
Here are the various top-level states of the rollout:
```go
const (
	// VerifyingSpecState indicates that the rollout is in the stage of verifying the rollout settings
	// and that the controller can locate both the target and the source
	VerifyingSpecState RollingState = "verifyingSpec"
	// InitializingState indicates that the rollout is initializing all the new resources
	InitializingState RollingState = "initializing"
	// RollingInBatchesState indicates that the rollout starts rolling
	RollingInBatchesState RollingState = "rollingInBatches"
	// FinalisingState indicates that the rollout is finalizing, possibly cleaning up the old resources and adjusting traffic
	FinalisingState RollingState = "finalising"
	// RolloutFailingState indicates that the rollout is failing;
	// one needs to finalize it before marking it as failed, by cleaning up the old resources and adjusting traffic
	RolloutFailingState RollingState = "rolloutFailing"
	// RolloutSucceedState indicates that the rollout successfully completed and matches the desired target state
	RolloutSucceedState RollingState = "rolloutSucceed"
	// RolloutAbandoningState indicates that the rollout is abandoned, can be restarted. This is a terminal state
	RolloutAbandoningState RollingState = "rolloutAbandoned"
	// RolloutFailedState indicates that the rollout failed and the target replica count was not reached;
	// we cannot move forward anymore, and we let the client decide when or whether to revert.
	RolloutFailedState RollingState = "rolloutFailed"
)
```
These are the sub-states of the rollout when it's in the rolling state:
```go
const (
	// BatchInitializingState indicates that the batch is being initialized
	BatchInitializingState BatchRollingState = "batchInitializing"
	// BatchInRollingState indicates that we are still rolling the batch; the batch rolling is not completed yet
	BatchInRollingState BatchRollingState = "batchInRolling"
	// BatchVerifyingState indicates that we are verifying if the application is ready to roll
	BatchVerifyingState BatchRollingState = "batchVerifying"
	// BatchRolloutFailedState indicates that the batch didn't get the manual or automatic approval
	BatchRolloutFailedState BatchRollingState = "batchVerifyFailed"
	// BatchFinalizingState indicates that all the pods in the batch are available; we can move on to the next batch
	BatchFinalizingState BatchRollingState = "batchFinalizing"
	// BatchReadyState indicates that all the pods in the batch are upgraded and its state is ready
	BatchReadyState BatchRollingState = "batchReady"
)
```
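To illustrate how these sub-states are intended to chain together, here is a sketch of the happy-path progression; the failure transition (batchVerifyFailed) and the top-level state machine are handled separately.

```go
package main

// nextBatchState sketches the happy-path progression through the batch
// sub-states defined above, returning the current state unchanged if no
// forward transition applies.
func nextBatchState(current BatchRollingState) BatchRollingState {
	switch current {
	case BatchInitializingState:
		return BatchInRollingState
	case BatchInRollingState:
		return BatchVerifyingState
	case BatchVerifyingState:
		return BatchFinalizingState
	case BatchFinalizingState:
		return BatchReadyState
	default:
		return current
	}
}
```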
Future work
The applicationDeployment should also work on traits. For example, if someone plans to update the HPA trait's formula, there should be a way for them to roll out the HPA change step by step too.
