- The current control logic cannot be customized. Once the Vela controller renders the final k8s resources, it simply applies them without any extension points. In some scenarios, users want to do more complex operations like:
Based on the above formula, the wait time is clamped to a minimum of `1s` and a maximum of `60s`. You can change the maximum by setting `MaxWorkflowWaitBackoffTime`.
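The clamping described above can be sketched as follows. This is a minimal illustration, not the controller's implementation: it assumes the raw interval grows exponentially as `base * 2^(n-1)`, and the function name `wait_backoff_seconds` and the default `base` value are hypothetical. Only the `1s` and `60s` bounds come from the text.

```python
def wait_backoff_seconds(attempt, base=0.05, min_wait=1.0, max_wait=60.0):
    """Return the wait time in seconds before the given attempt (1-based).

    Assumption: the raw interval is base * 2**(attempt - 1). The `base`
    default is illustrative only. `max_wait` stands in for the
    MaxWorkflowWaitBackoffTime setting.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    raw = base * 2 ** (attempt - 1)
    # Clamp the raw interval into [min_wait, max_wait].
    return max(min_wait, min(raw, max_wait))
```

Under these assumptions, early attempts all wait the minimum time, and later attempts are capped at the maximum regardless of how large the raw interval grows.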
For this case, we will retry the workflow step 10 times by default. If the workflow step is still `failed` after that, we will terminate the workflow, and its message will be `The workflow terminates automatically because the failed times of steps have reached the limit`. You can change the number of retries by setting `MaxWorkflowStepErrorRetryTimes`.
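The retry limit can be pictured with the sketch below. It is an assumption-laden illustration rather than the controller's code: `run_step_with_retries` is a hypothetical helper, the step is modelled as a callable that returns `True` on success, and the limit is interpreted as the number of failed executions allowed before termination.

```python
# Default from the text; stands in for MaxWorkflowStepErrorRetryTimes.
MAX_STEP_ERROR_RETRY_TIMES = 10

TERMINATE_MESSAGE = (
    "The workflow terminates automatically because the failed times of "
    "steps have reached the limit"
)


def run_step_with_retries(step, max_retries=MAX_STEP_ERROR_RETRY_TIMES):
    """Run `step` until it succeeds or has failed `max_retries` times.

    Returns a (phase, message) tuple. The phase names are illustrative.
    """
    for _ in range(max_retries):
        if step():
            return "succeeded", ""
    # Every execution failed: terminate the workflow with the fixed message.
    return "terminated", TERMINATE_MESSAGE
```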
A spec change also means that the application needs to be re-executed, so the application controller will clear the application's status, including the workflow status.
- The Task Manager will apply the workflow object with the annotation `app.oam.dev/workflow-context`. This annotation passes in the context, marshalled as JSON and defined as follows:
- The workflow object's status condition should have status `True` and reason `Succeeded`, and its `observedGeneration` should match the resource's own generation.
This is to solve the [issue of passing data from the old generation][1].
We will provide a CUE op library to check this condition and decide whether to wait.
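The readiness check described above can be sketched in a few lines. This is only an illustration in Python, not the CUE op library itself. It assumes the object is a dict laid out like a typical k8s resource, with `metadata.generation`, `status.observedGeneration`, and a `status.conditions` list whose entries carry `status` and `reason` fields. The function name `workflow_object_ready` is hypothetical.

```python
def workflow_object_ready(obj):
    """Return True when the workflow object reports success for its current generation."""
    generation = obj.get("metadata", {}).get("generation")
    status = obj.get("status", {})
    if generation is None or status.get("observedGeneration") != generation:
        # The status is missing or was reported for an older generation,
        # so it must not be trusted: keep waiting.
        return False
    return any(
        cond.get("status") == "True" and cond.get("reason") == "Succeeded"
        for cond in status.get("conditions", [])
    )
```

Comparing `observedGeneration` first is what prevents a stale `Succeeded` condition from a previous generation from being mistaken for the result of the current one.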
In this section we will walk through how we implement workflow solutions for the following use cases.
### Case 1: Multi-cluster
In this case, users want to distribute workflow to multiple clusters. The dispatcher implementation is flexible and could be based on [open-cluster-management](https://open-cluster-management.io/) or other methods.
- During infra setup, the Cluster objects are applied and agents are set up in each cluster to manage the lifecycle of the k8s clusters.
- Once the Application is applied, the OCM controller can retrieve all rendered resources from AppRevision. It will apply a ManifestWork object including all resources. Then the OCM agent will execute the workload creation in each cluster.
### Case 2: Blue-green rollout
In this case, users want to roll out a new version of the application components in a blue-green rolling upgrade style.
- By default, each modification of the Application object will generate an AppRevision object. The rollout controller will get the current revision from the context and retrieve the previous revision via the kube API.
- Then the rollout controller will roll replicas between the two revisions (the actual behavior depends on the workload type, e.g. Deployment or CloneSet).
- Once the rollover is done, the rollout controller can also shift part of the traffic to the new revision.
- The rollout controller will wait for manual approval. In this case, the approval flag is in the status of the Rollout object:
```yaml
kind: Rollout
status:
  pause: true # change this to false
```
The reference to the rollout object will be in the Application object:
In this case, users want to deploy a database component first, wait for the database to be up and ready, and then deploy the application with the database connection secret.
In this case, users just want Vela to provide the final k8s resources and push them to Git, and then integrate with ArgoCD/Flux to do the final rollout. Users will set up a GitOps workflow like below:
- Every time an Application event is triggered, the GitOps workflow controller will push the rendered resources to a Git repo. This will trigger ArgoCD/Flux to do continuous deployment.
In this case, a template for the Application object has already been defined. Instead of writing `spec.components`, users will reference the template and provide parameters or patches to it.
- On creating the application, the app controller will apply the HelmTemplate/KustomizePatch objects and wait for their status.
- The HelmTemplate/KustomizePatch controller will read the template from the specified source, render the final config, and compare it with the Application object. If there is a difference, it will write the result back to the Application object.
- The update of the Application will trigger another event, and the app controller will apply the HelmTemplate/KustomizePatch objects with the new context. This time, the HelmTemplate/KustomizePatch controller will find no diff after rendering, so it will skip the write-back.
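The steps above describe a reconcile loop that converges after one write-back. A minimal sketch of that behavior follows. The function `reconcile_template`, the `render` callable, and the choice of `spec.components` as the compared field are assumptions made for illustration, not the controller's actual interface.

```python
def reconcile_template(application, render):
    """Run one reconcile pass of a hypothetical template controller.

    `application` is a dict shaped like an Application object and `render`
    is a callable returning the rendered components. Returns True when the
    Application was updated, which would trigger another event.
    """
    rendered = render()
    spec = application.setdefault("spec", {})
    if spec.get("components") == rendered:
        # No diff after rendering: skip the write-back so the loop stops.
        return False
    spec["components"] = rendered
    return True
```

Because the second pass renders the same config and finds no difference, the update event it would otherwise cause is never emitted, and the loop terminates.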
In this case, users want to execute different steps based on the responseCode. When the `if` condition is not met, the step will be skipped.
```yaml
workflow:
  steps:
    - name: request
      type: webhook
    - name: handle-200
      type: deploy
      if: request.output.responseCode == 200
    - name: handle-400
      type: notification
      if: request.output.responseCode == 400
    - name: handle-500
      type: rollback
      if: request.output.responseCode == 500
```
If users want to execute a step no matter what, they can use `if: always` in the step. In this way, the step will be executed whether the workflow is successful or not.
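The skip logic can be illustrated with the sketch below. It is deliberately narrow and not the real evaluator: the hypothetical `should_run` helper understands only `always` and conditions of the form `<step>.output.<field> == <integer>`, and it assumes that steps without `if: always` are skipped once the workflow has failed.

```python
def should_run(step, outputs, workflow_failed=False):
    """Decide whether a step runs.

    `outputs` maps a step name to its output dict, for example
    {"request": {"responseCode": 200}}.
    """
    cond = step.get("if")
    if cond == "always":
        return True  # runs whether the workflow succeeded or not
    if workflow_failed:
        return False  # assumption: a failed workflow skips ordinary steps
    if cond is None:
        return True  # no condition: the step runs normally
    left, sep, right = cond.partition("==")
    if not sep:
        raise ValueError(f"unsupported condition: {cond!r}")
    step_name, _, field = left.strip().partition(".output.")
    value = outputs.get(step_name, {}).get(field)
    return value == int(right.strip())
```

With a `request` step whose output has `responseCode` equal to `400`, only the `handle-400` step from the example above would run, and the other two handlers would be skipped.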
In this case, the user runs multiple workflow steps in a step of type `step-group`. The subSteps in a step group will be executed in DAG mode.
```yaml
workflow:
  steps:
    - type: step-group
      name: run-step-group1
      subSteps:
        - name: sub-step1
          type: ...
          ...
        - name: sub-step2
          type: ...
          ...
```
The process is as follows:
- When executing a `step-group` step, the subSteps in the step group are executed in DAG mode. A step group completes only when all of its subSteps have run to completion.
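DAG-mode execution of a step group can be sketched with Python's standard `graphlib` module. This is an illustration under assumptions: the `dependsOn` key used to express ordering between subSteps and the helper `run_step_group` are hypothetical here, and the sub-steps that are ready at the same time are run one after another rather than in parallel.

```python
from graphlib import TopologicalSorter


def run_step_group(sub_steps, run):
    """Run all sub-steps in dependency order and return the execution order.

    `sub_steps` is a list of dicts with a "name" and an optional
    "dependsOn" list (an assumed field). `run` is called with each name.
    """
    graph = {s["name"]: set(s.get("dependsOn", [])) for s in sub_steps}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    executed = []
    while sorter.is_active():
        # Every sub-step returned here has all dependencies finished,
        # so these could be dispatched concurrently.
        for name in sorter.get_ready():
            run(name)
            executed.append(name)
            sorter.done(name)
    # The group is complete only once every sub-step has been executed.
    return executed
```

Sub-steps without dependencies become ready immediately, which is why a group of independent sub-steps can run side by side while the group itself finishes only after the last one does.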
The workflow defined here is k8s-resource-based and is a very simple, one-directional workflow. It is mainly used to customize the Vela control logic to do more complex deployment operations.
While Argo Workflow and Tekton share a similar idea of providing workflow functionality, they are container-based and provide more complex features like parameter sharing (using volumes and sidecars). More importantly, these projects cannot satisfy our needs; otherwise, we could simply use them in our implementation.