upodroid
dedd4df0a2
fetch cni plugins from GitHub releases
2024-12-18 19:48:06 +01:00
Paco Xu
59dfb0e779
skip if cri proxy is disabled/undefined
2024-11-19 11:17:07 +08:00
Laura Lorenz
9ab0d81d76
Now that sleep is shorter, only expect to reach 3 within 30s
The commit that fixed that focused too much on the container restart test.
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-13 01:39:58 +00:00
Laura Lorenz
59f9858086
Move function specific to container restart test inline
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-12 23:59:30 +00:00
Laura Lorenz
529d5ba9d3
Don't overly indirect image name
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-12 23:34:57 +00:00
Laura Lorenz
8e7b2af712
Use a better util
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-12 23:30:03 +00:00
Laura Lorenz
285d433dea
Clearer image pull test and utils
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-12 23:30:00 +00:00
Laura Lorenz
e03d0f60ef
Orient tests to run faster, but tolerate infra slowdowns up to 5 minutes
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-12 21:48:28 +00:00
Laura Lorenz
d293c5088f
Fix spelling
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-12 21:12:20 +00:00
Laura Lorenz
1da8ca816e
Extract restart number properly
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-12 20:00:11 +00:00
Laura Lorenz
2732d57e33
Missed refactor of container name here
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-12 19:50:11 +00:00
Laura Lorenz
e6059d7386
Fix typecheck and verify
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-12 19:48:38 +00:00
Laura Lorenz
f032068ef7
Focus on restart numbers instead of timing
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-12 07:12:24 +00:00
Laura Lorenz
bad037b505
Formatting
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-12 04:48:10 +00:00
Laura Lorenz
15bae1eadf
Add container restart test too
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-12 04:30:46 +00:00
Laura Lorenz
fc4ac5efeb
Move image pull backoff test to be with other image pull tests
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-12 01:27:44 +00:00
Laura Lorenz
2479d91f2a
Fix test to count pull tries
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-12 01:27:34 +00:00
Laura Lorenz
6ef05dbd01
The idea of how this test should work
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-11 17:55:41 +00:00
Laura Lorenz
6337a28a68
Organize into its own context
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-11 17:55:41 +00:00
Laura Lorenz
f913b7afe8
Adding imagepull backoff test
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-11 17:55:41 +00:00
Kubernetes Prow Robot
1dd81aa1c9
Merge pull request #126653 from zhifei92/fix-podstatus
fix the issue of losing the pending phase after a node restart.
2024-11-07 21:06:54 +00:00
Kubernetes Prow Robot
ef37cb503b
Merge pull request #128634 from thockin/remove_PodHostIPs_gate_for_1.32
Remove PodHostIPs feature gates
2024-11-07 13:47:54 +00:00
zhifei92
bed96b4eb6
fix: fix the issue of losing the pending phase after a node restart.
2024-11-07 21:10:11 +08:00
Lan Liang
6e5a3cde50
Remove PodHostIPs feature gates.
Signed-off-by: Lan Liang <gcslyp@gmail.com>
2024-11-06 23:10:36 -08:00
Kubernetes Prow Robot
6cc3570466
Merge pull request #128190 from HarshalNeelkamal/external-jwt
Add plugin and key-cache for ExternalJWTSigner integration
2024-11-07 06:29:45 +00:00
Kubernetes Prow Robot
c462d4c8e5
Merge pull request #126096 from utam0k/support-disabling-oom-group-kill
kubelet: new kubelet config option for disabling group oom kill
2024-11-07 06:29:36 +00:00
Harshal Neelkamal
6fdacf0411
Add plugin and key-cache for ExternalJWTSigner integration
2024-11-07 03:16:23 +00:00
utam0k
4f909c14a0
kubelet: new kubelet config option for disabling group oom kill
Signed-off-by: utam0k <k0ma@utam0k.jp>
2024-11-07 12:03:04 +09:00
Kubernetes Prow Robot
48c65d1870
Merge pull request #128576 from bart0sh/PR166-refactor-kubelet-stop-and-restart
e2e_node: refactor Kubelet stopping and restarting
2024-11-06 20:10:40 +00:00
Patrick Ohly
33ea278c51
DRA: use v1beta1 API
No code is left which depends on v1alpha3, except of course the code
implementing that version.
2024-11-06 13:03:19 +01:00
Ed Bartosh
3aa95dafea
e2e_node: refactor stopping and restarting kubelet
Moved Kubelet health checks from test cases to the stopKubelet API.
This should make the API cleaner and easier to use.
2024-11-06 11:34:48 +02:00
Kubernetes Prow Robot
98b4ee6bfa
Merge pull request #126525 from dshebib/addSidecarE2EImgTest
Restart sidecar container when the image has changed
2024-11-06 00:35:35 +00:00
Kubernetes Prow Robot
f64eeb523d
Merge pull request #128096 from bart0sh/PR161-e2e_node-consolidate-NFSServer-APIs
e2e_node: consolidated NFSServer APIs.
2024-11-05 00:33:35 +00:00
Abhijit Hoskeri
d86debe500
e2e_node: Pass e2eCriProxy instead of updating global.
e2eCriProxy is defined in a _test.go file but referenced
in a non-test file, which confuses gopls.
Passing it explicitly is also clearer to future readers.
2024-11-02 17:40:49 -07:00
Kubernetes Prow Robot
453efd7a4b
Merge pull request #121604 from pacoxu/image-pull-e2e
[node-e2e] add test cases for serialize and parallel image pulling
2024-10-31 08:01:26 +00:00
Paco Xu
82df7a7d82
use cri proxy injector for parallel pulling image tests
2024-10-31 14:50:50 +08:00
Kubernetes Prow Robot
daef8c2419
Merge pull request #127266 from pohly/dra-admin-access-in-status
DRA API: AdminAccess in DeviceRequestAllocationResult + DRAAdminAccess feature gate
2024-10-30 03:41:25 +00:00
Kubernetes Prow Robot
5fcef4f79d
Merge pull request #128422 from bart0sh/PR163-density-e2e_node-adjust-limits
density test: adjust CPU and memory limits
2024-10-30 02:37:31 +00:00
Kubernetes Prow Robot
a339a36a36
Merge pull request #127506 from ffromani/cpu-pool-size-metrics
node: metrics: add metrics about cpu pool sizes
2024-10-30 00:17:24 +00:00
Ed Bartosh
04f7a86001
density test: adjust CPU and memory limits
Adjusted limits based on a recent job log:
I1028 20:05:42.079182 1002 resource_usage_test.go:199] Resource usage:
container cpu(cores) memory_working_set(MB) memory_rss(MB)
"kubelet" 0.024 22.17 14.20
"runtime" 0.041 409.70 84.21
I1028 20:05:42.079274 1002 resource_usage_test.go:206] CPU usage of containers:
container 50th% 90th% 95th% 99th% 100th%
"/" N/A N/A N/A N/A N/A
"runtime" 0.014 0.834 0.834 0.834 1.083
"kubelet" 0.023 0.093 0.093 0.093 0.164
Increasing the 95th percentile limit for runtime CPU usage should also make
pull-kubernetes-node-kubelet-containerd-flaky less flaky.
2024-10-30 00:48:56 +02:00
Patrick Ohly
f3fef01e79
DRA API: AdminAccess in DeviceRequestAllocationResult
Drivers need to know that because admin access may also grant additional
permissions. The allocator needs to ignore such results when determining which
devices are considered allocated.
In both cases it is conceptually cleaner to not rely on the content of the
ClaimSpec.
2024-10-29 09:50:07 +01:00
Kubernetes Prow Robot
685b8b3ba1
Merge pull request #126981 from kannon92/stable-empty-dir-promotion
KEP-1967: promote size backed memory volumes to stable
2024-10-29 01:00:54 +00:00
Kubernetes Prow Robot
1d8828ce70
Merge pull request #128091 from saschagrunert/cni-plugins
Update cni-plugins to v1.6.0
2024-10-27 03:01:06 +00:00
Francesco Romani
14ec0edd10
node: metrics: add metrics about cpu pool sizes
Add metrics about the sizing of the cpu pools.
Currently the cpumanager maintains 2 cpu pools:
- shared pool: this is where all pods with non-exclusive
cpu allocation run
- exclusive pool: this is the union of the set of exclusive
cpus allocated to containers, if any (requires static policy in use).
By reporting the size of the pools, users (humans or machines)
can get better insights and more feedback about how the resources are
actually allocated to the workload and how the node resources are used.
2024-10-24 15:35:51 +02:00
Kubernetes Prow Robot
8c7160205d
Merge pull request #127922 from PiotrProkop/topology-manager-policy-options-e2e
add e2e tests for prefer-closest-numa-nodes TopologyManagerPolicyOption
2024-10-24 14:17:03 +01:00
PiotrProkop
a6eb3281cc
add e2e tests for prefer-closest-numa-nodes TopologyManagerPolicyOption suboptimal allocation
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2024-10-24 11:45:39 +02:00
Ed Bartosh
2ac5dfe379
e2e_node: check container metrics conditionally
When the PodAndContainerStatsFromCRI FG is enabled, Kubelet tries to get the
list of metrics from the CRI runtime using the CRI API 'ListMetricDescriptors'.
As this API is implemented in neither the CRI-O nor the Containerd versions
used in the test-infra, the ResourceMetrics test case fails to gather
certain container metrics.
Excluding container metrics from the expected list of metrics if
PodAndContainerStatsFromCRI is enabled should solve the issue.
2024-10-23 21:08:36 +03:00
Kubernetes Prow Robot
c6669ea7d6
Merge pull request #127155 from ffromani/alignment-metrics
node: metrics: add resource alignment metrics
2024-10-23 09:54:58 +01:00
Francesco Romani
c025861e0c
node: metrics: add resource alignment metrics
In order to improve the observability of the resource management
in kubelet, cpu allocation and NUMA alignment, we add more metrics
to report if resource alignment is in effect.
More precise reporting would probably use the pod status,
but this would require more invasive and riskier changes,
and possibly extra interactions with the APIServer.
We start by adding metrics to report whether containers got their
compute resources aligned.
If the metrics are growing, the assignment is working as expected;
if the metrics stay constant, perhaps at zero, no resource
alignment is done.
Extra fixes brought by this work:
- retroactively add labels for existing tests
- running the metrics tests demands precise accounting to avoid flakes;
ensure the node state is restored pristine between each test, to
minimize the aforementioned risk of flakes.
- The test pod command line was wrong; with it, the pod could not
reach the Running state. That went unnoticed so far because
no test using this utility function actually needed a pod
in the Running state.
Signed-off-by: Francesco Romani <fromani@redhat.com>
2024-10-23 08:05:38 +02:00
Davanum Srinivas
abbc5ad346
Copy limited pieces of code we use from runc's apparmor and utils packages
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-10-22 09:56:22 -04:00