minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	bb07df7e7b	do not list dangling objects with unmatched ECs (#20351 ) This mostly applies to all new objects, this simply ignores these objects and no application would have to deal with getting 503s on them.	2024-08-30 09:02:26 -07:00
Harshavardhana	504e52b45e	protect bpool from buffer pollution by invalid buffers (#20342 )	2024-08-28 18:40:52 -07:00
Anis Eleuch	38c0840834	bucket-metadata: Reload events/repl-targets for all buckets (#20334 ) Currently, the bucket events and replication targets are only reloaded for buckets that failed to load during the first cluster startup. This is wrong: if a bucket change was made on one node but that node was not able to notify the other nodes, the other nodes will reload the bucket metadata config but fail to set the events and bucket targets in memory.	2024-08-28 08:32:18 -07:00
Harshavardhana	fb2360ff88	when a drive is closed cancel the cleanupTrash goroutine (#20337 ) when a hung drive is hot-unplugged, the server might go into a loop where the previous `format.json` is somehow still accessible to the process, we try to re-init() drives, but that seems to cause a previous goroutine to hang around since it is not canceled away when the drive is closed. Bonus: add deadline for immediate purge routine, to unblock it if the drive is blocking mutations.	2024-08-28 08:31:42 -07:00
jiuker	1a2de1bdde	fix: string format when log IAM refresh take over 5s (#20331 )	2024-08-26 23:40:33 -07:00
Harshavardhana	af55f37b27	do not fallback on the drives to load groups for LDAP (#20320 ) if a user policy is found, avoid reading from the drives for missing group mappings, group mappings are not mandatory and conditional. This PR restores the older behavior while making sure that if a direct user policy is not found, we would still attempt to load from the group from the drives.	2024-08-25 17:22:45 -07:00
Andreas Auernhammer	2d67c26794	improve multipart decryption (#20324 ) This commit simplifies and optimizes the decryption of large (multipart) objects. This PR does two things: - Re-write the init logic for the decryption reader - Reduce the number of OEK decryptions Before, the init logic copied some SSE HTTP request headers to parse them later. This is simplified to parsing them right away. This removes some fields from the decryption reader struct. Further, the decryption reader decrypted the OEK using the client-provided key (SSE-C) or the KMS (SSE-S3 / SSE-KMS) for each part. This is redundant since the OEK is the same for all parts. In particular, a KMS call might be a network request. Now, the OEK is decrypted once for the entire multipart object. This should improve latency when reading encrypted multipart objects and reduce requests to the KMS. Signed-off-by: Andreas Auernhammer <github@aead.dev>	2024-08-25 11:07:13 -07:00
Harshavardhana	006cacfefb	to turn-off healing drop legacy ENV (#20315 )	2024-08-23 15:43:31 -07:00
bestgopher	c28f09d4a7	refactor: displays the OS-specific doc url (#20313 )	2024-08-23 07:11:35 -07:00
Anis Eleuch	73992d2b9f	s3: DeleteBucket to use listing before returning bucket not empty error (#20301 ) Use Walk(), which is a recursive listing with versioning, to check if the bucket has some objects before being removed. This is beneficial because the bucket can contain multiple dangling objects in multiple drives. Also, this will prevent a bug where a bucket is deleted in a deployment that has many erasure sets but the bucket contains one or few objects not spread to enough erasure sets.	2024-08-22 14:57:20 -07:00
Anis Eleuch	a8f143298f	heal: Reset healing params when a retry is decided (#20285 ) Currently, retrying the healing of a new drive does not reset HealedBuckets, which means that the next healing retry will skip those buckets. This commit fixes that behavior. Also, the skipped objects counter will include objects that are uploaded after the healing is started.	2024-08-22 05:35:43 -07:00
jiuker	2d44c161c7	fix: support export bucket policy with ExportBucketMetadata (#20308 )	2024-08-22 03:44:35 -07:00
Mark Theunissen	fb4ad000b6	support parseObjectAttributes to handle multiple header values (#20295 )	2024-08-21 14:13:59 -07:00
shandongzhejiang	a8ff12bc72	chore: fix some comments (#20294 ) Signed-off-by: shandongzhejiang <shandongzhejiang@icloud.com>	2024-08-21 13:14:24 -07:00
jiuker	1e1bd3afd9	use io.NopCloser replace closeWrapper (#20287 )	2024-08-21 05:20:54 -07:00
Anis Eleuch	7b239ae154	sftp: Fix operations with a internal service account (#20293 ) sftp sends local requests to the S3 port while passing the session token header when the account corresponds to a service account. However, this is not permitted and will throw an error: "The security token included in the request is invalid" This commit will avoid passing the session token to the upper layer that initializes MinIO client to avoid this error.	2024-08-20 13:00:29 -07:00
Anis Eleuch	85c3db3a93	heal: Add finished flag to .healing.bin to avoid removing the latter (#20250 ) Sometimes, we need historical information in .healing.bin, such as the number of expired objects that healing skipped, which can create a drive usage disparity within the same erasure set. For that reason, this commit no longer removes .healing.bin; instead it adds a new field called Finished so we know healing has finished on that drive.	2024-08-20 08:42:49 -07:00
Mark Theunissen	6378ca10a4	kms.ListKeys returns CreatedBy/CreatedAt when information is available (#20223 )	2024-08-17 23:43:03 -07:00
Harshavardhana	72cff79c8a	add missing STS accounts loading (#20279 ) PR #20268 missed loading STS accounts map properly	2024-08-16 18:24:54 -07:00
Harshavardhana	a5702f978e	remove requests deadline, instead just reject the requests (#20272 ) Additionally set - x-ratelimit-limit - x-ratelimit-remaining To indicate the request rates.	2024-08-16 01:43:49 -07:00
Poorna	4687c4616f	try loading temp account if not in cache (#20266 )	2024-08-15 23:12:42 -07:00
Harshavardhana	cc0c41d216	remove region locks and make them simpler (#20268 ) - single flight approach is now optional, instead of default. - parallelize the loaders up to 32 items per assets (more room for improvement possible)	2024-08-15 08:41:03 -07:00
Klaus Post	f1302c40fe	Fix uninitialized replication stats (#20260 ) Services are unfrozen before `initBackgroundReplication` is finished. This means that the globalReplicationStats write is racy. Switch to an atomic pointer. Provide the `ReplicationPool` with the stats, so it doesn't have to be grabbed from the atomic pointer on every use. All other loads and checks are nil, and calls return empty values when stats still haven't been initialized.	2024-08-15 05:04:40 -07:00
Klaus Post	d96798ae7b	Add support profile deadlines and concurrent operations (#20244 ) * Allow a maximum of 10 seconds to start profiling operations. * Download up to 16 profiles concurrently, but only allow 10 seconds for each (does not include write time). * Add cluster info as the first operation. * Ignore remote download errors. * Stop remote profiles if the request is terminated.	2024-08-15 03:36:00 -07:00
Anis Eleuch	b508264ac4	sr: Avoid recursion when loading site replicator credentials (#20262 ) If site replication is enabled and the code tries to extract JWT claims while the site replication service account credentials are not yet loaded, the code will enter an infinite loop, causing high CPU usage. Another possible cause of the infinite loop is service accounts created by an old deployment version, where the service account JWT was signed by the root credentials, which is no longer the case. This commit removes the possibility of the infinite loop and adds a root credential fallback to extract claims from old service accounts.	2024-08-14 18:29:20 -07:00
Harshavardhana	db78431b1d	avoid crash when initializing bucket quota cache (#20258 )	2024-08-14 17:34:56 -07:00
Klaus Post	3ffeabdfcb	Fix govet+staticcheck issues (#20263 ) This is better: https://github.com/golang/go/issues/60529	2024-08-14 10:11:51 -07:00
Anis Eleuch	51b1f41518	heal: Persist MRF queue in the disk during shutdown (#19410 )	2024-08-13 15:26:05 -07:00
Harshavardhana	e7a56f35b9	flatten out audit tags, do not send as free-form (#20256 ) move away from map[string]interface{} to map[string]string to simplify the audit, and also provide concise information. avoids large allocations under load(), reduces the amount of audit information generated, as the current implementation was a bit free-form. instead all datastructures must be flattened.	2024-08-13 15:22:04 -07:00
rubyisrust	516af01a12	chore: fix some function names (#20243 ) Signed-off-by: rubyisrust <rustrover@icloud.com>	2024-08-13 11:23:33 -07:00
Harshavardhana	acdb355070	update deps and update azure WARM tier implementation (#20247 )	2024-08-13 11:21:34 -07:00
Mark Theunissen	37c02a5f7b	Add dummy DeleteBucketCors for safety (#20253 )	2024-08-13 08:25:16 -07:00
Krishnan Parthasarathi	04be352ae9	Relax quorum agreement on DataDir values (#20232 ) Previously, we checked if we had a quorum on the DataDir value. We are removing this check, which allows reading objects with different DataDir values in a few drives (due to a rebalance-stop race bug) provided their eTags or ModTimes match.	2024-08-12 12:02:21 -07:00
Klaus Post	53eb7656de	Add admin info timeouts (#20249 ) Since a lot of operations load from storage, do remote calls, add a 10 second timeout to each operation. This should make `mc admin info` return values even under extreme conditions.	2024-08-12 10:24:29 -07:00
Harshavardhana	2e0fd2cba9	implement a safer completeMultipart implementation (#20227 ) - optimize writing part.N.meta by writing both part.N and its meta in sequence without network component. - remove part.N.meta, part.N which were partially successful, in quorum loss situations during renamePart() - allow for strict read quorum check arbitrated via ETag for the given part number, this makes it doubly safe upon final commit. - return an appropriate error when read quorum is missing, instead of returning InvalidPart{}, which is non-retryable error. This kind of situation can happen when many nodes are going offline in rotation, an example of such a restart() behavior is statefulset updates in k8s. fixes #20091	2024-08-12 01:38:15 -07:00
Harshavardhana	909b169593	avoid source index to be same as destination index (#20238 ) during rebalance stop, it is possible that Put() races by overwriting the same object again. If this overwrite completes "successfully", the operation can proceed to delete the object from the pool, causing data loss. This PR enhances #20233 to handle more scenarios such as these.	2024-08-09 19:30:44 -07:00
Krishnan Parthasarathi	4e67a4027e	Prevent overwrites due to rebalance-stop race (#20233 ) Rebalance-stop can race with ongoing rebalance operations. This change prevents these operations from overwriting objects by checking the source and destination pool indices are different.	2024-08-08 19:05:14 -07:00
Klaus Post	49055658a9	Fix missing hash in GetObjectAttributes (#20231 ) SHA256/SHA1 were mixed up. Simplify code as well.	2024-08-08 13:19:41 -07:00
Harshavardhana	89c58ce87d	enhance getActualSize() to return valid values for most situations (#20228 )	2024-08-08 08:29:58 -07:00
Mark Theunissen	2681219039	Add dummy PutBucketCors for functional test compatibility (#20220 )	2024-08-06 08:41:38 -07:00
Harshavardhana	dea9abed29	use singleflight when bucket metadata is reloaded() (#20216 ) this allows for de-duplicating the callers when called concurrently, allowing for bucketmetadata reads to be single call. All concurrent callers will get the same data as the first one.	2024-08-05 09:50:11 -07:00
Harshavardhana	e3eb5c1328	batch-exp: Remove 1000 maximum objects per call (#20212 ) It seems ObjectAPI.DeleteObjects() is clogging up when it is removing 10k versions of a single object. Authored-by: Anis Eleuch <anis@min.io>	2024-08-04 21:55:25 -07:00
Poorna	74c047cb03	fix replication last hour metric (#20199 ) also adding missing recent_backlog_count metric to v3 metrics	2024-08-01 17:55:27 -07:00
jiuker	50a5ad48fc	feat: support batch replication prefix slice (#20033 )	2024-08-01 05:53:30 -07:00
Harshavardhana	a9dc061d84	count metrics properly for any failures during drive heal (#20193 ) or via `mc admin heal --set 1 --pool 1`	2024-07-30 22:46:26 -07:00
Krishnan Parthasarathi	01a8c09920	Add fmt-gen subcommand (#20192 ) fmt-gen subcommand is only available when built with build tag `fmtgen`.	2024-07-30 15:59:48 -07:00
Aditya Manthramurthy	4c8562bcec	Fix v2 metrics: Send all ttfb api labels (#20191 ) Fix a regression in #19733 where TTFB metrics for all APIs except GetObject were removed in v2 and v3 metrics. This causes breakage for existing v2 metrics users. Instead we continue to send TTFB for all APIs in V2 but only send for GetObject in V3.	2024-07-30 15:28:46 -07:00
Harshavardhana	f13c04629b	allow multipart uploads expiration to be dynamic (#20190 ) It would seem like the new values will take effect only after a restart for changes in multipart_expiration. This PR fixes this by making it dynamic as it should have been.	2024-07-30 12:01:06 -07:00
Harshavardhana	80ff907d08	add DeleteBulk support, add sufficient deadlines per rename() (#20185 ) deadlines per moveToTrash() allows for a more granular timeout approach for syscalls, instead of an aggregate timeout. This PR also enhances multipart state cleanup to be optimal by removing 100's of multipart network rename() calls into single network call.	2024-07-29 18:56:40 -07:00
Poorna	2d40433bc1	remove replication throttle deadline for objects > 128MiB (#20184 ) context deadline was introduced to avoid a slow transfer from blocking replication queue(s) shared by other buckets that may not be under throttling. This PR removes this context deadline for larger objects since they are anyway restricted to a limited set of workers. Otherwise, objects would get dequeued when the throttle limit is exceeded and cannot proceed within the deadline.	2024-07-29 15:14:52 -07:00
Harshavardhana	a17f14f73a	separate lock from common grid to avoid epoll contention (#20180 ) epoll contention on TCP causes latency build-up when we have high volume ingress. This PR is an attempt to relieve this pressure. upstream issue https://github.com/golang/go/issues/65064 It seems to be a deeper problem; haven't yet tried the fix provided in this issue, but this change helps without changing the compiler. Of course, this is a workaround for now, hoping for a more comprehensive fix from Go runtime.	2024-07-29 11:10:04 -07:00
Poorna	6651c655cb	fix replication of checksum when encryption is enabled (#20161 ) - Adding functional tests - Return checksum header on GET/HEAD, previously this was returning InvalidPartNumber error	2024-07-29 01:02:16 -07:00
Harshavardhana	3ae104edae	change Read* calls over net/http to move to http.MethodGet (#20173 ) - ReadVersion - ReadFile - ReadXL Further changes include to - Compact internode resource RPC paths - Compact internode query params To optimize on parsing by gorilla/mux as the length of this string increases latency in gorilla/mux - reduce to a meaningful string.	2024-07-29 01:00:12 -07:00
jiuker	c87a489514	fix: support prefix when batchJob replicate enable the snowball (#20178 )	2024-07-29 00:59:50 -07:00
Poorna	641a56da0d	fix panic in replication queuing (#20169 ) Regression from #20077 ``` Jul 26 19:08:29 minio-dr-0101a minio[275423]: Error: grid handler (NSScanner) panic: runtime error: index out of range [4] with length 1 (errors.errorString) Jul 26 19:08:29 minio-dr-0101a minio[275423]: 33: internal/logger/logger.go:268:logger.LogIf() Jul 26 19:08:29 minio-dr-0101a minio[275423]: 32: internal/grid/connection.go:50:grid.gridLogIf() Jul 26 19:08:29 minio-dr-0101a minio[275423]: 31: internal/grid/muxserver.go:234:grid.(muxServer).handleRequests.func1() Jul 26 19:08:29 minio-dr-0101a minio[275423]: 30: cmd/bucket-replication.go:2165:cmd.(ReplicationPool).queueReplicaTask() Jul 26 19:08:29 minio-dr-0101a minio[275423]: 29: cmd/bucket-replication.go:3440:cmd.queueReplicationHeal() Jul 26 19:08:29 minio-dr-0101a minio[275423]: 28: cmd/data-scanner.go:1396:cmd.(scannerItem).healReplication() Jul 26 19:08:29 minio-dr-0101a minio[275423]: 27: cmd/data-scanner.go:1220:cmd.(scannerItem).applyActions() Jul 26 19:08:29 minio-dr-0101a minio[275423]: 26: cmd/xl-storage.go:627:cmd.(xlStorage).NSScanner.func2() ```	2024-07-26 13:48:21 -07:00
Harshavardhana	a16193bb50	remove fdatasync() discard, we write with O_SYNC (#20168 ) fdatasync() discard for page-cached READs is not needed, it would seem like this can cause latencies in situations when things are loaded.	2024-07-26 10:27:56 -07:00
jiuker	132e7413ba	fix: check once ready for site-replication (#20149 )	2024-07-26 10:27:42 -07:00
Klaus Post	1966668066	Avoid Batch Replication Job log spam (#20158 ) Only print once per job and error location. Set default retry to default 1 second wait, and use as minimum.	2024-07-26 05:55:50 -07:00
Harshavardhana	064f36ca5a	move to GET for internal stream READs instead of POST (#20160 ) the main reason is to let Go net/http perform necessary bookkeeping properly, and essentially, from a consistency point of view, it's GETs all the way. Deprecate sendFile() as it's buggy inside the Go runtime.	2024-07-26 05:55:01 -07:00
Krishnan Parthasarathi	4a1edfd9aa	Different read quorum for tiered objects (#20115 ) For a non-tiered object, MinIO requires that EcM (# of data blocks) of xl.meta agree, corresponding to the number of data blocks needed to read this object. OTOH, tiered objects have metadata in the hot tier and data in the warm tier. The data and its integrity are offloaded to the warm tier. This allows us to reduce the read quorum from EcM (typically > N/2, where N - erasure stripe width) to N/2 + 1. The simple majority of metadata ensures consensus on what the object is and where it is located.	2024-07-25 14:02:50 -07:00
Anis Eleuch	b7f319b62a	properly reload a fresh drive when found in a failed state during startup (#20145 ) When a drive is in a failed state when a single node multiple drives deployment is started, a replacement of a fresh disk will not be properly healed unless the user restarts the node. Fix this by always adding the new fresh disk to globalLocalDrivesMap. Also remove globalLocalDrives for simplification, a map to store local node drives can still be used since the order of local drives of a node is not defined.	2024-07-24 16:30:33 -07:00
Anis Eleuch	33c101544d	kms: Expose API when bucket federation is enabled (#20143 ) When the bucket federation feature is enabled, KMS APIs such as `mc admin kms key list` do not work. This commit fixes the issue by disabling bucket forwarding for KMS requests.	2024-07-24 15:44:29 -07:00
Harshavardhana	3b21bb5be8	use unixNanoTime instead of time.Time in lockRequestorInfo (#20140 ) Bonus: Skip Source, Quorum fields in lockArgs that are never sent during Unlock() phase.	2024-07-24 03:24:01 -07:00
Harshavardhana	6fe2b3f901	avoid sendFile() for ranges or object lengths < 4MiB (#20141 )	2024-07-24 03:22:50 -07:00
Taran Pelkey	b368d4cc13	Fix `updateGroupMembershipsForLDAP` behavior with unicode (#20137 )	2024-07-23 19:10:03 -07:00
Klaus Post	0680af7414	Set O_NONBLOCK for reads and writes on unix (#20133 ) Tracing syscalls, opening and reading an `xl.meta` looks like this: ``` openat(AT_FDCWD, "/mnt/drive1/ss8-old/testbucket/ObjSize4MiBThreads72/(554O51H/peTb(0iztdbTKw59.csv/xl.meta", O_RDONLY|O_NOATIME|O_CLOEXEC) = 34 <0.000> fcntl(34, F_GETFL) = 0x48000 (flags O_RDONLY|O_LARGEFILE|O_NOATIME) <0.000> fcntl(34, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOATIME) = 0 <0.000> epoll_ctl(4, EPOLL_CTL_ADD, 34, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3172471557, u64=8145488475984499461}}) = -1 EPERM (Operation not permitted) <0.000> fcntl(34, F_GETFL) = 0x48800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOATIME) <0.000> fcntl(34, F_SETFL, O_RDONLY|O_LARGEFILE|O_NOATIME) = 0 <0.000> fstat(34, {st_mode=S_IFREG|0644, st_size=354, ...}) = 0 <0.000> read(34, "XL2 \1\0\3\0\306\0\0\1P\2\2\1\304$\225\304\20\0\0\0\0\0\0\0\0\0\0\0"..., 354) = 354 <0.000> close(34) = 0 <0.000> ``` Everything until `fstat` is the `os.Open` call. Looking at the code: https://github.com/golang/go/blob/master/src/os/file_unix.go#L212-L243 It seems for every file it "tries" to see if it is pollable. This causes `syscall.SetNonblock(fd, true)` to be called. This is the first `F_SETFL`. It then calls `f.pfd.Init("file", true)`. This will attempt to set it as pollable using `epoll_ctl`. This will always fail for files. It therefore calls `syscall.SetNonblock(fd, false)` resulting in the second `F_SETFL`. If we set the `O_NONBLOCK` call on the initial open, we should avoid the 4 `fcntl` syscalls per file. I don't see any way to avoid the `epoll_ctl` call, since kind is either `kindOpenFile` or `kindNonBlock`, so "pollable" will always be true. However avoiding 4 of 6 syscalls still seems worth it. This should not have any effect, since files will end up with "nonblock" anyway.	2024-07-23 09:36:24 -07:00
Harshavardhana	91805bcab6	add optimizations to bring performance on unversioned READS (#20128 ) allow non-inlined on disk to be inlined via an unversioned ReadVersion() call, we only need ReadXL() to resolve objects with multiple versions only. The choice of this block makes it to be dynamic and chosen by the user via `mc admin config set` Other bonus things - Start measuring internode TTFB performance. - Set TCP_NODELAY, TCP_CORK for low latency	2024-07-23 03:53:03 -07:00
jiuker	b3a94c4e85	fix: Use xtime duration to parse batch job (#20117 )	2024-07-23 00:05:53 -07:00
Harshavardhana	8e618d45fc	remove unnecessary LRU for internode auth token (#20119 ) removes contentious usage of mutexes in LRU, which were never really reused in any manner; we do not need it. To trust hosts, the correct way is TLS certs; this PR completely removes this dependency, which has never been useful. ``` 0 0% 100% 25.83s 26.76% github.com/hashicorp/golang-lru/v2/expirable.(LRU[...]) 0 0% 100% 28.03s 29.04% github.com/hashicorp/golang-lru/v2/expirable.(LRU[...]) ``` Bonus: use `x-minio-time` as a nanosecond to avoid unnecessary parsing logic of time strings instead of using a more straightforward mechanism.	2024-07-22 00:04:48 -07:00
Harshavardhana	3ef59d2821	do not set KMSSecretKey env from KMSSecretKeyFile (#20122 ) fixes #20121	2024-07-21 14:39:15 -07:00
Anis Eleuch	d9ee668b6d	s3: Fix wrong continuation token during listing with ILM enabled bucket (#20113 )	2024-07-18 13:37:34 -07:00
Anis Eleuch	2e5d792f0c	batch-expiry: Save progress regularly in the drives and at the end (#20098 ) - Also, fix failure reporting at the end. - Also, avoid parsing report objects when listing or resuming jobs; this does not cause any bugs, it only prints errors that are not useful.	2024-07-17 09:42:32 -07:00
Poorna	3535197f99	replication: proxy only on missing object or read quorum err (#20101 )	2024-07-16 16:46:41 -07:00
Mark Theunissen	698bb93a46	Allow a KMS Action to specify keys in the Resources of a policy (#20079 )	2024-07-16 07:03:03 -07:00
Harshavardhana	e8c54c3d6c	add validation test for v3 metrics for all its endpoints (#20094 ) add unit test for v3 metrics for all its exposed endpoints Bonus: - support OpenMetrics encoding - adds boot time for prometheus - continueOnError is better to serve as much metrics as possible.	2024-07-15 09:28:02 -07:00
Shubhendu	f944a42886	Removed user and group details from logs (#20072 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-07-14 11:12:07 -07:00
Harshavardhana	eff0ea43aa	fix: typo in BucketUsageMetrics group registration in v3 metrics (#20090 ) ``` curl http://localhost:9000/minio/metrics/v3/cluster/usage/buckets ``` Did not work as documented, due to the fact that there was a typo in the bucket usage metrics registration group. This endpoint is a cluster endpoint and does not require any `buckets` argument.	2024-07-14 11:11:42 -07:00
Harshavardhana	7fcb428622	do not print unexpected logs (#20083 )	2024-07-12 13:51:54 -07:00
Klaus Post	83adc2eebf	Fix ListObjects aborting after 3 minute on async request (#20074 ) When creating the async listing, if the first request does not return within 3 minutes, it is stopped, since it isn't being kept alive. Keep updating `lastHandout` while we are waiting for the initial request to be fulfilled.	2024-07-12 09:23:16 -07:00
Poorna	989c318a28	replication: make large workers configurable (#20077 ) This PR also improves throttling by reducing tokens requested from rate limiter based on available tokens to avoid exceeding throttle wait deadlines	2024-07-12 07:57:31 -07:00
Taran Pelkey	f5d2fbc84c	Add DecodeDN and QuickNormalizeDN functions to LDAP config (#20076 )	2024-07-11 18:04:53 -07:00
Allan Roger Reid	e139673969	Audit failure in batch job key rotate (#20073 )	2024-07-11 16:13:15 -07:00
Harshavardhana	a8c6465f22	hide some deprecated fields from 'get' output (#20069 ) also update wording on `subnet license="" api_key=""`	2024-07-10 13:16:44 -07:00
Taran Pelkey	6c6f0987dc	Add groups to policy entities (#20052 ) * Add groups to policy entities * update comment --------- Co-authored-by: Harshavardhana <harsha@minio.io>	2024-07-10 11:41:49 -07:00
Austin Chang	5f64658faa	clarify error message for root user credential (#20043 ) Signed-off-by: Austin Chang <austin880625@gmail.com>	2024-07-10 09:57:01 -07:00
Anis Eleuch	ce183cb2b4	heal: List and heal again for any listing error (#19999 ) When a fresh drive healing is finished, add more checks for the drive listing errors. If any, re-list and heal again. Although this is an infrequent use case to have listPathRaw() returning nil when minDisks is set to 1, we still need to handle all possible use cases to avoid missing healing any object. Also, check the HealObject result to decide if an object is healed in the fresh disk, since HealObject returns nil if an object is healed in any disk, and not necessarily in the new fresh drive.	2024-07-10 09:55:36 -07:00
Klaus Post	b3bac73c0f	Clarify post policy error message (#20067 ) It is not really clear that the listed keys are missing. Clarify the error	2024-07-10 07:18:44 -07:00
Anis Eleuch	e726d8ff0f	list: Hide objects/versions with pending/failed replicated deletion (#20047 ) In regular listing, this commit will avoid showing an object when its latest version has a pending or failed deletion in a replicated setup. It will also prevent showing older versions in the same case.	2024-07-09 15:26:42 -07:00
Shubhendu	f4230777b3	Log replication errors once (#20063 ) Also, sort the error map for multiple sites in ascending order of deployment IDs, so that the generated error message always has the same, deterministic order. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-07-09 10:10:31 -07:00
Krishnan Parthasarathi	380233d646	batch: Update job info object on success (#20053 )	2024-07-08 18:45:54 -07:00
Klaus Post	0d0b0aa599	Abstract grid connections (#20038 ) Add `ConnDialer` to abstract connection creation. - `IncomingConn(ctx context.Context, conn net.Conn)` is provided as an entry point for incoming custom connections. - `ConnectWS` is provided to create web socket connections.	2024-07-08 14:44:00 -07:00
Anis Eleuch	b433bf14ba	Add typos check to Makefile (#20051 )	2024-07-08 14:39:49 -07:00
Klaus Post	107d951893	Log ILM failed object name (#20040 ) Log so we know which object we are dealing with. Log each object once.	2024-07-04 07:25:45 -07:00
Shireesh Anjal	22c53b1c70	Remove license update job (#20037 )	2024-07-03 11:49:48 -07:00
Mark Theunissen	88926ad8e9	return appropriate error upon tier update for incorrect credentials (#20034 )	2024-07-03 00:17:20 -07:00
Harshavardhana	32d04091a2	resume any batch jobs in a goroutine (#20035 ) Bonus: move batch job initialization to the last item after all other initialization, allowing for faster startup time for different subsystems.	2024-07-03 00:16:05 -07:00
Harshavardhana	be84a4fd68	do not proxy invalid object names (#20031 )	2024-07-02 14:28:55 -07:00
Anis Eleuch	2ec1f404ac	info: Always refresh the root disk status (#20023 ) Add root drive status in the disk info cache function, so unmounting a drive without restarting a local node reflects the correct value.	2024-07-02 13:41:29 -07:00
Klaus Post	2040559f71	Fix SkipReader performance with small initial read (#20030 ) If `SkipReader` is called with a small initial buffer it may be doing a huge number of Reads to skip the requested number of bytes. If a small buffer is provided grab a 32K buffer and use that. Fixes slow execution of `testAPIGetObjectWithMPHandler`. Bonuses: * Use `-short` with `-race` test. * Do all suite test types with `-short`. * Enable compressed+encrypted in `testAPIGetObjectWithMPHandler`. * Disable big file tests in `testAPIGetObjectWithMPHandler` when using `-short`.	2024-07-02 08:13:05 -07:00
Anis Eleuch	ca0ce4c6ef	tests: Fix setting max openfds as memory limit (#20029 ) The code was inadvertently passing max openfds to debug.SetMemoryLimit(); fixing this accelerates go test on my machine. This is only a testing bug, since the server context always has a valid MaxMem, so the buggy code was never called in users' environments.	2024-07-02 08:09:36 -07:00
Anis Eleuch	757cf413cb	Add batch status API (#19679 ) Currently the status of a completed or failed batch is held in the memory, a simple restart will lose the status and the user will not have any visibility of the job that was long running. In addition to the metrics, add a new API that reads the batch status from the drives. A batch job will be cleaned up three days after completion. Also add the batch type in the batch id, the reason is that the batch job request is removed immediately when the job is finished, then we do not know the type of batch job anymore, hence a difficulty to locate the job report	2024-07-02 01:17:52 -07:00
Anis Eleuch	b35acb3dbc	heal: Add support of healing particular pool/set (#20024 )	2024-07-01 15:02:25 -07:00
Sveinn	e404abf103	Letting password enable auth bypass caPublicKey (only if passauth is … (#20022 )	2024-07-01 15:02:01 -07:00
jiuker	f7ff19cb18	fix: warning for decommissioned pool while start (#20019 )	2024-07-01 07:38:46 -07:00
Poorna	91faaa1387	fix panic in batch replicate (#20014 ) Fixes: ``` panic: send on closed channel panic: close of closed channel goroutine 878 [running]: github.com/minio/minio/internal/ioutil.SafeClose[...](...) /Users/kp/code/src/github.com/minio/minio/internal/ioutil/ioutil.go:407 github.com/minio/minio/cmd.(erasureServerPools).Walk.func2.2() /Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:2229 +0xc0 panic({0x108c25e60?, 0x1090b28d0?}) /usr/local/go/src/runtime/panic.go:770 +0x124 github.com/minio/minio/cmd.(erasureServerPools).Walk.func2.3({{0x1400e397316, 0x5}, {0x1400d88b8a8, 0x8}, {0x1f99d80, 0xede101c42, 0x0}, 0x3bc, 0x0, 0x0, ...}) /Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:2235 +0xb4 github.com/minio/minio/cmd.(erasureServerPools).Walk.func2() /Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:2277 +0xabc created by github.com/minio/minio/cmd.(erasureServerPools).Walk in goroutine 575 /Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:2210 +0x33c ```	2024-06-28 18:20:47 -07:00
Harshavardhana	f365a98029	fix: hot-reloading STS credential policy documents (#20012 ) * fix: hot-reloading STS credential policy documents * Support Role ARNs hot load policies (#28) --------- Co-authored-by: Anis Eleuch <vadmeste@users.noreply.github.com>	2024-06-28 16:17:22 -07:00
Taran Pelkey	7ca4ba77c4	Update tests to use AttachPolicy(LDAP) instead of deprecated SetPolicy (#19972 )	2024-06-28 02:06:25 -07:00
Poorna	13512170b5	list: Do not decrypt SSE-S3 Etags in a non encrypted format (#20008 )	2024-06-27 19:44:56 -07:00
Krishnan Parthasarathi	154fcaeb56	Allow rebalance start when it's stopped/completed (#20009 )	2024-06-27 17:22:30 -07:00
Anis Eleuch	722118386d	iam: Hot load of the policy during request authorization (#20007 ) Hot load a policy document during account authorization evaluation to avoid returning 403 during server startup, when not all policies are loaded yet. Add this support for group policies as well.	2024-06-27 17:03:07 -07:00
Harshavardhana	709612cb37	fix: rebalance upon pool expansion would crash when in progress (#20004 ) you can attempt a rebalance first i.e, start with 2 pools. ``` mc admin rebalance start alias/ ``` and after that you can add a new pool, this would potentially crash. ``` Jun 27 09:22:19 xxx minio[7828]: panic: runtime error: invalid memory address or nil pointer dereference Jun 27 09:22:19 xxx minio[7828]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x22cc225] Jun 27 09:22:19 xxx minio[7828]: goroutine 1 [running]: Jun 27 09:22:19 xxx minio[7828]: github.com/minio/minio/cmd.(*erasureServerPools).findIndex(...) ```	2024-06-27 11:35:34 -07:00
Harshavardhana	b35d083872	fix: change retry-after to 60s for 503s and 10s for 429s (#19996 )	2024-06-26 01:32:06 -07:00
Harshavardhana	5e7b243bde	extend cluster health to return errors for IAM, and Bucket metadata (#19995 ) Bonus: make API freeze to be opt-in instead of default	2024-06-26 00:44:34 -07:00
Taran Pelkey	3c2141513f	add `ListAccessKeysLDAPBulk` API to list accessKeys for multiple/all LDAP users (#19835 )	2024-06-25 14:21:28 -07:00
Aditya Manthramurthy	602f6a9ad0	Add IAM (re)load timing logs (#19984 ) This is useful to debug large IAM load times - the usual cause is when there are a large amount of temporary accounts.	2024-06-25 10:33:10 -07:00
Harshavardhana	22c5a5b91b	add healing retries when there are failed heal attempts (#19986 ) transient errors for long running tasks are normal; allow the drive to retry up to 3 times before giving up on healing the drive.	2024-06-25 10:32:56 -07:00
jiuker	41f508765d	fix: format the scanner object error (#19991 )	2024-06-25 08:54:24 -07:00
Aditya Manthramurthy	7dccd1f589	fix: bootstrap msgs should only be sent at startup (#19985 )	2024-06-24 19:30:28 -07:00
Harshavardhana	be97ae4c5d	fix: gcs tier going offline due to custom HTTP client (#19973 ) specifying a custom HTTP client makes the GCS SDK ignore the passed credentials; instead, let the GCS SDK manage the transport. This PR fixes #19922, a regression from #19565	2024-06-21 22:26:45 -07:00
Anis Eleuch	4d7d008741	bootstrap: Speed up bucket metadata loading (#19969 ) Currently, bucket metadata is loaded serially inside the ListBuckets Object API. Fix that by loading the bucket metadata concurrently, with the parallelism set to the number of erasure sets * 10, which is a good approximation.	2024-06-21 15:22:24 -07:00
Klaus Post	2d7a3d1516	Return error from mergeEntryChannels (#19970 ) - Add error from mergeEntryChannels to `results.` - Make sure we check the context error before we close the channel.	2024-06-21 12:06:51 -07:00
Harshavardhana	dfab400d43	reject bootup, if binaries are different in a cluster (#19968 )	2024-06-21 07:49:49 -07:00
Shireesh Anjal	e200808ab7	fix errors in metrics code on macos (#19965 ) - do not load proc fs metrics in case of macos - null-check TimeStat before accessing	2024-06-20 10:55:03 -07:00
Klaus Post	fae563b85d	Add fixed timed restarts to updates (#19960 )	2024-06-20 07:49:22 -07:00
Anis Eleuch	95e4cbbfde	Do not ping event targets during cluster initialization (#19959 ) S3 operations are frozen during startup, therefore we should avoid pinging event targets during the initialization since it can stall.	2024-06-20 07:46:02 -07:00
Harshavardhana	2825294b7b	allow server startup to come online with READ success (#19957 )	2024-06-19 22:21:31 -07:00
Sveinn	bce93b5cfa	Removing timeout on shutdown (#19956 )	2024-06-19 11:42:47 -07:00
Harshavardhana	7a4b250c8b	avoid waiting for quorum health while debugging (#19955 )	2024-06-19 10:12:20 -07:00
Harshavardhana	69e41f87ef	compute localIPs only once per server startup() (#19951 ) repeatedly calling this function is not necessary; on systems with many interfaces, including virtual ones, it can cause noticeable delays.	2024-06-19 07:34:00 -07:00
Harshavardhana	ee48f9f206	perform healthchecks before initializing everything fully (#19953 ) adds more informative logs that provide details on which erasure set is losing quorum etc.	2024-06-19 07:33:40 -07:00
Sveinn	9ba39d7fad	Removing a channel that was not being used (#19948 )	2024-06-19 01:59:39 -07:00
Harshavardhana	d2fb371f80	do not need response record body (#19949 ) since the connection is active, the response recorder body can grow endlessly, causing a leak, as this bytes buffer is never given back to the GC due to a goroutine.	2024-06-19 01:59:21 -07:00
Klaus Post	2f9018f03b	Do regular checks for healing status while scanning (#19946 )	2024-06-18 09:11:04 -07:00
Harshavardhana	bbb64eaade	skip healing properly in the scanner when a drive is hotplugged (#19939 ) skip healing properly in the scanner when a drive is hotplugged. Due to how the state is passed around, SkipHealing might not always reflect the true state() of the system, causing a situation where we might be healing from the scanner on the same drive that is being healed. Because of this, competing heals get triggered that slow each other down.	2024-06-17 16:39:11 -07:00
Harshavardhana	7bd1d899bc	remove overzealous check during HEAD() (#19940 ) due to a historic bug in CopyObject() where an inlined object loses its metadata, the check causes an incorrect fallback verifying data-dir. CopyObject() bug was fixed in `ffa91f9794` however the occurrence of this problem is historic, so the aforementioned check is stretching too much. Bonus: simplify fileInfoRaw() to read xl.json as well, also recreate buckets properly.	2024-06-17 07:29:18 -07:00
Harshavardhana	c91d1ec2e3	fix: avoid metadata cache without data for all callers (#19935 )	2024-06-14 06:28:35 -07:00
Shubhendu	3bd3470d0b	Corrected names of node replication metrics (#19932 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-06-13 15:26:54 -07:00
Harshavardhana	ba39ed9af7	loadUser() if not able to load() credential return error (#19931 )	2024-06-13 15:26:38 -07:00
jiuker	62e6dc950d	fix: do not update metadata cache upon headObject() (#19929 )	2024-06-13 08:42:02 -07:00
Klaus Post	ad04afe381	Fix SSEC multipart checksum replication (#19915 ) * Multipart SSEC checksums were not transferred. * Remove key mismatch logging. This key is user-controlled with SSEC. * If the source is SSEC and the destination reports ErrSSEEncryptedObject, assume replication is good.	2024-06-12 23:56:12 -07:00
Harshavardhana	d06b63d056	load credential for in-flight requests as singleflight (#19920 ) avoid concurrent callers of LoadUser() even initiating object read() requests if an operation is already in progress. This avoids many callers hitting the drives and causing I/O spikes, and also allows credentials to load faster.	2024-06-12 13:47:56 -07:00
Harshavardhana	e3ac4035b9	decrement requests inqueue correctly after the request is processed (#19918 )	2024-06-12 01:13:12 -07:00
Harshavardhana	d21b6daa49	fix: avoid crash when delete() returns an error in batch expiration (#19909 )	2024-06-11 06:50:53 -07:00
Harshavardhana	55aa431578	fix: on windows avoid ':' as part of the object name (#19907 ) fixes #18865 avoid-colon	2024-06-10 20:13:30 -07:00
Harshavardhana	614981e566	allow purge expired STS while loading credentials (#19905 ) the reason for this is to avoid STS mappings to be purged without a successful load of other policies, and all the credentials only loaded successfully are properly handled. This also avoids unnecessary cache store which was implemented earlier for optimization.	2024-06-10 11:45:50 -07:00
Klaus Post	d2eed44c78	Fix replication checksum transfer (#19906 ) Compression will be disabled by default if SSE-C is specified. So we can still honor SSE-C.	2024-06-10 10:40:33 -07:00
Anis Eleuch	789cbc6fb2	heal: Dangling check to evaluate object parts separately (#19797 )	2024-06-10 08:51:27 -07:00
jiuker	0662c90b5c	fix: copyObject restore with a specific version, update test cases (#19895 )	2024-06-10 08:50:49 -07:00
Klaus Post	a2cab02554	Fix SSE-C checksums (#19896 ) Compression will be disabled by default if SSE-C is specified. So we can still honor SSE-C.	2024-06-10 08:31:51 -07:00
Harshavardhana	6c7a21df6b	turn-off unexpected debug logging in List() calls (#19903 )	2024-06-09 21:34:26 -07:00
Harshavardhana	29a25a538f	fix: make sure we list freeVersions like DEL marker with --versions (#19878 ) freeVersions() was being incorrectly skipped; list it as valid objects properly. Co-authored-by: Krishnan Parthasarathi <Krishnan Parthasarathi>	2024-06-07 15:18:44 -07:00
Harshavardhana	2dd8faaedc	remove unnecessary log in Listing()	2024-06-07 14:52:55 -07:00
Krishnan Parthasarathi	069c4015cd	Don't tier directory objects (#19891 ) Directory objects are used by applications that simulate the folder structure of an on-disk filesystem. These are zero-byte objects with names ending with '/'. They are only used to check whether a 'folder' exists in the namespace.	2024-06-07 08:43:17 -07:00
Shubhendu	2f6e03fb60	Calculate correct object size while replication (#19888 ) It was missing in case of `replicateObject` but was present for `replicateAll` already Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-06-06 12:31:01 -07:00
Klaus Post	0fbb945e13	Disable caching of encrypted objects (#19890 ) Don't write encrypted objects to cache, if configured.	2024-06-06 11:39:18 -07:00
Anis Eleuch	b94dd835c9	decom: Fix CurrentSize output when generating the status (#19883 ) StartSize starts with the raw free space of all disks in the given pool, however during the status, CurrentSize is not showing the current free raw space, as expected at least by `mc admin decom status` since it was written.	2024-06-06 07:30:43 -07:00
Poorna	5aaef9790f	replication: pass checksum headers to replica (#19834 )	2024-06-06 02:36:42 -07:00
Bala FA	7edc352d23	Add ILM metrics in metrics-v3 (#19539 ) Signed-off-by: Bala.FA <bala@minio.io>	2024-06-06 02:36:25 -07:00
Poorna	850a84b08a	simplify site replication multipart proxying (#19885 )	2024-06-05 18:01:15 -07:00
Taran Pelkey	4148754ce0	Check both given and normalized group DN on LDAP policy detach requests (#19876 )	2024-06-05 15:42:40 -07:00
Harshavardhana	2107722829	upgrade go-oidc to fix GO-2024-2631 (#19884 )	2024-06-05 15:00:34 -07:00
jiuker	d326ba52e9	feat: support batchJob for windows (#19877 )	2024-06-05 08:44:53 -07:00
Sveinn	91e1487de4	Add LDAP public key authentication to SFTP (#19833 )	2024-06-05 00:51:13 -07:00
jiuker	90a9f2dd70	fix: log diskerror when detect the disk space failed (#19861 )	2024-06-04 09:42:03 -07:00
Harshavardhana	d5e48cfd65	fix: remove DriveOPTimeout for REST callers as they don't work properly (#19873 ) With Go's net/http it is notoriously difficult to have streaming deadlines per READ/WRITE on the net.Conn; if we add them, they interfere with Go's internal requirements for an HTTP connection. Remove this support for now. fixes #19853	2024-06-04 08:12:57 -07:00
Anis Eleuch	d274566463	race: Fix rare race detected by testing (#19872 ) Below is the race warning: ``` WARNING: DATA RACE Write at 0x00c02d3d27c0 by goroutine 1210: github.com/minio/minio/cmd.(healingTracker).bucketDone() github.com/minio/minio/cmd/background-newdisks-heal-ops.go:273 +0x13a github.com/minio/minio/cmd.(erasureObjects).healErasureSet() github.com/minio/minio/cmd/global-heal.go:525 +0x2158 github.com/minio/minio/cmd.healFreshDisk() github.com/minio/minio/cmd/background-newdisks-heal-ops.go:450 +0x107e github.com/minio/minio/cmd.monitorLocalDisksAndHeal.func1() github.com/minio/minio/cmd/background-newdisks-heal-ops.go:528 +0x150 github.com/minio/minio/cmd.monitorLocalDisksAndHeal.gowrap2() github.com/minio/minio/cmd/background-newdisks-heal-ops.go:538 +0x82 Previous read at 0x00c02d3d27c0 by goroutine 1446: github.com/minio/minio/cmd.(*erasureObjects).healErasureSet.func5() github.com/minio/minio/cmd/global-heal.go:232 +0xfd ```	2024-06-04 08:12:32 -07:00
Shubhendu	21b6204692	Test proxying of DEL marker for bucket replication (#19870 ) Make sure to avoid proxying for DEL markers Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-06-04 04:38:26 -07:00
Taran Pelkey	d98faeb26a	Check if LDAP User has attached policy before creating Service Account (#19843 ) Check if ldap user has policy before creating	2024-06-03 12:58:48 -07:00
Klaus Post	0a63dc199c	Add trace sizes to more trace types (#19864 ) Add trace sizes to * ILM traces * Replication traces * Healing traces * Decommission traces * Rebalance traces * (s)ftp traces * http traces.	2024-06-03 08:45:54 -07:00
Klaus Post	e72429c79c	Add sizes to traces (#19851 ) added to storage and grid traces. Can provide more context for traces that aren't HTTP. Others may apply.	2024-05-31 22:17:37 -07:00
Klaus Post	c5b3f5553f	Add per connection RPC metrics (#19852 ) Provides individual and aggregate stats for each RPC connection. Example: ``` "rpc": { "collectedAt": "2024-05-31T14:33:29.1373103+02:00", "connected": 30, "disconnected": 0, "outgoingStreams": 69, "incomingStreams": 0, "outgoingBytes": 174822796, "incomingBytes": 175821566, "outgoingMessages": 768595, "incomingMessages": 768589, "outQueue": 0, "lastPongTime": "2024-05-31T12:33:28Z", "byDestination": { "http://127.0.0.1:9001": { "collectedAt": "2024-05-31T14:33:29.1373103+02:00", "connected": 5, "disconnected": 0, "outgoingStreams": 2, "incomingStreams": 0, "outgoingBytes": 38432543, "incomingBytes": 66604052, "outgoingMessages": 229496, "incomingMessages": 229575, "outQueue": 0, "lastPongTime": "2024-05-31T12:33:27Z" }, "http://127.0.0.1:9002": { "collectedAt": "2024-05-31T14:33:29.1373103+02:00", "connected": 5, "disconnected": 0, "outgoingStreams": 6, "incomingStreams": 0, "outgoingBytes": 38215680, "incomingBytes": 66121283, "outgoingMessages": 228525, "incomingMessages": 228510, "outQueue": 0, "lastPongTime": "2024-05-31T12:33:27Z" }, ... ```	2024-05-31 22:16:24 -07:00
Klaus Post	d3ae0aaad3	Add max buffering to SFTP (#19848 ) Prevent OOM by adversarial use of SFTP upload by setting a 100MB max upload buffer.	2024-05-31 14:28:07 -07:00
Anis Eleuch	1277ad69a6	heal: Remove .healing.bin when all ES drives are healing (#19846 ) In the very rare case when all drives in an erasure set need to be healed, remove .healing.bin from all drives; otherwise healing will be stuck in a loop. Also, fix a unit test that sometimes fails because the test itself is wrong.	2024-05-31 07:48:50 -07:00
Harshavardhana	8f93e81afb	change service account embedded policy size limit (#19840 ) Bonus: trim-off all the unnecessary spaces to allow for real 2048 characters in policies for STS handlers and re-use the code in all STS handlers.	2024-05-30 11:10:41 -07:00
Harshavardhana	4af31e654b	avoid pre-populating buffers for deployments < 32GiB memory (#19839 )	2024-05-30 04:58:12 -07:00
Harshavardhana	aad50579ba	fix: wire up ILM sub-system properly for help (#19836 )	2024-05-30 01:14:58 -07:00
Harshavardhana	38d059b0ae	fix: single node multi-drive must register local drives properly (#19832 ) since #19688 there was a regression introduced during drive lookups for single node multi-drive setups, drive replacement would not work correctly without this PR.	2024-05-29 13:12:44 -07:00
Klaus Post	bd4eeb4522	Fix flipped EcM, EcN in metadata header (#19831 ) Since this is a tuple encoded field we can just flip the struct members.	2024-05-29 12:14:09 -07:00
jiuker	03e3493288	fix: correct parse the tagging error for PostPolicyBucketHandler (#19825 )	2024-05-29 11:50:46 -07:00
Harshavardhana	64baedf5a4	fix: hide prefixes for Hadoop properly (#19821 )	2024-05-28 15:53:15 -07:00
Anis Eleuch	f79a4ef4d0	policy: More defensive code validating svc:DurationSeconds (#19820 ) This does not fix any current issue, but merging https://github.com/minio/madmin-go/pull/282 can lose the validation of the service account expiration time. Add more defensive code for now. In the future, we should avoid doing validation in another library.	2024-05-28 10:19:04 -07:00
Taran Pelkey	2d53854b19	Restrict access keys for users and groups to not allow '=' or ',' (#19749 ) * initial commit * Add UTF check --------- Co-authored-by: Harshavardhana <harsha@minio.io>	2024-05-28 10:14:16 -07:00
jiuker	c904ef966e	feat: support tags for PostPolicy upload (#19816 )	2024-05-27 21:44:00 -07:00
Harshavardhana	e0fe7cc391	fix: information disclosure bug in preconditions GET (#19810 ) precondition check was being honored before, validating if anonymous access is allowed on the metadata of an object, leading to metadata disclosure of the following headers. ``` Last-Modified Etag x-amz-version-id Expires: Cache-Control: ``` although the information presented is minimal in nature, and of opaque nature. It still simply discloses that an object by a specific name exists or not without even having enough permissions.	2024-05-27 12:17:46 -07:00
Harshavardhana	9d20dec56a	Revert "remove dataErrs from er.deleteIfDangling code" This reverts commit `7d75b1e758`. This fails multipart tests we need this code to handle existing challenges, so wait for the comprehensive fix.	2024-05-26 11:13:29 -07:00
Harshavardhana	597a785253	fix: authenticate LDAP via actual DN instead of normalized DN (#19805 ) fix: authenticate LDAP via actual DN instead of normalized DN Normalized DN is only for internal representation, not for external communication, any communication to LDAP must be based on actual user DN. LDAP servers do not understand normalized DN. fixes #19757	2024-05-25 06:43:06 -07:00
Harshavardhana	7d75b1e758	remove dataErrs from er.deleteIfDangling code avoid this until a comprehensive change is merged such as https://github.com/minio/minio/pull/19797	2024-05-24 18:20:04 -07:00
Aditya Manthramurthy	5f78691fcf	ldap: Add user DN attributes list config param (#19758 ) This change uses the updated ldap library in minio/pkg (bumped up to v3). A new config parameter is added for LDAP configuration to specify extra user attributes to load from the LDAP server and to store them as additional claims for the user. A test is added in sts_handlers.go that shows how to access the LDAP attributes as a claim. This is in preparation for adding SSH pubkey authentication to MinIO's SFTP integration.	2024-05-24 16:05:23 -07:00
Shireesh Anjal	a591e06ae5	Add cluster scanner metrics in metrics-v3 (#19517 ) endpoint: /minio/metrics/v3/cluster/scanner metrics: - bucket_scans_finished (counter) - bucket_scans_started (counter) - directories_scanned (counter) - last_activity_nano_seconds (gauge) - objects_scanned (counter) - versions_scanned (counter)	2024-05-24 12:29:25 -07:00
Harshavardhana	443c93c634	compute time spent in ILM properly (#19806 )	2024-05-24 12:28:51 -07:00
Shireesh Anjal	5659cddc84	Add cluster config metrics in metrics-v3 (#19507 ) endpoint: /minio/metrics/v3/cluster/config metrics: - write_quorum - rrs_parity - standard_parity	2024-05-24 05:50:46 -07:00
Shireesh Anjal	673a521711	Change endpoint of v3 notification metrics (#19804 ) from /cluster/notification to /notification	2024-05-24 04:10:24 -07:00
Shireesh Anjal	7981509cc8	Add cluster and bucket replication metrics in metrics-v3 (#19546 ) endpoint: /minio/metrics/v3/cluster/replication metrics: - average_active_workers - average_queued_bytes - average_queued_count - average_transfer_rate - current_active_workers - current_transfer_rate - last_minute_queued_bytes - last_minute_queued_count - max_active_workers - max_queued_bytes - max_queued_count - max_transfer_rate - recent_backlog_count endpoint: /minio/metrics/v3/api/bucket/replication metrics: - last_hour_failed_bytes - last_hour_failed_count - last_minute_failed_bytes - last_minute_failed_count - latency_ms - proxied_delete_tagging_requests_total - proxied_get_requests_failures - proxied_get_requests_total - proxied_get_tagging_requests_failures - proxied_get_tagging_requests_total - proxied_head_requests_failures - proxied_head_requests_total - proxied_put_tagging_requests_failures - proxied_put_tagging_requests_total - sent_bytes - sent_count - total_failed_bytes - total_failed_count - proxied_delete_tagging_requests_failures	2024-05-23 00:41:18 -07:00
Harshavardhana	d38e020b29	remove errant logs for disconnected remote (#19793 ) Signed-off-by: Harshavardhana <harsha@minio.io>	2024-05-22 18:12:23 -07:00
Poorna	7d29030292	fix list results returned for spark max-keys=2 listing (#19791 ) This PR continues fix #19725 for some unhandled cases	2024-05-22 16:16:34 -07:00
Shubhendu	7c7650b7c3	Add sufficient deadlines and countermeasures to handle hung node scenario (#19688 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io> Signed-off-by: Harshavardhana <harsha@minio.io>	2024-05-22 16:07:14 -07:00
Harshavardhana	ca80eced24	usage of deadline conn at Accept() breaks websocket (#19789 ) fortunately not wired up to use, however if anyone enables deadlines for conn then sporadically MinIO startups fail.	2024-05-22 10:49:27 -07:00
jiuker	9906b3ade9	fix: reject ilm rule when bucket LockEnabled (#19785 )	2024-05-21 23:50:03 -07:00
Anis Eleuch	bf1769d3e0	xl: Avoid marking a drive offline after one part read failure (#19779 ) This commit fixes a rare case where a multipart object that can be read in theory makes the GetObject API return an error. It turned out that six-year-old code was marking a drive offline when the bitrot streaming failed to read a part from a disk with any error. This can affect reading a subsequent part: despite having enough shards, the part cannot be reconstructed because one drive was marked offline earlier. This commit removes the code that marks the drive offline. It also closes the bitrot streaming reader before setting it to nil.	2024-05-21 07:36:21 -07:00
Harshavardhana	63e1ad9f29	fix: the user-agent for Veeam	2024-05-20 11:54:52 -07:00
Harshavardhana	1fd90c93ff	re-use StorageAPI while loading drive formats (#19770 ) Bonus: safe settings for deployment ID to avoid races	2024-05-19 01:06:49 -07:00
Krishnan Parthasarathi	1228d6bf1a	Return NumVersions in quorum when available (#19766 ) Similar to https://github.com/minio/minio/pull/17925	2024-05-17 13:57:37 -07:00
Shireesh Anjal	fc4561c64c	Start callhome immediately after enabling (#19764 ) Currently, on enabling callhome (or restarting the server), the callhome job gets scheduled. This means that one has to wait for 24hrs (the default frequency duration) to see it in action and to figure out if it is working as expected. It will be a better user experience to perform the first callhome execution immediately after enabling it (or on server start if already enabled). Also, generate audit event on callhome execution, setting the error field in case the execution has failed.	2024-05-17 09:53:34 -07:00
Klaus Post	3b7747b42b	Tweak multipart uploads (#19756 ) * Store ModTime in the upload ID; return it when listing instead of the current time. * Use this ModTime to expire and skip reading the file info. * Consistent upload sorting in listing (since it now has the ModTime). * Exclude healing disks to avoid returning an empty list.	2024-05-17 09:40:09 -07:00
Harshavardhana	e432e79324	avoid calling 'admin info' for disk, cpu, net metrics collection (#19762 ) resource metrics collection was incorrectly making fan-out liveness peer calls where it's not needed.	2024-05-17 08:15:13 -07:00
Harshavardhana	08d74819b6	handle racy updates to globalSite config (#19750 ) ``` ================== WARNING: DATA RACE Read at 0x0000082be990 by goroutine 205: github.com/minio/minio/cmd.setCommonHeaders() Previous write at 0x0000082be990 by main goroutine: github.com/minio/minio/cmd.lookupConfigs() ```	2024-05-16 16:13:47 -07:00
Poorna	aa3fde1784	Add ListObjectsV2 unit test (#19753 ) for PR: #19725	2024-05-15 20:40:51 -07:00
Harshavardhana	0b3eb7f218	add more deadlines and pass around context under most situations (#19752 )	2024-05-15 15:19:00 -07:00
Klaus Post	b792b36495	Add Veeam storage class override (#19748 ) Recent Veeam is very picky about storage class names. Add `_MINIO_VEEAM_FORCE_SC` env var. It will override the storage class returned by the storage backend if it is non-standard and we detect a Veeam client by checking the User Agent. Applies to HeadObject/GetObject/ListObject*	2024-05-15 11:04:16 -07:00
Harshavardhana	d3db7d31a3	fix: add deadlines for all synchronous REST callers (#19741 ) add deadlines that can be dynamically changed via the drive max timeout values. Bonus: optimize "file not found" case and hung drives/network - circuit break the check and return right away instead of waiting.	2024-05-15 09:52:29 -07:00
Shireesh Anjal	c05ca63158	Fix crash on /minio/metrics/v3?list (#19745 ) An unchecked map access was causing panic.	2024-05-15 09:06:35 -07:00
Shireesh Anjal	0e59e50b39	Capture ttfb api metrics only for GetObject (#19733 ) as that is the only API where the TTFB metric is beneficial, and capturing this for all APIs exponentially increases the response size in large clusters.	2024-05-14 23:25:13 -07:00
Klaus Post	d4b391de1b	Add PutObject Ring Buffer (#19605 ) Replace the `io.Pipe` from streamingBitrotWriter -> CreateFile with a fixed size ring buffer. This will add an output buffer for encoded shards to be written to disk - potentially via RPC. This will remove blocking when `(*streamingBitrotWriter).Write` is called, and it writes hashes and data. With current settings, the write looks like this:
```
Outbound
┌───────────────────┐             ┌────────────────┐               ┌───────────────┐                      ┌────────────────┐
│                   │    Parr.    │                │  (http body)  │               │                      │                │
│ Bitrot Hash       │    Write    │      Pipe      │     Read      │  HTTP buffer  │   Write (syscall)    │   TCP Buffer   │
│ Erasure Shard     │ ──────────► │  (unbuffered)  │ ────────────► │   (64K Max)   │ ───────────────────► │     (4MB)      │
│                   │             │                │               │   (io.Copy)   │                      │                │
└───────────────────┘             └────────────────┘               └───────────────┘                      └────────────────┘
```
We write a Hash (32 bytes). Since the pipe is unbuffered, it will block until the 32 bytes have been delivered to the TCP buffer, and the next Read hits the Pipe. Then we write the shard data. This will typically be bigger than 64KB, so it will block until two blocks have been read from the pipe. When we insert a ring buffer:
```
Outbound
┌───────────────────┐             ┌────────────────┐               ┌───────────────┐                      ┌────────────────┐
│                   │             │                │  (http body)  │               │                      │                │
│ Bitrot Hash       │    Write    │  Ring Buffer   │     Read      │  HTTP buffer  │   Write (syscall)    │   TCP Buffer   │
│ Erasure Shard     │ ──────────► │     (2MB)      │ ────────────► │   (64K Max)   │ ───────────────────► │     (4MB)      │
│                   │             │                │               │   (io.Copy)   │                      │                │
└───────────────────┘             └────────────────┘               └───────────────┘                      └────────────────┘
```
The hash+shard will fit within the ring buffer, so writes will not block - but will complete after a memcopy. Reads can fill the 64KB buffer if there is data for it. If the network is congested, the ring buffer will become filled, and all syscalls will be on full buffers. Only when the ring buffer is filled will erasure coding start blocking.
Since there is always "space" to write output data, we remove the parallel writing since we are always writing to memory now, and the goroutine synchronization overhead is probably not worth taking. If the output were blocked in the existing code, we would still wait for it to unblock in parallel write, so it would make no difference there - except now the ring buffer smooths out the load. There are some micro-optimizations we could look at later. The biggest is that, in most cases, we could encode directly to the ring buffer - if we are not at a boundary. Also, "force filling" the Read requests (i.e., blocking until a full read can be completed) could be investigated and maybe allow concurrent memory on read and write.	2024-05-14 17:11:04 -07:00
Olli Janatuinen	534e7161df	SFTP: Correctly inform client about unsupported commands (#19735 )	2024-05-14 03:29:30 -07:00
Harshavardhana	9b219cd646	fix: return quorum based error, temporary failures must be ignored (#19732 )	2024-05-14 03:29:17 -07:00
Shireesh Anjal	3bab4822f3	Add logger webhook metrics in metrics-v3 (#19515 ) endpoint: /minio/metrics/v3/cluster/webhook metrics: - failed_messages (counter) - online (gauge) - queue_length (gauge) - total_messages (counter)	2024-05-14 00:27:33 -07:00
coderwander	3c5f2d8916	fix some typo in struct name comments (#19513 ) Signed-off-by: coderwander <770732124@qq.com>	2024-05-14 00:26:50 -07:00
Shireesh Anjal	5808190398	Add more metrics to v3/cluster/erasure-set (#19714 ) Metrics being added: - read_tolerance: No of drive failures that can be tolerated without disrupting read operations - write_tolerance: No of drive failures that can be tolerated without disrupting write operations - read_health: Health of the erasure set in a pool for read operations (1=healthy, 0=unhealthy) - write_health: Health of the erasure set in a pool for write operations (1=healthy, 0=unhealthy)	2024-05-14 00:25:56 -07:00
Shireesh Anjal	b2a82248b1	Move /system/go to /debug/go (#19707 )	2024-05-14 00:25:37 -07:00
Klaus Post	c36eaedb93	Re-add "Fix incorrect merging of slash-suffixed objects" (#19729 ) Adds regression test for #19699. Failures are somewhat luck-based, since they require objects to be placed on different sets. However, this generates a failure prior to #19699 * Revert "Revert "Fix incorrect merging of slash-suffixed objects (#19699)"" This reverts commit `f30417d9a8`. * Don't override when suffix doesn't match. Instead rely on quorum for each.	2024-05-13 09:30:24 -07:00
Poorna	7752b03add	optimize max-keys=2 listing for spark workloads (#19725 ) to return results appropriately for versioned buckets, especially when underlying prefixes have been deleted	2024-05-13 07:57:42 -07:00
Shireesh Anjal	074d70112d	Consolidate drive health related metrics into single metric (#19706 ) Instead of having "online" and "healing" as two metrics, replace with a single metric "health" which can have following values: 0 = offline 1 = healthy 2 = healing	2024-05-12 10:23:50 -07:00
Harshavardhana	e8d14c0d90	verify preconditions during CompleteMultipart (#19713 ) Bonus: hold the write lock properly to apply optimistic concurrency during NewMultipartUpload()	2024-05-10 17:31:22 -07:00
Shireesh Anjal	60d7e8143a	Move /cluster/audit to /audit (#19708 ) As the audit metrics are server level and not overall cluster level.	2024-05-10 07:50:39 -07:00
Klaus Post	9667a170de	Add usage cache cleanup and lower forced top compaction (#19719 ) Lower forced compaction to 250K entries. If there are more than 250K entries on the top level, force compact it and log an error.	2024-05-10 07:49:50 -07:00
Harshavardhana	b598402738	fix: unexpected credentials missing while passing	2024-05-09 18:41:38 -07:00
Harshavardhana	72ff69d9bb	add log-prefix name for specifying custom log-name (#19712 )	2024-05-09 14:29:37 -07:00
Harshavardhana	f30417d9a8	Revert "Fix incorrect merging of slash-suffixed objects (#19699 )" This reverts commit `2f7a10ab31`.	2024-05-09 12:32:05 -07:00
jiuker	47a4ad3cd7	fix: truncate Expiration to second when Add ServiceAccount (#19674 ) Truncate Expiration at the second when Add ServiceAccount	2024-05-09 11:08:04 -07:00
Klaus Post	2f7a10ab31	Fix incorrect merging of slash-suffixed objects (#19699 ) If two objects share everything but one object has a slash suffix, those would be merged in listings, with secondary properties used for a tiebreak. Example: An object with the key `prefix/obj` would be merged with an object named `prefix/obj/`. While this violates the [no object can be a prefix of another](https://min.io/docs/minio/linux/operations/concepts/thresholds.html#conflicting-objects) rule, let's resolve these. If we have an object with 'name' and a directory named 'name/', discard the directory only - but allow objects of 'name' and 'name/' (xldir) to be uniquely returned. Regression from #15772	2024-05-09 11:05:45 -07:00
Harshavardhana	b534dc69ab	deprecate unexpected healing failed counters (#19705 ) simplify this to avoid verbose metrics, and make room for valid metrics to be reported for alerting etc.	2024-05-09 11:04:41 -07:00
Harshavardhana	7b7d2ea7d4	pass around correct endpoint while registering remote storage (#19710 )	2024-05-09 11:03:54 -07:00
Aditya Manthramurthy	e00de1c302	ldap-import: Add additional logs (#19691 ) These logs are being added to provide better debugging of LDAP normalization on IAM import.	2024-05-09 10:52:53 -07:00
Harshavardhana	3549e583a6	results must be a single channel to avoid overwriting `healing.bin` (#19702 )	2024-05-09 10:15:03 -07:00
Andi	f5e3eedf34	chore: use errors.New to replace fmt.Errorf with no parameters (#19568 ) Signed-off-by: ChengenH <hce19970702@gmail.com>	2024-05-09 01:44:07 -07:00
Harshavardhana	9a267f9270	allow caller context during reloads() to cancel (#19687 ) canceled callers might linger around longer and can potentially overwhelm the system. Instead, provide a caller context so that canceled callers don't hold on to them. Bonus: we have no reason to cache errors, we should never cache errors otherwise we can potentially have quorum errors creeping in unexpectedly. We should let the cache, when invalidating, hit the actual resources instead.	2024-05-08 17:51:34 -07:00
Klaus Post	ec49fff583	Accept multipart checksums with part count (#19680 ) Accept multipart uploads where the combined checksum provides the expected part count. It seems this was added by AWS to make the API more consistent, even if the data is entirely superfluous on multiple levels. Improves AWS S3 compatibility.	2024-05-08 09:18:34 -07:00
Andreas Auernhammer	8b660e18f2	kms: add support for MinKMS and remove some unused/broken code (#19368 ) This commit adds support for MinKMS. Now, there are three KMS implementations in `internal/kms`: Builtin, MinIO KES and MinIO KMS. Adding another KMS integration required some cleanup. In particular: - Various KMS APIs that haven't been and are not used have been removed. A lot of the code was broken anyway. - Metrics are now monitored by the `kms.KMS` itself. For basic metrics this is simpler than collecting metrics for external servers. In particular, each KES server returns its own metrics and no cluster-level view. - The builtin KMS now uses the same en/decryption implemented by MinKMS and KES. It still supports decryption of the previous ciphertext format. It's backwards compatible. - Data encryption keys now include a master key version since MinKMS supports multiple versions (~4 billion in total and 10000 concurrent) per key name. Signed-off-by: Andreas Auernhammer <github@aead.dev>	2024-05-07 16:55:37 -07:00
Harshavardhana	981497799a	return appropriate error upon reaching maxClients() (#19669 )	2024-05-07 13:41:56 -07:00
Olli Janatuinen	b413ff9fdb	Support user certificate based authentication on SFTP (#19650 )	2024-05-06 23:41:25 -07:00
Harshavardhana	6a15580817	fix: collect quorum errors for deletePrefix() (#19685 ) do not return error for single drive being offline.	2024-05-06 22:44:46 -07:00
Cesar N	39633a5581	Set Console Redirect URL env variable (#19683 )	2024-05-06 19:47:59 -07:00
Harshavardhana	888d2bb1d8	support ETag value to be '' (#19682 ) This supports '' to comply with AWS S3 behavior for - 'If-Match: ' - 'If-None-Match: '	2024-05-06 17:08:42 -07:00
Klaus Post	847ee5ac45	Make WalkDir return errors (#19677 ) If used, `opts.Marker` will cause many missed entries since results are returned unsorted, and pools are serialized. Switch to fully concurrent listing and merging across pools to return sorted entries.	2024-05-06 13:27:52 -07:00
jiuker	9a9a49aa84	fix: Ignore AWSAccessKeyId check for SignV2 policy condition (#19673 )	2024-05-06 03:52:41 -07:00
Harshavardhana	a03ca80269	support 'mc support perf object' with root login disabled (#19672 ) It is expected that whoever is using credentials with the proper set of permissions must be able to run `mc support perf object` while the root login is disabled.	2024-05-06 02:45:10 -07:00
Harshavardhana	523bd769f1	add support for specific error response for InvalidRange (#19668 ) fixes #19648 AWS S3 returns the actual object size as part of XML response for InvalidRange error, this is used apparently by SDKs to retry the request without the range.	2024-05-05 09:56:21 -07:00
Harshavardhana	8ff70ea5a9	turn-off coloring if we have std{err,out} dumb terminals (#19667 )	2024-05-03 17:17:57 -07:00
Harshavardhana	da3e7747ca	avoid using 10MiB EC buffers in maxAPI calculations (#19665 ) max requests per node is more conservative in its value causing premature serialization of the calls, avoid it for newer deployments.	2024-05-03 13:08:20 -07:00
Klaus Post	4afb59e63f	fix: walk missing entries with opts.Marker set (#19661 ) `opts.Marker` causes many missed entries if used, since results are returned unsorted and pools are serialized. Switch to fully concurrent listing and merging across pools to return sorted entries. Returning errors on listings is impossible with the current API, so document that. Return an error at once if no drives are found instead of just returning an empty listing and no error.	2024-05-03 10:26:51 -07:00

6493 Commits