If you are in a rootless environment using chroot builds, you are
likely to get failures when mounting /sys file systems onto your
container. The problem is that rootless users cannot mount onto
certain directories. Since we are logging this at Warn level
now, and users cannot do anything to fix the situation, I am
dropping this message to Info.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
It should work fine on Linux and non-Linux boxes. Since there
is no glibc added, we can safely compile and run this code
on non-SELinux boxes.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2644: chroot: fix handling of errno seccomp rules r=rhatdan a=nalind
#### What type of PR is this?
/kind bug
#### What this PR does / why we need it:
When converting seccomp rules from the runtime spec to the structure that we can feed to libseccomp, combine the prescribed errno value with the action when we're mapping the "return an errno" action from one to the other.
#### How to verify it
Currently, chroot isolation hits an error processing this seccomp rule:
```
{
    "names": [
        "socket"
    ],
    "action": "SCMP_ACT_ERRNO",
    "args": [
        {
            "index": 0,
            "value": 16,
            "valueTwo": 0,
            "op": "SCMP_CMP_EQ"
        },
        {
            "index": 2,
            "value": 9,
            "valueTwo": 0,
            "op": "SCMP_CMP_EQ"
        }
    ],
    "comment": "",
    "includes": {},
    "excludes": {
        "caps": [
            "CAP_AUDIT_WRITE"
        ]
    },
    "errnoRet": 22
},
```
on Fedora 33.
#### Which issue(s) this PR fixes:
None
#### Special notes for your reviewer:
Definitely going to need to backport this to older branches.
#### Does this PR introduce a user-facing change?
```
None
```
Co-authored-by: Nalin Dahyabhai <nalin@redhat.com>
Create the target mountpoints for bind mounts, when they don't already
exist, with 0755 permissions, for better consistency with runc.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
When converting seccomp rules from the runtime spec to the structure
that we can feed to libseccomp, combine the prescribed errno value with
the action when we're mapping the "return an errno" action from one to
the other.
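
A minimal sketch of the mapping described above, not the actual chroot code.
The `rule` type and `mapAction` helper are hypothetical; the numeric constants
mirror the macros in libseccomp's `<seccomp.h>` so that the example builds
without cgo. The EPERM fallback for a missing errno is an assumption.

```go
package main

import "fmt"

// Values mirroring libseccomp's <seccomp.h>, redefined here so the sketch
// does not need cgo.
const (
	actKill      uint32 = 0x00000000 // SCMP_ACT_KILL
	actErrnoBase uint32 = 0x00050000 // SCMP_ACT_ERRNO(0)
	actAllow     uint32 = 0x7fff0000 // SCMP_ACT_ALLOW
)

// rule is a hypothetical, trimmed-down stand-in for a runtime-spec syscall rule.
type rule struct {
	Action   string
	ErrnoRet *uint
}

// mapAction converts a spec action name to a libseccomp-style action value.
// For SCMP_ACT_ERRNO the errno is folded into the low 16 bits of the action.
func mapAction(r rule) (uint32, error) {
	switch r.Action {
	case "SCMP_ACT_ALLOW":
		return actAllow, nil
	case "SCMP_ACT_KILL":
		return actKill, nil
	case "SCMP_ACT_ERRNO":
		errno := uint32(1) // assumption: fall back to EPERM when errnoRet is absent
		if r.ErrnoRet != nil {
			errno = uint32(*r.ErrnoRet)
		}
		return actErrnoBase | (errno & 0xffff), nil
	}
	return 0, fmt.Errorf("unsupported action %q", r.Action)
}

func main() {
	ret := uint(22) // EINVAL
	action, err := mapAction(rule{Action: "SCMP_ACT_ERRNO", ErrnoRet: &ret})
	fmt.Printf("%#x %v\n", action, err)
}
```
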
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
If setgroups is blocked in order to set up the user namespace, do not
attempt to use it to clear the additional groups.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
When a seccomp rule includes multiple equality checks for the same
argument for a syscall, they can never ALL be satisfied. Because that's
how they're supposed to be treated, libseccomp returns an error when we
try to add them as part of the same conditional rule. Try to detect
this exact case, and if we detect it, treat each condition as its own
rule.
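
The detection step could look roughly like the sketch below. The `arg` type
and both helpers are hypothetical stand-ins, not the types used by the chroot
package or libseccomp, and the check is one plausible reading of "this exact
case": more than one SCMP_CMP_EQ condition on the same argument index.

```go
package main

import "fmt"

// arg is a hypothetical stand-in for one argument condition of a seccomp rule.
type arg struct {
	Index uint
	Value uint64
	Op    string
}

// hasRepeatedEquality reports whether two or more SCMP_CMP_EQ conditions
// target the same argument index.
func hasRepeatedEquality(args []arg) bool {
	seen := make(map[uint]bool)
	for _, a := range args {
		if a.Op != "SCMP_CMP_EQ" {
			continue
		}
		if seen[a.Index] {
			return true
		}
		seen[a.Index] = true
	}
	return false
}

// splitConditions returns the groups of conditions to add as separate rules:
// one group holding everything in the normal case, or one group per
// condition when the conflicting case is detected.
func splitConditions(args []arg) [][]arg {
	if !hasRepeatedEquality(args) {
		return [][]arg{args}
	}
	groups := make([][]arg, 0, len(args))
	for _, a := range args {
		groups = append(groups, []arg{a})
	}
	return groups
}

func main() {
	conflicting := []arg{
		{Index: 0, Value: 1, Op: "SCMP_CMP_EQ"},
		{Index: 0, Value: 2, Op: "SCMP_CMP_EQ"},
	}
	fmt.Println(len(splitConditions(conflicting)), "rules")
}
```
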
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Closes: #2105
Approved by: rhatdan
We have moved shared code from buildah, podman and others into containers/common.
Specifically for this PR we are moving to use containers/common/pkg/unshare and
containers/common/pkg/cgroups.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #2010
Approved by: QiWang19
Unmounting the rootfs with MNT_DETACH should unmount everything below
it, so we don't need to use the more exhaustive method that our bind
package uses for its bind mounts.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Closes: #1996
Approved by: rhatdan
If a masked object is already a /dev/null device then don't mask over it.
This logic is backwards and is breaking SELinux.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #1776
Approved by: @TomSweeneyRedHat
This commit enables the golint linter in golangci-lint and applies all
necessary fixes.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
Closes: #1740
Approved by: rhatdan
This commit enables the errcheck linter and fixes an uncovered stat to
`os.DevNull`. Besides this, we disable Go modules within the
`tests/tools/Makefile` to allow independent offline builds.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
Closes: #1713
Approved by: vrothberg
We do not want to mount /dev/null over a masked path, if the path is
already /dev/null.
This prevents containers running buildah from requiring additional privileges
to mount on a /dev/null, when the target is already mounted.
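
One way to perform such a check is sketched below. `isDevNull` is a
hypothetical helper, not the function used in the chroot package. It assumes a
Unix-like host, where `Stat_t.Rdev` is available, and compares device numbers
rather than inodes.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// isDevNull reports whether path is a character device with the same
// device number as os.DevNull.
func isDevNull(path string) (bool, error) {
	nullInfo, err := os.Stat(os.DevNull)
	if err != nil {
		return false, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	if info.Mode()&os.ModeCharDevice == 0 {
		return false, nil
	}
	nullStat, ok := nullInfo.Sys().(*syscall.Stat_t)
	if !ok {
		return false, nil
	}
	stat, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return false, nil
	}
	return stat.Rdev == nullStat.Rdev, nil
}

func main() {
	for _, path := range []string{os.DevNull, "/"} {
		masked, err := isDevNull(path)
		fmt.Println(path, masked, err)
	}
}
```

A caller would skip the bind mount of /dev/null whenever the helper returns true.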
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #1705
Approved by: TomSweeneyRedHat
The last remaining function is not being used anymore.
Reported by golangci-lint.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Closes: #1678
Approved by: rhatdan
errors.Wrap(err) and friends will return nil if err is nil, so make
setting the error conditional.
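
The pitfall can be illustrated with the sketch below. To stay within the
standard library, `wrap` is a stand-in that copies the nil-in, nil-out
behavior of `errors.Wrap` from github.com/pkg/errors; `runBuggy` and
`runFixed` are hypothetical names, not functions from this repository.

```go
package main

import (
	"errors"
	"fmt"
)

var errWork = errors.New("work failed")

// wrap mimics errors.Wrap from github.com/pkg/errors: a nil error stays nil.
func wrap(err error, msg string) error {
	if err == nil {
		return nil
	}
	return fmt.Errorf("%s: %w", msg, err)
}

// runBuggy assigns the wrapped cleanup error unconditionally, so a
// successful cleanup overwrites an earlier failure with nil.
func runBuggy(work, cleanup func() error) (err error) {
	defer func() {
		err = wrap(cleanup(), "cleanup")
	}()
	return work()
}

// runFixed only sets the error when cleanup failed and nothing was
// reported before.
func runFixed(work, cleanup func() error) (err error) {
	defer func() {
		if cerr := cleanup(); cerr != nil && err == nil {
			err = wrap(cerr, "cleanup")
		}
	}()
	return work()
}

func main() {
	fail := func() error { return errWork }
	succeed := func() error { return nil }
	fmt.Println("unconditional:", runBuggy(fail, succeed))
	fmt.Println("conditional:", runFixed(fail, succeed))
}
```
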
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Closes: #1624
Approved by: TomSweeneyRedHat
Make the stdin pipe non-blocking, so that it won't hang if the other
end is not reading from it.
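
The effect can be seen in a small Linux-oriented sketch using only the
standard `syscall` package. `fillPipe` is a hypothetical helper, not code from
this change: with the write end set non-blocking, a write to a full pipe
returns EAGAIN instead of hanging.

```go
package main

import (
	"fmt"
	"syscall"
)

// fillPipe writes to a fresh pipe that nobody reads from. It returns the
// number of bytes accepted and whether the kernel reported EAGAIN, which is
// what a non-blocking descriptor yields instead of blocking.
func fillPipe() (int, bool, error) {
	var p [2]int
	if err := syscall.Pipe(p[:]); err != nil {
		return 0, false, err
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])

	if err := syscall.SetNonblock(p[1], true); err != nil {
		return 0, false, err
	}
	chunk := make([]byte, 4096)
	total := 0
	for {
		n, err := syscall.Write(p[1], chunk)
		if n > 0 {
			total += n
		}
		if err == syscall.EAGAIN {
			return total, true, nil
		}
		if err != nil {
			return total, false, err
		}
	}
}

func main() {
	n, wouldBlock, err := fillPipe()
	fmt.Println(n, wouldBlock, err)
}
```
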
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1668
Approved by: rhatdan
Checks whether the $HOME envvar has been set
and, if not, tries to set it as best as possible.
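
A rough sketch of that kind of check follows. `ensureHome` is a hypothetical
helper; looking the directory up through `os/user` and falling back to "/" are
assumptions made for illustration, not necessarily what this change does.

```go
package main

import (
	"fmt"
	"os"
	"os/user"
)

// ensureHome sets $HOME when it is unset or empty and returns the value
// that is in effect afterwards.
func ensureHome() (string, error) {
	if home, ok := os.LookupEnv("HOME"); ok && home != "" {
		return home, nil
	}
	home := "/" // assumption: last-resort fallback when the user lookup fails
	if u, err := user.Current(); err == nil && u.HomeDir != "" {
		home = u.HomeDir
	}
	if err := os.Setenv("HOME", home); err != nil {
		return "", err
	}
	return home, nil
}

func main() {
	home, err := ensureHome()
	fmt.Println(home, err)
}
```
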
Fixes: #1592
Signed-off-by: TomSweeneyRedHat <tsweeney@redhat.com>
Closes: #1594
Approved by: rhatdan
Overlay mounts allow buildah bud and buildah from to
specify a directory on the disk that will be mounted
as an overlay into the container. The overlay can be written to,
but when the RUN or buildah run exits, the modified files will disappear.
The basic idea is to mount a cache from the disk for things like yum/dnf/apt
so that it can be used and modified in the container on a run command, but is
kept fresh for each RUN.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #1560
Approved by: giuseppe
When we're built with support for SELinux, refrain from setting process
and mount labels if SELinux isn't detected as enabled at runtime.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Closes: #1542
Approved by: rhatdan
This will make vendoring in pkg/unshare easier into other
packages like skopeo.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #1532
Approved by: TomSweeneyRedHat
For some reason, the CI does not report any of these; on macOS
I see many more reports (including complaints about the standard
library). This only cleans up the trivial cases.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Closes: #1365
Approved by: rhatdan
When reading the last of the output from a child process, ignore an EIO,
since we already got the HUP indication.
Avoid double-logging errors in our I/O loop when using isolation other
than chroot (spotted by @afbjorklund).
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Closes: #1273
Approved by: rhatdan
Move the setting of capabilities and the seccomp filter to after we've
set the supplemental groups list and set our primary GID.
Set capabilities after we set the seccomp filter, because we won't be
able to set a filter if we're dropping CAP_SYS_ADMIN. Set them as the
very last thing before dropping to the runtime UID. Leave CAP_SETUID in
if we're going to become an unprivileged user, so that we'll be allowed
to switch UIDs -- the capability will be dropped then anyway.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Closes: #1069
Approved by: rhatdan
Correctly handle setting capabilities: the Clear() and Apply() methods
on the Capabilities interface take a bitmask of capability kinds, not
specific capability types.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Closes: #1069
Approved by: rhatdan
When running with chroot isolation, only create a new user namespace
when we have mappings to set.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Closes: #1069
Approved by: rhatdan
When ensuring that the target for a volume mount is present, be sure to
create any leading directories which are also not yet present.
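
In Go this usually comes down to `os.MkdirAll`, as in the sketch below.
`ensureMountTarget` and `demo` are hypothetical helpers. The plain
`filepath.Join` is a simplification that does not guard against symlinks
inside the rootfs, and the 0755 mode matches what a later commit in this log
uses for bind mount targets.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ensureMountTarget creates dest under rootfs, along with any missing
// parent directories, and returns the full path.
func ensureMountTarget(rootfs, dest string) (string, error) {
	// Simplification: no protection against symlinks escaping rootfs.
	target := filepath.Join(rootfs, dest)
	if err := os.MkdirAll(target, 0755); err != nil {
		return "", fmt.Errorf("creating mount target %q: %w", target, err)
	}
	return target, nil
}

// demo exercises the helper against a temporary directory.
func demo() error {
	rootfs, err := os.MkdirTemp("", "rootfs")
	if err != nil {
		return err
	}
	defer os.RemoveAll(rootfs)

	target, err := ensureMountTarget(rootfs, "/var/cache/dnf")
	if err != nil {
		return err
	}
	info, err := os.Stat(target)
	if err != nil {
		return err
	}
	if !info.IsDir() {
		return fmt.Errorf("%s is not a directory", target)
	}
	return nil
}

func main() {
	fmt.Println(demo())
}
```
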
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Closes: #997
Approved by: rhatdan
When we're polling to handle stdio for a container, when we detect a HUP
on our stdin, read all that we can from stdin before closing it, instead
of reading only, at most, a single chunk of bytes.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Closes: #980
Approved by: rhatdan
Make the chroot() call before applying a seccomp filter, which might not
allow us to do it. Add more debugging messages.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Closes: #979
Approved by: rhatdan
In chroot isolation, when we attempt to mask a directory, use a
read-only bind mount of an empty directory instead of a read-only mount
of a fresh tmpfs with size=0, which is more likely to be denied by
mandatory access controls.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Closes: #923
Approved by: rhatdan
The OOM score adjustment is an optional field in the runtime spec, so
only try to set it if it's set in the spec.
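
A sketch of the nil check, assuming the optional field is modeled as a
pointer. The `process` type and `applyOOMScoreAdj` are hypothetical stand-ins;
the path is passed in so the example can be tried without touching the real
/proc/self/oom_score_adj file.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// process is a hypothetical stand-in for the runtime spec's process
// section; a nil OOMScoreAdj means the field was not set.
type process struct {
	OOMScoreAdj *int
}

// applyOOMScoreAdj writes the adjustment to path only when the spec sets
// one. It reports whether a write was attempted.
func applyOOMScoreAdj(p process, path string) (bool, error) {
	if p.OOMScoreAdj == nil {
		return false, nil
	}
	data := []byte(strconv.Itoa(*p.OOMScoreAdj))
	if err := os.WriteFile(path, data, 0644); err != nil {
		return true, fmt.Errorf("setting OOM score adjustment: %w", err)
	}
	return true, nil
}

func main() {
	// With the field unset, nothing is written and no error is returned.
	attempted, err := applyOOMScoreAdj(process{}, "/proc/self/oom_score_adj")
	fmt.Println(attempted, err)
}
```
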
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #906
Approved by: rhatdan
When using chroot isolation, if we're configured to raise any process
limits above their current values, do so in the grandparent process,
before it transfers execution to a child that it starts in a user
namespace, which won't have the privileges to do so.
The child can still lower resource limits and set limits to the values
that it inherited, so let it continue to do so.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Closes: #891
Approved by: rhatdan
Move the resource limits name map out of the setRlimits() function, and
use it to set up a reverse of the same map in init().
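
The pattern is roughly the one sketched below. The variable names and the
particular set of limits are illustrative assumptions, not a copy of the
package's actual map.

```go
package main

import (
	"fmt"
	"syscall"
)

// rlimitsMap maps limit names, as they appear in a runtime spec, to
// resource numbers.
var rlimitsMap = map[string]int{
	"RLIMIT_AS":     syscall.RLIMIT_AS,
	"RLIMIT_CORE":   syscall.RLIMIT_CORE,
	"RLIMIT_CPU":    syscall.RLIMIT_CPU,
	"RLIMIT_DATA":   syscall.RLIMIT_DATA,
	"RLIMIT_FSIZE":  syscall.RLIMIT_FSIZE,
	"RLIMIT_NOFILE": syscall.RLIMIT_NOFILE,
	"RLIMIT_STACK":  syscall.RLIMIT_STACK,
}

// rlimitsReverseMap maps resource numbers back to names; it is filled in
// by init() so the two maps cannot drift apart.
var rlimitsReverseMap = map[int]string{}

func init() {
	for name, number := range rlimitsMap {
		rlimitsReverseMap[number] = name
	}
}

func main() {
	fmt.Println(rlimitsReverseMap[syscall.RLIMIT_NOFILE])
}
```
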
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Closes: #891
Approved by: rhatdan
When we're run by an unprivileged user, default to BUILDAH_ISOLATION=chroot.
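
The selection logic might look roughly like this sketch. `defaultIsolation` is
a hypothetical helper, and treating "oci" as the privileged default is an
assumption that this message does not state.

```go
package main

import (
	"fmt"
	"os"
)

// defaultIsolation picks an isolation name. configured is the value of
// BUILDAH_ISOLATION (empty when unset) and euid is the effective UID.
func defaultIsolation(configured string, euid int) string {
	if configured != "" {
		return configured // an explicit setting always wins
	}
	if euid != 0 {
		return "chroot" // unprivileged users default to chroot isolation
	}
	return "oci" // assumption: the default when running as root
}

func main() {
	fmt.Println(defaultIsolation(os.Getenv("BUILDAH_ISOLATION"), os.Geteuid()))
}
```
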
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Closes: #836
Approved by: rhatdan
Add an IsolationChroot that trades flexibility and isolation for being
able to do what it does in a host environment that's already isolated to
the point where we're not allowed to set up some of that isolation,
producing a result that leans more toward chroot(1) than runc(1) does.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Closes: #836
Approved by: rhatdan