This saves about 128 MB of baseline RAM usage per Unicorn and
Sidekiq process (!).
Linguist wasn't detecting languages anymore from CE/EE since
9ae8b57467. However, Linguist::BlobHelper
was still being depended on by BlobLike and others.
This removes the Linguist gem, given it isn't required anymore.
EscapeUtils were pulled in as dependency, but given Banzai depends on
it, it is now added explicitly.
Previously, Linguist was used to detect the best ACE mode. Instead,
we rely on ACE to guess the best mode based on the file extension.
Cleanup code, and refactor tests that still use Rugged. After this, there should
be no Rugged code that access the instance's repositories on non-test
environments. There is still some rugged code for other tasks like the
repository import task, but since it doesn't access any repository storage path
it can stay.
Previously, this wasn't needed: text was normally set to the highlighted
contents anyway. Now, it is: we store different things in text and rich_text.
This caused https://gitlab.com/gitlab-com/production/issues/439.
Conflicts used to take a `Repository` and pass that to
`Gitlab::Highlight.highlight`, which would call `#gitattribute` on the
repository. Now they use a `Gitlab::Git::Repository`, which didn't have that
method defined - but defining it on `Gitlab::Git::Repository` does make it
available on `Repository` through `method_missing`, so we can do that and both
cases will work.
I don't know why this happens exactly, but given an upstream and fork repository
from a customer, both of which required GC, resolving conflicts would corrupt
the fork so badly that it couldn't be cloned.
This isn't a perfect fix for that case, because the MR may still need to be
merged manually, but it does ensure that the repository is at least usable.
My best guess is that when we generate the index for the conflict
resolution (which we previously did in the target project), we obtain a
reference to an OID that doesn't exist in the source, even though we already
fetch the refs from the target into the source.
Explicitly setting the source project as the place to get the merge index from
seems to prevent repository corruption in this way.
We wanted to check that the text could be encoded as JSON, because
conflict resolutions are passed back and forth in that format, so the
file itself must be UTF-8. However, all strings from the repository come
back without an encoding from Rugged, making them ASCII_8BIT.
We force to UTF-8, and reject if it's invalid. This still leaves the
problem of a file that 'looks like' UTF-8 (contains valid UTF-8 byte
sequences), but isn't. However:
1. If the conflicts contain the problem bytes, the user will see that
the file isn't displayed correctly.
2. If the problem bytes are outside of the conflict area, then we will
write back the same bytes when we resolve the conflicts, even though
we though the encoding was UTF-8.
These can't be resolved in the UI because if they aren't in a UTF-8
compatible encoding, they can't be rendered as JSON. Even if they could,
we would be implicitly changing the file encoding anyway, which seems
like a bad idea.
- Add match line header to expected result for `File#sections`.
- Lowercase CSS colours.
- Remove unused `diff_refs` keyword argument.
- Rename `parent` -> `parent_file`, to be more explicit.
- Skip an iteration when highlighting.