Commit Graph

4 Commits

Author SHA1 Message Date
Zhao Zhili 4d90a76986 swscale/aarch64: Add argb/abgr to yuv
Test on Apple M1 with kperf:
				: -O3		: -O3 -fno-vectorize
abgr_to_uv_8_c			: 19.4		: 26.1
abgr_to_uv_8_neon		: 29.9		: 51.1
abgr_to_uv_128_c		: 146.4		: 558.9
abgr_to_uv_128_neon		: 85.1		: 83.4
abgr_to_uv_1080_c		: 1162.6	: 4786.4
abgr_to_uv_1080_neon		: 819.6		: 826.6
abgr_to_uv_1920_c		: 2063.6	: 8492.1
abgr_to_uv_1920_neon		: 1435.1	: 1447.1
abgr_to_uv_half_8_c		: 16.4		: 11.4
abgr_to_uv_half_8_neon		: 35.6		: 20.4
abgr_to_uv_half_128_c		: 108.6		: 359.4
abgr_to_uv_half_128_neon	: 75.4		: 42.6
abgr_to_uv_half_1080_c		: 883.4		: 2885.6
abgr_to_uv_half_1080_neon	: 460.6		: 481.1
abgr_to_uv_half_1920_c		: 1553.6	: 5106.9
abgr_to_uv_half_1920_neon	: 817.6		: 820.4
abgr_to_y_8_c			: 6.1		: 26.4
abgr_to_y_8_neon		: 40.6		: 6.4
abgr_to_y_128_c			: 99.9		: 390.1
abgr_to_y_128_neon		: 67.4		: 55.9
abgr_to_y_1080_c		: 735.9		: 3170.4
abgr_to_y_1080_neon		: 534.6		: 536.6
abgr_to_y_1920_c		: 1279.4	: 6016.4
abgr_to_y_1920_neon		: 932.6		: 927.6

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-05 16:32:31 +08:00
Zhao Zhili 52422133ae swscale/aarch64: Add bgra/rgba to yuv
Test on Apple M1 with kperf
				: -O3		: -O3 -fno-vectorize
bgra_to_uv_8_c			: 13.4		: 27.5
bgra_to_uv_8_neon		: 37.4		: 41.7
bgra_to_uv_128_c		: 155.9		: 550.2
bgra_to_uv_128_neon		: 91.7		: 92.7
bgra_to_uv_1080_c		: 1173.2	: 4558.2
bgra_to_uv_1080_neon		: 822.7		: 809.5
bgra_to_uv_1920_c		: 2078.2	: 8115.2
bgra_to_uv_1920_neon		: 1437.7	: 1438.7
bgra_to_uv_half_8_c		: 17.9		: 14.2
bgra_to_uv_half_8_neon		: 37.4		: 10.5
bgra_to_uv_half_128_c		: 103.9		: 326.0
bgra_to_uv_half_128_neon	: 73.9		: 68.7
bgra_to_uv_half_1080_c		: 850.2		: 3732.0
bgra_to_uv_half_1080_neon	: 484.2		: 490.0
bgra_to_uv_half_1920_c		: 1479.2	: 4942.7
bgra_to_uv_half_1920_neon	: 824.2		: 824.7
bgra_to_y_8_c			: 8.2		: 29.5
bgra_to_y_8_neon		: 18.2		: 32.7
bgra_to_y_128_c			: 101.4		: 361.5
bgra_to_y_128_neon		: 74.9		: 73.7
bgra_to_y_1080_c		: 739.4		: 3018.0
bgra_to_y_1080_neon		: 613.4		: 544.2
bgra_to_y_1920_c		: 1298.7	: 5326.0
bgra_to_y_1920_neon		: 918.7		: 934.2

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-05 16:32:31 +08:00
Zhao Zhili b8b71be07a swscale/aarch64: Add bgr24 to yuv
Test on Apple M1 with kperf
				: -O3		: -O3 -fno-vectorize
bgr24_to_uv_8_c			: 28.5		: 52.5
bgr24_to_uv_8_neon		: 54.5		: 59.7
bgr24_to_uv_128_c		: 294.0		: 830.7
bgr24_to_uv_128_neon		: 99.7		: 112.0
bgr24_to_uv_1080_c		: 965.0		: 6624.0
bgr24_to_uv_1080_neon		: 751.5		: 754.7
bgr24_to_uv_1920_c		: 1693.2	: 11554.5
bgr24_to_uv_1920_neon		: 1292.5	: 1307.5
bgr24_to_uv_half_8_c		: 54.2		: 37.0
bgr24_to_uv_half_8_neon		: 27.2		: 22.5
bgr24_to_uv_half_128_c		: 127.2		: 392.5
bgr24_to_uv_half_128_neon	: 63.0		: 52.0
bgr24_to_uv_half_1080_c		: 880.2		: 3329.0
bgr24_to_uv_half_1080_neon	: 401.5		: 390.7
bgr24_to_uv_half_1920_c		: 1585.7	: 6390.7
bgr24_to_uv_half_1920_neon	: 694.7		: 698.7
bgr24_to_y_8_c			: 21.7		: 22.5
bgr24_to_y_8_neon		: 797.2		: 25.5
bgr24_to_y_128_c		: 88.0		: 280.5
bgr24_to_y_128_neon		: 63.7		: 55.0
bgr24_to_y_1080_c		: 616.7		: 2208.7
bgr24_to_y_1080_neon		: 900.0		: 452.0
bgr24_to_y_1920_c		: 1093.2	: 3894.7
bgr24_to_y_1920_neon		: 777.2		: 767.5

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-05 16:32:31 +08:00
Zhao Zhili 9dac8495b0 swscale/aarch64: Add rgb24 to yuv implementation
Test on Apple M1:

rgb24_to_uv_8_c: 0.0
rgb24_to_uv_8_neon: 0.2
rgb24_to_uv_128_c: 1.0
rgb24_to_uv_128_neon: 0.5
rgb24_to_uv_1080_c: 7.0
rgb24_to_uv_1080_neon: 5.7
rgb24_to_uv_1920_c: 12.5
rgb24_to_uv_1920_neon: 9.5
rgb24_to_uv_half_8_c: 0.2
rgb24_to_uv_half_8_neon: 0.2
rgb24_to_uv_half_128_c: 1.0
rgb24_to_uv_half_128_neon: 0.5
rgb24_to_uv_half_1080_c: 6.2
rgb24_to_uv_half_1080_neon: 3.0
rgb24_to_uv_half_1920_c: 11.2
rgb24_to_uv_half_1920_neon: 5.2
rgb24_to_y_8_c: 0.2
rgb24_to_y_8_neon: 0.0
rgb24_to_y_128_c: 0.5
rgb24_to_y_128_neon: 0.5
rgb24_to_y_1080_c: 4.7
rgb24_to_y_1080_neon: 3.2
rgb24_to_y_1920_c: 8.0
rgb24_to_y_1920_neon: 5.7

On Pixel 6:

rgb24_to_uv_8_c: 30.7
rgb24_to_uv_8_neon: 56.9
rgb24_to_uv_128_c: 213.9
rgb24_to_uv_128_neon: 173.2
rgb24_to_uv_1080_c: 1649.9
rgb24_to_uv_1080_neon: 1424.4
rgb24_to_uv_1920_c: 2907.9
rgb24_to_uv_1920_neon: 2480.7
rgb24_to_uv_half_8_c: 36.2
rgb24_to_uv_half_8_neon: 33.4
rgb24_to_uv_half_128_c: 167.9
rgb24_to_uv_half_128_neon: 99.4
rgb24_to_uv_half_1080_c: 1293.9
rgb24_to_uv_half_1080_neon: 778.7
rgb24_to_uv_half_1920_c: 2292.7
rgb24_to_uv_half_1920_neon: 1328.7
rgb24_to_y_8_c: 19.7
rgb24_to_y_8_neon: 27.7
rgb24_to_y_128_c: 129.9
rgb24_to_y_128_neon: 96.7
rgb24_to_y_1080_c: 995.4
rgb24_to_y_1080_neon: 767.7
rgb24_to_y_1920_c: 1747.4
rgb24_to_y_1920_neon: 1337.2

Note both tests use clang as compiler, which has vectorization
enabled by default with -O3.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-06-11 01:12:09 +08:00