Zhao Zhili | 4d90a76986
swscale/aarch64: Add argb/abgr to yuv
Tested on Apple M1 with kperf (lower is better):
benchmark : -O3 : -O3 -fno-vectorize
abgr_to_uv_8_c : 19.4 : 26.1
abgr_to_uv_8_neon : 29.9 : 51.1
abgr_to_uv_128_c : 146.4 : 558.9
abgr_to_uv_128_neon : 85.1 : 83.4
abgr_to_uv_1080_c : 1162.6 : 4786.4
abgr_to_uv_1080_neon : 819.6 : 826.6
abgr_to_uv_1920_c : 2063.6 : 8492.1
abgr_to_uv_1920_neon : 1435.1 : 1447.1
abgr_to_uv_half_8_c : 16.4 : 11.4
abgr_to_uv_half_8_neon : 35.6 : 20.4
abgr_to_uv_half_128_c : 108.6 : 359.4
abgr_to_uv_half_128_neon : 75.4 : 42.6
abgr_to_uv_half_1080_c : 883.4 : 2885.6
abgr_to_uv_half_1080_neon : 460.6 : 481.1
abgr_to_uv_half_1920_c : 1553.6 : 5106.9
abgr_to_uv_half_1920_neon : 817.6 : 820.4
abgr_to_y_8_c : 6.1 : 26.4
abgr_to_y_8_neon : 40.6 : 6.4
abgr_to_y_128_c : 99.9 : 390.1
abgr_to_y_128_neon : 67.4 : 55.9
abgr_to_y_1080_c : 735.9 : 3170.4
abgr_to_y_1080_neon : 534.6 : 536.6
abgr_to_y_1920_c : 1279.4 : 6016.4
abgr_to_y_1920_neon : 932.6 : 927.6
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-05 16:32:31 +08:00
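The benchmark names split into `_to_y`, `_to_uv` and `_to_uv_half` kernels. As a rough illustration of what a `_to_y` kernel computes, here is a minimal scalar C sketch, assuming the common 8-bit BT.601 limited-range integer approximation. The `PackedLayout` type, the `LAYOUT_*` constants and `packed32_to_y` are hypothetical names, not swscale's actual code or coefficients. The point it shows is that ARGB and ABGR differ only in the byte offsets of R, G and B within each 4-byte pixel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a packed 4-byte pixel: byte offsets of
 * R, G and B within the pixel (the alpha byte is ignored). */
typedef struct {
    int r, g, b;
} PackedLayout;

/* FFmpeg's ARGB/ABGR names give the byte order in memory. */
static const PackedLayout LAYOUT_ARGB = {1, 2, 3}; /* A R G B */
static const PackedLayout LAYOUT_ABGR = {3, 2, 1}; /* A B G R */

/* Scalar reference: one row of packed 32-bit RGB to 8-bit luma, using the
 * BT.601 limited-range integer approximation (an assumption here):
 *   Y = ((66*R + 129*G + 25*B + 128) >> 8) + 16 */
static void packed32_to_y(uint8_t *dst, const uint8_t *src, size_t width,
                          PackedLayout layout)
{
    assert(layout.r >= 0 && layout.r < 4);
    assert(layout.g >= 0 && layout.g < 4);
    assert(layout.b >= 0 && layout.b < 4);

    for (size_t i = 0; i < width; i++) {
        const uint8_t *px = src + 4 * i; /* 4-byte stride */
        unsigned r = px[layout.r];
        unsigned g = px[layout.g];
        unsigned b = px[layout.b];
        dst[i] = (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
    }
}
```

For example, the same bytes `{255, 0, 0, 255}` are pure red when read as ABGR but pure blue when read as ARGB, so this sketch yields luma 82 and 41 respectively.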
Zhao Zhili | 52422133ae
swscale/aarch64: Add bgra/rgba to yuv
Tested on Apple M1 with kperf (lower is better):
benchmark : -O3 : -O3 -fno-vectorize
bgra_to_uv_8_c : 13.4 : 27.5
bgra_to_uv_8_neon : 37.4 : 41.7
bgra_to_uv_128_c : 155.9 : 550.2
bgra_to_uv_128_neon : 91.7 : 92.7
bgra_to_uv_1080_c : 1173.2 : 4558.2
bgra_to_uv_1080_neon : 822.7 : 809.5
bgra_to_uv_1920_c : 2078.2 : 8115.2
bgra_to_uv_1920_neon : 1437.7 : 1438.7
bgra_to_uv_half_8_c : 17.9 : 14.2
bgra_to_uv_half_8_neon : 37.4 : 10.5
bgra_to_uv_half_128_c : 103.9 : 326.0
bgra_to_uv_half_128_neon : 73.9 : 68.7
bgra_to_uv_half_1080_c : 850.2 : 3732.0
bgra_to_uv_half_1080_neon : 484.2 : 490.0
bgra_to_uv_half_1920_c : 1479.2 : 4942.7
bgra_to_uv_half_1920_neon : 824.2 : 824.7
bgra_to_y_8_c : 8.2 : 29.5
bgra_to_y_8_neon : 18.2 : 32.7
bgra_to_y_128_c : 101.4 : 361.5
bgra_to_y_128_neon : 74.9 : 73.7
bgra_to_y_1080_c : 739.4 : 3018.0
bgra_to_y_1080_neon : 613.4 : 544.2
bgra_to_y_1920_c : 1298.7 : 5326.0
bgra_to_y_1920_neon : 918.7 : 934.2
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-05 16:32:31 +08:00
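The `_to_uv_half` variants appear to target horizontally subsampled chroma: each output U/V sample covers a pair of adjacent input pixels, so they write half as many samples as `_to_uv`. A minimal scalar sketch of that idea for BGRA and RGBA follows, assuming the pair is combined by summing the two pixels and folding the divide-by-two into a wider shift, and assuming BT.601 limited-range integer coefficients. `PackedLayout` and `packed32_to_uv_half` are hypothetical names, not swscale's API, and the real code may round differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte offsets of R, G and B inside a 4-byte pixel. */
typedef struct {
    int r, g, b;
} PackedLayout;

static const PackedLayout LAYOUT_BGRA = {2, 1, 0}; /* B G R A */
static const PackedLayout LAYOUT_RGBA = {0, 1, 2}; /* R G B A */

/* One chroma sample per pair of input pixels. r, g and b hold the sum of
 * the two pixels, so the final shift is 9 instead of 8 (an extra /2).
 * 256 is the rounding term (half of 512); 65536 is the +128 chroma offset
 * pre-scaled by 2^9, which keeps the shifted value non-negative. */
static void packed32_to_uv_half(uint8_t *dst_u, uint8_t *dst_v,
                                const uint8_t *src, size_t chroma_width,
                                PackedLayout layout)
{
    assert(layout.r >= 0 && layout.r < 4);
    assert(layout.g >= 0 && layout.g < 4);
    assert(layout.b >= 0 && layout.b < 4);

    for (size_t i = 0; i < chroma_width; i++) {
        const uint8_t *p0 = src + 8 * i; /* left pixel of the pair  */
        const uint8_t *p1 = p0 + 4;      /* right pixel of the pair */
        int r = p0[layout.r] + p1[layout.r];
        int g = p0[layout.g] + p1[layout.g];
        int b = p0[layout.b] + p1[layout.b];
        dst_u[i] = (uint8_t)((-38 * r - 74 * g + 112 * b + 256 + 65536) >> 9);
        dst_v[i] = (uint8_t)((112 * r - 94 * g - 18 * b + 256 + 65536) >> 9);
    }
}
```

Summing first and shifting once keeps a single rounding step per output sample, which is one reason a half-width kernel can be cheaper than computing full-resolution chroma and then averaging.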
Zhao Zhili | b8b71be07a
swscale/aarch64: Add bgr24 to yuv
Tested on Apple M1 with kperf (lower is better):
benchmark : -O3 : -O3 -fno-vectorize
bgr24_to_uv_8_c : 28.5 : 52.5
bgr24_to_uv_8_neon : 54.5 : 59.7
bgr24_to_uv_128_c : 294.0 : 830.7
bgr24_to_uv_128_neon : 99.7 : 112.0
bgr24_to_uv_1080_c : 965.0 : 6624.0
bgr24_to_uv_1080_neon : 751.5 : 754.7
bgr24_to_uv_1920_c : 1693.2 : 11554.5
bgr24_to_uv_1920_neon : 1292.5 : 1307.5
bgr24_to_uv_half_8_c : 54.2 : 37.0
bgr24_to_uv_half_8_neon : 27.2 : 22.5
bgr24_to_uv_half_128_c : 127.2 : 392.5
bgr24_to_uv_half_128_neon : 63.0 : 52.0
bgr24_to_uv_half_1080_c : 880.2 : 3329.0
bgr24_to_uv_half_1080_neon : 401.5 : 390.7
bgr24_to_uv_half_1920_c : 1585.7 : 6390.7
bgr24_to_uv_half_1920_neon : 694.7 : 698.7
bgr24_to_y_8_c : 21.7 : 22.5
bgr24_to_y_8_neon : 797.2 : 25.5
bgr24_to_y_128_c : 88.0 : 280.5
bgr24_to_y_128_neon : 63.7 : 55.0
bgr24_to_y_1080_c : 616.7 : 2208.7
bgr24_to_y_1080_neon : 900.0 : 452.0
bgr24_to_y_1920_c : 1093.2 : 3894.7
bgr24_to_y_1920_neon : 777.2 : 767.5
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-05 16:32:31 +08:00
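The 24-bit formats differ from the 32-bit ones mainly in the pixel stride: each pixel occupies three bytes instead of four, with no alpha byte. A minimal scalar sketch of a full-resolution `_to_uv` kernel for BGR24 and RGB24, again assuming BT.601 limited-range integer coefficients; `Packed24Layout` and `packed24_to_uv` are hypothetical names, not swscale's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte offsets of R, G and B inside a 3-byte pixel. */
typedef struct {
    int r, g, b;
} Packed24Layout;

static const Packed24Layout LAYOUT_BGR24 = {2, 1, 0}; /* B G R */
static const Packed24Layout LAYOUT_RGB24 = {0, 1, 2}; /* R G B */

/* One U and one V sample per input pixel (no horizontal subsampling).
 * 32896 = 128 (rounding) + (128 << 8) (chroma offset), so the value being
 * shifted is never negative. */
static void packed24_to_uv(uint8_t *dst_u, uint8_t *dst_v,
                           const uint8_t *src, size_t width,
                           Packed24Layout layout)
{
    assert(layout.r >= 0 && layout.r < 3);
    assert(layout.g >= 0 && layout.g < 3);
    assert(layout.b >= 0 && layout.b < 3);

    for (size_t i = 0; i < width; i++) {
        const uint8_t *px = src + 3 * i; /* 3-byte stride */
        int r = px[layout.r];
        int g = px[layout.g];
        int b = px[layout.b];
        dst_u[i] = (uint8_t)((-38 * r - 74 * g + 112 * b + 32896) >> 8);
        dst_v[i] = (uint8_t)((112 * r - 94 * g - 18 * b + 32896) >> 8);
    }
}
```

With these coefficients, pure red maps to (U, V) = (90, 240) and pure blue to (240, 110); swapping `LAYOUT_BGR24` for `LAYOUT_RGB24` swaps which input bytes are read as red and blue.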
Zhao Zhili | 9dac8495b0
swscale/aarch64: Add rgb24 to yuv implementation
Tested on Apple M1 (lower is better):
rgb24_to_uv_8_c: 0.0
rgb24_to_uv_8_neon: 0.2
rgb24_to_uv_128_c: 1.0
rgb24_to_uv_128_neon: 0.5
rgb24_to_uv_1080_c: 7.0
rgb24_to_uv_1080_neon: 5.7
rgb24_to_uv_1920_c: 12.5
rgb24_to_uv_1920_neon: 9.5
rgb24_to_uv_half_8_c: 0.2
rgb24_to_uv_half_8_neon: 0.2
rgb24_to_uv_half_128_c: 1.0
rgb24_to_uv_half_128_neon: 0.5
rgb24_to_uv_half_1080_c: 6.2
rgb24_to_uv_half_1080_neon: 3.0
rgb24_to_uv_half_1920_c: 11.2
rgb24_to_uv_half_1920_neon: 5.2
rgb24_to_y_8_c: 0.2
rgb24_to_y_8_neon: 0.0
rgb24_to_y_128_c: 0.5
rgb24_to_y_128_neon: 0.5
rgb24_to_y_1080_c: 4.7
rgb24_to_y_1080_neon: 3.2
rgb24_to_y_1920_c: 8.0
rgb24_to_y_1920_neon: 5.7
On Pixel 6:
rgb24_to_uv_8_c: 30.7
rgb24_to_uv_8_neon: 56.9
rgb24_to_uv_128_c: 213.9
rgb24_to_uv_128_neon: 173.2
rgb24_to_uv_1080_c: 1649.9
rgb24_to_uv_1080_neon: 1424.4
rgb24_to_uv_1920_c: 2907.9
rgb24_to_uv_1920_neon: 2480.7
rgb24_to_uv_half_8_c: 36.2
rgb24_to_uv_half_8_neon: 33.4
rgb24_to_uv_half_128_c: 167.9
rgb24_to_uv_half_128_neon: 99.4
rgb24_to_uv_half_1080_c: 1293.9
rgb24_to_uv_half_1080_neon: 778.7
rgb24_to_uv_half_1920_c: 2292.7
rgb24_to_uv_half_1920_neon: 1328.7
rgb24_to_y_8_c: 19.7
rgb24_to_y_8_neon: 27.7
rgb24_to_y_128_c: 129.9
rgb24_to_y_128_neon: 96.7
rgb24_to_y_1080_c: 995.4
rgb24_to_y_1080_neon: 767.7
rgb24_to_y_1920_c: 1747.4
rgb24_to_y_1920_neon: 1337.2
Note that both tests were built with clang as the compiler, which enables
vectorization by default at -O3.
Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-06-11 01:12:09 +08:00
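Ratios make the timing lists easier to compare: dividing the C time by the NEON time gives a speedup factor, where values below 1 mean the NEON version was slower. A small C helper applied to a few of the Pixel 6 rgb24 rows from the last commit message; the `BenchPair` type and `speedup` function are illustrative names, and the numbers are copied verbatim from that list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pairing of one benchmark's C and NEON timings. */
typedef struct {
    const char *name;
    double c_time;
    double neon_time;
} BenchPair;

/* Pixel 6 rgb24 timings from the commit message (lower is better). */
static const BenchPair PIXEL6_RGB24[] = {
    {"rgb24_to_uv_8",           30.7,   56.9},
    {"rgb24_to_uv_1920",      2907.9, 2480.7},
    {"rgb24_to_uv_half_1920", 2292.7, 1328.7},
    {"rgb24_to_y_1920",       1747.4, 1337.2},
};

/* Speedup of NEON over C: > 1 means NEON is faster. */
static double speedup(BenchPair p)
{
    assert(p.neon_time > 0.0);
    return p.c_time / p.neon_time;
}
```

On these rows the NEON version runs at about 0.54x the speed of C for `rgb24_to_uv_8` (that is, slower), and at about 1.17x, 1.73x and 1.31x for the `_1920` rows of `to_uv`, `to_uv_half` and `to_y` respectively.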