FFmpeg/libavfilter/x86
Jun Zhao 91ae6d10ab lavfi/nlmeans: add aarch64 neon for compute_weights_line
Implement NEON optimization for compute_weights_line.

Also update the function signature to use ptrdiff_t for stack arguments
(max_meaningful_diff, startx, endx). This is done to unify the stack
layout between Apple platforms (which pack 32-bit stack arguments tightly)
and the generic AAPCS64 ABI (which requires 8-byte stack slots for 32-bit
arguments). Using ptrdiff_t ensures 8-byte slots are used on all AArch64
platforms, avoiding ABI mismatches with the assembly implementation.
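
The layout difference described above can be sketched in C. This is only an illustrative model, not code from the patch: `stack_rule`, `stack_arg_offset` and `layouts_agree` are hypothetical names, and the model assumes that all stack arguments have the same size and that the earlier arguments already fill the eight integer argument registers (x0-x7).

```c
#include <assert.h>
#include <stddef.h>

/* ptrdiff_t is pointer-sized on the usual targets, so it is 8 bytes wide
 * on LP64 AArch64. */
static_assert(sizeof(ptrdiff_t) == sizeof(void *), "unexpected ptrdiff_t size");

/* Hypothetical model of the two stack-argument rules (illustration only). */
enum stack_rule {
    STACK_AAPCS64, /* generic ABI: small arguments still take an 8-byte slot */
    STACK_APPLE    /* Apple: arguments are packed at their natural size      */
};

/* Byte offset of the idx-th stack argument, assuming every stack argument
 * has the same size arg_size (a power of two, at most 8 bytes). */
static size_t stack_arg_offset(enum stack_rule rule, size_t arg_size, size_t idx)
{
    size_t slot = (rule == STACK_AAPCS64 && arg_size < 8) ? 8 : arg_size;
    return idx * slot;
}

/* Returns 1 when both rules place n arguments of arg_size at the same offsets. */
static int layouts_agree(size_t arg_size, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (stack_arg_offset(STACK_AAPCS64, arg_size, i) !=
            stack_arg_offset(STACK_APPLE, arg_size, i))
            return 0;
    }
    return 1;
}
```

In this model, three 4-byte int arguments land at offsets 0, 4, 8 under the Apple rule but at 0, 8, 16 under the generic rule, so a single assembly routine cannot load them from fixed offsets. With 8-byte arguments such as ptrdiff_t, both rules give 0, 8, 16.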

The x86 AVX2 prototype is updated to match the new signature.

Performance benchmark (AArch64) on macOS (Apple M4):
./tests/checkasm/checkasm --test=vf_nlmeans --bench
compute_weights_line_c:     151.1 ( 1.00x)
compute_weights_line_neon:  62.6 ( 2.42x)

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-01-09 16:10:10 +00:00
Makefile
af_afir.asm
af_afir_init.c
af_anlmdn.asm
af_anlmdn_init.c
af_volume.asm
af_volume_init.c
avf_showcqt.asm
avf_showcqt_init.c
colorspacedsp.asm
colorspacedsp_init.c
f_ebur128.asm
f_ebur128_init.c
scene_sad.asm
scene_sad_init.c
vf_atadenoise.asm
vf_atadenoise_init.c
vf_blackdetect.asm
vf_blackdetect_init.c
vf_blend.asm
vf_blend_init.c
vf_bwdif.asm
vf_bwdif_init.c
vf_colordetect.asm
vf_colordetect_init.c
vf_convolution.asm
vf_convolution_init.c
vf_eq.asm
vf_eq_init.c
vf_framerate.asm
vf_framerate_init.c
vf_fspp.asm
vf_fspp_init.c
vf_gblur.asm
vf_gblur_init.c
vf_gradfun.asm
vf_gradfun_init.c
vf_hflip.asm
vf_hflip_init.c
vf_hqdn3d.asm
vf_hqdn3d_init.c
vf_idetdsp.asm
vf_idetdsp_init.c
vf_interlace.asm
vf_limiter.asm
vf_limiter_init.c
vf_lut3d.asm
vf_lut3d_init.c
vf_maskedclamp.asm
vf_maskedclamp_init.c
vf_maskedmerge.asm
vf_maskedmerge_init.c
vf_nlmeans.asm
vf_nlmeans_init.c
vf_noise.c
vf_overlay.asm
vf_overlay_init.c
vf_pp7.asm
vf_pp7_init.c
vf_psnr.asm
vf_psnr_init.c
vf_pullup.asm
vf_pullup_init.c
vf_removegrain.asm
vf_removegrain_init.c
vf_spp.c
vf_ssim.asm
vf_ssim_init.c
vf_stereo3d.asm
vf_stereo3d_init.c
vf_threshold.asm
vf_threshold_init.c
vf_tinterlace_init.c
vf_transpose.asm
vf_transpose_init.c
vf_v360.asm
vf_v360_init.c
vf_w3fdif.asm
vf_w3fdif_init.c
vf_yadif.asm
vf_yadif_init.c
yadif-10.asm
yadif-16.asm