Change #265654

Category	ffmpeg
Changed by	Jeongkeun Kim <variety0724ohnoyoudont@gmail.com>
Changed at	Mon 27 Apr 2026 22:13:23
Repository	https://git.ffmpeg.org/ffmpeg.git
Project	ffmpeg
Branch	master
Revision	4ea59d5665b7961eab736d036d95ac8f1dea39ba

Comments

avcodec/aarch64: add NEON DCA LFE FIR filter functions
Port lfe_fir0_float and lfe_fir1_float to AArch64 NEON. These polyphase
FIR interpolation filters have an x86 SSE/AVX path but no AArch64
equivalent, so AArch64 builds fall back to scalar C.

The inner loop computes two dot products per output pair. Precomputing a
reversed LFE sample vector before the inner loop avoids per-iteration
shuffle overhead.
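
The hoisting idea described above can be sketched in portable C. This is
an illustration only, not the NEON code from this change: the helper
names (lfe_block_ref, lfe_block_hoisted), the 256-entry coefficient
table, and the tap counts (8 for fir0, 4 for fir1) are assumptions. The
sketch produces one block of output for a single LFE sample position and
builds both orderings of the sample history once, so each dot product in
the inner loop reads its coefficients and samples in ascending order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LFE_MAX_COEFFS 8   /* assumed: 8 taps for fir0, 4 for fir1 */
#define LFE_TABLE_SIZE 256 /* assumed coefficient table length */

/* Reference form: both dot products walk the sample history backwards. */
static void lfe_block_ref(float *out, const int32_t *lfe,
                          const float *coeff, int factor, int ncoeffs)
{
    for (int j = 0; j < factor / 2; j++) {
        float a = 0.0f, b = 0.0f;
        for (int k = 0; k < ncoeffs; k++) {
            a += coeff[j * ncoeffs + k] * (float)lfe[-k];
            b += coeff[LFE_TABLE_SIZE - 1 - j * ncoeffs - k] * (float)lfe[-k];
        }
        out[j]              = a;
        out[factor / 2 + j] = b;
    }
}

/* Hoisted form: convert and reverse the history once, outside the loop. */
static void lfe_block_hoisted(float *out, const int32_t *lfe,
                              const float *coeff, int factor, int ncoeffs)
{
    float fwd[LFE_MAX_COEFFS], rev[LFE_MAX_COEFFS];

    assert(ncoeffs > 0 && ncoeffs <= LFE_MAX_COEFFS);

    for (int k = 0; k < ncoeffs; k++) {
        rev[k] = (float)lfe[-k];                /* newest sample first */
        fwd[k] = (float)lfe[k - (ncoeffs - 1)]; /* memory order */
    }

    for (int j = 0; j < factor / 2; j++) {
        /* Both coefficient runs are now read in ascending order. */
        const float *ca = coeff + j * ncoeffs;
        const float *cb = coeff + LFE_TABLE_SIZE - (j + 1) * ncoeffs;
        float a = 0.0f, b = 0.0f;

        for (int k = 0; k < ncoeffs; k++) {
            a += ca[k] * rev[k];
            b += cb[k] * fwd[k];
        }
        out[j]              = a;
        out[factor / 2 + j] = b;
    }
}
```

Note that the hoisted form sums the second dot product in the opposite
order from the reference, so with arbitrary float coefficients the two
results can differ by rounding.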

Benchmarks on AWS Graviton3 (Neoverse V1, c7g.xlarge):
  lfe_fir0_float: C 5902.0 cycles -> NEON 2135.0 cycles (2.77x)
  lfe_fir1_float: C 2836.3 cycles -> NEON 1527.8 cycles (1.86x)
Measured with: taskset -c 0 ./tests/checkasm/checkasm --test=dcadsp --bench,
3-run average, Ubuntu 22.04 (kernel 6.8.0-1052-aws), perf_event_paranoid=0.

Signed-off-by: Jeongkeun Kim <variety0724@gmail.com>

Changed files

libavcodec/aarch64/Makefile
libavcodec/aarch64/dcadsp_init_aarch64.c
libavcodec/aarch64/dcadsp_neon.S
libavcodec/dcadsp.c
libavcodec/dcadsp.h