Change #265654
| Category | ffmpeg |
| Changed by | Jeongkeun Kim <variety0724@gmail.com> |
| Changed at | Mon 27 Apr 2026 22:13:23 |
| Repository | https://git.ffmpeg.org/ffmpeg.git |
| Project | ffmpeg |
| Branch | master |
| Revision | 4ea59d5665b7961eab736d036d95ac8f1dea39ba |
Comments
avcodec/aarch64: add NEON DCA LFE FIR filter functions

Port lfe_fir0_float and lfe_fir1_float to AArch64 NEON. These polyphase FIR interpolation filters have an x86 SSE/AVX path but no AArch64 equivalent, falling back to scalar C.

The inner loop computes two dot products per output pair. Precomputing a reversed LFE sample vector before the inner loop avoids per-iteration shuffle overhead.

Benchmarks on AWS Graviton3 (Neoverse V1, c7g.xlarge):

lfe_fir0_float: C 5902.0 cycles -> NEON 2135.0 cycles (2.77x)
lfe_fir1_float: C 2836.3 cycles -> NEON 1527.8 cycles (1.86x)

Measured with: taskset -c 0 ./tests/checkasm/checkasm --test=dcadsp --bench, 3-run average, Ubuntu 22.04 (kernel 6.8.0-1052-aws), perf_event_paranoid=0.

Signed-off-by: Jeongkeun Kim <variety0724@gmail.com>
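The "two dot products per output pair" and the precomputed reversed sample vector described in the commit message can be illustrated with a scalar C sketch. This is a simplified model, not the code from libavcodec/dcadsp.c or dcadsp_neon.S: the function names, the 8-tap and 256-entry table sizes, and the mirrored coefficient indexing are assumptions chosen for illustration. The point is that once the sample history is stored in both orders outside the loop, both dot products read their coefficients at ascending addresses, so a vectorised inner loop would not need to shuffle on every iteration.

```c
#include <assert.h>
#include <stdint.h>

#define NCOEFFS    8                    /* taps per dot product (assumed)      */
#define NPAIRS     32                   /* output pairs per LFE sample (assumed) */
#define TABLE_SIZE (NCOEFFS * NPAIRS)   /* 256 coefficients (assumed)          */

/* Reference form: both dot products walk the sample history backwards,
 * and the second one also walks the coefficient table backwards. */
static void lfe_fir_ref(float *out, const int32_t *lfe, const float *coeff)
{
    assert(out && lfe && coeff);
    for (int j = 0; j < NPAIRS; j++) {
        float a = 0.0f, b = 0.0f;
        for (int k = 0; k < NCOEFFS; k++) {
            a += coeff[j * NCOEFFS + k]                  * (float)lfe[-k];
            b += coeff[TABLE_SIZE - 1 - j * NCOEFFS - k] * (float)lfe[-k];
        }
        out[j]          = a;
        out[NPAIRS + j] = b;
    }
}

/* Same result, but the history is converted to float once, in both
 * orders, before the loop over output pairs. Inside the loop both
 * coefficient reads are contiguous and ascending. */
static void lfe_fir_prereversed(float *out, const int32_t *lfe,
                                const float *coeff)
{
    float fwd[NCOEFFS], rev[NCOEFFS];

    assert(out && lfe && coeff);
    for (int k = 0; k < NCOEFFS; k++) {
        fwd[k] = (float)lfe[-k];                 /* newest sample first */
        rev[k] = (float)lfe[k - (NCOEFFS - 1)];  /* oldest sample first */
    }

    for (int j = 0; j < NPAIRS; j++) {
        const float *ca = coeff + j * NCOEFFS;
        const float *cb = coeff + TABLE_SIZE - NCOEFFS - j * NCOEFFS;
        float a = 0.0f, b = 0.0f;
        for (int k = 0; k < NCOEFFS; k++) {
            a += ca[k] * fwd[k];
            b += cb[k] * rev[k];
        }
        out[j]          = a;
        out[NPAIRS + j] = b;
    }
}
```

Both functions expect `lfe` to point at the newest sample with at least `NCOEFFS - 1` older samples stored before it. The second dot product in `lfe_fir_prereversed` accumulates its terms in the opposite order from the reference, so with arbitrary float data the two results can differ in the last bits; a real port would be validated against the C version with a tolerance, as checkasm does for float DSP functions.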
Changed files
- libavcodec/aarch64/Makefile
- libavcodec/aarch64/dcadsp_init_aarch64.c
- libavcodec/aarch64/dcadsp_neon.S
- libavcodec/dcadsp.c
- libavcodec/dcadsp.h