Change #269829
| Field | Value |
|---|---|
| Category | ffmpeg |
| Changed by | Jun Zhao <barryjzhao@tencent.com> |
| Changed at | Mon 08 Jun 2026 01:29:33 |
| Repository | https://git.ffmpeg.org/ffmpeg.git |
| Project | ffmpeg |
| Branch | master |
| Revision | cfa3ceac7a8cb4f0c836b938a25d1579da154ed5 |
Comments
lavc/hevc: add aarch64 NEON for angular modes 10 and 26
Add NEON-optimized implementations for HEVC angular intra prediction
modes 10 (pure horizontal) and 26 (pure vertical) at 8-bit depth.
Mode 10 (Horizontal):
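- Fills each row with left[y]; ld2r/ld4r load two or four consecutive
  left[] samples at once and replicate each across its own register,
  producing several rows per load
- Applies edge smoothing to the first row for luma blocks smaller than 32x32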
Mode 26 (Vertical):
- Copies the top reference row to every output row
- Applies edge smoothing to the first column for luma blocks smaller than 32x32
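For reference, the scalar behaviour of the two modes can be sketched in plain C. This is a minimal sketch, not the code in pred_template.c: the names `pred_hor_ref`, `pred_ver_ref` and `ld4r_emulated` are hypothetical, samples are 8-bit, `top[-1]` and `left[-1]` are assumed to hold the top-left corner sample, and the smoothing condition is reduced to the one stated above (luma, size < 32). `ld4r_emulated` only models what `ld4r` does (four consecutive bytes, each replicated across a separate vector), with the vector width modelled as the block width.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clip to the 8-bit pixel range [0, 255]. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* floor(d / 2), written out so it does not rely on >> of negative ints. */
static int half_floor(int d)
{
    return d >= 0 ? d / 2 : -((1 - d) / 2);
}

/* Scalar model of ld4r: p[0..3] are each replicated across all lanes of a
 * separate "vector" (vector width modelled as `lanes`, at most 32 here). */
static void ld4r_emulated(const uint8_t *p, uint8_t v[4][32], int lanes)
{
    for (int i = 0; i < 4; i++)
        memset(v[i], p[i], (size_t)lanes);
}

/* Mode 10 (pure horizontal): row y is filled with left[y].
 * top[-1] and left[-1] are assumed to hold the top-left corner sample. */
static void pred_hor_ref(uint8_t *dst, ptrdiff_t stride, const uint8_t *top,
                         const uint8_t *left, int size, int luma)
{
    uint8_t v[4][32];
    for (int y = 0; y < size; y += 4) {     /* one "load" per four rows */
        ld4r_emulated(&left[y], v, size);
        for (int i = 0; i < 4; i++)
            memcpy(dst + (y + i) * stride, v[i], (size_t)size);
    }
    if (luma && size < 32)                  /* smooth the first row */
        for (int x = 0; x < size; x++)
            dst[x] = clip_u8(left[0] + half_floor(top[x] - top[-1]));
}

/* Mode 26 (pure vertical): every row is a copy of top[0..size-1]. */
static void pred_ver_ref(uint8_t *dst, ptrdiff_t stride, const uint8_t *top,
                         const uint8_t *left, int size, int luma)
{
    for (int y = 0; y < size; y++)
        memcpy(dst + y * stride, top, (size_t)size);
    if (luma && size < 32)                  /* smooth the first column */
        for (int y = 0; y < size; y++)
            dst[y * stride] = clip_u8(top[0] + half_floor(left[y] - left[-1]));
}
```

Passing `luma = 0` (chroma) or `size = 32` skips the smoothing loop, so the output is a pure copy or broadcast.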
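Edge smoothing uses uhsub+usqadd to compute the filtered result
directly in 8-bit. For mode 26, uhsub yields (left[y] - left[-1]) >> 1
as a signed byte and usqadd adds it to top[0] with unsigned saturation
to [0, 255] (mode 10 is symmetric), so no widening to 16-bit
intermediates is needed.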
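The claim that uhsub followed by usqadd reproduces the clipped, widened result can be checked exhaustively with a scalar model of one byte lane. This sketch assumes the Arm definitions of the two instructions (UHSUB keeps bits [8:1] of the exact difference; USQADD adds a signed byte to an unsigned byte with unsigned saturation). The helper names are hypothetical, and the widened reference mirrors `base + ((ref - corner) >> 1)` clipped to [0, 255], written with an explicit floor so it does not depend on arithmetic right shift of negative ints.

```c
#include <stdint.h>

/* UHSUB on one byte lane: bits [8:1] of the exact difference a - b.
 * Read as a signed byte this equals floor((a - b) / 2), in [-128, 127]. */
static int8_t uhsub_lane(uint8_t a, uint8_t b)
{
    unsigned diff9 = ((unsigned)a - (unsigned)b) & 0x1FFu; /* 9-bit two's complement */
    unsigned r = diff9 >> 1;                                /* keep bits [8:1] */
    return (int8_t)(r < 128 ? (int)r : (int)r - 256);
}

/* USQADD on one byte lane: unsigned acc + signed s, saturated to [0, 255]. */
static uint8_t usqadd_lane(uint8_t acc, int8_t s)
{
    int sum = (int)acc + s;
    return (uint8_t)(sum < 0 ? 0 : sum > 255 ? 255 : sum);
}

/* Smoothed sample computed entirely in 8-bit lanes. */
static uint8_t smooth_8bit(uint8_t base, uint8_t ref, uint8_t corner)
{
    return usqadd_lane(base, uhsub_lane(ref, corner));
}

/* Widened reference: clip(base + floor((ref - corner) / 2)). */
static uint8_t smooth_widened(uint8_t base, uint8_t ref, uint8_t corner)
{
    int d = (int)ref - (int)corner;
    int h = d >= 0 ? d / 2 : -((1 - d) / 2); /* floor(d / 2) */
    int v = base + h;
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Compare both forms over all 2^24 inputs; returns the mismatch count. */
static long count_mismatches(void)
{
    long bad = 0;
    for (int base = 0; base < 256; base++)
        for (int ref = 0; ref < 256; ref++)
            for (int corner = 0; corner < 256; corner++)
                bad += smooth_8bit((uint8_t)base, (uint8_t)ref, (uint8_t)corner) !=
                       smooth_widened((uint8_t)base, (uint8_t)ref, (uint8_t)corner);
    return bad;
}
```

The equivalence holds because floor((ref - corner) / 2) always lies in [-128, 127]: the halved difference fits in a signed byte, and the only clip still required is the one usqadd performs.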
The C pred_angular wrappers are made non-static and given an ff_ prefix
so that the NEON dispatch can fall back to C for modes not yet
optimized. This will be reverted once all angular modes are implemented.
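The fallback arrangement can be pictured as a size-specific NEON entry that handles modes 10 and 26 itself and forwards every other mode to the exported C code. This is a hypothetical sketch, not the code in hevcpred_init_aarch64.c: the function-pointer type is only modelled on a per-size pred_angular[] entry that takes the mode as an argument, and every kernel below is a made-up stand-in that merely records which path ran.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry type: one pointer per block size, mode as a parameter. */
typedef void (*pred_angular_fn)(uint8_t *src, const uint8_t *top,
                                const uint8_t *left, ptrdiff_t stride,
                                int c_idx, int mode);

/* Which kernel handled the most recent call (illustration only). */
enum path { PATH_NONE, PATH_C, PATH_NEON_H, PATH_NEON_V };
static enum path last_path = PATH_NONE;

/* Stand-in for the exported ff_-prefixed C implementation. */
static void fallback_angular_8x8_c(uint8_t *src, const uint8_t *top,
                                   const uint8_t *left, ptrdiff_t stride,
                                   int c_idx, int mode)
{
    (void)src; (void)top; (void)left; (void)stride; (void)c_idx; (void)mode;
    last_path = PATH_C;
}

/* Stand-ins for the mode 10 and mode 26 NEON kernels. */
static void neon_hor_8x8(uint8_t *src, const uint8_t *top, const uint8_t *left,
                         ptrdiff_t stride, int c_idx)
{
    (void)src; (void)top; (void)left; (void)stride; (void)c_idx;
    last_path = PATH_NEON_H;
}

static void neon_ver_8x8(uint8_t *src, const uint8_t *top, const uint8_t *left,
                         ptrdiff_t stride, int c_idx)
{
    (void)src; (void)top; (void)left; (void)stride; (void)c_idx;
    last_path = PATH_NEON_V;
}

/* The entry installed for 8x8 blocks: every angular mode lands here. */
static void angular_8x8_neon(uint8_t *src, const uint8_t *top,
                             const uint8_t *left, ptrdiff_t stride,
                             int c_idx, int mode)
{
    switch (mode) {
    case 10: neon_hor_8x8(src, top, left, stride, c_idx); break;
    case 26: neon_ver_8x8(src, top, left, stride, c_idx); break;
    default: fallback_angular_8x8_c(src, top, left, stride, c_idx, mode); break;
    }
}
```

Since the pointer is chosen per size, unoptimized modes pay only for the switch before reaching the C code.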
Note: since pred_angular[] is indexed by block size (not by mode),
checkasm benchmarks will show '_neon' for all 33 angular modes even
though only modes 10 and 26 are truly accelerated; the remaining modes
show ~1.0x speedup because they pass through the NEON wrapper to the
C fallback, which adds negligible overhead.
Speedup over C on Apple M4 (checkasm --bench, 15-run average):
| Mode | 4x4 | 8x8 | 16x16 | 32x32 |
|---|---|---|---|---|
| 10 (Horizontal) | 4.66x | 5.80x | 16.86x | 24.89x |
| 26 (Vertical) | 1.16x | 1.83x | 2.45x | 4.50x |
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Changed files
- libavcodec/aarch64/hevcpred_init_aarch64.c
- libavcodec/aarch64/hevcpred_neon.S
- libavcodec/hevc/pred.c
- libavcodec/hevc/pred.h
- libavcodec/hevc/pred_template.c