Builder ffmpeg-solaris10-sparc Build #13040
Results:
Failed shell_2 shell_3 shell_4 shell_5
SourceStamp:
| Project | ffmpeg |
| Repository | https://git.ffmpeg.org/ffmpeg.git |
| Branch | master |
| Revision | 8966101fa6b2b921bb395de9d9deaceca0f6d501 |
| Got Revision | 8966101fa6b2b921bb395de9d9deaceca0f6d501 |
| Changes | 4 changes |
BuildSlave:
unstable10s
Reason:
The SingleBranchScheduler scheduler named 'schedule-ffmpeg-solaris10-sparc' triggered this build
Steps and Logfiles:
- git update (13 secs)
- shell 'gsed -i ...' (0 secs)
- shell_1 'gsed -i ...' (0 secs)
- shell_2 'gsed -i ...' failed (0 secs)
- shell_3 './configure --samples="../../../ffmpeg/fate-suite" ...' failed (9 secs)
- shell_4 'gmake fate-rsync' failed (6 secs)
- shell_5 '../../../ffmpeg/fate.sh ../../../ffmpeg/fate_config.sh' failed (1 secs)
Build Properties:
| Name | Value | Source |
|---|---|---|
| branch | master | Build |
| builddir | /export/home/buildbot-unstable10s/slave/ffmpeg-solaris10-sparc | slave |
| buildername | ffmpeg-solaris10-sparc | Builder |
| buildnumber | 13040 | Build |
| codebase | | Build |
| got_revision | 8966101fa6b2b921bb395de9d9deaceca0f6d501 | Git |
| project | ffmpeg | Build |
| repository | https://git.ffmpeg.org/ffmpeg.git | Build |
| revision | 8966101fa6b2b921bb395de9d9deaceca0f6d501 | Build |
| scheduler | schedule-ffmpeg-solaris10-sparc | Scheduler |
| slavename | unstable10s | BuildSlave |
| workdir | /export/home/buildbot-unstable10s/slave/ffmpeg-solaris10-sparc | slave (deprecated) |
Forced Build Properties:
| Name | Label | Value |
|---|---|---|
Responsible Users:
- Jun Zhao <barryjzhao@tencent.com>
Timing:
| Start | Sun Jan 25 08:06:36 2026 |
| End | Sun Jan 25 08:07:08 2026 |
| Elapsed | 31 secs |
All Changes:
Change #256270
| Category | ffmpeg |
| Changed by | Jun Zhao <barryjzhao@tencent.com> |
| Changed at | Sun 25 Jan 2026 07:55:26 |
| Repository | https://git.ffmpeg.org/ffmpeg.git |
| Project | ffmpeg |
| Branch | master |
| Revision | 0886e50c6b8e812d14ed7acd329684251efbf69e |
Comments
lavc/hevc: add aarch64 neon for 8-bit dequant

Implement NEON optimization for HEVC dequant at 8-bit depth.

The NEON implementation uses srshr (Signed Rounding Shift Right) which does both the add with offset and right shift in a single instruction.

Optimization details:
- 4x4 (16 coeffs): Single load-process-store sequence
- 8x8 (64 coeffs): Fully unrolled, no loop overhead
- 16x16 (256 coeffs): Pipelined load/compute/store to hide memory latency
- 32x32 (1024 coeffs): Pipelined with all available NEON registers

Performance benchmark on Apple M4:
./tests/checkasm/checkasm --test=hevc_dequant --bench
- hevc_dequant_4x4_8_c: 11.3 (1.00x)
- hevc_dequant_4x4_8_neon: 6.3 (1.78x)
- hevc_dequant_8x8_8_c: 33.9 (1.00x)
- hevc_dequant_8x8_8_neon: 6.6 (5.11x)
- hevc_dequant_16x16_8_c: 153.8 (1.00x)
- hevc_dequant_16x16_8_neon: 9.0 (17.02x)
- hevc_dequant_32x32_8_c: 78.1 (1.00x)
- hevc_dequant_32x32_8_neon: 31.9 (2.45x)

Note on Performance Anomaly: The observation that hevc_dequant_32x32_8_c is faster than 16x16 (78.1 vs 153.8) is due to Clang auto-vectorizing only for sizes >= 32x32.

Compiler: Apple clang version 17.0.0 (clang-1700.6.3.2)

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Changed files
- libavcodec/aarch64/Makefile
- libavcodec/aarch64/hevcdsp_dequant_neon.S
- libavcodec/aarch64/hevcdsp_init_aarch64.c
- tests/checkasm/Makefile
- tests/checkasm/checkasm.c
- tests/checkasm/checkasm.h
- tests/checkasm/hevc_dequant.c
- tests/fate/checkasm.mak
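The rounding shift named in the comments of Change #256270 can be modeled in scalar C. This is only a sketch for illustration, not code from the FFmpeg tree: the function name `rounding_shift_right` is hypothetical, and the rounding offset of `1 << (shift - 1)` is an assumption based on the usual definition of a rounding right shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a signed rounding shift right:
 * add half of the divisor (the "offset"), then shift right.
 * Assumes 1 <= shift <= 15 and that >> on a negative int is an
 * arithmetic shift, which C leaves implementation-defined. */
static int16_t rounding_shift_right(int16_t x, int shift)
{
    assert(shift >= 1 && shift <= 15);
    int offset = 1 << (shift - 1);          /* rounding offset */
    return (int16_t)((x + offset) >> shift);
}
```

Under these assumptions, `rounding_shift_right(100, 5)` adds 16 and shifts right by 5, which corresponds to the 8-bit 4x4 case (shift = 15 - 8 - 2 = 5).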
Change #256271
| Category | ffmpeg |
| Changed by | Jun Zhao <barryjzhao@tencent.com> |
| Changed at | Sun 25 Jan 2026 07:55:26 |
| Repository | https://git.ffmpeg.org/ffmpeg.git |
| Project | ffmpeg |
| Branch | master |
| Revision | 24f296c7a1032accf28c35437e0212a2a8cf5032 |
Comments
lavc/hevc: optimize dequant for shift=0 case (identity transform)

The HEVC dequantization uses: shift = 15 - bit_depth - log2_size

When shift equals 0, the operation becomes an identity transform:
- For shift > 0: output = (input + offset) >> shift
- For shift < 0: output = input << (-shift)
- For shift = 0: output = input << 0 = input (no change)

This occurs in the following cases:
- 10-bit, 32x32 block: shift = 15 - 10 - 5 = 0
- 12-bit, 8x8 block: shift = 15 - 12 - 3 = 0

Previously, the code would still iterate through all coefficients and perform redundant read-modify-write operations even when shift=0. This patch adds an early return for shift=0, avoiding unnecessary memory operations.

checkasm benchmarks on Apple M4 show:
- 10-bit 32x32: 69.1 -> 1.6 cycles (43x faster)
- 12-bit 8x8: 30.9 -> 1.7 cycles (18x faster)

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Changed files
- libavcodec/hevc/dsp_template.c
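The three shift cases listed in the comments of Change #256271 might look roughly like the following in plain C. This is a sketch under stated assumptions, not the contents of `libavcodec/hevc/dsp_template.c`: the function name `dequant_sketch`, the explicit `bit_depth` parameter, and the offset value `1 << (shift - 1)` are all hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-place dequant over a square block of
 * (1 << log2_size) x (1 << log2_size) coefficients. */
static void dequant_sketch(int16_t *coeffs, int log2_size, int bit_depth)
{
    assert(log2_size >= 2 && log2_size <= 5);   /* 4x4 .. 32x32 */

    int shift = 15 - bit_depth - log2_size;
    int count = 1 << (2 * log2_size);           /* total coefficients */

    if (shift == 0)
        return;                                 /* identity: skip all memory traffic */

    if (shift > 0) {
        int offset = 1 << (shift - 1);          /* assumed rounding offset */
        for (int i = 0; i < count; i++)
            coeffs[i] = (int16_t)((coeffs[i] + offset) >> shift);
    } else {
        /* Shift the unsigned bit pattern to avoid left-shifting a negative int. */
        for (int i = 0; i < count; i++)
            coeffs[i] = (int16_t)((uint16_t)coeffs[i] << -shift);
    }
}
```

With this sketch, a 12-bit 8x8 block or a 10-bit 32x32 block returns before touching any coefficient, which is the saving the commit message attributes to the early return.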
Change #256272
| Category | ffmpeg |
| Changed by | Jun Zhao <barryjzhao@tencent.com> |
| Changed at | Sun 25 Jan 2026 07:55:26 |
| Repository | https://git.ffmpeg.org/ffmpeg.git |
| Project | ffmpeg |
| Branch | master |
| Revision | ce89d974c8764002f127829dc0ecf43725994ff0 |
Comments
lavc/hevc: add aarch64 neon for 10-bit dequant

Implement NEON optimization for HEVC dequant at 10-bit depth.

For 10-bit: shift = 15 - 10 - log2_size = 5 - log2_size

Performance benchmark on Apple M4:
./tests/checkasm/checkasm --test=hevc_dequant --bench
- hevc_dequant_4x4_10_c: 16.6 (1.00x)
- hevc_dequant_4x4_10_neon: 7.4 (2.23x)
- hevc_dequant_8x8_10_c: 39.7 (1.00x)
- hevc_dequant_8x8_10_neon: 7.5 (5.28x)
- hevc_dequant_16x16_10_c: 168.7 (1.00x)
- hevc_dequant_16x16_10_neon: 10.2 (16.56x)
- hevc_dequant_32x32_10_c: 1.9 (1.00x)
- hevc_dequant_32x32_10_neon: 1.9 (1.01x)

Note: 32x32 shift=0 is identity transform (no-op), so NEON has no advantage over C which is also optimized away by the compiler.

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Changed files
- libavcodec/aarch64/hevcdsp_dequant_neon.S
- libavcodec/aarch64/hevcdsp_init_aarch64.c
Change #256273
| Category | ffmpeg |
| Changed by | Jun Zhao <barryjzhao@tencent.com> |
| Changed at | Sun 25 Jan 2026 07:55:26 |
| Repository | https://git.ffmpeg.org/ffmpeg.git |
| Project | ffmpeg |
| Branch | master |
| Revision | 8966101fa6b2b921bb395de9d9deaceca0f6d501 |
Comments
lavc/hevc: add aarch64 neon for 12-bit dequant

Implement NEON optimization for HEVC dequant at 12-bit depth.

For 12-bit: shift = 15 - 12 - log2_size = 3 - log2_size. When shift is negative, we use shl (shift left) instead of srshr.

Performance benchmark on Apple M4:
./tests/checkasm/checkasm --test=hevc_dequant --bench
- hevc_dequant_4x4_12_c: 9.9 (1.00x)
- hevc_dequant_4x4_12_neon: 5.7 (1.74x)
- hevc_dequant_8x8_12_c: 1.7 (1.00x)
- hevc_dequant_8x8_12_neon: 1.3 (1.30x)
- hevc_dequant_16x16_12_c: 131.1 (1.00x)
- hevc_dequant_16x16_12_neon: 7.9 (16.52x)
- hevc_dequant_32x32_12_c: 69.7 (1.00x)
- hevc_dequant_32x32_12_neon: 28.4 (2.46x)

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Changed files
- libavcodec/aarch64/hevcdsp_dequant_neon.S
- libavcodec/aarch64/hevcdsp_init_aarch64.c
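The formula shift = 15 - bit_depth - log2_size from the comments above decides which operation each block size needs at a given bit depth. The helper below is a hypothetical sketch of that selection: the names `dequant_shift`, `dequant_op_for`, and the `dequant_op` enum are invented for illustration and do not appear in the changed files.

```c
#include <assert.h>

/* Hypothetical labels for the three cases described in the commit messages. */
enum dequant_op {
    DEQUANT_ROUND_RIGHT,   /* shift > 0: rounding shift right (srshr) */
    DEQUANT_IDENTITY,      /* shift == 0: coefficients are unchanged  */
    DEQUANT_SHIFT_LEFT     /* shift < 0: shift left (shl) by -shift   */
};

/* shift = 15 - bit_depth - log2_size, for 4x4 (log2_size 2) .. 32x32 (5). */
static int dequant_shift(int bit_depth, int log2_size)
{
    assert(log2_size >= 2 && log2_size <= 5);
    return 15 - bit_depth - log2_size;
}

/* Map a bit depth and block size to the operation its shift implies. */
static enum dequant_op dequant_op_for(int bit_depth, int log2_size)
{
    int shift = dequant_shift(bit_depth, log2_size);
    if (shift > 0)
        return DEQUANT_ROUND_RIGHT;
    return shift == 0 ? DEQUANT_IDENTITY : DEQUANT_SHIFT_LEFT;
}
```

For 12-bit input this gives shifts of 1, 0, -1, and -2 for 4x4 through 32x32, so only the 16x16 and 32x32 blocks take the left-shift path; at 8-bit every block size has a positive shift.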