Buildbot: ffmpeg-solaris10-sparc Build #13040

Results:

Failed shell_2 shell_3 shell_4 shell_5

SourceStamp:

Project	ffmpeg
Repository	https://git.ffmpeg.org/ffmpeg.git
Branch	master
Revision	8966101fa6b2b921bb395de9d9deaceca0f6d501
Got Revision	8966101fa6b2b921bb395de9d9deaceca0f6d501
Changes	4 changes

BuildSlave:

unstable10s

Reason:

The SingleBranchScheduler scheduler named 'schedule-ffmpeg-solaris10-sparc' triggered this build

Steps and Logfiles:

git update ( 13 secs )
    1. stdio
shell 'gsed -i ...' ( 0 secs )
    1. stdio
shell_1 'gsed -i ...' ( 0 secs )
    1. stdio
shell_2 'gsed -i ...' failed ( 0 secs )
    1. stdio
shell_3 './configure --samples="../../../ffmpeg/fate-suite" ...' failed ( 9 secs )
    1. stdio
    2. config.log
shell_4 'gmake fate-rsync' failed ( 6 secs )
    1. stdio
shell_5 '../../../ffmpeg/fate.sh ../../../ffmpeg/fate_config.sh' failed ( 1 secs )

Build Properties:

Name	Value	Source
branch	master	Build
builddir	/export/home/buildbot-unstable10s/slave/ffmpeg-solaris10-sparc	slave
buildername	ffmpeg-solaris10-sparc	Builder
buildnumber	13040	Build
codebase		Build
got_revision	8966101fa6b2b921bb395de9d9deaceca0f6d501	Git
project	ffmpeg	Build
repository	https://git.ffmpeg.org/ffmpeg.git	Build
revision	8966101fa6b2b921bb395de9d9deaceca0f6d501	Build
scheduler	schedule-ffmpeg-solaris10-sparc	Scheduler
slavename	unstable10s	BuildSlave
workdir	/export/home/buildbot-unstable10s/slave/ffmpeg-solaris10-sparc	slave (deprecated)

Forced Build Properties:

Name	Label	Value

Responsible Users:

Jun Zhao
barryjzhaoohnoyoudont@tencent.com

Timing:

Start	Sun Jan 25 08:06:36 2026
End	Sun Jan 25 08:07:08 2026
Elapsed	31 secs

All Changes:


Change #256270

Category	ffmpeg
Changed by	Jun Zhao <barryjzhaoohnoyoudont@tencent.com>
Changed at	Sun 25 Jan 2026 07:55:26
Repository	https://git.ffmpeg.org/ffmpeg.git
Project	ffmpeg
Branch	master
Revision	0886e50c6b8e812d14ed7acd329684251efbf69e

Comments

lavc/hevc: add aarch64 neon for 8-bit dequant
Implement NEON optimization for HEVC dequant at 8-bit depth.

The NEON implementation uses srshr (Signed Rounding Shift Right) which
does both the add with offset and right shift in a single instruction.
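As a rough illustration of the rounding shift described above, the following Python sketch models the scalar arithmetic only. The function names are hypothetical, and the sketch assumes shift >= 1 and ignores NEON lane widths; it is not the code from the patch.

```python
def add_offset_then_shift(value: int, shift: int) -> int:
    """Two-step form: add the rounding offset, then arithmetic shift right."""
    offset = 1 << (shift - 1)
    return (value + offset) >> shift


def rounding_shift_right(value: int, shift: int) -> int:
    """Single-step model of a signed rounding shift right (assumes shift >= 1).

    Divides by 2**shift with floor semantics, then rounds up when the
    remainder is at least half of the divisor.
    """
    quotient, remainder = divmod(value, 1 << shift)
    half = 1 << (shift - 1)
    return quotient + (1 if remainder >= half else 0)
```

Both functions return the same result for any integer input, which is why a single rounding-shift instruction can stand in for the separate add and shift.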

Optimization details:
- 4x4 (16 coeffs): Single load-process-store sequence
- 8x8 (64 coeffs): Fully unrolled, no loop overhead
- 16x16 (256 coeffs): Pipelined load/compute/store to hide memory latency
- 32x32 (1024 coeffs): Pipelined with all available NEON registers

Performance benchmark on Apple M4:
./tests/checkasm/checkasm --test=hevc_dequant --bench
hevc_dequant_4x4_8_c:                                   11.3 ( 1.00x)
hevc_dequant_4x4_8_neon:                                 6.3 ( 1.78x)

hevc_dequant_8x8_8_c:                                   33.9 ( 1.00x)
hevc_dequant_8x8_8_neon:                                 6.6 ( 5.11x)

hevc_dequant_16x16_8_c:                                153.8 ( 1.00x)
hevc_dequant_16x16_8_neon:                               9.0 (17.02x)

hevc_dequant_32x32_8_c:                                 78.1 ( 1.00x)
hevc_dequant_32x32_8_neon:                              31.9 ( 2.45x)

Note on Performance Anomaly:
hevc_dequant_32x32_8_c is faster than hevc_dequant_16x16_8_c (78.1 vs 153.8)
because Clang auto-vectorizes the C loop only for sizes >= 32x32.
Compiler: Apple clang version 17.0.0 (clang-1700.6.3.2)

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>

Changed files

libavcodec/aarch64/Makefile
libavcodec/aarch64/hevcdsp_dequant_neon.S
libavcodec/aarch64/hevcdsp_init_aarch64.c
tests/checkasm/Makefile
tests/checkasm/checkasm.c
tests/checkasm/checkasm.h
tests/checkasm/hevc_dequant.c
tests/fate/checkasm.mak

Change #256271

Category	ffmpeg
Changed by	Jun Zhao <barryjzhaoohnoyoudont@tencent.com>
Changed at	Sun 25 Jan 2026 07:55:26
Repository	https://git.ffmpeg.org/ffmpeg.git
Project	ffmpeg
Branch	master
Revision	24f296c7a1032accf28c35437e0212a2a8cf5032

Comments

lavc/hevc: optimize dequant for shift=0 case (identity transform)
The HEVC dequantization uses:
  shift = 15 - bit_depth - log2_size

When shift equals 0, the operation becomes an identity transform:
  - For shift > 0: output = (input + offset) >> shift
  - For shift < 0: output = input << (-shift)
  - For shift = 0: output = input << 0 = input (no change)

This occurs in the following cases:
  - 10-bit, 32x32 block: shift = 15 - 10 - 5 = 0
  - 12-bit, 8x8 block:   shift = 15 - 12 - 3 = 0
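The shift formula above can be checked with a short Python sketch. The helper names are hypothetical; the bit depths (8, 10, 12) and block sizes (4x4 to 32x32) are assumed from the benchmarks quoted on this page.

```python
def dequant_shift(bit_depth: int, log2_size: int) -> int:
    """shift = 15 - bit_depth - log2_size, as given in the commit message."""
    return 15 - bit_depth - log2_size


def identity_cases(bit_depths=(8, 10, 12), log2_sizes=(2, 3, 4, 5)):
    """Return (bit_depth, block_size) pairs for which shift is exactly 0."""
    return [
        (bit_depth, 1 << log2_size)
        for bit_depth in bit_depths
        for log2_size in log2_sizes
        if dequant_shift(bit_depth, log2_size) == 0
    ]
```

Under these assumptions, identity_cases() yields only the 10-bit 32x32 and 12-bit 8x8 combinations.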

Previously, the code would still iterate through all coefficients
and perform redundant read-modify-write operations even when shift=0.

This patch adds an early return for shift=0, avoiding unnecessary
memory operations. checkasm benchmarks on Apple M4 show:
  - 10-bit 32x32: 69.1 -> 1.6 cycles (43x faster)
  - 12-bit 8x8:   30.9 -> 1.7 cycles (18x faster)
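A minimal Python sketch of the early-return idea follows. It is an illustrative model, not the code in libavcodec/hevc/dsp_template.c: the function names are hypothetical, and wrapping left-shifted values to 16 bits is an assumption about how coefficients are stored.

```python
def wrap_int16(value: int) -> int:
    """Wrap an integer into the signed 16-bit range (assumed storage type)."""
    return ((value + 0x8000) & 0xFFFF) - 0x8000


def dequant(coeffs: list[int], bit_depth: int, log2_size: int) -> None:
    """Apply the dequant shift to coeffs in place."""
    shift = 15 - bit_depth - log2_size
    if shift == 0:
        # Identity transform: skip the read-modify-write loop entirely.
        return
    if shift > 0:
        offset = 1 << (shift - 1)
        for i, c in enumerate(coeffs):
            coeffs[i] = (c + offset) >> shift
    else:
        for i, c in enumerate(coeffs):
            coeffs[i] = wrap_int16(c << -shift)
```

With shift equal to 0 the function returns before touching any coefficient, which is the saving the benchmark numbers above reflect.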

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>

Changed files

libavcodec/hevc/dsp_template.c

Change #256272

Category	ffmpeg
Changed by	Jun Zhao <barryjzhaoohnoyoudont@tencent.com>
Changed at	Sun 25 Jan 2026 07:55:26
Repository	https://git.ffmpeg.org/ffmpeg.git
Project	ffmpeg
Branch	master
Revision	ce89d974c8764002f127829dc0ecf43725994ff0

Comments

lavc/hevc: add aarch64 neon for 10-bit dequant
Implement NEON optimization for HEVC dequant at 10-bit depth.

For 10-bit: shift = 15 - 10 - log2_size = 5 - log2_size

Performance benchmark on Apple M4:
./tests/checkasm/checkasm --test=hevc_dequant --bench
hevc_dequant_4x4_10_c:                                  16.6 ( 1.00x)
hevc_dequant_4x4_10_neon:                                7.4 ( 2.23x)

hevc_dequant_8x8_10_c:                                  39.7 ( 1.00x)
hevc_dequant_8x8_10_neon:                                7.5 ( 5.28x)

hevc_dequant_16x16_10_c:                               168.7 ( 1.00x)
hevc_dequant_16x16_10_neon:                             10.2 (16.56x)

hevc_dequant_32x32_10_c:                                 1.9 ( 1.00x)
hevc_dequant_32x32_10_neon:                              1.9 ( 1.01x)

Note: for 32x32, shift=0 makes dequant an identity transform (no-op), so NEON
has no advantage over the C version, which is also optimized away by the compiler.

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>

Changed files

libavcodec/aarch64/hevcdsp_dequant_neon.S
libavcodec/aarch64/hevcdsp_init_aarch64.c

Change #256273

Category	ffmpeg
Changed by	Jun Zhao <barryjzhaoohnoyoudont@tencent.com>
Changed at	Sun 25 Jan 2026 07:55:26
Repository	https://git.ffmpeg.org/ffmpeg.git
Project	ffmpeg
Branch	master
Revision	8966101fa6b2b921bb395de9d9deaceca0f6d501

Comments

lavc/hevc: add aarch64 neon for 12-bit dequant
Implement NEON optimization for HEVC dequant at 12-bit depth.

For 12-bit: shift = 15 - 12 - log2_size = 3 - log2_size. When shift
is negative, we use shl (shift left) instead of srshr.
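To make the 12-bit case concrete, this Python sketch maps each block size to the kind of shift the description above implies. The function name and the ("none", 0) result for shift = 0 are assumptions for illustration; the actual assembly may handle that case differently.

```python
def neon_op_for_12bit(log2_size: int) -> tuple[str, int]:
    """Return (operation, shift amount) for 12-bit dequant at a block size."""
    shift = 3 - log2_size  # 15 - 12 - log2_size
    if shift > 0:
        return ("srshr", shift)   # rounding shift right
    if shift < 0:
        return ("shl", -shift)    # plain shift left by the magnitude
    return ("none", 0)            # shift == 0: identity (assumed no-op)
```

For block sizes 4x4, 8x8, 16x16 and 32x32 (log2_size 2 to 5) this gives srshr by 1, no shift, shl by 1 and shl by 2 respectively.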

Performance benchmark on Apple M4:
./tests/checkasm/checkasm --test=hevc_dequant --bench
hevc_dequant_4x4_12_c:                                   9.9 ( 1.00x)
hevc_dequant_4x4_12_neon:                                5.7 ( 1.74x)

hevc_dequant_8x8_12_c:                                   1.7 ( 1.00x)
hevc_dequant_8x8_12_neon:                                1.3 ( 1.30x)

hevc_dequant_16x16_12_c:                               131.1 ( 1.00x)
hevc_dequant_16x16_12_neon:                              7.9 (16.52x)

hevc_dequant_32x32_12_c:                                69.7 ( 1.00x)
hevc_dequant_32x32_12_neon:                             28.4 ( 2.46x)

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>

Changed files

libavcodec/aarch64/hevcdsp_dequant_neon.S
libavcodec/aarch64/hevcdsp_init_aarch64.c
