Builder collectd-60-solaris10-sparc Build #28
Results:
Build successful
SourceStamp:
Project | collectd/collectd |
Repository | https://github.com/collectd/collectd |
Branch | collectd-6.0 |
Revision | b18aed6930d9ce91534c41ce90453588ec006950 |
Got Revision | b18aed6930d9ce91534c41ce90453588ec006950 |
Changes | 10 changes |
BuildSlave:
unstable10s
Reason:
The AnyBranchScheduler scheduler named 'schedule-collectd-60' triggered this build
Steps and Logfiles:
Build Properties:
Name | Value | Source |
---|---|---|
branch | collectd-6.0 | Build |
builddir | /export/home/buildbot-unstable10s/slave/collectd-60-solaris10-sparc | slave |
buildername | collectd-60-solaris10-sparc | Builder |
buildnumber | 28 | Build |
ciflags | --disable-aggregation --disable-check_uptime --disable-csv --disable-java --disable-lua --disable-match_empty_counter --disable-match_hashed --disable-match_regex --disable-match_timediff --disable-match_value --disable-network --disable-perl --disable-postgresql --disable-target_notification --disable-target_replace --disable-target_scale --disable-target_set --disable-target_v5upgrade --disable-threshold --disable-write_graphite --disable-write_kafka --disable-write_mongodb --disable-write_pro .. [property value too long] | SetPropertyFromCommand Step |
codebase | | Build |
got_revision | b18aed6930d9ce91534c41ce90453588ec006950 | Git |
project | collectd/collectd | Build |
repository | https://github.com/collectd/collectd | Build |
revision | b18aed6930d9ce91534c41ce90453588ec006950 | Build |
scheduler | schedule-collectd-60 | Scheduler |
slavename | unstable10s | BuildSlave |
workdir | /export/home/buildbot-unstable10s/slave/collectd-60-solaris10-sparc | slave (deprecated) |
Forced Build Properties:
Name | Label | Value |
---|---|---|
Responsible Users:
- Eero Tamminen <eero.t.tamminen@intel.com>
Timing:
Start | Wed Feb 1 07:56:38 2023 |
End | Wed Feb 1 08:25:45 2023 |
Elapsed | 29 mins, 6 secs |
All Changes:
Change #167826
Category | None |
Changed by | Eero Tamminen <eero.t.tamminen@intel.com> |
Changed at | Wed 01 Feb 2023 07:55:27 |
Repository | https://github.com/collectd/collectd |
Project | collectd/collectd |
Branch | collectd-6.0 |
Revision | 2179066f0bc207d24c833324fd9aa286df09e8b9 |
Comments
gpu_sysman: Add "pci_dev" label

On large cluster with different types of GPUs, it helps knowing which card is of which type, not just their metrics. "pci_dev" label adds PCI device ID to the device metrics.

Because GPUs within each cluster node are normally supposed to be identical i.e. differ only between nodes, and additional labels increase processing load, this is enabled only with the GpuInfo setting.

Getting additional strings out of gpu_info() function required refactoring. GPU index in errors is now output only by gpu_scan(), and gpu_info() gets pointers to label string pointers instead.
Changed files
- src/collectd.conf.pod
- src/gpu_sysman.c
Change #167827
Category | None |
Changed by | Eero Tamminen <eero.t.tamminen@intel.com> |
Changed at | Wed 01 Feb 2023 07:55:27 |
Repository | https://github.com/collectd/collectd |
Project | collectd/collectd |
Branch | collectd-6.0 |
Revision | 22aad76dbe5f13bbf71722b9d744b330dae894f6 |
Comments
gpu_sysman: Add memory "health" label if memory health is known

Already in L0 spec v1.0. Included only to memory usage metrics which are already querying memory state (unlike memory BW metrics).
Changed files
- src/gpu_sysman.c
Change #167828
Category None Changed by Eero Tamminen <eero.t.tamminen @intel.com>Changed at Wed 01 Feb 2023 07:55:27 Repository https://github.com/collectd/collectd Project collectd/collectd Branch collectd-6.0 Revision 42b9044e1f7baf78c39378237b253b2e9a70aced Comments
gpu_sysman: Provide returned error code when logging Sysman failures

To help in debugging issues with Sysman API usage.

(Includes minor stylistic improvements from Ukri & Tuomas)

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Changed files
- src/gpu_sysman.c
Change #167829
Category | None |
Changed by | Eero Tamminen <eero.t.tamminen@intel.com> |
Changed at | Wed 01 Feb 2023 07:55:27 |
Repository | https://github.com/collectd/collectd |
Project | collectd/collectd |
Branch | collectd-6.0 |
Revision | 543147761e99026df30cf2f5ccee71e4787bfab1 |
Comments
gpu_sysman: add "throttled_by" label to frequency metric

Which is empty/missing when frequency is not throttled. Already in L0 spec v1.0.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Changed files
- src/gpu_sysman.c
- src/gpu_sysman_test.c
Change #167830
Category | None |
Changed by | Eero Tamminen <eero.t.tamminen@intel.com> |
Changed at | Wed 01 Feb 2023 07:55:27 |
Repository | https://github.com/collectd/collectd |
Project | collectd/collectd |
Branch | collectd-6.0 |
Revision | eefd3e98fb74d703f026f3d9c93e10154e4bb3ef |
Comments
gpu_sysman: Fix memory metric comments

Caught by Ukri.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Changed files
- src/gpu_sysman.c
Change #167831
Category None Changed by Eero Tamminen <eero.t.tamminen @intel.com>Changed at Wed 01 Feb 2023 07:55:27 Repository https://github.com/collectd/collectd Project collectd/collectd Branch collectd-6.0 Revision cd68f6f24f5515eb8feafa73935a927ff0baed7e Comments
gpu_sysman: Minor improvements to test code

Decrease max value and increase how many decimals are shown for metric values, so that tests verbose logging shows useful values also for ratios (which are in 0-1 range).

Rest of changes improve 'gpu_sysman.c' test coverage by 1%.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Changed files
- src/gpu_sysman_test.c
Change #167832
Category | None |
Changed by | Eero Tamminen <eero.t.tamminen@intel.com> |
Changed at | Wed 01 Feb 2023 07:55:27 |
Repository | https://github.com/collectd/collectd |
Project | collectd/collectd |
Branch | collectd-6.0 |
Revision | b2d5aad2ab4862204710e5c813a341c38b3c76f9 |
Comments
gpu_sysman: make freq & mem handling more consistent

Readability/consistency improvement: change frequency and memory metric handling to use new "reported" boolean instead of cache index, for checking when metrics need to be submitted. This is more consistent how other metric functions handle that.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Changed files
- src/gpu_sysman.c
Change #167833
Category | None |
Changed by | Eero Tamminen <eero.t.tamminen@intel.com> |
Changed at | Wed 01 Feb 2023 07:55:27 |
Repository | https://github.com/collectd/collectd |
Project | collectd/collectd |
Branch | collectd-6.0 |
Revision | ccbd648d12c0e0724b7581459e5426d7f9486bcb |
Comments
gpu_sysman: do metric reset on every loop round

Not doing metric reset between loop rounds could result in extra incorrect metric label being reported for a metric, when earlier metric in the loop had a conditional label, but latter metric does not satisfy that condition (Sysman call for the info failed, but fail is ignored, or Sysman struct value used for given label is not set).

This can happen e.g. with the conditional memory "health", frequency "throttled_by" and power "limit" labels.

Other alternative would be either setting or removing (= using NULL) values for each of the possible labels on every round. Just resetting metric labels on every round seemed more robust (easier to review), and allowed simplifying the code slightly. Looking at collectd metric implementation, it causes more allocs / deallocs for the label array & label names though.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Changed files
- src/gpu_sysman.c
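The stale-label problem described in the "do metric reset on every loop round" change can be illustrated with a small, self-contained sketch. Everything here is a simplified stand-in and an assumption for illustration: `demo_metric_t`, `demo_reset()`, `demo_label_set()`, `demo_label_get()` and `count_throttle_labels()` are hypothetical names, not collectd's metric API or the code in `src/gpu_sysman.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_LABELS 4

/* Hypothetical, simplified metric with a fixed-size label array. */
typedef struct {
  const char *name;
  const char *value;
} demo_label_t;

typedef struct {
  demo_label_t labels[DEMO_MAX_LABELS];
  size_t num;
} demo_metric_t;

/* Drop all labels so nothing leaks into the next loop round. */
static void demo_reset(demo_metric_t *m) { m->num = 0; }

/* Set a label, overwriting the value if the name already exists. */
static void demo_label_set(demo_metric_t *m, const char *name,
                           const char *value) {
  for (size_t i = 0; i < m->num; i++) {
    if (strcmp(m->labels[i].name, name) == 0) {
      m->labels[i].value = value;
      return;
    }
  }
  assert(m->num < DEMO_MAX_LABELS);
  m->labels[m->num].name = name;
  m->labels[m->num].value = value;
  m->num++;
}

/* Return the label value, or NULL if the label is not present. */
static const char *demo_label_get(const demo_metric_t *m, const char *name) {
  for (size_t i = 0; i < m->num; i++) {
    if (strcmp(m->labels[i].name, name) == 0)
      return m->labels[i].value;
  }
  return NULL;
}

/* Count how many rounds would report a "throttled_by" label.
 * throttled_by[i] is NULL when round i is not throttled. */
static size_t count_throttle_labels(const char *const *throttled_by, size_t n,
                                    int reset_each_round) {
  demo_metric_t m = {.num = 0};
  size_t labelled = 0;
  for (size_t i = 0; i < n; i++) {
    if (reset_each_round)
      demo_reset(&m);
    demo_label_set(&m, "location", "gpu");
    if (throttled_by[i] != NULL) /* conditional label */
      demo_label_set(&m, "throttled_by", throttled_by[i]);
    if (demo_label_get(&m, "throttled_by") != NULL)
      labelled++;
  }
  return labelled;
}
```

With only the first round throttled, the reset variant reports the conditional label once, while the variant without a reset keeps carrying the first round's value into every later round.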
Change #167834
Category | None |
Changed by | Eero Tamminen <eero.t.tamminen@intel.com> |
Changed at | Wed 01 Feb 2023 07:55:27 |
Repository | https://github.com/collectd/collectd |
Project | collectd/collectd |
Branch | collectd-6.0 |
Revision | 55a9296a0ec1b4d8d1605387f20a1304b25baa32 |
Comments
gpu_sysman: improve power limit handling

Limits can be reported to only a subset of power domains. Therefore querying limits (for given GPU) should be disabled only when querying fails for all domains.

Added also TODO for upcoming spec change I noticed in the spec tracker.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Changed files
- src/gpu_sysman.c
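The rule stated in the "improve power limit handling" change (keep querying limits unless the query fails for every domain) could look roughly like the sketch below. It is an illustration under assumptions: `limit_query_fn`, `query_power_limits()`, `only_even_domains()` and `always_fail()` are hypothetical names, and the callback stands in for whatever Sysman call the plugin actually makes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-domain query: returns 0 on success, non-zero on failure. */
typedef int (*limit_query_fn)(size_t domain, double *limit);

/* Query every power domain and record which ones returned a limit.
 * Returns true if limit querying should stay enabled for this GPU,
 * i.e. if at least one domain succeeded. */
static bool query_power_limits(limit_query_fn query, size_t domains,
                               double *limits, bool *valid) {
  assert(query != NULL);
  bool any_ok = false;
  for (size_t i = 0; i < domains; i++) {
    valid[i] = (query(i, &limits[i]) == 0);
    any_ok = any_ok || valid[i];
  }
  return any_ok;
}

/* Example backend where only even-numbered domains report a limit. */
static int only_even_domains(size_t domain, double *limit) {
  if (domain % 2 != 0)
    return -1;
  *limit = 100.0 + (double)domain;
  return 0;
}

/* Example backend where no domain reports a limit. */
static int always_fail(size_t domain, double *limit) {
  (void)domain;
  (void)limit;
  return -1;
}
```

A partial failure, as with `only_even_domains`, leaves querying enabled and marks only the failing domains as invalid; only the `always_fail` case would justify disabling limit queries for the GPU.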
Change #167835
Category | None |
Changed by | Eero Tamminen <eero.t.tamminen@intel.com> |
Changed at | Wed 01 Feb 2023 07:55:27 |
Repository | https://github.com/collectd/collectd |
Project | collectd/collectd |
Branch | collectd-6.0 |
Revision | b18aed6930d9ce91534c41ce90453588ec006950 |
Comments
gpu_sysman: initialize struct .pNext members before use

Next Sysman spec will explicitly state that they need be initialized:
https://github.com/oneapi-src/level-zero-spec/commit/98dfaaf041dedfd8c9bcf9a3957f334836e859e4

And latest Sysman backend versions corrupt memory / crash unless .pNext values in some of the structs given to Get functions are initialized.

(Releases before fall 2022 did not use .pNext values in get* calls, and worked fine. It just took a long time until I was able to verify whether this was a regression that will be fixed, or intended change.)

Additionally, validate in test code that .pNext values are set to NULL (because some structs lack those pointer members, ADD_METRIC() macro cannot do that check for the <statename> functions given for it, but otherwise everything is covered).

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Changed files
- src/gpu_sysman.c
- src/gpu_sysman_test.c
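The ".pNext" initialization change, and the matching check in the test code, can be sketched as follows. This is a minimal illustration, not the plugin's code: `fake_mem_state_t` and `fake_mem_get_state()` are hypothetical stand-ins for a Sysman state struct and its Get function, and `query_mem_used()` is an invented caller. The fake getter rejects a non-NULL `pNext`, which mirrors the kind of validation the commit message describes for the tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Sysman state struct (not the real type). */
typedef struct {
  int stype;
  const void *pNext; /* extension chain; caller must initialize it */
  uint64_t free;
  uint64_t size;
} fake_mem_state_t;

/* Hypothetical stand-in for a Sysman Get function.
 * Returns 0 on success, -1 if the caller left pNext non-NULL. */
static int fake_mem_get_state(fake_mem_state_t *state) {
  assert(state != NULL);
  if (state->pNext != NULL)
    return -1;
  state->free = 1024;
  state->size = 4096;
  return 0;
}

/* Query used memory, setting .pNext explicitly before the Get call.
 * The designated initializer also zeroes the remaining members. */
static int query_mem_used(uint64_t *used) {
  fake_mem_state_t state = {.pNext = NULL};
  if (fake_mem_get_state(&state) != 0)
    return -1;
  *used = state.size - state.free;
  return 0;
}
```

Initializing the struct at its declaration, rather than assigning `pNext` later, keeps the requirement visible at every call site and avoids passing stack garbage to the backend.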