Talos 2 performance evaluated in 2018-2019
Let’s go back in time … Before I had my own Talos 2 machine, some benchmarks were published on Phoronix in 2018 and 2019. I remembered that some of them gave good results but that a few were not at the expected level of performance. On the other hand, with benchmark reports scattered here and there and run on various configurations, I had forgotten which machines the Talos 2 was competing against.
Also, after the initial articles, some fixes were quickly made and proposed to the projects concerned. An article, Improving performance of Phoronix benchmarks on POWER9, analyzed some benchmarks run in the initial Phoronix article, focusing only on those that did not perform well on the Talos 2: LBM Parboil, x264 video encoding, Primesieve, LAME, FLAC, OpenSSL, Scikit-Learn and Blender. The article describes the situation for each and suggests changes. Note that the changes were sometimes obvious: for example, one benchmark turned out to be missing an optimization option!
Recall that some of the Phoronix benchmarks showed the Talos 2 performing well, for example Stockfish, LLVM compilation, 7-Zip and Zstd compression, TinyMembench and PostgreSQL. We will come back to that in detail.
TODO: Finalize: So, it was time to refresh and synthesize all that. In the end, the investigation will show whether there are still benchmarks that need fixes or improvements.
Compared hardware
I started by reading all the articles and, thanks to the benchmark identifier given in some of them, I was able to run these old test suites to get a snapshot on my own configuration, see what works and what does not, get some practice with the phoronix-test-suite tool, etc.
I also collected some remarks from the articles and from their reader comments.
I kept a short list of the systems commonly found in these articles, from the least to the most powerful, in theory:
TODO: Add the year of commercialization
Processor | Year | Cores/Threads | Base Freq | Max Freq | Cache | TDP | Memory Support |
---|---|---|---|---|---|---|---|
Intel Xeon E3-1280 v5 (Skylake) | Q4-2015 | 4/8 | 3.70 GHz | 4.00 GHz | 8 MB | 80 W | DDR4-2133, up to 64 GB |
Intel Core i9-7980XE | Q3-2017 | 18/36 | 2.60 GHz | 4.20 GHz | 24 MB | 165 W | DDR4-2666, up to 128 GB |
Intel Xeon Gold 6138 | Q3-2017 | 20/40 | 2.00 GHz | 3.70 GHz | 28 MB | 125 W | DDR4-2666, up to 768 GB |
AMD EPYC 7551 | Q2-2017 | 32/64 | 2.00 GHz | 3.00 GHz | 64 MB | 180 W | DDR4-2666, octa-channel |
AMD EPYC 7601 | Q2-2017 | 32/64 | 2.20 GHz | 3.20 GHz | 64 MB | 180 W | DDR4-2666, octa-channel |
AMD Ryzen Threadripper 2990WX | Q3-2018 | 32/64 | 3.00 GHz | 4.20 GHz | 64 MB | 250 W | DDR4-2933, quad-channel |
IBM POWER9 (dual 22-core) | Q4-2017 | 44/176 | 2.80 GHz | 3.40 GHz | 120 MB | N/A | DDR4, Up to 16 TB DDR4 |
So, still in theory, the big POWER9 configurations should compete with (and even beat) all these systems except the 2 x Xeon Gold 6138.
Results of old benchmarks (2018 and 2019)
This will highlight the comparison with different machines and also with variants of the Talos 2. Note that the lists are sorted by increasing performance (the best machine at the end).
pts/build-gcc-1.0.0
Timed GCC Compilation 7.2:
On my Talos 2, this old version fails. During installation, the message No rule to make 'defconfig' appears,
and then running the test gives:
pts/build-gcc-1.0.0 [Time To Compile]
E: ../.././gcc/match.pd:120:1 error: expected (, got NAME
So below are only results provided by Phoronix:
Test: Time to compile
Talos II 2 x 22c POWER9 1070.70
AMD EPYC 7551 926.08
AMD EPYC 7601 707.34
2 x Xeon Gold 6138 591.32
Phoronix wrote “Keep in mind the Talos II Secure Workstation was limited to a slow hard drive for this initial testing, but there are some build time references for those curious about the potential of Talos II serving as a POWER build platform.” With the provided results, let’s say that the Talos 2 is rather close to the EPYC 7551.
Multi-threaded: YES Verdict: AVERAGE To do:
pts/build-llvm-1.1.0
Timed LLVM Compilation 6.0.1:
Test: Time To Compile Seconds < Lower Is Better
Talos 2 Power9 2x 4c 535.10
Talos II POWER9 Dual 4-Core 354.23
AMD EPYC 7551 247.00
AMD EPYC 7601 236.00
Core i9 7980XE 227.00
Threadripper 2990WX 221.00
Talos II 2 x 22c POWER9 183.00
AMD EPYC 7601 171.58
2x EPYC 7601 149.00
Talos II POWER9 Dual 18-Core 141.79
2 x Intel Xeon Gold 6138 127.08
There are some strange results: two very different results for the Talos 2 dual 4-core model, two different results for the EPYC 7601, and a Talos 2 that performs better with dual 18-core than with dual 22-core processors (maybe due to the slow drive mentioned in the build-gcc test?).
Anyway, let’s say that high-end Talos 2 models are in the same range as the Threadripper 2990WX and the AMD EPYC 7601. Another article will rerun the same benchmarks in their recent versions.
Multi-threaded: YES Verdict: GOOD To do:
pts/compress-7zip-1.7.1
7-Zip Compression 16.02:
Test: Compress Speed Test MIPS > Higher Is Better
Talos 2 Power9 2x 4c 40043
AMD EPYC 7551 79708
Threadripper 2990WX 85484
Core i9 7980XE 95662
AMD EPYC 7601 99574
2 x Intel Xeon Gold 6138 143505
Talos II POWER9 Dual 18-Core 158405
Talos II 2 x 22c POWER9 162969
Phoronix comment: “The 7-Zip compression performance was doing very well on the POWER9 hardware with the 22-core Talos II Lite was outperforming the 32 core EPYC 7601 processor and the dual 18-core Talos II system was outperforming the dual Xeon Gold 6138 Tyan server. 7-Zip is another workload that always scales well including with SMT systems and here the 176 threads of the Talos II paid off well for this compression test.”
A comment says: “Is there a reason why Rodinia is only ‘-O2’ (not ‘-O3’ like everything else), and for 7Zip, it seems no compile optimization at all? (Also, to make best use of the POWER9 processor, use ‘-mcpu=power9’).” That may explain discrepancies in the results. However, I did set optimization options and they brought nothing in terms of performance.
Multi-threaded: YES Verdict: GOOD To do: Check optimization options and results in a recent version of the benchmark
pts/compress-zstd-1.0.0
Zstd Compression 1.3.4:
Test: Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 Seconds < Lower Is Better
EPYC 7601 163.35
Talos 2 Power9 2x 4c 134.46
2 x Xeon Gold 6138 117.96
Talos II POWER9 Dual 4 Core 109.09
Talos II POWER9 Dual 18-Core 106.94
Phoronix comment: “POWER9 was performing extremely well in the Zstd compression benchmark. The Xeon systems were outperforming the EPYC hardware in the Zstd benchmark while the POWER9 hardware managed to beat out the Intel x86_64 CPUs in this single-thread test case.”
Multi-threaded: NO Verdict: GOOD To do:
pts/c-ray-1.1.1
C-Ray - 4K 16 Rays Per Pixel pts/c-ray-1.1.1 Seconds < Lower Is Better
Talos 2 Power9 2x 4c 9.49
Raptor Talos II 4.65
AMD EPYC 7601 3.46
2 x Intel Xeon Gold 6138 3.15
Optimization does not change anything on my machine.
I don’t know what the Raptor Talos II configuration is, which makes these results too difficult to compare, so I ran pts/c-ray-1.2.0 and obtained these results:
Talos 2 Power9 2x 4c 83.99
Core i9 7980XE 33.51
2 x Xeon Gold 6138 27.16
AMD EPYC 7601 25.36
AMD EPYC 7601 21.97 -march=native
Talos II 2 x 22c POWER9 19.14 -mcpu=power9 -mtune=power9
Threadripper 2990WX 17.97
Phoronix comment: “The C-Ray ray-tracing performance of the Talos II was in line with the AMD Ryzen Threadripper 2990WX but had come up shy of the dual EPYC 7601 server.”
Multi-threaded: YES Verdict: GOOD To do:
pts/encode-flac-1.6.0
FLAC Audio Encoding 1.3.2:
Test: WAV To FLAC Seconds < Lower Is Better
Talos II 2 x 22c POWER9 51.79
Talos II POWER9 Dual 4 Core 43.99
Talos II POWER9 Dual 18-Core 43.95
Talos II POWER9 Dual 4-Core 40.23
AMD EPYC 7551 12.71
AMD EPYC 7601 11.79
2 x Intel Xeon Gold 6138 10.27
Xeon E3-1280 v5 9.60
Optimization options do not change anything.
Phoronix comment: “For for audio encoding with FLAC and MP3 is another one of the areas where the POWER9 CPU performance is behind, but could possibly be improved with maturing POWER9 compiler support.”
The project lacked SIMD code for POWER. A patch series was written and integrated in FLAC 1.3.3, improving performance by a factor of 3.
Multi-threaded: NO Verdict: BAD To do: Check suggested improvements and measure their benefit
pts/encode-mp3-1.7.0
LAME MP3 Encoding 3.100:
Test: WAV To MP3 Seconds < Lower Is Better
Talos II 2 x 22c POWER9 75.27
Talos II POWER9 Dual 4-Core 75.57
Talos II POWER9 Dual 18-Core 67.48
AMD EPYC 7551 45.69
AMD EPYC 7601 42.67
2 x Xeon Gold 6138 32.29
Xeon E3-1280 v5 30.14
Results show that performance on POWER is very bad!
The sthbrx article says: “Due to configure options not being parsed correctly this benchmark is built without any optimisation regardless of architecture. We see a massive speedup by turning optimisations on, and a further 6-8% speedup by enabling USE_FAST_LOG (which is already enabled for Intel)”. It concludes with a x5 speedup. The dedicated article gives more details and mentions an obtained speedup of x7! On my own machine, optimization options brought the time down to about 15 seconds, which is confirmed by a recent version of the benchmark.
Multi-threaded: NO Verdict: BAD To do: Check that the proposed improvements have been integrated in the official project
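As a quick check of the speedup figures quoted above, the ratio can be computed from the two timings I noted on my own machine (75.09 seconds with pts/encode-mp3-1.7.0 and 14.309 seconds with pts/encode-mp3-1.7.4, see the notes at the end of this article). This is only a minimal sketch: the function name is mine, and it assumes a "lower is better" timing measured on the same machine.

```python
def speedup(old_seconds: float, new_seconds: float) -> float:
    """Speedup factor for a 'lower is better' timing: old time / new time."""
    if old_seconds <= 0 or new_seconds <= 0:
        raise ValueError("timings must be positive")
    return old_seconds / new_seconds


# Timings taken from the notes at the end of this article (my Talos 2).
OLD_LAME = 75.09   # pts/encode-mp3-1.7.0, seconds
NEW_LAME = 14.309  # pts/encode-mp3-1.7.4, seconds

print(f"LAME speedup: x{speedup(OLD_LAME, NEW_LAME):.2f}")
```

With these two numbers the factor is a little above 5, which is in line with the x5 conclusion of the sthbrx article.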
pts/openssl-1.11.0
OpenSSL 1.1.1:
Test: RSA 4096-bit Performance Signs Per Second > Higher Is Better
Talos 2 Power9 2x 4c 1616.9
Talos II dual 18-core 3971.9
AMD EPYC 7551 4387.4
AMD EPYC 7601 4598.4
Core i9 7980XE 4686.0
Threadripper 2990WX 5821.0
2 x Intel Xeon Gold 6138 7965.4
Phoronix comment: “The OpenSSL results also have a ways to improve, the performance on POWER9 was mixed against the AMD/Intel CPUs with the dual 18-core system failing to outperform the EPYC 7601.”
OpenSSL 1.1.0f did not include some improvements that already existed in mainline. Updating the version used should improve performance on POWER9 by a factor of 1.7.
Multi-threaded: YES Verdict: BAD To do:
pts/parboil-1.1.2
Parboil v2.5
Test: OpenMP LBM
Talos II POWER9 Dual 4 Core 113.43
AMD EPYC 7551 71.88
Talos II 2 x 22c POWER9 66.08
Talos II POWER9 Dual 18-Core 44.18
AMD EPYC 7601 37.39
2 x Xeon Gold 6138 30.18
Phoronix comment: “With the Lattice-Boltzmann Method Fluid Dynamics test case, the dual 18-core POWER9 configuration was competing with the EPYC CPUs in this round of OpenMP benchmarking.”
From the sthbrx article: “Also this benchmark is compiled without any optimisation. Recompiling with -O3 improves the results 3.2x on POWER9.”
Test: OpenMP CUTCP
Talos II 2 x 22c POWER9 9.69
AMD EPYC 7551 2.76
AMD EPYC 7601 2.61
2 x Xeon Gold 6138 2.28
Phoronix comment: “Some tests like this Distance-Cutoff Coulombic Potential test appear just not well optimized for POWER9 at this point.”
To check: whether the changes have been included in a recent version of the benchmark and whether the 3x speedup applies.
Test: OpenMP Stencil
AMD EPYC 7551 17.35
AMD EPYC 7601 14.26
Talos II 2 x 22c POWER9 10.51
2 x Xeon Gold 6138 6.01
Phoronix comment: “While in the stencil test, the Talos II system beat out both AMD EPYC systems and was mid-way to the performance of the dual Xeon Gold server.”
Multi-threaded: ? Verdict: AVERAGE To do: Focus on the OpenMP CUTCP test, which does not seem to be optimized for POWER9
pts/phpbench-1.1.5
PHPBench 0.8.1: pts/phpbench-1.1.5 [PHP Benchmark Suite] Score > Higher Is Better
Talos II POWER9 Dual 4-Core 166406
AMD EPYC 7551 365767
Talos II POWER9 Dual 18-Core 373681
AMD EPYC 7601 393659
Threadripper 2990WX 525276
2 x Xeon Gold 6138 606341
Xeon E3-1280 v5 651532
Core i9 7980XE 703666
Phoronix comment: “The Python and PHP benchmarks also show room for single-threaded performance improvements. POWER9 only came in line with the AMD EPYC hardware for the PHP language performance.”
Optimization options did not bring any visible enhancements.
Multi-threaded: NO Verdict: AVERAGE To do: Identify the source of the problem (but who would like to work on PHP?)
pts/povray-1.2.1
POV-Ray 3.7.0.7:
Test: Trace Time
Talos 2 2x 4c 93.57
Core i9 7980XE 28.29
Talos II 2 x 22c POWER9 25.28
AMD EPYC 7551 23.01
AMD EPYC 7601 22.61
2x Xeon Gold 6138 19.02
Threadripper 2990WX 17.92
Even with the benefit of multi-threading, the best Talos 2 system does not reach the performance of the AMD EPYC systems.
Multi-threaded: YES Verdict: BAD To do: Investigate …
pts/primesieve-1.4.1
Primesieve 6.2:
Test: 1e12 Prime Number Generation Seconds < Lower Is Better
Talos II POWER9 Dual 4-Core 44.84
Talos II POWER9 Dual 18-Core 18.81
Talos II 2 x 22c POWER9 16.42
EPYC 7551 12.93
EPYC 7601 12.15
2 x Xeon Gold 6138 10.63
The nominal results on POWER are not convincing, showing lower performance than AMD EPYC systems.
After a pull request by Anton Blanchard, the author of Primesieve understood the issue and made changes. This remains to be checked in a recent version of the benchmark.
Multi-threaded: YES Verdict: BAD To do: Check if changes proposed by the author have a positive and measurable impact
pts/pybench-1.1.2
PyBench 2018-02-16:
Test: Total For Average Test Times Milliseconds < Lower Is Better
Talos II 2 x 22c POWER9 4088
Talos 2 Power9 2x 4c 3671
Talos II POWER9 Dual 18-Core 1867
EPYC 7601 1538
Threadripper 2990WX 1147
2 x Xeon Gold 6138 1127
Xeon E3-1280 v5 1043
Core i9 7980XE 955
Note that I also collected a second set of results, with somewhat different values:
Talos II 2 x 22c POWER9 4859
AMD EPYC 7551 2216
AMD EPYC 7601 2086
2 x Intel Xeon Gold 6138 1395
Anyway, that does not change the order: Python on POWER9 systems is 2 or 3 times slower than on x86-64 machines (2 times slower than on EPYC-based systems).
Multi-threaded: NO Verdict: VERY BAD To do: Investigate
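The "times slower" estimate can be recomputed from the first PyBench list above. The sketch below is only an illustration: the dictionary simply copies the published totals, and the helper name is mine.

```python
# PyBench totals in milliseconds (lower is better), copied from the first list above.
PYBENCH_MS = {
    "Talos II 2 x 22c POWER9": 4088,
    "Talos II POWER9 Dual 18-Core": 1867,
    "EPYC 7601": 1538,
    "Threadripper 2990WX": 1147,
    "2 x Xeon Gold 6138": 1127,
    "Xeon E3-1280 v5": 1043,
    "Core i9 7980XE": 955,
}

X86_SYSTEMS = [name for name in PYBENCH_MS if "POWER9" not in name]


def slowdown(system: str, reference: str, results=PYBENCH_MS) -> float:
    """How many times slower `system` is than `reference` (time ratio)."""
    return results[system] / results[reference]


for talos in ("Talos II 2 x 22c POWER9", "Talos II POWER9 Dual 18-Core"):
    for ref in X86_SYSTEMS:
        print(f"{talos} vs {ref}: x{slowdown(talos, ref):.2f}")
```

On these figures, the dual 22-core system is roughly 2.7 to 4.3 times slower than the x86-64 machines, while the dual 18-core result sits between about 1.2 and 2.0 times slower.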
pts/osbench-1.0.1
Test: Create Threads us Per Event < Lower Is Better
AMD EPYC 7551 38.25
AMD EPYC 7601 30.71
Talos 2 Power9 2x 4c 27.28
Raptor Talos II 27.17
2 x Intel Xeon Gold 6138 23.07
Test: Create Processes us Per Event < Lower Is Better
Talos 2 Power9 2x 4c 74.33
AMD EPYC 7601 59.61
AMD EPYC 7551 57.95
2 x Intel Xeon Gold 6138 42.95
Raptor Talos II 29.77
Test: Memory Allocations Ns Per Event < Lower Is Better
AMD EPYC 7551 96.32
2 x Intel Xeon Gold 6138 96.05
AMD EPYC 7601 95.14
Talos 2 Power9 2x 4c 94.70
Raptor Talos II 83.03
Phoronix comment: “While lastly for now are the OSBench synthetic operating system benchmarks with the Raptor Talos II doing well against the EPYC and Xeon platforms.”
The Talos 2 performs very well, close to the 2 x Intel Xeon Gold 6138 or even better!
On my model, adding optimization options only changed the Create Processes test, with a better score of 49 instead of 74.
Multi-threaded: N/A Verdict: GOOD To do: Check optimization options, which greatly improved the Create Processes test
pts/pgbench-1.8.4
PostgreSQL pgbench 10.3:
Test: Scaling: Buffer Test - Test: Normal Load - Mode: Read Only TPS > Higher Is Better
Talos 2 Power9 2x 4c 11110
Xeon E3-1280 v5 116058
Talos 2 Power9 2x 4c optim 159835
Talos II POWER9 Dual 4 Core 222683
EPYC 7601 399625
Talos II Lite POWER9 22 Core 442106
Threadripper 2990WX 472250
Talos II 2 x 22c POWER9 544186 -mcpu=power9 -mtune=power9 (-march=native on x86_64)
Talos II POWER9 Dual 18-Core 574297
2 x Xeon Gold 6138 587539
Test: Scaling: Buffer Test - Test: Normal Load - Mode: Read Write TPS > Higher Is Better
Talos 2 Power9 2x 4c 542
Xeon E3-1280 v5 3803
Talos II POWER9 Dual 4 Core 6381
Talos II POWER9 Dual 18-Core 6451
EPYC 7601 6473
2 x Xeon Gold 6138 6588
Talos II POWER9 Dual 4-Core 14457
Talos 2 Power9 2x 4c optim 14507
Optimization clearly boosts the performance!
Phoronix comment: “The dual 18-core POWER9 system was managing to compete with the dual Xeon Gold server for the PostgreSQL database benchmarking.”
Multi-threaded: YES Verdict: GOOD To do: Check optimization options, which provide a boost
pts/redis-1.1.0
Test: GET Requests Per Second > Higher Is Better
Talos 2 Power9 2x 4c 904977
Raptor Talos II 1049994
Talos 2 Power9 2x 4c optim 1053740
AMD EPYC 7601 1703353
2 x Intel Xeon Gold 6138 2515784
Test: SET Requests Per Second > Higher Is Better
Talos 2 Power9 2x 4c optim 553403
Raptor Talos II 606874
Talos 2 Power9 2x 4c 615384
AMD EPYC 7601 1195935
2 x Intel Xeon Gold 6138 1744256
There is almost no CPU activity!
Multi-threaded: NO Verdict: BAD To do: Check CPU activity
pts/rodinia-1.2.2
Rodinia - OpenMP LavaMD pts/rodinia-1.2.2: problem installing the OpenCL packages, and also a checksum problem on the rodinia_2.4.tar.bz2 archive.
AMD EPYC 7601 13.26
Talos II 2 x 22c POWER9 13.22
AMD EPYC 7551 12.71
2 x Intel Xeon Gold 6138 7.02
Not many results were collected, so let’s base our opinion on the Phoronix comment: “First up was the Rodinia OpenMP benchmark where the Talos II with dual 22-core processors (44 cores / 176 threads) had the performance aligned with the Core i9 7980XE, which in turn were behind the AMD Ryzen Threadripper 2 WX series performance. With the Parboil and Rodinia scientific tests, the dual 22-core POWER9 system was just behind the EPYC 7551 for performance.”
A comment says: “Is there a reason why Rodinia is only ‘-O2’ (not ‘-O3’ like everything else), and for 7Zip, it seems no compile optimization at all? (Also, to make best use of the POWER9 processor, use ‘-mcpu=power9’).”
Multi-threaded: ? Verdict: AVERAGE To do: Try a recent version and check optimization options
pts/rust-prime-1.0.0
Rust Prime Benchmark:
Test: Prime Number Test To 200,000,000 Seconds < Lower Is Better
Talos 2 Power9 2x 4c 13.71
Talos 2 Power9 2x 4c optim 13.64
Threadripper 2990WX 12.49
Core i9 7980XE 8.18
2 x Intel Xeon Gold 6138 4.48
Talos II 2 x 22c POWER9 3.64
Phoronix comment: “Rustlang performance is looking good on POWER9. The Rust Mandelbrot benchmark performed poorly with POWER9, but that certainly wasn’t the case with the Rustlang Prime benchmark.”
Multi-threaded: YES Verdict: GOOD To do: Run Rust Mandelbrot benchmark that behaves poorly, in addition to Prime benchmark
pts/scikit-learn-1.0.1
It failed to install on my machine:
Scikit-Learn 0.17.1:
pts/scikit-learn-1.0.1
The test quit with a non-zero exit status.
E: ModuleNotFoundError: No module named 'sklearn.externals.six'
So I got results from only one source:
Talos II 2 x 22c POWER9 229.62
Talos II POWER9 Dual 18-Core 227.39
2 x Intel Xeon Gold 6138 176.07
EPYC 7601 144.51
Phoronix comment: “The SciKit-Learn performance could also be better improved for POWER9, possibly via further software optimizations.”
The sthbrx article says that the benchmark uses libblas, a basic implementation among others, with no optimization for POWER9. Alternative libraries bring major speedups.
Multi-threaded: ? Verdict: BAD To do: Run a more recent version of the benchmark and analyze
pts/stockfish-1.1.1
v2014-11-26
Test: Total Time Nodes Per Second > Higher Is Better
Talos II POWER9 Dual 4 Core 21485986
Core i9 7980XE 46289588
EPYC 7601 58469775
Threadripper 2990WX 67300757
2 x Xeon Gold 6138 69928856
Talos II POWER9 Dual 18-Core 73165064
Talos II 2 x 22c POWER9 79137127
2 x EPYC 7601 100932062
I don’t remember where I found other results with other metrics but they showed Talos 2 between both EPYC models:
AMD EPYC 7551 5032 -msse -msse3 -mpopcnt
Talos II 2 x 22c POWER9 4915 -mcpu=power9 -mtune=power9
AMD EPYC 7601 4474 -msse -msse3 -mpopcnt
2 x Xeon Gold 6138 3343 -msse -msse3 -mpopcnt
Phoronix comment: “The Stockfish chess benchmark was running very well on POWER9 where the 22-core Talos II Lite was just behind the EPYC 7601, the dual quad-core POWER9 system well ahead of the other quad and octa core Intel Xeons, and the dual 18-core box outperforming the Xeon Gold 6138 by a small margin.”
Phoronix comment: “With the multi-threaded Stockfish chess benchmark using pthreads, the dual socket POWER9 system came up short of the dual EPYC 7601 Dell PowerEdge server.”
This last comment seems to mean that the Talos 2 performs very well, but since the other results put it between both EPYC models, and because the dual 4-core model has poor results, I chose to give it an average score.
Multi-threaded: YES Verdict: AVERAGE To do: Confirm the heterogeneous results on a recent version of the benchmark, look at optimization options
pts/tinymembench-1.0.1
Tinymembench 2018-05-28:
Test: Standard Memcpy MB/s > Higher Is Better
2 x Xeon Gold 6138 6015.50
Talos 2 Power9 2x 4c default 10662.90
Talos 2 Power9 2x 4c optim 10676.10
Talos II POWER9 Dual 4 Core 12418.40
EPYC 7601 12613.20
Xeon E3-1280 v5 12877.90
Talos II POWER9 Dual 18-Core 14515.40
Talos II 2 x 22c POWER9 15453.00
Phoronix comment: “The Tinymembench performance on POWER9 was looking good for memory copy speed.”
Surprisingly, the 2 x Xeon Gold 6138 system loses this benchmark and … the Talos 2 wins!
This slow-running benchmark saw no improvement when compiled with optimization options.
Multi-threaded: NO Verdict: GOOD To do: Nothing
pts/x264-2.3.2
x264 2018-02-05:
Test: H.264 Video Encoding Frames Per Second > Higher Is Better
Talos II POWER9 Dual 4-Core 29.14
Xeon E3-1280 v5 42.24
Talos II 2 x 22c POWER9 43.72
Talos II POWER9 Dual 18-Core 51.22
AMD EPYC 7551 101.52
2 x Xeon Gold 6138 125.21
EPYC 7601 126.39
Phoronix comment: “The x264 video encoding program is one test showing it’s not too well optimized right now for POWER9”. And a bit later: “Similar to our first POWER9 benchmarking session back in April, the x264 performance and more broadly the multimedia CPU performance on POWER9 still could be much better optimized. The POWER9 performance was quite low compared to the x86_64 competition.”
Not an easy project to improve.
Multi-threaded: YES Verdict: BAD To do: That would require huge efforts …
system/octave-benchmark-1.0.0
GNU Octave Benchmark 4.4.1:
2 x Xeon Gold 6138 23.47
AMD EPYC 7551 22.66
AMD EPYC 7601 20.92
Threadripper 2990WX 16.78
Talos II 2 x 22c POWER9 14.92
Phoronix comment: “With the GNU Octave software as a MATLAB performance, the Talos II squeezed in front of the Threadripper systems for this single-core test.”
Multi-threaded: YES Verdict: GOOD To do:
system/blender-1.0.2
Blender 2.79:
Test: Blend File: Classroom - Compute: CPU-Only
Xeon E3-1280 v5 1656.82
Talos II POWER9 Dual 4-Core 1391.41
Talos II POWER9 Dual 18-Core 829.10
EPYC 7601 504.13
2 x Xeon Gold 6138 415.84
Test: Blend File: Pabellon Barcelona - Compute: CPU-Only
Xeon E3-1280 v5 1745.08
Talos II POWER9 Dual 4-Core 3057.45
Talos II POWER9 Dual 18-Core 1354.60
EPYC 7601 972.62
2 x Xeon Gold 6138 787.52
Phoronix comment: “The Blender 3D modeling performance on the CPU also leaves more room for optimization on the POWER9 front.”
“failed to use more than 15 threads, even when “-t 128” was added to the Blender command line”
Multi-threaded: YES Verdict: BAD To do: Check if the project is fixed to use more than 16 threads
Conclusion
That was a lot of work for me … at the risk of reaching a conclusion almost identical to that of the very first Phoronix article. Anyway, it allowed me to dive into these topics and to reconnect with my too-long-abandoned Talos 2. The synthesis below will guide the next step.
Benchmark | MT | Verdict | To do |
---|---|---|---|
pts/build-gcc-1.0.0 | YES | AVERAGE | |
pts/build-llvm-1.1.0 | YES | GOOD | |
pts/compress-7zip-1.7.1 | YES | GOOD | Check optim options and results in recent version of the benchmark |
pts/compress-zstd-1.0.0 | NO | GOOD | |
pts/c-ray-1.1.1 | YES | GOOD | |
pts/encode-flac-1.6.0 | NO | BAD | Check suggested improvements and measure their benefit |
pts/encode-mp3-1.7.0 | NO | BAD | Check that proposed improvements have been integrated in the official project |
pts/openssl-1.11.0 | YES | BAD | |
pts/parboil-1.1.2 | ? | AVERAGE | Focus on OpenMP CUTCP, which does not seem to be optimized for POWER9 |
pts/phpbench-1.1.5 | NO | AVERAGE | Identify the source of the problem |
pts/povray-1.2.1 | YES | BAD | Investigate, profile … |
pts/primesieve-1.4.1 | YES | BAD | Check if changes proposed have a positive and measurable impact |
pts/pybench-1.1.2 | NO | VERY BAD | Investigate |
pts/osbench-1.0.1 | N/A | GOOD | Check optim options, they greatly improved the Create Processes test |
pts/pgbench-1.8.4 | YES | GOOD | Check optim options, which provide a boost |
pts/redis-1.1.0 | NO | BAD | Check CPU activity |
pts/rodinia-1.2.2 | ? | AVERAGE | Try a recent version and check optimization options |
pts/rust-prime-1.0.0 | YES | GOOD | Run Mandelbrot that behaves poorly, in addition to Prime benchmark |
pts/scikit-learn-1.0.1 | ? | BAD | Run a more recent version of the benchmark and analyze |
pts/stockfish-1.1.1 | YES | AVERAGE | Confirm heterogeneous results on a recent version and check optim options |
pts/tinymembench-1.0.1 | NO | GOOD | Nothing |
pts/x264-2.3.2 | YES | BAD | That would require huge efforts … |
system/octave-benchmark-1.0.0 | YES | GOOD | |
system/blender-1.0.2 | YES | BAD | Check if the project is fixed to use more than 16 threads |
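To get an overall picture from the synthesis table above, the verdicts can simply be tallied. The following is a small sketch; the dictionary copies the Verdict column of the table, and the variable names are mine.

```python
from collections import Counter

# Verdict column of the synthesis table above, keyed by benchmark.
VERDICTS = {
    "pts/build-gcc-1.0.0": "AVERAGE",
    "pts/build-llvm-1.1.0": "GOOD",
    "pts/compress-7zip-1.7.1": "GOOD",
    "pts/compress-zstd-1.0.0": "GOOD",
    "pts/c-ray-1.1.1": "GOOD",
    "pts/encode-flac-1.6.0": "BAD",
    "pts/encode-mp3-1.7.0": "BAD",
    "pts/openssl-1.11.0": "BAD",
    "pts/parboil-1.1.2": "AVERAGE",
    "pts/phpbench-1.1.5": "AVERAGE",
    "pts/povray-1.2.1": "BAD",
    "pts/primesieve-1.4.1": "BAD",
    "pts/pybench-1.1.2": "VERY BAD",
    "pts/osbench-1.0.1": "GOOD",
    "pts/pgbench-1.8.4": "GOOD",
    "pts/redis-1.1.0": "BAD",
    "pts/rodinia-1.2.2": "AVERAGE",
    "pts/rust-prime-1.0.0": "GOOD",
    "pts/scikit-learn-1.0.1": "BAD",
    "pts/stockfish-1.1.1": "AVERAGE",
    "pts/tinymembench-1.0.1": "GOOD",
    "pts/x264-2.3.2": "BAD",
    "system/octave-benchmark-1.0.0": "GOOD",
    "system/blender-1.0.2": "BAD",
}

tally = Counter(VERDICTS.values())
for verdict in ("GOOD", "AVERAGE", "BAD", "VERY BAD"):
    print(f"{verdict:9s} {tally[verdict]:2d} / {len(VERDICTS)}")
```

Out of 24 benchmarks, this gives 9 GOOD, 5 AVERAGE, 9 BAD and 1 VERY BAD.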
Future actions will depend on the verdict:
- GOOD: when the Talos 2 is in first place, I will just check that there is no regression.
I saw in the generated web page that other benchmarks run on the Talos 2 used -mtune=power9 -mcpu=power9 or even, for PostgreSQL, -O3 -mtune=power9 -mcpu=power9. Environment details:
- Core i9 7980XE: CXXFLAGS=-O3 -march=native CFLAGS=-O3 -march=native
- Threadripper 2990WX: CXXFLAGS=-O3 -march=native CFLAGS=-O3 -march=native
- Talos II 2 x 22c POWER9: CXXFLAGS=-O3 -mtune=power9 -mcpu=power9 CFLAGS=-O3 -mtune=power9 -mcpu=power9
This 44-core configuration is intended to compete with the likes of the AMD Threadripper and Intel Core i9 families, and it did manage to do so, coming out ahead of the Threadripper 2990WX and Core i9 7980XE in a majority of the benchmarks.
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
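Before running a benchmark, it can be useful to verify that the governor was really applied to every CPU. Here is a minimal sketch that reads the same sysfs files as the command above; the function names are mine, and it assumes a Linux kernel that exposes cpufreq for each CPU.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(Path(cpu_root).glob(pattern)):
        # path is .../cpuN/cpufreq/scaling_governor, so the CPU name is two levels up.
        governors[path.parent.parent.name] = path.read_text().strip()
    return governors


def all_performance(governors):
    """True only if at least one CPU was found and all use 'performance'."""
    return bool(governors) and all(g == "performance" for g in governors.values())


if __name__ == "__main__":
    found = read_governors()
    if all_performance(found):
        print(f"{len(found)} CPUs, all using the performance governor")
    else:
        print("Not all CPUs use the performance governor:", found)
```

The `cpu_root` parameter only exists so that the check can be pointed at another directory, for example when testing it.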
Except for stockfish and osbench, results are far from those of the other machines: AMD EPYC 7551, AMD EPYC 7601 and, even more so, 2 x Intel Xeon Gold 6138.
Annexes
References of Phoronix articles related to Talos 2:
- 2018-04-04: https://www.phoronix.com/review/power9-epyc-xeon phoronix-test-suite benchmark 1804049-AR-POWERTALO23
- 2018-09-25: https://www.phoronix.com/review/power9-talos-2 phoronix-test-suite benchmark 1806251-AR-LINUXCPUS06
- 2018-11-08: https://www.phoronix.com/review/power9-threadripper-core9 phoronix-test-suite benchmark 1811068-SK-TALOS205952
- 2018-11-27: https://www.phoronix.com/review/power9-x86-servers
- 2019-08-19: https://www.phoronix.com/review/rome-power9-arm
Notes for the next article:
About the methodology:
- Define a set of benchmarks, at least the same as in this article
- Run the default configuration
- List everything that still behaves badly
- Compare results on my machine, using the initial article as a baseline
- Check and set optimization options
- Evaluate effort and potential benefit
pts/encode-flac-1.8.1 45.592 instead of 40.066 with pts/encode-flac-1.6.0 !!
Tried configuring with --disable-altivec and the result is about the same: 45.307
pts/encode-mp3-1.7.4 14.309 instead of 75.09 with pts/encode-mp3-1.7.0 !!
Timed LLVM Compilation 16.0: pts/build-llvm-1.5.0 [Build System: Unix Makefiles]
Average: 1169.625 Seconds
Deviation: 1.55%
OpenSSL 3.3: pts/openssl-3.3.0 [Algorithm: RSA4096]
Povray pts/povray-1.2.1:
Test Installation 1 of 1
1 File Needed [44.75 MB / 2 Minutes]
File Found: povray-3.7.0.7.tar.xz [44.75MB]
Approximate Install Size: 172 MB
Estimated Install Time: 10 Seconds
Installing Test @ 23:28:27
The installer exited with a non-zero exit status.
ERROR: C compiler cannot create executables
LOG: ~/.phoronix-test-suite/installed-tests/pts/povray-1.2.1/install-failed.log
[PROBLEM] pts/povray-1.2.1 is not installed.