Talos 2 performance evaluated in 2018-2019
Let’s go back in time… Before I had my own Talos 2 machine, Phoronix published some benchmarks in 2018 and 2019. I remembered that some of them gave good results but that a few were not at the expected level of performance. On the other hand, with benchmark reports scattered here and there across various configurations, I had forgotten which machines the Talos 2 was competing against.
Also, after the initial articles, some fixes were made very quickly and proposed to the projects concerned. An article, Improving performance of Phoronix benchmarks on POWER9 (referred to below as the sthbrx article), analyzed some of the benchmarks run in the initial Phoronix article. It focused only on those that did not perform well on Talos 2: LBM Parboil, x264 video encoding, Primesieve, LAME, FLAC, OpenSSL, Scikit-Learn and Blender. The article describes the situation and suggests changes. Note that the changes were sometimes obvious. For example, it turned out that one benchmark was missing an optimization option!
Let’s remember that some of the Phoronix benchmarks showed the Talos 2 performing well, for example Stockfish, LLVM compilation, 7-Zip and Zstd compression, Tinymembench and PostgreSQL. We will come back to them in detail.
So it was time to refresh and summarize all of that. In the end, the investigation will show whether some benchmarks still need attention for fixes or improvements.
Comparing hardware
I started by reading all the articles. Thanks to the benchmark identifiers given in some of them, I was able to run these old test suites to get a snapshot of my own configuration. That also let me see what works and what doesn't, practice with the phoronix-test-suite tool, etc.
I also collected remarks from the articles and from their reader comments.
I kept a short list of the systems that appear most often in these articles, in theory from least to most powerful:
| Processor | Year | Cores/Threads | Base Freq | Max Freq | Cache | TDP | Memory Support |
|---|---|---|---|---|---|---|---|
| Intel Xeon E3-1280 v5 (Skylake) | Q4-2015 | 4/8 | 3.70 GHz | 4.00 GHz | 8 MB | 80 W | DDR4-2133, up to 64 GB |
| Intel Core i9-7980XE | Q3-2017 | 18/36 | 2.60 GHz | 4.20 GHz | 24 MB | 165 W | DDR4-2666, up to 128 GB |
| Intel Xeon Gold 6138 | Q3-2017 | 20/40 | 2.00 GHz | 3.70 GHz | 28 MB | 125 W | DDR4-2666, up to 768 GB |
| AMD EPYC 7551 | Q2-2017 | 32/64 | 2.00 GHz | 3.00 GHz | 64 MB | 180 W | DDR4-2666, octa-channel |
| AMD EPYC 7601 | Q2-2017 | 32/64 | 2.20 GHz | 3.20 GHz | 64 MB | 180 W | DDR4-2666, octa-channel |
| AMD Ryzen Threadripper 2990WX | Q3-2018 | 32/64 | 3.00 GHz | 4.20 GHz | 64 MB | 250 W | DDR4-2933, quad-channel |
| IBM POWER9 (dual 22-core) | Q4-2017 | 44/176 | 2.80 GHz | 3.40 GHz | 120 MB | N/A | DDR4, up to 16 TB |
So, still in theory, the big POWER9 configurations should compete with (and even beat) all of these systems except the dual Xeon Gold 6138 setup.
Note that I own a Talos 2 dual 4-core, and its results are also included in this article (labelled “Talos 2 Power9 2x 4c”).
Results of old benchmarks (2018 and 2019)
The results below compare the Talos 2 with different machines and also with other Talos 2 variants. Note that each list is sorted by increasing performance (the best machine at the end).
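Some tests report times (lower is better) and others report throughput (higher is better). Keeping every list in the same “best at the end” order therefore depends on the direction of the metric. Here is a minimal Python sketch of that sorting. It assumes each result is a (machine, value) pair, and the function name `sort_worst_to_best` is my own, hypothetical choice:

```python
from typing import List, Tuple

Result = Tuple[str, float]  # (machine label, measured value)


def sort_worst_to_best(results: List[Result], lower_is_better: bool) -> List[Result]:
    """Order results so that the best machine ends up last.

    For time metrics (seconds), a larger value is worse, so sort descending.
    For throughput metrics (MIPS, TPS, ...), a smaller value is worse,
    so sort ascending.
    """
    return sorted(results, key=lambda r: r[1], reverse=lower_is_better)


if __name__ == "__main__":
    # Timed GCC compilation results quoted from Phoronix (seconds, lower is better)
    gcc = [
        ("AMD EPYC 7601", 707.34),
        ("2 x Xeon Gold 6138", 591.32),
        ("Talos II 2 x 22c POWER9", 1070.70),
        ("AMD EPYC 7551", 926.08),
    ]
    for machine, seconds in sort_worst_to_best(gcc, lower_is_better=True):
        print(f"{machine:<28} {seconds:>8.2f}")
```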
pts/build-gcc-1.0.0
Timed GCC Compilation 7.2:
On my Talos 2, this old version fails. At installation time, there is a message `No rule to make 'defconfig'`, and then running the test gives:
pts/build-gcc-1.0.0 [Time To Compile]
E: ../.././gcc/match.pd:120:1 error: expected (, got NAME
So below are only the results provided by Phoronix:
Test: Time To Compile (Seconds < Lower Is Better)
Talos II 2 x 22c POWER9 1070.70
AMD EPYC 7551 926.08
AMD EPYC 7601 707.34
2 x Xeon Gold 6138 591.32
Phoronix wrote: “Keep in mind the Talos II Secure Workstation was limited to a slow hard drive for this initial testing, but there are some build time references for those curious about the potential of Talos II serving as a POWER build platform.” Given these results, let’s say that the Talos 2 is fairly close to the EPYC 7551.
Multi-threaded: YES
Verdict: AVERAGE
To do:
pts/build-llvm-1.1.0
Timed LLVM Compilation 6.0.1:
Test: Time To Compile (Seconds < Lower Is Better)
Talos 2 Power9 2x 4c 535.10
Talos II POWER9 Dual 4-Core 354.23
AMD EPYC 7551 247.00
AMD EPYC 7601 236.00
Core i9 7980XE 227.00
Threadripper 2990WX 221.00
Talos II 2 x 22c POWER9 183.00
AMD EPYC 7601 171.58
2x EPYC 7601 149.00
Talos II POWER9 Dual 18-Core 141.79
2 x Intel Xeon Gold 6138 127.08
Some results are strange. There are two very different results for the Talos 2 dual 4-core model and two different results for the EPYC 7601. The Talos 2 is also faster with dual 18-core processors than with dual 22-core ones (maybe because of the slow drive mentioned in the build-gcc test?).
Anyway, let’s say that the high-end Talos 2 models are in the same range as the Threadripper 2990WX and the AMD EPYC 7601. In another article, we will see how recent versions of the same benchmarks behave.
Multi-threaded: YES
Verdict: GOOD
To do:
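Several LLVM entries above refer to the same hardware model but are far apart. One way to put a number on that gap is to divide the slowest run by the fastest one for each model. The small sketch below does this with the LLVM timings copied from the list. The helper name and the pairing of labels are my own choices for illustration:

```python
from typing import Sequence


def spread(times_s: Sequence[float]) -> float:
    """Ratio between the slowest and the fastest run of the same machine.

    1.0 means perfectly consistent results; 1.5 means the slowest run
    took 50% longer than the fastest one.
    """
    if not times_s or min(times_s) <= 0:
        raise ValueError("need at least one positive timing")
    return max(times_s) / min(times_s)


# LLVM build times (seconds) reported above for the same hardware model
dual_4core = [535.10, 354.23]  # "Talos 2 Power9 2x 4c" vs "Talos II POWER9 Dual 4-Core"
epyc_7601 = [236.00, 171.58]   # the two single EPYC 7601 entries

print(f"Talos 2 dual 4-core: {spread(dual_4core):.2f}")  # 1.51
print(f"EPYC 7601:           {spread(epyc_7601):.2f}")   # 1.38
```

In other words, the slowest dual 4-core run took about 51% longer than the fastest one. That is far beyond normal run-to-run noise, which supports treating these entries with caution.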
pts/compress-7zip-1.7.1
7-Zip Compression 16.02:
Test: Compress Speed Test (MIPS > Higher Is Better)
Talos 2 Power9 2x 4c 40043
AMD EPYC 7551 79708
Threadripper 2990WX 85484
Core i9 7980XE 95662
AMD EPYC 7601 99574
2 x Intel Xeon Gold 6138 143505
Talos II POWER9 Dual 18-Core 158405
Talos II 2 x 22c POWER9 162969
Phoronix comment: “The 7-Zip compression performance was doing very well on the POWER9 hardware with the 22-core Talos II Lite was outperforming the 32 core EPYC 7601 processor and the dual 18-core Talos II system was outperforming the dual Xeon Gold 6138 Tyan server. 7-Zip is another workload that always scales well including with SMT systems and here the 176 threads of the Talos II paid off well for this compression test.”
A reader comment says: “Is there a reason why Rodinia is only ‘-O2’ (not ‘-O3’ like everything else), and for 7Zip, it seems no compile optimization at all? (Also, to make best use of the POWER9 processor, use ‘-mcpu=power9’).” That may explain discrepancies in the results. However, the optimization options I set myself brought no performance gain.
Multi-threaded: YES
Verdict: GOOD
To do: Check optimization options and results in a recent version of the benchmark
pts/compress-zstd-1.0.0
Zstd Compression 1.3.4:
Test: Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 (Seconds < Lower Is Better)
EPYC 7601 163.35
Talos 2 Power9 2x 4c 134.46
2 x Xeon Gold 6138 117.96
Talos II POWER9 Dual 4 Core 109.09
Talos II POWER9 Dual 18-Core 106.94
Phoronix comment: “POWER9 was performing extremely well in the Zstd compression benchmark. The Xeon systems were outperforming the EPYC hardware in the Zstd benchmark while the POWER9 hardware managed to beat out the Intel x86_64 CPUs in this single-thread test case.”
Multi-threaded: NO
Verdict: GOOD
To do:
pts/c-ray-1.1.1
C-Ray - 4K 16 Rays Per Pixel
pts/c-ray-1.1.1 (Seconds < Lower Is Better)
Talos 2 Power9 2x 4c 9.49
Raptor Talos II 4.65
AMD EPYC 7601 3.46
2 x Intel Xeon Gold 6138 3.15
Optimization options do not change anything on my machine.
I don’t know which configuration this Raptor Talos II is… That makes the comparison too difficult, so I ran pts/c-ray-1.2.0 and collected these results:
Talos 2 Power9 2x 4c 83.99
Core i9 7980XE 33.51
2 x Xeon Gold 6138 27.16
AMD EPYC 7601 25.36
AMD EPYC 7601 21.97 -march=native
Talos II 2 x 22c POWER9 19.14 -mcpu=power9 -mtune=power9
Threadripper 2990WX 17.97
Phoronix comment: “The C-Ray ray-tracing performance of the Talos II was in line with the AMD Ryzen Threadripper 2990WX but had come up shy of the dual EPYC 7601 server.”
Multi-threaded: YES
Verdict: GOOD
To do:
pts/encode-flac-1.6.0
FLAC Audio Encoding 1.3.2:
Test: WAV To FLAC (Seconds < Lower Is Better)
Talos II 2 x 22c POWER9 51.79
Talos II POWER9 Dual 4 Core 43.99
Talos II POWER9 Dual 18-Core 43.95
Talos II POWER9 Dual 4-Core 40.23
AMD EPYC 7551 12.71
AMD EPYC 7601 11.79
2 x Intel Xeon Gold 6138 10.27
Xeon E3-1280 v5 9.60
Optimization options do not change anything.
Phoronix comment: “For audio encoding with FLAC and MP3 is another one of the areas where the POWER9 CPU performance is behind, but could possibly be improved with maturing POWER9 compiler support.”
The project lacked SIMD code for POWER. A patch series was written and integrated into FLAC 1.3.3, improving performance by a factor of 3.
Multi-threaded: NO
Verdict: BAD
To do: Check suggested improvements and measure their benefit
pts/encode-mp3-1.7.0
LAME MP3 Encoding 3.100:
Test: WAV To MP3 Seconds < Lower Is Better
Talos II 2 x 22c POWER9 75.27
Talos II POWER9 Dual 4-Core 75.57
Talos II POWER9 Dual 18-Core 67.48
AMD EPYC 7551 45.69
AMD EPYC 7601 42.67
2 x Xeon Gold 6138 32.29
Xeon E3-1280 v5 30.14
Results show that performance on POWER is very bad!
The sthbrx article says: “Due to configure options not being parsed correctly this benchmark is built without any optimisation regardless of architecture. We see a massive speedup by turning optimisations on, and a further 6-8% speedup by enabling USE_FAST_LOG (which is already enabled for Intel)”. It concludes with a 5x speedup. A dedicated article gives the details and mentions a 7x speedup! On my own machine, optimization options brought the time down to roughly 15 seconds, and a recent version of the benchmark confirms it.
Multi-threaded: NO
Verdict: BAD
To do: Check proposed improvements have been integrated in the official project
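The speedup factors quoted in this article (x5 and x7 here, a factor of 3 for FLAC, 1.7 for OpenSSL) are all ratios. Their direction, however, depends on whether the metric is a time or a throughput. A “6-8% speedup” is simply a ratio of 1.06 to 1.08. A minimal sketch with a hypothetical helper name, using illustrative values rather than measured ones:

```python
def speedup(before: float, after: float, lower_is_better: bool) -> float:
    """How many times better `after` is than `before`.

    Time metrics (seconds, lower is better): speedup = before / after.
    Throughput metrics (signs/s, frames/s, TPS, higher is better):
    speedup = after / before.
    """
    if before <= 0 or after <= 0:
        raise ValueError("results must be positive")
    return before / after if lower_is_better else after / before


# Illustrative values only: a 75 s encode that drops to 15 s
print(speedup(75.0, 15.0, lower_is_better=True))        # 5.0
# Illustrative values only: 1000 signs/s rising to 1700 signs/s
print(speedup(1000.0, 1700.0, lower_is_better=False))   # 1.7
```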
pts/openssl-1.11.0
OpenSSL 1.1.1:
Test: RSA 4096-bit Performance Signs Per Second > Higher Is Better
Talos 2 Power9 2x 4c 1616.9
Talos II dual 18-core 3971.9
AMD EPYC 7551 4387.4
AMD EPYC 7601 4598.4
Core i9 7980XE 4686.0
Threadripper 2990WX 5821.0
2 x Intel Xeon Gold 6138 7965.4
Phoronix comment: “The OpenSSL results also have a ways to improve, the performance on POWER9 was mixed against the AMD/Intel CPUs with the dual 18-core system failing to outperform the EPYC 7601.”
OpenSSL 1.1.0f did not include some improvements already present in mainline. Updating to a newer version should improve performance on POWER9 by a factor of 1.7.
Multi-threaded: YES
Verdict: BAD
To do:
pts/parboil-1.1.2
Parboil v2.5
Test: OpenMP LBM
Talos II POWER9 Dual 4 Core 113.43
AMD EPYC 7551 71.88
Talos II 2 x 22c POWER9 66.08
Talos II POWER9 Dual 18-Core 44.18
AMD EPYC 7601 37.39
2 x Xeon Gold 6138 30.18
Phoronix comment: “With the Lattice-Boltzmann Method Fluid Dynamics test case, the dual 18-core POWER9 configuration was competing with the EPYC CPUs in this round of OpenMP benchmarking.”
From the sthbrx article: “Also this benchmark is compiled without any optimisation. Recompiling with -O3 improves the results 3.2x on POWER9.”
Test: OpenMP CUTCP
Talos II 2 x 22c POWER9 9.69
AMD EPYC 7551 2.76
AMD EPYC 7601 2.61
2 x Xeon Gold 6138 2.28
Phoronix comment: “Some tests like this Distance-Cutoff Coulombic Potential test appear just not well optimized for POWER9 at this point.”
To be checked: whether these changes have been included in a recent version of the benchmark, and whether the 3x speedup applies.
Test: OpenMP Stencil
AMD EPYC 7551 17.35
AMD EPYC 7601 14.26
Talos II 2 x 22c POWER9 10.51
2 x Xeon Gold 6138 6.01
Phoronix comment: “While in the stencil test, the Talos II system beat out both AMD EPYC systems and was mid-way to the performance of the dual Xeon Gold server.”
Multi-threaded: ?
Verdict: AVERAGE
To do: Focus on the `OpenMP CUTCP` test, which does not seem to be optimized for POWER9.
pts/phpbench-1.1.5
PHPBench 0.8.1: pts/phpbench-1.1.5 [PHP Benchmark Suite] Score > Higher Is Better
Talos II POWER9 Dual 4-Core 166406
AMD EPYC 7551 365767
Talos II POWER9 Dual 18-Core 373681
AMD EPYC 7601 393659
Threadripper 2990WX 525276
2 x Xeon Gold 6138 606341
Xeon E3-1280 v5 651532
Core i9 7980XE 703666
Phoronix comment: “The Python and PHP benchmarks also show room for single-threaded performance improvements. POWER9 only came in line with the AMD EPYC hardware for the PHP language performance.”
Optimization options did not bring any visible enhancements.
Multi-threaded: NO
Verdict: AVERAGE
To do: Identify the source of the problem (but who would like to work on PHP?)
pts/povray-1.2.1
POV-Ray 3.7.0.7:
Test: Trace Time
Talos 2 2x 4c 93.57
Core i9 7980XE 28.29
Talos II 2 x 22c POWER9 25.28
AMD EPYC 7551 23.01
AMD EPYC 7601 22.61
2x Xeon Gold 6138 19.02
Threadripper 2990WX 17.92
Even with the benefit of multi-threading, the best Talos 2 system does not reach the performance of the AMD EPYC systems.
Multi-threaded: YES
Verdict: BAD
To do: Investigate ...
pts/primesieve-1.4.1
Primesieve 6.2:
Test: 1e12 Prime Number Generation Seconds < Lower Is Better
Talos II POWER9 Dual 4-Core 44.84
Talos II POWER9 Dual 18-Core 18.81
Talos II 2 x 22c POWER9 16.42
EPYC 7551 12.93
EPYC 7601 12.15
2 x Xeon Gold 6138 10.63
The baseline results on POWER are not convincing, showing lower performance than the AMD EPYC systems.
After a pull request by Anton Blanchard, the author understood the issue and made changes. This remains to be checked with a recent version of the benchmark.
Multi-threaded: YES
Verdict: BAD
To do: Check if changes proposed by the author have a positive and measurable impact
pts/pybench-1.1.2
PyBench 2018-02-16:
Test: Total For Average Test Times Milliseconds < Lower Is Better
Talos II 2 x 22c POWER9 4088
Talos 2 Power9 2x 4c 3671
Talos II POWER9 Dual 18-Core 1867
EPYC 7601 1538
Threadripper 2990WX 1147
2 x Xeon Gold 6138 1127
Xeon E3-1280 v5 1043
Core i9 7980XE 955
Note that I also collected another set of results, which differ somewhat:
Talos II 2 x 22c POWER9 4859
AMD EPYC 7551 2216
AMD EPYC 7601 2086
2 x Intel Xeon Gold 6138 1395
Anyway, that does not change the ranking: Python on POWER9 systems is 2 to 3 times slower than on x86-64 machines (about 2 times slower than on EPYC-based systems).
Multi-threaded: NO
Verdict: VERY BAD
To do: Investigate
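The “2 to 3 times slower” estimate can be checked against the second set of PyBench figures above (milliseconds, lower is better). The check divides the POWER9 time by each x86-64 time. A small sketch with those values copied in:

```python
# Second set of PyBench results quoted above (milliseconds, lower is better)
power9_22c_ms = 4859
x86_ms = {
    "AMD EPYC 7551": 2216,
    "AMD EPYC 7601": 2086,
    "2 x Intel Xeon Gold 6138": 1395,
}

# A ratio of N means the POWER9 system needs N times longer
slowdown = {name: power9_22c_ms / ms for name, ms in x86_ms.items()}

for name, ratio in slowdown.items():
    print(f"{name:<26} {ratio:.2f}x slower")  # 2.19x, 2.33x, 3.48x
```

This gives about 2.2x and 2.3x against the two EPYC systems and about 3.5x against the dual Xeon Gold 6138, in line with the estimate.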
pts/osbench-1.0.1
Test: Create Threads us Per Event < Lower Is Better
AMD EPYC 7551 38.25
AMD EPYC 7601 30.71
Talos 2 Power9 2x 4c 27.28
Raptor Talos II 27.17
2 x Intel Xeon Gold 6138 23.07
Test: Create Processes us Per Event < Lower Is Better
Talos 2 Power9 2x 4c 74.33
AMD EPYC 7601 59.61
AMD EPYC 7551 57.95
2 x Intel Xeon Gold 6138 42.95
Raptor Talos II 29.77
Test: Memory Allocations Ns Per Event < Lower Is Better
AMD EPYC 7551 96.32
2 x Intel Xeon Gold 6138 96.05
AMD EPYC 7601 95.14
Talos 2 Power9 2x 4c 94.70
Raptor Talos II 83.03
Phoronix comment: “While lastly for now are the OSBench synthetic operating system benchmarks with the Raptor Talos II doing well against the EPYC and Xeon platforms.”
The Talos 2 performs very well, close to the dual Intel Xeon Gold 6138 or even better!
On my machine, adding optimization options changed only the Create Processes test, which improved from 74 to 49 us per event.
Multi-threaded: N/A
Verdict: GOOD
To do: Check optimization options, which greatly improved the `Create Processes` test
pts/pgbench-1.8.4
PostgreSQL pgbench 10.3:
Test: Scaling: Buffer Test - Test: Normal Load - Mode: Read Only TPS > Higher Is Better
Talos 2 Power9 2x 4c 11110
Xeon E3-1280 v5 116058
Talos 2 Power9 2x 4c optim 159835
Talos II POWER9 Dual 4 Core 222683
EPYC 7601 399625
Talos II Lite POWER9 22 Core 442106
Threadripper 2990WX 472250
Talos II 2 x 22c POWER9 544186 -mcpu=power9 -mtune=power9 (-march=native on x86_64)
Talos II POWER9 Dual 18-Core 574297
2 x Xeon Gold 6138 587539
Test: Scaling: Buffer Test - Test: Normal Load - Mode: Read Write TPS > Higher Is Better
Talos 2 Power9 2x 4c 542
Xeon E3-1280 v5 3803
Talos II POWER9 Dual 4 Core 6381
Talos II POWER9 Dual 18-Core 6451
EPYC 7601 6473
2 x Xeon Gold 6138 6588
Talos II POWER9 Dual 4-Core 14457
Talos 2 Power9 2x 4c optim 14507
Optimization clearly boosts the performance!
Phoronix comment: “The dual 18-core POWER9 system was managing to compete with the dual Xeon Gold server for the PostgreSQL database benchmarking.”
Multi-threaded: YES
Verdict: GOOD
To do: Check optimization options, which provide a boost
pts/redis-1.1.0
Test: GET Requests Per Second > Higher Is Better
Talos 2 Power9 2x 4c 904977
Raptor Talos II 1049994
Talos 2 Power9 2x 4c optim 1053740
AMD EPYC 7601 1703353
2 x Intel Xeon Gold 6138 2515784
Test: SET Requests Per Second > Higher Is Better
Talos 2 Power9 2x 4c optim 553403
Raptor Talos II 606874
Talos 2 Power9 2x 4c 615384
AMD EPYC 7601 1195935
2 x Intel Xeon Gold 6138 1744256
There is almost no CPU activity!
Multi-threaded: NO
Verdict: BAD
To do: Check CPU activity
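One simple way to check CPU activity on Linux is to sample the aggregate `cpu` line of `/proc/stat` twice while the benchmark runs. The busy fraction between the two samples then shows how loaded the CPUs were. Below is a minimal sketch, assuming a Linux system with a modern kernel (at least 8 fields on that line). It counts `idle` and `iowait` as idle time. The function names are hypothetical:

```python
import time
from typing import Tuple


def parse_cpu_line(line: str) -> Tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    idle = values[3] + values[4]  # idle + iowait
    # user, nice, system, idle, iowait, irq, softirq, steal
    # (guest time is already included in user/nice, so it is not added)
    total = sum(values[:8])
    return idle, total


def busy_fraction(before: Tuple[int, int], after: Tuple[int, int]) -> float:
    """Fraction of CPU time spent busy between two samples (0.0 to 1.0)."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 1.0 - idle_delta / total_delta


def sample_cpu(interval_s: float = 1.0) -> float:
    """Measure overall CPU utilisation over `interval_s` seconds (Linux only)."""
    def read() -> Tuple[int, int]:
        with open("/proc/stat") as f:
            return parse_cpu_line(f.readline())

    first = read()
    time.sleep(interval_s)
    return busy_fraction(first, read())
```

For example, calling `sample_cpu(5.0)` from a second terminal while pts/redis runs would show whether the CPUs are actually busy. Note that this is a system-wide figure, not a per-process one.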
pts/rodinia-1.2.2
Rodinia - OpenMP LavaMD: pts/rodinia-1.2.2 has a problem installing the OpenCL packages,
and also a checksum problem with the rodinia_2.4.tar.bz2 archive.
AMD EPYC 7601 13.26
Talos II 2 x 22c POWER9 13.22
AMD EPYC 7551 12.71
2 x Intel Xeon Gold 6138 7.02
Not many results were collected, so let’s base our opinion on the Phoronix comment: “First up was the Rodinia OpenMP benchmark where the Talos II with dual 22-core processors (44 cores / 176 threads) had the performance aligned with the Core i9 7980XE, which in turn were behind the AMD Ryzen Threadripper 2 WX series performance. With the Parboil and Rodinia scientific tests, the dual 22-core POWER9 system was just behind the EPYC 7551 for performance.”
A comment says: “Is there a reason why Rodinia is only ‘-O2’ (not ‘-O3’ like everything else), and for 7Zip, it seems no compile optimization at all? (Also, to make best use of the POWER9 processor, use ‘-mcpu=power9’).”
Multi-threaded: ?
Verdict: AVERAGE
To do: Try a recent version and check optimization options
pts/rust-prime-1.0.0
Rust Prime Benchmark:
Test: Prime Number Test To 200,000,000 Seconds < Lower Is Better
Talos 2 Power9 2x 4c 13.71
Talos 2 Power9 2x 4c optim 13.64
Threadripper 2990WX 12.49
Core i9 7980XE 8.18
2 x Intel Xeon Gold 6138 4.48
Talos II 2 x 22c POWER9 3.64
Phoronix comment: “Rustlang performance is looking good on POWER9. The Rust Mandelbrot benchmark performed poorly with POWER9, but that certainly wasn’t the case with the Rustlang Prime benchmark.”
Multi-threaded: YES
Verdict: GOOD
To do: Run the Rust Mandelbrot benchmark, which behaves poorly, in addition to the Prime benchmark
pts/scikit-learn-1.0.1
It failed to install on my machine:
Scikit-Learn 0.17.1:
pts/scikit-learn-1.0.1
The test quit with a non-zero exit status.
E: ModuleNotFoundError: No module named 'sklearn.externals.six'
So I got results from only one source:
Talos II 2 x 22c POWER9 229.62
Talos II POWER9 Dual 18-Core 227.39
2 x Intel Xeon Gold 6138 176.07
EPYC 7601 144.51
Phoronix comment: “The SciKit-Learn performance could also be better improved for POWER9, possibly via further software optimizations.”
The sthbrx article says that the benchmark uses libblas, a basic BLAS implementation among several, with no optimization for POWER9. Alternative libraries bring major speedups.
Multi-threaded: ?
Verdict: BAD
To do: Run a more recent version of the benchmark and analyze
pts/stockfish-1.1.1
v2014-11-26
Test: Total Time Nodes Per Second > Higher Is Better
Talos II POWER9 Dual 4 Core 21485986
Core i9 7980XE 46289588
EPYC 7601 58469775
Threadripper 2990WX 67300757
2 x Xeon Gold 6138 69928856
Talos II POWER9 Dual 18-Core 73165064
Talos II 2 x 22c POWER9 79137127
2 x EPYC 7601 100932062
I don’t remember where I found other results using a different metric, but they placed the Talos 2 between the two EPYC models:
AMD EPYC 7551 5032 -msse -msse3 -mpopcnt
Talos II 2 x 22c POWER9 4915 -mcpu=power9 -mtune=power9
AMD EPYC 7601 4474 -msse -msse3 -mpopcnt
2 x Xeon Gold 6138 3343 -msse -msse3 -mpopcnt
Phoronix comment: “The Stockfish chess benchmark was running very well on POWER9 where the 22-core Talos II Lite was just behind the EPYC 7601, the dual quad-core POWER9 system well ahead of the other quad and octa core Intel Xeons, and the dual 18-core box outperforming the Xeon Gold 6138 by a small margin.”
Phoronix comment: “With the multi-threaded Stockfish chess benchmark using pthreads, the dual socket POWER9 system came up short of the dual EPYC 7601 Dell PowerEdge server.”
These comments suggest that the Talos 2 performs quite well. However, the other results put it between the two EPYC models, and the dual 4-core model has poor results, so I chose to give it an average verdict.
Multi-threaded: YES
Verdict: AVERAGE
To do: Confirm the heterogeneous results with a recent version of the benchmark and look at optimization options
pts/tinymembench-1.0.1
Tinymembench 2018-05-28:
Test: Standard Memcpy MB/s > Higher Is Better
2 x Xeon Gold 6138 6015.50
Talos 2 Power9 2x 4c default 10662.90
Talos 2 Power9 2x 4c optim 10676.10
Talos II POWER9 Dual 4 Core 12418.40
EPYC 7601 12613.20
Xeon E3-1280 v5 12877.90
Talos II POWER9 Dual 18-Core 14515.40
Talos II 2 x 22c POWER9 15453.00
Phoronix comment: “The Tinymembench performance on POWER9 was looking good for memory copy speed.”
Surprisingly, the dual Xeon Gold 6138 system loses this benchmark and… the Talos 2 wins!
This slow-running benchmark showed no improvement when compiled with optimization options.
Multi-threaded: NO
Verdict: GOOD
To do: Nothing
pts/x264-2.3.2
x264 2018-02-05:
Test: H.264 Video Encoding Frames Per Second > Higher Is Better
Talos II POWER9 Dual 4-Core 29.14
Xeon E3-1280 v5 42.24
Talos II 2 x 22c POWER9 43.72
Talos II POWER9 Dual 18-Core 51.22
AMD EPYC 7551 101.52
2 x Xeon Gold 6138 125.21
EPYC 7601 126.39
Phoronix comment: “The x264 video encoding program is one test showing it’s not too well optimized right now for POWER9”. And a bit later: “Similar to our first POWER9 benchmarking session back in April, the x264 performance and more broadly the multimedia CPU performance on POWER9 still could be much better optimized. The POWER9 performance was quite low compared to the x86_64 competition.”
Not an easy project to improve.
Multi-threaded: YES
Verdict: BAD
To do: That would require huge efforts ...
system/octave-benchmark-1.0.0
GNU Octave Benchmark 4.4.1:
2 x Xeon Gold 6138 23.47
AMD EPYC 7551 22.66
AMD EPYC 7601 20.92
Threadripper 2990WX 16.78
Talos II 2 x 22c POWER9 14.92
Phoronix comment: “With the GNU Octave software as a MATLAB performance, the Talos II squeezed in front of the Threadripper systems for this single-core test.”
Multi-threaded: YES
Verdict: GOOD
To do:
system/blender-1.0.2
Blender 2.79:
Test: Blend File: Classroom - Compute: CPU-Only
Xeon E3-1280 v5 1656.82
Talos II POWER9 Dual 4-Core 1391.41
Talos II POWER9 Dual 18-Core 829.10
EPYC 7601 504.13
2 x Xeon Gold 6138 415.84
Test: Blend File: Pabellon Barcelona - Compute: CPU-Only
Talos II POWER9 Dual 4-Core 3057.45
Xeon E3-1280 v5 1745.08
Talos II POWER9 Dual 18-Core 1354.60
EPYC 7601 972.62
2 x Xeon Gold 6138 787.52
Phoronix comment: “The Blender 3D modeling performance on the CPU also leaves more room for optimization on the POWER9 front.”
It was also reported that Blender “failed to use more than 15 threads, even when ‘-t 128’ was added to the Blender command line”.
Multi-threaded: YES
Verdict: BAD
To do: Check if the project is fixed to use more than 16 threads
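On Linux, one way to check how many threads Blender actually uses during a render is to read the `Threads:` field of `/proc/<pid>/status` for the running process. Here is a minimal sketch. The PID has to be found separately (for example with `pgrep blender`), and the function names are hypothetical. Note that `Threads:` counts all threads of the process, not only the render workers, so it is an upper bound:

```python
def parse_threads(status_text: str) -> int:
    """Extract the thread count from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' field found")


def thread_count(pid: int) -> int:
    """Return the current number of threads of process `pid` (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_threads(f.read())
```

Calling `thread_count(pid)` a few times during a render with `-t 128` would show whether the count stays stuck around 15 or 16.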
Conclusion
That was a lot of work for me… with the risk of reaching almost the same conclusion as the very first Phoronix article. Anyway, it allowed me to dive into these topics and to reconnect with my Talos 2, abandoned for too long. The summary below will guide the next steps.
| Benchmark | MT | Verdict | To do |
|---|---|---|---|
| pts/build-gcc-1.0.0 | YES | AVERAGE | |
| pts/build-llvm-1.1.0 | YES | GOOD | |
| pts/compress-7zip-1.7.1 | YES | GOOD | Check optim options and results in recent version of the benchmark |
| pts/compress-zstd-1.0.0 | NO | GOOD | |
| pts/c-ray-1.1.1 | YES | GOOD | |
| pts/encode-flac-1.6.0 | NO | BAD | Check suggested improvements and measure their benefit |
| pts/encode-mp3-1.7.0 | NO | BAD | Check proposed improvements integrated in the official project |
| pts/openssl-1.11.0 | YES | BAD | |
| pts/parboil-1.1.2 | ? | AVERAGE | Focus on OpenMP CUTCP, which does not seem to be optimized for POWER9 |
| pts/phpbench-1.1.5 | NO | AVERAGE | Identify the source of the problem |
| pts/povray-1.2.1 | YES | BAD | Investigate, profile … |
| pts/primesieve-1.4.1 | YES | BAD | Check if changes proposed have a positive and measurable impact |
| pts/pybench-1.1.2 | NO | VERY BAD | Investigate |
| pts/osbench-1.0.1 | N/A | GOOD | Check optim options, which greatly improved the Create Processes test |
| pts/pgbench-1.8.4 | YES | GOOD | Check optim options, which provide a boost |
| pts/redis-1.1.0 | NO | BAD | Check CPU activity |
| pts/rodinia-1.2.2 | ? | AVERAGE | Try a recent version and check optimization options |
| pts/rust-prime-1.0.0 | YES | GOOD | Run Mandelbrot that behaves poorly, in addition to Prime benchmark |
| pts/scikit-learn-1.0.1 | ? | BAD | Run a more recent version of the benchmark and analyze |
| pts/stockfish-1.1.1 | YES | AVERAGE | To confirm heterogeneous results on a recent version and optim options |
| pts/tinymembench-1.0.1 | NO | GOOD | |
| pts/x264-2.3.2 | YES | BAD | That would require huge efforts … |
| system/octave-benchmark-1.0.0 | YES | GOOD | |
| system/blender-1.0.2 | YES | BAD | Check if the project is fixed to use more than 16 threads |
With this set of benchmarks, we get 9 GOOD, 5 AVERAGE and 10 BAD (counting the VERY BAD pybench as BAD). We expected better, as our machines are very capable. In another article, we will see how recent versions of these benchmarks, and possibly more, perform.
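The tally can be recomputed from the summary table. The single VERY BAD entry (pybench) is merged into BAD to reach 10. A small sketch with the verdicts copied from the table:

```python
from collections import Counter

# Verdicts copied from the summary table
verdicts = {
    "pts/build-gcc-1.0.0": "AVERAGE",
    "pts/build-llvm-1.1.0": "GOOD",
    "pts/compress-7zip-1.7.1": "GOOD",
    "pts/compress-zstd-1.0.0": "GOOD",
    "pts/c-ray-1.1.1": "GOOD",
    "pts/encode-flac-1.6.0": "BAD",
    "pts/encode-mp3-1.7.0": "BAD",
    "pts/openssl-1.11.0": "BAD",
    "pts/parboil-1.1.2": "AVERAGE",
    "pts/phpbench-1.1.5": "AVERAGE",
    "pts/povray-1.2.1": "BAD",
    "pts/primesieve-1.4.1": "BAD",
    "pts/pybench-1.1.2": "VERY BAD",
    "pts/osbench-1.0.1": "GOOD",
    "pts/pgbench-1.8.4": "GOOD",
    "pts/redis-1.1.0": "BAD",
    "pts/rodinia-1.2.2": "AVERAGE",
    "pts/rust-prime-1.0.0": "GOOD",
    "pts/scikit-learn-1.0.1": "BAD",
    "pts/stockfish-1.1.1": "AVERAGE",
    "pts/tinymembench-1.0.1": "GOOD",
    "pts/x264-2.3.2": "BAD",
    "system/octave-benchmark-1.0.0": "GOOD",
    "system/blender-1.0.2": "BAD",
}

counts = Counter(verdicts.values())
summary = {
    "GOOD": counts["GOOD"],
    "AVERAGE": counts["AVERAGE"],
    "BAD": counts["BAD"] + counts["VERY BAD"],  # VERY BAD merged into BAD
}
print(summary)  # {'GOOD': 9, 'AVERAGE': 5, 'BAD': 10}
```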
Future actions will depend on the verdict:
- GOOD, when Talos 2 is in first place: I will just celebrate and check that there is no regression
- AVERAGE, when Talos 2 is in the same performance range as the AMD EPYC machines: I will at least check that there is no regression
- BAD, when Talos 2 does not show the performance it should: I will check that the proposed changes have been applied and investigate if necessary, to identify possible remaining problems
I will also check the impact of SMT settings and compilation options. In some cases, the Phoronix reports mention compiler flags (`CXXFLAGS=-O3 -march=native CFLAGS=-O3 -march=native` for x86_64 platforms and `CXXFLAGS=-O3 -mtune=power9 -mcpu=power9 CFLAGS=-O3 -mtune=power9 -mcpu=power9` for ours), but I’m not sure this is always the case.
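For the reruns, the flags quoted above could be selected automatically from the machine architecture before invoking phoronix-test-suite. This is a minimal sketch under two assumptions. First, the test profiles honour `CFLAGS`/`CXXFLAGS` from the environment, which, as said above, is not guaranteed. Second, `platform.machine()` reports `x86_64` or `ppc64le`. The function name is hypothetical:

```python
import os
import platform
from typing import Dict

# Flags reported in the Phoronix articles for each architecture
FLAGS = {
    "x86_64": "-O3 -march=native",
    "ppc64le": "-O3 -mtune=power9 -mcpu=power9",
}


def build_env(machine: str = "") -> Dict[str, str]:
    """Return a copy of the environment with CFLAGS/CXXFLAGS set for `machine`.

    If `machine` is empty, the current architecture is detected with
    platform.machine(). Unknown architectures fall back to plain -O3.
    """
    arch = machine or platform.machine()
    flags = FLAGS.get(arch, "-O3")
    env = dict(os.environ)
    env["CFLAGS"] = flags
    env["CXXFLAGS"] = flags
    return env
```

The resulting dictionary could then be passed as `env=build_env()` to `subprocess.run(["phoronix-test-suite", "benchmark", "pts/c-ray-1.2.0"], env=...)`. Compiler flags only take effect when a test is built, so a test that is already installed may need to be reinstalled first.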
Annexes
References of Phoronix articles related to Talos 2:
- 2018-04-04: power9-epyc-xeon: phoronix-test-suite benchmark 1804049-AR-POWERTALO23
- 2018-09-25: power9-talos-2: phoronix-test-suite benchmark 1806251-AR-LINUXCPUS06
- 2018-11-08: power9-threadripper-core9: phoronix-test-suite benchmark 1811068-SK-TALOS205952
- 2018-11-27: power9-x86-servers
- 2019-08-19: rome-power9-arm