Let’s take a step back in time … Before I had my own Talos 2 machine, some benchmarks were published on Phoronix in 2018 and 2019. I remembered that some benchmarks gave good results but that a few of them were not at the expected level of performance. On the other hand, with benchmark reports scattered here and there across various configurations, I had forgotten which machines the Talos 2 was competing against.

Also, after the initial articles, some fixes were made very quickly and proposed to the projects concerned. There is an article, Improving performance of Phoronix benchmarks on POWER9, that analyzed some benchmarks run in the initial Phoronix article, focusing only on those that did not perform well on Talos 2: LBM Parboil, x264 video encoding, Primesieve, LAME, FLAC, OpenSSL, Scikit-Learn and Blender. The article describes the situation for each and suggests changes. Note that some changes were obvious: for example, it turned out that a benchmark was missing an optimization option!

Recall that some of the Phoronix benchmarks showed the Talos 2 performing well, for example: Stockfish, LLVM Compilation, 7zip and Zstd compression, TinyMembench, Postgresql. We will come back to that in detail.

So, it was time to refresh and synthesize all that. In the end, the investigation will show whether there are still some benchmarks to look at for fixes or improvements.

Comparing hardware

I started by reading all the articles and, thanks to the benchmark identifiers given in some of them, I was able to run these old test suites to get a snapshot on my own configuration, see what works and what does not, practice with the phoronix-test-suite tool, etc.

I also collected some remarks from the articles and from their comment threads.

I kept a short list of systems commonly found in these articles, from the least to the most powerful, in theory:
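As an illustration of that workflow, here is a minimal Python sketch that builds the command line for one of the old test profiles. It assumes that the `phoronix-test-suite` executable is on the PATH and that its `benchmark` subcommand accepts a versioned profile name such as `pts/build-llvm-1.1.0`; the helper names `pts_command` and `run_profile` are hypothetical and only serve the example.

```python
import shutil
import subprocess

def pts_command(profile):
    """Build the command line that installs and runs one test profile."""
    return ["phoronix-test-suite", "benchmark", profile]

def run_profile(profile):
    """Run the profile if the tool is installed; return its exit code, else None."""
    if shutil.which("phoronix-test-suite") is None:
        print("phoronix-test-suite not found, skipping", profile)
        return None
    # The tool is interactive by default and may ask questions before running.
    return subprocess.run(pts_command(profile)).returncode

# Only the command is built here; call run_profile() to actually launch it.
command = pts_command("pts/build-llvm-1.1.0")
print(" ".join(command))
```

Keeping the command construction separate from the execution makes it easy to loop over a list of profile identifiers and to log which ones fail to install.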

    Processor                        Year     Cores/Threads  Base Freq  Max Freq  Cache   TDP    Memory Support
    Intel Xeon E3-1280 v5 (Skylake)  Q4-2015  4/8            3.70 GHz   4.00 GHz  8 MB    80 W   DDR4-2133, up to 64 GB
    Intel Core i9-7980XE             Q3-2017  18/36          2.60 GHz   4.20 GHz  24 MB   165 W  DDR4-2666, up to 128 GB
    Intel Xeon Gold 6138             Q3-2017  20/40          2.00 GHz   3.70 GHz  28 MB   125 W  DDR4-2666, up to 768 GB
    AMD EPYC 7551                    Q2-2017  32/64          2.00 GHz   3.00 GHz  64 MB   180 W  DDR4-2666, octa-channel
    AMD EPYC 7601                    Q2-2017  32/64          2.20 GHz   3.20 GHz  64 MB   180 W  DDR4-2666, octa-channel
    AMD Ryzen Threadripper 2990WX    Q3-2018  32/64          3.00 GHz   4.20 GHz  64 MB   250 W  DDR4-2933, quad-channel
    IBM POWER9 (dual 22-core)        Q4-2017  44/176         2.80 GHz   3.40 GHz  120 MB  N/A    DDR4, up to 16 TB

So, still in theory, the big POWER9 configurations should compete with (and even beat) all these systems except the 2 x Xeon Gold 6138.

Note that I own a dual 4-core Talos 2, which will also appear in the results in this article.

Results of old benchmarks (2018 and 2019)

This section highlights the comparison with different machines and also between variants of the Talos 2. Note that the results are listed by increasing performance (the best machine at the end).
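Since some metrics are "higher is better" and others "lower is better", the ordering rule can be sketched as follows. This is only an illustration of the convention used in the listings; the function name `sort_worst_to_best` is hypothetical, and the sample values are a subset of the 7-Zip and Zstd results reported further down.

```python
def sort_worst_to_best(results, higher_is_better):
    """Return (system, score) pairs ordered so that the best system comes last."""
    # Ascending order puts the best last for "higher is better" metrics;
    # "lower is better" metrics need descending order instead.
    return sorted(results.items(), key=lambda item: item[1],
                  reverse=not higher_is_better)

# 7-Zip compression speed in MIPS (higher is better).
sevenzip_mips = {
    "2 x Intel Xeon Gold 6138": 143505,
    "Talos 2 Power9 2x 4c": 40043,
    "Talos II 2 x 22c POWER9": 162969,
    "AMD EPYC 7601": 99574,
}

# Zstd compression time in seconds (lower is better).
zstd_seconds = {
    "Talos II POWER9 Dual 18-Core": 106.94,
    "EPYC 7601": 163.35,
    "2 x Xeon Gold 6138": 117.96,
}

ordered_mips = sort_worst_to_best(sevenzip_mips, higher_is_better=True)
ordered_secs = sort_worst_to_best(zstd_seconds, higher_is_better=False)

for system, score in ordered_mips + ordered_secs:
    print(f"{system:<30} {score:>10}")
```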

pts/build-gcc-1.0.0

Timed GCC Compilation 7.2:

On my Talos 2, this old version fails. At installation, there is the message No rule to make 'defconfig', and then when running the test:

    pts/build-gcc-1.0.0 [Time To Compile]
        E: ../.././gcc/match.pd:120:1 error: expected (, got NAME

So below are only results provided by Phoronix:

Test: Time to compile

    Talos II 2 x 22c POWER9 1070.70
    AMD EPYC 7551            926.08
    AMD EPYC 7601            707.34
    2 x Xeon Gold 6138       591.32

Phoronix wrote: “Keep in mind the Talos II Secure Workstation was limited to a slow hard drive for this initial testing, but there are some build time references for those curious about the potential of Talos II serving as a POWER build platform.” With the provided results, let’s say that the Talos 2 is rather close to the EPYC 7551.
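To put a number on "rather close", the build times can be expressed relative to the EPYC 7551. The following is a small sketch using only the four values listed above; the helper name `slowdown_vs` is hypothetical.

```python
# Build times in seconds (lower is better), copied from the results above.
build_times = {
    "Talos II 2 x 22c POWER9": 1070.70,
    "AMD EPYC 7551": 926.08,
    "AMD EPYC 7601": 707.34,
    "2 x Xeon Gold 6138": 591.32,
}

def slowdown_vs(reference, times):
    """Return each system's time divided by the reference system's time."""
    ref = times[reference]
    return {name: round(seconds / ref, 2) for name, seconds in times.items()}

ratios = slowdown_vs("AMD EPYC 7551", build_times)
for name, ratio in ratios.items():
    # A ratio above 1.00 means slower than the reference, below 1.00 faster.
    print(f"{name:<26} {ratio:.2f}x")
```

With these numbers, the Talos II takes about 1.16 times as long as the EPYC 7551, while the dual Xeon Gold 6138 needs only about 0.64 times the EPYC 7551 time.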

Multi-threaded: YES
Verdict: AVERAGE
To do:

pts/build-llvm-1.1.0

Timed LLVM Compilation 6.0.1:

Test: Time To Compile (Seconds < Lower Is Better)

    Talos 2 Power9 2x 4c                   535.10
    Talos II POWER9 Dual 4-Core            354.23
    AMD EPYC 7551                          247.00
    AMD EPYC 7601                          236.00
    Core i9 7980XE                         227.00
    Threadripper 2990WX                    221.00
    Talos II 2 x 22c POWER9                183.00
    AMD EPYC 7601                          171.58
    2x EPYC 7601                           149.00
    Talos II POWER9 Dual 18-Core           141.79
    2 x Intel Xeon Gold 6138               127.08

There are some strange results: for example, two very different results for the Talos 2 dual 4-core model, two different results for the EPYC 7601 … and also the Talos 2 doing better with dual 18-core than with dual 22-core processors (maybe due to the slow drive mentioned in the build-gcc test?).

Anyway, let’s say that high-end Talos 2 models are in the same range as the Threadripper 2990WX and the AMD EPYC 7601. Another article will cover the same benchmarks in their recent versions.

Multi-threaded: YES
Verdict: GOOD
To do:

pts/compress-7zip-1.7.1

7-Zip Compression 16.02:

Test: Compress Speed Test (MIPS > Higher Is Better)

    Talos 2 Power9 2x 4c           40043
    AMD EPYC 7551                  79708
    Threadripper 2990WX            85484
    Core i9 7980XE                 95662
    AMD EPYC 7601                  99574
    2 x Intel Xeon Gold 6138      143505
    Talos II POWER9 Dual 18-Core  158405
    Talos II 2 x 22c POWER9       162969

Phoronix comment: “The 7-Zip compression performance was doing very well on the POWER9 hardware with the 22-core Talos II Lite was outperforming the 32 core EPYC 7601 processor and the dual 18-core Talos II system was outperforming the dual Xeon Gold 6138 Tyan server. 7-Zip is another workload that always scales well including with SMT systems and here the 176 threads of the Talos II paid off well for this compression test.”

A comment says: “Is there a reason why Rodinia is only ‘-O2’ (not ‘-O3’ like everything else), and for 7Zip, it seems no compile optimization at all? (Also, to make best use of the POWER9 processor, use ‘-mcpu=power9’).” That may explain discrepancies in results. However, the optimization options I set brought nothing in terms of performance.

Multi-threaded: YES
Verdict: GOOD
To do: Check optimization options and results in a recent version of the benchmark

pts/compress-zstd-1.0.0

Zstd Compression 1.3.4:

Test: Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 (Seconds < Lower Is Better)

    EPYC 7601                      163.35
    Talos 2 Power9 2x 4c           134.46 
    2 x Xeon Gold 6138             117.96
    Talos II POWER9 Dual 4 Core    109.09
    Talos II POWER9 Dual 18-Core   106.94

Phoronix comment: “POWER9 was performing extremely well in the Zstd compression benchmark. The Xeon systems were outperforming the EPYC hardware in the Zstd benchmark while the POWER9 hardware managed to beat out the Intel x86_64 CPUs in this single-thread test case.”

Multi-threaded: NO
Verdict: GOOD
To do:

pts/c-ray-1.1.1

C-Ray - 4K 16 Rays Per Pixel

pts/c-ray-1.1.1 (Seconds < Lower Is Better)

    Talos 2 Power9 2x 4c       9.49
    Raptor Talos II            4.65
    AMD EPYC 7601              3.46
    2 x Intel Xeon Gold 6138   3.15

Optimization does not change anything on my machine. I don’t know what the Raptor Talos II configuration is … That makes the comparison too difficult, so I ran pts/c-ray-1.2.0 and obtained these results:

    Talos 2 Power9 2x 4c                83.99
    Core i9 7980XE                      33.51
    2 x Xeon Gold 6138                  27.16
    AMD EPYC 7601                       25.36
    AMD EPYC 7601                       21.97 -march=native
    Talos II 2 x 22c POWER9             19.14 -mcpu=power9 -mtune=power9
    Threadripper 2990WX                 17.97

Phoronix comment: “The C-Ray ray-tracing performance of the Talos II was in line with the AMD Ryzen Threadripper 2990WX but had come up shy of the dual EPYC 7601 server.”

Multi-threaded: YES
Verdict: GOOD
To do:

pts/encode-flac-1.6.0

FLAC Audio Encoding 1.3.2:

Test: WAV To FLAC (Seconds < Lower Is Better)

    Talos II 2 x 22c POWER9        51.79
    Talos II POWER9 Dual 4 Core    43.99
    Talos II POWER9 Dual 18-Core   43.95
    Talos II POWER9 Dual 4-Core    40.23
    AMD EPYC 7551                  12.71
    AMD EPYC 7601                  11.79
    2 x Intel Xeon Gold 6138       10.27
    Xeon E3-1280 v5                 9.60

Optimization options do not change anything.

Phoronix comment: “For audio encoding with FLAC and MP3 is another one of the areas where the POWER9 CPU performance is behind, but could possibly be improved with maturing POWER9 compiler support.”

The project lacked SIMD code for POWER. A patch series was written and integrated in FLAC 1.3.3, which improved the performance by a factor of 3.
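To get an idea of what such a gain would mean for the ranking, the times above can be projected. This is only a sketch under a strong assumption: the factor of 3 is applied uniformly to every POWER9 result, which has not been measured here.

```python
# WAV-to-FLAC times in seconds (lower is better), from the results above.
flac_seconds = {
    "Talos II 2 x 22c POWER9": 51.79,
    "Talos II POWER9 Dual 18-Core": 43.95,
    "AMD EPYC 7551": 12.71,
    "AMD EPYC 7601": 11.79,
    "2 x Intel Xeon Gold 6138": 10.27,
    "Xeon E3-1280 v5": 9.60,
}

# Assumption: the factor of 3 applies equally to all POWER9 systems.
SPEEDUP = 3.0

projected = {
    name: seconds / SPEEDUP if "POWER9" in name else seconds
    for name, seconds in flac_seconds.items()
}

# Print from the slowest to the fastest, like the listings in this article.
for name, seconds in sorted(projected.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<30} {seconds:6.2f}")
```

Under this assumption, the dual 18-core would land at about 14.65 seconds and the dual 22-core at about 17.26 seconds, which is still behind the 12.71 seconds of the EPYC 7551.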

Multi-threaded: NO
Verdict: BAD
To do: Check suggested improvements and measure their benefit

pts/encode-mp3-1.7.0

LAME MP3 Encoding 3.100:

Test: WAV To MP3 Seconds < Lower Is Better

    Talos II 2 x 22c POWER9        75.27 
    Talos II POWER9 Dual 4-Core    75.57
    Talos II POWER9 Dual 18-Core   67.48
    AMD EPYC 7551                  45.69
    AMD EPYC 7601                  42.67
    2 x Xeon Gold 6138             32.29
    Xeon E3-1280 v5                30.14

Results show that performance on POWER is very bad!

In the sthbrx article, it is said: “Due to configure options not being parsed correctly this benchmark is built without any optimisation regardless of architecture. We see a massive speedup by turning optimisations on, and a further 6-8% speedup by enabling USE_FAST_LOG (which is already enabled for Intel)”. It concludes with a x5 speedup. See the dedicated article for details; it mentions an obtained speedup of x7! On my own machine, optimization options made the time drop to more or less 15 seconds, which is confirmed by a recent version of the benchmark.
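The two quoted factors can be compared with the roughly 15 seconds mentioned above. The sketch below assumes that the Phoronix dual 4-core result (75.57 seconds) is a fair baseline for an unoptimized dual 4-core machine, and it treats 15.0 as an approximation of "more or less 15 seconds".

```python
# Phoronix result for the dual 4-core Talos II, in seconds (lower is better).
# Assumption: used here as the unoptimized baseline.
baseline_seconds = 75.57

# Encoding time implied by each of the two quoted speedup factors.
implied = {factor: baseline_seconds / factor for factor in (5, 7)}
for factor, seconds in implied.items():
    print(f"x{factor} speedup -> {seconds:.2f} s")

# Approximate optimized time reported in the text above.
optimized_seconds = 15.0
observed_speedup = baseline_seconds / optimized_seconds
print(f"observed speedup: x{observed_speedup:.1f}")
```

With this baseline, a time of about 15 seconds corresponds to a speedup close to x5, whereas x7 would mean about 10.8 seconds.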

Multi-threaded: NO
Verdict: BAD
To do: Check proposed improvements have been integrated in the official project

pts/openssl-1.11.0

OpenSSL 1.1.1:

Test: RSA 4096-bit Performance Signs Per Second > Higher Is Better

    Talos 2 Power9 2x 4c                    1616.9
    Talos II dual 18-core                   3971.9
    AMD EPYC 7551                           4387.4
    AMD EPYC 7601                           4598.4
    Core i9 7980XE                          4686.0
    Threadripper 2990WX                     5821.0 
    2 x Intel Xeon Gold 6138                7965.4

Phoronix comment: “The OpenSSL results also have a ways to improve, the performance on POWER9 was mixed against the AMD/Intel CPUs with the dual 18-core system failing to outperform the EPYC 7601.”

OpenSSL 1.1.0f did not include some improvements that exist in mainline. Updating the version used should improve the performance on POWER9 by a factor of 1.7.

Multi-threaded: YES
Verdict: BAD
To do:

pts/parboil-1.1.2

Parboil v2.5

Test: OpenMP LBM

    Talos II POWER9 Dual 4 Core   113.43
    AMD EPYC 7551                  71.88
    Talos II 2 x 22c POWER9        66.08
    Talos II POWER9 Dual 18-Core   44.18
    AMD EPYC 7601                  37.39
    2 x Xeon Gold 6138             30.18 

Phoronix comment: “With the Lattice-Boltzmann Method Fluid Dynamics test case, the dual 18-core POWER9 configuration was competing with the EPYC CPUs in this round of OpenMP benchmarking.”

From sthbrx article: “Also this benchmark is compiled without any optimisation. Recompiling with -O3 improves the results 3.2x on POWER9.”

Test: OpenMP CUTCP

    Talos II 2 x 22c POWER9  9.69
    AMD EPYC 7551            2.76
    AMD EPYC 7601            2.61
    2 x Xeon Gold 6138       2.28 

Phoronix comment: “Some tests like this Distance-Cutoff Coulombic Potential test appear just not well optimized for POWER9 at this point.”

To check: whether the changes have been included in a recent version of the benchmark, and whether the 3x speedup applies.

Test: OpenMP Stencil

    AMD EPYC 7551           17.35
    AMD EPYC 7601           14.26
    Talos II 2 x 22c POWER9 10.51
    2 x Xeon Gold 6138       6.01 

Phoronix comment: “While in the stencil test, the Talos II system beat out both AMD EPYC systems and was mid-way to the performance of the dual Xeon Gold server.”

Multi-threaded: ?
Verdict: AVERAGE
To do: Focus on the `OpenMP CUTCP` test, which does not seem to be optimized for POWER9.

pts/phpbench-1.1.5

PHPBench 0.8.1: pts/phpbench-1.1.5 [PHP Benchmark Suite] Score > Higher Is Better

    Talos II POWER9 Dual 4-Core    166406
    AMD EPYC 7551                  365767
    Talos II POWER9 Dual 18-Core   373681
    AMD EPYC 7601                  393659
    Threadripper 2990WX            525276 
    2 x Xeon Gold 6138             606341 
    Xeon E3-1280 v5                651532 
    Core i9 7980XE                 703666 

Phoronix comment: “The Python and PHP benchmarks also show room for single-threaded performance improvements. POWER9 only came in line with the AMD EPYC hardware for the PHP language performance.”

Optimization options did not bring any visible enhancements.

Multi-threaded: NO
Verdict: AVERAGE
To do: Identify the source of the problem (but who would like to work on PHP?)

pts/povray-1.2.1

POV-Ray 3.7.0.7:

Test: Trace Time

    Talos 2 2x 4c              93.57
    Core i9 7980XE             28.29
    Talos II 2 x 22c POWER9    25.28
    AMD EPYC 7551              23.01
    AMD EPYC 7601              22.61
    2x Xeon Gold 6138          19.02
    Threadripper 2990WX        17.92

Even with the benefit of multi-threading, the best Talos 2 system does not reach the performance of the AMD EPYC systems.

Multi-threaded: YES
Verdict: BAD
To do: Investigate ...

pts/primesieve-1.4.1

Primesieve 6.2:

Test: 1e12 Prime Number Generation Seconds < Lower Is Better

    Talos II POWER9 Dual 4-Core    44.84 
    Talos II POWER9 Dual 18-Core   18.81
    Talos II 2 x 22c POWER9        16.42
    EPYC 7551                      12.93
    EPYC 7601                      12.15
    2 x Xeon Gold 6138             10.63

The out-of-the-box results on POWER are not convincing, showing lower performance than the AMD EPYC systems.

Following a pull request by Anton Blanchard, the author understood the issue and made changes. This remains to be checked in a recent version of the benchmark.

Multi-threaded: YES
Verdict: BAD
To do: Check if changes proposed by the author have a positive and measurable impact

pts/pybench-1.1.2

PyBench 2018-02-16:

Test: Total For Average Test Times Milliseconds < Lower Is Better

    Talos II 2 x 22c POWER9       4088 
    Talos 2 Power9 2x 4c          3671 
    Talos II POWER9 Dual 18-Core  1867
    EPYC 7601                     1538
    Threadripper 2990WX           1147
    2 x Xeon Gold 6138            1127
    Xeon E3-1280 v5               1043
    Core i9 7980XE                 955

Note that I also collected another set of results with somewhat different values:

    Talos II 2 x 22c POWER9       4859
    AMD EPYC 7551                 2216
    AMD EPYC 7601                 2086
    2 x Intel Xeon Gold 6138      1395

Anyway, that does not change the order: Python on Power9 systems is 2 or 3 times slower than on x86-64 machines (2 times slower than EPYC based systems).

Multi-threaded: NO
Verdict: VERY BAD
To do: Investigate

pts/osbench-1.0.1

Test: Create Threads us Per Event < Lower Is Better

    AMD EPYC 7551              38.25 
    AMD EPYC 7601              30.71 
    Talos 2 Power9 2x 4c       27.28
    Raptor Talos II            27.17 
    2 x Intel Xeon Gold 6138   23.07

Test: Create Processes us Per Event < Lower Is Better

    Talos 2 Power9 2x 4c       74.33
    AMD EPYC 7601              59.61
    AMD EPYC 7551              57.95
    2 x Intel Xeon Gold 6138   42.95
    Raptor Talos II            29.77

Test: Memory Allocations Ns Per Event < Lower Is Better

    AMD EPYC 7551              96.32
    2 x Intel Xeon Gold 6138   96.05
    AMD EPYC 7601              95.14
    Talos 2 Power9 2x 4c       94.70
    Raptor Talos II            83.03

Phoronix comment: “While lastly for now are the OSBench synthetic operating system benchmarks with the Raptor Talos II doing well against the EPYC and Xeon platforms.”

Talos 2 performs very well, close to the 2 x Intel Xeon Gold 6138 or even better!

On my model, adding optimization options changed only the Create Processes test, with a better score of 49 instead of 74.

Multi-threaded: N/A
Verdict: GOOD
To do: Check optimization options; they greatly improved the `Create Processes` test

pts/pgbench-1.8.4

PostgreSQL pgbench 10.3:

Test: Scaling: Buffer Test - Test: Normal Load - Mode: Read Only TPS > Higher Is Better

    Talos 2 Power9 2x 4c            11110
    Xeon E3-1280 v5                116058
    Talos 2 Power9 2x 4c optim     159835
    Talos II POWER9 Dual 4 Core    222683
    EPYC 7601                      399625
    Talos II Lite POWER9 22 Core   442106 
    Threadripper 2990WX            472250
    Talos II 2 x 22c POWER9        544186 -mcpu=power9 -mtune=power9 (-march=native on x86_64)
    Talos II POWER9 Dual 18-Core   574297 
    2 x Xeon Gold 6138             587539 

Test: Scaling: Buffer Test - Test: Normal Load - Mode: Read Write TPS > Higher Is Better

    Talos 2 Power9 2x 4c              542
    Xeon E3-1280 v5                  3803
    Talos II POWER9 Dual 4 Core      6381
    Talos II POWER9 Dual 18-Core     6451
    EPYC 7601                        6473
    2 x Xeon Gold 6138               6588
    Talos II POWER9 Dual 4-Core     14457
    Talos 2 Power9 2x 4c optim      14507

Optimization clearly boosts the performance!

Phoronix comment: “The dual 18-core POWER9 system was managing to compete with the dual Xeon Gold server for the PostgreSQL database benchmarking.”

Multi-threaded: YES
Verdict: GOOD
To do: Check optimization options, which provide a boost

pts/redis-1.1.0

Test: GET Requests Per Second > Higher Is Better

    Talos 2 Power9 2x 4c        904977
    Raptor Talos II            1049994
    Talos 2 Power9 2x 4c optim 1053740
    AMD EPYC 7601              1703353
    2 x Intel Xeon Gold 6138   2515784

Test: SET Requests Per Second > Higher Is Better

    Talos 2 Power9 2x 4c optim  553403
    Raptor Talos II             606874
    Talos 2 Power9 2x 4c        615384
    AMD EPYC 7601              1195935
    2 x Intel Xeon Gold 6138   1744256

There is almost no CPU activity!

Multi-threaded: NO
Verdict: BAD
To do: Check CPU activity

pts/rodinia-1.2.2

Rodinia - OpenMP LavaMD pts/rodinia-1.2.2: problem installing the OpenCL packages

There is also a checksum problem on the rodinia_2.4.tar.bz2 archive.

    AMD EPYC 7601                13.26
    Talos II 2 x 22c POWER9      13.22
    AMD EPYC 7551                12.71
    2 x Intel Xeon Gold 6138      7.02

Not many results collected so let’s base our opinion on Phoronix comment: “First up was the Rodinia OpenMP benchmark where the Talos II with dual 22-core processors (44 cores / 176 threads) had the performance aligned with the Core i9 7980XE, which in turn were behind the AMD Ryzen Threadripper 2 WX series performance. With the Parboil and Rodinia scientific tests, the dual 22-core POWER9 system was just behind the EPYC 7551 for performance.”

A comment says: “Is there a reason why Rodinia is only ‘-O2’ (not ‘-O3’ like everything else), and for 7Zip, it seems no compile optimization at all? (Also, to make best use of the POWER9 processor, use ‘-mcpu=power9’).”

Multi-threaded: ?
Verdict: AVERAGE
To do: Try a recent version and check optimization options

pts/rust-prime-1.0.0

Rust Prime Benchmark:

Test: Prime Number Test To 200,000,000 Seconds < Lower Is Better

    Talos 2 Power9 2x 4c             13.71
    Talos 2 Power9 2x 4c optim       13.64
    Threadripper 2990WX              12.49
    Core i9 7980XE                    8.18
    2 x Intel Xeon Gold 6138          4.48
    Talos II 2 x 22c POWER9           3.64

Phoronix comment: “Rustlang performance is looking good on POWER9. The Rust Mandelbrot benchmark performed poorly with POWER9, but that certainly wasn’t the case with the Rustlang Prime benchmark.”

Multi-threaded: YES
Verdict: GOOD
To do: Run Rust Mandelbrot benchmark that behaves poorly, in addition to Prime benchmark

pts/scikit-learn-1.0.1

It failed to install on my machine:

Scikit-Learn 0.17.1:
    pts/scikit-learn-1.0.1
        The test quit with a non-zero exit status.
        E: ModuleNotFoundError: No module named 'sklearn.externals.six'

So I got results from only one source:

    Talos II 2 x 22c POWER9        229.62
    Talos II POWER9 Dual 18-Core   227.39
    2 x Intel Xeon Gold 6138       176.07
    EPYC 7601                      144.51

Phoronix comment: “The SciKit-Learn performance could also be better improved for POWER9, possibly via further software optimizations.”

In the sthbrx article, it is said that the benchmark uses libblas, a basic implementation among others, with no optimization for POWER9. Alternative libraries bring major speedups.

Multi-threaded: ?
Verdict: BAD
To do: Run a more recent version of the benchmark and analyze

pts/stockfish-1.1.1

v2014-11-26

Test: Total Time Nodes Per Second > Higher Is Better

    Talos II POWER9 Dual 4 Core    21485986
    Core i9 7980XE                 46289588
    EPYC 7601                      58469775 
    Threadripper 2990WX            67300757 
    2 x Xeon Gold 6138             69928856 
    Talos II POWER9 Dual 18-Core   73165064 
    Talos II 2 x 22c POWER9        79137127 
    2 x EPYC 7601                 100932062

I don’t remember where I found other results with other metrics, but they showed Talos 2 between both EPYC models:

    AMD EPYC 7551                      5032    -msse -msse3 -mpopcnt
    Talos II 2 x 22c POWER9            4915    -mcpu=power9 -mtune=power9
    AMD EPYC 7601                      4474    -msse -msse3 -mpopcnt
    2 x Xeon Gold 6138                 3343    -msse -msse3 -mpopcnt

Phoronix comment: “The Stockfish chess benchmark was running very well on POWER9 where the 22-core Talos II Lite was just behind the EPYC 7601, the dual quad-core POWER9 system well ahead of the other quad and octa core Intel Xeons, and the dual 18-core box outperforming the Xeon Gold 6138 by a small margin.”

Phoronix comment: “With the multi-threaded Stockfish chess benchmark using pthreads, the dual socket POWER9 system came up short of the dual EPYC 7601 Dell PowerEdge server.”

This last comment seems to mean that Talos 2 performs very well, but as the other results put it between both EPYC models, and also because the dual 4-core model has poor results, I chose to give it an average score.

Multi-threaded: YES
Verdict: AVERAGE
To do: Confirm the heterogeneous results on a recent version of the benchmark, look at optimization options

pts/tinymembench-1.0.1

Tinymembench 2018-05-28:

Test: Standard Memcpy MB/s > Higher Is Better

    2 x Xeon Gold 6138              6015.50
    Talos 2 Power9 2x 4c default   10662.90
    Talos 2 Power9 2x 4c optim     10676.10
    Talos II POWER9 Dual 4 Core    12418.40
    EPYC 7601                      12613.20
    Xeon E3-1280 v5                12877.90
    Talos II POWER9 Dual 18-Core   14515.40
    Talos II 2 x 22c POWER9        15453.00

Phoronix comment: “The Tinymembench performance on POWER9 was looking good for memory copy speed.”

Surprisingly, the 2 x Xeon Gold 6138 system loses this benchmark and … Talos 2 wins! This slow-running benchmark saw no improvement when compiled with optimization options.

Multi-threaded: NO
Verdict: GOOD
To do: Nothing

pts/x264-2.3.2

x264 2018-02-05:

Test: H.264 Video Encoding Frames Per Second > Higher Is Better

    Talos II POWER9 Dual 4-Core    29.14
    Xeon E3-1280 v5                42.24
    Talos II 2 x 22c POWER9        43.72
    Talos II POWER9 Dual 18-Core   51.22
    AMD EPYC 7551                 101.52
    2 x Xeon Gold 6138            125.21 
    EPYC 7601                     126.39 

Phoronix comment: “The x264 video encoding program is one test showing it’s not too well optimized right now for POWER9”. And a bit later: “Similar to our first POWER9 benchmarking session back in April, the x264 performance and more broadly the multimedia CPU performance on POWER9 still could be much better optimized. The POWER9 performance was quite low compared to the x86_64 competition.”

Not an easy project to improve.

Multi-threaded: YES
Verdict: BAD
To do: That would require huge efforts ...

system/octave-benchmark-1.0.0

GNU Octave Benchmark 4.4.1:

    2 x Xeon Gold 6138        23.47
    AMD EPYC 7551             22.66
    AMD EPYC 7601             20.92
    Threadripper 2990WX       16.78 
    Talos II 2 x 22c POWER9   14.92

Phoronix comment: “With the GNU Octave software as a MATLAB performance, the Talos II squeezed in front of the Threadripper systems for this single-core test.”

Multi-threaded: YES
Verdict: GOOD
To do:

system/blender-1.0.2

Blender 2.79:

Test: Blend File: Classroom - Compute: CPU-Only

    Xeon E3-1280 v5                1656.82
    Talos II POWER9 Dual 4-Core    1391.41
    Talos II POWER9 Dual 18-Core    829.10
    EPYC 7601                       504.13
    2 x Xeon Gold 6138              415.84 

Test: Blend File: Pabellon Barcelona - Compute: CPU-Only

    Xeon E3-1280 v5                1745.08
    Talos II POWER9 Dual 4-Core    3057.45
    Talos II POWER9 Dual 18-Core   1354.60
    EPYC 7601                       972.62
    2 x Xeon Gold 6138              787.52 

Phoronix comment: “The Blender 3D modeling performance on the CPU also leaves more room for optimization on the POWER9 front.”

“failed to use more than 15 threads, even when “-t 128” was added to the Blender command line”

Multi-threaded: YES
Verdict: BAD
To do: Check if the project is fixed to use more than 16 threads

Conclusion

That was a lot of work for me … with the risk of reaching almost the same conclusion as the very first Phoronix article. Anyway, it allowed me to dive into these topics and to reconnect with my too-long-abandoned Talos 2. The synthesis below will give a direction for the next steps.

    Benchmark                      MT   Verdict   To do
    pts/build-gcc-1.0.0            YES  AVERAGE
    pts/build-llvm-1.1.0           YES  GOOD
    pts/compress-7zip-1.7.1        YES  GOOD      Check optim options and results in recent version of the benchmark
    pts/compress-zstd-1.0.0        NO   GOOD
    pts/c-ray-1.1.1                YES  GOOD
    pts/encode-flac-1.6.0          NO   BAD       Check suggested improvements and measure their benefit
    pts/encode-mp3-1.7.0           NO   BAD       Check proposed improvements integrated in the official project
    pts/openssl-1.11.0             YES  BAD
    pts/parboil-1.1.2              ?    AVERAGE   Focus OPEN MP CUTCP that does not seem to be optimized for POWER9
    pts/phpbench-1.1.5             NO   AVERAGE   Identify the source of the problem
    pts/povray-1.2.1               YES  BAD       Investigate, profile …
    pts/primesieve-1.4.1           YES  BAD       Check if changes proposed have a positive and measurable impact
    pts/pybench-1.1.2              NO   VERY BAD  Investigate
    pts/osbench-1.0.1              N/A  GOOD      Check optim options, they improved greatly the test Create Processes
    pts/pgbench-1.8.4              YES  GOOD      Check optim options, that provide a boost
    pts/redis-1.1.0                NO   BAD       Check CPU activity
    pts/rodinia-1.2.2              ?    AVERAGE   Try a recent version and check optimization options
    pts/rust-prime-1.0.0           YES  GOOD      Run Mandelbrot that behaves poorly, in addition to Prime benchmark
    pts/scikit-learn-1.0.1         ?    BAD       Run a more recent version of the benchmark and analyze
    pts/stockfish-1.1.1            YES  AVERAGE   To confirm heterogeneous results on a recent version and optim options
    pts/tinymembench-1.0.1         NO   GOOD
    pts/x264-2.3.2                 YES  BAD       That would require huge efforts …
    system/octave-benchmark-1.0.0  YES  GOOD
    system/blender-1.0.2           YES  BAD       Check if the project is fixed to use more than 16 threads

With this panel of benchmarks, we get 9 GOOD, 5 AVERAGE and 10 BAD. We expect better as our machines are very capable. We will see that in another article with results on recent versions of these benchmarks and possibly more.
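The tally can be reproduced from the summary table with a few lines of Python. This sketch simply copies the verdict column and counts VERY BAD as BAD, which is an assumption about how the totals were obtained.

```python
from collections import Counter

# Verdict per benchmark, copied from the summary table above.
verdicts = {
    "pts/build-gcc-1.0.0": "AVERAGE",
    "pts/build-llvm-1.1.0": "GOOD",
    "pts/compress-7zip-1.7.1": "GOOD",
    "pts/compress-zstd-1.0.0": "GOOD",
    "pts/c-ray-1.1.1": "GOOD",
    "pts/encode-flac-1.6.0": "BAD",
    "pts/encode-mp3-1.7.0": "BAD",
    "pts/openssl-1.11.0": "BAD",
    "pts/parboil-1.1.2": "AVERAGE",
    "pts/phpbench-1.1.5": "AVERAGE",
    "pts/povray-1.2.1": "BAD",
    "pts/primesieve-1.4.1": "BAD",
    "pts/pybench-1.1.2": "VERY BAD",
    "pts/osbench-1.0.1": "GOOD",
    "pts/pgbench-1.8.4": "GOOD",
    "pts/redis-1.1.0": "BAD",
    "pts/rodinia-1.2.2": "AVERAGE",
    "pts/rust-prime-1.0.0": "GOOD",
    "pts/scikit-learn-1.0.1": "BAD",
    "pts/stockfish-1.1.1": "AVERAGE",
    "pts/tinymembench-1.0.1": "GOOD",
    "pts/x264-2.3.2": "BAD",
    "system/octave-benchmark-1.0.0": "GOOD",
    "system/blender-1.0.2": "BAD",
}

# Assumption: VERY BAD is folded into BAD for the totals.
tally = Counter("BAD" if v == "VERY BAD" else v for v in verdicts.values())
print(dict(tally))
```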

Future actions will depend on the verdict:

  • GOOD, when Talos 2 is in first place, I will just celebrate and check that there is no regression
  • AVERAGE, when Talos 2 is in the same performance range as AMD EPYC machines, I will at least check that there is no regression
  • BAD, when Talos 2 does not show the performance it should, I will check that the proposed changes have been applied and investigate if necessary, to identify possible remaining problems

I will also check the impact of SMT settings and compilation options, since in some cases the Phoronix reports mention flags (CXXFLAGS=-O3 -march=native CFLAGS=-O3 -march=native for x86_64 platforms and CXXFLAGS=-O3 -mtune=power9 -mcpu=power9 CFLAGS=-O3 -mtune=power9 -mcpu=power9 for ours), but I’m not sure whether that is always the case.
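One possible way to apply these flags consistently is to pick them from the machine architecture. The sketch below is an illustration only: it assumes that the benchmark build scripts honor CFLAGS and CXXFLAGS from the environment, that `platform.machine()` reports `ppc64le` on a little-endian POWER9 Linux system, and that plain `-O3` is an acceptable fallback elsewhere. The names `FLAGS_BY_ARCH` and `build_env` are hypothetical.

```python
import platform

# Flag sets as quoted from the Phoronix reports.
FLAGS_BY_ARCH = {
    "x86_64": "-O3 -march=native",
    "ppc64le": "-O3 -mtune=power9 -mcpu=power9",
}

def build_env(machine=None):
    """Return CFLAGS/CXXFLAGS entries for the given (or current) architecture."""
    machine = machine or platform.machine()
    # Assumption: fall back to plain -O3 on any other architecture.
    flags = FLAGS_BY_ARCH.get(machine, "-O3")
    return {"CFLAGS": flags, "CXXFLAGS": flags}

power_env = build_env("ppc64le")
x86_env = build_env("x86_64")
print(power_env)
print(x86_env)
```

The returned dictionary can be merged into a copy of `os.environ` and passed to the process that installs the test, so that the same script works on both kinds of machines.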

Annexes

References of Phoronix articles related to Talos 2: