
FFXV Hyperthreading & SMT On vs. Off Benchmarks

Posted on February 7, 2018

Despite having just called the FFXV benchmark “useless” and “misleading,” we still had some data left over that we wanted to publish before moving on. We were in the middle of benchmarking all of our CPUs when we discovered the game’s two separate culling and LOD issues (which Square Enix has acknowledged and is fixing), and we stopped all tests upon that discovery. That said, we had already collected some interesting data on SMT and Hyperthreading, and we wanted to publish it before shelving the game until launch.

We started testing with the R7 1700 and i7-8700K a few days ago, using numThreads=X settings on the command line to search for performance deltas. Preliminary testing revealed that these settings provided performance uplift up to 8 threads; with fewer threads, performance was lower, and beyond 8 threads, we observed diminishing returns.
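For readers who want to repeat this kind of sweep, here is a minimal sketch of how the per-run command lines could be generated. The executable name is a hypothetical placeholder, and the argument is written exactly in the numThreads=X form used above; whether the benchmark expects a prefix or different spelling is an assumption you would need to verify.

```python
from typing import List


def build_sweep_commands(executable: str, thread_counts: List[int]) -> List[List[str]]:
    """Return one argument list per thread count to be tested."""
    commands = []
    for count in thread_counts:
        if count < 1:
            raise ValueError("thread count must be at least 1")
        # "numThreads=<count>" mirrors the form named in the article;
        # the exact syntax the benchmark accepts is an assumption.
        commands.append([executable, f"numThreads={count}"])
    return commands


# Hypothetical executable name, used only for illustration.
commands = build_sweep_commands("ffxv_benchmark.exe", [4, 8, 12, 16])
for command in commands:
    print(" ".join(command))
```

Each list could be handed to a process launcher one run at a time, with several passes per setting to average out the benchmark's run-to-run variance.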

Just before we begin, a few quick notes: First, consider this a reminder that the buffalo cause (on high settings) a HairWorks hit even when neither buffalo nor HairWorks objects are on-screen. This means that the game is globally rendering the hair, and that culling is not functioning properly. It is irrelevant whether Ansel is used to inspect this, as the performance data illustrates it as fact (and, besides, Square Enix has acknowledged the bug).

Secondly, we had trouble creating a CPU bottleneck in this testing, which means we had trouble actually using FFXV as a CPU benchmark. Culling and LOD issues notwithstanding, the benchmark utility stands more as a GPU tester than anything. Even at 1080p and with Low settings, we were hitting a GPU bottleneck on the 1080 Ti, and we also encountered this bottleneck regularly at 1080p/Medium.

Let’s just go through the data somewhat quickly. The video above contains a few extra pre-test points, alongside screenshots of meshes that were called and drawn despite not appearing on-camera during the captured frame. This further supports our finding that objects are being rendered out-of-scene, even when Ansel is off.

1080p/Medium FFXV CPU Benchmark

ffxv cpu benchmark 1080p medium

We can start this piece by illustrating just how easily this game bottlenecks on the GPU. This is at 1080p with Medium settings, and we’re clearly hitting a bottleneck at around 137FPS AVG. The GPU is a GTX 1080 Ti FTW3 – among the best gaming cards you can get right now – and 1080p at Medium settings is still too much to be a viable CPU benchmark. With these settings, we only start seeing real divergence from the high-end parts when we step down to $100 R3 parts. That’s not a good CPU benchmark.

This comes back to what we found with the FFXV benchmark’s silent rendering of nearly everything on the map. It’s not just GameWorks – it’s other 3D objects.

ffxv fishing rendered

Here’s an example: This is a frame we analyzed from the game, in which the main character is standing on a fishing dock. Even during this frame, the game is still rendering cars that aren’t nearby, rendering item chests that aren’t in frame or nearby, and rendering large portions of highway that are located miles away.

ffxv chest

highway ffxv

This is also true for iguanas and other animals, which render even when we’re at the start of the benchmark:

ffxv driving rendered

iguana

Clearly, this is not a good benchmark utility. These objects are loading down the GPU, even though the bench scene itself is relatively lightweight.

1080p/Low FFXV CPU Benchmark

ffxv cpu benchmark 1080p low

Anyway, after stepping down to 1080p/Low, we can finally start to plot some actual CPU performance differences. We’re still bottlenecking at the high-end, but not as flatly as before.

With these settings, the Intel i7-8700K demonstrates our point of GPU limitations: Overclocked to 5GHz or stock, we’re still bumping up against a rough 174FPS chokepoint. GPU utilization is nearly 100% at this point, further illustrating how limited this benchmark’s usefulness is for CPU testing.
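The reasoning here can be reduced to a simple check: if a CPU overclock barely moves the average, the limit is probably somewhere other than the CPU. The sketch below is a rough illustration only. The 2% tolerance and the sample FPS values are placeholder assumptions, not numbers taken from our charts.

```python
def looks_gpu_bound(stock_fps: float, overclocked_fps: float,
                    tolerance_pct: float = 2.0) -> bool:
    """Return True when a CPU overclock changes AVG FPS by less than tolerance_pct.

    A small change suggests that something other than the CPU (here,
    presumably the GPU) is limiting performance. The tolerance is an
    assumed noise margin, not a measured one.
    """
    if stock_fps <= 0:
        raise ValueError("stock FPS must be positive")
    change_pct = abs(overclocked_fps - stock_fps) / stock_fps * 100.0
    return change_pct < tolerance_pct


# Hypothetical values: stock vs. overclocked results near the same ceiling.
print(looks_gpu_bound(173.6, 174.2))  # True
# Hypothetical values: an overclock that clearly scales.
print(looks_gpu_bound(120.0, 135.0))  # False
```

A check like this only flags a suspicion. Confirming the bottleneck still requires looking at GPU utilization, as we did above.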

Anyway, one bright spot here is that the game does seem to like threads, but only up to a point. With AMD’s R7 1700, we noticed that performance improved with SMT disabled: We saw a performance uplift of 5.1% from the stock R7 1700 to the R7 1700 with SMT off. For the R5 1600X, we observed a 4% performance uplift by disabling SMT. Note also that frametime consistency is not hugely impacted. On the 1600X, we are technically plotting a downtrend in low-end frametime consistency, but we can’t confidently state whether this is statistically significant or accurate, as the benchmark is too inconsistent to establish confidence in that 0.1% low swing. This is further illustrated by the opposite behavior on the R7 1700, where we saw uplift in both AVG FPS and 0.1% lows.
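For anyone unfamiliar with how these figures relate to raw frametimes, the following is a minimal sketch. It assumes one common definition of a "0.1% low" (the average FPS across the slowest 0.1% of frames), and the frametime list is synthetic data invented for the example, not capture data from this test.

```python
from typing import List


def average_fps(frametimes_ms: List[float]) -> float:
    """Average FPS over a run: total frames divided by total seconds."""
    if not frametimes_ms:
        raise ValueError("need at least one frametime")
    return 1000.0 * len(frametimes_ms) / sum(frametimes_ms)


def low_fps(frametimes_ms: List[float], fraction: float) -> float:
    """Average FPS across the slowest `fraction` of frames (at least one frame)."""
    if not frametimes_ms:
        raise ValueError("need at least one frametime")
    slowest_first = sorted(frametimes_ms, reverse=True)
    count = max(1, int(len(slowest_first) * fraction))
    worst = slowest_first[:count]
    return 1000.0 * len(worst) / sum(worst)


def percent_uplift(baseline: float, result: float) -> float:
    """Percentage change from baseline to result."""
    return (result - baseline) / baseline * 100.0


# Synthetic run: 999 frames at 10 ms and a single 20 ms spike.
frametimes = [10.0] * 999 + [20.0]
print(round(average_fps(frametimes), 1))  # 99.9
print(low_fps(frametimes, 0.001))         # 50.0
```

Note how a single slow frame halves the 0.1% low while barely touching the average. This is why a swing in that metric needs many consistent passes before it can be trusted.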

7 ffxv utilization fps

Relating this back to our previous research with the numThreads commands, we believe that the game encounters a point of diminishing returns at around 8 threads. Up until that point, more threads are better; after that point, we either lose performance from inefficient load-balancing across all threads, or we stagnate in performance.

This leads to a greater discussion on CPU utilization (for which we also have charts from previous research), because lower utilization is not, in fact, a good thing. There is a misconception that a game utilizing minimal amounts of the CPU means that the CPU has more headroom for background processing. In reality, what this means is that we’re load-balancing across all threads inefficiently, and losing performance as a result. With any component, you want it to be fully engaged, or close to it, in any task. The closer to 100%, the better, because that means you’re able to leverage the component to its fullest potential, not wasting any performance. If background operations exist, there should be some level of native load-balancing to distribute resources as needed.

Anyway, back to the FFXV 1080p/Low chart, the 7700K and the R5 1400 both demonstrate where disabling hyperthreading or SMT results in a net-negative. In these instances, the R5 1400 CPU sees a deficit in AVG FPS and frametime consistency. The 7700K achieves an inappreciably different AVG FPS, but has halved 0.1% lows. We attribute this to a 4-thread limitation on both devices, which the game really does not seem to like. At the low-end, highlighting the R3 CPU, R5 1400 CPU, and 7700K with SMT off, the game does not like working with 4 threads. At the high-end, highlighting the R7 1700 and R5 1600X, the extra threads should be toggled down to a count of 8 for peak performance. We are uncertain about the 8700K’s performance behaviors without hyperthreading, as we cannot reduce GPU load enough to limit the CPU. Limiting the CPU would require a 480p or 720p resolution, which enters the realm of a strict academic study and exits usefulness.
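To make the thread math explicit, here is a small sketch that works out how many hardware threads each CPU exposes with SMT or Hyperthreading on and off. The core counts are taken from the public spec sheets for these parts rather than from our charts, and the class and field names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    name: str
    cores: int
    threads_per_core: int  # 2 when SMT/Hyperthreading is supported

    def active_threads(self, smt_enabled: bool) -> int:
        """Hardware threads visible to the game for a given SMT state."""
        return self.cores * (self.threads_per_core if smt_enabled else 1)


# Core counts per public specifications (an assumption external to the article).
CPUS = [
    Cpu("R7 1700", 8, 2),
    Cpu("R5 1600X", 6, 2),
    Cpu("R5 1400", 4, 2),
    Cpu("i7-8700K", 6, 2),
    Cpu("i7-7700K", 4, 2),
]

for cpu in CPUS:
    print(f"{cpu.name}: {cpu.active_threads(True)}T on, "
          f"{cpu.active_threads(False)}T off")
```

With SMT off, the R7 1700 lands on 8 threads and the R5 1600X on 6, while the R5 1400 and 7700K drop to 4.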

7 ffxv cpu thread utilization 1700

Looking back at our CPU utilization chart from a few days ago, we can show again that using the numThreads command to limit thread utilization does have a meaningful impact. The result is improved or equal performance with half of the R7 1700’s threads. Interestingly, despite this game seemingly hitting a point of diminishing returns at 8 threads, it will still attempt to use every thread you give it, just in a less efficient way.

Also interesting, this sort of limitation would indicate an IPC bias, but the FFXV benchmark is still favoring an overclocked R7 1700 CPU over an overclocked i7-7700K CPU. This unique behavior should be interesting to monitor upon the game’s launch. The main takeaway here is that, even with a supposed point of diminishing returns with thread utilization, AMD is not carrying a performance deficit. The Ryzen 7 CPUs are performing well when compared to the 7700K, but should be limited to 8 threads for peak performance (rather than 16). Fortunately, this can be done via game launch options, so users may not need to reboot or manually toggle SMT, which is clearly not a user-friendly mode of gaining performance. We hope that this numThreads feature remains in the launch version for easy use, as the benefit to AMD is clear. The benefit to Intel is non-existent on the 7700K (we see a 0.1% low performance deficit at 4T), but could be present on the 8700K. Unfortunately, to test this on the 8700K, we would either need functional SLI or a very low resolution, like 480p/720p. This is the point at which we exit usefulness and enter the area of academic study.

That largely wraps our FFXV testing for now. We’ll revisit at launch.

Editorial, Test Lead: Steve Burke
Testing: Patrick Lathan
Video: Andrew Coleman