It’s been a turbulent 12 months for AMD. Since the company launched Llano, its first mainstream “Fusion” part, it has replaced its CEO, brought in multiple new executives, debuted a disappointing architecture, delayed its next-generation Brazos parts by a full year, and outlined a comprehensive vision of the future that de-emphasizes cutting-edge process node transitions in favor of reusable IP blocks that can be shared between multiple SoCs (systems-on-a-chip).
When it launched last year, Bulldozer ran hot, scaled poorly, and was less efficient than its predecessor. When it came to building Llano’s successor, AMD clearly had its work cut out for it. We won’t belabor that point further; if you want more information, check our previous coverage.
We’re guessing that Trinity (the code name) is a nod to the fact that Trinity (the APU) contains a new CPU, new GPU, and new interconnect structure. There’s also a handy reference back to the first atomic bomb test in July 1945 (this is where Oppenheimer famously said “I am become death, destroyer of worlds”) and, of course, to the Holy Trinity. These are both big shoes to fill, so let’s tackle what AMD has come up with, starting with the CPU core. We’ll only be addressing the CPU and GPU here, but a discussion of the interconnect is in the very near future.
AMD claims to have done a great deal of low-level optimization to clean up Bulldozer’s mess. Piledriver’s branch prediction is better, its integer and FPU scheduling makes better use of shared resources, and larger L1 TLBs (Translation Lookaside Buffers) reduce the chance that the CPU will “miss” when searching translated virtual addresses.
Piledriver also adds support for two additional instruction set extensions, FMA3 (Fused Multiply-Add) and F16C. FMA3 is a different form of the FMA4 instruction Bulldozer supported. AMD has beaten Intel to the punch on this one; Intel’s own FMA3 support will debut in 2013, with Haswell. Both FMA variants can improve code execution efficiency by fusing a multiply and an add into a single instruction with a single rounding step, but neither FMA3 nor FMA4 is expected to provide significant speed boosts. F16C converts floating point values between 32-bit and 16-bit formats, so that data can be stored using 16 bits. AMD might make use of this for the GPU (GPUs have a native 16-bit floating point shader capability), but that’s an unknown as well.
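To make the two extensions a little more concrete, here is a minimal Python sketch of what they do numerically. It is a software emulation under stated assumptions, not how Piledriver implements anything: the function names are ours, the fused result is emulated with exact rational arithmetic, and Python floats are 64-bit, whereas F16C itself converts between 16-bit and 32-bit values.

```python
import struct
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Ordinary a*b + c: rounds once after the multiply, again after the add."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b + c exactly, then round a single time."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def half_round_trip(x: float) -> float:
    """Store x as a 16-bit IEEE half-precision value and read it back."""
    packed = struct.pack("<e", x)  # "e" is the half-precision format code
    return struct.unpack("<e", packed)[0]


# Hypothetical inputs chosen so the intermediate product needs extra precision.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print("unfused:", unfused_multiply_add(a, b, c))  # the small term is rounded away
print("fused:  ", fused_multiply_add(a, b, c))    # the small term survives
print("0.1 stored in 16 bits:", half_round_trip(0.1))
```

The first two lines of output differ because the unfused version rounds the product to 1.0 before adding, while the fused version keeps the tiny residual. The last line shows the cost of 16-bit storage: half the space of a 32-bit float, with only about three decimal digits of precision.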
Nearly all of the listed changes are small in and of themselves, but combined, they could make a significant difference in the chip’s overall efficiency. I’m particularly curious about the unspecified “L2 efficiency improvements,” having long suspected that high cache latencies fundamentally sabotaged Bulldozer last fall.
One major feature Piledriver doesn’t change is the number of instructions decoded per clock cycle (four per module, for a total of eight in a dual-module / quad-core design). That’s significantly fewer than Llano (12 per quad-core) or Sandy Bridge (16). With Bulldozer, it was never clear how much of a role this played in the chip’s lower-than-expected performance.
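Put on a per-core footing, the gap looks like this. The chip-level totals are the ones quoted above; dividing them evenly across four cores is our simplification, and it ignores how a module actually shares its decoder between its two cores.

```python
# Peak instructions decoded per clock for a quad-core chip,
# using the totals quoted in the article.
DECODE_PER_CLOCK = {
    "Trinity": 4 * 2,      # 4 per module, two modules
    "Llano": 12,
    "Sandy Bridge": 16,
}
CORES = 4


def decode_per_core(total: int, cores: int = CORES) -> float:
    """Average decode slots per core per clock (an even split is assumed)."""
    return total / cores


for chip, total in DECODE_PER_CLOCK.items():
    print(f"{chip}: {total} per chip, {decode_per_core(total):.1f} per core")
```

On these figures a Trinity core averages two decode slots per clock against three for Llano and four for Sandy Bridge, which is why the decoder is an obvious suspect even if its real-world impact is unproven.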
Without a test system of our own, we’re forced to rely on AMD’s own published numbers and reviews from other sites. To call AMD’s CPU performance data “cherry picked” is a drastic understatement; virtually every performance score the company provided is GPU-centric or leverages the GPU heavily. The only non-GPU performance data AMD released was in PCMark Vantage and PCMark 7. Those aren’t bad choices for total system productivity; while they don’t rely entirely on the CPU, they’re probably much more relevant to the end-users AMD is courting with these new designs.
Unfortunately, even here, data is extremely limited. AMD’s vaunted claim that Trinity delivers 2x the performance/watt of Llano is based solely on PCMark Vantage’s overall score. AMD claims that a dual-core, 17W Trinity at 2.6GHz essentially ties a quad-core Llano at 2.3GHz, but states elsewhere that an A10-4600M (Trinity, quad-core, 3.2GHz) is only 28.5% faster than an A8-3500M (Llano, quad-core, 2.4GHz). Incidentally, the company’s claim to deliver 28.5% higher x86 performance is based solely on the latter figure.
With dubious and rather contradictory data, our best guess is that Trinity improves on Llano’s overall positioning and offers equivalent performance, clock-for-clock. This will translate into better CPU performance in some SKUs. More important, from AMD’s perspective, was the need to bring Bulldozer’s power consumption down to something that would fit into mainstream and “ultra-light” form factors. Trinity accomplishes this. It won’t compete with Ivy Bridge (matching Llano means it won’t even compete particularly well with Sandy Bridge in CPU-centric workloads), but AMD is pricing these parts into markets well below Intel’s target for IVB and ultrabooks.
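The clock-for-clock estimate can be sanity-checked against the two comparisons AMD quoted. The sketch below is rough arithmetic only: it assumes performance scales linearly with the quoted clock speed, treats the PCMark-based figures as if they measured the CPU alone, and ignores the different core counts in the second comparison. The function name is ours.

```python
def per_clock_ratio(perf_ratio: float, new_ghz: float, old_ghz: float) -> float:
    """Relative performance per clock: performance ratio over clock ratio."""
    return perf_ratio / (new_ghz / old_ghz)


# A10-4600M (3.2GHz) quoted as 28.5% faster than A8-3500M (2.4GHz).
a10_vs_a8 = per_clock_ratio(1.285, 3.2, 2.4)

# 17W dual-core Trinity (2.6GHz) quoted as tying a quad-core Llano (2.3GHz).
ulv_vs_llano = per_clock_ratio(1.0, 2.6, 2.3)

print(f"A10-4600M vs A8-3500M, per clock: {a10_vs_a8:.2f}")
print(f"17W Trinity vs quad-core Llano, per clock: {ulv_vs_llano:.2f}")
```

Both ratios land a little under 1.0 (roughly 0.96 and 0.88), which is consistent with Trinity being in the same neighborhood as Llano per clock and gaining most of its advantage from higher frequencies.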
AMD’s Cayman – Trinity’s Linchpin
In most literature, AMD refers to Trinity’s GPU as a “Northern Islands” class part without bothering to explain whether it’s based on Barts (a modest step forward from the old 5000 series) or Cayman (the high-end GPU that AMD confined to the 6900 series). Officially, it’s branding the new core as part of the Radeon 7000 family, which isn’t accurate either.
AMD’s branding has become murky at best. The 7000M brand is now polluted with three types of GPUs: 40nm rebrands of 6000 parts which are based on Barts/Turks, 28nm parts based on GCN (Graphics Core Next), and 32nm APUs based on Cayman. Trying to hash out which GPUs are the best match for the APU is a task better left to saints and madmen than poor journalists; we’ll leave the topic of paired graphics for another day.
Unlike Llano, whose integrated GPU was nearly identical to AMD’s discrete “Redwood” part, there’s no easy point of comparison for Cayman. The new GPU features an array of six SIMD clusters of 64 cores each; Llano had five SIMDs of 80 cores. The new GPU is slightly smaller than its predecessor, with 384 cores instead of 400, but one of the features AMD introduced with Cayman was a VLIW4 architecture that was significantly more efficient than the VLIW5 designs that preceded it. AMD has also increased the number of texture units, to a maximum of 24, up from Llano’s 20. The total number of ROPs remains the same, at eight.
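The core counts hide a shift in how the hardware is organized, which a little arithmetic makes visible. The sketch below uses only the figures quoted above; the derived "VLIW units" number simply divides the core count by the VLIW width, which is our way of counting rather than a figure AMD published, and the class and field names are ours.

```python
from dataclasses import dataclass


@dataclass
class GpuConfig:
    name: str
    simds: int
    cores_per_simd: int
    vliw_width: int      # 4 for VLIW4, 5 for VLIW5
    texture_units: int

    @property
    def cores(self) -> int:
        return self.simds * self.cores_per_simd

    @property
    def vliw_units(self) -> int:
        # Each VLIW unit bundles vliw_width cores behind one instruction slot.
        return self.cores // self.vliw_width

    @property
    def texture_units_per_simd(self) -> float:
        return self.texture_units / self.simds


trinity = GpuConfig("Trinity", simds=6, cores_per_simd=64, vliw_width=4, texture_units=24)
llano = GpuConfig("Llano", simds=5, cores_per_simd=80, vliw_width=5, texture_units=20)

for gpu in (trinity, llano):
    print(f"{gpu.name}: {gpu.cores} cores, {gpu.vliw_units} VLIW units, "
          f"{gpu.texture_units_per_simd:.0f} texture units per SIMD")
```

Counted this way, Trinity has fewer cores than Llano (384 against 400) but more VLIW units (96 against 80), and the extra texture units follow directly from the sixth SIMD. That is one plausible reading of why the "smaller" GPU is expected to be the faster one.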
When it comes to game performance, AMD is more willing to share the goodies.
All game tests run at 1920×1080
The company has a bad habit of switching back and forth between desktop parts and laptop parts, and there’s no Ivy Bridge comparison data. Still, things look good. This was very nearly a given; Cayman made a number of efficiency improvements that were logical fits for Trinity, and all of AMD’s APU demos in the run-up to launch focused on the GPU.
Trinity moves AMD forward, buys time for 2013 launches
After talking with AMD and reading over the company’s presentations, our educated bet is that Trinity is a qualified success. Piledriver may not move the bar very much on the CPU side of the equation, but between power consumption, temperature, and performance, AMD had to fix the first two to have any chance of launching a mobile part based on the architecture. If Piledriver can match Llano clock-for-clock (or within the same TDP), that’s still significantly more than BD managed when compared to Istanbul/Thuban.
Will it compete effectively against Ivy Bridge? No. But it was never intended to. AMD’s goal with Trinity is to position the CPU as a successor to Llano, a further fulfillment of the company’s “Fusion” vision, and as an anchor in the popular $400-$700 segment. Based on what we’ve seen and a few educated guesses, it’s got a fair chance of pulling it off — short term.
No matter how successful Trinity is in 2012, it doesn’t change the fact that AMD has no traction in tablets or sub-10W designs at a time when companies like Qualcomm have given notice that they intend to move into PCs. That’s fine for the moment, because Windows 8 won’t drop until the latter half of the year, and it’ll take 4-6 months past that point for some of the traditional smartphone/tablet players to make moves into the low-end PC space.
AMD needs a quick jump to 28nm Brazos and a fast refresh on Trinity. In theory, the new chips — Kabini and Kaveri — will be ready in 2013. The company has yet to put a quarter on that number, or to even comment on where the parts are being made. Trinity may be a good beginning, but it’s only that; AMD has a long way to go when it comes to carving out its own territory in between Intel at the top of the market and an onslaught of ARM-based hardware at the bottom.