

Surya R Praveen Tegra 3 - Companion Core

When it comes to product branding, Nvidia’s track record is actually pretty good. “GeForce” stood for Geometry Force, a moniker that made sense given that it debuted with support for Quake III’s curved surfaces. “Tegra” is formed from the word “integral,” “Tesla” is a reference to one of the great scientists of the 20th century, and “Ion” was a great name for a product designed around Intel’s Atom (ions are atoms with a positive or negative charge). The name “Nvidia” is itself a reference to the Roman goddess of envy and jealousy, Invidia. The color most commonly associated with envy and jealousy? Green!

Nvidia, in other words, is pretty darn smart when it comes to branding, brand associations, and clever plays on words, which is why the company’s new name for Tegra 3’s Companion Core is abnormally wretched. According to a company blog post, Tegra 3’s unique quad-core + Companion Core will henceforth be known as… the 4-PLUS-1 quad-core architecture. The post offers the following explanation: “The reason is that, the more popular this technology became, the more our customers wanted a name for it that’s unique and descriptive. A name they could put on a box or a store sign that immediately represents its value.”

Surya R Praveen Companion Core

The problem with 4-Plus-1 is that equations make terrible brand names. The first thing anyone over the age of four is likely to think after reading 4-plus-1 is “five.” Nvidia’s next descriptive phrase? “Quad-core architecture.” As a technical description of how the Companion Core functions, 4-plus-1 isn’t bad. As a descriptive phrase that “immediately represents its value,” it’s terrible. The strongest linguistic and pop-culture references for 4-plus-1 are linked to two Christian bands (one real, one formed by Cartman on South Park), a cancelled British sitcom, Google’s +1 button, and, inevitably, Seven of Nine.

Nvidia seems to be aware of this problem, given that the blog entry actually finishes with a reference to Prince, stating that “while Prince Rogers Nelson initially changed his stage name to Prince and then to TAFKAP (the Artist Formerly Known as Prince) when he took up a symbol combining elements of male and female symbology, 4-PLUS-1 is here to stay.”

Surya R Praveen Prince Symbol

Way to take the wrong lesson. Prince changed his name to an unpronounceable symbol because he wanted to emancipate himself from “Prince,” a name he saw as being owned by Warner Brothers. This backfired stupendously because it left the world with no acceptable alternative for what to call him. “The dude we used to call Prince” wasn’t actually what Prince wanted to be called; it was the best anyone could come up with.

Nvidia’s mistake, in this case, was to focus on conveying what the technology was rather than what it did. The Companion Core is interesting, not because it means the chip has a PentaQuad of cores, but because it allows Tegra 3 to reduce its power consumption below what it could otherwise reach. Nvidia would’ve done better to wrap the Companion Core concept into a family and call it Low Power Drive, or brand it as “Part of the Optimus family.”

The company claims that 4-Plus-1 is here to stay. We hope not. Companion Core might have been imprecise, but it actually summarized the situation a whole lot more effectively.


Surya R Praveen CPU wafer

It’s been nearly eight years since Intel canceled Tejas and announced its plans for a new multi-core architecture. The press wasted little time in declaring conventional CPU scaling dead — and while the media has a tendency to bury products, trends, and occasionally people well before their expiration date, this is one declaration that’s stood the test of time.

To understand the magnitude of what happened in 2004 it may help to consult the following chart. It shows transistor counts, clock speeds, power consumption, and instruction-level parallelism (ILP). The doubling of transistor counts every two years is known as Moore’s law, but over time, assumptions about performance and power consumption were also made and shown to advance along similar lines. Moore got all the credit, but he wasn’t the only visionary at work. For decades, microprocessors followed what’s known as Dennard scaling. Dennard predicted that oxide thickness, transistor length, and transistor width could all be scaled by a constant factor. Dennard scaling is what gave Moore’s law its teeth; it’s the reason the general-purpose microprocessor was able to overtake and dominate other types of computers.
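
For readers who want to see why Dennard scaling gave Moore’s law its teeth, here is a minimal sketch of the textbook idealization. The paragraph above only names the three dimensions; the assumptions added here are that supply voltage shrinks by the same factor, that frequency rises by that factor, that capacitance follows a parallel-plate approximation, and that dynamic power goes as C·V²·f. The `Transistor` class and every function name are hypothetical, and the units are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Transistor:
    length: float     # gate length (arbitrary units)
    width: float      # gate width
    oxide: float      # gate oxide thickness
    voltage: float    # supply voltage
    frequency: float  # switching frequency


def dennard_scale(t: Transistor, k: float) -> Transistor:
    """Shrink dimensions and voltage by 1/k; let frequency rise by k."""
    return Transistor(t.length / k, t.width / k, t.oxide / k,
                      t.voltage / k, t.frequency * k)


def capacitance(t: Transistor) -> float:
    # Parallel-plate approximation: C ~ gate area / oxide thickness.
    return t.length * t.width / t.oxide


def switching_power(t: Transistor) -> float:
    # Dynamic power ~ C * V^2 * f, with constant factors dropped.
    return capacitance(t) * t.voltage ** 2 * t.frequency


def power_density(t: Transistor) -> float:
    # Power per unit of gate area.
    return switching_power(t) / (t.length * t.width)


old = Transistor(1.0, 1.0, 1.0, 1.0, 1.0)
new = dennard_scale(old, k=1.4)  # roughly one process generation

print(f"frequency:       x{new.frequency / old.frequency:.2f}")
print(f"power per gate:  x{switching_power(new) / switching_power(old):.2f}")
print(f"power density:   x{power_density(new) / power_density(old):.2f}")
```

Under these assumptions each transistor gets faster and cheaper to switch, while power per unit area stays flat. That flat power density is the part that stopped holding once voltage could no longer be lowered in step with the dimensions.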

Surya R Praveen CPU Scaling

CPU scaling showing transistor density, power consumption, and efficiency. Chart originally from The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software

The original 8086 drew ~1.84W and the P3 1GHz drew 33W, meaning that CPU power consumption increased by 17.9x while CPU frequency improved by 125x. Note that this doesn’t include the other advances that occurred over the same time period, such as the adoption of L1/L2 caches, the introduction of out-of-order execution, or the use of superscalar designs and pipelining to improve processor efficiency. It’s for this reason that the 1990s are sometimes referred to as the golden age of scaling. This expanded version of Moore’s law held true into the mid-2000s, at which point the power consumption and clock speed improvements collapsed. The problem at 90nm was that transistor gate oxides became too thin to prevent current from leaking out into the substrate.
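
The two ratios above are easy to check. The snippet below is just that arithmetic; the wattages and the 1GHz clock come from the text, while the 8 MHz starting clock for the 8086 is an assumption (it is the value the 125x figure implies). All variable names are made up for illustration.

```python
# Figures quoted in the text.
power_8086_w = 1.84
power_p3_w = 33.0
p3_clock_mhz = 1000.0

# Assumption: an 8 MHz 8086, which is the clock the 125x figure implies.
clock_8086_mhz = 8.0

power_ratio = power_p3_w / power_8086_w
clock_ratio = p3_clock_mhz / clock_8086_mhz

# How far frequency outran power draw over the same period.
clock_per_power = clock_ratio / power_ratio

print(f"power grew {power_ratio:.1f}x, clock grew {clock_ratio:.0f}x")
print(f"clock gain per unit of power growth: {clock_per_power:.1f}x")
```

The last figure is the interesting one: clock speed grew several times faster than power draw, which is the gap that closed in the mid-2000s.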

Intel and other semiconductor manufacturers have fought back with innovations like strained silicon, hi-k metal gate, FinFET, and FD-SOI — but none of these has re-enabled anything like the scaling we once enjoyed. From 2007 to 2011, maximum CPU clock speed (with Turbo Mode enabled) rose from 2.93GHz to 3.9GHz, an increase of 33%. From 1994 to 1998, CPU clock speeds rose by 300%.
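
To put the two periods on the same footing, it helps to convert between percentage increases and end-to-start multiples. This is a small sketch using only the clock speeds and percentages quoted above; the function names are hypothetical.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


def growth_multiple(percent: float) -> float:
    """Convert a percentage increase into an end/start multiple."""
    return 1.0 + percent / 100.0


turbo_gain = percent_increase(2.93, 3.9)    # 2007 to 2011, in GHz
nineties_multiple = growth_multiple(300.0)  # 1994 to 1998

print(f"2007-2011: +{turbo_gain:.0f}% ({growth_multiple(turbo_gain):.2f}x)")
print(f"1994-1998: +300% ({nineties_multiple:.0f}x)")
```

A 300% increase means clocks quadrupled in four years, against roughly a one-third gain in the later four-year window.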

The multi-core swerve

For the past seven years, Intel and AMD have emphasized multi-core CPUs as the answer to scaling system performance, but there are multiple reasons to think the trend towards rising core counts is largely over. First and foremost, there’s the fact that adding more CPU cores never results in perfect scaling. In any parallelized program, performance is ultimately limited by the amount of serial code (code that can only be executed on one processor). This is known as Amdahl’s law. Other factors, such as the difficulty of maintaining concurrency across a large number of cores, also limit the practical scaling of multi-core solutions.
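
Amdahl’s law is simple enough to sketch in a few lines. The version below uses the standard formula, speedup = 1 / (serial + parallel / cores). The 90% parallel fraction is a made-up example value, not a measurement of any real program, and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup on `cores` processors under Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def speedup_limit(parallel_fraction: float) -> float:
    """Ceiling on speedup as the core count goes to infinity."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


# Hypothetical workload: 90% of the run time parallelizes perfectly.
P = 0.9
for n in (1, 2, 4, 8, 16, 64):
    print(f"{n:>3} cores: {amdahl_speedup(P, n):5.2f}x")
print(f"limit: {speedup_limit(P):.1f}x")
```

Even with only 10% serial code, each doubling of cores buys less than the one before, and no core count can push the program past the ceiling set by the serial portion.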

Surya R Praveen Amdahl's Law

AMD’s Bulldozer is a further example of how bolting more cores together can result in a slower end product. Bulldozer was designed to share logic and caches in order to reduce die size and allow for more cores per processor, but the chip’s power consumption badly limits its clock speed while slow caches hamstring instructions per cycle (IPC). Even if Bulldozer had been a significantly better chip, it wouldn’t change the long-term trend towards diminishing marginal returns. The more cores per die, the lower the chip’s overall clock speed. This leaves the CPU ever more reliant on parallelism to extract acceptable performance. AMD isn’t the only company to run into this problem; Oracle’s new T4 processor is the first Niagara-class chip to focus on improving single-thread performance rather than pushing up the total number of threads per CPU.

Surya R Praveen Rage Jobs

The difficulty of software optimization is a further reason why adding more CPU cores doesn’t help much. Game developers have made progress in using multi-core systems, but the rate of advance has been slow. Rage and Battlefield 3, two high-profile titles that use multiple cores, both run on new engines designed from the ground up with multi-core scaling as a primary goal.

The bottom line is that it’s been easier for Intel and AMD to add cores than it has been for software to take advantage of them. Seven years after the multi-core era began, it’s already morphing into something different.

The rise (and limit) of Many-Core

In this context, we’re using the term “many-core” to refer to a wide range of programmable hardware. GPUs from AMD and Nvidia are both “many-core” products, as are chips from companies like Tilera. Intel’s Knights Corner is a many-core chip.

The death of conventional scaling has sparked a sharp increase in the number of companies researching various types of specialized CPU cores. Prior to that point, general-purpose CPU architectures, exemplified by Intel’s x86, had eaten through the high-end domains of add-in boards and co-processors at a ferocious rate. Once that trend slammed into the brick wall of physics, more specialist architectures began to appear.

Surya R Praveen Many-core Scaling

Note: Three exclamation points don’t actually mean anything, despite the fondest wishes of AMD’s marketing department.

Despite what some companies like to claim, specialized many-core chips don’t “break” Moore’s law in any way and are not exempt from the realities of semiconductor manufacturing. What they offer is a tradeoff — a less general, more specialized architecture that’s capable of superior performance on a narrower range of problems. They’re also less encumbered by socket power constraints — Intel’s CPUs top out at 140W TDP; Nvidia’s upper-range GPUs are in the 250W range.

Intel’s upcoming Many Integrated Core (MIC) architecture is partly an attempt to capitalize on the benefits of having a separate interface and giant PCB for specialized, ultra-parallel data crunching. AMD, meanwhile, has focused on consumer-side applications and the integration of CPU and GPU via what it calls Graphics Core Next. Regardless of market segmentation, all three companies are talking about integrating specialized co-processors that excel at specific tasks, one of which happens to be graphics.

Surya R Praveen AMD's many-core strategy

Unfortunately, this isn’t a solution. Incorporating a specialized many-core processor on-die or relying on a discrete solution to boost performance is a bid to improve efficiency per watt, but it does nothing to address the underlying problem that transistors can no longer be counted on to scale the way they used to. The fact that transistor density continues to scale while power consumption and clock speed do not has given rise to a new term: dark silicon. It refers to the percentage of silicon on a processor that can’t be powered up simultaneously without breaching the chip’s TDP.
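
A toy model makes the dark silicon definition concrete. It assumes a chip built from identical blocks that are each either fully on or fully off, which is a simplification, and the three “generations” are invented numbers chosen only to show the trend: block count doubles while per-block power falls more slowly. The function name and the 100W budget are hypothetical.

```python
def dark_fraction(total_blocks: int, watts_per_block: float,
                  tdp_watts: float) -> float:
    """Fraction of equal-sized blocks that must stay off to respect the TDP."""
    if total_blocks <= 0 or watts_per_block <= 0 or tdp_watts < 0:
        raise ValueError("expected positive block count and power figures")
    # Only whole blocks can be switched on, and never more than exist.
    powered = min(total_blocks, int(tdp_watts // watts_per_block))
    return 1.0 - powered / total_blocks


# Hypothetical generations: (block count, watts per block).
TDP_WATTS = 100.0
generations = [(16, 5.0), (32, 4.0), (64, 3.0)]

for blocks, watts in generations:
    share = dark_fraction(blocks, watts, TDP_WATTS)
    print(f"{blocks} blocks at {watts} W each: {share:.0%} dark")
```

With a fixed power budget, the dark share grows every generation in which density improves faster than per-block power does.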

A recent report on dark silicon and the future of multi-core devices describes the future in stark terms. The researchers considered transistor scaling both as forecast by the International Technology Roadmap for Semiconductors (ITRS) and at a more conservative rate; they factored in the use of APU-style combinations, the rise of so-called “wimpy” cores, and the future scaling of general-purpose multiprocessors. They concluded:

“Regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community… Given the low performance returns… adding more cores will not provide sufficient benefit to justify continued process scaling. Given the time-frame of this problem and its scale, radical or even incremental ideas simply cannot be developed along typical academic research and industry product cycles… A new driver of transistor utility must be found, or the economics of process scaling will break and Moore’s Law will end well before we hit final manufacturing limits.”

Over the next few years scaling will continue to slowly improve. Intel will likely meander up to 6-8 cores for mainstream desktop users at some point, quad-cores will become standard at every product level, and we’ll see much tighter integration of CPU and GPU. Past that, it’s unclear what happens next. The gap between present-day systems and DARPA’s exascale computing initiative will diminish only marginally with each successive node; there’s no clear understanding of how — or if — classic Dennard scaling can be re-initiated.

This is part one of a two-part story. Part two will deal with how Intel is addressing the problem through what it calls the “More than Moore” approach and its impact on the mobile market.
