Late last week, Jen-Hsun Huang sent a letter to Nvidia employees, congratulating them on successfully launching the highly acclaimed GeForce GTX 680. After discussing how Nvidia changed its entire approach to GPU design to create the new GK104, Jen-Hsun writes: “Today is just the beginning of Kepler. Because of its super energy-efficient architecture, we will extend GPUs into datacenters, to super thin notebooks, to superphones.” (Emphasis added — Nvidia calls Tegra-powered products “super,” as in superphones, super tablets, etc., presumably because it believes you’ll be more inclined to buy one if you associate it with a red-booted man in blue spandex.)
This has touched off quite a bit of speculation concerning Nvidia’s Tegra 4, codenamed Wayne, including assertions that Nvidia’s next-gen SoC will use a Kepler-derived graphics core. That’s probably true, but the implications are considerably wider than a simple boost to the chip’s graphics performance. Tegra 4, also known as T40, could very well be a fundamental game-changer for Nvidia and the most important Tegra product to date.
Improved game performance
The GPU that powers Tegra 2 and Tegra 3 has a fixed number of pixel and vertex shaders and is much more closely related to GeForce 7-era products than to the Unified Shader Architecture Nvidia debuted with the G80 (GeForce 8). When Nvidia describes Tegra 2 and 3 as “fully programmable,” it’s true — but it’s not at all the same as being DirectCompute/CUDA/OpenCL-compatible. Current Tegra products are capable of running complex shader programs, but not the general-purpose code that makes things like PhysX or GPGPU calculations possible.
Modern GPUs adjust their execution resources depending on workload
GPUs with a Unified Shader Architecture (all Nvidia products from G80 onwards) have two advantages over their fixed-function cousins. First, they’re more efficient. A fixed-function GPU’s performance can vary considerably from game to game depending on whether a title emphasizes pixel shading or model geometry; this is quite visible when comparing performance between Tegra 2/3 and the SGX 544. A Kepler-based GPU would be much more flexible, able to allocate its execution resources to whichever workload needs them. This can indirectly lead to lower power usage — a wide array of more efficient stream processors doesn’t necessarily need to run at as high a clock speed as a fixed-function chip.
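The efficiency argument can be sketched with a toy model. The sketch below compares how long a frame's shading work takes on a fixed-split design versus a unified pool; all unit counts and workload figures are made up for illustration, not real Tegra or Kepler numbers.

```python
# Toy model: cycles to finish a frame's shading work on a fixed-split GPU
# versus a unified-shader GPU. Unit counts and workloads are illustrative
# only -- they are not real Tegra or Kepler figures.

def fixed_split_cycles(pixel_work, vertex_work, pixel_units=8, vertex_units=4):
    """Pixel and vertex work run on separate, dedicated pools of units;
    the frame is done only when the slower pool finishes."""
    return max(pixel_work / pixel_units, vertex_work / vertex_units)

def unified_cycles(pixel_work, vertex_work, total_units=12):
    """A unified pool works on whatever is pending, so only the total
    amount of work matters, regardless of the pixel/vertex mix."""
    return (pixel_work + vertex_work) / total_units

# A pixel-heavy frame: the fixed design leaves its vertex units mostly idle.
print(fixed_split_cycles(96, 8))   # prints 12.0
print(unified_cycles(96, 8))       # prints ~8.67
```

The same total hardware finishes the lopsided frame sooner when it can be reassigned freely, which is why a unified design can hit a performance target at a lower clock speed.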
Second, and arguably more important, is their ability to handle functions that would normally be processed on the CPU. This is where we expect T40 to come into its own.
A second chance for hardware PhysX
As a software SDK for physics calculation, Nvidia’s PhysX solution has been quite successful; it’s used in nearly 400 games across consoles and PCs. Nvidia’s attempts to encourage game developers to include support for so-called hardware PhysX — the term refers to using Nvidia GPUs for significantly enhanced physics effects, cloth simulation, and particle interactions — have largely come to naught. Out of the 374 games listed as shipped or in development that are confirmed to use PhysX at PhysXinfo.com, just 19 of them use hardware PhysX. (PhysXinfo’s list of hardware PhysX games shows several cancelled games as still being “in development.”)
Hardware PhysX could fare much better on mobile platforms if Nvidia can show that using the GPU to offload physics calculations improves performance and power efficiency, and allows for more advanced physics modeling. Many of the most popular mobile games, from Angry Birds to Cut the Rope, are fundamentally physics games, but they often rely on relatively crude models.
The challenges here will be on the development side. The best thing Nvidia could do to spur hardware PhysX adoption would be to pay for Tegra-specific adaptations of today’s most popular physics-based games, as well as investing in its own titles or in upcoming games. There will always be developers who eschew hardware PhysX in favor of a simplified software-based solution that can run on every mobile device, but the cost and complexity of integrating hardware PhysX into a mobile game are a fraction of those involved in applying the same technology to a PC title.
Benefits beyond gaming
Tegra 4 also gives Nvidia a fresh platform with which to bundle its Icera modem and DirectTouch products. The Icera family is a set of software-defined cellular modems (up to and including LTE), while DirectTouch is technology that “improves touch responsiveness by offloading some of the touch processing that is typically performed by touch controllers and touch modules onto the Nvidia Tegra 3 application processor [Companion Core]. The architecture also simplifies the implementation of touch based hardware and user interfaces, requiring less power while delivering more scalable performance.”
Speaking of the Companion Core, Tegra 4 will almost certainly include an enhanced version of that architecture. With Tegra 3, Nvidia chose to build an array of five Cortex-A9 cores; in Tegra 4, the company could adopt the sort of hybrid strategy ARM introduced with big.LITTLE (Cortex-A15 cores for heavy lifting paired with a Cortex-A7 core for low-power operation) or use a hybrid A9/A15 arrangement.
A Kepler-based GPU will also likely improve Tegra 4’s video encode/decode capabilities as compared to Tegra 3. This is one area where the T2/T3 family is significantly more advanced than the GeForce 7-era hardware it resembles in other respects, but a GK104-derived chip could improve matters further by increasing power efficiency, supporting a wider variety of formats and standards, or providing more post-processing options.
An opportunity for CUDA
Nvidia has always marketed CUDA as a capability that can dramatically increase performance, but the company’s efforts in this area have mostly been confined to scientific computing and high-end industrial applications. The problem has been two-fold. First, x86 CPUs are both ubiquitous in desktop/laptop computing and extremely capable; second, the performance-per-watt advantage of using a GPU for computation instead of a CPU was a hard sell at the consumer level, even in laptops. Intel’s dominance of the graphics industry was also a factor: Nvidia began talking about CUDA back in 2007, but Intel’s first OpenCL-capable GPU didn’t arrive until Ivy Bridge in 2012.
In mobile phones, it’s a different story. Here, battles are fought for milliwatts of power and the CPU ecosystem is still evolving rapidly. ARM’s Cortex family has improved considerably since the Cortex-A8 debuted in 2008, but there’s more than enough room for a complementary array of streaming cores that handle graphics as well as other functions. Developers are accustomed to targeting multiple families of ARM products — indeed, they are encouraged to do so — and manufacturers often develop their own specialized applications to highlight particular hardware capabilities.
OpenCL could throw a wrench into this idea, but it’s not an insurmountable obstacle. Since Nvidia GPUs fully support OpenCL, Tegra 4 devices could still take advantage of it, and developer assistance and joint marketing funds both count for a lot.
The reason we’re willing to advance all of the above without public confirmation from Nvidia is that these predictions make logical sense given where the company stands. Tegra 3 is a good part as far as it goes, but Qualcomm has stolen a great deal of its thunder with the Snapdragon S4. Nvidia can’t count on the Cortex-A15 as a differentiator, given that the likes of Samsung and Texas Instruments will have their own A15-based products out by the end of the year as well. Graphics are the logical area for Nvidia to emphasize; a smartphone GPU based on Kepler would be far more advanced than anything currently offered by Imagination Technologies’ PowerVR, ARM’s Mali, or Qualcomm’s aging Adreno core.
Granted, more advanced doesn’t always mean higher performance, and Nvidia will have to balance GPU complexity and the number of cores against die size and power consumption. That’s the one enormous caveat to what we’ve written here — Tegra 4’s attractiveness will depend critically on how well Nvidia balances the device’s hardware and software ecosystem. Strong execution of a solid architecture will make T40 competitive; GPU-powered utilities, better battery life and touch sensitivity, and possibly customized games that take advantage of hardware physics processing are what will make it great.
That’s a tall order for any company to nail with the first iteration of an architecture, but Nvidia has a lot of irons in the fire when it comes to mobile development. We expect the company will attempt to position Tegra 4 as a chip that can leverage its GPU in ways that are beyond its competition. If the hardware delivers and the software support is in place, it could give Nvidia a potent weapon against OMAP5, Exynos, and future Medfield products from Intel.