Modern CPU scaling has largely flatlined, thanks to a combination of scaling challenges, intrinsic limits to silicon performance, and a general shift in the market towards lower power devices. Chip designers who want to continue to push the limits of what computers are capable of need to look to specialized architectures and custom designs. That was the message at the Linley Processor Forum according to Linley Gwennap, principal analyst at the Linley Group.
Dean Takahashi covered the event for VentureBeat and reports that Linley’s comments dovetail with the industry trends we’ve seen emerging over the last few years. Companies like Nvidia are marketing their chips as ideal for deep learning and AI calculations. Intel purchased the FPGA manufacturer Altera and has built its own specialty coprocessor hardware, dubbed the Xeon Phi. From IBM’s TrueNorth to Qualcomm’s heterogeneous compute capabilities in the Snapdragon 820, a number of companies are attacking performance problems in a wide variety of ways. Self-driving cars, drone deliveries, and even chatbot AI could transform computing as we know it today — but not without fundamentally changing how we think about computers.
For most of the past 40 years, the story of personal computing was, well, personal. Desktops and laptops were positioned as devices that allowed the customer to rip music, explore the Internet, write, work, game, or stay in touch with family and friends.
In the beginning, the hardware that enabled many of these functions was mounted in separate expansion slots or installed in co-processor sockets on the motherboard. Over time, these capabilities were integrated into either the motherboard or the CPU itself. Channel I/O and FPU coprocessors went away, as did most independent sound cards. Dedicated GPUs have hung on thanks to intrinsic thermal issues, but on-die graphics solutions from AMD and Intel have improved every single year.
There was a simple, virtuous cycle at work: The computer industry collectively delivered faster hardware, consumers bought it, and developers took advantage of it. The initial end to CPU clock scaling went nearly unnoticed, thanks to the widespread availability (and superiority) of dual- and quad-core chips compared with their single-core predecessors. This cycle has been repeated in smartphones and tablets, but we’ve already seen signs that the rate of improvement is slowing.
To demonstrate, here’s a graph of Apple iPhone single-core CPU performance in Geekbench 2 and Geekbench 3. We split the data into two sections to capture how the various iDevices compared over time, and to capture the shift from GB2 to GB3 when the latter became available. We’re using Apple because that company has spent more resources on improving its single-core performance than the various Android SoC developers, which tend to prioritize higher core counts.
From 2008 to 2012, single-core CPU benchmark scores in Geekbench 2 increased from 141 on the iPhone 3G to 1602 on the iPhone 5. That’s an improvement of more than 11x in just four years.
From 2012 to 2016, Apple’s single-threaded CPU performance improved from 699 to 3500. While this still represents excellent scaling, the net improvement is roughly 5x — less than half the level of scaling we saw from 2008 to 2012. There are a number of reasons for this. Apple’s CPU designs are maturing and process node shrinks often don’t offer the same degree of improvement. But the other explanation is thermal — smartphones and tablets have hard TDP limits that prevent them from scaling upwards indefinitely. The slideshow below steps through some of the major advances in computing from the 1950s through the present day.
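The two scaling figures above can be checked with a few lines of arithmetic. The sketch below uses only the four scores quoted in this article; the function names are hypothetical, and the annualized rate is an added illustration that assumes smooth compound growth across each four-year window, which the real data may not follow. Because Geekbench 2 and Geekbench 3 scores are not directly comparable, each ratio is computed within a single benchmark.

```python
def improvement_factor(start_score, end_score):
    """Overall speedup between two scores from the same benchmark."""
    return end_score / start_score


def annualized_growth(start_score, end_score, years):
    """Compound annual growth rate implied by the two scores.

    Assumption: growth is treated as smooth over the whole period.
    """
    return (end_score / start_score) ** (1 / years) - 1


# Scores quoted in the article (single-core).
GB2_IPHONE_3G, GB2_IPHONE_5 = 141, 1602   # Geekbench 2, 2008 to 2012
GB3_2012, GB3_2016 = 699, 3500            # Geekbench 3, 2012 to 2016
YEARS = 4

gb2_factor = improvement_factor(GB2_IPHONE_3G, GB2_IPHONE_5)
gb3_factor = improvement_factor(GB3_2012, GB3_2016)
gb2_rate = annualized_growth(GB2_IPHONE_3G, GB2_IPHONE_5, YEARS)
gb3_rate = annualized_growth(GB3_2012, GB3_2016, YEARS)

print(f"2008-2012: {gb2_factor:.1f}x overall, {gb2_rate:.0%} per year")
print(f"2012-2016: {gb3_factor:.1f}x overall, {gb3_rate:.0%} per year")
print(f"Later period as a share of the earlier one: {gb3_factor / gb2_factor:.2f}")
```

Run as written, this gives roughly 11.4x for the first period and 5.0x for the second, so the later improvement is a bit under half of the earlier one.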
These limitations push research in two different, seemingly contradictory directions. On the one hand, we’ve seen the smartphone industry integrate sophisticated sensors, DSPs, and other specialized co-processors to provide additional functionality without increasing power consumption. At the same time, there’s been a huge push towards the idea of cloud computing. Whether the workload is a self-driving car, a chatbot conversation, or a web search, companies like Microsoft, Google, and Nvidia have been jumping on the specialized processor bandwagon. IBM has TrueNorth, Google has its Tensor Processing Unit (TPU), and we’ve written about other alternative architectures and methods for solving complex problems at multiple points at ET. AMD is the only company that’s really shipped an integrated programmable device with specialized hardware and heterogeneous compute support, and there are no publicly available HSA applications in-market that we’re aware of.
As AMD’s issues demonstrate, the problem with pushing specialized new architectures into the consumer market is that there’s little reason for developers to take advantage of them, and therefore little reason for consumers to take interest. This classic chicken-and-egg problem is only worse now that consumers hold on to PC hardware for longer and longer periods of time. If you don’t upgrade, you don’t get the general performance or battery life improvements baked into newer hardware, much less any specialized acceleration equipment.
Instead of trying to nudge products out into the market and then encourage broad adoption, Microsoft, Google, and other companies are building their own data centers in-house and applying their own expertise directly to hardware. That’s not to say the trend is entirely in one direction: Microsoft’s HoloLens depends on a custom-built HPU, or Holographic Processing Unit. But hardware like the HPU may be more of an exception than the rule, particularly considering that the device costs $3,000 and is explicitly marketed to developers.
There are practical advantages to bringing hardware development in-house. It’s much easier for a company to tweak an algorithm or experiment with new features or data analysis when it doesn’t have to worry about pushing that update out to thousands of customers. Desktop PCs and some laptops could handle the heat output from custom microprocessors, but nobody wants an FPGA in a smartphone. Giving the customer what they want means keeping portable devices thin and light, particularly since battery scaling historically hasn’t kept pace with existing computing power. Move those workloads to specialized cloud data centers, and you can deliver the services customers want without waiting on developers to take advantage of features. When Microsoft began integrating FPGAs into its data centers, it was able to tweak algorithms constantly rather than trying to push those updates out to consumer hardware or rely on specialized fixed-function hardware.
This trend isn’t all upside, at least not if you prefer the existing model that puts premium performance in the hands of individuals who invest a relatively limited amount of money in hardware. By keeping advances in software tied to hardware running in custom data centers, Google, Facebook, and other companies can also keep their algorithms and intelligence proprietary and strictly off-limits. If you want to run an open-source operating system today, there is a wide range of choices with varying degrees of openness, some of which rely on no closed-source code. There’s no real equivalent in the realm of AI: no open-source counterparts to Cortana or Siri, at least not yet (there are some projects underway).
The other risk in this kind of model is that it truly makes Internet connectivity essential — and that means it pushes more people towards carrier plans that often have no basis in objective reality when it comes to per-GB costs for data. But it also suggests a future in which computing performance continues to advance, provided certain issues related to network scaling, cost, and availability can be overcome. It could help drive down the long-term cost of computing devices, but the current carriers in the US have data plans that will soak up any one-time savings in long-term fees.
Moving away from the personal and into the cloud could restart scaling and allow manufacturers to deliver much faster experiences than they can today. But the hardware that drives said experiences may never be available to common consumers. Whether this is a good thing depends on whether you think widespread availability and low cost were key drivers of the computer revolution or merely temporary factors that are no longer necessary.