HPE The Machine Sets Memory Record And Uses Single-Socket ARM Processors

Hewlett Packard Enterprise (HPE) ended up owning a chunk of Hewlett Packard Labs after the split with HP Inc. [The other chunk of labs took the name “HP Labs”.] It is often hard to put a price tag on innovation, but I think the folks at HPE have a clear idea of the genuine cost of fundamental innovation through ownership of Hewlett Packard Labs.

Hewlett Packard Labs’ future data center architecture prototype “The Machine” is a big bet. You can read exactly how big a bet it is here and here. Hewlett Packard Labs designed The Machine so that researchers can experiment with many advanced computing concepts, either separately or as bundles. HPE can then decide which advanced concepts to commercialize, when to commercialize and with how much budget.

Memory-first data center architecture is important because it is both faster and more energy efficient than the architectures the industry has been using for the past seven decades. Architects assume that memory is faster than storage but volatile (it burns power to stay on and data disappears when power is shut off), and that storage is denser and cheaper per byte than memory but non-volatile (data stays in place when there is no power).

A host of new non-volatile memory technology options are being readied for commercialization over the next few years. Some of these technologies are close to DRAM read speeds today, but the goal is to get as close to current DRAM write speeds as possible. When that happens, these new non-volatile memory technologies will be denser than current memory and will consume much less power. The potential is to replace many data center storage tiers – solid-state storage accelerators (such as NVMe), solid-state drives (SSD) and hard disk drives (HDD) – eventually converging on a global non-volatile memory pool and archival storage as the two remaining tiers of memory and storage.
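
To make the tradeoff concrete, here is a minimal sketch comparing rough, order-of-magnitude access latencies across today’s tiers. The figures are my own ballpark assumptions for illustration, not numbers from HPE or this article.

```python
# Ballpark latency and volatility comparison of today's memory/storage tiers.
# All figures are rough, order-of-magnitude assumptions for illustration only.
tiers = [
    # (tier name,       typical access latency in seconds, volatile?)
    ("DRAM",              100e-9, True),    # ~100 ns
    ("NVMe solid-state",   20e-6, False),   # ~20 us
    ("SATA SSD",          100e-6, False),   # ~100 us
    ("HDD",                10e-3, False),   # ~10 ms (seek plus rotation)
]

dram_latency = tiers[0][1]
for name, latency, volatile in tiers:
    print(f"{name:>18}: {latency * 1e6:>10.1f} us, "
          f"{latency / dram_latency:>9,.0f}x DRAM latency, "
          f"{'volatile' if volatile else 'non-volatile'}")
```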

Architects also assume that the processor is the scarce, expensive resource in a system. However, today’s cloud-scale systems have a wealth of processing capability. That capability will get less expensive as competition heats up later this year, with AMD Epyc, ARM-based processors from Cavium and Qualcomm, plus IBM’s next-generation POWER designs entering the market. Processors are already spread throughout data center racks; the trick is to enable all of the processors to see the same data pool and to assign processors that are physically close to data to process that data, instead of spending time and energy moving data across large-scale networks to processors.
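
That “move compute to the data” idea can be sketched in a few lines. The topology, hop counts and task names below are hypothetical; this is not HPE’s scheduler, just an illustration of locality-aware placement over a shared memory pool.

```python
# Minimal sketch of locality-aware task placement over a shared memory pool.
# The fabric topology and all names here are hypothetical illustrations of
# "move compute to the data", not HPE's actual scheduler.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    data_sled: int  # sled whose memory holds most of the task's data

# distance[p][s] = fabric hops from processor p to memory on sled s
# (a hypothetical 4-sled topology).
distance = [
    [0, 1, 2, 2],
    [1, 0, 2, 2],
    [2, 2, 0, 1],
    [2, 2, 1, 0],
]

def place(task: Task) -> int:
    """Pick the processor with the fewest fabric hops to the task's data."""
    hops = [row[task.data_sled] for row in distance]
    return hops.index(min(hops))

for t in (Task("ingest", 2), Task("train", 0), Task("report", 3)):
    print(f"{t.name}: data on sled {t.data_sled} -> run on processor {place(t)}")
```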

HPE cites smart cities, autonomous vehicles and intelligent power grids as its first market targets. In many respects, these are not separate entities:

  • Smart cities are dependent on power grids. It is likely that most autonomous vehicles will be fully electric and as electric vehicles become more popular, recharging them will affect city-wide power grid behavior.
  • Autonomous vehicles will be able to communicate with city-wide traffic control systems through V2I (Vehicle to Infrastructure) connections. Both vehicles and cities have incentive to decrease each vehicle’s power use while also delivering people to their destinations in comfort and on time.

While these are good examples of systems that must share a lot of data, they are only a small sample of the potential for memory-first architectures.

There are four elements to HPE’s recent disclosures on The Machine:

1. A very large 160 Terabyte (TB) pool of physical shared memory. Most PCs today ship with 8GB to 16GB of memory, so 160TB is equivalent to the physical memory of roughly 10,000 high-end consumer PCs. A large consumer HDD today has a capacity of 5TB; it takes 32 of those HDDs to hold the same amount of data as 160TB of memory, but the bandwidth and latency of reading or writing that data on storage are orders of magnitude worse than accessing memory.
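
A quick back-of-the-envelope check of those comparisons, using the 16GB-per-PC and 5TB-per-HDD figures above:

```python
# Back-of-the-envelope check of the comparisons in the paragraph above.
pool_tb = 160        # The Machine's shared memory pool
pc_memory_gb = 16    # high-end consumer PC, per the article
hdd_tb = 5           # large consumer HDD, per the article

pcs = pool_tb * 1024 / pc_memory_gb   # TB -> GB, roughly 10,000 PCs
hdds = pool_tb / hdd_tb               # 32 drives

print(f"{pool_tb} TB is about {pcs:,.0f} PCs' worth of {pc_memory_gb} GB memory")
print(f"{pool_tb} TB fits on {hdds:.0f} drives of {hdd_tb} TB each")
```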

[Image: Memory pool on a single sled of The Machine. Source: TIRIAS Research]

Put another way, 160TB of DRAM memory bought as 128GB registered DIMMs (the type I believe HPE is using for The Machine) has a street price of almost $10 million (multiply the memory in the sled shown above by 40). At a volume discount for a company the size of HPE, that could bring the price down to perhaps as low as $7 million. That is for just the memory sticks, not counting the rest of the cost of the system. Power consumption for that pool of DIMMs is about 9.6 kilowatts (kW) – again, just for the global pool of memory.
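
The arithmetic behind those estimates can be reconstructed roughly as follows. The per-DIMM street price and power draw are my own assumptions, chosen to be consistent with the totals above rather than figures supplied by HPE.

```python
# Rough reconstruction of the cost and power estimates above. The per-DIMM
# price and power draw are assumptions chosen to match the article's totals.
pool_tb = 160
dimm_gb = 128
sleds = 40

dimms = pool_tb * 1024 // dimm_gb       # 1,280 registered DIMMs in total
dimms_per_sled = dimms // sleds         # 32 DIMMs per sled

price_per_dimm = 7_800                  # assumed street price, USD
power_per_dimm_w = 7.5                  # assumed watts per 128 GB RDIMM

print(f"{dimms} DIMMs total, {dimms_per_sled} per sled across {sleds} sleds")
print(f"Street price: ~${dimms * price_per_dimm / 1e6:.1f} million")
print(f"Power: ~{dimms * power_per_dimm_w / 1000:.1f} kW for the memory pool alone")
```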

HPE plans to replace the DRAM used in this prototype with non-volatile memory as soon as reasonably fast non-volatile memory technology is available. In the meantime, it can learn how to design systems and software to take advantage of large global non-volatile memory pools.

2. HPE’s X1 photonics module is operational and in use in the prototype. I described the X1 photonics module in detail in this post. Without the X1 module, The Machine’s global memory pool would not be performant at any meaningful scale. HPE implemented this demonstration’s pool of memory using only four server chassis, each housing 10 memory and compute sleds. That means 40 sleds with a total of 40 processors and a lot of interconnects.

This is a big deal because other vendors, notably Intel, have for years been unsuccessful in trying to commercialize silicon photonics (SiPh) for in-chassis use. Hewlett Packard Labs made a very reasonable decision to use the more mature vertical-cavity surface-emitting laser (VCSEL) technology in this first set of The Machine prototypes. HPE does have plans to upgrade to SiPh in the future, but showing optical capability was more important than waiting for other technology breakthroughs.

3. Use of a 64-bit ARM server system-on-chip (SoC), namely Cavium’s upcoming ThunderX2 SoC. HPE has been working closely with Cavium. Given that ThunderX2 has not yet launched, I think this live demonstration is a good sign that Cavium is hitting its ThunderX2 performance and sampling targets. For comparison with the system memory costs above, the total cost of the 40 processors (in volume) would be close to $25,000 – a few orders of magnitude lower than the memory costs. I do not think that HPE chose ThunderX2 as a pure cost play, given how little of the system cost the processors account for.
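
For a rough sense of scale, here is the processor-versus-memory cost arithmetic using the figures above:

```python
# Processor cost versus memory cost, using the article's figures.
processors = 40
processor_total = 25_000       # ~USD for all 40 ThunderX2 SoCs, per the article
memory_total = 10_000_000      # ~USD street price estimate for the DRAM pool

print(f"~${processor_total / processors:,.0f} per SoC")
print(f"The memory pool costs roughly {memory_total / processor_total:.0f}x "
      f"as much as all of the processors combined")
```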

HPE characterizes its use of ARM-based processors as another learning experiment, alongside the large memory pool. However, its choice of single-socket compute nodes is not dependent on processor architecture. Like Microsoft’s choice to enable single-socket nodes for OCP Project Olympus and AMD’s emphasis on single-socket nodes for its Epyc SoC, HPE must be looking at the rising core counts and increasing I/O capabilities of single-socket compute nodes.

4. Software development tools designed for systems with large-scale persistent memory. HPE did not share updates on software tools during its presentations. However, a scan of The Machine’s software developer website shows continuing work on Linux for large, persistent memory pools, a managed data structure interface initiative, work on database engines and fault-tolerant programming models.
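
The article does not detail HPE’s actual tools, but the flavor of programming they target can be illustrated generically: a data structure lives in a byte-addressable, persistent mapping and survives process restarts without an explicit serialization step. The sketch below is only an analogy using an ordinary memory-mapped file; the file name is hypothetical, and Python’s mmap merely stands in for true persistent memory.

```python
# Generic illustration (not HPE's toolchain) of persistent-memory-style
# programming: a counter lives in a byte-addressable, file-backed mapping and
# survives process restarts with no explicit save/load step.
import mmap
import os
import struct

PATH = "counter.pmem"   # hypothetical file standing in for persistent memory
SIZE = 8                # one 64-bit counter

# Create and size the backing file on first use.
if not os.path.exists(PATH):
    with open(PATH, "wb") as f:
        f.write(b"\x00" * SIZE)

with open(PATH, "r+b") as f:
    pm = mmap.mmap(f.fileno(), SIZE)
    (count,) = struct.unpack_from("<Q", pm, 0)   # read state left by last run
    struct.pack_into("<Q", pm, 0, count + 1)     # update it in place
    pm.flush()                                   # push the update to the backing store
    print(f"This program has run {count + 1} time(s)")
    pm.close()
```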

Where to from here?

The Machine has a physical memory address space of 4096 yottabytes. A yottabyte is so large that it effectively loses meaning; even millions of petabytes fall short of a single yottabyte. Measuring address space in yottabytes means that The Machine’s architecture has a lot of headroom for growth. HPE’s 160TB of system memory, the largest single pool yet assembled, is a baby step.
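
For a sense of that headroom, assuming binary units (one yottabyte = 2^80 bytes; the article does not say which definition HPE uses):

```python
# Headroom between the 160 TB prototype pool and a 4,096-yottabyte address
# space, assuming binary units (1 YB = 2**80 bytes); the choice of binary
# versus decimal units is my assumption.
address_space_bytes = 4096 * 2**80   # 4,096 YB = 2**92 bytes
current_pool_bytes = 160 * 2**40     # 160 TB

print(f"Address space is ~{address_space_bytes / current_pool_bytes:.1e}x "
      f"the 160 TB prototype pool")
```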

Fundamental innovation fuels products that change the course of an industry. Sometimes fundamental innovation can even change the course of a society, whereas incremental innovation gives us only small or gradual improvements. HPE’s bet on The Machine is intended to change the way large-scale distributed computing systems are designed. If HPE can commercialize major aspects of The Machine, it will have knock-on effects across industries. Its bet will take a few years, perhaps even a decade, to play out. I hope HPE sticks with this bet, as it is a good vision of the future of computing.

-- The author and members of the TIRIAS Research staff do not hold equity positions in any of the companies mentioned. TIRIAS Research tracks and consults for companies throughout the electronics ecosystem from semiconductors to systems and sensors to the cloud.

 
