Elastic HPC Infrastructure: How non-hybrid HPC could be leaving customers behind

Posted: 13 Jul 2018

At ISC 2018 in June, atNorth hosted an invitation-only event for strategic partners, key customers and potential customers to discuss the state of HPC and how atNorth can best navigate the emerging trends in the HPC marketplace. Discussion centered on atNorth’s cloud HPC offering, HPCFLOW.

However, one of the most compelling insights to emerge from the event concerned a concept that hasn’t been used much in the HPC space. The idea’s origins instead trace to cloud service providers, who often tout cloud’s flexibility in allowing users to expand and reduce their IT footprint on the go.

In traditional cloud computing environments, what providers call an “elastic infrastructure” is crucial to the business. Some of the largest companies in the world in entertainment, media, retail, social networking, etc. could not exist under their current business models without an elastic IT environment that can respond almost instantly to spikes and down-drafts in system usage and projected IT requirements.

In HPC, however, elastic infrastructure is arguably becoming just as significant. And, as participants from the HPC ecosystem at atNorth’s event pointed out, one of the telltale signs of this fact may be found in the absence of spikes and down-drafts in their HPC system workloads.

“HPC workloads, like the more traditional workload, is going to be variable and have workload spikes,” said Gísli Kr., atNorth’s Chief Commercial Officer. “The only reason why your HPC workloads are flat is because you’ve maxed out on your HPC infrastructure. In fact, maybe we have yet to see the true workload spikes and workload trends in HPC, because everybody’s maxing out their local infrastructure all the time to avoid over investment.”

The inherent spikes in the HPC system

Translated into the terminology of cloud service providers, many HPC workloads may look flat because the clusters running them lack the elastic HPC infrastructure needed to do anything else.

Seemingly calm waters, in other words, may be a sign that your system could be more responsive, and that you and your customers could be losing money in the long run. The losses could come from declining productivity, missed deadlines, a slowed workforce, reduced output, or all of the above.

Gísli Kr. says the insight came as something of a surprise. However, he notes there are a couple of reasons to suspect that the idea, surfaced by solutions specialists and HPC operators, could have real value for many customers.

The first reason is anecdotal evidence shared at the atNorth event. HPE and Intel solutions experts noted that in some of their HPC deployments at financial organizations, a predictable and regular spike in demand did not always show up as a visible load on the organization’s HPC resources.

“We talked about how the banks are doing big settlements a few times a year, involving lots of HPC resources and then daily settlements that take up massive infrastructure every day — but just for a few hours,” says Gísli Kr. “So, where’s the workload spike? It’s going somewhere, but you just might not realize it, either you are over-investing or it goes into your job queue.”

Second, the idea makes intuitive sense. HPC is not a separate realm of computing in which the principles of conventional computing stop applying. One of those foundational principles is that input leads to output. If the inputs to a system do not arrive as a steady, continuous stream, why would anyone expect the output they generate to smooth itself out?

More likely, flat HPC usage graphs resemble a bathtub filled to capacity. In such a maxed-out system it is very difficult, if not impossible, to measure how fast the faucet is pouring water in or how fast the drain is taking it out.

In other words, becoming accustomed to systems that run at or near 100% capacity all the time means becoming accustomed to maxed-out systems whose real demand can no longer be seen.
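The bathtub effect can be illustrated with a toy model. The sketch below is not drawn from atNorth or from any real cluster: the function name `simulate`, the demand figures and the capacity are all hypothetical, chosen only to show how a fixed-capacity system can report flat utilization while spiky demand quietly piles up in the job queue.

```python
def simulate(demand, capacity):
    """Toy model of a fixed-capacity cluster (all numbers are hypothetical).

    demand   -- work arriving in each time step (e.g. node-hours per hour)
    capacity -- work the cluster can complete in each time step
    Returns (utilization, backlog): one value per time step for each.
    """
    utilization = []
    backlog = []
    queued = 0.0
    for arriving in demand:
        pending = queued + arriving        # old queue plus new arrivals
        served = min(pending, capacity)    # the cluster cannot exceed capacity
        queued = pending - served          # whatever is left waits in the queue
        utilization.append(served / capacity)
        backlog.append(queued)
    return utilization, backlog


# Assumed demand: a modest base load with two large spikes.
demand = [80, 80, 300, 80, 80, 80, 300, 80]
capacity = 100

utilization, backlog = simulate(demand, capacity)
print("utilization:", utilization)
print("backlog:    ", backlog)
```

In this made-up scenario the utilization graph is pinned at 100% from the first spike onward, so the second spike is invisible on it. The spikes appear only in the backlog, which is the "faucet" that a usage graph for a full tub cannot show.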

There is, of course, a better way to respond to maxed-out HPC infrastructure than to just accept it. atNorth’s cloud HPC offering, HPCFLOW, represents a powerful and inexpensive resolution to the maxed-out HPC infrastructure problem.

Other related material discussed and distributed at atNorth’s ISC event may also be relevant when considering the HPCFLOW advantage, including:

• atNorth’s new partnership with the CAE company Numeca

• atNorth’s new collaboration with NVIDIA

• Intel’s new related case study on our HPCaaS offering

Gísli Kr. says he certainly welcomes feedback on the “elastic HPC infrastructure” insight brought to light at ISC. Some may agree, and some may disagree. But he says the overall trend going forward is certainly clear. “Workloads are going to grow,” he says. “If you’re not thinking about Hybrid HPC at this point, then you’re starting to lag behind.”

Customers and partners seeking more information about HPCFLOW and atNorth’s HPC cloud offerings can contact an atNorth solutions expert. For free benchmarking of your systems to discover specifically what HPCFLOW environment works best for your own workload and IT infrastructure, contact an atNorth benchmarking specialist.