HPC in the Cloud – Technical Feasibility Challenges

High-performance computing (HPC) is making a transition to the cloud, and many companies are porting their HPC applications to it. Although cloud computing provides a wide range of benefits, such as elasticity, scalability, seemingly unlimited resources, pay-as-you-go pricing, and hardware virtualization, it still poses many technical feasibility challenges. Read on!

Dynamic Scalability

One of the major problems with the cloud is that dynamic scalability is not up to the mark. Many HPC applications rely on dynamic scalability as a major feature. Dynamic scalability refers to the application scaling its compute nodes up and down on demand.

For instance, suppose you set a response-time threshold of two milliseconds for user queries. When the response time exceeds that threshold, the application allocates additional compute nodes to handle the backlog of queries. However, compute node allocation can take up to 10 minutes, and such a delay prevents the new nodes from handling the busy queries in time.
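To make the timing problem concrete, here is a minimal sketch of the scenario above. The function name, the one-sample-per-minute assumption, and the sample values are hypothetical and do not come from any particular cloud platform; only the two-millisecond threshold and the 10-minute delay are taken from the example.

```python
from typing import Optional, Sequence, Tuple


def scale_out_timeline(
    response_times_ms: Sequence[float],
    threshold_ms: float = 2.0,
    provision_delay_min: int = 10,
) -> Optional[Tuple[int, int]]:
    """Return (breach_minute, ready_minute) for the first threshold breach.

    response_times_ms holds one hypothetical sample per minute.
    Returns None if the threshold is never exceeded.
    """
    for minute, response_time in enumerate(response_times_ms):
        if response_time > threshold_ms:
            # Capacity is requested now but is only usable after the delay.
            return minute, minute + provision_delay_min
    return None


samples = [1.2, 1.8, 2.5, 3.0]  # made-up per-minute response times in ms
print(scale_out_timeline(samples))  # (2, 12)
```

In this made-up run the threshold is breached at minute 2, but the extra node is not ready until minute 12, so every query in between is still served by the overloaded nodes.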

Low-Level Performance Control

Many HPC users do not have adequate control over the compute nodes. HPC applications read user queries and split each job into sub-tasks that are distributed among compute nodes. Each node then requests data from storage depending on its sub-task. This raises questions the user often cannot answer: for instance, if the next query is the same, will the application reuse the previous data set or execute the same process again?

Even if you transfer the data from cloud storage to a node’s local storage to reduce latency, the same compute node may not serve the next user’s query. The lack of low-level control makes it hard to fully exploit the capacity of a compute node and, similarly, to maximize data locality.
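As an illustration of what low-level control would make possible, the sketch below routes every query for the same data set to the same node, so a locally cached copy can be reused. It assumes the platform lets you choose the target node, which is exactly what is often missing; `pick_node` and the node names are hypothetical.

```python
import hashlib
from typing import Sequence


def pick_node(dataset_id: str, nodes: Sequence[str]) -> str:
    """Map a data set to a fixed node so repeated queries hit the same cache."""
    if not nodes:
        raise ValueError("at least one compute node is required")
    # A stable hash keeps the mapping identical across processes and restarts.
    digest = hashlib.sha256(dataset_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
first = pick_node("genome-run-42", nodes)
second = pick_node("genome-run-42", nodes)
print(first == second)  # True
```

This simple modulo scheme reshuffles most data sets whenever the node list changes, so it is only a starting point for thinking about data locality.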

Multi-Tenancy

Many HPC companies are unsure whether compute nodes are dedicated to their application. Although multi-tenancy is a great cloud feature, it is still an issue for many HPC applications.

Multi-tenancy refers to sharing compute nodes among multiple applications. Because many applications run on the same compute node, the bandwidth available to each HPC application can decrease significantly. As a result, you may face performance degradation.
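A rough back-of-the-envelope calculation shows the effect. The sketch assumes the node’s bandwidth is split evenly among tenants, which real platforms do not guarantee; the function name and the numbers are made up for illustration.

```python
def per_tenant_bandwidth_gbps(link_gbps: float, tenants: int) -> float:
    """Bandwidth each tenant gets if the link is shared evenly (an assumption)."""
    if tenants < 1:
        raise ValueError("tenants must be at least 1")
    return link_gbps / tenants


# A hypothetical 10 Gbps link shared by 4 applications on one node.
print(per_tenant_bandwidth_gbps(10.0, 4))  # 2.5
```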

Fault-Tolerance

Reliability and fault tolerance are other feasibility issues when it comes to HPC in the cloud. For example, some cloud platforms, such as Windows Azure, do not provide information on the time it takes to replace a failed node with a new one. Besides, many companies don’t know the impact of hardware failure on their HPC application’s performance.

It is crucial to study and consider such impacts when developing and maintaining the load management system. For example, one downside of a Platform-as-a-Service (PaaS) architecture is that testing your HPC application against compute node failures is very difficult.
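Because injecting real node failures on a PaaS platform is hard, one option is to exercise the load management logic locally against simulated failures. The sketch below is a hypothetical example of that idea, not a feature of any platform: it redistributes sub-tasks round-robin over the nodes that are still healthy.

```python
from typing import Dict, List, Sequence, Set


def assign_tasks(
    tasks: Sequence[str], nodes: Sequence[str], failed: Set[str]
) -> Dict[str, List[str]]:
    """Spread tasks round-robin over the nodes that have not failed."""
    healthy = [node for node in nodes if node not in failed]
    if not healthy:
        raise RuntimeError("no healthy compute nodes left")
    plan: Dict[str, List[str]] = {node: [] for node in healthy}
    for i, task in enumerate(tasks):
        plan[healthy[i % len(healthy)]].append(task)
    return plan


# Simulate the loss of node "b" and check where its work ends up.
plan = assign_tasks(["t1", "t2", "t3"], ["a", "b", "c"], failed={"b"})
print(plan)  # {'a': ['t1', 't3'], 'c': ['t2']}
```

A local test like this says nothing about how long the platform takes to replace the failed node; it only checks that the application’s own scheduling copes with the loss.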

No Remote Debugging

Although companies can debug their cloud HPC applications locally, the architecture might not support remote debugging. As a result, you may face problems when developing and deploying large HPC applications on the cloud.

HPC program development involves technical problems such as parallel and remote debugging. Research shows that these are new problems in cloud computing, and service providers must develop and provide effective detection tools to companies.

It is crucial to use lightweight profiling tools to analyze and streamline performance. However, many cloud platforms lack these features, which causes problems for businesses.

Final Words

Although many cloud computing platforms provide great architecture, they have flexibility issues that prevent companies from optimizing their HPC performance. HPC users need a greater degree of control over the underlying resources to improve their high-performance computing applications.

If you want to overcome these challenges, you can contact Clovertex for high-performance computing services. We have a team of experienced IT professionals with extensive technical knowledge of HPC.
