The use of cryo-electron microscopy (cryo-EM) in the pharmaceutical R&D sector is rapidly growing. Like many pharmaceutical companies, our client’s molecular profiling group relied on leased cryo-electron microscopes at a shared facility to engage in drug discovery.
This shared facility processed data on AWS but, despite being cloud-based, used only a single machine. As a result, data sets took three to four weeks to process, limiting the number of experiments they could run. Data upload was another bottleneck: the considerable amount of data generated by the microscopes didn’t start uploading until all the processing was completed. Compounding this, whenever parameters were changed, everything had to be reprocessed, further delaying results.
Thanks in part to their rapid and continued growth, our client decided to invest in building their own facility, which greatly improved throughput. This additional throughput, though, created new downstream challenges:
- How to process the greatly increased amount of data being generated by their on-prem cryo-EM facility?
- How to reduce processing time from weeks to days, so that they could process data multiple times per week?
To address these needs, Clovertex built a high-performance computing (HPC) cluster on AWS, initially using the open-source Relion software to process the cryo-EM data. We then built a way to quickly and securely transfer data from their cryo-EM lab to AWS, where it was processed and stored.
By leveraging AWS cloud elasticity, we also unlocked the benefits of parallelism, which significantly reduced overall processing time.
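As a rough illustration of the idea (not the production pipeline), the sketch below splits a list of micrograph files into near-equal batches and processes the batches concurrently. The function names, the file names, and the use of local threads in place of cluster nodes are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(items, n_batches):
    """Split items into n_batches contiguous, near-equal batches."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    base, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches each take one additional item.
        size = base + (1 if i < extra else 0)
        batches.append(items[start:start + size])
        start += size
    return batches


def process_batch(batch):
    # Hypothetical stand-in for real per-micrograph work; it only counts files.
    return len(batch)


def process_in_parallel(items, n_workers):
    """Process non-empty batches concurrently and return per-batch results."""
    batches = [b for b in split_into_batches(items, n_workers) if b]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(process_batch, batches))


if __name__ == "__main__":
    micrographs = [f"mic_{i:04d}.mrc" for i in range(10)]
    print(process_in_parallel(micrographs, 3))
```

On a real cluster, each batch would typically be submitted as a separate job to the scheduler rather than run in a local thread, but the partitioning logic is the same.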
To speed up uploads, Clovertex implemented incremental data transfer, so data could start moving from the microscope to AWS as it was being created, rather than waiting until the entire process was completed.
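The general pattern behind incremental transfer can be sketched in a few lines. The example below is a minimal illustration, not the mechanism used in this project: it assumes a file is complete once it has gone unmodified for a set period, and it takes a hypothetical `upload` callable in place of a real transfer service.

```python
import os
import time


def find_ready_files(source_dir, already_sent, min_age_seconds=60.0, now=None):
    """Return sorted relative paths of unsent files that look fully written."""
    now = time.time() if now is None else now
    ready = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, source_dir)
            if rel in already_sent:
                continue
            # Assumption: a file untouched for min_age_seconds is complete.
            if now - os.path.getmtime(path) >= min_age_seconds:
                ready.append(rel)
    return sorted(ready)


def sync_once(source_dir, already_sent, upload, min_age_seconds=60.0, now=None):
    """Upload each ready file once, record it, and return what was sent."""
    sent = []
    for rel in find_ready_files(source_dir, already_sent, min_age_seconds, now):
        upload(os.path.join(source_dir, rel))  # hypothetical transfer callable
        already_sent.add(rel)
        sent.append(rel)
    return sent
```

Calling `sync_once` on a schedule while the microscope is still writing moves finished files early and leaves in-progress files for a later pass.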
In addition, now that they had an on-prem facility, our client chose to invest in cryoSPARC, a commercial cryo-EM software package that unlocked even higher performance through GPU-accelerated processing on the cluster. Clovertex helped migrate their processes from Relion to this more robust, well-supported application.
AWS services used in this solution include:
- AWS Direct Connect
- AWS DataSync
- Amazon S3
- AWS ParallelCluster with Amazon EC2 GPU-based instance families G4, G5, P3, and P4
- Amazon Elastic Block Store (Amazon EBS)
- Amazon FSx for Lustre
- Amazon CloudWatch
The combination of parallel processing on AWS, improved data transport, and optimized settings enabled our client to finish processing an experiment in less than a week. It also allowed multiple scientists to run multiple clusters simultaneously, so they could run experiments on the same data with different parameters at the same time. In addition, whenever a new software version is released, they can now evaluate it separately without disrupting production processing on the current version.
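Running the same data with different parameters amounts to a parameter sweep. The sketch below shows one way such a sweep could be described, with one job per combination of values. The data set name and the parameter names (`box_size`, `n_classes`) are hypothetical, and submitting the jobs to a cluster is left out.

```python
from itertools import product


def build_sweep(dataset, param_grid):
    """Return one job description per combination of parameter values."""
    names = sorted(param_grid)  # fixed order keeps job numbering reproducible
    jobs = []
    for index, values in enumerate(product(*(param_grid[n] for n in names))):
        jobs.append({
            "job_name": f"{dataset}-run{index:02d}",
            "dataset": dataset,  # every job reads the same input data
            "params": dict(zip(names, values)),
        })
    return jobs


if __name__ == "__main__":
    # Hypothetical parameter names and values, for illustration only.
    for job in build_sweep("exp42", {"box_size": [256, 384], "n_classes": [50, 100]}):
        print(job["job_name"], job["params"])
```

Because every job points at the same data set, each one could be sent to its own cluster without copying or reprocessing the input.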