Introduction
With the advent of direct-electron detectors and advanced methods of image processing, structural characterisation of macromolecular complexes to near-atomic resolution is now feasible using single-particle electron cryo-microscopy (cryo-EM) (Cheng, 2015; Fernandez-Leiro and Scheres, 2016b). Although this has caused a rapid gain in its popularity, two technological factors still limit wide applicability of cryo-EM as a standard tool for structural biology.
First, partly due to the steep increase in demand, access to high-end microscopes is limited. This is being addressed with acquisition of new equipment in a large number of departments worldwide, as well as the establishment of shared infrastructures (Saibil et al., 2015). Second, processing the large amounts of data produced by these microscopes requires computational hardware that is not directly accessible to many labs. Even at larger centres the computational requirements are so high that cryo-EM now suffers from a computational bottleneck. The work presented here addresses this second problem, to the end of drastically reducing the computational time and investment necessary for cryo-EM structure determination.
A typical cryo-EM data set may constitute hundreds or thousands of images (called micrographs) of a thin layer of vitreous ice in which multiple individual macromolecular complexes (called particles) are imaged. Because radiation damage imposes strict limitations on the electron exposure, micrographs are extremely noisy. Thus, to extract fine structural details, one needs to average over multiple images of identical complexes to cancel noise sufficiently. This is achieved by isolating two-dimensional particle-projections in the micrographs, which can then be recombined into a three-dimensional structure (Cheng et al., 2015). The latter requires the estimation of the relative orientations of all particles, which can be done by a wide range of different image processing programs, such as SPIDER (Frank et al., 1981, 1996), IMAGIC (van Heel et al., 1996), BSOFT (Heymann and Belnap, 2007), EMAN2 (Tang et al., 2007), SPARX (Hohn et al., 2007), FREALIGN (Grigorieff, 2007), XMIPP (Scheres et al., 2008), RELION (Scheres, 2012a), or SIMPLE (Elmlund and Elmlund, 2012).
These programs also need to tackle the problem that any one data set typically comprises images of multiple different structures; purified protein samples are e.g. rarely free from all contaminants. Multiple conformations, non-stoichiometric complex formation, or sample degradation are all possible sources of additional data heterogeneity. The classification of heterogeneous data into homogeneous subsets has therefore proven critical for high-resolution structure determination and provides a tool for structural analysis of dynamic systems. However, identifying structurally homogeneous subsets in the data by image classification algorithms adds computational complexity, and often increases the computational load dramatically.
An increasingly popular choice for processing cryo-EM data is an empirical Bayesian approach to single-particle analysis (Scheres, 2012b) implemented in the computer program RELION (Scheres, 2012a). In the underlying regularised likelihood optimisation algorithm, optimal weights for the contribution of different Fourier components to the determination of orientations, as well as to the three-dimensional reconstruction(s), are learnt from the data in an iterative manner, thereby reducing the need for user input. In addition, RELION has proven highly effective in classifying a wide range of structural variation, such as conformational dynamics within protein domains (Bai et al., 2015), or of very small sub-populations in large data-sets (Fernández et al., 2013). Unfortunately, the regularised likelihood optimisation algorithm that underlies these calculations is computationally demanding. We estimate that a recent 3.7 Å structure of a yeast spliceosomal complex (Nguyen et al., 2016) required more than half a million CPU hours of classification and high-resolution refinement. Computations of this magnitude require the use of high-performance computing clusters with dedicated staff, and restrict the exploration of new image processing schemes.
The introduction of hardware accelerators, such as graphics processors (GPUs), has recently transformed other scientific fields where computation was a bottleneck. To exploit this type of hardware, substantial redesigns of algorithms are required to make many independent tasks simultaneously available for computation, which is known as exposing (low-level) parallelism. However, the possible gain is equally substantial; together with commodity hardware it has been a revolution e.g. for molecular dynamics simulations (Salomon-Ferrer et al., 2013; Abraham et al., 2015), quantum chemistry (Ufimtsev and Martínez, 2008), and machine learning (Jia et al., 2014).
GPU acceleration of computationally expensive algorithms has also been performed in other cryo-EM software, with varying success and subsequent impact in the field. Many attempts have achieved significant improvement in performance through acceleration of only a few subroutines (Li et al., 2010; Tagare et al., 2010; Li et al., 2013; Hoang et al., 2013), relieving major bottlenecks. The overall software architecture and data structures were however not modified to better suit the hardware. In contrast, other tools were designed or fundamentally reformulated with a particular hardware in mind (Li et al., 2015; Zhang, 2016); this tends to lead to more substantial long-term performance benefits, and for some steps the accelerated implementations have fully replaced CPU codes. Historically, RELION (like most alternatives) has scaled to the large resources it needs by using higher-level parallelism, where computational tasks are divided into subsets of particle images in each iterative refinement step. However, lower-level core computations on individual particle images in RELION have remained serialised since its introduction more than four years ago.
Here, we describe a new implementation of the regularised likelihood optimisation algorithm in RELION that uses GPUs to address its computational bottlenecks. We have chosen to implement our increased parallelism in CUDA, a programming language provided by NVIDIA. The CUDA language currently dominates the GPU computing market, and provides a stable programming environment with a rich C++ interface. We also utilise a number of libraries provided within the CUDA framework, such as cuFFT for fast Fourier transforms (FFTs), and CUB/thrust for sorting and other standard functions. In addition to high-end professional cards there is wide availability of cheap consumer hardware that supports CUDA, which provides outstanding value for many research groups. However, the acceleration and parallelisation approaches are general and should be possible to port to other architectures in the future.
The present acceleration of relion addresses the most computationally intensive steps in a typical image processing workflow. This includes classification of data into structurally homogeneous subsets (2D or 3D classification) and high-resolution refinement of each homogeneous such set of particles (3D auto-refine). In addition, we describe an improved algorithm for semi-automated selection of particles from micrographs (Scheres, 2015), this too targeting GPUs. Memory requirements have been reduced to fit widely available consumer graphics cards, and we show that the current adaptation to use single precision floating-point arithmetic does not cause loss of resolvable detail in the final structures. These developments enable high-resolution cryo-EM structure determination in a matter of days on individual workstations rather than relying on large clusters.
Methods
Regularised likelihood optimisation
The regularised likelihood optimisation in relion uses an Expectation-Maximization algorithm (Dempster et al., 1977) to find the most likely 3D density map for a large set of 2D particles images with unknown orientations, under the prior expectation that the 3D map has limited power in the Fourier domain. This iterative algorithm involves two fundamental steps at every iteration. In the expectation, or E-step, one calculates probability distributions for the relative orientations of all particles based on the current estimate of the 3D map. In the subsequent maximization, or M-step, one updates the estimate for the 3D map. Intuitively, these two steps represent alignment of the particles with respect to a common 3D reference, and reconstruction of a new reference map, respectively. The algorithm is typically started from an initial 3D reference map at low resolution, and is guaranteed to increase the likelihood of the 3D map given the data and the prior at every iteration, until converging onto the nearest local minimum. In the presence of structural variability in the data set, multiple 3D reference maps, or 3D classes, can be refined simultaneously. In this case, the class assignments of the particles are not known either. RELION also allows the refinement of multiple 2D references, or 2D classes. In that case, only in-plane orientations are sampled.
Within a single iteration, the E-step requires four major computational stages (Figure 1): (i) The current 3D reference map is projected along many orientations; (ii) the difference between these reference projections and every particle image is calculated for each orientation and for each sampled translation; (iii) all these differences are converted to probability-weights; and (iv) those weights are used to update a running sum of the back-projected particle images in 3D Fourier space. In practice, to limit the number of operations, stages (i) and (ii) are performed twice for every particle. A first pass examines all orientations using a relatively coarse orientational sampling, and a second pass re-examines the regions of orientational space with significant probabilities at an increased sampling density. Still, millions of orientations and translations are typically compared for each particle at every iteration. Converting the resulting differences to weights is a relatively cheap operation as each image-orientation pair at this stage is represented by a single scalar value. The back-projection in operation (iv) is more demanding since it is again necessary to work with all pixels of each image in multiple orientations. After the E-step has been evaluated for all (typically tens to hundreds of thousand) particles, in the M-step the running sum in 3D Fourier space is transformed into an updated 3D reference map for the next iteration. The corresponding reconstruction algorithm is expensive in terms of computer memory, but it typically needs orders of magnitude less time than the E-step because it is only performed once for every iteration. Therefore, operations (i), (ii) and (iv) of the E-step have dominated execution time in previous versions of RELION.
Figure 1.
High level flowchart of RELION.
(A) Operations and the real vs. Fourier spaces used during (B) image reconstruction in RELION. Micrograph input and model setup use the CPU, while most subsequent processing steps have been adapted for accelerator hardware. The highlighted orientation-dependent difference calculation is by far the most demanding task, and fully accelerated. Taking 2D slices out of (and setting them back into) the reference transforms has also been accelerated at high gain. The inverse FFT operation has not yet been accelerated, but uses the CPU.
DOI: http://dx.doi.org/10.7554/eLife.18722.002
Extracting parallelism for accelerators
Modern accelerator processors - such as GPUs - achieve very high floating-point performance by incorporating a large set (thousands) of very simple and streamlined functional units instead of the handful advanced general-purpose cores in a normal (CPU) processor. In previous versions, RELION has scaled over multiple CPU cores by using separate processes for independent particle images. GPUs however require much broader low-level parallelism to increase performance substantially. Our implementation of RELION therefore required a reformulation to expose additional available parallelism. Since most of the computation time is spent on the E-step for a typical application, we focussed our acceleration efforts on this part. By treating multiple reference maps, all relative image translations and orientations, and even individual image pixels as parallel tasks (Figure 2), we created sufficient low-level parallelism to provide outstanding performance on accelerator hardware such as GPUs.
Figure 2.
Extensive task-level parallelism for accelerators.
While previously relion only exploited parallelism over images (left), in the new implementation classes and all orientations of each class are expressed as tasks that can be scheduled independently on the accelerator hardware (e.g. GPUs). Even individual pixels for each orientation can be calculated in parallel, which makes the algorithm highly suited for GPUs.
DOI: http://dx.doi.org/10.7554/eLife.18722.003
Formulating and enabling the available parallelism in RELION for GPUs did present specific challenges. In operations (i) and (ii) of the E-step, it would for instance require very large amounts of fast memory to pre-calculate projections of the 3D reference maps for all examined orientations. Pre-calculation could potentially increase re-use of data on the GPU, but it would impose severe limitations on the number of simultaneously examined references, input image sizes, and sampling granularity. Instead, like the CPU code, our implementation stores a two-fold oversampled Fourier transform of each reference in GPU memory, and 2D slices (along any orientation) are extracted only when needed. This alleviates limitations imposed by hardware, and makes it possible to improve performance through the use of so-called GPU textures. Textures store data that require fast lookup, and the corresponding texture units provide support for performing non-integer pixel interpolation of texture data in a single instruction with only marginal loss of precision. In graphics applications, textures are used to efficiently rescale image data. In RELION, they turn out to be well-suited for the resampling operations when taking slices from the 3D Fourier transform. The on-the-fly extraction of 2D Fourier slices is combined with the very broad parallelism of calculating squared differences between all pixels of all particle images and the reference projections in all orientations and all translations. For the technically interested reader, details of how we exploit this parallelism in our implementation of the squared difference calculation are presented in Appendix I. Technical details of how we also accelerated operation (iv) of the E-step, the back-projection of 2D images into the oversampled 3D Fourier transform, are presented in Appendix II. When combined, the E-step can now be evaluated in a fraction of the runtime previously needed.
Finally, like most scientific software, the CPU version of relion has historically used double precision, and merely recompiling in single does not improve performance. However, all GPUs provide higher performance for single precision, and the dedicated texture units can typically only perform reduced-precision operations. For these reasons we also reformulated our implementation of the E-step to be less precision-sensitive, making it possible to execute these calculations in single precision without adverse effects on the final results. As we did not accelerate the M-step, this part of the algorithm remains on the CPU and is still executed in double precision.
Semi-automated picking
RELION also implements a template-based particle selection procedure, which calculates a probability measure (the R-value) for each pixel in the micrograph to signify the likelihood that it is the location of any of the provided templates (Scheres, 2015). The R-value map of a micrograph considers all possible rotations of each template, and is subsequently used in a peak-search algorithm that locates particles within the original micrograph (Figure 3). These calculations are performed in Fourier space, where they are highly efficient (Roseman, 2003). In fact, they are so fast that their execution time becomes negligible compared to the time spent performing FFTs to transform image objects between real and Fourier spaces. Consequently, even though reference templates are also treated as independent tasks to increase parallelism in the GPU version, a much larger gain is found at the level of template rotations, through parallel execution of FFTs. For example, when using 5-degree incremental template rotations, 72 such inverse FFTs are now performed concurrently on the GPU through the cufft cuda library. The size of these FFTs is now also padded automatically, since substantial performance penalties can occur if the transform size includes any large prime factors.
Figure 3.
Semi-automated particle picking in RELION-2.
The low-pass filter applied to micrographs is a novel feature in RELION, aimed at reducing the size and execution time of the highlighted inverse FFTs, which accounts for most of the computational work. In addition to the inverse FFTs, all template- and rotation-dependent parallel steps have also been accelerated on GPUs.
DOI: http://dx.doi.org/10.7554/eLife.18722.004
Results
Acceleration of regularised likelihood optimisation
Performance
The performance of our implementation on a workstation equipped with modern GPUs can exceed that of hundreds of CPU cores (Figure 4). This is most prominent for increasing numbers of pixels, orientations and classes, due to the increased low-level parallelism RELION-2 has been designed to use efficiently. Therefore, calculations where many classes and orientations need to be sampled, e.g. 3D classifications over multiple classes and with fine sampling of orientations, experience the greatest gain from the acceleration (Figure 4). Traditionally such large problems have required cluster-size resources with high-performance interconnects for fast communication. As seen in the performance benchmarks, RELION-2 makes it possible to run many of these calculations even on workstations or low-cost desktop solutions (Figure 4). GPU hardware evolves rapidly, but appendix XI contains a few recommendations about the solutions we currently think are most cost-efficient. For sufficiently large computational problems, relion’s processing time scales linearly with increased number of classes, but since the extra calculations are much faster with the GPU-enabled version the relative advantage is larger the more classes are used (Figure 4D). In practice we believe this will make it more common to use very large numbers of classes in 3D classification. Further scrutiny reveals that the calculations still performed on the CPU actually dominate both the execution and scaling even for the GPU version (Figure 4E), which indicates new bottlenecks are now limiting scaling - we intend to focus on these parts of the code for future improvements.
Figure 4.
RELION-2 enables desktop classification and refinement using GPUs.
EMPIAR (Iudin et al., 2016) entry 10028 was used to assess performance, using refinements of 105 k ribosomal particles in 3602-pixel images. (A) A quad-GPU workstation easily outperforms even a large cluster job in 3D classification. (B) In 2D classification, the GPU desktop performs slightly better in the first few iteration and then provides performance equivalent to the 280 CPU cores. (C) Total time for 25 iterations of 3D classification for a few different hardware configurations. (D) Additional classes are processed at reduced cost compared to CPU-only execution, due to faster execution and increased capacity for latency hiding. (E) With increasing number of classes, the time spent in non-accelerated vs accelerated execution increases. (F) The workstation also beats the cluster for single-class refinement to high resolution, despite the generally lower degree of parallelism. This is particularly striking for the finer exhaustive sampling at 3.8
DOI: http://dx.doi.org/10.7554/eLife.18722.005
Limited precision and accuracy
RELION has used double precision arithmetic since its first release in order to be as accurate as possible, at the cost of increased memory requirements. While there are professional GPUs with good double precision performance, the consumer market is dominated by visualisation and gaming applications, and for this reason cheap hardware only provides good performance for single precision. Even for professional hardware, the performance is better with single precision, although the difference is smaller. This makes it highly desirable to use single precision arithmetics wherever possible. In addition to much better floating-point throughput, single precision calculations reduce the memory requirements by a factor two, and make it possible to employ textures for image rotation. The relative advantage of single precision can thus be much higher on GPUs compared to CPUs. Because the required precision depends on the algorithms used in the application, part of the development of RELION-2 involved the evaluation of reconstructed quality when using single precision. We evaluate this primarily by examining the agreement of refinements results, characterising both the reconstructed volume and the image orientation assignment statistics. As image orientations are analogous to a location and rotation on the unit sphere, we compared refinement results of the EMPIAR 10028 dataset by the distribution of angular differences for all images. Two double-precision CPU refinements (using different random seeds) produce distributions where 81% of images fall within
Figure 5.
The GPU reconstruction is qualitatively identical to the CPU version.
(A) A high-resolution refinement of the Plasmodium falciparum 80S ribosome using single precision GPU arithmetic achieves a gold-standard Fourier shell correlation (FSC) indistinguishable from double precision CPU-only refinement (previously deposited as EMD-2660). The FSC of full reconstructions comparing the two methods shows their agreement far exceeds the recoverable signal (grey), and as shown in Figure 5—figure supplement 1 the variation in angle assignments match the differences between CPU runs with different random seeds. (B) Partial snapshots of the final reconstruction following post-processing, superimposed on PDB ID 3J79 (Wong et al., 2014).
DOI: http://dx.doi.org/10.7554/eLife.18722.006
Figure 5—figure supplement 1.
The CPU and GPU implementations provide qualitatively identical distributions of image orientations.
For two CPU runs with different random seeds, 81% of images fall within 1
DOI: http://dx.doi.org/10.7554/eLife.18722.007
Disk and memory considerations
relion has traditionally required large amounts of memory. Fortunately its peak use is however not during the accelerated, computationally most intensive, parts of the algorithm. Rather, memory use peaks during the M-step, which is executed on the CPU using the gridding reconstruction algorithm mentioned above. The amount of available GPU memory still remains a limitation, as it determines the capacity for storage of the oversampled Fourier transforms of one or more references. This is of particular concern for larger and higher-resolution structures, which require more memory to be faithfully represented. When resolving detail at the Nyquist frequency, due to a twofold oversampling (associated with gridding during reconstruction), relion requires memory corresponding to twice the cube of the image dimension. For example, when using 400
Figure 6.
GPU memory requirements.
(A) The required GPU memory scales linearly with the number of classes. (B) The maximum required GPU memory occurs for single-class refinement to the Nyquist frequency, which increases rapidly with the image size. Horizontal grey lines indicate avaliable GPU memory on different cards.
DOI: http://dx.doi.org/10.7554/eLife.18722.008
To enable efficient evaluation and good scaling on GPUs, several new methods to manually manage data efficiently have been implemented. Lower levels of parallelism are coalesced into larger objects using customised tools, which results in more efficient use of memory (see appendices I-II for details). In addition, because of the much improved performance, multiple tasks have become limited by how fast input data can be read from disk. Therefore, we now find it highly beneficial to explicitly cache data on local solid state devices (SSDs), as has also been observed for GPU-accelerated CTF estimation (Zhang, 2016). To allow this in a straightforward way, RELION-2 features the ability to automatically copy data sets to fast local disks prior to refinement, which further increases performance during less computationally intensive steps, such as 2D classification.
Acceleration of automatic particle picking
Low-pass filtering of micrographs
Even after parallelisation and acceleration on GPUs, the cross-correlation-based particle selection in relion is dominated by computing many large inverse FFTs (Figure 3), as has been observed previously for similar methods (Castaño-Díez et al., 2008). Reducing their size is thus the most straight-forward way to further improve execution performance. Reference templates are typically subject to low-pass filtering, and for this reason we investigated the possibility to apply a similar filtering to all micrographs, which reduces computations by discarding high-frequency information.
We found little difference in the particles selected when discarding resolution information in micrographs beyond that of search templates. While intuitively straightforward, this conclusion drastically reduces the size of FFT grids and subsequent computations, which provides large acceleration at virtually no quality loss. The low-pass filtering also significantly reduces the amount of memory required for particle selection, which permits parallelism to target hardware like desktop workstations more efficiently.
Autopicking performance
We tested both picking speed and quality of picked particles of RELION-2. In an initial test, a single 4096
We further evaluated the quality of filtered selection according to the
Figure 7.
Low-pass filtering and acceleration of particle picking.
(A) Ribosomal particles were auto-picked from representative
DOI: http://dx.doi.org/10.7554/eLife.18722.009
A complete workflow for
To illustrate the impact of our GPU implementation and show how it can alter practical work, we chose to re-analyse the EMPIAR-10061 dataset of
Figure 8.
High-resolution structure determination on a single desktop.
(A) The resulting 2.2 Å map (deposited as EMD-4116) shows excellent high-resolution density throughout the complex. (B) The most time-consuming steps in the image processing workflow. GPU-accelerated steps are indicated in orange. The total time of image processing was less than that of downloading the data. (C) The resolution estimate is based on the gold-standard FSC after correcting for the convolution effects of a soft solvent mask (black). The FSC between the relion map and the atomic model in PDB ID 5A1A is shown in orange. The FSC between EMD-2984 and the same atomic model is shown for comparison (dashed gray).
DOI: http://dx.doi.org/10.7554/eLife.18722.010
Discussion
We present a GPU-enabled implementation of RELION-2, as a first step to address current and future needs for large and expedient computations in the field of cryo-EM structure determination. The principal benefits drawn from the presented work are twofold. First, the nature of progress in scientific applications is to continually re-evaluate and examine data in many different ways. With ease of re-processing data, the threshold for trial, error and successive improvement of existing methods is now markedly lowered. Second, the order-of-magnitude speedups make it possible to get by with much less hardware for cryo-EM processing, in most cases even desktops. This removes a computational bottleneck for large labs, and enables any group to perform their own reconstruction without access to supercomputers.
In the next few years, larger data sets and image sizes are expected, as well as new methods that require expedient processing of large data sets. The large reduction in computational costs opens up the possibility to perform more ambitious computational analyses without increasing the investments. For example, the favourable scaling of performance we observed for multi-class refinements will make it feasible to use many more classes than was practical before, which will lead to better descriptions of conformational diversity in flexible molecules. Additionally, with even faster algorithms and hardware it might soon be possible to perform highly automated, on-the-fly, structure determination while data acquisition is ongoing. In anticipation of these developments, RELION-2 already implements a pipelined approach for automated execution of pre-determined image processing workflows (Fernandez-Leiro and Scheres, 2016a).
While the new GPU implementation has removed many of the previous computational bottlenecks in relion, the large speedup has exposed several new areas of the code that can now dominate execution time, such as data input/output and the reconstruction step during iterative image refinement. Although these parts of the algorithm were previously insignificant, in some cases they now collectively account for roughly 50% of total execution time. These parts of the code will see benefit from further modifications. Future work will e.g. strive to further generalise parallelism such that performance is less dependent on the type of refinement performed, as sufficient parallelism is always available within the relion core algorithm. Memory requirements on the GPU are also expected to be reduced further, so that larger image sizes and more classes can be handled to higher resolution.
With the current implementation, cryo-EM structures to near-atomic resolution can be calculated in a matter of days on a single workstation, or hours on a GPU-cluster. Nevertheless, the aim of the current adaptations is not to present a final solution to computational needs in RELION; while the present version achieves excellent speedup on a wide range of low-cost systems, we expect the acceleration to improve both in performance and coverage. Generalising the low-level parallelism described here to vectorised CPU calculations, and possibly an open GPU language like OpenCL, will constitute little more than translating this parallelism to new instructions. This is something we intend to pursue in the future. As such, RELION-2 represents a new incarnation of an existing algorithm, which is intended to be developed far further in the following years. Meanwhile, we hope that the current implementation will have as much impact in the broader community as it is already having in our labs.
Materials and methods
Availability
RELION 2 is both open source and free software, distributed under the GPLv2 licence. It is publicly available for download through http://www2.mrc-lmb.cam.ac.uk/relion.
Data sets and hardware specifications
The ribosome data used for the 3D classification and refinement in Figures 4 and 5 correspond to EMPAIR entry 10028 (Wong et al., 2014). For autopicking, empiar entry 10017 was used (Scheres, 2015). The complete workflow for
The acceleration in RELION-2 works with any NVIDIA GPU of compute-capability 3.5 or higher, which covers all models launched the last three years. Version 7.0 of the CUDA toolkit is also required, and was used for all presented results. CPU performance was benchmarked with a cluster of 10 compute nodes equipped with dual Xeon E5-2690v4 CPUs (2
Super-resolution
Table 1.
Quality and speed of autopicking for the
DOI: http://dx.doi.org/10.7554/eLife.18722.011
Code | Filter | # picked | Recall | FDR | Time | Performance in |
---|---|---|---|---|---|---|
(Å) | particles | (s/micrograph) | CPU core units | |||
CPU | none | 54,301 | 0.88 | 0.34 | 1,227 | 1 |
GPU | none | 54,325 | 0.88 | 0.34 | 10 | 122 |
GPU | 5 | 55,629 | 0.90 | 0.34 | 5.8 | 211 |
GPU | 10 | 55,886 | 0.90 | 0.34 | 2.1 | 584 |
GPU | 15 | 56,450 | 0.92 | 0.33 | 1.6 | 766 |
GPU | 20 | 57,361 | 0.95 | 0.33 | 1.3 | 943 |
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2016, Kimanius et al. This work is licensed under the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/3.0/ ) (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
By reaching near-atomic resolution for a wide range of specimens, single-particle cryo-EM structure determination is transforming structural biology. However, the necessary calculations come at large computational costs, which has introduced a bottleneck that is currently limiting throughput and the development of new methods. Here, we present an implementation of the RELION image processing software that uses graphics processors (GPUs) to address the most computationally intensive steps of its cryo-EM structure determination workflow. Both image classification and high-resolution refinement have been accelerated more than an order-of-magnitude, and template-based particle selection has been accelerated well over two orders-of-magnitude on desktop hardware. Memory requirements on GPUs have been reduced to fit widely available hardware, and we show that the use of single precision arithmetic does not adversely affect results. This enables high-resolution cryo-EM structure determination in a matter of days on a single workstation.
DOI: http://dx.doi.org/10.7554/eLife.18722.001
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer