ABSTRACT: All-atom normal mode analysis (NMA) is an efficient way to predict the collective motions of a macromolecule,
which is essential for understanding the biological function of proteins and for drug design. However, such calculations remain
time-consuming, mainly because the required diagonalization of the Hessian matrix by the Householder-QR transformation is
computationally demanding. In this paper, we demonstrate the parallel computing power of the graphics processing unit (GPU) for NMA by
mapping the Householder-QR transformation onto the GPU with the Compute Unified Device Architecture (CUDA). The results show
that GPU-accelerated all-atom NMA significantly reduces the runtime of the diagonalization, achieving a speedup of more than 20×
over CPU-based NMA. In addition, we analyzed how numerical precision affects both the performance and the accuracy of the GPU computation.
Although the theoretical double-precision throughput of the GPU is lower than its single-precision throughput, double precision
gave more accurate results in our approach, and an acceptable speedup was retained by reducing the data transfer time to a minimum.
Finally, the inherent drawbacks of the GPU, together with a solution to the resulting limitation on computational scale, are
discussed.
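The two stages of the Householder-QR diagonalization, and the single- versus double-precision trade-off mentioned above, can be illustrated with a small CPU sketch. This is not the authors' CUDA implementation: it assumes NumPy, uses hypothetical function names (`householder_tridiagonalize`, `qr_eigenvalues`, `householder_qr_eigenvalues`), and substitutes a random symmetric positive semidefinite matrix for a real Hessian. The symmetric matrix is first reduced to tridiagonal form with Householder reflections. A Wilkinson-shifted QR iteration with deflation then extracts the eigenvalues, which correspond to the normal-mode frequencies squared, up to mass weighting.

```python
import numpy as np


def householder_tridiagonalize(a: np.ndarray) -> np.ndarray:
    """Reduce a symmetric matrix to tridiagonal form T = H_k ... H_1 A H_1 ... H_k."""
    t = a.copy()
    n = t.shape[0]
    for k in range(n - 2):
        x = t[k + 1:, k].copy()
        alpha = np.linalg.norm(x)
        if alpha == 0.0:
            continue  # column already reduced
        # Choose the sign of alpha opposite to x[0] to avoid cancellation in v[0].
        if x[0] > 0:
            alpha = -alpha
        v = x.copy()
        v[0] -= alpha
        v /= np.linalg.norm(v)
        # Reflector H = I - 2 v v^T acting on the trailing block; maps x to alpha * e1.
        h = np.eye(n, dtype=t.dtype)
        h[k + 1:, k + 1:] -= 2.0 * np.outer(v, v)
        t = h @ t @ h  # similarity transform preserves the eigenvalues
    return t


def qr_eigenvalues(t: np.ndarray, max_iter: int = 1000) -> np.ndarray:
    """Eigenvalues of a symmetric tridiagonal matrix by Wilkinson-shifted QR iteration."""
    t = t.copy()
    eps = np.finfo(t.dtype).eps  # deflation tolerance follows the working precision
    m = t.shape[0]
    eigs = []
    it = 0
    while m > 1:
        b = t[m - 1, m - 2]
        if abs(b) <= eps * (abs(t[m - 1, m - 1]) + abs(t[m - 2, m - 2])):
            # Sub-diagonal is negligible: the last diagonal entry has converged.
            eigs.append(t[m - 1, m - 1])
            m -= 1
            t = t[:m, :m]
            continue
        # Wilkinson shift: eigenvalue of the trailing 2x2 block closest to t[m-1, m-1].
        d = (t[m - 2, m - 2] - t[m - 1, m - 1]) / 2
        sign = 1.0 if d >= 0 else -1.0
        mu = t[m - 1, m - 1] - sign * b * b / (abs(d) + np.sqrt(d * d + b * b))
        shift = mu * np.eye(m, dtype=t.dtype)
        q, r = np.linalg.qr(t - shift)
        t = r @ q + shift  # one shifted QR step (again a similarity transform)
        it += 1
        if it > max_iter:
            raise RuntimeError("QR iteration did not converge")
    eigs.append(t[0, 0])
    return np.sort(np.array(eigs))


def householder_qr_eigenvalues(a: np.ndarray) -> np.ndarray:
    """Full pipeline: Householder tridiagonalization followed by QR iteration."""
    return qr_eigenvalues(householder_tridiagonalize(a))


# Usage: a random symmetric PSD matrix stands in for a (mass-weighted) Hessian.
rng = np.random.default_rng(0)
b = rng.standard_normal((8, 8))
hessian = b @ b.T
reference = np.linalg.eigvalsh(hessian)
err64 = np.max(np.abs(householder_qr_eigenvalues(hessian.astype(np.float64)) - reference))
err32 = np.max(np.abs(householder_qr_eigenvalues(hessian.astype(np.float32)) - reference))
print(f"max error, float64: {err64:.2e}; float32: {err32:.2e}")
```

The dense `np.linalg.qr` call keeps the sketch short but costs O(m³) per step. Production eigensolvers such as LAPACK's symmetric tridiagonal routines instead apply the QR step implicitly with Givens rotations in O(m) work. The float32 versus float64 comparison shows only the rounding-error side of the precision trade-off. It says nothing about GPU throughput.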