Algebraic constraints and algorithms for common

Full text

Turn on search term navigation

Impact Statement

Single-particle cryo-electron microscopy (cryo-EM) is an imaging technique used to determine the 3D structure of biomolecules from noisy 2D projection images. This article revisits one of the first approaches to cryo-EM image processing, namely common lines between pairs of 2D class averages coming from the Fourier slice theorem. We present a novel mathematical approach for dealing with common lines: in contrast to some alternatives, it operates directly on the common lines themselves and avoids triplewise angular reconstitution completely. The article then derives novel algebraic constraints on sets of consistent common lines, including a straightforward low-rank matrix condition. The algebraic conditions are incorporated into optimization methods arising from the field of computer vision to produce new methods for computational tasks involving common lines. In particular, we achieve improved accuracy in common line denoising and rotation recovery at low signal-to-noise ratios. We also present a method to detect homogeneous communities of 2D class averages in the case of a cryo-EM dataset with multiple molecular conformations. Altogether this work clarifies a classic topic in cryo-EM, and opens the door to applying common lines techniques on more challenging datasets.

1. Introduction

Single-particle cryo-electron microscopy (cryo-EM) is an imaging technique capable of recovering the high-resolution 3D structure of molecules from many noisy tomographic projection images taken at unknown viewing angles. One of the first approaches for 3D reconstruction, known as angular reconstitution, is based on the common line property of projection images induced by the Fourier slice theorem.⁽¹^,²⁾ Due to the low signal-to-noise ratio (SNR) in cryo-EM data, detecting common lines is a difficult task⁽³⁾: even today when applied to denoised averages of images, referred to as 2D class averages. Detecting common lines is subject to angular errors and incorrectly identified common lines. Although methods which seek to minimize global errors in the estimated viewing directions have increased the utility of common lines methods,⁽⁴⁾ additional constraints on common lines are needed to improve their accuracy and robustness.

In this article, we propose a novel approach for dealing with common lines. Specifically, we assemble the estimated common lines for a dataset of $ n $ images into a certain $ 2n\times n $ matrix, which stores properly scaled basis vectors for the common lines (Theorem 3.2). The matrix directly encodes common lines data, without requiring angular reconstitution on various subsets of images or needing voting procedures like some existing formulations.⁽³^,⁵⁾ As such, it yields a direct and more global approach than prior constructions for common lines.

As a main contribution, we derive algebraic constraints on the matrix of common lines, which must be satisfied in order for a set of common lines to be consistent with a single asymmetric molecular conformation. The constraints include a straightforward low-rank condition on the matrix, as well as various sparse quadratic constraints. Importantly, the constraints enable new strategies for computational tasks involving common lines, in particular for denoising common lines; estimating 3D rotations; and clustering heterogeneous image sets into homogeneous subsets. We demonstrate this by adapting optimization algorithms from other domains to these tasks, using the algebraic constraints. We remark that our constraints seem better suited for numerical optimization than the semialgebraic constraints found in prior work.⁽⁶⁾

Notably, the clustering problem is a recent application of common lines.⁽⁷⁾ In more detail, the goal is to sort discretely heterogeneous image sets of multiple molecules into communities corresponding to homogeneous image subsets. This application is motivated by the increasing complexity of cryo-EM datasets, where samples may not be purified and thus the number of distinct molecules contained in a dataset is more than one.⁽⁸^–¹⁰⁾ Our algebraic constraints and optimization algorithms enable consistency checks of subsets of images, to test whether the subset corresponds to a single molecule.

As a mathematical guarantee, we prove that computing the correct scales in the homogeneous case admits an essentially unique global optimum, see Theorem 5.1. We implement our algorithms and test them on simulated and real datasets in Section 7. The results demonstrate that our methods can be successful when applied to 2D class averages at noise levels comparable to experimental data, in both the homogeneous and discretely heterogeneous cases. We conclude with a discussion of potential future improvements.

1.1. Advantages

There are several advantages to our approach for dealing with common lines:

The new formulation is directly in terms of the data, that is, in terms of the common lines themselves.
It involves multiple common lines simultaneously and does not require triplewise angular reconstitution at all (in contrast to (5) for instance), making our approach fully global and potentially more robust to noise than alternatives.
The algebraic constraints can be incorporated into existing optimization algorithms that have seen success in computer vision applications.⁽¹¹⁾
The resulting algorithms outperform existing methods for denoising and rotation recovery on noisy simulated data, and perform comparably well for clustering heterogeneous image sets on real data, even though the optimization algorithms are off-the-shelf.

2. Background

First, we recall a standard simplified mathematical model for cryo-EM, in the homogeneous case of one molecular conformation. We assume there exists a 3D function $ \varphi :{\mathrm{\mathbb{R}}}^3\to \mathrm{\mathbb{R}} $ describing the electrostatic potential generated by the molecule. As data, we receive $ n $ two-dimensional tomographic projection images, denoted $ {I}_{R^{(i)}}:{\mathrm{\mathbb{R}}}^2\to \mathrm{\mathbb{R}} $ for $ i=1,\dots, n $, where $ {R}^{(i)}\in \mathrm{SO}(3) $ are 3D rotations associated with each image. The goal of single-particle cryo-EM is to recover the underlying 3D structure $ \varphi $ from the set of 2D tomographic projection images which are observed at unknown rotations. The images, in their idealized and noiseless form, have the following Fourier transforms due to the Fourier slice theorem:(1)\[ {\hat{I}}_{R^{(i)}}\left(\hat{x},\hat{y}\right)=\left({R}^{(i)}\cdot \hat{\varphi}\right)\left(\hat{x},\hat{y},0\right). \]Here $ \hat{\varphi}\left(\hat{x},\hat{y},\hat{z}\right)\hskip0.5em := \hskip0.5em {\int}_{{\mathrm{\mathbb{R}}}^3}\varphi \left(x,y,z\right){e}^{\sqrt{-1}\left(x\hat{x}+y\hat{y}+z\hat{z}\right)} dxdydz $ denotes the Fourier transform of $ \varphi $, and $ {R}^{(i)}\cdot \hat{\varphi} $ denotes the rotation of $ \hat{\varphi} $ by $ {R}^{(i)} $. Writing $ {R}^{(i)}={\left({\mathbf{r}}_1^{(i)}\hskip0.5em {\mathbf{r}}_2^{(i)}\hskip0.5em {\mathbf{r}}_3^{(i)}\right)}^{\top } $, (1) reads(2)\[ {\hat{I}}_{R^{(i)}}\left(\hat{x},\hat{y}\right)=\hat{\varphi}\left({\left({R}^{(i)}\right)}^{\top }{\left(\hat{x}\hskip0.2em \hat{y}\hskip0.2em 0\right)}^{\top}\right)=\hat{\varphi}\left(\hat{x}{\mathbf{r}}_1^{(i)}+\hat{y}{\mathbf{r}}_2^{(i)}\right). \]Generically, for asymmetric molecules $ \varphi $ and distinct rotations $ {R}^{(i)} $ and $ {R}^{(j)} $, there exist unique lines through the origin in the domain of the Fourier-transformed images $ {\hat{I}}_{R^{(i)}} $ and $ {\hat{I}}_{R^{(j)}}, $ respectively $ {\mathrm{\ell}}_{ij}\hskip0.35em \subseteq \hskip0.35em \mathrm{domain}\left({\hat{I}}_{R^{(i)}}\right)={\mathrm{\mathbb{R}}}^2 $ and $ {\mathrm{\ell}}_{ji}\hskip0.35em \subseteq \hskip0.35em \mathrm{domain}\left({\hat{I}}_{R^{(j)}}\right)={\mathrm{\mathbb{R}}}^2 $, such that the restrictions(3)\[ {\hat{I}}_{R^{(i)}}{\left|{}_{{\mathrm{\ell}}_{ij}}={\hat{I}}_{R^{(j)}}\right|}_{{\mathrm{\ell}}_{ji}} \]are equal as functions on $ {\mathrm{\mathbb{R}}}^2 $. In cryo-EM, one says that $ {\mathrm{\ell}}_{ij} $ and $ {\mathrm{\ell}}_{ji} $ are the common lines between the $ i $th and $ j $th image. In modest-noise settings, which are arrived at by working with 2D class averages instead of raw tomographic images,⁽¹²⁾ common lines can be estimated from real cryo-EM data. They give basic ways to do 3D reconstruction in cryo-EM; for example, see the angular reconstitution technique of van Heel⁽²⁾ or the works of Shkolnisky, Singer, and their collaborators⁽³^–⁵⁾ for example.

From (2), the common lines $ {\mathrm{\ell}}_{ij} $ and $ {\mathrm{\ell}}_{ji} $ may be found mathematically by expressing the single line in 3D space:(4)\[ \mathrm{span}\left({\mathbf{r}}_1^{(i)},{\mathbf{r}}_2^{(i)}\right)\hskip0.5em \cap \hskip0.5em \mathrm{span}\left({\mathbf{r}}_1^{(j)},{\mathbf{r}}_2^{(j)}\right)=\mathrm{span}{\left({\mathbf{r}}_3^{(i)}\right)}^{\perp}\hskip0.5em \cap \hskip0.5em \mathrm{span}{\left({\mathbf{r}}_3^{(j)}\right)}^{\perp }=\mathrm{span}\left({\mathbf{r}}_3^{(i)}\times {\mathbf{r}}_3^{(j)}\right)\subseteq \hskip0.5em \operatorname{dom}\left(\hat{\varphi}\right)={\mathrm{\mathbb{R}}}^3 \]in the coordinate system of the ith and jth images, respectively. Here, $ \times $ denotes the cross product in $ {\mathrm{\mathbb{R}}}^3 $. Combining (2) and (4), (3) may be written as:\[ {\displaystyle \begin{array}{l}{\hat{I}}_{R^{(i)}}(\lambda {\hat{x}}_{ij},\lambda {\hat{y}}_{ij}):= {\hat{I}}_{R^{(j)}}(\lambda {\hat{x}}_{ji},\lambda {\hat{y}}_{ji})\hskip1em \mathrm{f}\mathrm{o}\mathrm{r}\hskip0.8em \mathrm{a}\mathrm{l}\mathrm{l}\hskip0.8em \lambda \in \unicode{x211D},\\ {}\mathrm{w}\mathrm{h}\mathrm{e}\mathrm{r}\mathrm{e}\hskip0.8em {\hat{x}}_{ij}\hskip0.5em := \hskip0.5em \langle {\mathbf{r}}_1^{(i)},{\mathbf{r}}_3^{(i)}\times {\mathbf{r}}_3^{(j)}\rangle, \hskip0.8em {\hat{y}}_{ij}\hskip0.5em := \hskip0.5em \langle {\mathbf{r}}_2^{(i)},{\mathbf{r}}_3^{(i)}\times {\mathbf{r}}_3^{(j)}\rangle, \\ {}\mathrm{a}\mathrm{n}\mathrm{d}\hskip0.8em {\hat{x}}_{ji}\hskip0.5em := \hskip0.5em -\langle {\mathbf{r}}_1^{(j)},{\mathbf{r}}_3^{(i)}\times {\mathbf{r}}_3^{(j)}\rangle, \hskip0.8em {\hat{y}}_{ji}\hskip0.5em := \hskip0.5em -\langle {\mathbf{r}}_2^{(j)},{\mathbf{r}}_3^{(i)}\times {\mathbf{r}}_3^{(j)}\rangle \end{array}} \]where $ \left\langle \cdot, \cdot \right\rangle $ denotes the standard inner product in $ {\mathrm{\mathbb{R}}}^3 $. Common lines can therefore be encoded via:

Definition 2.1. Vectors $ {\mathbf{a}}_{ij},{\mathbf{a}}_{ji}\in {\mathrm{\mathbb{R}}}^2 $ are called representatives for the common lines $ {\mathrm{\ell}}_{ij} $ and $ {\mathrm{\ell}}_{ji} $ if there exists a nonzero scalar $ {\lambda}_{ij}={\lambda}_{ji}\in \mathrm{\mathbb{R}} $ such that(5)\[ {\mathbf{a}}_{ij}={\lambda}_{ij}\left(\begin{array}{c}\left\langle {\mathbf{r}}_1^{(i)},{\mathbf{r}}_3^{(i)}\times {\mathbf{r}}_3^{(j)}\right\rangle \\ {}\left\langle {\mathbf{r}}_2^{(i)},{\mathbf{r}}_3^{(i)}\times {\mathbf{r}}_3^{(j)}\right\rangle \end{array}\right),\hskip1em {\mathbf{a}}_{ji}={\lambda}_{ji}\left(\begin{array}{c}-\left\langle {\mathbf{r}}_1^{(j)},{\mathbf{r}}_3^{(i)}\times {\mathbf{r}}_3^{(j)}\right\rangle \\ {}-\left\langle {\mathbf{r}}_2^{(j)},{\mathbf{r}}_3^{(i)}\times {\mathbf{r}}_3^{(j)}\right\rangle \end{array}\right). \]

Equivalently, representatives $ {\mathbf{a}}_{ij} $ and $ {\mathbf{a}}_{ji} $ are choices of basis vectors for the common lines $ {\mathrm{\ell}}_{ij} $ and $ {\mathrm{\ell}}_{ji} $ which satisfy $ {\hat{I}}_{R^{(i)}}\left(\lambda {\mathbf{a}}_{ij}\right)={\hat{I}}_{R^{(j)}}\left(\lambda {\mathbf{a}}_{ji}\right) $ for all $ \lambda \in \mathrm{\mathbb{R}} $. Representatives for common lines can be estimated from 2D class averages in practice.

We stress that, although quite standard, the model (1) is greatly simplified. It neglects the effects of contrast transfer functions (CTFs), imperfect centering in particle picking, and blurring in class averaging. Further, we have restricted attention to the case of asymmetric molecules, as otherwise common lines are only unique up to the action of the relevant symmetry group (e.g., see (13)).

3. Constraints on sets of common lines

3.1. The common lines matrix

We introduce an object to keep track of all common lines in a dataset. It is the main object of this article.

Definition 3.1. A common lines matrix associated to rotations $ {R}^{(1)},\dots, {R}^{(n)}\in \mathrm{SO}(3) $ is a matrix $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $, which when regarded as an $ n\times n $ block matrices with $ 2\times 1 $ blocks $ {\mathbf{a}}_{ij}\in {\mathrm{\mathbb{R}}}^2 $ is such that $ {\mathbf{a}}_{ij},{\mathbf{a}}_{ji} $ are representatives for the common lines $ {\mathrm{\ell}}_{ij},{\mathrm{\ell}}_{ji} $ if $ i $ and $ j $ are distinct and $ {\mathbf{a}}_{ii}=0 $ otherwise. If the scalars $ {\lambda}_{ij} $ in (5) are all equal to 1, then we call $ A $ the pure common lines matrix.

Thus a common lines matrix $ A $ associated to $ {R}^{(1)},\dots, {R}^{(n)} $ is uniquely defined up to $ \left(\begin{array}{c}n\\ {}2\end{array}\right) $ nonzero real scalars $ {\lambda}_{ij} $ ($ i<j $). In real data settings where the 2D class averages are sufficiently denoised, we can estimate $ A $ from data by estimating representatives for the common lines.

We now present constraints which a pure common lines matrix must satisfy. Firstly, there is the following low-rank condition. All of our computational methods take advantage of this.

Theorem 3.2. Let $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ be the pure common lines matrix associated to Zariski-generic rotations $ {R}^{(1)},\dots, {R}^{(n)}\in \mathrm{SO}(3) $ where $ n\ge 3 $. Then $ \operatorname{rank}(A)=3 $.

Proof. Since\[ {\mathbf{a}}_{ij}=\left(\begin{array}{c}\left\langle {\mathbf{r}}_1^{(i)},{\mathbf{r}}_3^{(i)}\times {\mathbf{r}}_3^{(j)}\right\rangle \\ {}\left\langle {\mathbf{r}}_2^{(i)},{\mathbf{r}}_3^{(i)}\times {\mathbf{r}}_3^{(j)}\right\rangle \end{array}\right)=\left(\begin{array}{c}\left\langle {\mathbf{r}}_1^{(i)}\times {\mathbf{r}}_3^{(i)},{\mathbf{r}}_3^{(j)}\right\rangle \\ {}\left\langle {\mathbf{r}}_2^{(i)}\times {\mathbf{r}}_3^{(i)},{\mathbf{r}}_3^{(j)}\right\rangle \end{array}\right)=\left(\begin{array}{c}-\left\langle {\mathbf{r}}_2^{(i)},{\mathbf{r}}_3^{(j)}\right\rangle \\ {}\left\langle {\mathbf{r}}_1^{(i)},{\mathbf{r}}_3^{(j)}\right\rangle \end{array}\right)={\left(\begin{array}{c}-{{\mathbf{r}}_2^{(i)}}^{\top}\\ {}{{\mathbf{r}}_1^{(i)}}^{\top}\end{array}\right)}_{2\times 3}{\mathbf{r}}_3^{(j)}, \]the pure commons line matrix admits the following factorization:(6)\[ A={\left(\begin{array}{c}-{{\mathbf{r}}_2^{(1)}}^{\top}\\ {}{{\mathbf{r}}_1^{(1)}}^{\top}\\ {}\vdots \\ {}-{{\mathbf{r}}_2^{(n)}}^{\top}\\ {}{{\mathbf{r}}_1^{(n)}}^{\top}\end{array}\right)}_{2n\times 3}{\left(\begin{array}{ccc}{\mathbf{r}}_3^{(1)}& \cdots & {\mathbf{r}}_3^{(n)}\end{array}\right)}_{3\times n}. \]

Equation (6) witnesses $ \operatorname{rank}(A)\le \min \left(3,n\right) $ $ =3 $. We have equality when $ {R}^{(i)} $ are generic because the two matrices in the factorization are full rank.

There are also necessary quadratic constraints in the entries of a pure common lines matrix.

Proposition 3.3. Suppose $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ is a pure common lines matrix where $ n\ge 3 $. Then for any $ 1\le i<j\le n $, we have $ \parallel {\mathbf{a}}_{ij}{\parallel}_2^2=\parallel {\mathbf{a}}_{ji}{\parallel}_2^2 $.

Proof. See Appendix A.

Proposition 3.4. Suppose $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ s a pure common lines matrix where $ n\ge 3 $. Then for any $ 1\le i<j<k\le n $, we have $ \det \left(\begin{array}{cc}{\mathbf{a}}_{ij}& {\mathbf{a}}_{ik}\end{array}\right)=-\det \left(\begin{array}{cc}{\mathbf{a}}_{ji}& {\mathbf{a}}_{jk}\end{array}\right)=\det \left(\begin{array}{cc}{\mathbf{a}}_{ki}& {\mathbf{a}}_{kj}\end{array}\right) $.

Proof. See Appendix A.

From Propositions 3.3 and 3.4, the number of quadratic constraints on a pure common lines matrix equals $ \left(\begin{array}{l}n\\ {}2\end{array}\right)+2\left(\begin{array}{l}n\\ {}3\end{array}\right) $.

Example 1. Consider $ n=4 $. Then a pure common lines matrix written in block form,\[ A=\left(\begin{array}{cccc}\mathbf{0}& {\mathbf{a}}_{12}& {\mathbf{a}}_{13}& {\mathbf{a}}_{14}\\ {}{\mathbf{a}}_{21}& \mathbf{0}& {\mathbf{a}}_{23}& {\mathbf{a}}_{24}\\ {}{\mathbf{a}}_{31}& {\mathbf{a}}_{32}& \mathbf{0}& {\mathbf{a}}_{34}\\ {}{\mathbf{a}}_{41}& {\mathbf{a}}_{42}& {\mathbf{a}}_{43}& \mathbf{0}\end{array}\right) \]has rank at most 3 and satisfies 14 quadratic equations, which are\[ {\displaystyle \begin{array}{c}{\displaystyle \begin{array}{rlrl}\Vert {\mathbf{a}}_{12}{\Vert}_2^2& =\Vert {\mathbf{a}}_{21}{\Vert}_2^2& \Vert {\mathbf{a}}_{14}{\Vert}_2^2& =\Vert {\mathbf{a}}_{41}{\Vert}_2^2\\ {}\Vert {\mathbf{a}}_{13}{\Vert}_2^2& =\Vert {\mathbf{a}}_{31}{\Vert}_2^2& \Vert {\mathbf{a}}_{24}{\Vert}_2^2& =\Vert {\mathbf{a}}_{42}{\Vert}_2^2\\ {}\Vert {\mathbf{a}}_{23}{\Vert}_2^2& =\Vert {\mathbf{a}}_{32}{\Vert}_2^2& \Vert {\mathbf{a}}_{34}{\Vert}_2^2& =\Vert {\mathbf{a}}_{43}{\Vert}_2^2\end{array}}\end{array}} \]\[ {\displaystyle \begin{array}{c}\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{12}& {\mathbf{a}}_{13}\end{array})=-\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{21}& {\mathbf{a}}_{23}\end{array})=\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{31}& {\mathbf{a}}_{32}\end{array})\\ {}\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{12}& {\mathbf{a}}_{14}\end{array})=-\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{21}& {\mathbf{a}}_{24}\end{array})=\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{41}& {\mathbf{a}}_{42}\end{array})\\ {}\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{13}& {\mathbf{a}}_{14}\end{array})=-\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{31}& {\mathbf{a}}_{34}\end{array})=\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{41}& {\mathbf{a}}_{43}\end{array})\\ {}\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{23}& {\mathbf{a}}_{24}\end{array})=-\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{32}& {\mathbf{a}}_{34}\end{array})=\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{42}& {\mathbf{a}}_{43}\end{array})\end{array}} \]

where $ \parallel \cdot {\parallel}_2 $ denotes the Euclidean norm. Note that $ \operatorname{rank}(A)\le 3 $ is equivalent to the vanishing of all $ 4\times 4 $ minors of $ A $, giving a collection of homogeneous degree $ 4 $ polynomial constraints on the entries of $ A $.

We note that (6) furnishes a polynomial map which sends an n-tuple of rotations to a pure common lines matrix:(7)\[ {\displaystyle \begin{array}{l}\hskip1.12em \psi :\mathrm{SO}{(3)}^n\to {\mathrm{\mathbb{R}}}^{2n\times n}\\ {}\left({R}^{(1)},\dots, {R}^{(n)}\right)\mapsto A.\end{array}} \]Studying $ \psi $ will allow us understand additional important properties of pure common lines matrices. To do this, we will need to introduce some terminology and elementary concepts from algebraic geometry (see (14) for precise definitions.)

A subset $ X\hskip0.5em \subseteq \hskip0.5em {\mathrm{\mathbb{R}}}^d $ is called an algebraic variety if it is the set of points in $ {\mathrm{\mathbb{R}}}^d $ where a finite collection of polynomials all simultaneously equal 0. For example, $ SO(3) $ is an algebraic variety since it is the set of matrices $ R $ in $ {\mathrm{\mathbb{R}}}^{3\times 3}\cong {\mathrm{\mathbb{R}}}^9 $ satisfying the polynomial equations $ {R}^{\top }R-{I}_{3\times 3}={0}_{3\times 3} $ and $ \det (R)-1=0 $. Roughly speaking, an algebraic variety is similar to an embedded manifold, expect possibly singular and always defined by polynomial equations. Due to the properties of polynomials, an algebraic variety $ X $ is a “thin” subset of $ {\mathrm{\mathbb{R}}}^d $ in which it lives: provided $ X\ne {\mathrm{\mathbb{R}}}^d $, the complement of $ X $ is always a dense subset filling up almost the entirety of the ambient space. More precisely, if one samples a random point from $ {\mathrm{\mathbb{R}}}^d $ according to any absolutely continuous probability distribution, then with probability $ 1 $ the point will lie in the complement of $ X $. We say that some property $ P $ holds (Zariski) generically if it holds for all points in the complement of some algebraic variety $ X\subsetneq {\mathrm{\mathbb{R}}}^d $, and we call such points (Zariski) generic. Roughly speaking, this means that property $ P $ holds with probability $ 1 $ (even if, as usually the case, the variety $ X\subsetneq {\mathrm{\mathbb{R}}}^d $ is left unspecified).

Recall that the fiber of a map at a point $ p $ in its image is the set of points in its domain which map to $ p $. Therefore to answer the question, “Does a pure common lines matrix uniquely determine the rotations which generated it?” we need to understand the fibers of the map $ \psi $ in (7). Our next result shows that the answer to the question is “Yes,” up to a global rotation, provided the pure common lines matrix is generic.

Theorem 3.5. For $ n\ge 3, $ generically, the fibers of the map $ \psi $ are isomorphic to $ \mathrm{SO}(3) $. More precisely, for generic rotations $ {R}^{(1)},\dots, {R}^{(n)} $ it holds that

\[ {\psi}^{-1}\left(\psi \left({R}^{(1)},\dots, {R}^{(n)}\right)\right)=\left\{\left({R}^{(1)}Q,\dots, {R}^{(n)}Q\right):Q\in \mathrm{SO}(3)\right\}. \]

Proof. See Appendix A.

Remark 3.6. Theorem 3.5 does not contradict the chirality ambiguity in cryo-EM, which states that the 3D molecule and rotations can only ever be recovered up to a global rotation and global reflection given cryo-EM data. In Theorem 5.1, we prove that there are two possible pure common lines matrices for a given non-pure common lines matrix. They differ by a global sign, and correspond to rotation tuples $ \left({R}^{(1)},\dots, {R}^{(n)}\right) $ and $ \left({JR}^{(1)},\dots, {JR}^{(n)}\right) $ where $ J=\mathit{\operatorname{diag}}\left(-1,-1,1\right) $, respectively. The chirality or handedness ambiguity is well-known in the common lines literature⁽²^,⁵⁾ and unavoidable.

3.2. The common lines variety

In general, the image of a polynomial map from an algebraic variety is not an algebraic variety, because polynomial inequalities (in addition to equations) are needed in the description of the image.⁽¹⁵⁾ This means that the set of all pure common lines matrices, that is, the image of $ \psi $ from $ SO(3) $, on its own is not an algebraic variety. To resolve this, we consider the smallest algebraic variety in $ {\mathrm{\mathbb{R}}}^{2n\times n} $ containing $ \psi \left( SO(3)\right) $, that is, we add the smallest set of additional points (which are not pure common lines matrices) to $ \psi \left( SO(3)\right) $ until the union becomes an algebraic variety. The process is called taking the Zariski closure of $ \psi \left( SO(3)\right) $. We call the resulting algebraic variety the common lines variety and it lives in the ambient space $ {\mathrm{\mathbb{R}}}^{2n\times n} $. The common lines variety is defined by polynomial equations in the entries of a matrix $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $. Since the common lines variety includes all pure common lines matrices, the polynomials defining it, in particular, include the constraints we already identified in Section 3.1.

Example 2. When $ n=3 $, we used the computer algebra system Macaulay2⁽¹⁶⁾ to determine the collection of polynomial equations defining the common lines variety. Along with the 5 quadratic polynomials from Propositions 3.3 and 3.4, our computation also found 1 polynomial of degree 6, 64 polynomials of degree 8, and 24 polynomials of degree 10, for a total of 94 polynomial equations. Notice that for $ n=3 $, the rank $ 3 $ constraint of Theorem 3.2 is vacuous. We find the 5 quadratics are the only homogeneous polynomials. The other 89 equations are highly complex and we refrain from explicitly writing them here. They are available at the GitHub repository (8.1).

In view of Example 2, we believe that Section 3.1 identifies all “simple-to-describe” algebraic constraints on pure common lines matrices. As such, it is important to understand to what extent these constraints are enough to characterize pure common lines matrices. This requires understanding the geometry of the common lines variety better, for which we will need to use a couple more basic concepts from algebraic geometry described in the next two paragraphs.

In general, every algebraic variety $ X\subseteq {\mathrm{\mathbb{R}}}^d $ admits a unique decomposition into a finite union of irreducible components $ X={\cup}_{i=1}^r{X}_i $, where $ {X}_i\subseteq {\mathrm{\mathbb{R}}}^d $ are algebraic varieties themselves and each cannot be decomposed as a union of two strictly smaller varieties. We think of $ {X}_i $ as “building blocks” of $ X $. For example, the variety $ \left\{\left(x,y\right)\in {\mathrm{\mathbb{R}}}^2: xy=0\right\} $ is a union of the $ x $- and $ y $-axes, and these lines are its irreducible components. In general, each irreducible component $ {X}_i $ can be ascribed a dimension, which captures the number of degrees of freedom in $ {X}_i $ and coincides with manifold dimension when $ {X}_i $ is smooth. We note that the dimension of different irreducible components $ {X}_i $ of $ X $ may differ.

Given an algebraic variety $ X\subseteq {\mathrm{\mathbb{R}}}^d $, one can construct from it a larger algebraic variety $ C(X)\subseteq {\mathrm{\mathbb{R}}}^d $ called the cone over $ X $ by adding to $ X $ all points in $ {\mathrm{\mathbb{R}}}^d $ which lie on a line passing through the origin and a point on $ X $. This constructs an algebraic variety that includes all scalar multiples of points of $ X $.

Since the constraints we identified in Section 3.1 are all polynomial equations, they define Section 3.1 as an algebraic variety in $ {\mathrm{\mathbb{R}}}^{2n\times n} $. In the next proposition, we show that this algebraic variety contains the cone over the common lines variety as an irreducible component. This means that locally on this component, our constraints from Section 3.1 are sufficient to characterize pure common lines matrices up to scale. The proof relies on computer algebra software,⁽¹⁶⁾ and checks that certain numerical matrices are full-rank, so we also report the range of $ n $ on which the proposition has been confirmed.

Proposition 3.7. The algebraic variety defined by the low-rank constraint in Theorem 3.2, along with the quadratic constraints in Propositions 3.3 and 3.4, and the requirement that all diagonal blocks are $ 0 $, contains the cone over the common lines variety in $ {\mathrm{\mathbb{R}}}^{2n\times n} $ as an irreducible component for $ n=3,\dots, 50 $.

Proof. See Appendix A.

4. Optimization problem

We encode the common lines from cryo-EM data by choosing representatives $ {\hat{\mathbf{a}}}_{ij}\in {\mathrm{\mathbb{R}}}^2 $ (Definition 2.1) to form the $ 2\times 1 $ blocks in a common lines matrix $ \hat{A}\in {\mathrm{\mathbb{R}}}^{2n\times n} $. Suppose we have rescaled the $ 2\times 1 $ blocks of $ \hat{A} $ so they all have norm 1. Then at least in clean situations, Theorem 3.2 and Propositions 3.3 and 3.4 imply we can scale the blocks by nonzero scalars $ {\lambda}_{ij} $ with $ {\lambda}_{ij}={\lambda}_{ji} $ so that the resulting matrix is a pure common lines matrix $ A $, and thus has rank 3 and satisfies the set of norm and $ 2\times 2 $ determinant equations. Proposition 3.7 states that these constraints are sufficient to determine the common lines variety locally. In Section 5.4, we further prove that for purposes of recovering scales $ {\lambda}_{ij} $ to obtain a pure common lines matrix, the constraints are also sufficient.

Proper scales are not directly available from common lines data in cryo-EM. To find the scales we formulate an optimization problem, inspired by work for a mathematically similar problem in (11):(8)\[ \hskip-10em \underset{\begin{array}{c}\left\{{\mathbf{a}}_{ij}\right\},\left\{{\lambda}_{ij}\right\}\\ {}i,j=1,\dots, n\end{array}}{\min}\hskip1em \sum \limits_{i,j=1}^n\parallel {\hat{\mathbf{a}}}_{ij}-{\lambda}_{ij}{\mathbf{a}}_{ij}{\parallel}_2 \](9)\[ \hskip-17em \mathrm{subject}\ \mathrm{to}\hskip2em \left\{\begin{array}{l}{\mathbf{a}}_{ii}=0\hskip0.8em \mathrm{for}\hskip0.8em \mathrm{all}\hskip0.8em 1\le i\le n\\ {}\operatorname{rank}(A)=3\\ {}{\lambda}_{ij}={\lambda}_{ji}\hskip0.8em \mathrm{for}\hskip0.8em \mathrm{all}\hskip0.8em 1\le i<j\le n,\end{array}\right. \](10)

The mixed L1/Frobenius norm $ \parallel \cdot {\parallel}_2 $ in the objective is chosen for its robustness to outliers.

Once we obtain a pure common lines matrix, we show in Section 5.3 how to recover the rotations corresponding to the common lines (up to the ambiguity in Remark 3.6). Later in Section 6, we solve the problem (8) to identify homogeneous clusters among images coming from a discrete number of distinct molecules.

5. Optimization algorithms

Our approach to solving (8) is first to solve the problem with constraint (9) only, and then to enforce the constraint (10) on the solution. These steps are in Sections 5.1 and 5.2, respectively.

5.1. IRLS and ADMM for the rank constraint

To solve (8) with the rank constraint, we closely follow the approach of (11). We relax the mixed L1/Frobenius norm to a weighted least squares objective, where the weights and optimization variables are updated after each iteration of minimization via a procedure called Iterative Reweighted Least Squares (IRLS).⁽¹⁷⁾ Let $ t $ denote the IRLS iteration number. Then the objective (8) becomes:(11)\[ \underset{A\in {\mathrm{\mathbb{R}}}^{2n\times n},\Lambda \in {\mathrm{\mathbb{R}}}^{n\times n}}{\min}\hskip1em \parallel \hat{A}-\left(\Lambda \otimes {\mathbf{1}}_{2\times 1}\right)\hskip0.5em \odot \hskip0.5em A{\parallel}_{WF}^2, \]where $ \otimes $ and $ \hskip0.5em \odot \hskip0.5em $ is the Kronecker and Hadamard product of two matrices respectively, $ {\Lambda}_{ij}={\lambda}_{ij}, $\[ \parallel M{\parallel}_{\mathrm{WF}}^2\hskip0.5em := \hskip0.5em \sum \limits_{i,j=1}^n{w}_{ij}^{(t)}\parallel {\mathbf{m}}_{ij}{\parallel}_2^2 \]is the squared weighted Frobenius norm of a block matrix $ M\in {\mathrm{\mathbb{R}}}^{2n\times n} $, the weights in (11) are(12)\[ {w}_{ij}^{(t)}=\left\{\begin{array}{ll}1/\left(\max \left\{\delta, \parallel {\hat{\mathbf{a}}}_{ij}-{\lambda}_{ij}^{\left(t-1\right)}{\mathbf{a}}_{ij}^{\left(t-1\right)}{\parallel}_2\right\}\right)& \mathrm{if}\hskip0.8em i\ne j\\ {}0& \mathrm{if}\hskip0.8em i=j,\end{array}\right. \]and $ 0<\delta \ll 1 $ is a chosen regularization parameter.

Within each iteration of IRLS, we need to solve the problem (11) with the constraints (9). Since the objective is bilinear in $ A $ and $ \Lambda $, we can do this using the Alternate Direction Method of Multiplier (ADMM).⁽¹⁸^,¹⁹⁾ This gives an augmented Lagrangian optimization problem:(13)\[ {\displaystyle \begin{array}{rl}\underset{\Gamma \in {\unicode{x211D}}^{2n\times n}}{\max}\hskip1em \underset{A,B\in {\unicode{x211D}}^{2n\times n},\hskip2pt \Lambda \in {\unicode{x211D}}^{n\times n}}{\min}\hskip1em & \frac{1}{2}\Vert \hat{A}-(\Lambda \otimes {\mathbf{1}}_{2\times 1})\odot A{\Vert}_{WF}^2+\frac{\tau }{2}\Vert B-(A+\Gamma ){\Vert}_F^2\\ {}\mathrm{subject}\ \mathrm{to}\hskip2.00em & \left\{\begin{array}{l}{\mathbf{a}}_{ii}=\mathbf{0}\ \mathrm{for}\ \mathrm{all}\ 1\le i\le n\\ {}\Lambda ={\Lambda}^{\mathrm{\top}}\\ {}\mathrm{r}\mathrm{a}\mathrm{n}\mathrm{k}(B)=3,\end{array}\right.\operatorname{}\end{array}} \]where $ \Gamma $ is a matrix of Lagrange multipliers and $ \tau ={\sum}_{i,j=1}^n{w}_{ij}^{(t)} $. We now describe the steps of the ADMM procedure. Since the problem (13) is non-convex, we alternatingly optimize for each variable. In the following, let $ k $ denote the ADMM iteration number and let $ {W}^{(t)}\in {\mathrm{\mathbb{R}}}^{n\times n} $ where $ {W}_{ij}^{(t)}={w}_{ij}^{(t)} $ be the matrix of weights within IRLS iteration $ t $.

1. Optimize $ A $ and $ \Lambda $: We alternatingly optimize for $ A $ and $ \Lambda $ until convergence. Let $ {k}^{\prime } $ denote the iteration number for this step.
1a. First we solve the unconstrained problem for $ A $:\[ \underset{A\in {\mathrm{\mathbb{R}}}^{2n\times n}}{\min}\hskip1em \frac{1}{2}\parallel \hat{A}-\left({\Lambda}^{\left({k}^{\prime}\right)}\otimes {\mathbf{1}}_{2\times 1}\right)\hskip0.5em \odot \hskip0.5em A{\parallel}_{\mathrm{WF}}^2+\frac{\tau }{2}\parallel {B}^{(k)}-\left(A+{\Gamma}^{(k)}\right){\parallel}_F^2. \]The solution is(14)\[ {\displaystyle \begin{array}{l}{A}^{\left({k}^{\prime }+1\right)}=\left(\left({W}^{(t)}\otimes {\mathbf{1}}_{2\times 1}\right)\hskip0.5em \odot \hskip0.5em \left({\Lambda}^{\left({k}^{\prime}\right)}\otimes {\mathbf{1}}_{2\times 1}\right)\hskip0.5em \odot \hskip0.5em \hat{A}+\frac{\tau }{4}\left({B}^{(k)}+{\Gamma}^{(k)}\right)\right)\\ {}\hskip5em \oslash \left(\left({W}^{(t)}\otimes {\mathbf{1}}_{2\times 1}\right)\hskip0.5em \odot \hskip0.5em \left({\Lambda}^{\left({k}^{\prime}\right)}\otimes {\mathbf{1}}_{2\times 1}\right)\hskip0.5em \odot \hskip0.5em \left({\Lambda}^{\left({k}^{\prime}\right)}\otimes {\mathbf{1}}_{2\times 1}\right)+\frac{\tau }{4}{\mathbf{1}}_{2n\times n}\right),\end{array}} \]where $ \oslash $ is the element-wise division of two matrices. Then we project $ A $ onto the set of matrices whose $ 2\times 1 $ diagonals are 0:(15)\[ {\mathbf{a}}_{ii}^{\left({k}^{\prime }+1\right)}=\mathbf{0}. \]1b. Next we solve the unconstrained problem for $ \Lambda $:\[ {\displaystyle {\displaystyle \begin{array}{rl}\underset{\Lambda \in {\unicode{x211D}}^{n\times n}}{\min }& \hskip1em \frac{1}{2}\Vert \hat{A}-(\Lambda \otimes {\mathbf{1}}_{2\times 1})\odot {A}^{({k}^{\mathrm{\prime}}+1)}{\Vert}_{WF}^2\\ {}\mathrm{subject}\ \mathrm{to}& \hskip1em \Lambda ={\Lambda}^{\mathrm{\top}}\end{array}}} \]The solution is(16)\[ {\lambda}_{ij}^{\left({k}^{\prime }+1\right)}=\left\{\begin{array}{ll}\frac{w_{ij}\left\langle {\hat{\mathbf{a}}}_{ij},{\mathbf{a}}_{ij}^{\left({k}^{\prime }+1\right)}\right\rangle +{w}_{ji}\left\langle {\hat{\mathbf{a}}}_{ji},{\mathbf{a}}_{ji}^{\left({k}^{\prime }+1\right)}\right\rangle }{w_{ij}\parallel {\mathbf{a}}_{ij}^{\left({k}^{\prime }+1\right)}{\parallel}_2^2+{w}_{ji}\parallel {\mathbf{a}}_{ji}^{\left({k}^{\prime }+1\right)}{\parallel}_2^2}& \hskip2.8em \mathrm{if}\hskip0.8em i\ne j\\ {}0& \hskip2.8em \mathrm{if}\hskip0.8em i=j.\end{array}\right. \]After repeating 1a. and 1b. until convergence, we obtain $ {A}^{\left(k+1\right)} $ and $ {\Lambda}^{\left(k+1\right)} $.
2. Optimize $ B $: The constrained problem for $ B $ is\[ {\displaystyle \begin{array}{rl}\underset{B\in {\unicode{x211D}}^{2n\times n}}{\min }& \hskip1em \frac{\tau }{2}\Vert B-({A}^{(k+1)}+{\Gamma}^{(k)}){\Vert}_F^2\\ {}\mathrm{subject}\ \mathrm{to}& \hskip1em \mathrm{r}\mathrm{a}\mathrm{n}\mathrm{k}(B)=3\end{array}} \]This is solved by(17)\[ {B}^{\left(k+1\right)}=\mathrm{SVP}\left({A}^{\left(k+1\right)}-{\Gamma}^{(k)},3\right), \]where $ \mathrm{SVP}\left(M,3\right) $ is the singular value projection of a matrix $ M $ onto the set of matrices of rank at most $ 3 $, which is computing by taking the highest three singular values of $ M $ and its corresponding left and right singular vectors.
3. Update $ \Gamma $: In ADMM, there is a gradient ascent step for $ \Gamma $, where the step is the solution to\[ \underset{\Gamma \in {\mathrm{\mathbb{R}}}^{2n\times n}}{\max}\parallel {B}^{\left(k+1\right)}-\left({A}^{\left(k+1\right)}-\Gamma \right){\parallel}_F^2. \]This gives the update(18)\[ {\Gamma}^{\left(k+1\right)}={\Gamma}^{(k)}+\left({B}^{\left(k+1\right)}-{A}^{\left(k+1\right)}\right). \]

Steps 1, 2, and 3 are repeated until convergence in the optimization variables. This completes the ADMM procedure for IRLS iteration $ t $. The IRLS weights $ {w}_{ij}^{\left(t+1\right)} $ for iteration $ t+1 $ are updated using (12), and the ADMM procedure is repeated again. The whole pipeline is detailed in Algorithm 1.Algorithm 1. IRLS and ADMM for rank constraint satisfaction

Input: $ \hat{A}\in {\mathrm{\mathbb{R}}}^{2n\times n} $, a common lines matrix

Output: $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $, a common lines matrix satisfying only the rank constraint

1: procedure IRLS-ADMM $ \left(\hat{A}\right) $

2: initialize $ A,\Lambda, W $

3: $ t\leftarrow 0 $

4: while not converged do

5: $ B\leftarrow A $

6: $ \Gamma \leftarrow {0}_{2n\times n} $

7: $ \tau \leftarrow {\sum}_{i,j=1}^n{w}_{ij}^{(t)} $

8: $ k\leftarrow 0 $

9: while not converged do

10: $ {k}^{\prime}\leftarrow 0 $

11: while not converged do

12: update $ A $ using (14) and (15) $ \vartriangleright $ 1. Update $ A $ and $ \Lambda $

13: update $ \Lambda $ using (16)

14: $ {k}^{\prime}\leftarrow {k}^{\prime }+1 $

15: end while

16: update $ B $ using (17) $ \vartriangleright $ 2. Update $ B $

17: update $ \Gamma $ using (18) $ \vartriangleright $ 3. Update $ \Gamma $

18: $ k\leftarrow k+1 $

19: end while

20: update $ W $ using (12) $ \vartriangleright $ Update IRLS weights

21: $ t\leftarrow t+1 $

22: end while

23: end procedure

5.2. Sinkhorn scaling for the quadratic constraints

When successful, IRLS-ADMM in Section 5.1 gives us a solution $ A $ to (8) satisfying the constraint (9). Next, we must enforce (10). As described below, our approach is to scale the rows and columns of $ A $ alternatingly until constraint (10) is satisfied, in a manner analogous to Sinkhorn’s algorithm.⁽²⁰⁾ Note that nonzero row and column scales do not affect the rank of $ A $, so constraint (9) will still be satisfied.

We find the row and column scales by solving least squares problems. First, we handle the norm constraints. Define $ M\in {\mathrm{\mathbb{R}}}^{n\times n} $ where(19)\[ {M}_{ij}=\parallel {\mathbf{a}}_{ij}{\parallel}_2^2. \]

Then the norm constraints are satisfied if and only if $ M={M}^{\top } $, which leads us to the following constrained least squares problems:(20)\[ \boldsymbol{\mu} =\underset{\parallel \boldsymbol{\mu} {\parallel}_2=1}{\arg \min}\parallel \operatorname{diag}\left(\boldsymbol{\mu} \right)M-{\left(\operatorname{diag}\left(\boldsymbol{\mu} \right)M\right)}^{\top }{\parallel}_F^2, \](21)\[ \boldsymbol{\tau} =\underset{\parallel \boldsymbol{\tau} {\parallel}_2=1}{\arg \min}\parallel M\operatorname{diag}\left(\boldsymbol{\tau} \right)-{\left(M\operatorname{diag}\left(\boldsymbol{\tau} \right)\right)}^{\top }{\parallel}_F^2. \]The solutions to problems (20) and (21) are(22)\[ \underset{\parallel \boldsymbol{\mu} {\parallel}_2=1}{\min }{\left\Vert {N}_L\boldsymbol{\mu} \right\Vert}_2^2, \](23)\[ \underset{\parallel \boldsymbol{\tau} {\parallel}_2=1}{\min }{\left\Vert {N}_R\boldsymbol{\tau} \right\Vert}_2^2, \]respectively, where $ {N}_L,{N}_R\in {\mathrm{\mathbb{R}}}^{n\times n} $ are the corresponding least squares matrices. See Appendix B for full details. The problems (22) and (23) are solved by taking $ \boldsymbol{\mu} $ and $ \boldsymbol{\tau} $ to be the right singular vector corresponding to the smallest singular value of $ {N}_L $ and $ {N}_R $, respectively.

Now we handle the determinant constraints. Scaling each $ 2\times n $ row of $ A $ by $ {\mu}_1,\dots, {\mu}_n\in \mathrm{\mathbb{R}} $ and enforcing the constraints leads us to the equations(24)\[ {\mu}_i^2\det \left(\begin{array}{cc}{\mathbf{a}}_{ij}& {\mathbf{a}}_{ik}\end{array}\right)=-{\mu}_j^2\det \left(\begin{array}{cc}{\mathbf{a}}_{ji}& {\mathbf{a}}_{jk}\end{array}\right)={\mu}_k^2\det \left(\begin{array}{cc}{\mathbf{a}}_{ki}& {\mathbf{a}}_{kj}\end{array}\right) \]for all $ 1\le i<j<k\le n $. Taking the signed root on each equation, we obtain(25)\[ {\displaystyle \begin{array}{rl}{\mu}_i\mathrm{sgn}(\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{ij}& {\mathbf{a}}_{ik}\end{array}))\sqrt{|\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{ij}& {\mathbf{a}}_{ik}\end{array})|}& \hskip-0.5em =-{\mu}_j\mathrm{sgn}(\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{ji}& {\mathbf{a}}_{jk}\end{array}))\sqrt{|\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{ji}& {\mathbf{a}}_{jk}\end{array})|}\\ {}& \hskip-0.5em ={\mu}_k\mathrm{sgn}(\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{ki}& {\mathbf{a}}_{kj}\end{array}))\sqrt{|\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{ki}& {\mathbf{a}}_{kj}\end{array})|}\end{array}} \]Scaling the columns of $ A $ by $ {\tau}_1,\dots, {\tau}_n\in \mathrm{\mathbb{R}} $ and enforcing the constraints leads to the equations(26)\[ {\tau}_j{\tau}_k\det \left(\begin{array}{cc}{\mathbf{a}}_{ij}& {\mathbf{a}}_{ik}\end{array}\right)=-{\tau}_i{\tau}_k\det \left(\begin{array}{cc}{\mathbf{a}}_{ji}& {\mathbf{a}}_{jk}\end{array}\right)={\tau}_i{\tau}_j\det \left(\begin{array}{cc}{\mathbf{a}}_{ki}& {\mathbf{a}}_{kj}\end{array}\right) \]for all $ 1\le i<j<k\le n $. Dividing the first equation above by $ {\tau}_k $ on both sides and the second equation above by $ {\tau}_i $ on both sides, we obtain(27)\[ {\displaystyle \begin{array}{rl}{\tau}_j\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{ij}& {\mathbf{a}}_{ik}\end{array})& \hskip-0.7em =-{\tau}_i\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{ji}& {\mathbf{a}}_{jk}\end{array})\\ {}-{\tau}_k\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{ji}& {\mathbf{a}}_{jk}\end{array})& \hskip-0.7em ={\tau}_j\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{ki}& {\mathbf{a}}_{kj}\end{array})\end{array}} \]We observe that (25) and (27) are linear in $ {\mu}_i $ and $ {\tau}_i $. We can enumerate all determinants into three vectors(28)\[ {\displaystyle \begin{array}{rl}{\mathbf{v}}_1& \hskip-0.7em ={(\begin{array}{c}\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{ij}& {\mathbf{a}}_{ik}\end{array})\end{array})}_{1\le i<<<j<k\le n}\\ {}{\mathbf{v}}_2&&& \hskip-0.7em ={(\begin{array}{c}-\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{ji}& {\mathbf{a}}_{jk}\end{array})\end{array})}_{1\le i<<<j<k\le n}\\ {}{\mathbf{v}}_3&&& \hskip-0.7em ={(\begin{array}{c}\mathrm{det}(\begin{array}{cc}{\mathbf{a}}_{ki}& {\mathbf{a}}_{kj}\end{array})\end{array})}_{1\le i<<<j<k\le n},\end{array}} \]each of length $ \left(\begin{array}{l}n\\ {}3\end{array}\right) $. The determinant constraints are satisfied if and only if $ {\mathbf{v}}_1={\mathbf{v}}_2={\mathbf{v}}_3 $, which leads to the following constrained least squares problems:(29)\[ \underset{\parallel \boldsymbol{\mu} {\parallel}_2=1}{\min}\parallel \left(\boldsymbol{\mu} \hskip0.8em \Delta \hskip0.8em {\mathbf{v}}_1\right)-\left(\boldsymbol{\mu} \hskip0.8em \Delta \hskip0.8em {\mathbf{v}}_2\right){\parallel}_2^2+\parallel \left(\boldsymbol{\mu} \hskip0.8em \Delta \hskip0.8em {\mathbf{v}}_2\right)-\left(\boldsymbol{\mu} \hskip0.8em \Delta \hskip0.8em {\mathbf{v}}_3\right){\parallel}_2^2, \](30)\[ \underset{\parallel \boldsymbol{\tau} {\parallel}_2=1}{\min}\parallel (\boldsymbol{\tau} \hskip0.8em {\Delta}_1\hskip0.8em {\mathbf{v}}_1)-(\boldsymbol{\tau} \hskip0.8em {\Delta}_1\hskip0.8em {\mathbf{v}}_2){\parallel}_2^2+\parallel (\boldsymbol{\tau} \hskip0.8em {\Delta}_2\hskip0.8em {\mathbf{v}}_2)-(\boldsymbol{\tau} \hskip0.8em {\Delta}_2\hskip0.8em {\mathbf{v}}_3){\parallel}_2^2, \]where the quantities in brackets (defined in (B5) and (B6)) are the corresponding scalings (25) and (27) of $ {\mathbf{v}}_1 $, $ {\mathbf{v}}_2 $, and $ {\mathbf{v}}_3 $ by $ \boldsymbol{\mu} $ and $ \boldsymbol{\tau} $. The solutions to problems (29) and (30) are(31)\[ \underset{\parallel \boldsymbol{\mu} {\parallel}_2=1}{\min }{\left\Vert \left({D}_{L,1}+{D}_{L,2}\right)\boldsymbol{\mu} \right\Vert}_2^2, \](32)\[ \underset{\parallel \boldsymbol{\tau} {\parallel}_2=1}{\min }{\left\Vert \left({D}_{R,1}+{D}_{R,2}\right)\boldsymbol{\tau} \right\Vert}_2^2, \]respectively, where $ {D}_{L,1},{D}_{L,2},{D}_{R,1},{D}_{R,2}\in {\mathrm{\mathbb{R}}}^{n\times n} $ are the corresponding least squares matrices. See Appendix B for full details. The problems (31) and (32) are again solved using SVD.

Now we describe the steps of the Sinkhorn scaling method. Let $ r $ denote the iteration number of the procedure.

1. Scale rows: Let $ \boldsymbol{\mu} \in {\mathrm{\mathbb{R}}}^n $ be the solution to

(33)\[ \underset{\parallel \boldsymbol{\mu} {\parallel}_2=1}{\min }{\left\Vert \left({N}_L+{D}_{L,1}+{D}_{L,2}\right)\boldsymbol{\mu} \right\Vert}_2^2. \]Then we perform the update(34)\[ {A}^{\left(r+\frac{1}{2}\right)}=\frac{\parallel {A}^{(r)}{\parallel}_F^2}{\parallel \left(\operatorname{diag}\left(\boldsymbol{\mu} \right)\otimes {\mathbf{1}}_{2\times 1}\right){A}^{(r)}{\parallel}_F^2}\left(\operatorname{diag}\left(\boldsymbol{\mu} \right)\otimes {\mathbf{1}}_{2\times 1}\right){A}^{(r)}. \]

2. Scale columns: Let $ \boldsymbol{\tau} \in {\mathrm{\mathbb{R}}}^n $ be the solution to

(35)\[ \underset{\parallel \boldsymbol{\tau} {\parallel}_2=1}{\min }{\left\Vert \left({N}_R+{D}_{R,1}+{D}_{R,2}\right)\boldsymbol{\tau} \right\Vert}_2^2. \]Then we perform the update(36)\[ {A}^{\left(r+1\right)}=\frac{\parallel {A}^{\left(r+\frac{1}{2}\right)}{\parallel}_F^2}{\parallel {A}^{\left(r+\frac{1}{2}\right)}\operatorname{diag}\left(\boldsymbol{\tau} \right){\parallel}_F^2}{A}^{\left(r+\frac{1}{2}\right)}\operatorname{diag}\left(\boldsymbol{\tau} \right). \]The Sinkhorn scaling procedure is detailed in Algorithm 2.Algorithm 2. Sinkhorn scaling for quadratic constraint satisfaction

Input: $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $, the output of IRLS-ADMM

Output: $ {A}^{\prime}\in {\mathrm{\mathbb{R}}}^{2n\times n} $, a pure common lines matrix

1: procedure Sinkhorn $ (A) $

2: $ r\leftarrow 0 $

3: while not converged do

4: set $ {N}_L,{D}_{L,1},{D}_{L,2} $ using (B1), (B7), (B8)

5: set $ \boldsymbol{\mu} $ to be the solution of (33)

6: update $ A $ using (34) $ \vartriangleright $ 1. Scale rows

7: set $ {N}_R,{D}_{R,1},{D}_{R,2} $ using (B2), (B9), (B10)

8: set $ \boldsymbol{\tau} $ to be the solution of (35)

9: update $ A $ using (36) $ \vartriangleright $ 2. Scale columns

10: end while

11: $ r\leftarrow r+1 $

12: $ {A}^{\prime}\leftarrow A $

13: end procedure

5.3. Rotation recovery

IRLS-ADMM and Sinkhorn aim to output a pure common lines matrix $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $. Given a pure common lines matrix, we now show how to determine the underlying rotations $ {R}^{(i)} $ in (3) which generated $ A $ (recall Theorem 3.5).

Given $ A $, use singular value decomposition to compute a rank-3 factorization $ A={BC}^{\top } $ for $ B\in {\mathrm{\mathbb{R}}}^{2n\times 3} $ and $ C\in {\mathrm{\mathbb{R}}}^{n\times 3} $. We then seek an invertible matrix $ Q\in {\mathrm{\mathbb{R}}}^{3\times 3} $ so that $ BQ $ and $ {CQ}^{-\top } $ take the form of the factors in (6). In particular, it is required that the rows of $ BQ $ come in orthonormal pairs; more precisely, $ (BQ){(BQ)}^{\top }={BQQ}^{\top }{B}^{\top}\in {\mathrm{\mathbb{R}}}^{2n\times 2n} $ should have $ 2\times 2 $ identities along its diagonal. Setting $ X={QQ}^{\top}\in {\mathrm{\mathbb{R}}}^{3\times 3} $ and relaxing positive semidefiniteness, let us consider the following affine-linear least squares problem:(37)\[ \underset{\begin{array}{c}X\in {\mathrm{\mathbb{R}}}^{3\times 3}\\ {}X={X}^{\top}\end{array}}{\min}\parallel \mathrm{\mathcal{L}}\left({BXB}^{\top}\right){\parallel}_F^2, \]where $ \mathrm{\mathcal{L}}:{\mathrm{\mathbb{R}}}^{2n\times 2n}\to {\mathrm{\mathbb{R}}}^{2n\times 2n} $ denotes the affine-linear operator which sets all entries of a $ 2n\times 2n $ matrix outside of the $ 2\times 2 $ diagonal blocks to 0, and subtracts the $ 2\times 2 $ identity matrices from the diagonal blocks. The normal equations for (37) may be written as(38)\[ L\mathrm{vec}(X)=\mathrm{vec}\left({B}^{\top }B\right), \]where $ L\in {\mathrm{\mathbb{R}}}^{9\times 9} $, whose rows and columns index the variables $ {X}_{ij} $ for $ 1\le i\le j\le 3 $, and whose corresponding $ \left( ij,k\mathrm{\ell}\right) $-th entry is(39)\[ {(L)}_{ij,k\mathrm{\ell}}=\left\langle \left({I}_{n\times n}\otimes {\mathbf{1}}_{1\times 2}\right)\left({\mathbf{b}}_i\hskip0.5em \odot \hskip0.5em {\mathbf{b}}_{\mathrm{\ell}}\right),\left({I}_{n\times n}\otimes {\mathbf{1}}_{1\times 2}\right)\left({\mathbf{b}}_j\hskip0.5em \odot \hskip0.5em {\mathbf{b}}_k\right)\right\rangle, \]where $ {\mathbf{b}}_i $ is the ith column of $ B $. Generically there is a unique symmetric matrix $ X $ solving (38).

Let $ X={\mathrm{VDV}}^{\top } $ be an eigendecomposition for the solution to (37), where $ V,D\in {\mathrm{\mathbb{R}}}^{9\times 9} $. In the clean case, the diagonal matrix $ D $ is already positive semidefinite. Then $ {\mathrm{VD}}^{1/2}\in {\mathrm{\mathbb{R}}}^{3\times 3} $ is a candidate for $ Q $. Next the ith row block of $ \mathrm{BQ} $ determines the first two rows of $ {R}^{(i)} $, and the cross product of these two rows computes the third row of $ {R}^{(i)} $. From this, we recover the n-tuple of rotations $ \left({R}^{(1)},\dots, {R}^{(n)}\right) $ up to global right multiplication by a rotation. The full procedure for rotation recovery is in Algorithm 3, which is formulated to apply to noisy inputs as well. We note that as explained in Remark 3.6, a pure common lines matrix $ A $ can only be recovered up to sign, because there are two distinct n-tuples of rotations that can be recovered corresponding to the chiral ambiguity of common lines data. One can run Rotations twice, on $ A $ and $ -A $, to produce the two possible sets of rotations.Algorithm 3. Rotation recovery

Input: $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $, an estimate for pure common lines matrix

Output: $ \left({R}_1,\dots, {R}_n\right)\in \mathrm{SO}{(3)}^n $, rotations determining the pure common lines matrix

1: procedure Rotations $ (A) $

2: set $ B $ using the SVD to get a rank-3 approximation $ {BC}^{\top } $ for $ A $

3: set $ L $ using (39)

4: set $ X $ to be least squares solution of (38)

5: set $ V,D $ using the eigendecomposition $ X={\mathrm{VDV}}^{\top } $

6: $ {D}_{ii}\leftarrow \max \left({D}_{ii},0\right) $

7: for $ i=1,\dots, n $ do

8: set $ {\mathbf{q}}_j^{\top } $ to be the $ \left(2i-2+j\right) $-th row of $ {\mathrm{BVD}}^{1/2} $ for $ j=1,2 $

9: set $ {\mathbf{q}}_3^{\top } $ to be the ith row of $ C{\left({\mathrm{VD}}^{1/2}\right)}^{-\top } $

10: set $ {R}_i\in {\mathrm{\mathbb{R}}}^{3\times 3} $ to be $ \left(\begin{array}{c}{\mathbf{q}}_2^{\top}\\ {}-{\mathbf{q}}_1^{\top}\\ {}{\mathbf{q}}_3^{\top}\end{array}\right) $

11: end for

12: if $ \det \left({R}_i\right)<0 $ then

13: for $ i=1,\dots, n $ do

14: $ {R}_i\leftarrow -{R}_i $

15: end for

16: end if

17: for $ i=1,\dots, n $ do

18: replace $ {R}_i $ by the nearest rotation matrix to $ {R}_i $ using the SVD of $ {R}_i $

19: end for

20: end procedure

5.4. Justification of algorithms

Suppose the input to IRLS-ADMM is a noiseless common lines matrix $ \hat{A}\in {\mathrm{\mathbb{R}}}^{2n\times n} $, and its output $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ is a global minimizer to the non-convex problem (13). In the next theorem, we show that we can recover the ground-truth pure common lines matrix, up to a global scale, by scaling the $ 2\times 1 $ blocks of $ A $ via IRLS-ADMM and enforcing the quadratic constraints via Sinkhorn. The theorem justifies using IRLS-ADMM and Sinkhorn.

Theorem 5.1. Let $ n\ge 4 $. Let $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ be a generic pure common lines matrix and $ {\lambda}_{ij}\in \mathrm{\mathbb{R}} $ be nonzero scales for $ i,j=1,\dots, n $ with $ i\ne j $. Let $ \Lambda \in {\mathrm{\mathbb{R}}}^{n\times n} $ where $ {\Lambda}_{ij}={\lambda}_{ij} $ if $ i\ne j $, $ {\Lambda}_{ii}=0 $, and $ \Lambda ={\Lambda}^{\mathrm{\top}} $. Suppose\[ B\hskip0.5em := \hskip0.5em \left(\Lambda \otimes {\mathbf{1}}_{2\times 1}\right)\hskip0.5em \odot \hskip0.5em A=\left(\begin{array}{ccccc}\mathbf{0}& {\lambda}_{12}{\mathbf{a}}_{12}& \dots & {\lambda}_{1,n-1}{\mathbf{a}}_{1,n-1}& {\lambda}_{1,n}{\mathbf{a}}_{1,n}\\ {}{\lambda}_{21}{\mathbf{a}}_{21}& \mathbf{0}& \dots & {\lambda}_{2,n-1}{\mathbf{a}}_{2,n-1}& {\lambda}_{2,n}{\mathbf{a}}_{2,n}\\ {}\vdots & \vdots & \ddots & \vdots & \vdots \\ {}{\lambda}_{n-1,1}{\mathbf{a}}_{n-1,1}& {\lambda}_{n-1,2}{\mathbf{a}}_{n-1,2}& \dots & \mathbf{0}& {\lambda}_{n-1,n}{\mathbf{a}}_{n-1,n}\\ {}{\lambda}_{n,1}{\mathbf{a}}_{n,1}& {\lambda}_{n,2}{\mathbf{a}}_{n,2}& \dots & {\lambda}_{n,n-1}{\mathbf{a}}_{n,n-1}& \mathbf{0}\end{array}\right) \]has rank 3. Then there exists $ {\mu}_1,\dots, {\mu}_n,{\tau}_1,\dots, {\tau}_n\in \mathrm{\mathbb{R}} $ such that(40)\[ \left(\begin{array}{ccccc}0& {\lambda}_{12}& \dots & {\lambda}_{1,n-1}& {\lambda}_{1,n}\\ {}{\lambda}_{21}& 0& \dots & {\lambda}_{2,n-1}& {\lambda}_{2,n}\\ {}\vdots & \vdots & \ddots & \vdots & \vdots \\ {}{\lambda}_{n-1,1}& {\lambda}_{n-1,2}& \dots & 0& {\lambda}_{n-1,n}\\ {}{\lambda}_{n,1}& {\lambda}_{n,2}& \dots & {\lambda}_{n,n-1}& 0\end{array}\right)=\left(\begin{array}{ccccc}0& {\mu}_1{\tau}_1& \dots & {\mu}_1{\tau}_{n-1}& {\mu}_1{\tau}_n\\ {}{\mu}_2{\tau}_1& 0& \dots & {\mu}_2{\tau}_{n-1}& {\mu}_2{\tau}_{n-1}\\ {}\vdots & \vdots & \ddots & \vdots & \vdots \\ {}{\mu}_{n-1}{\tau}_1& {\mu}_{n-1}{\tau}_2& \dots & 0& {\mu}_{n-1}{\tau}_n\\ {}{\mu}_n{\tau}_1& {\mu}_n{\tau}_2& \dots & {\mu}_n{\tau}_{n-1}& 0\end{array}\right). \]If $ B $ additionally satisfies the quadratic constraints (10), then there exists $ \tau \in \mathrm{\mathbb{R}} $ such that for all $ i,j=1,\dots, n $ ($ i\ne j $) it holds $ {\lambda}_{ij}=\tau $.

Proof. See Appendix A.

Remark 5.2. Theorem 3.5 implies that given a pure common lines matrix, the rotation recovery problem in Section 5.3 is uniquely solvable up to a global rotation. Theorem 5.1 states that the ground-truth pure common lines matrix can only be determined up to a global scale; in particular, $ \tau $ in Theorem 5.1 may be positive or negative. As in Remark 3.6, this sign flip corresponds to chiral ambiguity in cryo-EM. Apart from its sign, the global scale has no effect on rotation recovery in Algorithm 3.

6. Application: Clustering heterogeneous common lines

Here we present an application of our approach for common lines to a challenging problem in cryo-EM. We propose a clustering algorithm for detecting homogeneous communities of consistent common lines from discretely heterogeneous data, using our algebraic constraints. One can then use the clusters of common lines for rotation recovery and 3D reconstruction.

Several successful methods have been proposed for clustering heterogeneous cryo-EM data that consist of images of a single macromolecule with conformational landscapes or differences in subunits.⁽²¹^–²⁷⁾ Recent work of the third author⁽⁷⁾ proposed a method of solving a different heterogeneity challenge in cryo-EM, where the heterogeneity in the data comes from multiple distinct macromolecules rather than variations on one primary structure. Our proposed application focuses on the latter problem. For this setting of heterogeneity, the main prior work to compare against is (7).

The basic idea of our approach is illustrated in Figure 1. There the matrix in gray is a matrix of common lines from simulated heterogeneous data, corresponding to two distinct molecular conformations. The two homogeneous common lines matrices are diagonal blocks in green and purple. The $ 2\times 1 $ entries outside of these diagonal blocks do not correspond to any consistent lines and are just random lines in $ {\mathrm{\mathbb{R}}}^2 $, encoded as representatives. Note that in general, a heterogeneous common lines matrix will not necessarily have consistent common lines matrices as diagonal blocks, but will be a $ 2\times n $ row and $ n\times 1 $ column permutation of such a matrix. Gaussian white noise has also been added to the gray matrix to decrease the signal-to-noise ratio of the common lines. A scree plot in Figure 1 shows a noticeable spectral gap between the third and fourth singular values of the homogeneous common lines matrices (green and purple curve), which is not detectable for the entire heterogeneous common lines matrix (gray curve), thus demonstrating low-rank structure of submatrices.

Algorithm 4. Heterogeneous clustering on common line constraints

Input: $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $, a common lines matrix

Output: $ {C}_1,\dots, {C}_r\subseteq \left\{1,\dots, n\right\} $, a partition of all common lines into consistent clusters

1: procedure Clusters $ (A) $

2: $ L\leftarrow \mathrm{\varnothing} $ $ \vartriangleright $ 1. Generate samples

3: for $ i\ge 1 $ until sufficient do

4: set $ S\subseteq \left\{1,\dots, n\right\} $ to be a random sample such that $ \mid S\mid =4 $

5: set $ {A}_S $ to be the submatrix of $ A $ associated to the common lines in $ S $

6: A_S←Irls-Admm (A_S)

7: A_S ← Sinkhorn (A_S)

8: if converged then

9: set $ M $ using (19) on $ {A}_S $

10: set $ {\mathbf{v}}_1,{\mathbf{v}}_2,{\mathbf{v}}_3 $ using (28) on $ {A}_S $

11: $ e\leftarrow \parallel M-{M}^{\top }{\parallel}_F^2+\parallel {\mathbf{v}}_1-{\mathbf{v}}_2{\parallel}_2^2+\parallel {\mathbf{v}}_2-{\mathbf{v}}_3{\parallel}_2^2 $

12: $ L\leftarrow \mathrm{append}\left(L,\left(S,e\right)\right) $

13: end if

14: end for

15: sort $ L $ by increasing values of $ e $ $ \vartriangleright $ 2. Cluster samples

16: $ G\leftarrow {\mathbf{0}}_{n\times n} $

17: while $ L\ne \varnothing $ do

18: $ \left(S,e\right)\leftarrow \mathrm{pop}(L) $

19: $ {G}_{S,S}\leftarrow \max \left\{{G}_{S,S},-\log (e)\cdot {\mathbf{1}}_{4\times 4}\right\} $

20: end while

21: $ {G}_{ii}\leftarrow 0 $ for all $ i=1,\dots, n $

22: C₁,…,C_r ← CommunityDetection(G) ⊳ Use method from (28)

23: end procedure

View Image - Figure 1. A heterogeneous common lines matrix and rank test for simulated rotations from two distinct molecules. Block-diagonals comparing rotations from the same molecule show rank-3 structure.

Figure 1. A heterogeneous common lines matrix and rank test for simulated rotations from two distinct molecules. Block-diagonals comparing rotations from the same molecule show rank-3 structure.

Our clustering algorithm consists of two main steps:

1. Generate samples: As input, we are given a single common lines matrix $ A $, from which we identify small clusters of consistent common lines. We randomly sample a set of four common lines $ S $ and extract the corresponding $ 8\times 4 $ submatrix $ {A}_S $ from $ A $ (the choice of four common lines is explained in Section 7.3 based on numerical experiments). We then run IRLS-ADMM and Sinkhorn on $ {A}_S $. These methods may occasionally diverge due to numerical instability from noise in the data, as discussed in Section 7, in which case we discard the sample. We also discard the sample if the spectral gap between the third and fourth singular value of $ {A}_S $ is not sufficiently large (i.e., $ {A}_S $ does not have numerical rank 3). Otherwise, we obtain a quadratic constraint satisfaction error for the sample. We record both the sample and its error, and repeat this sampling sufficiently many times.
2. Cluster samples: We view the collection of samples and errors we obtain as a weighted hypergraph on $ n $ vertices whose hyperedges are of size 4 corresponding to the samples and whose hyperedge weights are given by their corresponding errors. We convert the weighted hypergraph into a weighted graph by constructing a weighted adjacency matrix whose $ \left(i,j\right) $ entry is the negative logarithm of the smallest error on a hyperedge containing both common lines $ i $ and $ j $. We then use an unsupervised community detection algorithm on this adjacency matrix to find the clusters of consistent common lines. In our numerical experiments, we use the algorithm proposed by Lancichinetti, Fortunato, and Kertész,⁽²⁸⁾ which can identify overlapping communities and hierarchical structure, and depends only on a single hyperparamter controlling the scale of the hierarchies. In particular, we do not need to specify the number of clusters or their sizes.

The clustering algorithm is detailed in Algorithm 4 and illustrated in Figure 2.

View Image - Figure 2. Algorithm for separating images of distinct molecules using algebraic constraints on common lines. The common lines matrix is first computed from an input set of images or class averages. We then apply Algorithm 4, our clustering algorithm. After clustering, images corresponding to the same molecule can then be used for 3D reconstruction.

Figure 2. Algorithm for separating images of distinct molecules using algebraic constraints on common lines. The common lines matrix is first computed from an input set of images or class averages. We then apply Algorithm 4, our clustering algorithm. After clustering, images corresponding to the same molecule can then be used for 3D reconstruction.

7. Performance on data

We compare the performance of our methods to existing common lines based algorithms in the literature, namely functions in the software package ASPIRE ⁽²⁹⁾ and the clustering algorithm of Verbeke et al.⁽⁷⁾ We study the problems of recovering rotations, denoising common lines, and partitioning discretely heterogeneous image sets into homogeneous subcommunities using common lines. The tests are done on simulated data (at various levels of noise) and real data.

Our simulated data consists of image data of the 40S, 60S, and 80S ribosome (available from the Electron Microscopy Data Bank⁽³⁰⁾ as entries EMD-4214, EMD-2811, and EMD-2858, respectively), generated by ASPIRE. The ribosomes and examples of their clean 2D projection images are displayed in Figure 3. Each image is $ 128\times 128 $ with a pixel size of 3 Å. White Gaussian noise is added to each image corresponding to a specified signal-to-noise ratio (SNR). We define the SNR by taking the signal to be the average squared intensity over each pixel in the clean image and setting the noise variance to achieve the appropriate ratio. The common lines between two images are detected by finding the line projections of the two images with the highest correlation, as computed in ASPIRE.

View Image - Figure 3. 3-D structures and example projection images for the three structures used for simulation. (a) 80S ribosome (EMD-2858) and example projection images. (b) 60S ribosome (EMD-2811) and example projection images. (c) 40S ribosome (EMD-4214) and example projection images.

Figure 3. 3-D structures and example projection images for the three structures used for simulation. (a) 80S ribosome (EMD-2858) and example projection images. (b) 60S ribosome (EMD-2811) and example projection images. (c) 40S ribosome (EMD-4214) and example projection images.

In our numerical experiments, we have observed that the performance of IRLS-ADMM, and consequently Sinkhorn, can depend on its initialization. In particular, we have occasionally observed divergence or vanishing of the entries of $ A $ in Sinkhorn. This behavior appears to be due to numerical instabilities in Sinkhorn arising from noise in the data or from using too many common lines. In these cases, we can either discard such runs and restart the algorithms with new initializations, or skip using Sinkhorn. We address this issue in each of our tests.

7.1. Rotation recovery

Let $ {R}^{(1)},\dots, {R}^{(n)}\in \mathrm{SO}(3) $ be the ground-truth rotations and $ {R}_1,\dots, {R}_n\in \mathrm{SO}(3) $ be the recovered rotations. Then Theorem 3.5 states that there exists a unique rotation $ Q\in \mathrm{SO}(3) $ such that $ {R}^{(i)}={R}_iQ $ for all $ 1\le i\le n $. In other words, $ Q $ can be found by solving the orthogonal Procrustes problem(41)\[ \underset{Q\in SO(3)}{\min}\frac{1}{n}{\left\Vert \left(\begin{array}{c}{R}^{(1)}\\ {}\vdots \\ {}{R}^{(n)}\end{array}\right)-\left(\begin{array}{c}{R}_1\\ {}\vdots \\ {}{R}_n\end{array}\right)Q\right\Vert}_F^2. \]The solution to this problem can be found using SVD.⁽³¹⁾ We note that for $ n=1 $, there is a simple relation between the Procrustes error and the angular error between two rotation matrices. For $ R,S\in \mathrm{SO}(3), $ it holds $ \parallel R-S{\parallel}_F^2=\left\langle R-S,R-S\right\rangle =\parallel {R}^{\top }R{\parallel}_F^2-2\left\langle R,S\right\rangle +\parallel {S}^{\top }S{\parallel}_F^2 $ $ =\parallel {I}_{3\times 3}{\parallel}_F^2-2\left\langle R,S\right\rangle +\parallel {I}_{3\times 3}{\parallel}_F^2=2\cdot 3-2\cdot \mathrm{tr}\left({R}^{\top }S\right) $. Hence the angular error between $ R $ and $ S $ is\[ \theta =\operatorname{arccos}\left(\frac{\mathrm{tr}\left({R}^{\top }S\right)-1}{2}\right)=\operatorname{arccos}\left(\frac{3-\frac{1}{2}\parallel R-S{\parallel}_F^2-1}{2}\right). \]

We compare our method to the procedure in ASPIRE on simulated data. Given common lines, the rotation recovery algorithm used in ASPIRE is the synchronization with voting procedure as described in (5). We use 30 images of the 60S, 80S, and 40S ribosomes at $ \mathrm{SNR}=\mathrm{0.125,0.25,0.5,1} $. For each macromolecule and SNR, we generate 50 sets of 30 random ground-truth rotations and their corresponding noisy images. We report the average Procrustes error (41) per image (i.e., the Procrustes error divided by the number of images).

When running this test, we chose to only use our IRLS-ADMM algorithm followed by Rotations, and omitted the Sinkhorn row and column scaling step due to observed numerical instabilities of Sinkhorn with noise in the data or too many common lines. Also, Remark 5.2 states that our algorithm is only guaranteed to recover the ground-truth pure common lines matrix up to a global scale, and the sign of this scale produces two sets of rotations that differ by left-multiplication with $ J=\operatorname{diag}\left(-1,-1,1\right) $. This sign flip corresponds to the chiral ambiguity in cryo-EM, as explained in Remark 3.6. Thus in our tests, we report the best rotational error amongst the two possible sets of rotations.

The results for rotation recovery are displayed in Table 1. Notably, at lower SNR our method is consistently more accurate than ASPIRE.

Table 1.

The average rotation recovery error from 30 simulated images of macromolecules at various SNR, 50 runs each

		Average Procrustes rotation error (42)
Macromolecule	SNR	Algorithm 1 and Algorithm 3	ASPIRE
EMD–2811 (60S ribosome)	0.125	3.6840	4.3426
	0.25	3.3215	3.6042
	0.5	1.5512	2.2704
	1	0.4215	0.4877
EMD–2858 (80S ribosome)	0.125	1.7903	2.5943
	0.25	0.8038	1.3175
	0.5	0.1576	0.2543
	1	0.0259	0.0240
EMD–4214 (40S ribosome)	0.125	4.0377	4.5281
	0.25	3.9079	4.0104
	0.5	3.6375	3.3994
	1	2.8271	2.1334

Bold values indicate the algorithm with lower error.

7.2. Denoising common lines

The problem we consider here is the following: given a noisy common lines matrix, how well can we recover the ground-truth clean pure common lines matrix? If $ A,B\in {\mathrm{\mathbb{R}}}^{2n\times n} $ are the ground truth and recovered pure common lines matrix respectively, then we measure this error to be(42)\[ \underset{\lambda \in \mathrm{\mathbb{R}}}{\min}\frac{1}{n}{\left\Vert A-\lambda B\right\Vert}_F^2 \]since there is a global scale ambiguity in the recovered pure common lines matrix as discussed in Remark 5.2. The above problem is a least-squares problem in $ \lambda $ and hence has a closed-form solution.

We compare our methods to ASPIRE on simulated data as follows. We run the rotation recovery algorithm of ASPIRE based on common lines to obtain rotations $ {R}^{(1)},\dots, {R}^{(n)}\in SO(3) $. Then we construct a recovered pure common lines matrix $ B $ from these rotations by using the factorization (6). We then compare the denoising error (42) from the ground-truth clean common lines matrix $ A $ to the output of our IRLS-ADMM method and to the matrix $ B $. As before, the simulated data consists of 30 images of the 60S, 80S, and 40S ribosomes at $ \mathrm{SNR}=\mathrm{0.125,0.25,0.5,1} $, and we report the average denoising error (42) per image over 50 runs. We assume we have the ground-truth chiral information when recovering rotations with ASPIRE.

The results for common line denoising are displayed in Table 2. Again our IRLS-ADMM method outperforms ASPIRE for denoising common lines matrices, particularly at low SNR.

Table 2.

The average denoising error of recovered pure common lines matrices from 30 simulated images of macromolecules at various SNR, 50 runs each

		Average denoising error (42)
Macromolecule	SNR	Algorithm 1	ASPIRE
EMD–2811 (60S ribosome)	0.125	14.1982	18.2274
	0.25	12.7786	17.8805
	0.5	8.2349	16.8199
	1	4.1908	6.1365
EMD–2858 (80S ribosome)	0.125	11.2948	17.3292
	0.25	6.8850	13.3778
	0.5	2.3152	3.6690
	1	0.5680	0.3512
EMD–4214 (40S ribosome)	0.125	15.5635	18.0612
	0.25	15.1112	18.0862
	0.5	14.0420	18.0069
	1	11.6964	15.7237

Bold values indicate the algorithm with lower error.

7.3. Clustering heterogeneous image sets

We test the performance of our algorithm Clusters for clustering (see Section 6) on simulated and real data.

The success of clustering is measured by the adjusted Rand index⁽³²⁾ (ARI) between the ground-truth clusters and the recovered clusters. The range of this index is $ -\infty <\mathrm{ARI}\le 1 $, with $ \mathrm{ARI}=1 $ if the two partitions are identical. The ARI is a corrected-for-chance version of the Rand index, meaning that it is the expected Rand index for the cluster and is equal to 0 if every element is placed in a random cluster.

While any number of common lines $ \mid S\mid $ can be sampled on line 4 of Clusters, we found that sampling four common lines at a time was effective for a number of reasons: $ \mid S\mid =4 $ enforces a non-trivial rank 3 constraint, and the small number of common lines allowed us to both generate many samples rapidly and improve the numerical stability of the Sinkhorn scaling procedure. In addition, the CommunityDetection algorithm we use is the one described in (28).

7.3.1. Simulated data

We generate a dataset containing three clusters, with $ n=5+30+15=50 $ images from the 40S, 60S, and 80S ribosomes, respectively, from which we construct a common lines matrix.

Clusters achieved perfect clustering ($ \mathrm{ARI}=1 $) at $ \mathrm{SNR}=10 $, $ \mathrm{ARI}=0.8581 $ at $ \mathrm{SNR}=5 $, and $ \mathrm{ARI}=0.3286 $ at $ \mathrm{SNR}=1 $. The clusters found at $ \mathrm{SNR}=5 $ are displayed in Figure 4, which shows that only one pair of images were placed in incorrect clusters.

$View Image - Figure 4. Clustering results for $ n=50 $ simulated images with SNR = 5. Images are size $ 128\times 128 $ with pixel size of 3 Å, and are colored according to the ground truth labels. Using Clusters achieves ARI = 0.8581 and only one pair of images are incorrectly clustered.$

Figure 4. Clustering results for $ n=50 $ simulated images with SNR = 5. Images are size $ 128\times 128 $ with pixel size of 3 Å, and are colored according to the ground truth labels. Using Clusters achieves ARI = 0.8581 and only one pair of images are incorrectly clustered.

7.3.2. Real data

Our real data consists of a subset of 2D class averages computed from the experimental data described in Verbeke et al.⁽⁷⁾ The subset we consider consists of two clusters with $ n=47+28=75 $images corresponding to the 60S and 80S ribosomes respectively. Each class average is $ 96\times 96 $ with a pixel size of 4.4 Å. We use the labels from (7) as ground truth for clustering.

Figure 5 shows the clusters found by our algorithm Clusters, achieving ARI = 0.8440 and misclassifying only three images.

$View Image - Figure 5. Clustering results for $ n=75 $ 2D class averages from EMPIAR-10268 computed as described in (7). Images are size $ 96\times 96 $ with a pixel size of 4.4 Å, and are colored according to the ground truth labels. Using Clusters achieves ARI = 0.8440 and only three images are incorrectly clustered.$

Figure 5. Clustering results for $ n=75 $ 2D class averages from EMPIAR-10268 computed as described in (7). Images are size $ 96\times 96 $ with a pixel size of 4.4 Å, and are colored according to the ground truth labels. Using Clusters achieves ARI = 0.8440 and only three images are incorrectly clustered.

The clustering algorithm used in (7) is based on performing community detection on a nearest-neighbors graph constructed using Euclidean distances between the best-matching line projections between every pair of images. We stress that our clustering algorithm uses a completely distinct aspect of common lines data: the positions of the common lines. As a proof of concept for our constraints, we do not make use of the correlations between the common lines at all, unlike (7). The test for Clusters only uses a subset of the dataset in (7), which has $ n=100 $ images and includes images with unknown labels. If we compare only the 60S and 80S images that were clustered, then Clusters achieves a similar performance, where one additional image is misclassified by Clusters compared to (7).

8. Conclusion

This article revisited the fundamental topic of common lines in cryo-EM image processing. We discussed a novel approach for dealing with common lines, based on a certain $ 2n\times n $ matrix encoding the common lines between $ n $ projection images. We proved that if the $ 2\times 1 $ blocks of the matrix are properly scaled, then the matrix satisfies nice algebraic constraints: a low-rank condition and several sparse quadratic constraints. The new formulation operates directly on common lines data, and is fully global in that it does not require angular reconstitution or voting procedures at all. It opens the door to different and potentially more robust approaches to computational tasks involving common lines. Using the algebraic constraints, we adapted optimization algorithms from other domains to give new methods to denoise common lines data, and recover the 3D rotations underlying noisy images. Numerical experiments show that these methods have increased accuracy at low SNR, compared to existing methods based on common lines. We also explored a setting where traditional common lines methods fail to apply – cryo-EM datasets with discrete heterogeneity – by proposing a sampling-based process to cluster the images in homogeneous subcommunities based on our algebraic constraints. Experiments with simulated and real data show the method performs well when applied to images with high noise.

Although there is clear promise, several future directions could be pursued for further improvements. Firstly, in Section 5 the optimization algorithm building on (11) is quite complex. Matrix scaling problems as in (11) and our work are an interesting variation on the problem of matrix completion; would other optimization approaches perform better? Secondly, extensions to molecules with nontrivial point group symmetries would be useful (and currently are a focus in other common lines research). Perhaps our formulation can suggest another way to incorporate symmetries into common lines methods. Lastly, in the application to discrete heterogeneity, we neglected correlation scores between the common lines, on which (7) relied. It is likely better to use both the scores and the algebraic constraint errors.

AuthorAffiliation

Corresponding author: Tommi Muller; Email: [email protected]

Word count: 9101

Show less

© The Author(s), 2024. Published by Cambridge University Press. This work is licensed under the Creative Commons Attribution License This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited. (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Translate

We revisit the topic of common lines between projection images in single-particle cryo-electron microscopy (cryo-EM). We derive a novel low-rank constraint on a certain 2n × n matrix storing properly scaled basis vectors for the common lines between n projection images of one molecular conformation. Using this algebraic constraint and others, we give optimization algorithms to denoise common lines and recover the unknown 3D rotations associated with the images. As an application, we develop a clustering algorithm to partition a set of noisy images into homogeneous communities using common lines, in the case of discrete heterogeneity in cryo-EM. We demonstrate the methods on synthetic and experimental datasets.

Details

Title

Algebraic constraints and algorithms for common lines in cryo-EM

Author

Muller, Tommi¹

; Duncan, Adriana L²; Verbeke, Eric J³; Kileel, Joe⁴

¹ Mathematical Institute, University of Oxford, Oxford, UK
² Department of Mathematics, University of Texas at Austin, Austin, TX, USA
³ Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
⁴ Department of Mathematics, University of Texas at Austin, Austin, TX, USA; Oden Institute, University of Texas at Austin, Austin, TX, USA

Publication year

2024

Publication date

2024

Publisher

Cambridge University Press

e-ISSN

2633903X

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1017/S2633903X24000072

ProQuest document ID

3103117537

Algebraic constraints and algorithms for common lines in cryo-EM

Jump to:

Full text

Impact Statement

1. Introduction

1.1. Advantages

2. Background

3. Constraints on sets of common lines

3.1. The common lines matrix

3.2. The common lines variety

4. Optimization problem

5. Optimization algorithms

5.1. IRLS and ADMM for the rank constraint

5.2. Sinkhorn scaling for the quadratic constraints

5.3. Rotation recovery

5.4. Justification of algorithms

6. Application: Clustering heterogeneous common lines

7. Performance on data

7.1. Rotation recovery

7.2. Denoising common lines

7.3. Clustering heterogeneous image sets

7.3.1. Simulated data

7.3.2. Real data

8. Conclusion

Abstract

Details

Suggested sources