Introduction
Protein degradation is an essential biological process that occurs continuously throughout the life of a cell. Degradative protein turnover helps maintain protein homeostasis by regulating protein abundance and eliminating misfolded and damaged proteins from cells (Varshavsky, 2011; Collins and Goldberg, 2017; Hanna and Finley, 2007). In eukaryotes, most protein degradation occurs through the concerted actions of the ubiquitin system and the proteasome, together known as the ubiquitin-proteasome system (UPS) (Coux et al., 1996; Collins and Goldberg, 2017; Hershko and Ciechanover, 1998; Bachmair et al., 1986; Ciechanover et al., 2000). Ubiquitin system enzymes bind degradation-promoting signal sequences, termed degrons (Varshavsky, 1991), in cellular proteins and mark them for degradation by covalently attaching chains of the small protein ubiquitin (Bett, 2016; Hershko and Ciechanover, 1998; Finley et al., 2012). The proteasome binds poly-ubiquitinated proteins, then processively deubiquitinates, unfolds, and degrades them to small peptides (Kisselev et al., 1999). The UPS degrades a wide array of proteins spanning diverse biological functions and subcellular localizations (Schwanhäusser et al., 2011; Kong et al., 2021; Christiano et al., 2020). By controlling the turnover of a large fraction of the cellular proteome, the UPS regulates numerous aspects of cellular physiology and function, including gene expression, protein homeostasis, cell growth and division, stress responses, and energy metabolism (Varshavsky, 2011; Hershko and Ciechanover, 1998; Hanna and Finley, 2007; Pohl and Dikic, 2019).
Because of the central role of UPS protein degradation in regulating protein abundance, variation in UPS activity can influence a variety of cellular and organismal phenotypes (Varshavsky, 2011; Schwartz and Ciechanover, 1999; Hanna and Finley, 2007; Schmidt and Finley, 2014). Physiological variation in UPS activity enables cells to respond to changes in their internal and external environments. For example, UPS activity increases when misfolded or oxidatively damaged proteins accumulate, preventing these molecules from damaging the cell (Sontag et al., 2014; Grimm et al., 2012; Finley and Prado, 2020). Conversely, UPS activity decreases during nutrient deprivation, when the energetic demands of UPS protein degradation would be costly to the cell (Waite et al., 2016; Laporte et al., 2008; Bajorek et al., 2003). Variation in UPS activity may also create discrepancies between protein degradation and the proteolytic needs of the cell, leading to adverse phenotypic outcomes. For example, age-related declines in UPS activity exacerbate the accumulation of damaged and misfolded proteins that occurs during aging, compromising protein homeostasis and, in turn, cellular viability (Stolzing and Grune, 2001; Baraibar and Friguet, 2012; Shringarpure and Davies, 2002). Understanding the sources of variation in UPS activity thus has considerable implications for our understanding of the many traits influenced by protein degradation.
A handful of examples have shown that variation in UPS activity can be caused by individual genetic differences. Rare mutations that ablate or diminish the function of ubiquitin system or proteasome genes impair UPS protein degradation and cause a variety of incurable syndromes. For example, nonsense and frameshift mutations in
Our understanding of how natural genetic variation affects the UPS comes largely from these limited examples, leaving critical knowledge gaps in several key areas. First, a focus on rare, large-effect mutations linked to Mendelian syndromes likely provides a narrow, incomplete view of the genetics of UPS activity. Most traits are genetically complex, shaped by many loci of small effect and few loci of large effect throughout the genome (Mackay et al., 2009; Ehrenreich et al., 2009), suggesting that variants that completely or largely ablate UPS gene functions represent only one extreme of a continuum of genetic effects on UPS activity. Second, we have virtually no knowledge of how natural variation in non-UPS genes affects UPS activity. Third, variation in UPS activity can differentially affect the degradation of distinct UPS substrates (Christiano et al., 2020; Kong et al., 2021). Whether genetic effects on UPS activity affect the turnover of distinct proteins consistently or in a substrate-specific manner remains a fundamentally open question. Finally, we do not know how genetic effects on UPS activity influence other traits. For example, many genetic effects on gene expression influence protein levels without altering mRNA abundance for the same gene (Battle et al., 2015; Ghazalpour et al., 2011; Mirauta et al., 2020; Albert et al., 2014; Brion et al., 2020; Chick et al., 2016; Cenik et al., 2015; Abell et al., 2022; Foss et al., 2011). These protein-specific effects could arise through differences in UPS activity, but there have been no efforts to understand how natural variation that alters UPS activity influences global gene expression at the protein and RNA levels.
Technical challenges have precluded a comprehensive view of the genetics of UPS activity. Mapping genetic influences on a trait with high statistical power requires assaying large, genetically diverse populations of thousands of individuals (Bloom et al., 2013). At this scale, in vitro biochemical assays of UPS activity are impractical. Several synthetic reporter systems can measure UPS activity in vivo with high throughput (Geffen et al., 2016; Yu et al., 2016; Yen et al., 2008). However, these systems use genetically encoded fluorescent proteins coupled to degrons to measure UPS activity. When deployed in genetically diverse populations, their output is likely confounded by genetic effects on reporter expression levels.
Here, we leveraged advances in synthetic reporter design to obtain high-throughput, reporter expression level-independent measurements of UPS activity in millions of live, single cells. We used these measurements to map genetic influences on the N-end rule, a UPS pathway that recognizes degrons in protein N-termini (N-degrons) (Varshavsky, 1991) of thousands of endogenous cellular proteins (Kats et al., 2018; Bartel et al., 1990; Hwang et al., 2010; Varshavsky, 2011). Different N-degrons are processed by one of two distinct targeting systems (Figure 1A), which allowed us to test for potential pathway-specific effects of natural genetic variation on UPS activity. Systematic, statistically powerful genetic mapping revealed the complex, polygenic genetic architecture of UPS activity. Across the set of 20 N-degrons, we identified 149 loci influencing UPS activity, many of which had pathway- or substrate-specific effects. Resolving causal nucleotides at four loci identified regulatory and missense variants in ubiquitin system genes whose products process, recognize, and ubiquitinate cellular proteins. By measuring the effect of a causal variant in the
Figure 1.
UPS N-end rule activity reporters and genetic mapping method.
(A) Schematic of the production and degradation of UPS activity reporters according to the UPS N-end rule. (B) Density plots of the log2 RFP / GFP ratio from 10,000 cells for each of 8 independent biological replicates per strain per reporter for representative Arg/N-end and Ac/N-end pathway reporters. "BY" and "RM" are genetically divergent yeast strains. "BY
Figure 1—figure supplement 1.
Comparison of UPS activity between strains across N-degron reporters.
The -log2 RFP / GFP ratio value was extracted from 10,000 cells from each of 8 independent biological replicates per strain per reporter and converted to Z-scores. High values correspond to high UPS activity and low values correspond to low UPS activity. Tukey HSD
Figure 1—figure supplement 2.
Overview of the constructs and strain construction steps used to generate yeast strains harboring TFT UPS activity reporters.
Results
Single-cell measurements identify heritable variation in UPS activity
To understand how genetic variation influences UPS activity, we focused on the N-end rule, in which a protein’s N-terminal amino acid functions as an N-degron that results in a protein’s ubiquitination and proteasomal degradation (Figure 1A). The UPS N-end rule can be subdivided into the Arg/N-end and Ac/N-end pathways based on the molecular properties and recognition mechanisms of each pathway’s constituent N-degrons (Figure 1A; Varshavsky, 2011). We reasoned that the breadth of degradation signals and recognition mechanisms encompassed in the N-end rule would allow us to identify diverse genetic influences on UPS activity and that the well-characterized effectors of the N-end rule would aid in defining the molecular mechanisms of variant effects. We used a previously described approach (Varshavsky, 2005) to generate constructs containing each of the 20 possible N-degrons and appended these sequences to tandem fluorescent timers (TFTs; Figure 1A; Khmelinskii et al., 2012). TFTs are fusions of a rapidly maturing green fluorescent protein (GFP) and a slower maturing red fluorescent protein (RFP) (Khmelinskii et al., 2012; Khmelinskii and Knop, 2014). The TFT’s output, expressed as the -log2 RFP / GFP ratio, is directly proportional to its degradation rate and, when fused to N-degrons, measures UPS N-end rule activity (Kats et al., 2018; Kong et al., 2021; Khmelinskii et al., 2012). Because the TFT is expressed as a single protein construct, the output of the TFT is also independent of its expression level (Kats et al., 2018; Khmelinskii et al., 2014; Khmelinskii et al., 2012; Kong et al., 2021), enabling its use in genetically diverse populations.
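The TFT readout described above can be illustrated with a minimal Python sketch. It computes the -log2 RFP / GFP ratio per cell and converts a set of values to Z-scores. The function names and the fluorescence intensities are hypothetical and chosen only for illustration; this is not the analysis code used in the study.

```python
import math
import statistics


def tft_output(rfp, gfp):
    """Return the TFT output, -log2(RFP / GFP), for a single cell."""
    if rfp <= 0 or gfp <= 0:
        raise ValueError("fluorescence intensities must be positive")
    return -math.log2(rfp / gfp)


def z_scores(values):
    """Convert values to Z-scores (mean 0, sample standard deviation 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical (RFP, GFP) intensities in arbitrary units.
# Cells 1 and 2 differ twofold in overall reporter expression but share
# the same RFP / GFP ratio, as do cells 3 and 4.
cells = [(200.0, 800.0), (400.0, 1600.0), (900.0, 1000.0), (450.0, 500.0)]

outputs = [tft_output(rfp, gfp) for rfp, gfp in cells]
standardized = z_scores(outputs)

for (rfp, gfp), out, z in zip(cells, outputs, standardized):
    print(f"RFP={rfp:7.1f} GFP={gfp:7.1f} -log2(RFP/GFP)={out:.3f} Z={z:+.3f}")
```

Because the readout is a ratio, multiplying both channels by a common factor (a stand-in for higher reporter expression) leaves the value unchanged, which is the property that makes the reporter usable in genetically diverse populations.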
We characterized the performance of our TFTs by measuring their output in yeast strains with gene deletions that alter UPS activity towards N-end rule substrates. As expected, deleting the E3 ubiquitin ligases of the Arg/N-end (
To understand how natural genetic variation influences UPS activity, we compared two genetically divergent
Genetic mapping reveals a complex, polygenic genetic architecture for UPS activity
We mapped quantitative trait loci (QTLs) for UPS activity using bulk segregant analysis (Figure 1E; Michelmore et al., 1991; Ehrenreich et al., 2010; Albert et al., 2014). In our implementation, this approach attains high statistical power by comparing whole-genome sequence data from pools of thousands of single cells with extreme UPS activity selected from a large population of haploid, recombinant progeny obtained by crossing BY and RM (Figure 1E–G; Ehrenreich et al., 2010; Albert et al., 2014). Using this method, we reproducibly identified 149 UPS activity QTLs across the set of 20 N-degrons at a false discovery rate of 0.5% (Figure 2A/B, Figure 2—source data 1, Supplementary file 1, Appendix 1). The number of QTLs per reporter ranged from 1 (for the Ile N-degron) to 15 (for the Ala N-degron) with a median of 7 (Figure 2B, Figure 2—source data 1). Using the absolute difference in allele frequency between the high and low UPS activity pools as a measure of effect size, we found that most QTLs had small effects, with only 5 loci (3%) causing an allele frequency difference greater than 0.5 (Figure 2C, Figure 2—source data 1). Thus, UPS activity is a complex, polygenic trait, shaped by variation throughout the genome.
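The effect size measure used here, the difference in RM allele frequency between the high and low UPS activity pools, can be sketched in a few lines of Python. The read counts and variant names below are hypothetical, and the sketch deliberately omits the smoothing and LOD score calculation that a full bulk segregant analysis requires.

```python
def rm_allele_frequency(rm_reads, by_reads):
    """Fraction of reads at a variant that carry the RM allele."""
    total = rm_reads + by_reads
    if total == 0:
        raise ValueError("no reads cover this variant")
    return rm_reads / total


def allele_frequency_difference(high_pool, low_pool):
    """RM allele frequency in the high pool minus that in the low pool.

    Each pool is a (rm_reads, by_reads) tuple for the same variant.
    A positive value means the RM allele is enriched among cells with
    high UPS activity.
    """
    return rm_allele_frequency(*high_pool) - rm_allele_frequency(*low_pool)


# Hypothetical read counts: (high pool, low pool), each (RM reads, BY reads).
variants = {
    "linked_variant": ((80, 20), (30, 70)),
    "unlinked_variant": ((51, 49), (49, 51)),
}

effects = {
    name: allele_frequency_difference(high, low)
    for name, (high, low) in variants.items()
}

for name, delta in effects.items():
    print(f"{name}: RM allele frequency difference = {delta:+.2f}")
```

At a variant unlinked to any QTL, both pools are expected to carry the two parental alleles at similar frequencies, so the difference stays near zero; a variant linked to a QTL shows a difference whose sign gives the direction of the effect.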
Figure 2.
UPS activity QTL mapping results.
(A) Results from the alanine (Ala) N-degron reporter illustrate the results and reproducibility of the method. Asterisks denote QTLs, colored by biological replicate. (B) QTL mapping results for the 20 N-degrons. Colored blocks of 100 kb denote QTLs detected in each of two independent biological replicates, colored according to the direction and magnitude of the effect size (RM allele frequency difference between high and low UPS activity pools). Experimentally validated (boxed) and candidate (unboxed) causal genes for select QTLs are annotated above the plot. (C) Cumulative distributions of the effect size and direction for Arg/N-end and Ac/N-end QTLs. (D) Cumulative distribution of LOD scores for Arg/N-end and Ac/N-end QTLs.
Analysis of the set of UPS QTLs revealed several patterns. First, the RM allele was associated with higher UPS activity in a significant majority of UPS QTLs (89 out of 149, 60%, binomial test
Third, multiple QTLs for distinct N-degrons occurred in close proximity and had the same direction of effect (Figure 2B), suggesting these QTLs may result from the same causal genes or variants. To better understand potential pleiotropy among the set of UPS activity QTLs, we computed overlap among the set of 149 UPS activity QTLs. We considered QTLs for distinct N-degrons overlapping when their peak positions occurred within 100 kb of each other and they had the same direction of effect (the sign of the RM allele frequency difference between the high and low UPS activity pools). Applying these criteria revealed that the 149 UPS activity QTLs were located at 35 distinct QTL regions (Figure 2—source data 2). Of these 35 regions, 23 (66%) affected only reporters from either the Arg/N-end (12) or Ac/N-end (11) pathways of the N-end rule (Figure 2—source data 2). Five of the 23 pathway-specific QTL regions affected only individual N-degrons (Figure 2—source data 2). Use of more lenient LOD score thresholds for QTL detection did not alter these general conclusions (Figure 2—source data 2, Supplementary file 2). Thus, the majority of QTLs for the N-end rule are pathway-specific, revealing considerable complexity in the genetics of UPS protein degradation.
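The overlap criteria above (peaks within 100 kb and the same direction of effect) can be expressed as a short Python sketch. The QTL records, chromosome labels, and reporter names are hypothetical. Chaining neighboring peaks into one region is an assumption made for illustration, since the text does not specify how runs of nearby peaks were merged.

```python
from collections import defaultdict

WINDOW_BP = 100_000  # peaks within 100 kb are treated as overlapping


def group_qtls(qtls, window_bp=WINDOW_BP):
    """Group QTLs into regions by chromosome, direction, and peak proximity.

    Each QTL is a dict with keys 'reporter', 'chrom', 'peak_bp', and
    'delta_af' (RM allele frequency difference, high minus low pool).
    Assumption: a QTL joins the current region when its peak lies within
    window_bp of the previous peak in that region (single-linkage chaining).
    """
    by_key = defaultdict(list)
    for qtl in qtls:
        direction = 1 if qtl["delta_af"] > 0 else -1
        by_key[(qtl["chrom"], direction)].append(qtl)

    regions = []
    for key in sorted(by_key):
        members = sorted(by_key[key], key=lambda q: q["peak_bp"])
        current = [members[0]]
        for qtl in members[1:]:
            if qtl["peak_bp"] - current[-1]["peak_bp"] <= window_bp:
                current.append(qtl)
            else:
                regions.append(current)
                current = [qtl]
        regions.append(current)
    return regions


# Hypothetical QTL peaks for four reporters.
example_qtls = [
    {"reporter": "reporter_1", "chrom": "chrA", "peak_bp": 860_000, "delta_af": 0.30},
    {"reporter": "reporter_2", "chrom": "chrA", "peak_bp": 910_000, "delta_af": 0.25},
    {"reporter": "reporter_3", "chrom": "chrA", "peak_bp": 880_000, "delta_af": -0.10},
    {"reporter": "reporter_4", "chrom": "chrB", "peak_bp": 550_000, "delta_af": 0.40},
]

regions = group_qtls(example_qtls)
for region in regions:
    names = ", ".join(q["reporter"] for q in region)
    print(f"{region[0]['chrom']}: {names}")
```

In this toy input, reporter_3 peaks between reporter_1 and reporter_2 but has the opposite direction of effect, so it forms its own region rather than joining theirs.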
Multiple causal DNA variants in
We leveraged the high degree of pathway specificity in our N-end rule QTLs to aid in the identification of causal genes in broad genomic QTL regions. A QTL on chromosome VII detected with 8 of 12 Arg/N-degron reporters (Figure 2B) was centered on
Figure 3.
Substrate-specific effects of
(A) Schematic illustrating Ubr1’s role in Arg/N-degron recognition. (B) Multiple causal DNA variants in
Figure 3—figure supplement 1.
Raw
Raw
Figure 3—figure supplement 2.
Raw
Fine-mapping the causal nucleotide in the
Figure 3—figure supplement 3.
Population frequency and distribution of the causal
Population frequencies and distribution of causal variants. Tree diagrams show genetic distance among a global panel of
Consistent with our QTL mapping results, the RM
QTL causal genes may contain multiple causal variants, making it necessary to test the effects of individual gene regions and variants in isolation (Lutz et al., 2019; Abell et al., 2022; Laurie-Ahlberg and Stam, 1987). We used CRISPR-swap to test the effect of partial RM
The partial RM
To identify individual causal variants, we tested the effect of the two BY / RM
To gain further insight into the causal –469A>T variant, we examined its molecular properties, evolutionary history and population frequency using genome sequence data from a panel of 1,011
Causal variants in functionally diverse ubiquitin system genes influence UPS activity
Some of the QTLs with the largest effects were specific to distinct N-end rule pathways or substrates and centered on known ubiquitin system genes (Figure 2B). We used allelic engineering to test whether these genes contained causal DNA variants for UPS activity.
A QTL on chromosome X was specific to the Type 1 asparagine (Asn) N-degron of the Arg/N-end pathway (Figure 2B). The QTL’s peak occurred within
Figure 4.
Identification of causal DNA variants for UPS activity in functionally diverse ubiquitin system genes.
(A, E, and I). Schematics showing the role of Nta1 (A), Doa10 (E), and Ubc6 (I) in UPS substrate processing, recognition, and ubiquitination, respectively. (B, F, and J). Location of regulatory and missense BY / RM variants, as well as active sites and functional domains in the proteins encoded by
Figure 4—figure supplement 1.
Raw
The BY strain was engineered to contain full or partial RM
Figure 4—figure supplement 2.
Raw
The BY strain was engineered to contain full or partial RM
Figure 4—figure supplement 3.
Raw
The BY strain was engineered to contain full or partial RM
Figure 4—figure supplement 4.
Population frequencies and distributions of causal variants.
Tree diagrams show genetic distance among a global panel of
A QTL on chromosome IX detected for 6 of 8 Ac/N-end degrons contained
A QTL on chromosome V detected for 7 of 8 Ac/N-degrons contained
Knowledge of the causal nucleotides in
The BY alleles of the causal
We examined additional QTLs to nominate candidate causal genes. The most frequently observed UPS QTL was detected for 8 of 8 Ac/N-end and 6 of 12 Arg/N-end TFTs and was located on chromosome XII in the immediate vicinity of a Ty1 insertion in the
Taken together, our analysis of causal genes and nucleotides illustrates the breadth and diversity of genetic influences on UPS activity. Each fine-mapped causal gene harbored multiple causal variants that may differentially affect distinct UPS substrates. Regulatory and missense variants in ubiquitin system genes that shape the full sequence of molecular events in protein ubiquitination, including substrate processing, recognition, and ubiquitination, alter UPS activity.
Protein-specific effects of
Previous efforts to understand how genetic variation influences gene expression have revealed considerable discrepancies between genetic effects on mRNA versus protein abundance. Many gene expression QTLs alter protein abundance without detectable effects on mRNA levels (Battle et al., 2015; Ghazalpour et al., 2011; Mirauta et al., 2020; Albert et al., 2014; Brion et al., 2020; Chick et al., 2016; Cenik et al., 2015; Abell et al., 2022; Foss et al., 2011). We reasoned that protein-specific gene expression QTLs could arise through effects on UPS protein degradation. To test this idea and explore how variant effects on UPS activity influence other aspects of cellular physiology, we measured global gene expression at the protein and RNA levels in the same cultures of the BY strain and a BY strain engineered to contain the causal –469A>T RM allele in the
Figure 5.
Proteomic and RNA-seq analysis of the effect of the
(A) Protein fold-change versus statistical significance for BY versus BY
Figure 5—figure supplement 1.
Over-represented GO biological processes and Reactome pathways in the set of differentially expressed transcripts.
Barchart of significantly over-represented Gene Ontology Biological Process and Reactome Pathway terms identified using the list of differentially expressed mRNA transcripts between the wild-type BY strain and BY
Out of 3,046 proteins quantified by mass spectrometry, 39 proteins were differentially abundant at a 10% FDR (Figure 5A, Figure 5—source data 1). Consistent with the reduced UPS activity conferred by the BY
To determine whether differences in protein abundance were reflected at the mRNA level, we used RNA-seq to quantify the levels of 5,675 transcripts. A total of 78 transcripts were differentially expressed between BY and BY
Discussion
Protein degradation by the UPS is an essential biological process that influences virtually all aspects of eukaryotic cellular physiology (Hanna and Finley, 2007; Varshavsky, 2011; Schwartz and Ciechanover, 1999; Finley and Prado, 2020). Understanding the sources of variation in UPS activity thus has considerable implications for our understanding of numerous cellular and organismal traits, including human health and disease (Schmidt and Finley, 2014; Petrucelli and Dawson, 2004; Gomes, 2013; Schwartz and Ciechanover, 1999). Our statistically powerful, systematic genetic mapping of the N-end rule has revealed that individual genetic differences create heritable variation in UPS protein degradation. Genetic effects on UPS activity are numerous and comprise a continuous distribution of many loci with small effects and few loci of large effect (Figure 2), similar to other complex traits (Mackay et al., 2009; Ehrenreich et al., 2009). Previous efforts to understand how individual genetic differences cause variation in UPS activity have focused on individual disease-causing mutations in UPS genes (Gomes, 2013; Agarwal et al., 2010; Zenker et al., 2006; Deng et al., 2011; Kröll-Hermi et al., 2020). Our results show that these large-effect mutations in UPS genes sit at one extreme of a continuous distribution of variant effects that is dominated by many loci of small effect. Aberrant UPS activity is a hallmark of many common diseases with a poorly understood, complex genetic basis (Schmidt and Finley, 2014; Petrucelli and Dawson, 2004; Zheng et al., 2014). Our results raise the possibility that many common, small-effect alleles contribute to the risk of these diseases through their effects on UPS activity.
Using genome engineering, we experimentally identified causal regulatory and missense variants in four functionally distinct ubiquitin system genes. A major function of the ubiquitin system is conferring specificity to UPS protein degradation (Komander and Rape, 2012; Johnson et al., 1992; Bett, 2016). Non-ubiquitinated proteins are blocked by the proteasome’s 19S regulatory particle from degradation by the 20S catalytic core (Inobe and Matouschek, 2014). The selective binding of ubiquitinated substrates by the 19S regulatory particle ensures that only proteins targeted for degradation enter the proteasome. The activity of the ubiquitin system towards distinct substrates is highly variable, even for proteins degraded by the same UPS pathway (Bachmair et al., 1986; Kats et al., 2018; Christiano et al., 2020). Consistent with these observations, the effects of causal ubiquitin system gene variants were highly substrate-specific (Figures 3 and 4). Our results raise the question of whether UPS protein degradation is also shaped by variation in proteasome genes and whether any such effects would be less substrate-specific than those in the ubiquitin system. Given the multiple QTLs arising from ubiquitin system genes, detecting genetic influences on proteasome activity may benefit from assays that can measure proteasome activity independently of the ubiquitin system.
The remarkable complexity in causal variants we uncovered underscores the challenge of predicting variant effects on UPS protein degradation. Similar to recent results (Lutz et al., 2022; Abell et al., 2022), each of the four QTL regions we fine-mapped contained multiple causal variants in a single gene (Figures 3 and 4). In the case of
Our results suggest that genetic effects on UPS activity are an important source of post-translational variation in gene expression. A promoter variant that reduces UPS activity by decreasing
We have developed a generalizable framework for mapping genetic influences on protein degradation. Our results lay important groundwork for future efforts to understand how heritable differences in UPS activity contribute to variation in complex cellular and organismal traits, including the many diseases marked by aberrant UPS activity.
Materials and methods
Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
---|---|---|---|---|
Gene ( |
| Saccharomyces Genome Database (SGD) | YGR184C | edited to contain |
Gene ( |
| SGD | YIL030C | edited to contain |
Gene ( |
| SGD | YER100W | edited to contain |
Gene ( |
| SGD | YJR062C | edited to contain |
Gene ( |
| SGD | YOR202W | selectable marker for genome engineering |
Gene ( |
| SGD | YNL268W | selectable marker for genome engineering |
Strain, strain | BY4741 | Leonid Kruglyak | YFA0040 | Supplementary file 5 |
Strain, strain | RM11.1a | Leonid Kruglyak | YFA0039 | Supplementary file 5 |
Strain, strain | recombinant progeny of BY4741 x RM11.1a | this study | SFA- | Supplementary file 5 |
Strain, strain | strains with tandem fluorescent timer reporters | this study | YFA- | Supplementary file 5 |
Strain, strain | strains lacking individual ubiquitin-proteasome system genes | this study | YFA- | Supplementary file 5 |
Strain, strain | strains with alternative UPS gene alleles / variants | this study | YFA- | Supplementary file 5 |
Strain, strain | DH5α | New England Biolabs | for plasmid cloning and propagation | |
Recombinant DNA reagent | 23 plasmids | this study | PFA- | Supplementary file 4 |
Recombinant DNA reagent | backbone plasmid | Addgene | 35121 | |
Recombinant DNA reagent | backbone plasmid | Addgene | 41030 | |
Recombinant DNA reagent | KanMX cassette | Wach et al., 1994; | selectable marker for genome engineering | |
Recombinant DNA reagent | NatMX cassette | Wach et al., 1994; | selectable marker for genome engineering | |
Sequence-based reagent | 102 oligonucleotides | Integrated DNA | OFA- | Supplementary file 3 |
Commercial assay or kit | Nextera DNA | Illumina | FC-121– | |
Commercial assay or kit | EB Ultra II | New England Biolabs | E7760 | |
Commercial assay or kit | Monarch Gel | New England Biolabs | T1010L | |
Commercial assay or kit | HiFi DNA Assembly | New England Biolabs | E5520S | |
Commercial assay or kit | TMT10plex Isobaric | ThermoFisher Scientific | 90110 | |
Commercial assay or kit | ZR Fungal / Bacterial | Zymo Research | R2014 | |
Commercial assay or kit | Quick-96 DNA Plus kit | Zymo Research | D4070 | |
Software, algorithm | MULTIPOOL | Edwards and Gifford, 2012; | ||
Software, algorithm | trimmomatic | Bolger et al., 2014 | ||
Software, algorithm | kallisto | Bray et al., 2016; | ||
Software, algorithm | PANTHER | Mi et al., 2021; | ||
Software, algorithm | fastp | Chen et al., 2018; | ||
Software, algorithm | RSeQC | Wang et al., 2012; | ||
Software, algorithm | Scaffold | https://www.proteomesoftware.com/ | ||
Software, algorithm | Proteome Discoverer | Thermo Scientific | ||
Software, algorithm | AlphaFold | Jumper et al., 2021; | ||
Software, algorithm | Inkscape | https://inkscape.org | ||
Other | LSR II Flow Cytometer | BD | flow cytometry | |
Other | FACSAria II Cell Sorter | BD | cell sorting | |
Other | Orbitrap Fusion Tribrid | Thermo Scientific | mass spectrometry | |
Other | Next-Seq 550 | Illumina | DNA / RNA sequencing |
Tandem fluorescent timer ubiquitin-proteasome system activity reporters
We used tandem fluorescent timers (TFTs) to measure ubiquitin-proteasome system (UPS) activity. TFTs are fusions of two fluorescent proteins (FPs) with distinct spectral profiles and maturation kinetics (Khmelinskii et al., 2012; Khmelinskii and Knop, 2014). In the most common implementation, a TFT consists of a faster maturing green fluorescent protein (GFP) and a slower maturing red fluorescent protein (RFP). Because the FPs in the TFT mature at different rates, the RFP / GFP ratio changes over time. If the degradation rate of a TFT exceeds the maturation rate of the RFP, the -log2 RFP / GFP ratio is directly proportional to the construct’s degradation rate (Khmelinskii and Knop, 2014; Khmelinskii et al., 2012). When fused to N-degrons, the TFT’s RFP / GFP ratio measures UPS N-end rule activity (Khmelinskii et al., 2012; Khmelinskii et al., 2014). The RFP / GFP ratio is also independent of the TFT’s expression level (Khmelinskii et al., 2012; Khmelinskii and Knop, 2014; Kong et al., 2021), preventing confounding from genetic effects on reporter expression in genetically diverse cell populations.
We used fluorescent proteins from previously characterized TFTs in our experiments (Khmelinskii et al., 2016; Khmelinskii and Knop, 2014; Khmelinskii et al., 2012; Khmelinskii et al., 2014). Superfolder GFP (sfGFP; Pédelacq et al., 2006) was used as the faster maturing FP in all TFTs. sfGFP matures in approximately 5 min and has excitation and emission maxima of 485 nm and 510 nm, respectively (Pédelacq et al., 2006). The slower maturing FP in each TFT was either mCherry or mRuby. mCherry matures in approximately 40 min and has excitation and emission maxima of 587 nm and 610 nm, respectively (Shaner et al., 2004). mRuby matures in approximately 170 min and has excitation and emission maxima of 558 nm and 605 nm, respectively (Kredel et al., 2009). All TFT fluorescent proteins are monomeric. We separated green and red FPs in each TFT with an unstructured 35 amino acid linker sequence to minimize fluorescence resonance energy transfer (Khmelinskii et al., 2012).
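The dependence of the TFT readout on degradation rate, and its independence from expression level, can be illustrated with a simple steady-state model. The Python sketch below assumes constant production, one-step first-order maturation, and first-order degradation, and it treats the approximate maturation times quoted above as half-times. These are simplifying assumptions made for illustration; real fluorescent protein maturation can involve multiple steps, and the reporter half-lives used here are hypothetical.

```python
import math


def rate_from_half_time(half_time_min):
    """First-order rate constant (per min) for a given half-time."""
    return math.log(2) / half_time_min


def mature_level(production, maturation_rate, degradation_rate):
    """Steady-state amount of mature fluorescent protein.

    With production p, maturation m, and degradation k, the immature pool
    is p / (m + k) and the mature pool is p * m / (k * (m + k)).
    """
    return production * maturation_rate / (
        degradation_rate * (maturation_rate + degradation_rate)
    )


def tft_output(production, degradation_rate, m_gfp, m_rfp):
    """Model -log2(RFP / GFP) for a TFT at steady state."""
    gfp = mature_level(production, m_gfp, degradation_rate)
    rfp = mature_level(production, m_rfp, degradation_rate)
    return -math.log2(rfp / gfp)


# Assumption: maturation times from the text interpreted as half-times.
M_GFP = rate_from_half_time(5.0)   # sfGFP, approximately 5 min
M_RFP = rate_from_half_time(40.0)  # mCherry, approximately 40 min

# Hypothetical reporter half-lives of 120 min (stable) and 10 min (unstable).
slow = tft_output(1.0, rate_from_half_time(120.0), M_GFP, M_RFP)
fast = tft_output(1.0, rate_from_half_time(10.0), M_GFP, M_RFP)
fast_high_expression = tft_output(50.0, rate_from_half_time(10.0), M_GFP, M_RFP)

print(f"stable reporter:           {slow:.3f}")
print(f"unstable reporter:         {fast:.3f}")
print(f"unstable, 50x expression:  {fast_high_expression:.3f}")
```

In this model the production term cancels in the ratio, so the output depends only on the degradation and maturation rates, and a faster-degraded reporter yields a higher -log2 RFP / GFP value.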
Construction of Arg/N-end and Ac/N-end pathway TFTs
To generate TFT constructs with defined N-terminal amino acids, we used the ubiquitin-fusion technique (Bachmair et al., 1986; Varshavsky, 2005; Varshavsky, 2011), which involves placing a ubiquitin moiety immediately upstream of a sequence encoding the desired N-degron. During translation, ubiquitin-hydrolases cleave the ubiquitin moiety, exposing the N-degron (Figure 1A). We synthesized DNA (Integrated DNA Technologies [IDT], Coralville, Iowa, USA) encoding the
We then devised a general strategy to assemble TFT-containing plasmids with defined N-terminal amino acids (Figure 1—figure supplement 2). We first obtained sequences encoding each reporter element by PCR or DNA synthesis. We codon-optimized the sfGFP, mCherry, mRuby, and the TFT linker sequences for expression in
We used Addgene plasmid #35121 (a gift from John McCusker) to construct all TFT plasmids. Digesting this plasmid with BamHI and EcoRV restriction enzymes produces a 2,451 bp fragment that we used as a vector backbone for TFT plasmid assembly. We obtained a DNA fragment containing 734 bp of sequence upstream of the
Yeast strain handling
We used two strains of the yeast
We built additional strains for characterizing our UPS activity reporters by deleting individual UPS genes from the BY strain. Each deletion strain was constructed by replacing the targeted gene with the NatMX cassette (Goldstein and McCusker, 1999), which confers resistance to the antibiotic nourseothricin. We PCR amplified the NatMX cassette from Addgene plasmid #35121 using primers with homology to the 5’ upstream and 3’ downstream sequences of the targeted gene. The oligonucleotides for each gene deletion cassette amplification are listed in Supplementary file 3. We created a BY strain lacking the
Table 1.
Strain genotypes.
Short Name | Genotype | Antibiotic Resistance | Auxotrophies |
---|---|---|---|
BY | histidine | ||
RM | clonNAT, hygromycin | histidine | |
BY | clonNAT | histidine | |
BY | clonNAT | histidine | |
BY | clonNAT | histidine |
Table 2 describes the media formulations used for all experiments. Synthetic complete amino acid powders (SC -lys and SC -his -lys -ura) were obtained from Sunrise Science (Knoxville, TN, USA). Where indicated, we added the following reagents at the indicated concentrations to yeast media: G418, 200 mg/L (Fisher Scientific, Pittsburgh, PA, USA); clonNAT (nourseothricin sulfate, Fisher Scientific), 50 mg/L; thialysine (S-(2-aminoethyl)-L-cysteine hydrochloride; MilliporeSigma, St. Louis, MO, USA), 50 mg/L; canavanine (L-canavanine sulfate, MilliporeSigma), 50 mg/L.
Table 2.
Media formulations.
Media Name | Abbreviation | Formulation |
---|---|---|
Yeast-Peptone-Dextrose | YPD | 10 g/L yeast extract; 20 g/L peptone; 20 g/L dextrose |
Synthetic Complete | SC | 6.7 g/L yeast nitrogen base; 1.96 g/L amino acid mix -lys; 20 g/L dextrose |
Haploid Selection | SGA | 6.7 g/L yeast nitrogen base; 1.74 g/L amino acid mix -his -lys -ura; 20 g/L dextrose |
Sporulation | SPO | 1 g/L yeast extract; 10 g/L potassium acetate; 0.5 g/L dextrose |
Yeast transformation
We used a standard yeast transformation protocol to construct reporter control strains and build strains with UPS activity reporters (Gietz and Schiestl, 2007). In brief, we inoculated yeast strains growing on solid YPD medium into 5 mL of YPD liquid medium for overnight growth at 30°C. The following morning, we diluted 1 mL of saturated culture into 50 mL of fresh YPD and grew the cells for 4 hr. The cells were then successively washed in sterile ultrapure water and transformation solution 1 (10 mM Tris HCl [pH 8.0], 1 mM EDTA [pH 8.0], and 0.1 M lithium acetate). At each step, we pelleted the cells by centrifugation at 3000 rpm for 2 min in a benchtop centrifuge and discarded the supernatant. The cells were suspended in 100 μL of transformation solution 1 along with 50 μg of salmon sperm carrier DNA and 300 ng of transforming DNA. The cells were incubated at 30°C for 30 min, and 700 μL of transformation solution 2 (10 mM Tris HCl [pH 8.0], 1 mM EDTA [pH 8.0], and 0.1 M lithium acetate in 40% polyethylene glycol [PEG]) was added to each tube, followed by a 30-min heat shock at 42°C. We then washed the transformed cells in sterile, ultrapure water. We added 1 mL of liquid YPD medium to each tube and incubated the tubes for 90 min with rolling at 30°C to allow for expression of the antibiotic resistance cassettes. After washing with sterile, ultrapure water, we plated 200 μL of cells on solid SC -lys medium with G418 and thialysine, and, for strains with the NatMX cassette, clonNAT. For each strain, we streaked multiple independent colonies (biological replicates) from the transformation plate for further analysis as indicated in the text. We verified reporter integration at the targeted genomic locus by colony PCR (Ward, 1992). The primers used for these experiments are listed in Supplementary file 3.
Yeast mating and segregant populations
We created populations of genetically variable, recombinant cells ("segregants") for genetic mapping using a modified synthetic genetic array (SGA) approach (Baryshnikova et al., 2010; Kuzmin et al., 2016). We first mated BY strains with a given UPS activity reporter to RM by mixing freshly streaked cells of each strain on solid YPD medium. For each UPS activity reporter, we mated two independently derived clones (biological replicates) to the RM strain. Cells were grown overnight at 30°C and we selected for diploid cells (successful BY-RM matings) by streaking mated cells onto solid YPD medium with G418 (which selects for the KanMX cassette in the TFT in the BY strain) and clonNAT (which selects for the NatMX cassette in the RM strain). We inoculated 5 mL of YPD with freshly streaked diploid cells for overnight growth at 30°C. The next day, we pelleted the cultures, washed them with sterile, ultrapure water, and resuspended the cells in 5 mL of SPO liquid medium (Table 2). We sporulated the cells by incubating them at room temperature with rolling for 9 days. After confirming sporulation by brightfield microscopy, we pelleted 2 mL of culture, washed cells with 1 mL of sterile, ultrapure water, and resuspended cells in 300 μL of 1 M sorbitol containing 3 U of Zymolyase lytic enzyme (United States Biological, Salem, MA, USA) to degrade ascal walls. Digestions were carried out at 30°C with rolling for 2 hr. We then washed the spores with 1 mL of 1 M sorbitol, vortexed for 1 min at the highest intensity setting, resuspended the cells in sterile ultrapure water, and confirmed the release of cells from asci by brightfield microscopy. We plated 300 μL of cells onto solid SGA medium containing G418 and canavanine. This media formulation selects for haploid cells with (1) a UPS activity reporter via G418, (2) the
Flow cytometry
We measured UPS activity by flow cytometry as follows. Yeast strains were manually inoculated into 400 μL of liquid SC -lys medium with G418 and grown overnight in 2 mL 96-well plates at 30°C with 1000 rpm mixing using a MixMate (Eppendorf, Hamburg, Germany). The following morning, we inoculated a fresh 400 μL of G418-containing SC -lys media with 4 μL of each saturated culture. Cells were grown for an additional 3 hr prior to analysis by flow cytometry. All flow cytometry experiments were performed on an LSR II flow cytometer (BD, Franklin Lakes, NJ, USA) equipped with a 20 mW 488 nm laser with 488/10 and 525/50 filters for measuring forward/side scatter and sfGFP, respectively, as well as a 40 mW 561 nm laser and a 610/20 filter for measuring mCherry and mRuby. Table 3 lists the parameters and settings that were used for all flow cytometry and fluorescence-activated cell sorting (FACS) experiments. We recorded 10,000 cells each from 8 independent biological replicates per strain for our analyses of BY, RM, and reporter control strains.
Table 3.
Flow cytometry and FACS settings.
Parameter | Laser Line (nm) | Laser Setting (V) | Filter |
---|---|---|---|
forward scatter (FSC) | 488 | 500 | 488/10 |
side scatter (SSC) | 488 | 275 | 488/10 |
sfGFP | 488 | 500 | 525/50 |
mCherry | 561 | 615 | 610/20 |
mRuby | 561 | 615 | 610/20 |
We analyzed flow cytometry data using R (R Foundation for Statistical Computing, Vienna, Austria) and the flowCore R package (Hahne et al., 2009). We first filtered each flow cytometry dataset to include only those cells within ±10% of the median forward scatter (a proxy for cell size). We empirically determined that this gating approach captured the central peak of cells in the FSC histogram. It also removed cellular debris and aggregates of multiple cells and restricted our analyses to cells of approximately the same size. We observed that the TFT’s output changed with the passage of time during flow cytometry experiments. We used the residuals of a loess regression of the TFT’s output on time to correct for this effect, similar to a previously described approach (Brion et al., 2020).
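The gating and drift-correction steps above can be illustrated with a minimal Python sketch. This is not the analysis code used in the study, which was written in R with flowCore. The event field name `fsc`, the `span` value, and the pure-Python tricube-weighted local linear smoother (a simple stand-in for R's loess) are all assumptions made for illustration.

```python
from statistics import median


def gate_fsc(events, fraction=0.10):
    """Keep events whose FSC lies within +/- `fraction` of the median FSC.

    `events` is a list of dicts with a hypothetical "fsc" key.
    """
    fsc_median = median(e["fsc"] for e in events)
    lower = fsc_median * (1 - fraction)
    upper = fsc_median * (1 + fraction)
    return [e for e in events if lower <= e["fsc"] <= upper]


def local_linear_fit(x, y, span=0.5):
    """Loess-like smoother: tricube-weighted local linear fit at each x.

    A simplified stand-in for R's loess(); `span` is the fraction of
    points that receive non-zero weight in each local fit.
    """
    n = len(x)
    k = max(2, int(round(span * n)))
    fitted = []
    for x0 in x:
        # Bandwidth: distance to the k-th nearest point (guard against zero).
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def detrend(time, signal, span=0.5):
    """Return residuals of the smoothed signal-on-time fit."""
    fitted = local_linear_fit(time, signal, span)
    return [s - f for s, f in zip(signal, fitted)]


# Toy data: three events near the median FSC, plus debris and an aggregate.
events = [{"fsc": v} for v in (100, 95, 105, 111, 89, 50, 200)]
gated = gate_fsc(events)

# Toy drift: reporter output rising linearly with acquisition time.
time = [float(t) for t in range(20)]
signal = [2.0 + 0.5 * t for t in time]
residuals = detrend(time, signal)
```

Because the toy drift is exactly linear, the local linear fit reproduces it and the residuals are essentially zero; with real data, the residuals retain the cell-to-cell variation after the time trend is removed.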
To characterize our TFT reporters, we used the following analysis steps. We extracted the median -log2 RFP/GFP ratio from each of 10,000 cells per strain per reporter. These values were Z-score normalized relative to the sample with the lowest degradation rate (typically the E3 ligase deletion strain). Following this transformation, the strain with the lowest degradation rate has a degradation rate of approximately 0 and the now-scaled RFP/GFP ratio is directly proportional to the construct’s degradation rate. To compare degradation rates between strains and individual UPS activity reporters, we then converted scaled RFP/GFP ratios to Z scores, which we report as "Normalized UPS Activity". Statistical significance was assessed using a one-way ANOVA with Tukey’s HSD post-hoc test.
For fine-mapping causal genes and variants for UPS activity QTLs, we used the following approach. We extracted the median -log2 RFP / GFP ratio from each of 10,000 cells per strain per reporter. These values were Z-score normalized relative to the median of the control strain (a BY strain engineered to contain the BY allele of a candidate causal gene). Statistical significance was assessed using a t-test of each experimental strain versus the control strain with Benjamini-Hochberg correction for multiple testing (Benjamini and Hochberg, 1995).
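As an illustration of the multiple-testing step, the following Python sketch implements the Benjamini-Hochberg step-up adjustment, which should give the same adjusted values as `p.adjust(p, method = "BH")` in R. The function name and the toy p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: four hypothetical strain-versus-control t-test p-values.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
```

In this toy example the second and third tests receive the same adjusted value, because the adjustment never lets a smaller raw p-value end up with a larger adjusted value than a bigger one.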
Fluorescence-activated cell sorting
We selected populations of segregants for QTL mapping using a previously described approach for isolating phenotypically extreme cell populations by FACS (Albert et al., 2014; Brion et al., 2020). Segregant populations were thawed approximately 16 hr prior to cell sorting and grown overnight in 5 mL of SGA medium containing G418 and canavanine. The following morning, 1 mL of cells from each segregant population was diluted into a fresh 4 mL of SGA medium containing G418 and canavanine. Segregant cultures were then grown for an additional 4 hr prior to sorting. All FACS experiments were carried out using a FACSAria II cell sorter (BD). We used plots of side scatter (SSC) height by SSC width and forward scatter (FSC) height by FSC width to remove doublets from each sample. We then filtered cells on the basis of FSC area, restricting our sorts to ±7.5% of the central FSC peak, which we empirically determined excluded cellular debris and aggregates while encompassing the primary haploid cell population. Finally, we defined a fluorescence-positive population by comparing each segregant population to negative control BY and RM strains without TFTs. We collected pools of 20,000 cells each from three gates drawn on each segregant population:
The 2% lower tail of the UPS activity distribution
The 2% upper tail of the UPS activity distribution
Fluorescence-positive cells without selection on UPS activity (“null pools”), which were used to determine the false positive rate of the QTL mapping method (see below)
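The tail gates listed above were drawn on the cell sorter. As an offline illustration only, the sketch below computes 2% lower-tail and upper-tail cutoffs from a list of per-cell UPS activity values. The function names and the choice of the "inclusive" quantile method are assumptions.

```python
from statistics import quantiles


def tail_cutoffs(values, tail_fraction=0.02):
    """Return (lower, upper) cutoffs bounding the two extreme tails."""
    n_bins = round(1 / tail_fraction)  # 50 bins for 2% tails
    cuts = quantiles(values, n=n_bins, method="inclusive")
    return cuts[0], cuts[-1]


def split_tails(values, tail_fraction=0.02):
    """Split values into low-tail and high-tail pools."""
    lower, upper = tail_cutoffs(values, tail_fraction)
    low_pool = [v for v in values if v < lower]
    high_pool = [v for v in values if v > upper]
    return low_pool, high_pool


# Toy data: 1,000 hypothetical per-cell UPS activity values.
activity = list(range(1000))
low_pool, high_pool = split_tails(activity)
```

With 1,000 evenly spaced toy values, each tail contains 20 cells, that is, 2% of the population.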
We collected cell pools from two independent biological replicates (spore preparations) for each reporter. Each pool of 20,000 cells was collected into sterile 1.5 mL polypropylene tubes containing 1 mL of SGA medium and grown overnight at 30°C with rolling. The next day, we mixed 750 μL of cells with 450 μL of 40% (v/v) glycerol and stored this mixture in 2 mL 96-well plates at −80°C.
Genomic DNA isolation and library preparation
We extracted genomic DNA from sorted segregant pools for whole-genome sequencing. Deep-well plates containing glycerol stocks of sorted segregant pools were thawed and 800 μL of each sample was pelleted by centrifugation at 3,700 rpm for 10 min. We discarded the supernatant and resuspended cell pellets in 800 μL of a 1 M sorbitol solution containing 0.1 M EDTA, 14.3 mM β-mercaptoethanol, and 500 U of Zymolyase lytic enzyme to digest cell walls prior to DNA extraction. The digestion reaction was carried out by resuspending cell pellets with mixing at 1,000 rpm for 2 min followed by incubation for 2 hr at 37°C. When the digestion reaction finished, we discarded the supernatant, resuspended cells in 50 μL of phosphate buffered saline, and used the Quick-DNA 96 Plus kit (Zymo Research, Irvine, CA, USA) to extract genomic DNA. We followed the manufacturer’s protocol to extract genomic DNA with the following modifications. We incubated cells in a 20 mg/mL proteinase K solution overnight at 55°C. After completing the DNA extraction protocol, we eluted DNA using 40 μL of DNA elution buffer (10 mM Tris-HCl [pH 8.5], 0.1 mM EDTA). The DNA concentration for each sample was determined using the Qubit dsDNA BR assay kit (Thermo Fisher Scientific, Waltham, MA, USA) in a 96-well format using a Synergy H1 plate reader (BioTek Instruments, Winooski, VT, USA).
We used a previously described approach to prepare libraries for short-read whole-genome sequencing on the Illumina NextSeq platform (Albert et al., 2014; Brion et al., 2020). We used the Nextera DNA library kit (Illumina, San Diego, CA, USA) according to the manufacturer’s instructions with the following modifications. For the tagmentation reaction, 5 ng of genomic DNA from each sample was diluted in a master mix containing 4 μL of Tagment DNA buffer, 1 μL of sterile molecular biology grade water, and 5 μL of Tagment DNA enzyme diluted 1:20 in Tagment DNA buffer. The tagmentation reaction was run on a SimpliAmp thermal cycler (Thermo Fisher Scientific) using the following parameters: 55°C temperature, 20 μL reaction volume, 10 min incubation. To prepare libraries for sequencing, we added 10 μL of the tagmentation reaction to a master mix containing 1 μL of an Illumina i5 and i7 index primer pair mixture, 0.375 μL of ExTaq polymerase (Takara Bio, Mountain View, CA, USA), 5 μL of ExTaq buffer, 4 μL of a dNTP mixture, and 29.625 μL of sterile molecular biology grade water. We generated all 96 possible index oligo combinations using 8 i5 and 12 i7 index primers. The library amplification reaction was run on a SimpliAmp thermal cycler with the following parameters: initial denaturation at 95°C for 30 s, then 17 cycles of 95°C for 10 s (denaturation), 62°C for 30 s (annealing), and 72°C for 3 min (extension). We quantified the DNA concentration of each reaction using the Qubit dsDNA BR assay kit (Thermo Fisher Scientific) and pooled 10 μL of each reaction. This pooled mixture was run on a 2% agarose gel and we extracted and purified DNA in the 400 bp to 600 bp region using the Monarch Gel Extraction Kit (NEB) according to the manufacturer’s instructions.
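As a quick arithmetic check on the library preparation above, the following Python sketch enumerates the index combinations and sums the listed amplification reaction components. The index names are hypothetical placeholders, and the total reaction volume is derived here from the listed components rather than stated in the text.

```python
from itertools import product

# Hypothetical index names; the actual primer identifiers are not given here.
i5_indexes = [f"i5_{n:02d}" for n in range(1, 9)]    # 8 i5 index primers
i7_indexes = [f"i7_{n:02d}" for n in range(1, 13)]   # 12 i7 index primers

# Every i5/i7 pairing gives one uniquely indexed library.
index_pairs = list(product(i5_indexes, i7_indexes))

# Component volumes (in microliters) of the library amplification reaction.
amplification_mix_ul = {
    "tagmentation reaction": 10.0,
    "i5/i7 index primer pair mixture": 1.0,
    "ExTaq polymerase": 0.375,
    "ExTaq buffer": 5.0,
    "dNTP mixture": 4.0,
    "water": 29.625,
}
total_volume_ul = sum(amplification_mix_ul.values())
```

The 8 × 12 pairing yields the 96 combinations described in the text, and the listed components sum to 50 μL per reaction.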
Whole-genome sequencing
We submitted pooled, purified DNA libraries to the University of Minnesota Genomics Center (UMGC) for Illumina sequencing. Prior to sequencing, UMGC staff performed three quality control (QC) assays. Library concentration was determined using the PicoGreen dsDNA quantification reagent (Thermo Fisher Scientific) with libraries at a concentration of 1 ng/μL passing QC. Library size was determined using the Tapestation electrophoresis system (Agilent Technologies, Santa Clara, CA, USA) with libraries in the range of 200–700 bp passing QC. Library functionality was determined using the KAPA DNA Library Quantification kit (Roche, Penzberg, Germany), with libraries with a concentration greater than 2 nM passing. All submitted libraries passed each QC assay. We submitted 7 libraries for sequencing at different times. Libraries were sequenced on a NextSeq 550 instrument (Illumina). Depending on the number of samples, we used the following output settings. For libraries with 70 or more samples (2 libraries), 75 bp paired end sequencing was performed in high-output mode to generate approximately 360 × 10^6 reads. For libraries with 50 or fewer samples (5 libraries), 75 bp paired end sequencing was performed in mid-output mode to generate approximately 120 × 10^6 reads. Average read coverage of the genome ranged from 9 to 35 with a median coverage of 28 across all libraries. Sequence data de-multiplexing was performed by UMGC. Whole-genome sequencing data have been deposited into the NIH Sequence Read Archive under Bioproject accession PRJNA881749.
Raw whole-genome sequencing data processing
We calculated allele frequencies from our whole-genome sequencing data using the following pipeline. We initially filtered reads to include only those reads with mapping quality scores greater than 30. We aligned the filtered reads to the
QTL mapping
We identified QTLs from sequence data following established procedures for bulk segregant analysis (Ehrenreich et al., 2010; Albert et al., 2014; Brion et al., 2020). Allele counts in the vcf files generated above were provided to the MULTIPOOL algorithm (Edwards and Gifford, 2012). MULTIPOOL computes logarithm of the odds (LOD) scores by comparing two models: (1) a model in which the high and low UPS activity pools come from one common population and thus share the same frequency of the BY and RM alleles, and (2) a model in which these pools come from two populations with two different allele frequencies, indicating the presence of a QTL. We identified QTLs as genomic regions exceeding an empirically derived significance threshold (see below). We used MULTIPOOL with the following settings: bp per centiMorgan = 2,200, bin size = 100 bp, effective pool size = 1,000. As in previous QTL mapping in the BY/RM cross by bulk segregant analysis (Albert et al., 2014; Brion et al., 2020), we excluded variants with allele frequencies higher than 0.9 or lower than 0.1. We also used MULTIPOOL to estimate confidence intervals for each significant QTL, which we defined as a 2-LOD drop from the QTL peak position. To visualize QTLs and gauge their effects, we also computed the RM allele frequency differences (ΔAF) at each site between our high and low UPS activity pools. Because allele frequencies are affected by random counting noise, we used loess regression to smooth the allele frequency for each sample before computing ΔAF. We used the smoothed values to plot the ΔAF distribution along the genome and as a measure of QTL effect size.
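Two of the calculations described above, the ΔAF computation with allele frequency filtering and the 2-LOD drop interval, can be sketched in Python as follows. This is an illustration under stated assumptions, not MULTIPOOL or the study's pipeline. The function names are hypothetical, ΔAF is taken as high pool minus low pool, a site is dropped if either pool falls outside the 0.1 to 0.9 range, and the loess smoothing step is omitted.

```python
def rm_allele_frequency(by_count, rm_count):
    """RM allele frequency at one site, or None if the site has no reads."""
    total = by_count + rm_count
    return rm_count / total if total else None


def delta_af(high_pool, low_pool, lower=0.1, upper=0.9):
    """Per-site RM allele frequency difference (high minus low).

    Each pool is a list of (by_count, rm_count) tuples, one per site.
    Returns a list of (site_index, delta_af) for sites passing the filter.
    """
    result = []
    for site, (high, low) in enumerate(zip(high_pool, low_pool)):
        af_high = rm_allele_frequency(*high)
        af_low = rm_allele_frequency(*low)
        if af_high is None or af_low is None:
            continue  # no coverage in at least one pool
        if not (lower <= af_high <= upper and lower <= af_low <= upper):
            continue  # assumed rule: drop the site if either pool is extreme
        result.append((site, af_high - af_low))
    return result


def lod_drop_interval(positions, lods, drop=2.0):
    """Return (left, peak, right) positions bounding a `drop`-LOD interval."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[peak], positions[right]


# Toy allele counts: (BY reads, RM reads) at three sites in each pool.
high = [(10, 30), (1, 99), (20, 20)]
low = [(30, 10), (50, 50), (0, 0)]
differences = delta_af(high, low)

# Toy LOD profile along a chromosome (positions in bp).
positions = [0, 100, 200, 300, 400, 500, 600]
lods = [1.0, 3.0, 5.5, 7.0, 6.0, 4.5, 2.0]
interval = lod_drop_interval(positions, lods)
```

In the toy data, the second site is excluded because its high-pool RM frequency exceeds 0.9, and the third is skipped because the low pool has no reads.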
Null sorts and empirical false discovery rate estimation
We used "null" segregant pools (fluorescence-positive cells with no selection on UPS activity) to empirically estimate the false discovery rate (FDR) of our QTL mapping method. Because these cells are obtained as two pools from the same null population in the same sample, any allele frequency differences between them are the result of technical noise or random variation. We permuted these null comparisons across segregant pools with the same UPS activity reporter for a total of 112 null comparisons. We define the "null QTL rate" at a given LOD threshold as the number of QTLs that exceeded the threshold in these comparisons divided by the number of null comparisons. To determine the FDR for a given LOD score, we then determined the number of QTLs for our experimental comparisons (high UPS activity versus low UPS activity). We define the "experimental QTL rate" as the number of experimental QTLs divided by the number of experimental comparisons. The FDR is thus computed as follows:
FDR = null QTL rate / experimental QTL rate
We evaluated the FDR over a LOD range of 2.5–10 in 0.5 LOD increments. We found that a LOD value of 4.5 led to a null QTL rate of 0.0625 and an FDR of 0.507%. We used this value as our significance threshold for QTL mapping and further filtered our QTL list by excluding QTLs that were not detected in each of two independent biological replicates. Replicating QTLs were defined as those whose peaks were within 100 kb of each other on the same chromosome with the same direction (positive or negative) of RM allele frequency difference between high and low UPS activity pools.
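The threshold scan described above can be sketched in Python, assuming the FDR at each threshold is the null QTL rate divided by the experimental QTL rate, as the definitions above imply. The function names and the toy peak LOD values are hypothetical. The only values taken from the text are the 112 null comparisons and the 2.5 to 10 LOD range in 0.5 increments.

```python
def qtl_rate(peak_lods_per_comparison, threshold):
    """QTLs exceeding `threshold` per comparison.

    `peak_lods_per_comparison` holds one list of QTL peak LODs per comparison.
    """
    n_qtls = sum(
        sum(1 for lod in peaks if lod > threshold)
        for peaks in peak_lods_per_comparison
    )
    return n_qtls / len(peak_lods_per_comparison)


def empirical_fdr(null_peaks, experimental_peaks, threshold):
    """Null QTL rate divided by experimental QTL rate (None if undefined)."""
    experimental_rate = qtl_rate(experimental_peaks, threshold)
    if experimental_rate == 0:
        return None
    return qtl_rate(null_peaks, threshold) / experimental_rate


# Toy data: 112 null comparisons, 7 of which contain one peak above LOD 4.5.
null_peaks = [[5.0]] * 7 + [[]] * 105
# Toy data: 4 experimental comparisons with hypothetical peak LODs.
experimental_peaks = [[6.0, 8.0, 3.0], [5.0, 9.0], [12.0], [4.6, 2.0]]

# LOD thresholds from 2.5 to 10 in 0.5 increments.
thresholds = [2.5 + 0.5 * i for i in range(16)]
fdr_by_threshold = {
    t: empirical_fdr(null_peaks, experimental_peaks, t) for t in thresholds
}
```

In this toy setup, 7 null QTLs across 112 comparisons give a null QTL rate of 0.0625 at a LOD of 4.5. The experimental peaks are invented, so the resulting FDR values are illustrative only.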
QTL fine-mapping by allelic engineering
We used ‘CRISPR-Swap’ (Lutz et al., 2019), a two-step method for scarless allelic editing, to fine-map QTLs to the level of their causal genes and nucleotides. In the first step of CRISPR-Swap, a gene of interest (GOI) is deleted and replaced with a selectable marker. In the second step, cells are co-transformed with (1) a plasmid that expresses CRISPR-Cas9 and a guide RNA targeting the selectable marker and (2) a repair template encoding the desired allele of the GOI.
We used CRISPR-Swap to generate BY strains harboring either RM alleles or chimeric BY/RM alleles of several genes, as described below. To do so, we first replaced the gene of interest in BY with the NatMX selectable marker by transforming a PCR product encoding the NatMX cassette with 40 bp overhangs at the 5’ and 3’ ends of the targeted gene. To generate
We then modified the original CRISPR-Swap plasmid (PFA0055, Addgene plasmid #131774) to replace its
We used genomic DNA from BY and RM strains as a template to PCR amplify repair templates for CRISPR-Swap. Genomic DNA was extracted from BY and RM strains using the ‘10-min prep’ protocol (Hoffman and Winston, 1987). We amplified full-length repair templates from RM and BY containing each GOI’s promoter, open reading frame (ORF), and terminator using Phusion Hot Start Flex DNA polymerase (NEB). We also created chimeric repair templates containing combinations of BY and RM alleles using PCR splicing by overlap extension (Horton et al., 1989). Table 4 lists the repair templates used for CRISPR-Swap. The sequences of all repair templates were verified by Sanger sequencing.
Table 4.
CRISPR-Swap repair templates.
Gene | Allele Name | Promoter | ORF | Terminator |
---|---|---|---|---|
| BY | BY | BY | |
| RM | RM | RM | |
| RM | BY | BY | |
| BY | RM | BY | |
| BY | BY | RM | |
| –469, RM; all other, BY | BY | BY | |
| –197, RM; all other, BY | BY | BY | |
| BY | BY | BY | |
| RM | RM | RM | |
| BY | 1228, RM; all other, BY | BY | |
| BY | 3036, RM; all other, BY | BY | |
| BY | 3557, RM; all other, BY | BY | |
| BY | BY | BY | |
| RM | RM | RM | |
| RM | BY | BY | |
| RM | 331, RM; all other, BY | BY | |
| RM | 386, RM; all other, BY | BY | |
| BY | BY | BY | |
| RM | RM | RM | |
| RM | BY | BY | |
| BY | 1686, RM; all other, BY | BY | |
| BY | BY | RM |
To create allele swap strains, we co-transformed BY strains with 200 ng of plasmid PFA0227 and 1.5 μg of GOI repair template. Transformants were selected and single colony purified on synthetic complete medium lacking histidine and then patched onto solid YPD medium. We tested each strain for the desired exchange of the NatMX selectable marker with a
We tested whether a QTL on chromosome V results from variation in
RNA isolation
We isolated total RNA from 5 independent biological replicates each of the wild-type BY strain and a BY strain edited to contain the –469A>T RM variant in the
Total RNA was extracted from frozen cell pellets using the ZR Fungal/Bacterial miniprep kit (Zymo), according to the manufacturer’s instructions. Briefly, total RNA was isolated from cell pellets in two batches, each containing equal numbers of BY and
RNA-seq
We isolated mRNA from each total RNA sample using 550 ng of total RNA as input and the NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB). All samples were processed in a single batch and the isolated mRNA from each sample was used to prepare RNA sequencing libraries using the NEBNext Ultra II Directional RNA Library Prep kit (NEB) according to the manufacturer’s instructions. Libraries were amplified using NEBNext Ultra II Q5 polymerase and unique combinations of primers from the NEBNext Multiplex Oligos for Illumina (NEB). The following amplification protocol was used: initial denaturation at 98°C for 30 s, followed by 10 cycles of 98°C (10 s; denaturation), 65°C (75 s; annealing and extension), and a 65°C final extension for 5 min. PCR reactions were pooled using equal amounts of DNA and submitted to UMGC for three quality control assays, which measured the library concentration by PicoGreen, library functionality by KAPA qPCR, and library size using the Tapestation electrophoresis system (Agilent). The resulting library contained a small amount of adapter dimer (approximately 9%), which was subsequently removed via a bead-based cleanup. The final, cleaned library passed all three QC assays and was sequenced on a NextSeq 2000 instrument (Illumina) in paired-end mode with 150 bp reads. The sequencing run generated 1,367,252,076 reads with an average of 136,725,207 (range: 112,285,619–152,571,763) reads per sample.
RNA-seq data processing and analysis
We performed quality control and preprocessing of RNA-seq data using fastp (Chen et al., 2018). Our initial processing removed reads shorter than 36 bp and any reads in which the mean quality score dropped below 15 in a 4 bp window. We also used fastp to trim adapter sequences from the ends of all reads. We then used Kallisto (Bray et al., 2016) to pseudoalign processed reads to the
To identify differentially expressed transcripts, we used the estimated counts obtained from Kallisto as a measure of gene expression and filtered the estimated counts using the following procedures. First, we computed a Transcript Integrity Number (TIN) for each gene using RSeQC (Wang et al., 2012) and removed any genes with a TIN less than 1 for any sample. We also removed any genes that Kallisto estimated to have an effective length of less than 1 and those genes whose estimated counts were less than 10 in any sample. The resulting dataset comprised 5,676 expressed genes. Raw RNA-seq reads and processed counts were deposited in the NIH Gene Expression Omnibus database under accession number GSE213689. We used DESeq2 (Love et al., 2014) to perform statistical analysis of the resulting dataset. We used the RNA harvest batch and OD at time of sample harvest as covariates in our analysis. To further control for confounding sample-to-sample variation, we used surrogate variable analysis (Leek et al., 2012; Leek and Storey, 2007), which identified two significant surrogate variables that were subsequently added to our statistical model. We corrected for multiple testing using the Benjamini-Hochberg method (Benjamini and Hochberg, 1995) and considered significant differences as those with a corrected
To link differences in transcript abundance to biological pathways, we performed gene ontology enrichment analysis using PANTHER (Mi et al., 2021). The ‘statistical overrepresentation test’ was used to search for gene ontology (GO) biological processes and Reactome pathways enriched in our set of 78 transcripts differentially expressed between BY and
Protein isolation and proteomic analysis by mass spectrometry
To quantify gene expression at the protein level, we submitted five cell pellets each from the same BY and
Samples were prepared and analyzed by mass spectrometry as follows. CMSP first labeled individual samples using the tandem mass tag (TMT) 10plex labeling kit (Thermo). After tagging, samples were pooled for analysis by mass spectrometry on an Orbitrap Tribrid Eclipse instrument (Thermo). Database searching was performed using the Proteome Discoverer software and the statistical analysis of protein abundance was performed in Scaffold (Proteome Software, Portland, OR, USA). We considered proteins to be differentially abundant between strains if they had a and a Benjamini-Hochberg corrected
Evolutionary analysis of variants
We inferred the allelic status of individual variants by comparing them to two outgroups: a likely-ancestral Taiwanese
Data and statistical analysis
All data were analyzed using R (version 3.6.1; R Project for Statistical Computing). For all boxplots, the center line shows the median, the box spans the lower and upper quartiles, and the whiskers extend to 1.5 times the interquartile range. Protein structure predictions were obtained from the AlphaFold Protein Structure Database (Jumper et al., 2021) and visualized using ChimeraX (Pettersen et al., 2021). DNA binding motifs were determined using the Yeast Transcription Factor Specificity Compendium database (de Boer and Hughes, 2012). Final figures and illustrations were made using Inkscape (version 0.92; Inkscape Project).
Computational scripts used to process data, for statistical analysis, and to generate figures are available at: https://www.github.com/mac230/N-end_Rule_QTL_paper; copy archived at swh:1:rev:24baa12af4e9c45691be2590ab30b2c1faf0c497 (Collins, 2022).
© 2022, Collins et al. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Precise control of protein degradation is critical for life, yet how natural genetic variation affects this essential process is largely unknown. Here, we developed a statistically powerful mapping approach to characterize how genetic variation affects protein degradation by the ubiquitin-proteasome system (UPS). Using the yeast