Background
Bovine leukemia virus (BLV) is an oncogenic deltaretrovirus that induces enzootic bovine leukosis. A defining feature of BLV is its viral miRNA cluster, which is transcribed atypically by RNA polymerase III via internal type 2 promoters rather than through the canonical Pol II pathway. These miRNAs accumulate to high levels within infected lymphocytes and can alter the expression of a variety of host genes involved in lymphocyte proliferation, thereby promoting leukemogenic processes.
Results
Here, we present a comprehensive in silico characterization of new A-box and B-box promoter motifs within the BLV miRNA-coding region. As a first step, a taxonomically diverse dataset of small non-coding RNAs (tRNAs, SINEs, and other ncRNAs) was assembled to derive position weight matrices and corresponding IUPAC consensus sequences for type 2 internal Pol III promoter motifs. Using these models, all available BLV miRNA cluster sequences were scanned to identify and map A-box-like and B-box-like elements and to reconstruct the underlying promoter architecture. Our analyses reveal a noncanonical BLV promoter organization: overlapping degenerate A-box variants (most frequently three distinct elements) reside within the pre-miRNA hairpin region, whereas B-box elements are positioned downstream of the Pol III termination signal and are thus excluded from the mature transcript.
Conclusions
Despite motif degeneration, critical nucleotide positions remained strongly conserved, indicating evolutionary pressure to preserve Pol III recruitment while accommodating viral genome constraints. These findings fill a crucial gap in our understanding of BLV Pol III promoter architecture and provide a foundation for future studies on how unconventional promoter configurations regulate viral miRNA expression and virus–host interactions.
Background
Bovine leukemia virus (BLV) is an oncogenic deltaretrovirus that causes enzootic bovine leukosis (EBL), a chronic B-lymphoproliferative disease in cattle [1]. BLV establishes a lifelong latent infection of B lymphocytes in which most animals remain clinically asymptomatic; over time, approximately 30% develop persistent lymphocytosis and a small proportion progress to B-cell lymphoma [2]. Within infected cells, viral elements such as the Tax oncoprotein and microRNAs (miRNAs) drive proliferation of infected lymphocytes and contribute to oncogenic transformation [2,3,4]. The discovery that BLV encodes its own regulatory miRNAs has illuminated novel aspects of its pathogenesis [4,5,6]. Unlike most retroviruses, BLV harbors a highly conserved miRNA cluster that is transcribed by RNA polymerase III (Pol III) rather than by Pol II [5,6,7]. These viral miRNAs accumulate to high levels in tumor-derived B cells despite negligible expression of other viral genes [8, 9]. The prolonged latency of BLV, with minimal protein expression, had long obscured the mechanisms by which the virus induces malignancy [8]. It is now recognized that BLV miRNAs can play a central role in transformation [3]. For example, BLV-miR-B4 mimics the cellular oncomiR miR-29 in sequence and target profile, and overexpression of this viral miRNA promotes B-cell tumorigenesis [5, 8]. Moreover, BLV miRNAs modulate host gene networks involved in cell signaling, proliferation and immune regulation, and are essential for tumor induction in animal models as well as for efficient viral replication in vivo [3, 8]. Thus, miRNAs constitute a critical interface between virus and host, orchestrating transcriptional and post-transcriptional programs that underlie viral persistence and oncogenesis [10].
Transcription of BLV miRNAs proceeds through a Pol III type 2 promoter mechanism similar to that used by cellular small RNAs such as transfer RNAs (tRNAs) [7, 11, 12]. In canonical Pol III type 2 promoters, which are characteristic of tRNA genes, adenovirus VA RNAs and short interspersed nuclear element (SINE) retrotransposons, the promoter resides within the transcribed region and comprises two internal control regions (ICRs), the A-box and B-box motifs [11,12,13]. These motifs are bound by the TFIIIC complex, which in turn recruits TFIIIB (including TBP and the Pol III–specific subunits Brf1 and Bdp1) to position the polymerase at the transcription start site (TSS) [12].
Unlike Pol III type 1 promoters (5S rRNA genes) or type 3 promoters (U6 snRNA genes), the type 2 promoter does not require upstream TATA or PSE elements: the entire promoter information is encoded within the appropriately spaced A-box and B-box sequences [11, 14]. Consensus sequences for the A-box and B-box motifs have been defined through comparative analyses of Pol III–transcribed genes across several taxa. A classical 11-nt consensus for the A-box is often represented as TRGYNNARNNG, whereas the B-box consensus is GGTTCRANYCY [15,16,17,18]. These motifs overlap structural elements of the tRNA cloverleaf and are typically separated by 30–60 nucleotides in the genome, ensuring proper TFIIIC binding and initiation complex assembly [11, 19, 20]. Similar intragenic promoter architectures occur in many repetitive noncoding sequences, such as SINEs, whose type 2 promoters are derived from tRNA genes [20]. Extensive motif data are curated in resources including the Genomic tRNA Database (GtRNAdb) [21], the RNA Families Database (Rfam) [22], the Database of repetitive DNA elements (Dfam) [23], and the Trypanosomatid Genomics Database (TriTrypDB) [24].
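To illustrate how IUPAC consensus strings such as TRGYNNARNNG and GGTTCRANYCY translate into concrete sequence patterns, the sketch below expands each code into a regular-expression character class and reports exact matches. It is a minimal illustration under stated assumptions: the helper names (`iupac_to_regex`, `find_sites`) and the toy sequence are hypothetical, and exact consensus matching is much stricter than the PWM-based scoring described in the Methods.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(consensus: str) -> re.Pattern:
    """Compile an IUPAC consensus; the lookahead lets overlapping matches be reported."""
    body = "".join(IUPAC[c] for c in consensus.upper())
    return re.compile(f"(?=({body}))")

def find_sites(seq: str, consensus: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched motif) for every exact consensus match (hypothetical helper)."""
    pattern = iupac_to_regex(consensus)
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]

A_BOX = "TRGYNNARNNG"  # classical A-box consensus quoted in the text
B_BOX = "GGTTCRANYCY"  # classical B-box consensus quoted in the text

# Toy sequence (not a BLV sequence): spacer + tRNA-like A-box + spacer + B-box + spacer.
toy = "CC" + "TGGCGCAGTGG" + "TAACC" + "GGTTCGAATCC" + "C"
print(find_sites(toy, A_BOX))
print(find_sites(toy, B_BOX))
```

On this toy sequence the A-box consensus matches at position 3 and the B-box consensus at position 19. The two boxes are only 5 nt apart here, well below the 30–60 nt spacing typical of real type 2 promoters, so the example demonstrates pattern matching only.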
In the case of BLV, the B-box motif is situated downstream of the RNA polymerase III termination signal and is therefore not included in the transcribed pre-miRNA hairpin [7, 25]. Although mechanistically analogous to tRNA promoters, this arrangement represents a noncanonical variant of the Pol III type 2 promoter. Crucially, in contrast to most eukaryotic small noncoding RNAs, BLV pre-miRNA transcripts bypass the canonical Microprocessor (Drosha/DGCR8) cleavage step and are processed directly by Dicer [7, 10, 26].
Although the general A-box and B-box sequences are conserved, subtle variations occur among Pol III promoters of different classes and among evolutionary lineages of tRNA genes [27]. To date, although single A-box-like and B-box-like elements have been identified in the BLV genome, a comprehensive analysis of degenerate motif variants within the pre-miRNA cluster, and of their spatial organization in the context of a viral Pol III promoter, has not been undertaken. It remains unknown how BLV A-box and B-box sequences diverge from the cellular Pol III consensus and what functional consequences such divergence may have for viral miRNA transcription.
We aim to identify and characterize all the degenerate A-box and B-box type 2 promoter motifs of RNA polymerase III within a cluster of virus-encoded miRNAs by: (1) deriving universal consensus sequences for A-box and B-box motifs based on a broad comparative analysis of small ncRNA sequences from diverse organisms, (2) constructing position weight matrices (PWMs) for these motifs, (3) identifying and mapping homologous A-box-like and B-box-like motifs within the BLV miRNA encoding region and delineating their promoter architecture and (4) evaluating the conservation and variability of these motifs in an evolutionary context and assessing their potential impact on Pol III transcriptional initiation. These analyses will fill a critical gap in our understanding of BLV Pol III promoters and reveal how atypical or degenerate motif sequences may modulate viral miRNA expression and virus–host interactions. Taken together, all of these studies provide compelling evidence that the degenerate, triplicated Pol III promoters of BLV are not accidental products of evolution, but rather represent a sophisticated mechanism enabling the virus to both transform host cells and evade the immune response.
Materials and methods
Compilation of A-box and B-box promoter motifs in tRNAs, SINEs, and other ncRNAs across phylogenetically diverse organisms
To compile degenerate A-box and B-box motifs, a total of 32,990 sequences classified as tRNAs, SINEs or other small non-coding RNAs (ncRNAs) were retrieved from available databases, including the GtRNAdb, Dfam, TriTrypDB, Rfam, and NCBI Nucleotide. The subsequent analyses were conducted according to the computational pipeline summarized in Fig. 1.
The selected sequences originated from 70 phylogenetically diverse organisms, representing a broad spectrum of taxonomic groups such as viruses, archaea, protists, fungi, lower plants, seed plants, invertebrates, jawless vertebrates, fish, amphibians, reptiles, birds, and mammals. All collected sequences contained type 2 promoter elements recognized by RNA polymerase III, namely A-box and B-box motifs. The source databases (Dfam, GtRNAdb, GenBank and Rfam), source organisms, numbers of sequences, RNA types and detailed consensus sequences of the promoter motifs, including their lengths, are summarized in Supplementary Tables 1 and 7. The dataset included sequences derived from both unicellular organisms (e.g., archaea and protists) and multicellular taxa (e.g., invertebrates, vertebrates, and higher plants), as well as viral RNAs. The A-box and B-box promoter motifs were mapped based on established sequence patterns reported in the literature [15, 17, 28,29,30,31,32,33,34,35,36]. Annotation of the motifs was performed using the Geneious Prime software [37]. In the next step, consensus sequences for both motif types were determined independently for each organism. To derive each consensus, a minimum nucleotide identity threshold of 95% was applied across the aligned sequences. The resulting motifs were encoded using IUPAC nucleotide codes and are presented in Table 1. Among sequences from diverse taxa, three A-box motifs (designated A1, A2, and A3) were identified in some transcripts, whereas others contained fewer (Table 1). In total, 184 distinct A-box consensus sequences and 70 B-box consensus sequences were collected.
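The text does not specify exactly how the 95% identity threshold was converted into IUPAC codes. One plausible reading, sketched below purely as an assumption, is: for each alignment column, add bases in decreasing order of frequency until together they account for at least 95% of the sequences, then encode that set with the matching IUPAC code. The function `coverage_consensus` and the toy alignment are hypothetical.

```python
from collections import Counter

# Map each set of nucleotides to its IUPAC ambiguity code.
IUPAC_CODE = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

def coverage_consensus(seqs: list[str], identity: float = 0.95) -> str:
    """Assumed reading of a 95% identity consensus: per column, take the most frequent
    bases until they cover >= `identity` of the sequences, then IUPAC-encode the set."""
    consensus = []
    for column in zip(*(s.upper() for s in seqs)):
        counts = Counter(b for b in column if b in "ACGT")  # ignore gaps and ambiguity symbols
        total = sum(counts.values())
        if total == 0:
            consensus.append("N")  # column contains no A/C/G/T at all
            continue
        chosen, covered = set(), 0
        for base, n in counts.most_common():
            chosen.add(base)
            covered += n
            if covered >= identity * total:
                break
        consensus.append(IUPAC_CODE[frozenset(chosen)])
    return "".join(consensus)

aligned = ["TGGCGCAGTGG", "TAGCGCAGTGG", "TGGTGCAGTGG", "TGGCGCAGTGG"]  # toy data
print(coverage_consensus(aligned))
```

In the toy alignment, columns 2 and 4 each contain a minority base at 25%, so no single base reaches 95% there and the sketch emits R and Y, giving TRGYGCAGTGG.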
Construction of position weight matrices for A-box and B-box motifs based on taxonomically diverse organisms
Based on 184 variants of 11-mer A-box motifs and 70 variants of 11-mer B-box motifs, position weight matrices (PWMs) were constructed. For each position within a motif, the nucleotide frequency percentages were calculated and subsequently transformed into log-odds values relative to a uniform background model, according to the following formula:
$$W_{i,b}=\log_2\left(\frac{f_{i,b}}{p_b}\right)$$
where:
* f(i,b) denotes the frequency of nucleotide b at position i in the aligned motif sequences;
* p(b) denotes the background frequency of nucleotide b (assumed to be 0.25 for each nucleotide under a uniform distribution).
Each PWM was an L × 4 matrix, where L was the motif length and the columns corresponded to the nucleotides A, C, G, and T. Each cell W(i,b) contained a log-odds value reflecting the relative preference for nucleotide b at position i compared with its random occurrence. This representation highlights positions likely to be critical for motif recognition by transcription factors. The analyses were performed in Python 3.12 (Python Software Foundation) using pandas and NumPy [38].
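A minimal NumPy sketch of the log-odds computation defined above, assuming the aligned motifs are equal-length strings over A, C, G and T. The `pseudocount` parameter is an assumption that the text does not describe; with its default of 0, a nucleotide never observed at a position receives a log-odds of −∞ (printed as -inf), which is why motif scanners usually add a small pseudocount. The function name `build_pwm` is hypothetical.

```python
import numpy as np

BASES = "ACGT"  # column order of the PWM

def build_pwm(motifs: list[str], background: float = 0.25, pseudocount: float = 0.0) -> np.ndarray:
    """Return an L x 4 matrix W[i, b] = log2(f(i,b) / p_b) from equal-length aligned motifs."""
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("all motifs must have the same length")
    counts = np.zeros((length, 4))
    for motif in motifs:
        for i, base in enumerate(motif.upper()):
            counts[i, BASES.index(base)] += 1  # raises ValueError for non-ACGT symbols
    # Observed frequencies f(i,b); the optional pseudocount is an assumption, not from the text.
    freqs = (counts + pseudocount) / (counts.sum(axis=1, keepdims=True) + 4 * pseudocount)
    with np.errstate(divide="ignore"):  # log2(0) -> -inf when pseudocount is 0
        return np.log2(freqs / background)

toy = ["TGGC", "TAGC", "TGGT", "TGGC"]  # hypothetical 4-mer "motifs"
W = build_pwm(toy)
print(W.round(2))
```

A fully conserved column (position 1, all T) yields the maximum entry log2(1/0.25) = 2 bits, while a base seen in exactly 25% of sequences scores 0, that is, no enrichment over the uniform background.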
Identification of A-box-like and B-box-like motifs in BLV miRNA-encoding locus
To identify putative type 2 A-box and B-box promoter motifs, a total of 452 available sequences, each 554 nucleotides in length and corresponding to the BLV miRNA cluster, were retrieved from GenBank (Supplementary Table 2), formatted as FASTA, and aligned using the Geneious Alignment module in Geneious Prime. Previously constructed PWMs were loaded into FIMO (Find Individual Motif Occurrences), a tool within the MEME Suite, to scan all the aligned sequences for occurrences of the predefined motifs [39]. FIMO used a sliding window to calculate cumulative log-odds scores for each 11-nucleotide segment (11-mer) [40]. For each hit, FIMO reported the sequence position, motif match, score and p-value, enabling identification of the most consistent A-box and B-box motif instances.
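FIMO reads motifs as letter-probability matrices in MEME motif format rather than as log-odds tables. The sketch below shows one way the frequency matrices behind the PWMs could be written in MEME minimal motif format with the uniform background used here; the function name `to_meme`, the motif name and the toy matrix are hypothetical, and the authors' actual export route is not stated.

```python
import numpy as np

def to_meme(freqs: dict[str, np.ndarray], nsites: dict[str, int]) -> str:
    """Render frequency matrices (rows = positions; columns = A, C, G, T) as MEME minimal motif text."""
    lines = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25",  # uniform background, as in the text
        "",
    ]
    for name, mat in freqs.items():
        mat = np.asarray(mat, dtype=float)
        if mat.ndim != 2 or mat.shape[1] != 4 or not np.allclose(mat.sum(axis=1), 1.0):
            raise ValueError(f"{name}: expected an L x 4 matrix whose rows sum to 1")
        lines.append(f"MOTIF {name}")
        lines.append(f"letter-probability matrix: alength= 4 w= {mat.shape[0]} nsites= {nsites[name]} E= 0")
        lines.extend(" ".join(f"{p:.6f}" for p in row) for row in mat)
        lines.append("")
    return "\n".join(lines)

toy_freqs = {"toy_box": np.array([[0.0, 0.0, 0.0, 1.0], [0.25, 0.0, 0.75, 0.0]])}
print(to_meme(toy_freqs, {"toy_box": 4}))
```

The resulting file could then be passed to FIMO, for example `fimo --oc fimo_out --thresh 0.05 motifs.meme blv_mirna.fasta` (file names hypothetical; check option names against the installed MEME Suite version). By default FIMO also considers the reverse-complement strand; whether the original scan did so is not stated.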
The total score for an 11-mer s = s₁s₂…s₁₁ was computed as:
$$S(s)=\sum_{i=1}^{11}\log_2\left[\frac{P_{\mathrm{PWM}}(s_i \mid i)}{P_{\mathrm{bg}}(s_i)}\right]$$
where:
* s_i: nucleotide at position i;
* P_PWM(s_i | i): observed probability of s_i at position i in the motif model;
* P_bg(s_i) = 0.25: background probability under a uniform distribution.
Segments with S(s) > 0 were considered motif-like, reflecting better agreement with the PWM model than with the background. Those with statistically significant scores (P-value < 0.05) were then designated as candidate motif matches.
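The scoring rule and the significance filter can be sketched in plain Python with NumPy. As an illustration only (not FIMO's actual implementation), the p-value here is computed exactly under the uniform background by building the full score distribution position by position from log-odds values rounded to 1/1000 bit; FIMO uses a related dynamic-programming approach, but its binning, background model and strand handling differ. All function names are hypothetical, the matrix must be finite (for example, built with a pseudocount), and windows containing gaps or ambiguous bases are skipped.

```python
import numpy as np

BASES = "ACGT"

def discretize(W: np.ndarray, scale: int = 1000) -> np.ndarray:
    """Round log-odds values to integers (units of 1/scale bit) so score sums are exact."""
    if not np.all(np.isfinite(W)):
        raise ValueError("W must be finite; build it with a pseudocount")
    return np.rint(W * scale).astype(int)

def null_distribution(Wi: np.ndarray) -> dict[int, float]:
    """Exact distribution of S(s) for random windows under the uniform background (p = 0.25)."""
    dist = {0: 1.0}
    for row in Wi:
        nxt: dict[int, float] = {}
        for s, p in dist.items():
            for w in row.tolist():
                nxt[s + w] = nxt.get(s + w, 0.0) + p * 0.25
        dist = nxt
    return dist

def scan(seq: str, W: np.ndarray, alpha: float = 0.05, scale: int = 1000):
    """Slide an L-nt window along seq and keep windows with S(s) > 0 and p < alpha.
    Returns (1-based start, window, score in bits, p-value) tuples."""
    Wi = discretize(W, scale)
    dist = null_distribution(Wi)
    L = Wi.shape[0]
    hits = []
    for start in range(len(seq) - L + 1):
        window = seq[start:start + L].upper()
        if any(b not in BASES for b in window):
            continue  # skip gaps and ambiguity codes
        s_int = int(sum(Wi[i, BASES.index(b)] for i, b in enumerate(window)))
        p = sum(q for k, q in dist.items() if k >= s_int)  # P(S >= observed score)
        if s_int > 0 and p < alpha:
            hits.append((start + 1, window, s_int / scale, p))
    return hits

# Toy 3-position motif favouring "AGG" (hypothetical frequencies, not an A-box or B-box PWM).
freqs = np.array([[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.7, 0.1]])
W = np.log2(freqs / 0.25)
print(scan("TTAGGTT", W))
```

Only the perfect match AGG survives here: it is the single highest-scoring 3-mer, so its p-value is 1/64 ≈ 0.016, whereas windows matching two of three positions have p ≈ 0.16 and are discarded. Converting the positive-score criterion and the p-value filter into one loop mirrors the two-step rule described above.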
Analysis of sequence conservation and variability in defined BLV A-box-like and B-box-like promoters
To assess sequence variability within the identified A-box and B-box motifs, Shannon entropy was calculated at each position in the multiple sequence alignment. Entropy values were computed based on nucleotide frequencies using the formula:
$$H=-\sum_{i\in\{A,C,G,T\}} p_i\log_2 p_i$$
where p_i denotes the frequency of nucleotide i at a given position.
The resulting entropy profile provided a quantitative measure of positional variability within the identified promoter motifs. Conservation was estimated as the proportion of sequences sharing the most frequent nucleotide at each alignment position, according to the formula:
$$C_i=n_{max,i}/N$$
where C_i is the conservation at position i, n_max,i is the number of sequences with the most frequent nucleotide at that position, and N is the total number of sequences in the alignment. Highly conserved regions were defined as contiguous positions with C_i ≥ 0.9. Calculations were performed using the Biopython library [41].
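Both per-column statistics defined above can be computed in a few lines. The sketch below uses only the Python standard library rather than Biopython, and it treats every symbol in a column, including alignment gaps, as a distinct state; how gaps were handled in the original analysis is not stated, so that choice is an assumption. The function name `column_stats` is hypothetical.

```python
import math
from collections import Counter

def column_stats(alignment: list[str]) -> list[tuple[float, float]]:
    """Per alignment column: (Shannon entropy H in bits, conservation C = n_max / N).
    Every symbol, including '-' gaps, is treated as its own state (an assumption)."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    N = len(alignment)
    stats = []
    for column in zip(*alignment):
        counts = Counter(column)
        H = -sum((n / N) * math.log2(n / N) for n in counts.values())
        C = max(counts.values()) / N
        stats.append((abs(H), C))  # abs() turns -0.0 into 0.0 for invariant columns
    return stats

toy_alignment = ["ACGT", "ACGA", "ACTA", "ACTA"]  # hypothetical aligned fragments
for pos, (H, C) in enumerate(column_stats(toy_alignment), start=1):
    print(pos, round(H, 3), C)
```

An invariant column gives H = 0 and C = 1, a 50:50 column gives the two-state maximum H = 1 bit with C = 0.5, and a 3:1 column gives H ≈ 0.811 bits with C = 0.75, so the two measures move in opposite directions as variability rises.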
To provide a robust estimate of conservation and to account for potential sampling variation, bootstrapping was used to calculate 95% confidence intervals for the mean conservation score of each motif. This method was used because the analyzed motifs varied in both number and length. In our dataset, the complete BLV genome was represented by 8,714 base pairs; the TATA-box by a single 7-nucleotide motif; the CCAAT-box by a 6-nucleotide motif; and multiple A- and B-boxes, each typically 11 nucleotides in length, were included in the analysis. The use of bootstrapping allowed for a more reliable comparison of conservation despite these differences.
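A percentile bootstrap of the mean conservation score might look like the following. The text does not state which units were resampled; this sketch assumes that the per-position conservation values C_i of a motif are resampled with replacement. The function name, the 10,000 resamples, the fixed seed and the example values are illustrative choices, not the authors' settings.

```python
import numpy as np

def bootstrap_mean_ci(values, n_boot: int = 10_000, level: float = 0.95, seed: int = 0):
    """Percentile bootstrap CI for the mean of per-position conservation scores."""
    x = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row of idx is one resample (with replacement) of the original positions.
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot_means = x[idx].mean(axis=1)
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(boot_means, [tail, 1.0 - tail])
    return float(x.mean()), float(lo), float(hi)

# Hypothetical per-position conservation values for one 11-nt motif.
c_values = [1.0, 1.0, 0.98, 1.0, 0.95, 1.0, 1.0, 0.99, 1.0, 1.0, 0.97]
print(bootstrap_mean_ci(c_values))
```

Because the interval is derived from resampled means rather than from a normal approximation, it stays within [0, 1] for conservation scores and can be asymmetric for motifs of very different lengths, such as a 7-nt TATA-box versus the full genome.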
Results
A universal consensus and positional log-odds matrices for A-box and B-box motifs across diverse organisms
PWM analysis of 184 A-box motifs, each 11 nucleotides in length, revealed distinct conservation patterns at individual positions (Table 2).
Position 1 was predominantly thymine (T, 87.3%), consistent with a critical role in protein–DNA recognition. Position 2 showed a purine preference, with adenine present in 63.9% and guanine in 35.3% of sequences (IUPAC code: R). Position 3 was dominated by guanine (97.8%), whereas position 4 displayed a strong pyrimidine preference (Y, 67.7%). Position 5 showed a near-equal representation of adenine and thymine (W, 66.5%). Cytosine predominated at position 6 (62.7%), and position 7 was predominantly adenine (81.7%). Position 8 combined guanine and pyrimidines (G 65.1%; Y 20.1%), and position 9 favored pyrimidines (Y, 73.0%). Position 10 was balanced between guanine and thymine (K), and position 11 was highly conserved as guanine (75.7%). The highest log-odds values were observed for guanine at position 3 (1.96), thymine at position 1 (1.80), and adenine at position 7 (1.70), indicating the strongest enrichment relative to a uniform background model. From these frequency distributions and the corresponding log-odds scores, a unified IUPAC consensus was derived: TAGYWCAGYKG (Fig. 2A).
Fig. 2
(A) Sequence logo of the 11-position A-box core, generated from the position weight matrix derived from 184 sequences. (B) Sequence logo of the 11-position B-box core, generated from the position weight matrix derived from 70 sequences. In both panels, the height of each letter at a given position is proportional to its information content (bits) and reflects the relative frequency of A, C, G, and T. Highly conserved positions (e.g., positions 1, 3, 10, and 11 for A-box; positions 2, 4, 5, 7, 10, and 11 for B-box) appear with larger letters, while sites exhibiting tolerated variability show smaller, stacked nucleotide symbols. Continuous stacks of nucleotides indicate degenerate positions where multiple bases contribute to > 50% of the positional signal.
When a minimum frequency threshold of 15% was applied, the following IUPAC codes were obtained for positions 1–11: TRGHDSRRHNG (Supplementary Table 3). This threshold highlighted very strong conservation at positions 1, 3 and 11 while capturing nucleotide flexibility at the central positions (4–6 and 8–10).
PWM analysis of B-box motifs was performed on 70 aligned 11-nt sequences to assess motif conservation and variability. Positions 2 (G, 99%), 4 (T, 97.5%), 5 (C, 91.9%), 7 (A, 94.6%), 10 (C, 90.4%) and 11 (C, 84.0%) were highly conserved and defined the motif’s core (Table 3).
Positions 1 (G, 83.7%), 3 (T, 68.8%) and 6 (G, 60.8%) were moderately conserved, with the remaining bases appearing at frequencies between 16% and 29%. The greatest variability occurred at positions 8 (G, 40.1%) and 9 (T, 42.4%), suggesting that substitutions at these sites are tolerated by TFIIIC and may fine-tune its binding affinity. Log-odds scores calculated against a uniform background (p₀ = 0.25) confirmed strong enrichment of the dominant nucleotides, with the highest positive values for G at position 2 (1.99), T at position 4 (1.96), A at position 7 (1.92) and C at positions 5 (1.88) and 10 (1.85). From the strict consensus (> 50%), the sequence GGTTCGAKYCC was derived (Fig. 2B). The IUPAC consensus sequences including all bases with frequencies of at least 25%, 20%, 15% and 10% are shown in Supplementary Table 4. These thresholds highlighted strong conservation at core positions 2, 4, 5 and 10 while capturing nucleotide flexibility at the central positions 8–9.
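As a worked check, the B-box log-odds values quoted above follow directly from W(i,b) = log2(f(i,b)/p(b)) with p(b) = 0.25, computed without a pseudocount (which reproduces the reported figures):

$$W_{2,G}=\log_2\left(\frac{0.990}{0.25}\right)=\log_2(3.96)\approx 1.99, \qquad W_{4,T}=\log_2\left(\frac{0.975}{0.25}\right)=\log_2(3.90)\approx 1.96$$

Because a frequency cannot exceed 1, no single position can contribute more than log2(1/0.25) = 2 bits, so an 11-nt motif scored against this uniform background can reach at most 22 bits.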
Structure and localization of A-box-like motifs in BLV miRNA locus
A PWM-based scan of the 452 BLV miRNA cluster sequences for A-box motifs yielded a total of 7964 matches with positive log-odds scores and p-value < 0.05 (Supplementary Table 5). The number of matches per sequence ranged from 17 to 23 (mean: 20.54, SD: 1.38), and PWM scores ranged from 0.01 to 10.82 (mean: 3.93; median: 2.86). The most frequent A-box-like motifs, each found in all 452 sequences, were identified at positions 10–20 nt for TRGNRCVCTGG; 18–28 nt for TGGCTTARTRR; 23–33 nt for TARTRRRRTRG; 123–133 nt for TAGCGCAGAGA; 257–267 nt for TGGTYTAGTRG; 262–272 nt for TAGTRGAARVA; 323–333 nt for TGGTRCTGRGG; 329–339 nt for TGRGGATAARR; 465–473 nt for GGARRGTTG; 474–484 nt for TGGCYCAGAGG; and 491–501 nt for TRGCTCGRRCC. Figure 3 shows the positions of these elements together with the other identified A-box–like motifs. The highest PWM score (10.81) was recorded for TGGCYCAGAGG at 474–484 nt in all 452 sequences (p-value = 0.002), whereas the lowest score (0.014) corresponded to TGACRGGGGCG in three sequences at 281–291 nt (p-value = 0.032).
A characteristic arrangement of RNA Pol III type 2 promoters was observed, in which A-box-like elements preceded the Pol III terminator and B-box-like elements immediately followed it. A-box–like elements situated at positions compatible with TFIIIC and TFIIIB recruitment and transcription initiation were assigned to subclasses A1, A2, and A3 based on two stringent criteria: maintenance of their genomic interval from the transcription start site (TSS) and correct spatial orientation immediately upstream of the cognate B-box motif (Fig. 3). Sequence variants that satisfied only one or neither of these spatial constraints but retained high PWM-derived scores with p-value < 0.05 were grouped into the A*-box-like subclass (Table 4).
Table 4
Each row includes the motif name, coordinates, consensus IUPAC sequence, PWM score, p-value, total number of matches at that position, and distances from the TSS and the nearest B-box-like motif. Localization within or between specific BLV pre-miRNAs is noted. Canonical motifs (A1, A2, A3) were positioned within biologically relevant distances from both the TSS and B-box-like elements and were considered functionally plausible. Motifs labeled A* represent cryptic variants, potentially non-functional due to suboptimal spacing. The A** label refers to a 9-nucleotide conserved core fragment of the A-box-like motif. # Consensus threshold: 100% (bases identical across all sequences). B1–B2: intergenic region between BLV-miR-B1 and BLV-miR-B2. @ PWM score: e.g., a score of 8.323 corresponds to a likelihood ≈ 2^8.323 ≈ 319-fold higher than that of a random sequence.
A1-box–like elements were located at a mean distance of + 3.3 nt from the TSS (range: − 1 to + 7 nt), with most clustering between + 1 and + 6 nt. Their maximal PWM-derived scores ranged from 1.57 to 8.32 (mean: 4.21), the mean corresponding to an approximately 18-fold higher likelihood than a random sequence. A2-box–like elements occurred on average at + 11.8 nt from the TSS (range: + 9 to + 16 nt), most frequently at + 11 to + 12 nt, and exhibited maximal PWM scores of 6.60–10.81 (mean: 8.73), corresponding to an approximately 426-fold increase over random. A3-box–like elements showed a mean TSS offset of + 19.2 nt (range: + 16 to + 23 nt), peaking at + 17 to + 21 nt, with maximal PWM scores between 3.51 and 7.69 (mean: 5.40), approximately 42 times higher than a random sequence.
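Because PWM scores are sums of base-2 log-odds values, a score S corresponds to a likelihood ratio of 2^S relative to a random sequence under the uniform background. For the mean A1 and A3 scores quoted above:

$$2^{4.21}=2^{4}\cdot 2^{0.21}\approx 16\times 1.157\approx 18.5, \qquad 2^{5.40}=2^{5}\cdot 2^{0.40}\approx 32\times 1.320\approx 42.2$$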
Schematic analysis of distances between A-box and B-box elements revealed a highly consistent spatial arrangement within the BLV miRNA cluster (Fig. 3). In all miRNA hairpins, canonical A1–A4 and cryptic A-box motifs were positioned at characteristic intervals upstream of one or more B-box elements. In most cases, canonical A-boxes were located approximately + 34 to + 95 nt upstream of their respective B-box motifs, while cryptic A variants occurred both proximally and overlapping with B-boxes. This spatial organization was observed across all miRNA precursors (miR-B1 to miR-B5), as well as in the intergenic regions. A summary of these distances and motif arrangements is provided in Table 4. Ultimately, each BLV miR-B1 - miR-B5 precursor was found to harbor a minimum of three contiguous or overlapping A-box-like elements—most frequently the A1–A2–A3 triad—indicative of an extended RNA Pol III type 2 promoter architecture.
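The two spacing criteria used to separate canonical A1–A3 elements from cryptic A* variants can be expressed as a small decision rule. The sketch below is hypothetical: it reuses the observed ranges reported above (A1 at − 1 to + 7 nt, A2 at + 9 to + 16 nt and A3 at + 16 to + 23 nt from the TSS; canonical A-boxes + 34 to + 95 nt upstream of their B-box) as hard cutoffs, which the text does not state were the authors' exact thresholds, and it assumes that the PWM score and p-value filters have already been applied.

```python
# Hypothetical cutoffs taken from the observed ranges in the text; not the authors' exact rules.
TSS_WINDOWS = {"A1": (-1, 7), "A2": (9, 16), "A3": (16, 23)}  # motif offset from the TSS (nt)
B_BOX_GAP = (34, 95)  # distance from a canonical A-box to its cognate B-box (nt)

def classify_a_box(tss_offset: int, gap_to_b_box: int) -> str:
    """Return 'A1', 'A2' or 'A3' when both spacing constraints hold, otherwise 'A*'."""
    if not (B_BOX_GAP[0] <= gap_to_b_box <= B_BOX_GAP[1]):
        return "A*"  # spacing relative to the B-box is outside the canonical range
    for label, (lo, hi) in TSS_WINDOWS.items():  # dict order: A1, A2, A3
        if lo <= tss_offset <= hi:
            return label  # +16 lies in both the A2 and A3 ranges; the first match (A2) wins here
    return "A*"  # outside every TSS window

for offset, gap in [(3, 60), (12, 50), (20, 40), (12, 10), (40, 60)]:
    print(offset, gap, classify_a_box(offset, gap))
```

The overlap at + 16 nt shows why fixed windows are only an approximation: a real assignment would also need the motif's sequence and its relation to neighbouring hits to resolve such borderline cases.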
Structure and localization of B-box-like elements in BLV miRNA cluster
A total of 452 BLV miRNA region sequences were scanned for B-box motifs using a custom PWM, yielding 1859 matches with positive log-odds scores and p-value < 0.05. The number of matches per sequence ranged from 3 to 7 (mean: 4.11), and PWM scores ranged from 0.1 to 14.36 (mean: 8.46) (Supplementary Table 6). The average distance of all motif occurrences from the TSS was + 74.9 nt (range + 34 to + 105 nt), with most hits clustering between + 65 and + 85 nt. Notably, motifs with the highest PWM scores localized within the + 65 to + 80 nt “optimal window” from the TSS, a pattern consistent with enhanced recruitment of the multi-subunit TFIIIC complex [42]. A total of ten distinct B-box–like elements, designated B1–B3 and B*, were identified (Fig. 3). Of the 452 BLV miRNA region sequences analyzed, the B2-box variant RGTTCRARYAC was the most prevalent, occurring at 74 to 84 nt in every sequence (Table 5). The second most frequent variant, B2-box RGTTCGCGCYY, consistently occupied 182 to 192 nt, while the B1-box RGKTCRARTCT was detected at 375 to 385 nt across all transcripts. This B1-box yielded the highest PWM log-odds score, 14.36 (≈ 21 000-fold enrichment over background; p-value < 0.001), whereas the lowest score, 0.10 bits (≈ 1.07-fold over background; p-value = 0.008), mapped to the B3 motif YRRWTAARACC at 210–220 nt. In a subset of pre-miRNAs, two B-box–like elements occurred within 30 nucleotides of each other, suggesting potentially cooperative or redundant elements that may enhance Pol III initiation.
Determination of a universal BLV-type A-box motif with positional conservation across various frequency thresholds
Analysis of the BLV A1–A3-box PWM revealed strong conservation at the 5′-terminal positions of the motif: thymine dominated position 1 (82.8%), and guanine was prevalent at positions 2 (62.5%) and 3 (72.4%) (Table 6).
Position 4 exhibited balanced pyrimidine usage (C 29.7%; T 35.9%), position 5 showed purine ambiguity (A 28.1%; G 40.6%), and position 6 showed C/G ambiguity (C 25.0%; G 37.5%). The downstream core (positions 7–11) returned to a purine bias or A/T degeneracy, most notably at position 9 (A 31.3%; T 46.9%). The derived IUPAC consensus motif, TGGTGGAGTGG, captures the conserved architecture of canonical BLV Pol III A-box elements (Fig. 4A).
Fig. 4
(A) Sequence logo of the 11-position BLV A1–A3-box core, generated from the PWM of the aligned BLV A1–A3-box-like motifs. (B) Sequence logo of the 11-position BLV B-box core, generated from the PWM of the aligned BLV B-box-like motifs. In both panels, the height of each letter at a given position is proportional to its information content (bits) and reflects the relative frequency of A, C, G, and T. Prominent single-letter stacks indicate highly conserved nucleotides (e.g., positions 1–3 for the A1–A3 box; G at positions 1 and 2, T at positions 3 and 4, and C at position 5 for the B-box), while mixed or multi-letter stacks reflect tolerated sequence ambiguity or degeneracy, where multiple bases contribute substantially to the positional signal.
To delineate conserved versus variable positions systematically, we applied multiple frequency thresholds (≥ 25%, ≥ 20%, ≥ 15%, ≥ 10%). A higher cutoff (≥ 25%) highlights the strictly conserved core nucleotides likely to be essential for promoter function and ensures high specificity in motif detection, whereas progressively lower cutoffs capture rarer variants, revealing peripheral degeneracy and balancing sensitivity (identifying all potential motif occurrences) against specificity (minimizing spurious PWM matches). At the stringent ≥ 25% cutoff, only four positions were strictly monomorphic (1 = T, 3 = G, 10 = G, 11 = G), while each of the other seven contained at least two nucleotides (e.g., A/G, C/T, C/G or A/T; see Table 7).
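The multi-threshold procedure described above can be sketched as follows: at each cutoff, every base whose column frequency reaches the cutoff is kept, and the resulting set is encoded with its IUPAC code. This is a minimal sketch assuming equal-length aligned motifs; the function name and toy motifs are hypothetical. With cutoffs of 25% or lower, at least one base always qualifies, so the fallback to N for an empty set only matters at higher cutoffs.

```python
from collections import Counter

# Map each set of nucleotides to its IUPAC ambiguity code.
IUPAC_CODE = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

def threshold_consensus(motifs: list[str], threshold: float) -> str:
    """IUPAC consensus keeping every base whose column frequency is >= threshold."""
    n = len(motifs)
    out = []
    for column in zip(*(m.upper() for m in motifs)):
        counts = Counter(column)
        kept = frozenset(b for b in "ACGT" if counts[b] / n >= threshold)
        out.append(IUPAC_CODE.get(kept, "N"))  # no base reaching the cutoff -> N
    return "".join(out)

toy_motifs = ["TGGC", "TAGC", "TGGT", "TGGC", "TGAC"]  # hypothetical aligned motifs
for t in (0.25, 0.20, 0.15, 0.10):
    print(f">= {t:.0%}: {threshold_consensus(toy_motifs, t)}")
```

In the toy set, each of positions 2–4 carries a minority base at exactly 20%, so the consensus is TGGC at the ≥ 25% cutoff and relaxes to TRRY from ≥ 20% downwards, mirroring how lower cutoffs turn monomorphic positions into bi-allelic ones.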
Relaxing the threshold to ≥ 20% reduced monomorphism to three core positions (1, 3 and 10) and introduced tri-allelic variation at position 4 (A/C/T), while the other sites remained bi-allelic. At ≥ 15%, only position 1 was monomorphic, position 6 became fully degenerate (A/C/G/T), and tri-allelic variation appeared at positions 4, 5, 9 and 10, while most other sites were bi-allelic. Even at the most permissive ≥ 10% level, at which no site was strictly monomorphic, several positions (1 [Y], 2 [R], 3 [R] and 8 [R]) remained restricted to two-nucleotide IUPAC codes, indicating a conserved core with flexible peripheral nucleotides (Table 7).
In summary, the A-box motif of eukaryotic tRNA genes is classically defined by the consensus TRGYNNARNNG (R = A/G, Y = C/T, N = any nucleotide) [17, 18], with some variability depending on tRNA type and lineage [27]. Our analysis yielded TGGTGGAGTGG as the BLV A1–A3-box consensus. At a ≥ 25% frequency threshold, the consensus was TRGYRSRRWGG, while the overall consensus for all detected motifs (A1–A4, A*, A**) was TRRYDSRRWSG. These sequences closely resembled the canonical eukaryotic A-box motif but showed substantial central degeneracy, with the most highly conserved nucleotides at positions 1, 3, 10 and 11. The more variable positions likely permit sequence flexibility. The motifs identified here partially overlapped with the consensus sequences reported previously by Burke et al. (e.g., TGRNNNNNNGR, YRRNNNDRHDV) [7]; Supplementary Fig. 7 illustrates this overlap.
Determination of a universal BLV-type B-box motif with positional conservation across various frequency thresholds
Analysis of the BLV B-box PWM revealed strong conservation at the 5′ terminus: guanine dominated position 1 (65.5%) and position 2 (71.7%), while position 3 displayed a modest thymine bias (55.0%) (Table 8).
Position 5 was highly enriched for cytosine (66.7%), and position 11 was the most conserved, with cytosine at 85.0%. Intermediate positions showed defined ambiguity: position 6 reflected purine degeneracy (A 41.7%; G 36.7%), and position 9 captured pyrimidine variability (C 30.0%; T 40.0%). The resulting consensus motif, GGTTCAAGTCC, encapsulates the conserved core architecture of BLV Pol III type 2 B-box elements (Fig. 4B).
At a stringent ≥ 25% cutoff, only three positions were strictly monomorphic, namely 2 (G), 5 (C) and 11 (C), while all others were at least bi-allelic (e.g., 1 = A/G; 4 = C/T; 7 = A/C) (Table 9).
At ≥ 20%, core-site variability changed little: position 3 became tri-allelic (A/G/T), position 5 became C/T, and positions 1 and 7 remained bi-allelic (R and M, respectively). At the ≥ 15% cutoff, no position remained strictly monomorphic, and tri-allelic variation was present at positions 3 (A/G/T), 7 (A/C/G) and 9 (A/C/T). At the most permissive ≥ 10% threshold, several sites (1, 2, 4, 5, 8 and 11) remained bi-allelic, while tri-allelic sets expanded to positions 3, 6, 7, 9 and 10.
Analysis of conservation and variability of A-box-like and B-box-like sequences
To assess the conservation of the putative A-box and B-box promoter sequences, we analyzed the multiple sequence alignment of the miRNA coding cluster spanning positions 1 to 554 and divided it into conserved and variable regions. Conserved regions were defined as contiguous columns in which the most frequent nucleotide appeared in ≥ 90% of sequences. All other columns (conservation < 0.9) were classified as variable regions (Fig. 5).
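The segmentation rule described above (contiguous columns with conservation ≥ 0.9 form a conserved region; all other columns are variable) can be sketched as a single pass over per-column conservation values. The helper name is hypothetical, and as written a single conserved column flanked by variable ones also counts as a region; any minimum-length requirement would be an additional assumption.

```python
def conserved_regions(conservation: list[float], cutoff: float = 0.9) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where C_i >= cutoff; other columns are variable."""
    regions: list[tuple[int, int]] = []
    start = None
    for pos, c in enumerate(conservation, start=1):
        if c >= cutoff and start is None:
            start = pos  # a conserved run begins
        elif c < cutoff and start is not None:
            regions.append((start, pos - 1))  # a variable column closes the run
            start = None
    if start is not None:
        regions.append((start, len(conservation)))  # run extends to the last column
    return regions

toy_conservation = [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 0.8, 0.95]  # hypothetical C_i values
print(conserved_regions(toy_conservation))
```

Applied to a full alignment, n isolated variable columns split the locus into at most n + 1 conserved regions, which is consistent with the 13 singleton variable positions and 14 conserved regions reported below.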
The analysis identified 14 conserved regions that together cover over 80% of the miRNA locus (Table 10).
These segments range in length from 2 nt (positions 14–15) to 142 nt (173–314), and all pass the ≥ 0.90 mean-conservation threshold. The three longest conserved intervals, 173–314 (142 nt), 89–171 (83 nt) and 393–461 (69 nt), each harbor multiple A-box–like and B-box–like motifs. With the exception of the 68–81 nt window (mean Shannon entropy ≈ 0.105 bits), all conserved regions exhibit very low mean entropy (≤ 0.063 bits), consistent with strong purifying selection maintaining critical TFIIIC-binding nucleotides (Fig. 6A, B). Against this conserved backdrop, thirteen singleton positions were classified as variable, with entropy exceeding 1 bit at positions 13 and 88, indicating localized sequence flexibility. Of these 13 variable single-nucleotide sites, ten coincide with annotated Pol III control elements: positions 13 and 16 each fall within overlapping A1- and A2-box–like motifs, position 315 within an A1-box–like motif, positions 82 and 342 within B2- and B1-box–like motifs, respectively, and position 550 within a B1-box–like motif (Table 10). Cryptic A*- and B*-box–like elements also harbor variable sites at positions 67, 88, 172 and 498 (with positions 82 and 550 overlapping both canonical and cryptic boxes), while the remaining three variable sites (348, 392 and 462) lie outside any known motif.
[IMAGE OMITTED: SEE PDF]
Finally, the conservation of the A-box and B-box motifs (Pol III), as well as the TATA-box and CCAAT-box motifs (Pol II), was compared across 452 BLV strains. All four motifs were highly conserved, although to differing degrees. The mean conservation score was 0.994 for the A-box (SD = 0.027, 95% CI: 0.988–0.997), 0.984 for the B-box (SD = 0.039, 95% CI: 0.975–0.991) and 0.992 for the CCAAT-box (SD = 0.019, 95% CI: 0.977–1.000). In comparison, the TATA-box had a mean conservation score of 0.962 (SD = 0.091, 95% CI: 0.893–0.998). These results indicate that the A- and B-boxes associated with Pol III promoters, as well as the CCAAT-box, are more highly conserved than the TATA-box motif recognized by Pol II in the BLV LTR.
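The text does not state how the 95% confidence intervals were obtained. One possibility, assumed here purely for illustration, is a percentile bootstrap of the mean, which naturally yields asymmetric intervals such as the one reported for the TATA-box. The input (one conservation score per strain or motif occurrence), the helper name and the resampling settings are all hypothetical.

```python
import random
import statistics


def summarize_conservation(scores, n_boot=10_000, seed=0):
    """Mean, sample SD and a 95% percentile-bootstrap CI for the mean of `scores`.

    Hypothetical helper: the bootstrap is an assumed CI method, not the authors' stated one.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(scores)
    boot_means = sorted(statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_boot))
    lower = boot_means[n_boot * 25 // 1000]         # 2.5th percentile
    upper = boot_means[n_boot * 975 // 1000 - 1]    # 97.5th percentile
    return statistics.fmean(scores), statistics.stdev(scores), (lower, upper)


# Hypothetical per-strain scores for a single motif.
mean, sd, (lo, hi) = summarize_conservation([1.0, 0.98, 0.99, 1.0, 0.95])
print(f"mean={mean:.3f}, SD={sd:.3f}, 95% CI {lo:.3f}-{hi:.3f}")
```

Because scores are bounded above by 1.0, resampled means pile up near the upper bound, which is one reason a percentile interval can sit asymmetrically around the mean.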
Detailed scrutiny of variable alignment positions relative to canonical A-box–like and B-box–like motifs revealed distinct conservation profiles for each element. Within A-box–like motifs, positions 1, 2, 3 and 8 appeared to be under strong purifying selection and remained nearly invariant, whereas positions 4, 5, 6 and 9 tolerated substitutions and exhibited elevated sequence variability. In contrast, B-box–like motifs showed high conservation at positions 1, 2, 4, 5, 8 and 11, whereas positions 3, 6, 7, 9 and 10 displayed comparatively greater diversity. This pattern points to a conserved core of nucleotides likely required for TFIIIC binding. Mapping the variable sites onto the predefined A- and B-box–like motifs revealed the following. Position 13 corresponded to nucleotide 7 of A1 (TGRTRGNRCVC) and nucleotide 4 of A2 (TRGNRCVCTGG); both motifs preserved the invariant A-box core (positions 1–3) but tolerated downstream diversity. Position 16 fell at nucleotide 10 of A1 and nucleotide 7 of A2, again within a variable segment of the A-box. Position 67 aligned with nucleotide 2 of the A* variant (YDGBRCHRRGT), showing variability despite overall A-box conservation at its 5′ end. At position 82, the variable site overlapped nucleotide 6 of A* (TCRARYACVGC), nucleotide 4 of A3 (RARYACVGCYY) and nucleotide 9 of B2 (RGTTCRARYAC), intersecting both a flexible A-box region and a core B-box residue. Position 88 mapped to nucleotide 10 of A3, a less conserved locus, whereas position 171 coincided with nucleotide 3 of another A* variant (YDRCYGRTYVR), within the otherwise conserved A-box core. Further downstream, position 313 corresponded to nucleotide 7 of A1 (YRNGCGRGAGGC), within a variable interval, and position 340 fell at nucleotide 6 of B1 (ARRYDHRGYCC), outside the highly conserved B-box core.
Finally, position 496 affected nucleotide 8 of A* (TRGCTCGRRCC) and nucleotide 2 of B* (GRRCCVCAACC), while position 548 overlapped nucleotide 9 of an A* variant (GGGTTCWRRRC) and nucleotide 8 of the B1 motif (GGTTCWRRRCC), both of which lay in peripheral, variable positions of their respective elements.
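Each of these mappings reduces to an offset calculation: if a motif begins at alignment column s, a variable site at column p corresponds to motif nucleotide p − s + 1. In the Python sketch below, the start columns for A1 (7) and A2 (10) are not given in the text; they are back-calculated from the stated correspondences for positions 13 and 16, and the helper name is hypothetical.

```python
def motif_nucleotide(site, motif_start, motif_len=11):
    """1-based position of alignment column `site` inside a motif starting at `motif_start`.

    Returns None when the site falls outside the motif (hypothetical helper).
    """
    offset = site - motif_start + 1
    return offset if 1 <= offset <= motif_len else None


# Start columns inferred from the text (13 -> A1 nt 7 and A2 nt 4); not stated explicitly.
motif_starts = {"A1": 7, "A2": 10}
for site in (13, 16):
    hits = {name: motif_nucleotide(site, start) for name, start in motif_starts.items()}
    print(site, hits)
# 13 {'A1': 7, 'A2': 4}
# 16 {'A1': 10, 'A2': 7}
```

Reproducing both stated mappings for position 16 from starts inferred at position 13 is a quick internal consistency check on the motif coordinates.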
Discussion
Here, we analyzed the type 2 RNA Pol III promoter motifs in the BLV miRNA cluster and found a degenerate, suboptimal promoter architecture with three overlapping internal elements. BLV has an atypical Pol III promoter configuration, with three overlapping intragenic A-box–like elements located immediately downstream of the transcription start site (TSS; ~ +3 to +19 nt) and one (occasionally two) B-box–like motifs at ~ +65 to +85 nt relative to the TSS (the first B-box lies just beyond the Pol III termination signal). The A–B-box spacing in BLV (~ 34–95 nt) can be considerably larger than the ~ 30–40 nt distance typical of canonical type 2 promoters (e.g., in tRNA genes) [43]. Even A–B-box separations of up to ~ 365 bp are compatible with transcriptional activity [44]. Consistent with this tolerance, Burke et al. found that, despite the extended A–B distance and an intervening terminator, TFIIIC can still bind both the A-box and the B-box in BLV, enabling TFIIIB recruitment and accurate Pol III initiation [7, 9]. A similar phenomenon is seen in certain SINE-type retrotransposons, in which an additional B-box ~ 80 nt away from the A-box remains functional [42]. Although insertion of a terminator sequence between the A-box and B-box increases their linear spacing, the three-dimensional DNA conformation and the ability of TFIIIC to bridge these elements appear to permit continued Pol III transcription initiation [25]. Moreover, mutational analyses showed that an intact B-box is critical for BLV miRNA transcription: B-box mutations abolished miRNA expression, confirming the essential role of the Pol III promoter in driving the viral miRNA cluster.
Triple A-box promoter arrangements have also been documented in other systems. Orioli et al. reported that ~ 10% of human and mouse tRNA genes contain triplicated A-box motifs coupled with weak terminators, forming a module that extends transcripts beyond the usual termination point [43, 45]. Likewise, a triple A-box internal promoter exists in the tRNA–miRNA-encoded RNA (TMER) genes of murine gammaherpesvirus 68 (MuHV-4), each of which uses a multi–A-box configuration to drive Pol III transcription of two downstream miRNA hairpins [15, 46]. Disrupting any single A-box in the MuHV-4 TMER cluster eliminates transcription and miRNA production, demonstrating that multiple A-box copies are essential for efficient Pol III transcription of the precursor RNA [15, 46]. This strategy contrasts with classical viral Pol III promoters (e.g., adenovirus VA RNA) that use only a single A- and B-box; evidently, a multi–A-box arrangement is required in these cases to achieve full transcriptional output [11, 15, 47].
Our findings shed light on how this unconventional promoter architecture may influence BLV miRNA expression and pathogenesis. BLV encodes at least five pre-miRNAs that are abundantly transcribed by Pol III in infected B cells despite minimal viral protein expression [3, 8]. These miRNAs are thought to be major factors in maintaining latency and driving B-cell transformation [10, 48]. The presence of three A-box motifs likely boosts transcription initiation by providing multiple TFIIIC binding sites, thereby enhancing TFIIIB/Pol III recruitment and offering redundancy if one site is impaired [20]. This can result in abundant pre-miRNA transcription and, consequently, high levels of mature miRNAs that modulate host gene expression [10]. For example, Frie et al. showed that BLV-infected cows exhibit reduced expression of B-cell transcriptional regulators, such as BLIMP1 and BCL6, which correlates with high levels of BLV miRNAs [49]. Another example is BLV-miR-B4, which mimics the host miR-29, thereby suppressing tumor suppressor gene expression and promoting proliferation of infected clones [5, 50]. BLV has likely optimized its promoters to produce sufficient miRNA for manipulating cellular pathways while avoiding overly strong activity that could be detrimental to the virus. Variations in these promoter motifs could hypothetically underlie differences in viral replication or disease progression between BLV strains.
We identified additional cryptic A-box–like sequences (A* and A**) that overlap or lie adjacent to B-box motifs in the miRNA locus, suggesting an ultra-compact promoter module. This arrangement could increase TFIIIC binding-site density and stabilize TFIIIB/Pol III assembly. Such a promoter configuration could enable the virus to flexibly modulate miRNA expression under varying cellular conditions, ranging from basal latency to stress-induced activation. This observation underscores the need for experimental validation: TFIIIC binding assays and reporter gene experiments should be conducted to confirm that the internal cryptic A-box and external cryptic B-box motifs in the miRNA cluster are indeed functional elements. Notably, these cryptic motifs were found within the miRNA-coding region, overlapping predicted hairpin structures. In contrast, 105 A-box–like sequences detected elsewhere in the ~ 8.7 kb BLV genome did not overlap hairpin structures according to RNAfold analysis and may represent nonfunctional background matches (Supplementary Fig. 8A). Nonetheless, further analysis revealed that 35 of the scattered A-box-like motifs outside the miRNA cluster were accompanied by appropriately spaced B-box-like sequences and flanking termination signals, suggesting the presence of other potential Pol III genes (Supplementary Fig. 8B). Thus, our results provide a foundation for future research on the roles of these elements, and understanding both the canonical miRNA loci and the incidental loci, particularly in transformed cells, may help optimize BLV-based gene transfer systems.
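The genome-wide search for A-box-like motifs accompanied by spaced B-box-like motifs and termination signals can be approximated with a simple exact-consensus scan. The Python sketch below is a simplified stand-in, not the PWM/FIMO procedure used in the study: it converts IUPAC consensus strings (defaulting to the canonical TRGYNNARNNG and RGTTCRANTCC) into regular expressions, finds overlapping matches, and keeps A-box/B-box pairs whose spacing falls in a window when a T-run terminator occurs within a fixed distance downstream of the A-box. The spacing window, terminator length and search distance are hypothetical defaults, as are the helper names and the toy sequence.

```python
import re

# Regex character class for each IUPAC nucleotide code.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
    "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
    "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(consensus, seq):
    """0-based start of every (possibly overlapping) match of an IUPAC consensus."""
    body = "".join(IUPAC_CLASS[c] for c in consensus)
    # A zero-width lookahead lets consecutive matches overlap.
    return [m.start() for m in re.finditer(f"(?={body})", seq)]


def candidate_units(seq, a_box="TRGYNNARNNG", b_box="RGTTCRANTCC",
                    min_gap=20, max_gap=100, terminator="TTTT", term_window=150):
    """Pair A-box hits with B-box hits min_gap..max_gap nt downstream, requiring a
    T-run terminator within term_window nt after the A-box (all defaults hypothetical)."""
    seq = seq.upper()
    b_hits = find_motif(b_box, seq)
    units = []
    for a in find_motif(a_box, seq):
        a_end = a + len(a_box)
        term = seq.find(terminator, a_end, a_end + term_window)
        if term == -1:
            continue
        for b in b_hits:
            gap = b - a_end
            if min_gap <= gap <= max_gap:
                units.append({"a_box": a, "b_box": b, "gap": gap, "terminator": term})
    return units


# Hypothetical toy sequence: A-box, 30-nt spacer, B-box, then a TTTT terminator.
toy_seq = "C" * 5 + "TGGCTCAGTGG" + "C" * 30 + "GGTTCGAATCC" + "C" * 5 + "TTTT" + "C" * 5
print(candidate_units(toy_seq))
# [{'a_box': 5, 'b_box': 46, 'gap': 30, 'terminator': 62}]
```

The terminator test is deliberately position-agnostic relative to the B-box, since in BLV the first B-box lies beyond the Pol III termination signal rather than upstream of it.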
The canonical eukaryotic A-box consensus is TRGYNNARNNG, with lineage-specific tRNA variants noted [17, 18]. BLV’s A-box-like motifs generally conform to this pattern but display substantial central degeneracy [27]. In BLV sequences, positions 1, 3, and 10–11 of the A-box are highly conserved (likely crucial for TFIIIC binding), whereas other positions vary. This is consistent with Burke et al.’s earlier predictions for BLV A-box motifs, indicating that the BLV A-box tolerates considerable sequence flexibility as long as key terminal nucleotides are preserved [51]. Notably, BLV A-box sequences have even been exploited in synthetic systems: Burke et al. used a BLV-like A-box element (YRRHNNNNNNN) to construct a compact Pol III–driven shRNA expression cassette, highlighting the functional potential of this viral promoter sequence [52].
Consensus B-box sequences (e.g., RGTTCRANTCC or GGTTCGANNCC) are found across eukaryotes and resemble the human tRNA B-box motif GGTTCRANYCY [18, 53]. The core sequence GWTCRANNC is critical for TFIIIC binding (especially the terminal C), while minor variations at the W and R positions are tolerated among species [54, 55]. However, BLV B-box–like motifs show greater variability. Earlier analyses of the BLV genome proposed several B-box–like candidates (e.g., GGTTSGNG, GKWCAAGTC, GTTCNANNC), but none were present in all viral isolates [4]. In our dataset, the predominant BLV B-box consensus was GGTTCAAGTCC, which aligns closely with the classical motif but is degenerate at several positions. At lower stringency, this consensus broadened to RGWYCRMRHYC, reflecting substantial flexibility. Notably, positions 2, 5, and 11 remain nearly invariant across BLV sequences, whereas positions 3, 6, 7, and 9 tolerate mutation, implying that these latter sites are less critical for TFIIIC recognition.
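The relationship between the predominant BLV B-box and the broader consensus strings quoted above can be checked position by position. The Python sketch below (hypothetical helper names) tests whether GGTTCAAGTCC is compatible with the eukaryotic, human tRNA and low-stringency BLV consensus sequences, and lists the positions that the low-stringency consensus leaves fixed. These turn out to be positions 2, 5 and 11, the same positions described as nearly invariant.

```python
# Nucleotides allowed by each IUPAC code.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT",
    "V": "ACG", "N": "ACGT",
}


def matches(site, consensus):
    """True if every base of `site` is allowed by the IUPAC code at that position."""
    return len(site) == len(consensus) and all(
        base in IUPAC_BASES[code] for base, code in zip(site.upper(), consensus)
    )


def invariant_positions(consensus):
    """1-based positions where the consensus allows a single nucleotide."""
    return [i for i, code in enumerate(consensus, start=1) if len(IUPAC_BASES[code]) == 1]


blv_b_box = "GGTTCAAGTCC"
for consensus in ("RGTTCRANTCC", "GGTTCRANYCY", "RGWYCRMRHYC"):
    print(consensus, matches(blv_b_box, consensus))  # True for all three
print(invariant_positions("RGWYCRMRHYC"))  # [2, 5, 11]
```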
BLV appears to have evolved promoter sequences that recruit Pol III efficiently while fitting within the virus’s limited genome. Using Pol III for miRNA transcription also offers an advantage: Pol III–derived noncoding RNAs are mostly “invisible” to host immune sensors, facilitating latent infection [56]. Moreover, a moderate level of promoter activity may be optimal for BLV: an overly strong promoter could produce excess double-stranded RNA and trigger antiviral defenses, whereas a too-weak promoter would yield insufficient miRNA for host manipulation. The degenerate, suboptimal motifs in BLV likely represent a compromise that balances transcriptional efficiency with stealth, ensuring enough miRNA production to influence the host without activating innate defenses or other host immune responses [57].
An analogous strategy is seen in the BLV long terminal repeat (LTR) promoter controlled by Pol II. Each of the three 21-bp Tax-responsive elements in the BLV LTR contains a suboptimal cAMP response element (CRE) that dampens transcriptional activation [58]. Reverting these variant CREs (AGACGTCA, TGACGGCA, TGACCTCA) to the optimal consensus TGACGTCA dramatically increases viral transcription and proviral load, confirming that BLV naturally uses suboptimal CRE sequences to restrain its replication. This example illustrates how BLV appears to balance its promoter strengths to fine-tune virus production and persistence [58].
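Comparing each BLV CRE variant with the consensus makes the "suboptimal" label concrete: each of the three variants departs from TGACGTCA at exactly one position. A minimal Python check (the helper name `mismatches` is hypothetical):

```python
CONSENSUS_CRE = "TGACGTCA"
BLV_CRE_VARIANTS = ["AGACGTCA", "TGACGGCA", "TGACCTCA"]


def mismatches(site, reference):
    """1-based positions where two equal-length sequences differ."""
    if len(site) != len(reference):
        raise ValueError("sequences must have equal length")
    return [i for i, (a, b) in enumerate(zip(site, reference), start=1) if a != b]


for variant in BLV_CRE_VARIANTS:
    print(variant, mismatches(variant, CONSENSUS_CRE))
# AGACGTCA [1]
# TGACGGCA [6]
# TGACCTCA [5]
```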
Despite their positional degeneracy compared with universal eukaryotic promoters, the A-box and B-box motif sequences remained highly conserved among BLV strains, indicating critical functional roles, as previously suggested by Kincaid et al. [26]. TATA- and CCAAT-box sequences are found in most LTRs, as well as in most other viral and cellular genes, at relatively fixed positions upstream of the cap site. Their presence and relative location are indispensable for viral expression, and they are often strongly conserved even among otherwise highly divergent LTRs [59]. These elements cannot be deleted, inverted, or relocated without greatly reducing viral expression. In this study, the A-box, B-box, and CCAAT-box motifs were all found to be highly conserved across BLV strains. In comparison, the TATA-box, although also highly conserved, showed slightly greater sequence variability. These results suggest that the Pol III promoter elements (A-box and B-box), as well as the CCAAT-box, are under strong evolutionary constraint, likely reflecting their essential roles in viral transcription.
Our analyses of promoter sequence entropy confirmed that the majority of motif positions were highly conserved (low entropy), consistent with a critical role in TFIIIC recognition and intolerance to substitutions. In contrast, positions exhibiting high variability (high entropy) likely reflect neutral mutations or permissible variants tolerated by the virus. The presence of single-nucleotide polymorphisms (SNPs) in the A-box and B-box motifs may reflect long-term selective pressures on the viral genome, as the analyzed sequences were derived from ten genotypes sampled worldwide over four decades. The localization of these SNPs may also point to modulation of Pol III promoter function. We speculate that the high conservation of positions 1–3 and 8 in the A-box and of positions 2, 5, and 11 in the B-box may reflect fundamental interactions with transcription factors, whereas greater variability at other positions might allow subtle tuning of promoter strength depending on the cellular context. Burke et al. introduced point mutations into the A-box and B-box sequences within the BLV-miR-B1 and miR-B4 regions and analyzed pri- and mature miRNA expression by Northern blotting [7]. They demonstrated that mutations corresponding in location to the highly conserved positions 2 and 3 of the A3-box-like motif and positions 2 and 4 of the B2-box-like motif in miR-B1, as well as positions 2 and 4 of the B1-box-like motif in miR-B4, caused the greatest reduction in miRNA expression. In contrast, mutations at positions corresponding to more variable sites, such as positions 4, 5, and 9 in the A3-box-like motif of miR-B4, only slightly decreased B4 expression. These findings support the functional importance of specific positions within these motifs for the initiation of RNA Pol III transcription.
Nevertheless, further studies are needed to clarify the individual roles of the A1/A2/A3-box-like and B1/B2-box-like motifs, for example in promoting Pol III transcription initiation, influencing the Pol III transcription start site, and/or affecting pri-miRNA processing or pre-miRNA production. The observation that most of the variable positions fall within A- and B-box motifs suggests that some of these mutations may be linked to viral pathogenicity.
In our comparative analysis conducted in 2020 in the North Caucasus region, we found that particular variants in the putative BLV A- and B-box elements were enriched in cattle with high persistent lymphocytosis and high proviral load but were absent in mild cases [60]. These included substitutions at conserved positions, such as G211A at position 2 and T213A at position 4 of the B3-box in miR-B2; A311del at position 2 of the A1-box in miR-B4; and a two-nucleotide change, GA498/9AG, at positions 2 and 3 of the B*-box in miR-B5, among others. Notably, we observed two substitutions at variable positions, A421G at position 6 of the B2-box in miR-B4 and A467G at position 4 of the A1**-box in miR-B5, that were found exclusively in low-lymphocytosis isolates. The association of these promoter-region variants with increased B-cell proliferation in vivo suggests that even subtle changes in Pol III promoter sequence can have functional consequences. One hypothesis is that these mutations alter the strength of Pol III binding or the stability of the pre-miRNA transcripts, thereby modulating the amount of viral miRNA produced. A small increase in miRNA levels could enhance the suppression of host anti-tumor pathways, tipping the balance toward persistent lymphocytosis or leukemic progression. Conversely, certain polymorphisms found only in low-pathogenicity strains might slightly impair Pol III promoter efficiency, resulting in lower miRNA output and a more benign clinical course. Our analysis also points to practical applications and potential avenues for intervention. If certain motif variants enhance viral miRNA expression, they could serve as molecular markers for more pathogenic BLV strains. This would be particularly useful in veterinary surveillance to identify herds at higher risk of developing high proviral load, marked lymphocytosis and tumors in infected cattle.
Another factor likely shaping BLV miRNA promoter sequences is the overlap of sequence functions. A-box motifs are located within or adjacent to miRNA hairpin sequences, so the constraints imposed by promoter function must be reconciled with the structural requirements for stable pre-miRNA folding and with preservation of seed sequences. Our findings indicated that, in BLV promoters, the three A-box motifs partially overlapped with the 3′ arms of hairpin structures. Such overlap might reflect evolutionary integration of promoter elements within miRNA sequences [20]. Conserved miRNA seed regions (nucleotides 2–8, responsible for target recognition) remained intact despite promoter motif overlaps, underscoring strong selective pressure to preserve miRNA functionality. Surrounding nucleotides likely evolved to simultaneously serve as TFIIIC recognition sites [44]. In conclusion, the degenerate BLV promoter motifs were likely shaped by dual pressures to maintain miRNA function and to permit transcription by Pol III. These adaptations may form part of a viral strategy that allows regulated gene expression while circumventing host defense mechanisms [61, 62]. Our data suggest how BLV may have optimized its Pol III promoter by balancing transcriptional efficiency with sequence uniqueness.
It should be emphasized that the primarily bioinformatic and comparative nature of our analyses is a limitation for functional interpretation. The identification of degenerate A-box and B-box motifs relied on sequence-based predictions (PWMs, FIMO, comparative analyses), without direct experimental validation of which motifs are truly used by Pol III during infection. Techniques such as TFIIIC/RPC6 ChIP-seq or chromatin footprinting could pinpoint the true binding sites. Similarly, the functional impact of promoter SNPs remains to be tested; while we observed correlations with disease state, causality could be addressed by introducing specific mutations into BLV molecular clones and assessing their effects on miRNA expression and pathogenicity in animal models. Conversely, engineering a BLV mutant with a perfect consensus A-box and B-box could reveal whether optimized promoters enhance miRNA transcription and accelerate leukemogenesis in animal models. Additional experiments, such as replacing the viral Pol III promoter with a cellular tRNA promoter, could help reveal virus-specific adaptations. While deep sequencing of leukemic bovine B cells has confirmed consistently high BLV miRNA expression [10], it would be informative to compare asymptomatic carriers with leukemic animals or to stratify expression by viral genotype [63]. Longitudinal samples collected before and after the onset of lymphocytosis could reveal whether increasing miRNA levels and mutational patterns predict disease progression. Nonetheless, our results provided a solid foundation for formulating hypotheses and designing subsequent experiments. This study underscores the importance of integrating sequence analysis with functional context and shows that even a conserved transcription mechanism like Pol III can be creatively adapted by viruses, with major implications for infection biology and disease progression in bovids.
Conclusion
In summary, this study provides a detailed picture of the BLV type 2 Pol III promoters that drive viral miRNA expression and highlights their potential role as a molecular linchpin in maintaining latency and promoting oncogenesis. Our analyses indicate that BLV has evolved overlapping, ultra-compact, and degenerate A-box and B-box motifs that secure Pol III recruitment within its compact genome, a strategy that ensures abundant miRNA production without requiring canonical gene expression. These findings support the concept that motif degeneracy and redundancy may be a deliberate viral strategy for maintaining miRNA transcription under strong evolutionary constraints. Pol III promoters likely enable BLV to manipulate host cell fate and effectively evade immune surveillance. Further elucidation of the mechanisms governing these promoters opens new possibilities for intervention. Targeted disruption of these elements or their miRNAs could provide the basis for novel anti-BLV strategies to prevent infection and block leukemia progression in infected animals.
Data availability
The datasets used in the current study, comprising 32,990 sequences from 70 organisms retrieved from Dfam (https://dfam.org), GtRNAdb (http://gtrnadb.ucsc.edu), GenBank (https://www.ncbi.nlm.nih.gov), TriTrypDB (https://tritrypdb.org) and Rfam (https://rfam.org), are available in Supplementary Table 7. The 452 sequences of bovine leukemia virus isolates retrieved from GenBank (https://www.ncbi.nlm.nih.gov) are available in Supplementary Table 2.
Gillet N, Florins A, Boxus M, Burteau C, Nigro A, Vandermeers F, et al. Mechanisms of leukemogenesis induced by bovine leukemia virus: prospects for novel anti-retroviral therapies in human. Retrovirology. 2007;4(1):18. https://doi.org/10.1186/1742-4690-4-18.
Aida Y, Murakami H, Takahashi M, Takeshima S-N. Mechanisms of pathogenesis induced by bovine leukemia virus as a model for human T-cell leukemia virus. Front Microbiol. 2013;4:328. https://doi.org/10.3389/fmicb.2013.00328.
Safari R, Hamaidia M, de Brogniez A, Gillet N, Willems L. Cis-drivers and trans-drivers of bovine leukemia virus oncogenesis. Curr Opin Virol. 2017;26:15–9. https://doi.org/10.1016/j.coviro.2017.06.012.
Kincaid RP, Burke JM, Sullivan CS. RNA virus microRNA that mimics a B-cell oncomiR. Proc Natl Acad Sci U S A. 2012;109(8):3077–82. https://doi.org/10.1073/pnas.1116107109.
Cullen BR. MicroRNA expression by an oncogenic retrovirus. Proc Natl Acad Sci U S A. 2012;109(8):2695–6. https://doi.org/10.1073/pnas.1200328109. Epub 2012/02/07.
Zhuo Y, Gao G, Shi J, Zhou X, Wang X. miRNAs: biogenesis, origin and evolution, functions on virus-host interaction. Cell Physiol Biochem. 2013;32(3):499–510. https://doi.org/10.1159/000354455.
Burke JM, Bass CR, Kincaid RP, Sullivan CS. Identification of tri-phosphatase activity in the biogenesis of retroviral MicroRNAs and RNAP III-generated ShRNAs. Nucleic Acids Res. 2014;42(22):13949–62. https://doi.org/10.1093/nar/gku1247. Epub 2014/11/28.
Safari R, Jacques J-R, Brostaux Y, Willems L. Ablation of non-coding RNAs affects bovine leukemia virus B lymphocyte proliferation and abrogates oncogenesis. PLoS Pathog. 2020;16(5):e1008502. https://doi.org/10.1371/journal.ppat.1008502.
Van Driessche B, Rodari A, Delacourt N, Fauquenoy S, Vanhulle C, Burny A, et al. Characterization of new RNA polymerase III and RNA polymerase II transcriptional promoters in the bovine leukemia virus genome. Sci Rep. 2016;6(1):31125. https://doi.org/10.1038/srep31125.
Rosewick N, Momont M, Durkin K, Takeda H, Caiment F, Cleuter Y, et al. Deep sequencing reveals abundant noncanonical retroviral MicroRNAs in B-cell leukemia/lymphoma. Proc Natl Acad Sci U S A. 2013;110(6):2306–11. https://doi.org/10.1073/pnas.1213842110.
Schramm L, Hernandez N. Recruitment of RNA polymerase III to its target promoters. Genes Dev. 2002;16(20):2593–620. https://doi.org/10.1101/gad.1018902.
Shekhar AC, Wu W-J, Chen H-T. Mutational and biophysical analyses reveal a TFIIIC binding region in the TFIIF-related Rpc53 subunit of RNA polymerase III. J Biol Chem. 2023;299(7):104859. https://doi.org/10.1016/j.jbc.2023.104859.
Vassetzky NS, Kramerov DA. SINEBase: a database and tool for SINE analysis. Nucleic Acids Res. 2012;41(D1):D83–9. https://doi.org/10.1093/nar/gks1263.
Kor SD, Chowdhury N, Keot AK, Yogendra K, Chikkaputtaiah C, Sudhakar Reddy P. RNA pol III promoters—key players in precisely targeted plant genome editing. Front Genet. 2023;13:989199. https://doi.org/10.3389/fgene.2022.989199.
Diebel KW, Claypool DJ, van Dyk LF. A conserved RNA polymerase III promoter required for gammaherpesvirus TMER transcription and MicroRNA processing. Gene. 2014;544(1):8–18. https://doi.org/10.1016/j.gene.2014.04.026. Epub 2014/04/22.
Tatosyan KA, Stasenko DV, Koval AP, Gogolevskaya IK, Kramerov DA. TATA-like boxes in RNA polymerase III promoters: requirements for nucleotide sequences. Int J Mol Sci. 2020;21(10):3706. https://doi.org/10.3390/ijms21103706. Epub 2020/05/30.
Frendewey D, Barta I, Gillespie M, Potashkin J. Schizosaccharomyces U6 genes have a sequence within their introns that matches the B box consensus of tRNA internal promoters. Nucleic Acids Res. 1990;18(8):2025–32. https://doi.org/10.1093/nar/18.8.2025. Epub 1990/04/25.
Sharp S, DeFranco D, Dingermann T, Farrell P, Söll D. Internal control regions for transcription of eukaryotic tRNA genes. Proc Natl Acad Sci USA. 1981;78(11):6657–61.
Arimbasseri AG, Maraia RJ. A high density of cis-information terminates RNA polymerase III on a 2-rail track. RNA Biol. 2016;13(2):166–71. https://doi.org/10.1080/15476286.2015.1116677.
Sizer RE, Chahid N, Butterfield SP, Donze D, Bryant NJ, White RJ. TFIIIC-based chromatin insulators through eukaryotic evolution. Gene. 2022;835:146533. https://doi.org/10.1016/j.gene.2022.146533.
Chan PP, Lowe TM. GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. 2009;37(Database issue):D93–7. https://doi.org/10.1093/nar/gkn787. Epub 2008/11/06.
Kalvari I, Nawrocki EP, Ontiveros-Palacios N, Argasinska J, Lamkiewicz K, Marz M, et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 2021;49(D1):D192–200. https://doi.org/10.1093/nar/gkaa1047. Epub 2020/11/20.
Hubley R, Finn RD, Clements J, Eddy SR, Jones TA, Bao W, et al. The Dfam database of repetitive DNA families. Nucleic Acids Res. 2016;44(D1):D81–9. https://doi.org/10.1093/nar/gkv1272. Epub 2015/11/28.
Aslett M, Aurrecoechea C, Berriman M, Brestelli J, Brunk BP, Carrington M, et al. TriTrypDB: a functional genomic resource for the Trypanosomatidae. Nucleic Acids Res. 2010;38(Database issue):D457–62. https://doi.org/10.1093/nar/gkp851. Epub 2009/10/22.
Park J-L, Lee Y-S, Kunkeaw N, Kim S-Y, Kim I-H, Lee YS. Epigenetic regulation of noncoding RNA transcription by mammalian RNA polymerase III. Epigenomics. 2017;9(2):171–87. https://doi.org/10.2217/epi-2016-0108.
Kincaid RP, Burke JM, Sullivan CS. RNA virus MicroRNA that mimics a B-cell OncomiR. Proc Natl Acad Sci U S A. 2012;109(8):3077–82. https://doi.org/10.1073/pnas.1116107109. Epub 2012/02/07.
Savina EA, Shumilina TG, Porolo VA, Lebedev GS, Orlov YL, Anashkina AA, et al. Structural features of DNA in tRNA genes and their upstream sequences. Int J Mol Sci. 2024;25(21):11758. https://doi.org/10.3390/ijms252111758.
Nishida K, Kawasaki T, Fujie M, Usami S, Yamada T. Aminoacylation of tRNAs encoded by chlorella virus CVK2. Virology. 1999;263(1):220–9. https://doi.org/10.1006/viro.1999.9949. Epub 1999/11/02.
Conti A, Carnevali D, Bollati V, Fustinoni S, Pellegrini M, Dieci G. Identification of RNA polymerase III-transcribed Alu loci by computational screening of RNA-Seq data. Nucleic Acids Res. 2015;43(2):817–35. https://doi.org/10.1093/nar/gku1361. Epub 2015/01/01.
Zovoilis A, Cifuentes-Rojas C, Chu HP, Hernandez AJ, Lee JT. Destabilization of B2 RNA by EZH2 activates the stress response. Cell. 2016;167(7):1788–802.e13. https://doi.org/10.1016/j.cell.2016.11.041. Epub 2016/12/17.
Ma Y, Mathews MB. Structure, function, and evolution of adenovirus-associated RNA: a phylogenetic approach. J Virol. 1996;70(8):5083–99. https://doi.org/10.1128/jvi.70.8.5083-5099.1996. Epub 1996/08/01.
Howe JG, Shu M-D. Epstein-Barr virus small RNA (EBER) genes: unique transcription units that combine RNA polymerase II and III promoter elements. Cell. 1989;57(5):825–34. https://doi.org/10.1016/0092-8674(89)90797-6.
Marschalek R, Amon-Böhm E, Stoerker J, Klages S, Fleckenstein B, Dingermann T. CMER, an RNA encoded by human cytomegalovirus is most likely transcribed by RNA polymerase III. Nucleic Acids Res. 1989;17(2):631–43. https://doi.org/10.1093/nar/17.2.631. Epub 1989/01/25.
Oliveira HP, Dos Santos ER, Harrison RL, Ribeiro BM, Ardisson-Araújo DMP. Identification and analysis of putative tRNA genes in baculovirus genomes. Virus Res. 2022;322:198949. https://doi.org/10.1016/j.virusres.2022.198949. Epub 2022/10/02.
Palmer JR, Daniels CJ. In vivo definition of an archaeal promoter. J Bacteriol. 1995;177(7):1844–9. https://doi.org/10.1128/jb.177.7.1844-1849.1995. Epub 1995/04/01.
Padilla-Mejía NE, Florencio-Martínez LE, Figueroa-Angulo EE, Manning-Cela RG, Hernández-Rivas R, Myler PJ, et al. Gene organization and sequence analyses of transfer RNA genes in trypanosomatid parasites. BMC Genomics. 2009;10(1):232. https://doi.org/10.1186/1471-2164-10-232.
Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, et al. Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28(12):1647–9. https://doi.org/10.1093/bioinformatics/bts199. Epub 2012/05/01.
Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. Array programming with NumPy. Nature. 2020;585(7825):357–62. https://doi.org/10.1038/s41586-020-2649-2.
Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27(7):1017–8. https://doi.org/10.1093/bioinformatics/btr064.
Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37(Web Server issue):W202–8. https://doi.org/10.1093/nar/gkp335. Epub 2009/05/22.
Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, et al. Biopython: freely available python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422–3. https://doi.org/10.1093/bioinformatics/btp163. Epub 2009/03/24.
Koval AP, Veniaminova NA, Kramerov DA. Additional box B of RNA polymerase III promoter in SINE B1 can be functional. Gene. 2011;487(2):113–7. https://doi.org/10.1016/j.gene.2011.08.001.
Arimbasseri AG, Rijal K, Maraia RJ. Comparative overview of RNA polymerase II and III transcription cycles, with focus on RNA polymerase III termination and reinitiation. Transcription. 2014;5(1):e27369. https://doi.org/10.4161/trns.27369.
Paule MR, White RJ. Survey and summary: transcription by RNA polymerases I and III. Nucleic Acids Res. 2000;28(6):1283–98. https://doi.org/10.1093/nar/28.6.1283. Epub 2000/02/24.
Orioli A, Pascali C, Quartararo J, Diebel KW, Praz V, Romascano D, et al. Widespread occurrence of non-canonical transcription termination by human RNA polymerase III. Nucleic Acids Res. 2011;39(13):5499–512. https://doi.org/10.1093/nar/gkr074.
Diebel KW, Smith AL, van Dyk LF. Mature and functional viral miRNAs transcribed from novel RNA polymerase III promoters. RNA. 2010;16(1):170–85. https://doi.org/10.1261/rna.1873910. Epub 2009/12/02.
Knox AN, Mueller A, Medina EM, Clambey ET, van Dyk LF. Lytic infection with murine gammaherpesvirus 68 activates host and viral RNA polymerase III promoters and enhances noncoding RNA expression. J Virol. 2021;95(14):e0007921. https://doi.org/10.1128/jvi.00079-21. Epub 2021/04/30.
Kincaid RP, Sullivan CS. Virus-encoded microRNAs: an overview and a look to the future. PLoS Pathog. 2012;8(12):e1003018. https://doi.org/10.1371/journal.ppat.1003018.
Frie MC, Droscha CJ, Greenlick AE, Coussens PM. MicroRNAs encoded by bovine leukemia virus (BLV) are associated with reduced expression of B cell transcriptional regulators in dairy cattle naturally infected with BLV. Front Vet Sci. 2018;4:245. https://doi.org/10.3389/fvets.2017.00245.
Santanam U, Zanesi N, Efanov A, Costinean S, Palamarchuk A, Hagan JP, et al. Chronic lymphocytic leukemia modeled in mouse by targeted miR-29 expression. Proc Natl Acad Sci USA. 2010;107(27):12210–5. https://doi.org/10.1073/pnas.1007186107.
Burke JM, Kuny CV, Kincaid RP, Sullivan CS. Identification, validation, and characterization of noncanonical miRNAs. Methods. 2015;91:57–68. https://doi.org/10.1016/j.ymeth.2015.07.013.
Burke JM, Kincaid RP, Aloisio F, Welch N, Sullivan CS. Expression of short hairpin RNAs using the compact architecture of retroviral microRNA genes. Nucleic Acids Res. 2017;45(17):e154. https://doi.org/10.1093/nar/gkx653. Epub 2017/10/04.
Oler AJ, Alla RK, Roberts DN, Wong A, Hollenhorst PC, Chandler KJ, et al. Human RNA polymerase III transcriptomes and relationships to Pol II promoter chromatin and enhancer-binding factors. Nat Struct Mol Biol. 2010;17(5):620–8. https://doi.org/10.1038/nsmb.1801.
Orioli A, Pascali C, Pagano A, Teichmann M, Dieci G. RNA polymerase III transcription control elements: themes and variations. Gene. 2012;493(2):185–94. https://doi.org/10.1016/j.gene.2011.06.015.
Nagarajavel V, Iben JR, Howard BH, Maraia RJ, Clark DJ. Global ‘bootprinting’ reveals the elastic architecture of the yeast TFIIIB-TFIIIC transcription complex in vivo. Nucleic Acids Res. 2013;41(17):8135–43. https://doi.org/10.1093/nar/gkt611. Epub 2013/07/17.
Mishra R, Kumar A, Ingle H, Kumar H. The interplay between viral-derived miRNAs and host immunity during infection. Front Immunol. 2020;10:3079. https://doi.org/10.3389/fimmu.2019.03079.
Gillet NA, Hamaidia M, de Brogniez A, Gutiérrez G, Renotte N, Reichert M, et al. Bovine leukemia virus small noncoding RNAs are functional elements that regulate replication and contribute to oncogenesis in vivo. PLoS Pathog. 2016;12(4):e1005588. https://doi.org/10.1371/journal.ppat.1005588.
Merezak C, Pierreux C, Adam E, Lemaigre F, Rousseau GG, Calomme C, et al. Suboptimal enhancer sequences are required for efficient bovine leukemia virus propagation in vivo: implications for viral latency. J Virol. 2001;75(15):6977–88. https://doi.org/10.1128/jvi.75. Epub 2001/07/04.
Benachenhou F, Jern P, Oja M, Sperber G, Blikstad V, Somervuo P, et al. Evolutionary conservation of orthoretroviral long terminal repeats (LTRs) and ab initio detection of single LTRs in genomic data. PLoS ONE. 2009;4(4):e5179. https://doi.org/10.1371/journal.pone.0005179. Epub 2009/04/15.
Pluta A, Blazhko NV, Ngirande C, Joris T, Willems L, Kuźmak J. Analysis of nucleotide sequence of tax, miRNA and LTR of bovine leukemia virus in cattle with different levels of persistent lymphocytosis in Russia. Pathogens. 2021;10(2):246. https://doi.org/10.3390/pathogens10020246.
Treiber T, Treiber N, Meister G. Regulation of microRNA biogenesis and its crosstalk with other cellular pathways. Nat Rev Mol Cell Biol. 2019;20(1):5–20. https://doi.org/10.1038/s41580-018-0059-1.
Kincaid RP, Chen Y, Cox JE, Rethwilm A, Sullivan CS. Noncanonical microRNA (miRNA) biogenesis gives rise to retroviral mimics of lymphoproliferative and immunosuppressive host miRNAs. mBio. 2014;5(2):e00074. https://doi.org/10.1128/mBio.00074-14. Epub 2014/04/10.
Ochiai C, Miyauchi S, Kudo Y, Naruke Y, Yoneyama S, Tomita K, et al. Characterization of microRNA expression in B cells derived from Japanese black cattle naturally infected with bovine leukemia virus by deep sequencing. PLoS ONE. 2021;16(9):e0256588. https://doi.org/10.1371/journal.pone.0256588.
© 2025. This work is licensed under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.