ABSTRACT
Background
Clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) is a gene editing technology that can deliver highly precise genome editing. However, predicting both the on- and off-target effects of CRISPR/Cas9 remains difficult, even though such prediction is essential for ensuring the safety and efficiency of genetic modifications made using this technology.
Methods
In this study, we used the SITE‐Seq dataset, which comprises CRISPR targets, to classify sequences for both on‐ and off‐target effects. To evaluate sequence pairs, we built a feedforward neural network (FNN) with 10 fully connected layers and compared its performance with that of other state‐of‐the‐art models.
Results
We showed that our FNN model attained an accuracy rate of 0.95, greatly improving prediction reliability for both on‐ and off‐target effects compared with other methods.
Conclusion
This work contributes a valuable predictive modeling framework to the field of CRISPR research, addressing both on‐ and off‐target effects in a unified manner, which is an essential requirement for the safe and effective application of genomic editing technologies.
Abbreviations
- CIRCLE-Seq: circularization for in vitro reporting of cleavage effects by sequencing
- CNN: convolutional neural network
- CRISPOR: CRISPR optimal target finder
- CRISPR/Cas9: clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9
- crRNA: CRISPR RNA
- DNA: deoxyribonucleic acid
- FNN: feedforward neural network
- GUIDE-Seq: genome-wide unbiased identification of double-stranded breaks enabled by sequencing
- LSTM: long short-term memory
- PAM: protospacer adjacent motif
- RNA: ribonucleic acid
- sgRNA: single guide RNA
- SITE-Seq: selective enrichment and identification of tagged genomic DNA ends by sequencing
Introduction
CRISPR/Cas9 technology, which combines clustered regularly interspaced short palindromic repeats (CRISPR) with CRISPR-associated protein 9 (Cas9), is a groundbreaking achievement in genome editing [1]. Originally discovered as an adaptive immune system in bacteria and archaea [2], CRISPR/Cas9 has been developed into a highly versatile tool for precise genetic manipulation in various organisms, including humans. This system operates with remarkable simplicity and efficiency by introducing double-strand breaks at specific genomic locations guided by RNA molecules [3].
The CRISPR/Cas9 system comprises a guide RNA (gRNA) that directs the Cas9 protein to a specific DNA sequence. gRNA comprises two elements: CRISPR RNA (crRNA), which binds to the target DNA sequence, and transactivating crRNA (tracrRNA), which interacts with the Cas9 protein [3]. Upon locating the target sequence, Cas9 generates a double-strand break, typically three nucleotides upstream of the protospacer adjacent motif (PAM) site [4]. This precision makes CRISPR/Cas9 a powerful tool for genome editing. The overall mechanism of the CRISPR system is illustrated in Figure 1.
[IMAGE OMITTED. SEE PDF]
Despite its revolutionary potential, CRISPR/Cas9 technology faces significant challenges with respect to both off-target effects and on-target precision [5]. Off-target effects are unintended genetic alterations that can compromise the safety and effectiveness of genetic modifications; they are often caused by factors such as gRNA design, PAM sequence similarity, and Cas9 fidelity. On-target efficiency, in contrast, reflects successful modification at the intended site.
Usually, the system relies on a guide RNA (gRNA) designed to match a specific DNA sequence that guides the Cas9 enzyme to create a double-strand break at the intended genomic locus. However, even within on-target sites, unintended outcomes, such as imprecise repairs through non-homologous end joining, can occur, which can result in small insertions or deletions (indels) that are not part of the original editing plan [6]. Therefore, while on-target precision offers a powerful tool for gene modification, careful management of the repair processes following Cas9 cleavage is essential for achieving the desired outcomes.
Experimental methods for detecting off-target sites can be categorized into in vitro and in vivo approaches [7]. SITE-Seq provides a valuable resource for understanding CRISPR/Cas9 activity in vitro and was used to identify 3767 active off-targets within sgRNA-DNA sequence pairs for nine guide sequences [8]. Here, we examined in vitro data, specifically the SITE-Seq dataset, to train and evaluate our feedforward neural network (FNN) model. Our approach allowed us to investigate the influence of sequence pairs on Cas9 activity in a controlled environment, thereby excluding the complexities of the cellular environment.
Previous studies have investigated the use of advanced computational approaches, including deep learning algorithms, using SITE-Seq datasets. These approaches can be used to predict off-target sites with high accuracy. This capability has enabled the development of high-fidelity gRNAs and optimization of delivery methods to increase target specificity [6, 9, 10].
Accurate identification of on-target activity is crucial for ensuring reliable genetic modifications and is influenced by various factors, including gRNA design and the cellular environment [8]. Although numerous models exist for predicting off-target effects [11], the simultaneous prediction of both on- and off-target effects using the SITE-Seq dataset remains unexplored.
We developed predictive models using the SITE-Seq dataset to assess both on- and off-target effects. Our study is unique in applying specific sequence classification to the SITE-Seq dataset, which enables the use of the CRISPR target sequence as input to detect on- and off-target events. The overall study design is shown in Figure 2. By employing advanced deep learning techniques, we sought to increase the precision and safety of CRISPR/Cas9 applications.
[IMAGE OMITTED. SEE PDF]
Methodology
Data Description
This study aimed to predict both the off- and on-target effects of Cas9 using a benchmark dataset named SITE-Seq, which was used in a previous study [6]. This dataset is a valuable resource for experimentally validated CRISPR/Cas9 off-target sites. It comprises gRNA–DNA pairs from nine different guide sequences and contains 217,733 sample pairs. For our classification model, we focused on the on_seq and off_seq columns, where each sequence consists of 23 base pairs (bps). The on_seq sites were labeled as “1” (on-target), and the off_seq sites as “0” (off-target). The dataset is publicly available on GitHub ().
Experimental Configuration
Python 3.8 (Python Software Foundation, Wilmington, DE, USA) was used along with a number of necessary packages for data processing, modeling, and visualization, including: Pandas v1.5.3 (Pandas Development Team, PyData, Austin, TX, USA), NumPy v1.24.3 (NumPy Developers, Open Source Initiative, Palo Alto, CA, USA), TensorFlow v2.14.0 (Google Brain Team, Google LLC, Mountain View, CA, USA), Scikit-learn v1.3.2 (INRIA, Paris, France), imbalanced-learn v0.11.0 (Université Paris-Saclay, Paris, France), Matplotlib v3.8.1 (Matplotlib Development Team, NumFOCUS, Austin, TX, USA), Argparse v1.4.0 (Python Software Foundation, Wilmington, DE, USA). An Intel Xeon W-2123 server with CPU running at 3.60 GHz*8, NVIDIA Quadro P4000, 4-TB HDD and running Linux Ubuntu 20.04.6 LTS, 64-bit, GNOME version 3.36.8 handled all computations.
Data Preprocessing and Encoding
Various preprocessing steps were performed on the selected dataset, including data cleaning, encoding, and splitting, to ensure that the data were ready for training the FNN model [12]. First, data were loaded from a CSV file containing CRISPR sequences labeled as on- or off-target sites. The dataset was then shuffled to ensure randomness and eliminate potential ordering effects, thereby preventing the model from learning unintended patterns based on data order. For data cleaning, all sequences were converted to uppercase letters to improve consistency and reduce complexity because CRISPR sequences are case insensitive. Next, the sequences were split into triplets to capture contextual information. Each triplet was then encoded and transformed into a binary matrix representation via one-hot encoding [13]. This ensured that each nucleotide in the triplet sequence was represented as a binary vector that was suitable for feeding into the neural network (Figure 3). After encoding, the sequences were padded to the maximum sequence length to ensure uniformity across all the data points.
[IMAGE OMITTED. SEE PDF]
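The preprocessing steps described above (uppercasing, triplet splitting, one-hot encoding, and padding) can be sketched as follows. This is a minimal illustration and not the study's code: the text does not state whether triplets overlap, so overlapping triplets with a stride of 1 are assumed here, and the names `ALPHABET`, `TRIPLET_INDEX`, `to_triplets`, and `one_hot_triplets` are hypothetical. The alphabet includes N because the on-target sequences in Table 1 contain an ambiguous PAM base.

```python
from itertools import product

# Assumed alphabet: the four bases plus N (ambiguous PAM base, as in "NGG").
ALPHABET = "ACGTN"

# Hypothetical vocabulary: every possible triplet over the alphabet (5**3 = 125).
TRIPLET_INDEX = {"".join(t): i for i, t in enumerate(product(ALPHABET, repeat=3))}


def to_triplets(seq, stride=1):
    """Uppercase the sequence and cut it into 3-mers (overlapping when stride=1)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, stride)]


def one_hot_triplets(seq, max_len):
    """Return a max_len x 125 binary matrix; padding rows are all zeros."""
    triplets = to_triplets(seq)
    if len(triplets) > max_len:
        raise ValueError("sequence is longer than max_len triplets")
    matrix = [[0] * len(TRIPLET_INDEX) for _ in range(max_len)]
    for row, triplet in enumerate(triplets):
        matrix[row][TRIPLET_INDEX[triplet]] = 1
    return matrix


# A 23-nt sequence yields 21 overlapping triplets; two padding rows follow.
encoded = one_hot_triplets("ggggccactagggacaggatngg", max_len=23)
```

Under these assumptions every sequence maps to a matrix of the same shape, which is what a fixed-width dense input layer requires.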
Neural Network Model
The FNN proposed by Charlier et al. was used in our experiments [14]. The FNN architecture is less complex than a convolutional neural network (CNN) architecture and performed well in our simulations. The encoded sequences were used as the inputs. The FNN model comprises multiple fully connected dense layers. We experimented with models containing 10 fully connected dense layers in which the number of neurons decreased from 256 to 1. The first dense layer of 256 neurons received the encoded matrices and was followed by a dropout layer. The second through seventh dense layers had 128, 64, 32, 16, 8, and 4 neurons, respectively, each followed by a dropout layer. All dense layers used a uniform kernel initializer and ReLU activation, except for the last layer, which used sigmoid activation for the final on-target versus off-target prediction. The model was optimized with the SGD optimizer [15] at a specified learning rate and compiled with a binary cross-entropy loss function and accuracy as the performance metric. This architecture combines dropout layers and kernel regularization to enhance the robustness and performance of the model across various sequence pairs (Figure 4).
[IMAGE OMITTED. SEE PDF]
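To make the layer structure concrete, the following standard-library sketch runs a single forward pass through a stack of dense layers with ReLU hidden activations and a sigmoid output. It is an illustration under stated assumptions, not the TensorFlow model used in the study: only the layer widths named in the text are included (the widths of any remaining layers are not given), the uniform initialization range of ±0.05 and the input dimension are assumptions, and dropout is omitted because it is active only during training.

```python
import math
import random

# Widths named in the text (256 down to 4) plus the single sigmoid output unit.
# The text reports 10 dense layers in total; widths not listed there are unknown.
LAYER_WIDTHS = [256, 128, 64, 32, 16, 8, 4, 1]


def init_layers(input_dim, widths, rng, limit=0.05):
    """Create (weights, biases) per layer with uniform weights in [-limit, limit]."""
    layers = []
    fan_in = input_dim
    for width in widths:
        weights = [[rng.uniform(-limit, limit) for _ in range(fan_in)]
                   for _ in range(width)]
        layers.append((weights, [0.0] * width))
        fan_in = width
    return layers


def forward(layers, x):
    """ReLU on every hidden layer, sigmoid on the last; returns P(on-target)."""
    activation = list(x)
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activation)) + b
               for row, b in zip(weights, biases)]
        if index == len(layers) - 1:
            activation = [1.0 / (1.0 + math.exp(-z)) for z in pre]
        else:
            activation = [max(0.0, z) for z in pre]
    return activation[0]


rng = random.Random(0)
layers = init_layers(input_dim=20, widths=LAYER_WIDTHS, rng=rng)  # input_dim is illustrative
probability = forward(layers, [1.0] * 20)
```

The output is a probability in (0, 1); thresholding it at 0.5 would give the binary on-target or off-target call, although the threshold used in the study is not stated.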
Performance Evaluation
The performance of the model was assessed via established criteria including accuracy, specificity, sensitivity, and precision. In addition, the assessment involved evaluating the area under the receiver operating characteristic curve (AUROC) [16].
These metrics are computed from the following four counts:
“True positive” (TP) denotes a sample correctly classified as belonging to the positive class by the model.
“False positive” (FP) indicates a sample that is inaccurately classified as belonging to the positive class but is actually part of the negative class.
“True negative” (TN) represents a sample correctly identified as belonging to the negative class by the model.
“False negative” (FN) represents a sample that was erroneously categorized as belonging to the negative class despite being part of the positive class.
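In terms of these four counts, the standard definitions of accuracy, sensitivity, specificity, and precision can be written as a short helper. The function name and the example counts are illustrative and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts.

    accuracy    = (TP + TN) / (TP + TN + FP + FN)
    sensitivity = TP / (TP + FN)   (also called recall)
    specificity = TN / (TN + FP)
    precision   = TP / (TP + FP)
    """
    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Illustrative counts only.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
```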
Results
Dataset Preparation and Labeling
The original dataset consisted of 217,733 samples, each with six characteristics. We focused on the sequences represented in the on_seq and off_seq columns. The 217,733 sequences were evenly split between on- and off-target sequences, with each sequence having an average length of 23 nucleotides. To prepare the data for binary classification, labels were assigned using the scikit-learn LabelEncoder tool [17], with on-target sequences labeled as “1” and off-target sequences labeled as “0”. This binary labeling enabled the data to be used in a classification task. An example of the input-data structure is presented in Table 1. Hand labeling increased the number of true positives predicted by the model and decreased the number of false-positive results in the trials.
TABLE 1 Representation of the input SITE-Seq dataset before encoding.
| S.No | Direction | Location | Sequence | Class | Labeling |
| 0 | − | chr1:13386 | GGAGGCTCTAGGGAAAGGAAAAG | Off target | 0 |
| | | | GGGGCCACTAGGGACAGGATNGG | On target | 1 |
| 1 | + | chr1:158621 | TGGGAAACTAGGGACAGTACTTG | Off target | 0 |
| | | | GGGGCCACTAGGGACAGGATNGG | On target | 1 |
| 2 | − | chr1:427716 | GGGGGCACAGGAGACAGGCCTGG | Off target | 0 |
| | | | GGGGCCACTAGGGACAGGATNGG | On target | 1 |
| 3 | + | chr1:556867 | GGGGGCACAGGAGACAGGCCTGG | Off target | 0 |
| | | | GGGGCCACTAGGGACAGGATNGG | On target | 1 |
| 4 | + | chr1:812137 | GGGGGCTTCATGGACAGGAGTGG | Off target | 0 |
| | | | GGGGCCACTAGGGACAGGATNGG | On target | 1 |
| 5 | + | chr1:762586 | GAGGCCGCGCGCGATAGGATGGG | Off target | 0 |
| | | | GGGGCCACTAGGGACAGGATNGG | On target | 1 |
Dataset Encoding
We employed a one-hot encoding method following sequence labeling. After one-hot encoding of the on-target (on-seq) and off-target (off-seq) columns for each sequence in the dataset, sequence padding was applied. This transformation produces a binary matrix in which each row denotes the position of the nucleotides in the sequence, and each column represents a single nucleotide. The encoded sequences maintain the nucleotide position information for accurate model training. One-hot encoded data were generated so that they could be fed directly into the neural network models, giving them a consistent and well-organized input representation.
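The matrix layout described here, with one row per sequence position and one column per nucleotide, can be illustrated at single-nucleotide resolution as follows. This is a sketch only: the column order and the inclusion of N as a fifth column are assumptions made so that the on-target sequences in Table 1 can be encoded, and `one_hot_matrix` is a hypothetical helper name.

```python
NUCLEOTIDES = "ACGTN"  # assumed column order; N covers the ambiguous PAM base


def one_hot_matrix(seq, length=23):
    """Encode a sequence as a length x 5 binary matrix, zero-padded at the end."""
    seq = seq.upper()
    if len(seq) > length:
        raise ValueError("sequence is longer than the target length")
    rows = []
    for base in seq:
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected character: {base!r}")
        rows.append([1 if base == n else 0 for n in NUCLEOTIDES])
    # Padding rows keep every matrix the same height.
    rows.extend([0] * len(NUCLEOTIDES) for _ in range(length - len(seq)))
    return rows


# On-target sequence from Table 1; row 20 is the N of the PAM.
on_target = one_hot_matrix("GGGGCCACTAGGGACAGGATNGG")
```

Because each row is tied to a fixed position, the encoding preserves the positional information that the model relies on.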
Experimental Procedure and Train‒Test Split
The dataset was divided into training, validation, and testing sets in a 60:20:20 ratio using the train_test_split method [17]. This division allowed effective training, hyperparameter tuning, and evaluation on unseen data. To address class imbalance within the training set, we used the synthetic minority oversampling technique (SMOTE) [18]. SMOTE generated synthetic samples for the minority class, ensuring equal representation of both classes during training. This approach enhanced the model's ability to generalize across varying sequence pairs while mitigating the effects of data imbalance.
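The split arithmetic and the idea behind SMOTE can be sketched with the standard library alone. The study itself used scikit-learn's train_test_split and a SMOTE implementation; the functions below are simplified stand-ins with hypothetical names, and the seed, the neighbour count `k`, and the toy data are assumptions. As in the text, oversampling would be applied to the training portion only.

```python
import math
import random


def split_60_20_20(samples, seed=42):
    """Shuffle, then cut into train/validation/test parts in a 60:20:20 ratio."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    train_end = len(shuffled) * 6 // 10
    val_end = len(shuffled) * 8 // 10
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]


def smote(minority, n_new, k=5, seed=42):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        others = [p for i, p in enumerate(minority) if i != base_index]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


train, val, test = split_60_20_20(range(100))
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy feature vectors
new_points = smote(toy_minority, n_new=6, k=2)
```

Each synthetic point lies on the line segment between two existing minority samples, so oversampling adds variety without leaving the region already occupied by that class.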
Modeling and Evaluation
We developed a fully connected 10-layer FNN to classify CRISPR sequences as on- or off-target. The model was trained on the prepared data and obtained 95.49% accuracy on the testing data. This high degree of accuracy indicates that the model can successfully discriminate between on-target and off-target sequences. The model was trained on a benchmark dataset and performed consistently across various sequence pairings. Table 2 presents the classification results for this model.
TABLE 2 Performance metrics of the FNN model.
| Model | Class | Precision | Recall | F1-score | Support |
| FNN model with 10 fully connected layers | On-target | 0.94 | 0.96 | 0.95 | 42659 |
| | Off-target | 0.96 | 0.95 | 0.96 | 44535 |
| | Accuracy | | | 0.95 | 87194 |
| | Macro avg | 0.95 | 0.96 | 0.95 | 87194 |
| | Weighted avg | 0.95 | 0.95 | 0.95 | 87194 |
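The macro and weighted averages in Table 2 follow from the per-class rows: the macro average is the unweighted mean over classes, whereas the weighted average weights each class by its support. The sketch below recomputes them from the rounded table values, so the results are approximate and may differ from the published figures in the last digit; the helper names are illustrative.

```python
# Per-class values copied from Table 2 (already rounded to two decimals).
ROWS = {
    "on-target": {"precision": 0.94, "recall": 0.96, "f1": 0.95, "support": 42659},
    "off-target": {"precision": 0.96, "recall": 0.95, "f1": 0.96, "support": 44535},
}


def macro_avg(rows, key):
    """Unweighted mean of a metric over classes."""
    return sum(r[key] for r in rows.values()) / len(rows)


def weighted_avg(rows, key):
    """Mean of a metric over classes, weighted by class support."""
    total = sum(r["support"] for r in rows.values())
    return sum(r[key] * r["support"] for r in rows.values()) / total


total_support = sum(r["support"] for r in ROWS.values())
macro_precision = macro_avg(ROWS, "precision")
weighted_recall = weighted_avg(ROWS, "recall")
```

The support-weighted recall is, by definition, the fraction of all test samples classified correctly, so it should agree with the overall accuracy up to rounding.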
The hyperparameters for training were selected through cross-validation on the training set. Specifically, the model used a learning rate of 0.0015, a batch size of 32, and between 50 and 100 epochs, with early stopping applied to prevent overfitting. Early stopping was triggered by a plateau in validation loss and typically halted training at approximately 50 epochs. Regularization techniques, namely L2 regularization (0.0025) [19] and a dropout rate of 0.75 [20], were also applied to prevent overfitting; their effect is illustrated by the close alignment between the training and validation curves in Figure 5a. Figure 5b,c shows the confusion matrix and ROC curve of the FNN model, respectively.
[IMAGE OMITTED. SEE PDF]
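The early-stopping rule described above, which halts training once the validation loss stops improving, can be expressed framework-independently as follows. The hyperparameter values in `TRAINING_CONFIG` are those reported in the text; the patience value, the `min_delta` tolerance, and the loss sequence are assumptions introduced for illustration, because the text does not state them.

```python
# Values reported in the text; key names are illustrative.
TRAINING_CONFIG = {
    "learning_rate": 0.0015,
    "batch_size": 32,
    "max_epochs": 100,
    "l2": 0.0025,
    "dropout_rate": 0.75,
}


def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops after `patience` consecutive epochs without the validation
    loss improving on the best value seen so far by more than `min_delta`.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)


# Made-up validation losses that plateau after the third epoch.
stop_epoch = early_stopping_epoch([0.9, 0.7, 0.6, 0.6, 0.61, 0.6, 0.62], patience=3)
```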
The overall model performance was evaluated on a held-out test set drawn from the same dataset to measure its classification accuracy. Table 2 summarizes the precision, recall, and F1-scores for the on-target and off-target classes of this test set, on which the model achieved a final accuracy of 95.49%. The training and validation accuracy and loss curves indicated an overall accuracy of approximately 95% (Figure 5a). The confusion matrix (Figure 5b) summarizes the performance of the model on the test data. Additionally, ROC curves were plotted for each label, giving AUC values of 0.92 for on-target sequences and 0.85 for off-target sequences (Figure 5c). Future work will focus on validating the model using independent datasets to assess its generalizability and further refine hyperparameter selection. Such validation would help establish the model's robustness beyond the test data used here.
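For reference, the AUROC can be computed directly from labels and predicted scores as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses this pairwise definition; it is quadratic in the number of samples and is intended only to illustrate the quantity, not to reproduce the plotting routine used for Figure 5c. The labels and scores shown are made up.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Illustrative data: one positive (0.4) is ranked below one negative (0.5).
example_auc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.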
Discussion
In the field of genomics, CRISPR techniques are advancing with the help of computational resources [21]. Predicting the off- and on-target activities of CRISPR technology is crucial for precise genome editing and for minimizing unintended alterations to the target organism [22]. Estimating on-target activity is likewise essential for accurate gene editing. Several studies have applied machine learning and deep learning techniques to increase the efficiency of CRISPR systems, as reviewed previously [22]. Dual-action models predict CRISPR activity by combining CRISPR target sequences and gRNA inputs with features such as GC content, mismatch patterns, and thermodynamic properties. These models are often built as separate frameworks for on-target, off-target, and gRNA efficiency predictions, offering detailed insights into CRISPR behavior [23–25]. However, integrating all these elements into a single unified model for dual prediction (on-target and off-target effects) remains challenging because of the increased complexity. This study aimed to bridge this gap by developing a cohesive model that simultaneously addresses both on- and off-target predictions to offer a more streamlined and comprehensive approach to CRISPR activity analysis. Our study employed an FNN model approach to provide a comprehensive prediction framework that addresses the shortcomings of existing models. A comparison of the proposed model with established models is presented in Table 3.
TABLE 3 Comparison of existing models for predicting on- and off-target measures.
| Reference | Prediction | Dataset | Encoding | Model | Metrics | Accuracy |
| Lin et al. [26] | Off-target | GUIDE-Seq, DoenchV2, CRISPOR, CIRCLE-Seq, and SITE-Seq | One-hot | CRISPR-Net (LRCN) | AUROC: 0.995 | — |
| Niu et al. [9] | Off-target | GUIDE-Seq, CRISPOR, CIRCLE-Seq, and SITE-Seq | One-hot | R-CRISPR (bidirectional LSTM) | AUROC: 0.991 | — |
| Wen et al. [6] | Off-target | CIRCLE-Seq and SITE-Seq | One-hot | CRISPR-IP (CNN, BLSTM) | AUROC: 0.982, 0.990 | — |
| Xiang et al. [27] | Off-target and on-target | 12,000 gRNA oligos targeting human protein-coding genes (CRISPRon database) | One-hot + GC content + position-specific and position-independent features + thermodynamic features | CRISPRon and CRISPRoff: gradient boosting, regression trees, CNN | Spearman's R = 0.80 | — |
| Our model | On-target and off-target | SITE-Seq | One-hot | FNN (10 fully connected dense layers) | AUROC: 0.920 | 0.95 |
Several notable studies have used the SITE-Seq dataset for predictive modeling, primarily focusing on off-target effects. For example, CRISPR-IP [28] combines CNN and bidirectional long short-term memory (BLSTM) architectures trained on CIRCLE-Seq and SITE-Seq data and achieves an AUROC of 0.982 and an area under the precision-recall curve (AUPRC) of 0.751. This method enhances prediction accuracy by integrating convolutional layers for feature extraction and BLSTM layers to capture sequence dependencies.
Similarly, graph convolutional network (GCN)-CRISPR using CRISPOR data achieved an AUROC of 0.987 [10]. This approach models sgRNA‒DNA interactions via the structural properties of DNA sequences, providing robust predictions of off-target effects. The modular optimal frequency Fourier (MOFF) correlator [29] employs dual CNN regression models trained on the GUIDE-seq, CHANGE-seq, and TTISS datasets and has a Spearman correlation coefficient of 0.5. This method leverages multiple datasets for comprehensive model training. Additionally, CRISPR-OFFT [28] was developed via a 1D-CNN with attention mechanisms using Digenome-seq, GUIDE-seq, and BLESS data, achieving an AUROC of 0.97, and an AUPRC of 0.79.
Numerous studies with a particular focus on deep learning methods have employed a variety of models, such as 2D-CNN [30], transformer CNN [28], LSTM + CNN, 1D-CNN, gradient boosting, and regression trees [27, 28], to predict both on- and off-target effects in CRISPR systems. However, these studies primarily focus on off-target effects and use diverse datasets without integrating SITE-Seq for comprehensive modeling. In contrast, our work introduces a dual-prediction model that leverages the same deep learning framework on SITE-Seq data, enabling simultaneous prediction of both on- and off-target effects.
Although these models are effective in automating feature identification and prediction, they have limitations. For example, Xiang et al. highlighted the limitations of using lentiviral surrogate vectors, which primarily detect small indels but fail to capture larger deletions or chromosomal rearrangements, potentially affecting the accuracy of gene-editing predictions [27]. Such variability in on- and off-target datasets poses challenges because these datasets may not fully capture the diversity of CRISPR–Cas9 editing outcomes.
Our study also encountered challenges when applying the SITE-Seq dataset to an FNN architecture with fully connected dense layers. Various neural network architectures, including CNNs and recurrent neural networks, were tested to address overfitting and false predictions [14]. However, the FNN with 10 layers consistently demonstrated higher accuracy in predicting both off- and on-target effects. This result emphasizes the importance of achieving high-accuracy outcomes in CRISPR classification tasks, particularly for complex biological data that require robust and scalable models.
Our FNN model predicts CRISPR/Cas9 outcomes using the SITE-Seq dataset. Future studies will expand on the datasets examined; for example, GUIDE-seq and CIRCLE-seq will be used for their applicability to variations such as Cas12 and Cas13. Building dual-action models with separate inputs for gRNA and target sequences may result in a better approach for using additional predictive variables, such as GC percent, mismatch, and thermodynamics, which can provide nuanced details of the prediction; however, this was beyond the scope of our investigation. Finally, establishing an easy-to-use interface or application will improve the process of gRNA selection and reduce undesired targeting, enhancing the gene editing technique.
Conclusions
We developed a model trained exclusively on the SITE-Seq dataset that advances existing methodologies for predicting both on- and off-target CRISPR effects. By leveraging the PAM motif as a key determinant, the model adheres to a rigorous framework to predict CRISPR outcomes. While the model demonstrates strong performance within the confines of the dataset, its accuracy may diminish when applied to broader contexts. This proof-of-concept establishes a foundational framework for exploring sequence-guided CRISPR activity, highlighting the critical role of PAM morphology in Cas9 functionality. Compared with existing models, it exhibits superior predictive accuracy for target sequences without relying on additional data inputs. Future research will focus on integrating biological and genomic factors to develop more comprehensive models that can optimize predictions through the use of expanded datasets and methodologies. Specifically, efforts will be directed toward refining sgRNA design to minimize off-target effects and improve the overall reliability of genome editing technologies for practical applications.
Author Contributions
Pavithra Nagendran: conceptualization (equal), data curation (equal), formal analysis (equal), methodology (equal), software (equal), validation (equal), visualization (equal), writing – original draft (equal), writing – review and editing (equal). Gowtham Murugesan: conceptualization (supporting), data curation (equal), formal analysis (equal), methodology (equal), software (equal), validation (equal), visualization (equal), writing – original draft (supporting), writing – review and editing (equal). Jeyakumar Natarajan: conceptualization (lead), formal analysis (supporting), investigation (lead), project administration (lead), resources (supporting), supervision (lead), validation (lead), writing – review and editing (lead).
Acknowledgments
The authors have nothing to report.
Ethics Statement
The authors have nothing to report.
Consent
The authors have nothing to report.
Conflicts of Interest
The authors declare no conflicts of interest.
Data Availability Statement
All data generated or analyzed during this study are included in this published article. The public SITE-Seq dataset used in this study is available at ().
References
1. M. Redman, A. King, C. Watson, and D. King, “What Is CRISPR/Cas9?,” Archives of Disease in Childhood Education and Practice Edition 101, no. 4 (2016): 213–215, https://doi.org/10.1136/archdischild-2016-310459.
2. R. Sorek, C. Martin Lawrence, and B. Wiedenheft, “CRISPR-Mediated Adaptive Immune Systems in Bacteria and Archaea,” Annual Review of Biochemistry 82, no. 1 (2013): 237–266, https://doi.org/10.1146/annurev-biochem-072911-172315.
3. M. Asmamaw and B. Zawdie, “Mechanism and Applications of CRISPR/Cas-9-Mediated Genome Editing,” Biologics: Targets & Therapy 15 (2021): 353–361, https://doi.org/10.2147/BTT.S326422.
4. X. Wu, A. J. Kriz, and P. A. Sharp, “Target Specificity of the CRISPR-Cas9 System,” Quantitative Biology 2, no. 2 (2014): 59–70, https://doi.org/10.1007/s40484-014-0030-x.
5. C. Guo, X. Ma, F. Gao, and Y. Guo, “Off-Target Effects in CRISPR/Cas9 Gene Editing,” Frontiers in Bioengineering and Biotechnology 11 (2023): 1143157, https://doi.org/10.3389/fbioe.2023.1143157.
6. W. Wen and X.-B. Zhang, “CRISPR–Cas9 Gene Editing Induced Complex On-Target Outcomes in Human Cells,” Experimental Hematology 110 (2022): 13–19, https://doi.org/10.1016/j.exphem.2022.03.002.
7. J. Tycko, V. E. Myer, and P. D. Hsu, “Methods for Optimizing CRISPR-Cas9 Genome Editing Specificity,” Molecular Cell 63, no. 3 (2016): 355–370, https://doi.org/10.1016/j.molcel.2016.07.004.
8. P. Cameron, C. K. Fuller, P. D. Donohoue, et al., “Mapping the Genomic Landscape of CRISPR-Cas9 Cleavage,” Nature Methods 14, no. 6 (2017): 600–606, https://doi.org/10.1038/nmeth.4284.
9. R. Niu, J. Peng, Z. Zhang, and X. Shang, “R-CRISPR: A Deep Learning Network to Predict Off-Target Activities With Mismatch, Insertion and Deletion in CRISPR-Cas9 System,” Genes 12, no. 12 (2021): 1878, https://doi.org/10.3390/genes12121878.
10. P. K. Vinodkumar, C. Ozcinar, and G. Anbarjafari, “Prediction of sgRNA Off-Target Activity in CRISPR/Cas9 Gene Editing Using Graph Convolution Network,” Entropy 23, no. 5 (2021): 608, https://doi.org/10.3390/e23050608.
11. J. Lin and K.-C. Wong, “Off-Target Predictions in CRISPR-Cas9 Gene Editing Using Deep Learning,” Bioinformatics 34, no. 17 (2018): i656–i663, https://doi.org/10.1093/bioinformatics/bty554.
12. G. Lo Bosco and M. A. Di Gangi, “Deep Learning Architectures for DNA Sequence Classification,” in Fuzzy Logic and Soft Computing Applications (Springer International Publishing, 2017), 162–171, https://doi.org/10.1007/978-3-319-52962-2_14.
13. A. C. H. Choong and N. K. Lee, “Evaluation of Convolutionary Neural Networks Modeling of DNA Sequences Using Ordinal Versus One-Hot Encoding Method,” in 2017 International Conference on Computer and Drone Applications (IConDA), (2017), 60–65, https://doi.org/10.1109/ICONDA.2017.8270400.
14. J. Charlier, R. Nadon, and V. Makarenkov, “Accurate Deep Learning Off-Target Prediction With Novel SgRNA-DNA Sequence Encoding in CRISPR-Cas9 Gene Editing,” Bioinformatics 37, no. 16 (2021): 2299–2307, https://doi.org/10.1093/bioinformatics/btab112.
15. M. Alagözlü, “Stochastic Gradient Descent Variants and Applications,” (2022), https://rgdoi.net/10.13140/RG.2.2.12528.53767.
16. M. Hossin and M. N. Sulaiman, “A Review on Evaluation Metrics for Data Classification Evaluations,” International Journal of Data Mining & Knowledge Management Process 5, no. 2 (2015): 1–11, https://doi.org/10.5121/ijdkp.2015.5201.
17. F. Pedregosa, G. Varoquaux, A. Gramfort, et al., “Scikit-Learn: Machine Learning in Python,” Journal of Machine Learning Research 12 (2011): 2825–2830.
18. J. H. Joloudari, A. Marefat, M. Ali Nematollahi, S. S. Oyelere, and S. Hussain, “Effective Class-Imbalance Learning Based on SMOTE and Convolutional Neural Networks,” Applied Sciences 13, no. 6 (2023): 4006, https://doi.org/10.3390/app13064006.
19. C. Cortes, M. Mohri, and A. Rostamizadeh, “L2 Regularization for Learning Kernels,” arXiv (2012): [cited 2024 Nov 27], https://arxiv.org/abs/1205.2653.
20. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks From Overfitting,” Journal of Machine Learning Research 15, no. 1 (2014): 1929–1958.
21. C. Li, W. Chu, R. A. Gill, et al., “Computational Tools and Resources for CRISPR/Cas Genome Editing,” Genomics, Proteomics & Bioinformatics 21, no. 1 (2023): 108–126, https://doi.org/10.1016/j.gpb.2022.02.006.
22. Z. Sherkatghanad, M. Abdar, J. Charlier, and V. Makarenkov, “Using Traditional Machine Learning and Deep Learning Methods for On- and Off-Target Prediction in CRISPR/Cas9: A Review,” Briefings in Bioinformatics 24, no. 3 (2023): bbad131, https://doi.org/10.1093/bib/bbad131.
23. H. Zhang, J. Yan, Z. Lu, et al., “Deep Sampling of gRNA in the Human Genome and Deep-Learning-Informed Prediction of gRNA Activities,” Cell Discovery 9, no. 1 (2023): 48, https://doi.org/10.1038/s41421-023-00549-9.
24. Y. Wan and Z. Jiang, “TransCrispr: Transformer Based Hybrid Model for Predicting CRISPR/Cas9 Single Guide RNA Cleavage Efficiency,” IEEE/ACM Transactions on Computational Biology and Bioinformatics 20, no. 2 (2023): 1518–1528, https://doi.org/10.1109/TCBB.2022.3201631.
25. G. Chuai, H. Ma, J. Yan, et al., “DeepCRISPR: Optimized CRISPR Guide RNA Design by Deep Learning,” Genome Biology 19, no. 1 (2018): 80, https://doi.org/10.1186/s13059-018-1459-4.
26. J. Lin, Z. Zhang, S. Zhang, J. Chen, and K.-C. Wong, “CRISPR-Net: A Recurrent Convolutional Network Quantifies CRISPR Off-Target Activities With Mismatches and Indels,” Advanced Science 7, no. 13 (2020): 1903562, https://doi.org/10.1002/advs.201903562.
27. X. Xiang, G. I. Corsi, C. Anthon, et al., “Enhancing CRISPR-Cas9 gRNA Efficiency Prediction by Data Integration and Deep Learning,” Nature Communications 12, no. 1 (2021): 3238, https://doi.org/10.1038/s41467-021-23576-0.
28. G. Zhang, T. Zeng, Z. Dai, and X. Dai, “Prediction of CRISPR/Cas9 Single Guide RNA Cleavage Efficiency and Specificity by Attention-Based Convolutional Neural Networks,” Computational and Structural Biotechnology Journal 19 (2021): 1445–1457, https://doi.org/10.1016/j.csbj.2021.03.001.
29. R. Fu, W. He, J. Dou, et al., “Systematic Decomposition of Sequence Determinants Governing CRISPR/Cas9 Specificity,” Nature Communications 13, no. 1 (2022): 474, https://doi.org/10.1038/s41467-022-28028-x.
30. Q. Liu, D. He, and L. Xie, “Prediction of Off-Target Specificity and Cell-Specific Fitness of CRISPR-Cas System Using Attention Boosted Deep Learning and Network-Based Gene Feature,” PLoS Computational Biology 15, no. 10 (2019): e1007480, https://doi.org/10.1371/journal.pcbi.1007480.
© 2025. This work is published under http://creativecommons.org/licenses/by-nc/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.