Abstract
Arabic abstractive summarization presents a complex multi-objective optimization challenge, balancing readability, informativeness, and conciseness. While extractive approaches dominate NLP, abstractive methods—particularly for Arabic—remain underexplored due to linguistic complexity. This study introduces, for the first time, the ant colony system (ACS) for Arabic abstractive summarization (named AASAC—Arabic Abstractive Summarization using Ant Colony), framing it as a combinatorial evolutionary optimization task. Our method integrates collocation and word-relation features into heuristic-guided fitness functions, simultaneously optimizing content coverage and linguistic coherence. Evaluations on a benchmark dataset using lemma-based ROUGE metrics show that AASAC outperforms the baseline abstractive summarizer.
1. Introduction
In our data-driven digital era, automatic text summarization has become indispensable for efficient information management. This technology aims to produce concise summaries from source texts without human intervention, offering significant time savings across various domains. Beyond facilitating rapid content digestion, it addresses information storage challenges by compressing documents. Its applications span numerous fields including news aggregation, medical record synthesis, educational material condensation, and web content browsing.
The field traces its origins to the pioneering work on technical paper abstracts by Luhn [1]. Modern summarization systems handle both single-document and multi-document inputs, producing outputs that are either extractive (selecting key sentences) or abstractive (generating novel formulations). These outputs range from headlines to full summaries [2]. While extractive methods dominate current research, abstractive approaches—particularly for morphologically rich languages like Arabic—remain underdeveloped due to their inherent complexity involving paraphrasing, sentence fusion, and novel word generation. This challenge is compounded by the slow progress in developing appropriate evaluation metrics.
Current approaches to abstractive summarization employ diverse techniques, including deep learning, discourse analysis, graph-based methods, and hybrid systems. Notably absent from this landscape is the application of Swarm Intelligence (SI) algorithms, despite their proven success in other NLP tasks. While SI methods like Ant Colony Optimization [3,4], Particle Swarm Optimization [5], and Cat Swarm Optimization [6] have shown promise in extractive summarization, their potential for abstractive summarization remains largely unexplored—particularly for Arabic.
We address this gap by framing abstractive summarization as a multi-objective optimization problem and proposing AASAC (Arabic Abstractive Summarization using Ant Colony). Our approach builds upon the ant colony system’s (ACS) proven effectiveness in pathfinding problems like the Traveling Salesman Problem (TSP), adapting it for linguistic optimization.
A key advantage of nature-inspired optimization methods—and non-deep-learning approaches in general—is their transparency and credibility, in contrast to the opaque nature of deep neural networks. While deep learning models often suffer from interpretability issues and hallucination [7], AASAC provides full traceability, allowing step-by-step scrutiny of its summarization process. This explainability is particularly valuable in abstractive summarization, where transparency ensures credible and controllable output generation. The key contributions of this work are as follows:
Nature-inspired algorithm: We introduce AASAC (Arabic Abstractive Summarization using Ant Colony), a novel approach for Arabic abstractive summarization that leverages the ant colony system (ACS), a nature-inspired algorithm, and achieves superior summarization results.
Expanded dataset: We expand the dataset introduced in [8] by incorporating human-generated abstractive summaries. This expansion facilitates a more comprehensive evaluation process, and the dataset is readily accessible to fellow researchers.
Semantic feature integration: We incorporate semantic features (word relations and collocations) as the foundation for the fitness functions, which significantly enhances the capacity to generate high-quality summaries.
Linguistically aware evaluation: Recognizing the unique linguistic characteristics of the Arabic language, we advocate the use of a lemma-based ROUGE metric (LROUGE) that matches word lemmas rather than surface forms.
The rest of the paper is organized as follows: In Section 2, we provide an overview of related work in the field of abstractive text summarization. Section 3 details the formulation of the ATS problem. Our proposed summarizer is explained in Section 4. Experimental results and discussion are presented in Section 5. Finally, in Section 6, we conclude the paper and discuss potential directions for future research.
2. Related Works
Within the literature, researchers have recognized the limitations of abstractive text summarization systems due to the complexities associated with natural language processing. As a result, this has attracted the interest of researchers, prompting the exploration of various methods to obtain abstractive summaries, such as graph-based and semantic-based techniques.
Graph-based methods have traditionally dominated the field. These methods entail representing the text by using a graph data structure and determining an optimal path for generating a summary. For instance, Opinosis [9] and Kumar et al. [10] adopted this approach, with Opinosis incorporating shallow NLP techniques and the latter employing a bigram model to identify significant bigrams for summary generation. These techniques excel in extracting essential information and producing concise summaries. It is important to note that this approach does not involve sentence paraphrasing or the use of synonyms. Some summarization systems initially utilize extractive methods and then transition to generating abstractive summaries, as demonstrated by COMPENDIUM [11,12].
Another approach involves employing a semantic graph reduction technique, as demonstrated in [13]. In their summarization method, they initiate the process by creating a rich semantic graph (RSG), which serves as an ontology-based representation. Subsequently, this RSG is transformed into a more abstracted graph, culminating in the generation of the abstractive summary from the final graph. The utilization of RSGs for Arabic text representation was further explored by the same authors in ongoing work related to Arabic summarization [14,15]. Additionally, another study applied this technique to summarize Hindi texts [16]. In their model, the authors harnessed Domain Ontology and Hindi WordNet to facilitate the selection of diverse synonyms, thus enriching the summary generation process.
Furthermore, Le and Le [17] introduced an abstractive summarization method for the Vietnamese language that distinguishes itself from [9,18] by incorporating anaphora resolution. This innovative approach effectively tackles the challenge of obtaining diverse words or phrases to represent the same concept, even when they exist in different nodes within the graph. The summarizer uses Rhetorical Structure Theory (RST) to streamline sentences. It achieves this by removing less important and redundant clauses from the beginning of a sentence and then reconstructing the refined sentence based on syntactic rules. This summarization technique is elegantly straightforward, as it amalgamates multiple sentences represented within a word graph, employing three predefined cases.
One of the earliest Arabic abstractive summarization systems, presented by Azmi and Altmami [8], was developed on the foundation of a successful RST-based Arabic extractive summarizer [19,20]. In this approach, the sentences generated from the original text are first shortened by removing certain words, such as position names and days. Subsequently, sentence reduction rules are applied to create an abstractive summary. However, it is important to note that this method may result in non-coherent summaries due to the absence of paraphrasing. For the Malayalam language, Kabeer and Idicula [21] employed a semantic graph based on POS (Part Of Speech) tagging. They used a set of features to assign weights to the relationships between two nodes representing the subject and object of a sentence. This process culminated in the generation of a reduced graph, from which an abstractive summary was derived. In the case of Kannada, a guided summarizer called sArAmsha [22] relied on lexical analysis, Information Extraction (IE), and domain templates to generate sentences. A similar approach has also been implemented for the Telugu language [23].
Another relevant line of work emphasizes the role of text segmentation in improving summarization quality. SEGBOT [24] introduced a neural end-to-end segmentation model that leverages bidirectional recurrent networks and a pointer mechanism to detect text boundaries without hand-crafted features. It addresses key challenges in segmenting documents into topical sections and sentences into elementary discourse units (EDUs), which are crucial to structuring coherent summaries. SEGBOT’s outputs have also been shown to enhance downstream applications such as sentiment analysis, suggesting its broader utility in discourse-aware summarization pipelines.
In a related direction, Chau et al. [25] introduced DocSum, a domain-adaptive abstractive summarization framework specifically designed for administrative documents. This work addresses key challenges in processing such texts, including noisy OCR outputs, domain-specific terminology, and scarce annotated data. DocSum employs a two-phase training strategy: (1) pre-training on noisy OCR-transcribed text to enhance robustness, followed by (2) fine-tuning with integrated question–answer pairs to improve semantic relevance. When evaluated on the RVL-CDIP dataset, DocSum demonstrated consistent improvements over a BART-base baseline, with ROUGE-1 scores increasing from 49.52 to 50.72 (+1.20%). Smaller but statistically significant gains were observed in ROUGE-2 (+1.14%) and ROUGE-L (+0.96%). These results highlight the framework’s ability to handle domain-specific nuances while maintaining summary coherence.
Sagheer and Sukkar [26] introduced a hybrid system that combines knowledge base and fuzzy logic techniques for processing domain-specific Arabic text. This system operates by leveraging predefined concepts associated with the domain. Within this framework, the knowledge base serves the purpose of identifying concepts within the input text and extracting semantic relations between these concepts. The resulting sentences generated by the system comprise three essential components: subject, verb, and object. Multiple sentences are produced based on the identified concepts and their relations, and a fuzzy logic system is then employed. This fuzzy logic system computes a fuzzy value for each word in a sentence, utilizing fuzzy rules and defuzzification techniques to rank the summary sentences in descending order based on their fuzzy values. It is worth noting that the system’s performance was evaluated on texts sourced from the Essex Arabic Summaries Corpus. However, it is important to mention that no specific evaluation method was applied to systematically assess and compare the system against other techniques.
In the realm of machine learning and deep learning, Rush et al. [27] introduced a neural attention-based summarization model that utilizes a feed-forward neural network language model (NNLM) [28] and an attention-based model [29] to generate headlines with fixed word lengths. However, this model has certain limitations. It tends to summarize each sentence independently, relies on the source text’s vocabulary, and sometimes constructs sentences with incorrect syntax, as it focuses on reordering words. To address some of these issues, Chopra et al. [30] developed the recurrent attention summarizer (RAS), which incorporates word positions and their word-embedding representations to handle word ordering challenges. Additionally, the encoder–decoder recurrent neural network (RNN) [29] has been a fundamental component in many abstractive summarization models. Nallapati et al. [31] enhanced [29]’s model by adding an attention mechanism and applying the large vocabulary trick (LVT) [32]. They also tackled the problem of out-of-vocabulary (OOV) words by introducing a switching generator–pointer model [33]. However, a drawback of this model was the generation of repetitive phrases, which was mitigated to some extent by employing a Temporal Attention model [34]. Another approach improved handling OOV words by learning when to use the pointer and when to generate a word [35], a technique that enhanced [31]’s model.
The issue of repetition was further addressed by incorporating the coverage model [36] and by implementing an intra-decoder attention mechanism [37]. In Arabic headline summarization [38], a pointer–generator model [35] with an attention mechanism [29] served as a baseline, and a variant with a copy mechanism [33] was developed. The latter model demonstrated improved results compared with the baseline. Notably, the model with a copy mechanism and a length penalty outperformed other variants that incorporated coverage penalties or length and coverage penalties, largely due to considerations related to summary length limitations. To evaluate these models, an Arabic headline summary (AHS) dataset was created. Additionally, another study in Arabic explored sequence-to-sequence models with global attention for generating abstractive headline summaries [39]. They examined the impact of the number of encoder layers for three types of networks: gated recurrent units (GRUs), LSTM, and bidirectional LSTM (BiLSTM). Evaluation using ROUGE and BLEU measures, employing the AHS dataset [38] and Arabic Mogalad_Ndeef (AMN) [40], indicated that the two-layer encoder for GRUs and LSTM achieved better results than the single-layer and three-layer configurations. Conversely, the three-layer BiLSTM encoder outperformed the single-layer and two-layer configurations. Notably, utilizing AraBERT [41] in the data preprocessing stage contributed to improved results.
Furthermore, the RNN architecture initially proposed by [31] underwent modifications for a multi-layer encoder and single-layer decoder summarization model tailored to Arabic [42]. The encoder incorporates three hidden state layers for input text, keywords, and text name entities. These layers employ bidirectional LSTM and feature a global attention mechanism for enhanced performance.
Pre-trained language models, including BERT (Bidirectional Encoder Representations from Transformers) [43] and BART (Bidirectional and Auto-Regressive Transformers) [44], have found applications in abstractive summarization tasks. BART, in particular, distinguished itself from BERT by pre-training both the bidirectional encoder and the auto-regressive decoder. To harness the power of BERT for text summarization,
For Arabic abstractive summarization, Elmadani et al. [46] utilized multilingual BERT [47] to train
Another noteworthy development is AraBART [49], an Arabic pre-trained BART model. AraBART underwent fine-tuning for Arabic abstractive summarization tasks, utilizing datasets from Arabic Gigaword [50] and XL-Sum [51]. The evaluation results revealed that AraBART outperformed the pre-trained Arabic BERT-based model [52], the multilingual mBART model [53], and the mT5 model [54].
Additive Manufacturing (AM) constructs objects layer by layer from digital models, generating extensive unstructured textual data such as design rationales, process parameters, and material specifications. Efficient organization of this knowledge is essential to Design for Additive Manufacturing (DFAM), where traceability and interpretability are critical to informed decision making. Abstractive summarization offers a scalable solution by condensing complex AM content into coherent and actionable insights. Recent advances enhance factual consistency in summarization by integrating structured knowledge extraction methods, including triple classification and knowledge graphs (KGs). For example, AddManBERT [55] employs dependency parsing to extract semantic relations between AM entities (e.g., material–process dependencies) and encodes them as vector representations. Complementary work utilizes neural models with meta-paths to capture hierarchical semantics between entities and relations, while KG-based methods support scalable triple classification from multi-source Fused Deposition Modeling data. These techniques have demonstrated superior classification accuracy and computational efficiency compared with rule-based systems, underscoring the value of KG-augmented summarization in AM knowledge management.
Table 1 provides an overview of abstractive text summarization studies, including details about the corpus used and the scope of the summary. The summary scope can fall into one of three categories: headline, sentence level (where a single sentence serves as the summary), or document level (which generates multiple or a few lengthy sentences).
Today, deep learning forms the backbone of most abstractive summarization models [7]. However, its effectiveness typically assumes access to large-scale training data and substantial computational resources—conditions often unmet for Arabic and other morphologically rich languages. Our ACS-based approach offers a strategically compelling alternative by addressing four critical limitations of neural methods. First, it operates effectively with limited labeled data, making it suitable for specialized Arabic domains where annotated corpora are scarce. Second, its explicit modeling of Arabic root patterns and collocations through interpretable fitness functions introduces morphological awareness, which is often lacking in transformer-based systems without extensive pre-training. Third, the framework inherently supports multi-objective optimization, enabling precise trade-offs between competing priorities such as content density and readability—a capability that requires complex architectural modifications in neural models. Finally, ACS achieves competitive performance without reliance on GPUs, thereby democratizing access to abstractive summarization for Arabic NLP researchers and practitioners with constrained infrastructure. Rather than opposing the neural paradigm, this work expands the methodological repertoire for Arabic NLP, particularly in low-resource, high-interpretability scenarios. The success of AASAC further suggests promising directions for hybrid systems combining neural fluency with nature-inspired optimization.
3. Problem Formulation
The ant colony system (ACS) algorithm, an enhanced variant of Ant Colony Optimization (ACO) [56], provides an effective framework for addressing our multi-objective Arabic abstractive summarization challenge. As a population-based metaheuristic, ACS mimics the emergent intelligence of natural ant colonies, where artificial ants collaboratively explore solution paths while dynamically updating pheromone trails to guide subsequent searches toward optimal solutions. This biologically inspired approach is particularly suited for our task, as it efficiently balances multiple competing objectives—content coverage, linguistic coherence, and summary conciseness—through its distributed optimization mechanism.
We formulate the abstractive summarization task as a graph-based optimization problem, where the source document is represented as a connected network of word nodes. Unlike previous ACO applications in extractive summarization [3] that treated entire sentences as nodes, our AASAC approach operates at the word level to enable finer-grained abstractive generation. Each node encapsulates lexical, collocational, and semantic features that collectively inform our multi-component fitness function. The ACS agents navigate this linguistic landscape, with pheromone dynamics reflecting both local heuristic information (word relations) and global summary quality metrics.
This formulation advances prior work in three key aspects: (1) the graph representation preserves Arabic-specific morphological and syntactic relationships critical to abstractive generation; (2) the optimization process simultaneously considers semantic preservation and linguistic fluency through specialized fitness functions; and (3) the solution path directly feeds into a generation module that produces human-like summaries rather than extracted fragments. Table 2 summarizes the mathematical notation for our ACS adaptation to this novel domain.
Our approach can be outlined as follows: Consider a set $W$ representing the words in the original document. Within this set, each word $w_i$ is associated with a cost that takes into account factors such as its position in the document and its frequency of occurrence. These words are interconnected through edges denoted by $(i, j)$, with each edge carrying a cost $c_{ij}$ that signifies the sequential relationship between the connected words. Importantly, a word can be linked to multiple other words. Our ultimate objective is to construct a summary by identifying a set of words that maximizes the following expression:
$$\max \; f(x) = \sum_{i=1}^{|W|} \sum_{j=1}^{E} c_{ij}\, x_i \qquad (1)$$

$$\text{subject to} \quad \sum_{i=1}^{|W|} x_i \le L, \qquad x_i \in \{0, 1\}, \qquad (2)$$

where $x_i$ assumes a value of 1 when word $w_i$ is chosen and 0 otherwise, $|W|$ signifies the total number of words in the document, $E$ represents the count of edges in the document, $c_{ij}$ denotes the cost attributed to the combination of word $i$ and edge $j$, and $L$ serves as the constraint that limits the overall length of the selected summary (in words). The ACS algorithm consists of three main steps in each iteration. Initially, every ant constructs a solution path, essentially creating a word summary. Subsequently, the best path among all those generated by the ants up to that point is identified. Lastly, the pheromone level of this best path is updated globally.
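For illustration, the following minimal Python sketch evaluates the selection objective of Equations (1) and (2) for a given 0/1 word-selection vector. The cost values, the helper names objective and feasible, and the tiny three-word example are hypothetical and serve only to make the formulation concrete.

```python
# Minimal sketch of the selection objective in Equations (1) and (2).
# The costs and the selection vector below are illustrative values only.

def objective(x, cost):
    """Sum of the costs c_ij over the edges j of every selected word i (x_i = 1)."""
    return sum(sum(edge_costs) for x_i, edge_costs in zip(x, cost) if x_i == 1)

def feasible(x, L):
    """Length constraint of Equation (2): at most L words may be selected."""
    return sum(x) <= L

# Three words; cost[i] lists the costs of the edges leaving word i.
cost = [[0.4, 0.1], [0.7], [0.2, 0.3, 0.1]]
x = [1, 0, 1]   # words 0 and 2 are selected
L = 2           # summary length limit (in words)

if feasible(x, L):
    print("objective value:", objective(x, cost))
```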
In the process of constructing a solution, unlike in the TSP, where all nodes are explored, each ant, denoted by $k$, adds an edge labeled $j$ to its path and adjusts the edge's pheromone level. This process continues until the path reaches a predefined threshold, represented by $L$, which limits the length of the summary. The selection of edge $j$ over another edge follows a pseudo-random-proportional rule described by Equation (3):
$$j = \begin{cases} \arg\max_{l \in N_i^{k}} \left\{ \tau_{il}\,[\eta_{il}]^{\beta} \right\} & \text{if } q \le q_0, \\ J & \text{otherwise}, \end{cases} \qquad (3)$$
where $\tau_{ij}$ represents the pheromone level of edge $(i, j)$, $\eta_{ij}$ denotes the heuristic information value for edge $(i, j)$, $\beta$ is a parameter that determines the relative importance of the heuristic information value, and $q$ and $q_0$ are real values ranging from 0 to 1. Additionally, $N_i^{k}$ represents the set of nearest-neighbor nodes that have not been selected by ant $k$, which essentially comprises the n-gram words originating from the current word. The random variable $J$ is selected according to the probability distribution given by Equation (4),

$$p_{ij}^{k} = \frac{\tau_{ij}\,[\eta_{ij}]^{\beta}}{\sum_{l=1}^{U} \tau_{il}\,[\eta_{il}]^{\beta}}, \qquad j \in N_i^{k}, \qquad (4)$$
where $j \in N_i^{k}$ signifies that $j$ belongs to the set of nearest-neighbor nodes not chosen by ant $k$, and $U$ represents the count of available nodes that have not yet been selected by ant $k$. It is important to note that the denominator of the sum is not zero. When an ant selects an edge, the local update of the edge's pheromone level takes place using Equation (5), in which the local evaporation rate $\rho$ is a real value within the range of 0 to 1:

$$\tau_{ij} = (1 - \rho)\,\tau_{ij} + \rho\,\tau_{0}. \qquad (5)$$
In the ACS algorithm’s second step, the aim is to identify the best-so-far path among the set of solutions created by the ants, and this is determined based on a fitness function. Finally, the pheromones associated with the best-so-far path are updated globally using Equation (6),
$$\tau_{ij} = (1 - \alpha)\,\tau_{ij} + \alpha\,F(s^{bs}), \qquad \forall (i, j) \in s^{bs}. \qquad (6)$$
Here, $\alpha$ represents the global pheromone evaporation rate, $s^{bs}$ denotes the best-so-far solution, and $F(s^{bs})$ signifies its fitness value. We will introduce and define two fitness functions relevant to this process in Section 4.3.
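To make the update rules concrete, the sketch below implements Equations (3)–(6) over a dictionary-based word graph. The graph representation, the function names, and the fallback behavior are assumptions made for illustration and do not reproduce the exact AASAC implementation.

```python
import random

# Sketch of the ACS rules in Equations (3)-(6) over a word graph.
# tau and eta map an edge (i, j) to its pheromone level and heuristic value;
# beta, q0, rho (local), and alpha (global) follow the notation above.

def choose_next(current, candidates, tau, eta, beta, q0):
    """Pseudo-random-proportional rule of Equations (3) and (4)."""
    scores = {j: tau[(current, j)] * eta[(current, j)] ** beta for j in candidates}
    if random.random() <= q0:                  # exploitation: take the best edge
        return max(scores, key=scores.get)
    total = sum(scores.values())               # biased exploration (Equation (4))
    r, acc = random.random() * total, 0.0
    for j, s in scores.items():
        acc += s
        if acc >= r:
            return j
    return j

def local_update(edge, tau, rho, tau0):
    """Equation (5): local pheromone update after an edge is traversed."""
    tau[edge] = (1 - rho) * tau[edge] + rho * tau0

def global_update(best_path, tau, alpha, fitness_value):
    """Equation (6): reinforce the edges of the best-so-far path."""
    for edge in zip(best_path, best_path[1:]):
        tau[edge] = (1 - alpha) * tau[edge] + alpha * fitness_value
```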
4. Proposed Approach
To generate an abstractive summary for a document, the AASAC system consists of four stages: preprocessing, representation, modeling, and text generation (see Figure 1). The following is a detailed description of the individual stages.
4.1. Preprocessing
The preprocessing stage is essential to any NLP task, as it makes the text ready for the subsequent summary generation stages. Special characters, such as document formatting marks and diacritics, are removed, and exclamation and question marks are replaced with the dot used to end a sentence. Using the STANZA library [57] with the PADT treebank, the text is tokenized and each token is assigned a universal part-of-speech (UPOS) tag; the text is then filtered in the following cases:
If two consecutive words have the UPOS tag “X”, which means Other (see Example 1 in Table 3).
If a word is tagged “NOUN”, followed by a word tagged “PUNC”, followed by a word tagged “NOUN” (see Example 2 in Table 3).
If a word is tagged “X”, followed by a word tagged “PUNC”, followed by a word tagged “X” (see Example 3 in Table 3).
Text filtering on the sentence (القاهرة - الوطن أكد د. مصطفى علوي رئيس الهيئة العامة لقصور الثقافة بالقاهرة: “Cairo—Alwatan Dr. Mustafa Alawi, head of the General Authority for Cultural Palaces in Cairo, confirmed”).
| Token | القاهرة | - | الوطن | أكد | د | . | مصطفي | علوي | رئيس | الهيئة |
|---|---|---|---|---|---|---|---|---|---|---|
| Gloss | Cairo | – | Alwatan | confirmed | Dr | . | Mostafa | Alawi | head of | Authority |
| UPOS | NOUN | PUNCT | NOUN | VERB | X | PUNCT | X | X | NOUN | NOUN |
| XPOS | N–S1D | G– | N–S1D | VP-A-3MS | Y– | G– | U– | U– | N–S1R | N–S2D |
Once the words are filtered, lemmatization [57] is applied to each word to extract its lemma. The aim is to normalize the words and simplify language processing. Lemmatization is preferred over stemming because the word's meaning matters for the analysis: lemmatization takes the context into account and converts the word into its meaningful base form.
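The sketch below illustrates this preprocessing step with the Stanza library. The pipeline configuration and the decision to drop (rather than merge) the three UPOS patterns listed above are assumptions made for illustration; only the patterns themselves come from the text.

```python
import stanza

# stanza.download("ar")  # download the Arabic models once
nlp = stanza.Pipeline("ar", processors="tokenize,mwt,pos,lemma")

def preprocess(text):
    """Tokenize, POS-tag, and lemmatize Arabic text, filtering the UPOS
    patterns described above (X X, NOUN PUNCT NOUN, X PUNCT X)."""
    doc = nlp(text)
    words = [(w.text, w.upos, w.lemma) for s in doc.sentences for w in s.words]
    kept, i = [], 0
    while i < len(words):
        # Rules 2 and 3: NOUN PUNCT NOUN or X PUNCT X -> filtered out.
        if (i + 2 < len(words) and words[i + 1][1] == "PUNCT"
                and words[i][1] == words[i + 2][1] and words[i][1] in ("NOUN", "X")):
            i += 3
            continue
        # Rule 1: two consecutive words tagged X (Other) -> filtered out.
        if i + 1 < len(words) and words[i][1] == "X" and words[i + 1][1] == "X":
            i += 2
            continue
        kept.append(words[i])
        i += 1
    return kept  # list of (surface form, UPOS, lemma) triples
```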
4.2. Representation
We create a graph using Neo4j, a graph database platform. Each node represents a lemma, with the original surface words stored in an associated Token node, and edges link words that follow one another in the text (see Figure 2 for the graph representation).
We incorporate additional features at both the node and edge levels to enhance the selection of important words. One of these features is the word-relation feature, which identifies semantic relationships based on the words in a text block. This helps establish connections between various entities, such as people, places, and organizations mentioned in the text. We obtain this feature from the IBM Watson Natural Language Understanding text analytics service; Figure 3 shows an example of the AgentOf relation between the entity “United Nations” and the event “indicated”.
Additionally, we introduce a collocation property, which refers to the frequent occurrence of two or more words together. This property is added to an edge connecting two collocated words, such as (الأمم المتحدة: “United Nations”). The collocation feature helps identify significant words or phrases in the text that are commonly used together, offering a stronger indication of essential information in the text. To set the edge's property value (0 or 1), indicating the absence or presence of a collocation between the two connected nodes, we utilize Maskouk's Arabic Collocation Dataset [58]. Figure 4 provides an overview of the properties associated with nodes and edges.
Figure 5 shows a sample Arabic text, whose graph representation is shown in Figure 6.
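As a brief illustration of how such a graph can be populated, the snippet below uses the neo4j Python driver (5.x API). The Lemma and Token labels appear in this paper, but the relationship type NEXT, the property names, and the connection details are assumptions, not the actual Cypher procedure used in AASAC.

```python
from neo4j import GraphDatabase

# Hypothetical connection details; replace with your own instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

CREATE_EDGE = """
MERGE (a:Lemma {text: $lemma1})
MERGE (b:Lemma {text: $lemma2})
MERGE (a)-[r:NEXT]->(b)
SET  r.collocation = $collocation,   // 1 if the pair appears in Maskouk, else 0
     a.relation    = $relation       // word-relation weight from Watson NLU
"""

def add_edge(tx, lemma1, lemma2, collocation, relation):
    tx.run(CREATE_EDGE, lemma1=lemma1, lemma2=lemma2,
           collocation=collocation, relation=relation)

with driver.session() as session:
    # Consecutive lemmas from the preprocessed text; values are illustrative.
    session.execute_write(add_edge, "أمم", "متحدة", 1, 0.8)

driver.close()
```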
4.3. Modeling
Once the preparation is completed, the ACS algorithm is applied as follows.
We begin by specifying the number of ants and the number of iterations. Additionally, we mark all nodes and edges as unexplored or unvisited, and we set the pheromone parameter for all edges.
Furthermore, we calculate the candidate list for each node. This list is composed of n-gram adjacent nodes to the given node. To illustrate, Figure 7 provides examples of 1-gram, 2-gram, and 3-gram candidate lists for the lemma node (دراسة: “a study”). When forming the 1-gram list, the candidate list includes the end nodes of the red edges. For the 2-gram list, it encompasses the end nodes of both red and green edges. Finally, the 3-gram list comprises the end nodes of the red, green, and blue edges.
Subsequently, we calculate the heuristic information, considering different features. Lastly, we select a start node and mark it as explored.
The process continues for a set number of rounds while following these steps: (1) Construct a solution for each ant using Equations (3) and (4). (2) Select the best path based on a fitness function using either Equation (9) or Equation (10). (3) Update the global pheromone using Equation (6).
We display the final best path, which corresponds to a single path with the highest weight.
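For the candidate lists, each node's n-gram neighborhood can be obtained with a depth-limited breadth-first search over the word graph. The adjacency-list representation in the sketch below is an assumption for illustration rather than the actual implementation.

```python
from collections import deque

def candidate_list(graph, start, n):
    """Nodes reachable from `start` within n hops: the n-gram candidate list.

    `graph` is an adjacency mapping {node: [successor nodes]}; this layout
    is assumed for illustration only.
    """
    seen, candidates = {start}, set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == n:
            continue
        for succ in graph.get(node, []):
            if succ not in seen:
                seen.add(succ)
                candidates.add(succ)
                frontier.append((succ, depth + 1))
    return candidates

# With edges study -> eye -> retina -> brain, the 2-gram list of "study"
# contains "eye" and "retina" but not "brain".
g = {"study": ["eye"], "eye": ["retina"], "retina": ["brain"]}
print(candidate_list(g, "study", 2))   # {'eye', 'retina'}
```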
For each word $w_i$, we define two heuristic information functions, denoted by $H_1$ and $H_2$, as follows:
(7)
(8)
where $f_i$ is the frequency of word $w_i$, $d_i$ represents the number of edges connected to word $w_i$, $E$ is the total number of edges, $cw_{ij}$ is the edge's collocation weight, and $rw_i$ is the word's relation weight. Additionally, we propose two fitness functions, $FF_1$ and $FF_2$; the first considers the heuristic information values of all nodes in the best path, as given by Equation (9), and the second utilizes the relation and collocation features (Equation (10)).
(9)
(10)
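The exact definitions are those of Equations (7)–(10); purely as an illustration, the sketch below assumes simple multiplicative and additive combinations of the quantities named above (word frequency, node degree, total edge count, collocation weight, and relation weight) and aggregates them along the best path. These functional forms are assumptions, not the published formulas.

```python
# Hedged sketch of heuristic information and fitness evaluation.
# The combinations below are illustrative assumptions; the paper's exact
# definitions are given by Equations (7)-(10).

def heuristic_h1(freq, degree, total_edges):
    # Inputs of Equation (7): word frequency and connectivity.
    return freq * degree / total_edges

def heuristic_h2(freq, degree, total_edges, colloc_w, relation_w):
    # Equation (8) additionally uses collocation and word-relation weights.
    return heuristic_h1(freq, degree, total_edges) + colloc_w + relation_w

def fitness_ff1(path, h1):
    # FF1: aggregates the heuristic information of all nodes on the best path.
    return sum(h1[node] for node in path)

def fitness_ff2(path, relation_w, colloc_w):
    # FF2: aggregates the relation weights of the nodes and the collocation
    # weights of the edges along the best path.
    return sum(relation_w[node] for node in path) + \
           sum(colloc_w[edge] for edge in zip(path, path[1:]))
```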
4.4. Text Generation
The previous stage produces the best path, which contains the set of lemmas that will be used to generate the summary. Since many words can map to one lemma, a series of steps is conducted to retrieve the appropriate surface words and generate the final summary. As mentioned before, the original word for each lemma is stored in the Token node.
4.4.1. Forward Position Filtering
We extract all tokens related to a lemma. Some Token nodes have multiple words that refer to different positions in the original text. So, we perform filtering by position. For each node (called start node), we check the next node (referred to as the end node). If the start and end nodes are in the same sentence, then the start node is ignored if its position is after the end node, since reversing their positions is not acceptable. If the start node is in a sentence that is before the end-node sentence, then the start node is added to the list of nodes.
4.4.2. Backward Position Filtering
For each end node, if the start and end nodes are in the same sentence, then the end node is added to the list of nodes if there is a position for the end token after any position of the start node. If the end node occurred in any sentences after the start node sentence, then it is added to the list (word list). Another filtering occurs when the start node has only one word. In this case, the end token words are filtered by removing all words that are far away from the start token word. This will decrease the number of words that belong to the same lemma.
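A compact sketch of these two filtering passes is given below, assuming each token occurrence is represented as a (sentence index, word position) pair; the data layout and function names are illustrative assumptions.

```python
# Sketch of forward and backward position filtering (Sections 4.4.1-4.4.2).
# Each occurrence is a (sentence_index, word_position) pair; this layout is
# an assumption made for illustration.

def forward_filter(start_occs, end_occs):
    """Keep start occurrences that can legally precede some end occurrence."""
    kept = []
    for s_sent, s_pos in start_occs:
        for e_sent, e_pos in end_occs:
            if (s_sent == e_sent and s_pos < e_pos) or s_sent < e_sent:
                kept.append((s_sent, s_pos))
                break
    return kept

def backward_filter(start_occs, end_occs):
    """Keep end occurrences that appear after some start occurrence."""
    kept = []
    for e_sent, e_pos in end_occs:
        for s_sent, s_pos in start_occs:
            if (s_sent == e_sent and e_pos > s_pos) or e_sent > s_sent:
                kept.append((e_sent, e_pos))
                break
    return kept
```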
4.4.3. Processing Tokens
In this step, further processing is performed on the tokens that are filtered in the previous step. This processing is performed at the token level.
If the start token has multiple words and the end token has only one word ending in (هـ), then we remove any start token word that starts with (ال). Once the start node has only one token, then the start and end nodes are concatenated to form one word.
Otherwise, when the start node token has multiple words: (a) If a token ends in (ا) and there is another token with the same letters but not ending in (ا), then the former is removed; e.g., in (رئيسا, رئيس), the word (رئيسا) is removed, since diacritics are not our concern in the summary and this minimizes the number of suggested words. (b) If the previous token starts with (ال) and there is a start token beginning with (ال), then any word from the start token that does not start with (ال) is removed; otherwise, the word that does not start with (ال) is removed.
4.4.4. Processing Final Summary
In this step, processing is performed on the list of tokens from the previous step. For each Token node having more than one word, the words are concatenated into one string, enclosed in brackets and separated by commas. This string is displayed in the final summary to give the reader a choice of appropriate words. For example,
يظهر هذا البحث ان شبكية العين (خاصة, الخاصة) المتصلة مباشرة بمنطقة المخ.
If a Token node has only one word, then a set of conditions is checked as follows:
If the word token starts with the letter (و) and the next token has only one letter, then the next token is carried over to the next round of checks as the current token.
If the word token is one of the prefix letters (ب، س، ف، ل) and the next token is a preposition, then the token is ignored.
If the next token is one of the suffixes (هـ، ها، هم), then it is appended to the current token after some modifications, as follows: (a) If the current token is a preposition that ends in (ى), then the ending letter is changed to (ي), e.g., (على + هـ → عليه). (b) If the current token is one of the prepositions (حتى، كي، منذ، مذ، متى، رب), then the next token is ignored, since concatenating it with the current token would not produce a valid Arabic word. (c) If the current token has only one letter or starts with (ال), then the next token is ignored, e.g., (س + هم → سهم) becomes a different word or an invalid one. (d) If the current token ends in (ة) and does not start with (ال), then (ة) is replaced with (ت), e.g., (طبيعة + هـ → طبيعته). (e) If the current token ends in (ى), then (ى) is replaced with (ا), e.g., (أجرى + ها → أجراها).
If the current token is the letter (ل) and the next one starts with (ال), then the letter (ا) is removed from the next token word and the tokens are concatenated, e.g., (ل + السماء → للسماء).
If the current token is one of the prefixes (س، ب، ف، ل), then it is concatenated with the next one.
Otherwise, the next token is carried over to the next iteration as the current token.
Finally, any sentence with fewer than four words is removed; if a sentence ends with a preposition, then it is removed; and any one-letter word is removed unless it is the letter (و) or a full-stop mark.
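As an illustration, the sketch below implements two of the suffix-attachment adjustments listed above (the ى→ي change for prepositions and the ة→ت change before the suffixes هـ/ها/هم). The preposition list is a small illustrative subset, and the full rule set of Sections 4.4.3 and 4.4.4 is deliberately not reproduced.

```python
# Sketch of a few of the suffix-attachment rules described above.
# Only the alif-maqsura and taa-marbuta adjustments are shown.

SUFFIXES = ("ه", "ها", "هم")
PREPOSITIONS = {"على", "إلى", "في", "عن"}   # illustrative subset only

def attach_suffix(token, suffix):
    if suffix not in SUFFIXES:
        return token + " " + suffix            # nothing to attach
    if token in PREPOSITIONS and token.endswith("ى"):
        token = token[:-1] + "ي"               # rule (a): على + ه -> عليه
    elif token.endswith("ة") and not token.startswith("ال"):
        token = token[:-1] + "ت"               # rule (d): طبيعة + ه -> طبيعته
    elif token.endswith("ى"):
        token = token[:-1] + "ا"               # rule (e): أجرى + ها -> أجراها
    return token + suffix

print(attach_suffix("على", "ه"))     # عليه
print(attach_suffix("طبيعة", "ه"))   # طبيعته
print(attach_suffix("أجرى", "ها"))   # أجراها
```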
5. Results and Discussion
A set of experiments were conducted to evaluate the AASAC system. This section showcases setting up the ant colony system (ACS) parameters, the dataset used for evaluation, the evaluation metric, the experimental results, human evaluation, and the discussion.
To facilitate the integration of ACS with Neo4j, we developed a custom Cypher procedure using the Java programming language. This procedure was then called from Python 2.7 for executing our experiments. All experiments were conducted on a Mac OS X 11.7.3 system equipped with a 2.3 GHz Quad-Core Intel Core i7 processor and 32 GB of RAM.
5.1. Experimental Setup
ACS employs a range of parameters, each with specific values. The termination condition is a fixed number of iterations. Given that ACS incorporates local search, the number of ants, m, was deliberately kept small. Following the guidelines presented in [56], the local and global pheromone evaporation rates were both set to 0.1. The weight assigned to the heuristic information, $\beta$, was configured to 2.0, and the two remaining control parameters were set to 0.1 and 0.7, respectively. Finally, the initial pheromone trail, $\tau_0$, was calculated as $\tau_0 = (n \cdot d)^{-1}$, where $n$ represents the number of nodes (words) in the graph and $d$ denotes the nearest-neighbor distance.
We performed several experiments to generate summaries at 30% and 50% of the original text. One of the experiments, called H1FF1, utilized the heuristic information function $H_1$ along with the fitness function $FF_1$. Another experiment, named H2FF2, employed the heuristic information function $H_2$ with the fitness function $FF_2$. Table 4 lists all the experiments.
We employed three different settings for the candidate list, namely, 1-gram, 2-gram, and 3-gram, to generate both H1FF1 and H2FF2 summaries. To indicate the n-gram setting, the experiment names are further suffixed with -n. Consequently, the variations are denoted by HiFFj-n, indicating the ACS variation that uses heuristic information function $H_i$ with fitness function $FF_j$ and an n-gram candidate list, where $i, j \in \{1, 2\}$ and $n \in \{1, 2, 3\}$. For instance, H1FF1-1, H1FF1-2, and H1FF1-3 are the ACS variations that use heuristic information function $H_1$ with fitness function $FF_1$ and 1-gram, 2-gram, and 3-gram candidate lists, respectively.
5.2. Dataset
Due to the absence of a standardized Arabic single-document abstractive summary dataset, we utilized a subset of data shared by [8]. The subset comprises 104 documents of varying lengths, with an average of 239 words each. These documents were accompanied by system-generated summaries of 30% and 50% of the original document size, which we considered a baseline.
The documents were collected from different sources, including 79 documents from the Saudi Arabian newspaper Al-Riyadh, with the remaining documents drawn from other sources.
Figure 8 displays the distribution of the document lengths based on the number of words. The x-axis depicts the document word counts grouped into categories, while the y-axis represents the number of documents falling into each category. The majority of documents contained between 200 and 300 words.
Since the dataset lacked human-authored summaries for comparison, we sought human professionals to summarize the documents to 30% and 50% of their original size. The complete dataset comprising all 104 documents, alongside their corresponding 30% and 50% human summaries, is freely accessible to interested parties.
5.3. Evaluation Metric
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) [59] is a common evaluation measure for summaries. It compares a generated summary with one or more reference summaries according to a set of metrics: ROUGE-N compares n-gram overlap, ROUGE-L compares the longest common word sequence, and ROUGE-SU counts skip-bigrams (any pair of words in sentence order, allowing gaps) together with unigrams. For example, ROUGE-N is computed as follows:
$$\text{ROUGE-N} = \frac{\sum_{S \in RS} \sum_{gram_N \in S} \text{Count}_{match}(gram_N)}{\sum_{S \in RS} \sum_{gram_N \in S} \text{Count}(gram_N)}, \qquad (11)$$

$$\text{Count}_{match}(gram_N) = \min\big(\text{Count}_{cand}(gram_N),\ \text{Count}_{S}(gram_N)\big), \qquad (12)$$
where $S$ is a reference summary, $RS$ is the set of reference summaries, $N$ (in ROUGE-N) is the length of the n-gram, and $\text{Count}_{match}(gram_N)$ is the maximum number of n-grams co-occurring in a candidate summary and a set of reference summaries. However, Arabic is a morphologically rich language, so applying surface-form ROUGE to Arabic texts does not yield a valid comparison. To address this problem, we use a lemma-based variant, LROUGE-N, which is computed over word lemmas rather than surface forms:
$$\text{LROUGE-N} = \frac{\sum_{S \in RS} \sum_{lemma_n \in S} \text{Count}_{match}(lemma_n)}{\sum_{S \in RS} \sum_{lemma_n \in S} \text{Count}(lemma_n)}, \qquad (13)$$
where lemma-n is a sequence of n consecutive word lemmas from a given text. The use of a lemma-based ROUGE metric for evaluating Arabic text summarization systems is advantageous due to the morphological complexity of the Arabic language. By considering the lemma form of words, which captures the core meaning while disregarding variations in inflections and derivations, the metric provides a more accurate measure of semantic similarity between system-generated summaries and the original text. This approach better accounts for the unique linguistic characteristics of Arabic and improves the evaluation and development of abstractive text summarization systems in Arabic and other Semitic languages. An example highlighting the advantages of using LROUGE-1 over standard ROUGE-1 is shown in Figure 9.
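A minimal sketch of how LROUGE-1 recall (Equation (13)) can be computed is shown below, assuming lemmatization with Stanza as in the preprocessing stage; the helper names and the single-reference simplification are assumptions for illustration.

```python
from collections import Counter
import stanza

nlp = stanza.Pipeline("ar", processors="tokenize,mwt,pos,lemma")

def lemmas(text):
    """Lemma sequence of an Arabic text, following the preprocessing stage."""
    doc = nlp(text)
    return [w.lemma for s in doc.sentences for w in s.words]

def lrouge_n_recall(candidate, reference, n=1):
    """Lemma-based ROUGE-N recall (Equation (13)) against a single reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(lemmas(candidate)), ngrams(lemmas(reference))
    matched = sum(min(count, cand[gram]) for gram, count in ref.items())
    return matched / max(sum(ref.values()), 1)
```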
5.4. Evaluation Results
As most Arabic abstractive summarizers generate a single sentence, we compared the performance of our system against the summarizer in [8], which we consider our baseline and refer to as ANSum. We also applied lemmatization to the ANSum summaries and the reference summaries.
The LROUGE results for all experiments, together with the ANSum baseline, are reported in Table 5 and Table 6.
The results show that the H2FF2-1 variant achieved the highest recall, outperforming all other variants and the ANSum system. It also attained superior scores across all metrics except for LROUGE-2, where ANSum scored higher in one setting.
When these features were excluded, the H1FF1-2 variant—using a 2-gram candidate list—performed better than the 1-gram or 3-gram lists in terms of LROUGE-1 and LROUGE-L.
The results in Table 6 show a similar pattern: the 2-gram H2FF2 variant (H2FF2-2) obtained the highest LROUGE-1, LROUGE-L, and LROUGE-SU4 scores among the AASAC variants, again indicating the benefit of the relation and collocation features.
5.5. Human Evaluation
ROUGE, originally designed for extractive summaries, falls short in assessing abstractive summaries due to their divergence from the source text wording. Abstractive summaries focus on conveying meaning rather than verbatim representation, making ROUGE’s word-matching approach inadequate. Alternative evaluation methods are needed to capture the semantic and conceptual aspects of abstractive summarization accurately.
To minimize the human effort required for manual evaluation, we selected 20 random documents from the dataset and enlisted three human evaluators to assess our summarizer, AASAC. Given that the H2FF2 variant achieved higher LROUGE scores than the other variants, its summaries were used in the human evaluation. Each evaluator was first asked to select the best summary among the 1-gram, 2-gram, and 3-gram H2FF2 variants; Figure 10a shows the frequency distribution of their selections.
Following that, the evaluators provided assessments of the H2FF2 summary using a 2-gram candidate list, answering four questions: (Q1) Does the summary effectively cover the document’s most important issues? (Q2) Does the summary enable readers to understand the main points of the article? (Q3) How would you rate the summary’s readability? (Q4) What is your overall assessment of the summary’s quality? Each question was answered on a scale of 1 to 5, with 1 indicating strong disagreement, 3 for a neutral response, and 5 for strong agreement. Figure 10b summarizes the questionnaire results, indicating that the summary effectively captures the document’s most important aspects, enabling readers to understand its content with average scores of 3.7 and 3.8, respectively. Additionally, the summary received ratings of 2.9 and 3.1 for readability and overall quality, respectively.
5.6. Discussion
The experimental results show the ability of ACS to select salient words and generate an informative summary. They also show that incorporating the relation and collocation features into the heuristic information and the fitness function boosts performance. Moreover, the results indicate that 3-gram candidate lists do not improve the summaries at either size, even when the relation and collocation features are added.
Nevertheless, there are occurrences in which the word segmentation performed by the tokenizer is inaccurate, resulting in an adverse impact on the generated summary. For instance, the term (بقايا: “remains”) is incorrectly split into two tokens, namely, (ب) and (قايا). While the character (ب) is a valid Arabic letter, the token (قايا) does not correspond to a valid Arabic word. Similarly, the word (بغداد: “Baghdad”) is divided into (ب) and (غداد). Although both the character (ب) and the token (غداد) exist in the Arabic dictionary, the token (غداد) conveys a different meaning than (بغداد). Utilizing a robust tokenizer is anticipated to address this issue effectively.
Another limitation of our AASAC summarizer arises when the selection between more than one word is ambiguous, as the summarizer displays all candidate words. This can be solved by incorporating a grammar model. Repeated words are rarely generated, and they can be addressed by adding a penalty or cost to the fitness function.
The human evaluators expressed positivity regarding the content of the summary, indicating that our AASAC summarizer effectively captured the main points of the document. However, they remained neutral when evaluating the readability and overall quality of the summary. This suggests that our summarizer may benefit from incorporating language guidance to enhance these aspects during the summary generation process.
6. Conclusions and Future Work
The growing volume of Arabic text content necessitates efficient summarization methods that can quickly distill valuable information. Prior research reveals three critical limitations that this work addresses: (1) existing nature-inspired approaches (ACO, PSO, etc.) remain exclusively extractive, merely selecting sentences rather than rewriting content; (2) few systems optimize for Arabic's morphological complexity; and (3) none simultaneously address content coverage and linguistic fluency as interdependent objectives. Our AASAC framework breaks these barriers as the first method to (a) apply swarm intelligence (ACS) for true abstractive generation, (b) incorporate Arabic-specific collocations and word relations into heuristic-guided fitness functions, and (c) jointly optimize content coverage and linguistic coherence. Evaluations with lemma-based ROUGE metrics and human assessment confirm that AASAC outperforms the ANSum baseline in content coverage.
Future work should focus on three key enhancements: first, improving tokenization accuracy through grammar modeling to resolve ambiguous word selections; second, optimizing the fitness function to penalize word repetition; and third, incorporating knowledge-based techniques like Named Entity Recognition, Coreference Resolution, and Sentiment Analysis to deepen semantic understanding. These advancements would further strengthen AASAC’s ability to handle Arabic’s morphological complexity while maintaining the computational efficiency that makes swarm intelligence approaches particularly valuable for real-world applications.
Conceptualization, A.M.A.; methodology, A.M.A.-N. and A.M.A.; software, A.M.A.-N.; validation, A.M.A.-N.; formal analysis, A.M.A.-N.; investigation, A.M.A.-N.; resources, A.M.A.; data curation, A.M.A. and A.M.A.-N.; writing—original draft, A.M.A.-N.; writing—review and editing, A.M.A.; Supervision, A.M.A. Both authors have read and agreed to the published version of the manuscript.
The dataset described in this work is available at
The authors declare no conflicts of interest.
The following abbreviations (ordered alphabetically) are used in this manuscript:
| AASAC | Arabic Abstractive Summarization using Ant Colony |
| AI | Artificial Intelligence |
| ACO | Ant Colony Optimization |
| ACS | Ant colony system |
| CSO | Cat Swarm Optimization |
| NLP | Natural Language Processing |
| OOV | Out-Of-Vocabulary |
| POS | Part Of Speech |
| PSO | Particle Swarm Optimization |
| RSG | Rich Semantic Graph |
| RST | Rhetorical Structure Theory |
| SI | Swarm intelligence |
| TSP | Traveling Salesman Problem |
Figure 1 Diagram showing the four stages of the AASAC summarization system.
Figure 2 Graph representation.
Figure 3 Example of AgentOf relation between the entity “United Nations” and the event “indicated”.
Figure 4 Properties of nodes and edges.
Figure 5 A sample Arabic news text, the first few lines only, and its English translation. For convenience we split and colored the first sentence for easy referral in its graphical representation shown in Figure 6.
Figure 6 Graphical representation of the sample text in Figure 5.
Figure 7 A 3-gram candidate list example for the lemma (دراسة: “a study”), indicated by the black node. Red arrows indicate 1-gram connections, green arrows represent 2-gram connections, and blue arrows indicate 3-gram connections.
Figure 8 The distribution of the documents based on their size (number of words) in our dataset.
Figure 9 Comparison of recall calculation using standard ROUGE-1 and lemma-based LROUGE-1.
Figure 10 Results of human assessment of 20 randomly selected summaries produced by the AASAC system using H2FF2. (a) Frequency distribution of evaluator selections for 1-gram, 2-gram, or 3-gram summaries. (b) Average ratings provided by evaluators in response to four questions assessing the quality of 2-gram H2FF2 summaries.
A sample list of abstractive summarizers in different languages. Entries are sorted by the language.
| Ref. | Language | Summary Scope | Corpus/Dataset |
|---|---|---|---|
| [ | Arabic | Document level | Dedicated dataset (newspapers) |
| [ | Arabic | Sentence level | KALIMAT |
| [ | Arabic | Headline | Dedicated dataset (Arabic headline summary) |
| [ | Arabic | Document level | Dedicated dataset (newspapers) |
| [ | Arabic | Sentence level | Arabic Gigaword and XL-Sum |
| [ | Arabic | Headline | Arabic headline summary and Arabic Mogalad_Ndeef |
| [ | Arabic | Sentence level | Dedicated dataset |
| [ | English | Sentence level | Dedicated dataset (reviews from Tripadvisor, Amazon, and Edmunds) |
| [ | English | Document level | Dedicated dataset (50 medical research articles) |
| [ | English | Document level | DUC-2001 and DUC-2002 |
| [ | English | Document level | GNU eTraffic archive |
| [ | English | Sentence level | Gigaword for training and testing, and DUC-2004 for testing |
| [ | English | Sentence level | Gigaword and DUC-2004 |
| [ | English | Sentence level | Gigaword, DUC-2004, and CNN/Daily Mail |
| [ | English | Document level | CNN/Daily Mail |
| [ | English | Document level | CNN/Daily Mail and New York Times |
| [ | English | Sentence level | CNN/Daily Mail, New York Times, and Xsum |
| [ | English | Document level | Unstructured Additive Manufacturing texts |
| [ | Hindi | Document level | N/A |
| [ | Kannada | Sentence level | N/A |
| [ | Malayalam | Document level | Dedicated dataset (25 documents from Malayalam newspapers) |
| [ | Vietnamese | Document level | Dedicated dataset (50 documents collected from newspapers) |
List of symbols and their definition.
| Symbol | Definition |
|---|---|
| m | Number of ants. |
| W | Set of words w in the original document. |
| |W| | Number of words in the original document. |
| $w_i$ | Word $i$ in the original document. |
| E | Total number of edges in the document. |
| $(i, j)$ | Direct edge linking word $w_i$ to word $w_j$. |
| $c_{ij}$ | Cost related to edge $(i, j)$. |
| $L$ | Threshold value which controls the length of the summary (in words). |
| q | Random variable ∈ [0, 1] ⊂ ℝ. |
| $N_i^k$ | Set of nearest-neighbor nodes not yet selected by ant k. |
| U | Number of available nodes that have not yet been selected by ant k. |
| $\tau_0$ | Initial value of the pheromone trail. |
| $\tau_{ij}$ | Pheromone value for edge $(i, j)$. |
| $\eta_{ij}$ | Heuristic information value of edge $(i, j)$. |
| $\beta$ | Parameter determining the relative importance of the heuristic information values. |
| $s^{bs}$ | Best-so-far (sub-optimal) solution. |
| $\rho$ | Local pheromone evaporation rate ∈ (0, 1). |
| $\alpha$ | Global pheromone evaporation rate. |
| $F$ | Fitness function value for a solution. |
A list summarizing all the experiments.
| Experiment | Heuristic Information Function | Fitness Function |
|---|---|---|
| H1FF1 | $H_1$ | $FF_1$ |
| H1FF2 | $H_1$ | $FF_2$ |
| H2FF1 | $H_2$ | $FF_1$ |
| H2FF2 | $H_2$ | $FF_2$ |
List of LROUGE scores for the 30% summaries, comparing the ANSum baseline and the AASAC variants.
| System | LROUGE-1 | LROUGE-2 | LROUGE-L | LROUGE-SU4 | LROUGE-1 | LROUGE-2 | LROUGE-L | LROUGE-SU4 |
|---|---|---|---|---|---|---|---|---|
| ANSum | 0.2795 | 0.1852 | 0.3252 | 0.1928 | 0.3838 | 0.2537 | 0.4207 | 0.2633 |
| H1FF1-1 | 0.4347 ± 0.05 | 0.1980 ± 0.05 | 0.3836 ± 0.06 | 0.2313 ± 0.04 | 0.5099 ± 0.05 | 0.2301 ± 0.06 | 0.4460 ± 0.07 | 0.2666 ± 0.05 |
| H1FF1-2 | 0.4349 ± 0.06 | 0.1891 ± 0.05 | 0.3881 ± 0.07 | 0.2279 ± 0.04 | 0.5102 ± 0.05 | 0.2202 ± 0.06 | 0.4537 ± 0.07 | 0.2638 ± 0.05 |
| H1FF1-3 | 0.4173 ± 0.06 | 0.1839 ± 0.06 | 0.3752 ± 0.07 | 0.2187 ± 0.05 | 0.4957 ± 0.06 | 0.2168 ± 0.06 | 0.4438 ± 0.07 | 0.2562 ± 0.05 |
| H2FF2-1 | 0.4451 ± 0.08 | 0.2102 ± 0.06 | 0.4121 ± 0.08 | 0.2390 ± 0.06 | 0.5161 ± 0.1 | 0.2421 ± 0.07 | 0.4682 ± 0.1 | 0.2725 ± 0.07 |
| H2FF2-2 | 0.4315 ± 0.06 | 0.1946 ± 0.05 | 0.3929 ± 0.07 | 0.2279 ± 0.05 | 0.5086 ± 0.05 | 0.2275 ± 0.06 | 0.4565 ± 0.07 | 0.2645 ± 0.05 |
| H2FF2-3 | 0.4237 ± 0.06 | 0.1916 ± 0.05 | 0.3850 ± 0.07 | 0.2222 ± 0.05 | 0.5020 ± 0.06 | 0.2253 ± 0.06 | 0.4511 ± 0.07 | 0.2593 ± 0.05 |
The LROUGE scores for the 50% summaries, comparing the ANSum baseline and the AASAC variants.
| System | LROUGE-1 | LROUGE-2 | LROUGE-L | LROUGE-SU4 | LROUGE-1 | LROUGE-2 | LROUGE-L | LROUGE-SU4 |
|---|---|---|---|---|---|---|---|---|
| ANSum | 0.4109 | 0.3240 | 0.4704 | 0.3289 | 0.5383 | 0.4255 | 0.5803 | 0.4321 |
| H1FF1-1 | 0.5573 ± 0.10 | 0.3410 ± 0.08 | 0.5353 ± 0.09 | 0.3576 ± 0.08 | 0.6250 ± 0.11 | 0.3805 ± 0.09 | 0.5987 ± 0.08 | 0.3968 ± 0.09 |
| H1FF1-2 | 0.5637 ± 0.07 | 0.3331 ± 0.06 | 0.5393 ± 0.08 | 0.3598 ± 0.06 | 0.6332 ± 0.07 | 0.3725 ± 0.07 | 0.6036 ± 0.07 | 0.4004 ± 0.07 |
| H1FF1-3 | 0.5548 ± 0.05 | 0.3327 ± 0.06 | 0.5355 ± 0.06 | 0.3532 ± 0.05 | 0.6309 ± 0.05 | 0.3764 ± 0.06 | 0.6027 ± 0.05 | 0.3978 ± 0.05 |
| H2FF2-1 | 0.5550 ± 0.07 | 0.3429 ± 0.07 | 0.5463 ± 0.08 | 0.3574 ± 0.06 | 0.6351 ± 0.07 | 0.3902 ± 0.07 | 0.6089 ± 0.07 | 0.4046 ± 0.06 |
| H2FF2-2 | 0.5717 ± 0.05 | 0.3394 ± 0.05 | 0.5523 ± 0.06 | 0.3630 ± 0.05 | 0.6456 ± 0.04 | 0.3813 ± 0.06 | 0.6126 ± 0.06 | 0.4057 ± 0.05 |
| H2FF2-3 | 0.5611 ± 0.05 | 0.3385 ± 0.05 | 0.5486 ± 0.06 | 0.3569 ± 0.05 | 0.6375 ± 0.04 | 0.3827 ± 0.05 | 0.6126 ± 0.05 | 0.4015 ± 0.05 |