This paper analyzes a finite-capacity
1. Introduction
Modern service systems routinely face finite buffers, rework loops, energy-aware operation, and behavioral customer responses. Treating these features jointly, without restricting arrivals to be Poisson, is essential for credible design. Queuing theory dates to 1909, when Agner Krarup Erlang, a Danish engineer employed at the Copenhagen Telephone Exchange, published the pioneering paper that laid its foundation [1]. Erlang modeled the arrival of telephone calls at an exchange as a Poisson process and subsequently solved the M/D/1 queue in 1917 and the M/D/k queuing model in 1920 [2]. Aptly characterized as the mathematics of waiting lines, the theory furnishes a rigorous probabilistic language for analyzing congestion, delay, and capacity allocation in service systems driven by stochastic demand. Its core analytic tools include Markovian and renewal models, embedded chains, transform and generating-function techniques, matrix-analytic methods for quasi-birth–death (QBD) structures, and diffusion/heavy-traffic limits; together, these enable the derivation of stability conditions, stationary distributions, and performance descriptors such as waiting times, loss probabilities, utilizations, and throughput (see the classics [3,4,5,6,7,8,9,10] and Little’s law [11]). For recent references, see [12,13,14,15,16,17,18,19,20,21]. This apparatus is not merely descriptive: it supports normative decisions in mission-critical environments (including emergency response, communication networks, cloud-computing platforms, healthcare delivery, transportation, and logistics) where service quality and economic efficiency must be optimized jointly under uncertainty (cf. [22,23,24]). By articulating principled trade-offs between limited resources and stochastic variability, queuing theory has become a cornerstone for designing resilient, cost-effective service systems.
A salient evolution in the literature is the explicit modeling of customer impatience, which manifests primarily as (i) balking, whereby an arrival observing unfavorable conditions (e.g., long visible queues or high predicted delay) elects not to join, and (ii) reneging, whereby a customer who has joined subsequently abandons after an excessive waiting time. From an operational perspective, impatience fundamentally reshapes effective load, stability, and welfare: it removes demand endogenously, introduces state- and history-dependent losses, and alters equilibrium congestion levels in ways that are highly sensitive to information structure and behavioral assumptions. Classical and modern studies have established its impact across telecommunications, web services, healthcare, and manufacturing [25,26,27,28,29]. From an analytical standpoint, impatience complicates standard embeddings (e.g., arrival-epoch Markov chains) and undermines the convenience of PASTA outside the Poisson regime; even under Poisson input, the state dependence of abandonment and balking may require careful Palm-calculus distinctions between time-average and arrival-average measures [4].
A complementary stream of work studies working-vacation (WV) policies, under which servers reduce—rather than suspend—their service rate during vacation periods. The model introduced by [30] demonstrated that partial service during vacations can yield favorable sojourn-time and queue-length distributions relative to hard shutdowns, by tempering transient congestion while preserving energy or labor savings. Subsequent generalizations have spanned bulk arrivals [31,32], Erlang and general service distributions [33,34,35], multi-server configurations [36,37,38,39], and discrete-time analogues [40,41]. Methodologically, WV policies induce a modulated service mechanism that couples vacation phases with the service process; tractability often follows from phase-type representations or supplementary variables that capture the WV phase jointly with the queue state.
Real traffic is frequently over-dispersed, bursty, or correlated, rendering Poisson arrivals an inadequate approximation. Renewal-input models capture such features parsimoniously. For renewal-input queues and their finite-capacity variants with WV, notable contributions include explicit sojourn-time characterizations, interruption mechanisms, and phase-type vacation structures [31,42,43,44,45,46]. The passage from Poisson to general renewal input removes PASTA and necessitates a careful separation of time-stationary quantities from arrival-stationary (Palm) quantities. In particular, performance evaluation at pre-arrival epochs typically requires either (i) Markovianization by tracking the residual inter-arrival time as a supplementary variable, or (ii) matrix-analytic constructions on an augmented state space; passage between Palm and time averages demands size-biasing and residual-life arguments.
Combining impatience with WV dynamics raises subtle questions about (i) state-dependent stability and ergodicity, (ii) the structural properties of stationary laws, and (iii) optimal control under explicit economic objectives. For Markovian baselines, explicit probability-generating functions across vacation and busy periods have been obtained in [47], while finite buffers, policy comparisons, and cost optimization under impatience/WV interactions appear in [48,49,50,51,52,53,54,55]. A related and practically pervasive mechanism is feedback: after service completion, a customer returns to the queue with fixed probability, modeling rework loops or packet retransmissions. Originating with Takács [56], feedback has been analyzed in Markovian and renewal settings [57,58,59,60,61,62,63,64]. Analytically, feedback alters the effective arrival rate in a state-dependent manner and couples departure streams to subsequent congestion, generating nontrivial interactions with vacation phases and impatience.
Most multi-server studies posit homogeneous servers, an assumption natural for automated systems but frequently violated in human-service or mixed-technology environments. Heterogeneity (unequal service rates across servers) destroys exchangeability, induces state-dependent departure rates tied to server occupancy configurations, and generally breaks product-form heuristics. Early work on heterogeneity includes [65,66,67,68,69], with focused developments for Markovian models (including WV) and transient analyses under impatience [70,71,72]. In contrast, renewal-input queues with heterogeneous servers remain comparatively underexplored; to our knowledge, [73] addresses performance metrics but leaves open a broader structural and economic treatment that accommodates impatience, feedback, and WV control simultaneously.
This paper: model, analytic approach, and scope. We consider a finite-buffer system with two heterogeneous exponential servers, a working-vacation policy, customer impatience in the form of balking and reneging, and Bernoulli feedback. The system captures operational features that co-occur in practice: (i) non-Poisson variability in demand (renewal inter-arrivals), (ii) endogenous demand attrition (balking, reneging), (iii) flexible, partially productive capacity modulation (working vacations), (iv) rework/retransmission loops (feedback), and (v) finite-capacity constraints (blocking). Formally, the inter-arrival times are i.i.d. with a general distribution (with a cdf, a density, and a finite mean), and service times are exponential with heterogeneous rates at the two servers. A WV policy modulates the active server’s rate according to a finite-state vacation phase (standard variants include single and multiple vacation schemes). Balking is captured by a state-dependent acceptance probability when i customers are present upon arrival (equal to zero when the system is full), while reneging is modeled via exponential patience with a hazard that may depend on the queue length. Feedback is Bernoulli with a fixed probability, applied to each service completion. Customers are served under FCFS; ties in server selection are resolved by a fixed priority or randomized rule. The system state records the queue length (including any in service), server-occupancy configuration (which server(s) are busy), WV phase, and remaining inter-arrival time. Because the capacity N is finite, the augmented process is irreducible and aperiodic on a finite state space, hence admitting a unique stationary distribution for any fixed parameter vector; all performance measures below are computed in this stationary regime. We emphasize that finite capacity and exponential impatience together preclude null recurrence even under heavy traffic; positive recurrence is automatic in our augmented finite-state description. See Table 1 for details.
Contributions and organization. Our analysis proceeds via the supplementary-variable technique [77], which Markovianizes the renewal input by tracking the remaining inter-arrival time and couples it with the WV phase and occupancy configuration. The resulting piecewise-deterministic Markov process admits a system of linear balance equations whose structure allows a recursive solution for (i) pre-arrival (Palm) stationary probabilities and (ii) time-stationary probabilities at arbitrary epochs, together with explicit Palm–time conversions to reconcile performance metrics that require one or the other viewpoint. This framework facilitates the computation of loss probability, mean queue length, waiting- and sojourn-time descriptors, and server utilizations, while preserving the heterogeneity of the two service rates and the modulation induced by WV phases. We then embed the stationary descriptors into an economic objective that trades off (a) capacity costs (including heterogeneous service rates and WV control parameters), (b) congestion costs (delay and queue-length penalties), (c) abandonment costs (reneging-induced losses), and (d) rework costs arising from feedback. The resulting design problem is nonconvex due to the nonlinear dependence of performance measures on the service rates, WV parameters, and impatience/feedback primitives; we therefore adopt the Bat Algorithm [74], a metaheuristic with robust exploration–exploitation behavior and favorable convergence in applied queuing design [75,76]. To our knowledge, this is the first study to combine renewal arrivals, heterogeneity, working vacations, impatience (balking and reneging), and Bernoulli feedback in the same finite-capacity two-server model with a full Palm–time reconciliation.
Unified state augmentation under renewal input. By including the remaining inter-arrival time in the state, we retain exactness for general input without resorting to Poissonization or diffusion scaling. This preserves compatibility with balking and reneging rules that depend on instantaneous congestion and the WV phase, while enabling Palm-compatible pre-arrival descriptors.
Heterogeneity-aware service dynamics. We treat the unequal service rates explicitly and distinguish four occupancy configurations: empty, one customer on Server 1, one customer on Server 2, and two customers in service. These configurations generate distinct departure hazards and interact with WV states; this non-exchangeability is retained in all performance expressions.
Impatience and feedback coupling. Balking modifies the effective arrival stream in a state-dependent way at pre-arrival epochs; reneging introduces additional departures from the queue that depend on both the waiting population and the WV phase; and feedback couples completions to subsequent arrivals. Our formulation keeps these mechanisms explicit and compatible, rather than approximating them by constant effective rates.
Palm–time reconciliation. Many design metrics (e.g., loss on arrival, experienced waiting time) are naturally Palm quantities, whereas costs tied to time-average congestion require time-stationary laws. We make these distinctions explicit and provide the necessary conversions, avoiding implicit PASTA assumptions that fail under general renewal input.
Economic design under WV control. We formalize a cost functional that transparently prices (i) heterogeneity in service effort, (ii) responsiveness under WV modulation, and (iii) behavioral losses due to impatience; the resulting optimization illustrates how calibrated heterogeneity and WV tuning jointly improve responsiveness and cost.
We distinguish the pre-arrival (Palm) stationary law on the augmented state space from the time-stationary law. We first establish that, for any fixed parameter vector with finite buffer N, the augmented process forms an irreducible and aperiodic Markov chain with a unique stationary distribution; hence, both laws exist and are unique. For completeness, we further delineate Foster–Lyapunov conditions under which these conclusions extend to an infinite buffer, while our numerical investigations focus on the finite-buffer setting. Next, we obtain closed recursive characterizations of both laws by balancing over renewal cycles and WV phases, and we develop Laplace–Stieltjes transform representations that render computations tractable for finite N. Building on these representations, we derive explicit performance formulas: (i) the loss probability arising from balking or blocking, (ii) the mean queue length together with the utilizations of heterogeneous servers, (iii) the waiting-time and sojourn-time distributions from both Palm and time perspectives, and (iv) sensitivities with respect to the service rates, the WV parameters, and the impatience/feedback primitives. Finally, within a convex–nonconvex composite cost framework, we demonstrate that a Bat Algorithm search identifies operating points that outperform homogeneous-server baselines and hard-vacation controls across broad regions of the parameter space [74,75,76].
Structure of the paper. Section 2 formalizes the model state space, WV mechanism, balking/reneging rules, feedback, and supplementary-variable augmentation for renewal input, with a precise definition of pre-arrival (Palm) versus time-average observables. Section 3 develops the stationary analysis: recursive balance equations, Laplace–Stieltjes transform solutions, and Palm–time conversions. Section 4 reports performance characteristics and sensitivity analyses, including heterogeneity-induced asymmetries in utilization and their interaction with WV phases. Section 5 formulates the economic objective and presents the optimization methodology. Section 6 provides numerical experiments and design maps that expose managerial trade-offs and robustness. Section 7 illustrates the implications on a flexible production facility benchmark. Section 8 concludes with research directions, including diffusion approximations for large N, control under partial state information, and learning-based calibration of impatience and feedback.
2. Model Mathematical Description
This study investigates a single-class queue with two heterogeneous servers, a common FIFO queue, state-dependent balking, exponential reneging, Bernoulli feedback, and multiple working vacations taken by the slower server. A schematic of the system appears in Figure 1. The mathematical description relies on the following notation and assumptions.
Inter-arrival times. The inter-arrival times of successive arrivals form an i.i.d. sequence with a common cumulative distribution function, an associated density, a Laplace–Stieltjes transform, and a finite mean inter-arrival time.
Finite capacity. The total system capacity is N, counting customers both in service and waiting. Arrivals that would exceed capacity are blocked. Finite N ensures positive recurrence of the augmented Markov chain.
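As a notational sketch of the inter-arrival primitives described above, one common convention is shown below. The symbols $A$, $a$, $A^{*}$, and $\lambda$ are assumptions introduced here for illustration and are not confirmed by the source; only the relations themselves (the definition of a Laplace–Stieltjes transform and the mean as its negative derivative at zero) are standard.

```latex
% Assumed notation: A = inter-arrival cdf, a = its density,
% A^* = Laplace–Stieltjes transform, 1/\lambda = mean inter-arrival time.
\[
A^{*}(s) \;=\; \int_{0}^{\infty} e^{-su}\, dA(u) \;=\; \int_{0}^{\infty} e^{-su}\, a(u)\, du,
\qquad \operatorname{Re}(s) \ge 0,
\]
\[
\frac{1}{\lambda} \;=\; \int_{0}^{\infty} u\, dA(u) \;=\; -\left.\frac{d}{ds} A^{*}(s)\right|_{s=0}.
\]
```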
State-dependent balking. Upon arrival to a system containing i customers, a customer is admitted with a state-dependent probability and balks with the complementary probability. The admission probability is assumed to vanish when the system is full and to be nonincreasing in the number of customers present, so no one joins a full system and admission propensities do not increase with congestion. (Feedback customers are subject to the same rule; see below.) Alternative information structures (delayed or noisy observations) can be incorporated by redefining the admission probabilities; our analysis is unchanged.
Service structure and discipline. There is a single common FIFO queue feeding two heterogeneous servers. Server 1 and Server 2 provide exponential service at their own rates, with Server 1 the faster of the two. Server 1 is always available (no vacations).
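The balking rule above only constrains the admission probabilities to be nonincreasing in the number present and zero at full capacity. A minimal sketch of one rule that satisfies both properties, together with a checker, might look as follows. The linear form `1 - i / capacity` and the function names are hypothetical choices for illustration, not the balking function used in the source.

```python
def linear_balking(capacity):
    """Return admission probabilities b[0..capacity] that fall linearly to zero.

    Hypothetical rule for illustration only: b[i] = 1 - i / capacity.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    return [1.0 - i / capacity for i in range(capacity + 1)]


def is_valid_balking(b):
    """Check the properties stated in the text for a list of admission probabilities."""
    in_unit_interval = all(0.0 <= p <= 1.0 for p in b)
    nonincreasing = all(b[i] >= b[i + 1] for i in range(len(b) - 1))
    closed_when_full = b[-1] == 0.0  # nobody joins a full system
    return in_unit_interval and nonincreasing and closed_when_full


if __name__ == "__main__":
    b = linear_balking(5)
    print(b)
    print(is_valid_balking(b))
```

Any other rule can be swapped in as long as `is_valid_balking` still accepts it.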
Working vacations of Server 2. If Server 2 becomes idle when the queue is empty, it immediately enters a working vacation whose duration is exponentially distributed. During a working vacation, Server 2 remains active but serves at a reduced exponential rate. At the end of a vacation, Server 2 switches to regular service if at least one customer is waiting; otherwise, it begins a new working vacation. This induces multiple working vacations separated by possible service epochs.
Reneging (impatience). Whenever both servers are busy and the system contains i customers, the queue length equals i − 2. Each waiting customer runs an independent exponential impatience timer. A customer whose service has not started before the timer expires abandons the system permanently. Impatience clocks are independent of the queue-length process and of all other primitives.
Bernoulli feedback. Upon service completion (either during regular operation or while Server 2 is on a working vacation), a customer departs the system with a fixed probability and, with the complementary probability, instantaneously feeds back to the input stream and behaves as a fresh arrival (i.e., is subjected to balking and capacity constraints in the current state). This mechanism models rework or retransmission loops; its coupling with balking is crucial under finite capacity.
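A small worked bound may help quantify the rework load implied by this mechanism. It is a sketch under an assumed symbol: $q$ denotes the feedback probability per service completion (the source's own symbol is not reproduced here). Because a fed-back customer can still balk, be blocked, or renege, the customer proceeds to a further service pass with probability at most $q$, so the number of service passes $K$ is dominated by a geometric variable:

```latex
% Assumed notation: q = feedback probability, K = number of service passes of one customer.
\[
\Pr(K > k) \;\le\; q^{k}, \quad k = 0, 1, 2, \dots
\qquad\Longrightarrow\qquad
\mathbb{E}[K] \;=\; \sum_{k \ge 0} \Pr(K > k) \;\le\; \sum_{k \ge 0} q^{k} \;=\; \frac{1}{1-q}.
\]
```

With $q = 0.2$, for instance, a customer requires at most $1.25$ service passes on average; equality would require every fed-back customer to be readmitted and served.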
All primitive sources of randomness—inter-arrival times, regular and working vacation service times, vacation durations, impatience timers, and feedback decisions—are mutually independent.
3. Steady-State Solution
This section analyzes the queuing system in steady state using the supplementary variable technique. The state of the system at time t is modeled as a continuous-time Markov process with three components:
the number of customers in the system at time t, including those in service;
the state of the server at time t, which distinguishes idle on a working vacation, busy on a working vacation, and busy in a normal period;
the remaining inter-arrival time until the next customer arrival at time t.
(1)–(9)
where and are the respective rate probabilities, with the remaining inter-arrival time equal to zero denoting that an arrival is about to occur. We then introduce the Laplace–Stieltjes transforms of the steady-state probabilities.
Let for and for and denote the steady-state probabilities of having i customers in the system when the server is in state , observed at an arbitrary epoch. For , define
Multiplying Equations (1)–(9) by $e^{-su}$ and integrating with respect to u over $(0, \infty)$, we obtain the transformed relations (10)–(18).
Summing (10)–(18) and simplifying yields
(19)
Differentiating (19) with respect to s, letting , and using the normalization condition
we obtain
(20)
Equation (20) expresses conservation of flow: the long-run average inflow to the system per unit time equals the mean arrival rate . For , define as the steady-state probability (at a pre-arrival epoch) that an arriving customer sees i customers in the system and the server in state . For and , define analogously as the steady-state probability that an arrival sees i customers and server state j. Our first goal is to connect these pre-arrival probabilities to the stationary descriptors at an arbitrary time.
Let denote the number of customers in the system at time t, let denote the server state, and let denote the remaining inter-arrival time at t. By Bayes’ formula for conditional probabilities, for each admissible , we have
Consequently,
(21)
where is the arrival rate and is the corresponding boundary value at of the transform (defined below). To determine the pre-arrival probabilities, it suffices to compute recursively in terms of (details below). Throughout, denotes the relevant transform and its kth derivative.
Step 1: Eliminating .
Setting in (15) yields in terms of :
(22)
Step 2: An expression for .
Substituting (22) back into (15) gives
(23)
where we set . Hence, for ,
(24)
Step 3: The value at and higher derivatives.
Differentiating (23) with respect to s and then setting yields
(25)
Differentiating (23) a total of l times with respect to s gives, for ,
(26)
Step 4: Compact representation.
Combining (24)–(26), we obtain
(27)
where and with denoting the lth derivative of with respect to s. Equations (21)–(27) provide the desired bridge from arbitrary-epoch quantities to pre-arrival probabilities and furnish the recursive ingredients needed to express all required terms in .
For , taking in Equation (14) and using Equation (22), we obtain in terms of as follows:
(28)
with
By applying Equation (28) to Equation (14) for , we can express in terms of as follows:
(29)
where
For in Equation (14), for and for are expressed in terms of as follows:
(30)
where and
(31)
where
Substituting into Equation (13) yields in terms of and as follows:
(32)
where
Using Equation (32) in Equation (13), we obtain in terms of as follows:
(33)
where and
From Equation (12), we have
(34)
where
Putting in Equation (18) and using the relation of , we obtain in terms of and as follows:
(35)
where
Using Equation (35) in Equation (18), we get
(36)
where and .
Following the procedure carried out for obtaining and , the probabilities and can be obtained from Equation (17) in terms of and as
(37)
where
By using Equation (37) in Equation (17), we get
(38)
where
Taking in Equation (11), we get in terms of , , and as
(39)
where
Taking in Equation (12), using the expressions of and we get in terms of as follows:
(40)
where
Taking into Equation (16) and using the expressions for and along with Equation (40), we obtain in terms of as follows:
(41)
where
From Equations (22), (28) and (30), and by applying Equation (40) into Equation (32), Equation (41) into Equations (35) and (37), and Equations (40) and (41) into Equation (38), we can express all in terms of . The only unknown can be determined from Equation (21). Now, the steady-state probabilities at a pre-arrival epoch can be simply obtained from the rate probabilities through Equation (21). Finally, we formulate the steady-state probabilities at an arbitrary epoch in terms of the corresponding probabilities at a pre-arrival epoch. In particular, setting in Equations (10)–(18), and subsequently invoking relation (21) together with the requisite algebraic reductions, yields
where can be computed by using the normalization condition, that is,
4. Performance Measures
To guide the design and optimization of the queuing system, we evaluate standard performance indices computed from the steady-state probabilities at arbitrary epochs obtained in the previous section. Throughout, let denote the external arrival rate, N the system capacity, the per-customer reneging rate when applicable, and the probability that an arriving customer decides to join upon seeing i customers in the system (so that is the balking probability). We write for the probability of being in a state with i customers while the server is idle on a working vacation, for a state with i customers and the server in a (reduced-rate) working–vacation service mode, and for a state with i customers and the server in the regular busy mode. The loss probability upon arrival is denoted by
i.e., the probability that an arriving customer finds the system full, evaluated with the pre-arrival probabilities.
(i). Mean system size and mean sojourn time.
By definition and Little’s law applied to the effective throughput , we have
(42)
(43)
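The two indices in item (i) can be sketched numerically as follows. The snippet assumes a list `p` of arbitrary-epoch probabilities already summed over server states, and an effective joining rate supplied by the caller; both names and the toy numbers are hypothetical and are not results from the source.

```python
def mean_system_size(p):
    """Mean number in system, with p[i] the arbitrary-epoch probability of i customers."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(i * prob for i, prob in enumerate(p))


def mean_sojourn_time(p, effective_rate):
    """Little's law: mean sojourn time = mean system size / effective joining rate."""
    if effective_rate <= 0.0:
        raise ValueError("effective_rate must be positive")
    return mean_system_size(p) / effective_rate


if __name__ == "__main__":
    # Toy distribution on {0, 1, 2, 3} customers (illustrative values only).
    p = [0.4, 0.3, 0.2, 0.1]
    print(mean_system_size(p))
    print(mean_sojourn_time(p, effective_rate=0.8))
```

The division uses the rate of customers who actually join, not the raw external arrival rate, because balked and blocked arrivals never contribute to the time spent in the system.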
(ii). Server-state occupancy probabilities.
The long-run fractions of time that the server is (a) idle during a working vacation, (b) busy during a working vacation, and (c) busy during a normal busy period are
(44)
(iii). Flow rates of joining, balking, and reneging.
Let denote the average rate of customers that actually enter service (i.e., join the system), the average balking rate, and the average reneging rate. Then,
(45)
(46)
(47)
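A hedged sketch of how the flow rates in item (iii) could be assembled from a stationary distribution is given below. It is not a transcription of Equations (45)–(47): it assumes external arrivals only (feedback re-entries are ignored), pre-arrival probabilities aggregated over server states, two servers that are both busy whenever at least two customers are present, and a constant per-customer reneging rate. All names and numbers are hypothetical.

```python
def flow_rates(lam, b, pre_arrival, time_avg, alpha, servers=2):
    """Split the external arrival stream and compute the reneging rate.

    lam         : external arrival rate
    b[i]        : probability that an arrival seeing i customers joins (b[N] = 0)
    pre_arrival : probabilities seen by arrivals, indexed by number in system
    time_avg    : arbitrary-epoch probabilities, same indexing
    alpha       : per-customer reneging rate of waiting customers (assumed constant)
    """
    n = len(b) - 1
    if len(pre_arrival) != n + 1 or len(time_avg) != n + 1:
        raise ValueError("b, pre_arrival and time_avg must have equal length")
    joining = lam * sum(b[i] * pre_arrival[i] for i in range(n + 1))
    # Balking is counted only below capacity; a full system blocks instead.
    balking = lam * sum((1.0 - b[i]) * pre_arrival[i] for i in range(n))
    blocking = lam * pre_arrival[n]
    # Only customers waiting in the queue (beyond those in service) can renege.
    reneging = alpha * sum(max(i - servers, 0) * time_avg[i] for i in range(n + 1))
    return {"joining": joining, "balking": balking,
            "blocking": blocking, "reneging": reneging}


if __name__ == "__main__":
    rates = flow_rates(lam=2.0, b=[1.0, 0.75, 0.5, 0.0],
                       pre_arrival=[0.4, 0.3, 0.2, 0.1],
                       time_avg=[0.4, 0.3, 0.2, 0.1], alpha=0.5)
    print(rates)
```

A useful sanity check under these assumptions is that the joining, balking, and blocking rates add up to the external arrival rate.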
The indices (42)–(43) together with and provide a coherent basis for performance evaluation and, in particular, for selecting service-rate parameters that balance congestion, abandonment, and utilization under capacity constraints.
Section 4 reports deterministic evaluations of the stationary distribution and associated indices via the exact recursions and transform-based formulae developed earlier; hence, no stochastic simulation or hardware-dependent averaging was involved. For completeness, we will specify the computational environment (standard workstation complexity per evaluation) and provide a clear rationale for input choices: service and vacation rates were selected to satisfy the structural ordering ; balking and reneging rates were set within ranges consistent with classical impatience models; and buffer sizes were varied to capture light, moderate, and heavy traffic regimes while ensuring stability.
5. Cost Model and Optimization Study
Building on the performance indices derived in the previous section, we now formulate the steady-state expected total cost per unit time for the proposed queuing system. The decision vector comprises three continuous, nonnegative rates: the normal service rates of the two servers (with Server 1 faster than Server 2) and the working-vacation service rate of Server 2. In practice, system managers aim to reduce operating costs subject to stability and operational constraints. Our objective is to determine the rate values that minimize F, the steady-state cost function defined below.
Unit cost elements.
Let the following unit costs be given:
Cost per unit time when Server 2 is idle during a working-vacation period;
Cost per unit time when Server 2 is busy during a working-vacation period;
Cost per unit time when Server 2 is busy during a normal (non-vacation) busy period;
Holding cost per unit time per customer present in the system;
Penalty cost per unit time due to balking or reneging;
Penalty cost per unit time per lost customer when the system is blocked;
Cost per unit of normal service effort;
Cost per unit of feedback service effort;
Fixed purchase (or capacity) cost per server unit.
Performance indices (given).
The performance indices entering the cost are the steady-state probabilities that Server 2 is idle on working vacation, busy on working vacation, and busy in normal operation, and that the system is blocked; the steady-state expected system size; the steady-state rates of balking and reneging; the external arrival rate; and the mean feedback probability. These indices are determined by the stochastic model specified in the previous sections.
Total expected cost.
The steady-state expected total cost per unit time is
(48)
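To make the structure of such a cost concrete, the sketch below aggregates only the six terms whose multipliers are described in words above: the three Server 2 state costs, the holding cost, the impatience penalty, and the blocking penalty. It is not Equation (48): the service-effort and purchase terms are deliberately left out because their exact multipliers are not stated in the surrounding text, and every name and number is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Indices:
    """Steady-state indices assumed to come from the stationary analysis."""
    p_idle_wv: float      # Server 2 idle on working vacation
    p_busy_wv: float      # Server 2 busy on working vacation
    p_busy_normal: float  # Server 2 busy in a normal period
    mean_size: float      # expected number in system
    balking_rate: float
    reneging_rate: float
    arrival_rate: float
    p_loss: float         # probability an arrival finds the system full


def partial_cost(ix, c_idle_wv, c_busy_wv, c_busy_normal,
                 c_hold, c_impatience, c_block):
    """Linear aggregation of state, congestion, impatience, and blocking costs."""
    state_cost = (c_idle_wv * ix.p_idle_wv
                  + c_busy_wv * ix.p_busy_wv
                  + c_busy_normal * ix.p_busy_normal)
    holding_cost = c_hold * ix.mean_size
    impatience_cost = c_impatience * (ix.balking_rate + ix.reneging_rate)
    blocking_cost = c_block * ix.arrival_rate * ix.p_loss
    return state_cost + holding_cost + impatience_cost + blocking_cost


if __name__ == "__main__":
    ix = Indices(0.2, 0.3, 0.5, 2.0, 0.1, 0.05, 1.0, 0.02)
    print(partial_cost(ix, 10.0, 20.0, 30.0, 5.0, 15.0, 25.0))
```

Keeping the indices in one record and the unit costs as separate arguments mirrors the separation between performance-determining quantities and cost coefficients described in the text.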
Optimization problem.
We seek the cost-minimizing rates under the natural ordering and feasibility (stability) constraints:
(49)
Solution approach.
Because F in (48) is a highly nonlinear function of through the embedded performance indices, closed-form optimality conditions are generally intractable. We therefore adopt a numerical approach based on the Bat algorithm, a metaheuristic inspired by echolocation, to efficiently explore the feasible region in (49) and obtain high-quality approximations to .
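For readers unfamiliar with the method, a basic Bat Algorithm loop can be sketched as below. This is a generic textbook-style version (frequency-tuned velocity update, local random walk around the current best, loudness decay, and pulse-rate growth), not the authors' implementation. The objective `toy_cost` is a hypothetical stand-in for F with a known minimizer, and all parameter defaults are illustrative assumptions.

```python
import math
import random


def bat_algorithm(objective, bounds, n_bats=20, n_iter=200, f_min=0.0, f_max=2.0,
                  alpha=0.9, gamma=0.9, r0=0.5, seed=0):
    """Minimize `objective` over a box; returns (best_x, best_val, history)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        return [min(max(xi, lo), hi) for xi, (lo, hi) in zip(x, bounds)]

    positions = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_bats)]
    velocities = [[0.0] * dim for _ in range(n_bats)]
    values = [objective(x) for x in positions]
    loudness = [1.0] * n_bats
    pulse = [r0] * n_bats

    best_idx = min(range(n_bats), key=values.__getitem__)
    best_x, best_val = list(positions[best_idx]), values[best_idx]
    history = [best_val]  # best value found so far, one entry per iteration

    for t in range(1, n_iter + 1):
        mean_loudness = sum(loudness) / n_bats
        for i in range(n_bats):
            # Frequency-tuned velocity update in the commonly published form.
            freq = f_min + (f_max - f_min) * rng.random()
            velocities[i] = [v + (x - b) * freq
                             for v, x, b in zip(velocities[i], positions[i], best_x)]
            candidate = clip([x + v for x, v in zip(positions[i], velocities[i])])
            # Local random walk around the current best, scaled by mean loudness.
            if rng.random() > pulse[i]:
                candidate = clip([b + 0.1 * (hi - lo) * mean_loudness * rng.gauss(0.0, 1.0)
                                  for b, (lo, hi) in zip(best_x, bounds)])
            cand_val = objective(candidate)
            # Accept improving moves with probability equal to the loudness.
            if cand_val <= values[i] and rng.random() < loudness[i]:
                positions[i], values[i] = candidate, cand_val
                loudness[i] *= alpha
                pulse[i] = r0 * (1.0 - math.exp(-gamma * t))
            if cand_val < best_val:
                best_x, best_val = list(candidate), cand_val
        history.append(best_val)
    return best_x, best_val, history


def toy_cost(x):
    """Hypothetical smooth stand-in for F over three rates (not the paper's cost)."""
    return (x[0] - 2.0) ** 2 + (x[1] - 1.5) ** 2 + (x[2] - 0.5) ** 2 + 1.0


if __name__ == "__main__":
    best_x, best_val, _ = bat_algorithm(toy_cost, [(0.1, 5.0)] * 3)
    print(best_x, best_val)
```

In an actual study the objective would evaluate the stationary distribution for each candidate rate vector, and ordering or stability constraints could be handled by a penalty term or by rejecting infeasible candidates.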
Comments
The selection of the decision variables and the cost components is motivated by both modeling fidelity and practical relevance. We briefly elaborate on their roles.
Service and vacation rates.
The triplet of rates forms the natural decision vector because service rates constitute the most direct levers for system performance. In particular:
The two normal service rates represent the normal operating capacities of the two heterogeneous servers. Such asymmetry is common in practice, where one server is technologically superior or operated by a more skilled resource. Distinguishing them allows us to capture realistic differences in throughput and workload distribution.
The working-vacation service rate models the effective rate during a working-vacation period, a concept increasingly relevant in systems where partial service continues during maintenance, energy-saving modes, or off-peak operation. Including it in the optimization enables one to quantify the trade-off between reduced productivity during vacations and the associated cost savings.
Cost components.
The decomposition of the total cost into nine elements ensures that all critical operational aspects are explicitly represented:
The three state-dependent costs of Server 2 differentiate between its idle and busy states across normal and vacation regimes, thereby capturing the utilization-dependent expenditure structure.
The holding, impatience, and blocking costs encode congestion and dissatisfaction: a holding cost for customers in the system, penalties for impatience (balking/reneging), and loss penalties under blocking. This reflects the service-quality dimension and its direct impact on customer retention.
The two service-effort costs measure the marginal effort of normal versus feedback service, thereby penalizing overuse of resources and accounting for the additional burden of reprocessing tasks.
The fixed purchase cost accounts for fixed capacity investments, ensuring that expansion or contraction of server capability is consistently weighed against operational benefits.
Modeling rationale.
This parameterization balances analytical tractability with operational richness. By separating performance-determining rates from cost-incurring states, the model provides a transparent mapping from managerial decisions to economic outcomes. The chosen structure also aligns with the literature on cost optimization in heterogeneous and vacation-based queuing systems, ensuring comparability and theoretical robustness.
The proposed framework can, in principle, be extended to accommodate multi-class customer populations or priority-based service policies. In the multi-class setting, the state descriptor must be augmented to reflect both the number and the class of customers in service or waiting, leading to vector-valued probability generating functions and Laplace–Stieltjes transforms, together with class-dependent recursive relations. Similarly, under priority-based disciplines (preemptive or non-preemptive), the balance equations and boundary conditions must be reformulated to incorporate class-specific service rules, which naturally give rise to block-structured transition dynamics. Analytically, such extensions often require the use of matrix-analytic or quasi-birth–death techniques to preserve tractability. While feasible, these generalizations substantially increase the model’s dimensional and computational complexity, and their rigorous treatment lies beyond the scope of the present work.
6. Numerical Results
In this section, we provide a detailed numerical investigation of the multiple working vacation queuing model. The inter-arrival process is examined under a range of canonical distributions, namely deterministic, exponential, and Erlang-2, so as to capture a spectrum of arrival-time variability. The numerical experiments, presented through graphs and tables, are intended to elucidate the influence of key queuing parameters on both the principal performance measures and the associated cost-optimization criteria. To this end, we developed a program in R (version 4.5.2) that computationally validates and illustrates the analytical formulas derived in the preceding sections. For the entire numerical analysis, representative values of the system parameters and cost coefficients are chosen in an illustrative yet systematic manner, so that the essential qualitative behaviors of the model are exhibited.
In Table 2, we report the steady-state joint distributions observed at both pre-arrival and arbitrary epochs for three representative inter-arrival time laws—deterministic, exponential, and Erlang–2—all calibrated with rate parameter . The remaining system parameters are specified as , , , , , and . Customer balking is modeled through the state-dependent function for , where the total system capacity is fixed at . A noteworthy feature emerges in the exponential case: because of the memoryless property of the exponential distribution, the pre-arrival probabilities coincide exactly with the arbitrary-epoch probabilities , reflecting the fundamental Markovian nature of this setting.
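The coincidence of pre-arrival and arbitrary-epoch probabilities under exponential input, and their divergence otherwise, can be illustrated on a much simpler stand-in: the classical infinite-buffer single-server GI/M/1 queue, not the two-server finite-capacity model of this paper. In that queue the pre-arrival law is geometric with parameter sigma, the root in (0, 1) of sigma = A*(mu(1 − sigma)), where A* is the Laplace–Stieltjes transform of the inter-arrival time. The function names, the fixed-point solver, and the rates below are assumptions made for this sketch.

```python
import math


def lst_exponential(lam):
    """LST of an exponential inter-arrival time with mean 1/lam."""
    return lambda s: lam / (lam + s)


def lst_erlang2(lam):
    """LST of an Erlang-2 inter-arrival time with mean 1/lam (two stages of rate 2*lam)."""
    return lambda s: (2.0 * lam / (2.0 * lam + s)) ** 2


def lst_deterministic(lam):
    """LST of a constant inter-arrival time equal to 1/lam."""
    return lambda s: math.exp(-s / lam)


def gim1_sigma(lst, mu, tol=1e-13, max_iter=100000):
    """Solve sigma = A*(mu * (1 - sigma)) in (0, 1) by fixed-point iteration (needs lam < mu)."""
    sigma = 0.5
    for _ in range(max_iter):
        new_sigma = lst(mu * (1.0 - sigma))
        if abs(new_sigma - sigma) < tol:
            return new_sigma
        sigma = new_sigma
    raise RuntimeError("fixed-point iteration did not converge")


def pre_arrival(sigma, n):
    """Probability that an arrival finds n customers in a GI/M/1 queue."""
    return (1.0 - sigma) * sigma ** n


def time_average(rho, sigma, n):
    """Probability of n customers at an arbitrary epoch in a GI/M/1 queue."""
    return 1.0 - rho if n == 0 else rho * (1.0 - sigma) * sigma ** (n - 1)


if __name__ == "__main__":
    lam, mu = 1.0, 2.5
    for name, lst in [("deterministic", lst_deterministic(lam)),
                      ("Erlang-2", lst_erlang2(lam)),
                      ("exponential", lst_exponential(lam))]:
        sigma = gim1_sigma(lst, mu)
        palm = [round(pre_arrival(sigma, n), 4) for n in range(3)]
        time = [round(time_average(lam / mu, sigma, n), 4) for n in range(3)]
        print(name, round(sigma, 4), palm, time)
```

For exponential input the root equals the traffic intensity, so the two distributions agree term by term; for the less variable deterministic and Erlang-2 inputs, arrivals tend to see a less congested system than a time-average observer.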
6.1. Numerical Results of System Performance Measures
Table 3 and Figure 2, Figure 3 and Figure 4 report a sensitivity analysis of the principal performance measures with respect to the service rates during normal and working-vacation regimes, namely , , and . Throughout, inter-arrival times are taken to be deterministic with mean . Unless otherwise stated, the baseline parameter vector is
and the balking function is
Main findings. The numerical results show that increasing any of the service rates , , or leads to a monotone decrease in the congestion and impatience indicators:
where is the mean system size, the mean sojourn time, the average balking rate, and the average reneging rate. In parallel, the effective joining rate increases with faster service, reflecting improved responsiveness and reduced anticipated delay.
Regarding server-state utilization, the probability that Server 2 is busy during normal (non-vacation) periods, denoted , decreases as each of , , and increases. During working vacations, the busy probability of Server 2, denoted , decreases with and with , but increases with . This asymmetric response is consistent with the heterogeneity of the two servers and the division of work between normal and vacation regimes.
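The direction of these effects can be checked on a much simpler stand-in model. The Python sketch below is not the renewal-input model of this paper: it assumes Poisson arrivals (so the system reduces to a birth-death chain), ignores working vacations and feedback, lets Server 1 take the first customer, and uses a hypothetical joining probability `1 - (n/N)**2`. All names and parameter values are illustrative.

```python
def join_prob(n, N):
    """Hypothetical joining probability when n customers are seen (n < N)."""
    return 1.0 - (n / N) ** 2

def stationary_birth_death(lam, mu1, mu2, xi, N):
    """Stationary law of a two-server birth-death chain with balking/reneging."""
    weights = [1.0]
    for n in range(1, N + 1):
        birth = lam * join_prob(n - 1, N)           # admitted arrival rate
        death = mu1 + (mu2 if n >= 2 else 0.0)      # service completions
        death += xi * max(n - 2, 0)                 # reneging of waiting jobs
        weights.append(weights[-1] * birth / death)
    total = sum(weights)
    return [w / total for w in weights]

def measures(lam, mu1, mu2, xi, N):
    """Mean system size, balking rate, reneging rate, and blocking probability."""
    p = stationary_birth_death(lam, mu1, mu2, xi, N)
    return {
        "L": sum(n * p[n] for n in range(N + 1)),
        "balking_rate": lam * sum((1.0 - join_prob(n, N)) * p[n]
                                  for n in range(N)),
        "reneging_rate": xi * sum(max(n - 2, 0) * p[n] for n in range(N + 1)),
        "loss_prob": p[N],  # valid as an arrival-average only under Poisson input
    }

if __name__ == "__main__":
    for mu1 in (1.5, 2.0, 2.5):
        print(mu1, measures(lam=2.0, mu1=mu1, mu2=1.0, xi=0.3, N=10))
```

In this simplified chain a faster Server 1 shifts the stationary distribution toward smaller states, so the mean system size and the reneging rate both fall, in line with the trend described above. Under non-Poisson input the pre-arrival and arbitrary-epoch probabilities differ, so the arrival-based quantities would have to be computed from the pre-arrival distribution instead.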
Effect of balking. Comparisons between systems with and without balking reveal that
while
These patterns are qualitatively in line with queuing intuition: allowing balking filters out customers facing long anticipated waits, thereby lowering congestion measures and shifting occupancy from normal periods to working vacations.
Secondly, in Table 4 and Figures 5–7, we analyze the impact of the feedback probability, the arrival rate, the working vacation rate, and the impatience rate on various performance measures. Here, we assume that the inter-arrival times follow an Erlang-2 distribution. The balking function is taken as for , where The parameters are taken as , , , , , and From Table 4 and Figures 5–7, we observe the following:
With fixed values of , , and an increase in leads to increases in and as expected. This in turn increases , , and Consequently, the probability of customer loss due to system size limitation significantly rises. Notably, the probability that Server 2 is idle during the working vacation period decreases with
With fixed values of , and when the working vacation rate increases, Server 2 rapidly switches to the normal busy period, during which customers are served at a higher rate. Therefore, the characteristics , , , , and decrease. This implies an increase in and in because the mean working vacation time decreases. This trend is consistent with what is observed in practice.
With fixed values of , and when the impatience rate increases, the system characteristics , , and increase, while , , , and decrease. This relationship highlights that higher impatience rates lead to smaller system sizes on average and higher average reneging rates.
For fixed values of , , and , an increase in the feedback probability leads to higher values of and as intuitively expected. Consequently, this increase results in elevated values of , , and the probability of customer loss due to system size limitation . Conversely, and the probability that Server 2 is idle during the working vacation period monotonically decrease.
Effects of , , , and on system characteristics.
| 0.3649478 | 0.5285393 | 0.5156073 | 0.7485630 | 0.8706453 | 1.2585121 | |
| 0.5213540 | 0.5285393 | 0.7365818 | 0.7485630 | 1.2437789 | 1.2585122 | |
| 0.6995574 | 0.9984455 | 0.6989834 | 0.9965871 | 0.6967290 | 0.9899939 | |
| 0.0004426 | 0.0015545 | 0.0010166 | 0.0034129 | 0.0032710 | 0.0100061 | |
| 0.0003104 | 0.0013975 | 0.0011433 | 0.0047919 | 0.0068955 | 0.0251020 | |
| 0.5285393 | 0.5253591 | 0.7485630 | 0.7430044 | 1.2585121 | 1.2486714 | |
| 0.5285393 | 0.5253591 | 0.7485630 | 0.7430044 | 1.2585122 | 1.2486714 | |
| 0.9984455 | 0.9984870 | 0.9965871 | 0.9966743 | 0.9899939 | 0.9901889 | |
| 0.0015545 | 0.0015130 | 0.0034129 | 0.0033257 | 0.0100061 | 0.0098111 | |
| 0.0013975 | 0.0013191 | 0.0047919 | 0.0045670 | 0.0251020 | 0.0243441 | |
| 0.5288857 | 0.5285393 | 0.7501344 | 0.7485630 | 1.2703945 | 1.2585121 | |
| 0.5288857 | 0.5285393 | 0.7501344 | 0.7485630 | 1.2703948 | 1.2585122 | |
| 0.9984379 | 0.9984455 | 0.9965519 | 0.9965871 | 0.9897164 | 0.9899939 | |
| 0.0015621 | 0.0015545 | 0.0034481 | 0.0034129 | 0.0102836 | 0.0100061 | |
| 0.0010108 | 0.0013975 | 0.0035746 | 0.0047919 | 0.0199789 | 0.0251020 | |
Effects of , and on and .
[Figure omitted. See PDF]
Figure 6. Effects of , and on and .
[Figure omitted. See PDF]
Figure 7. Effects of , and on and .
[Figure omitted. See PDF]
6.2. Numerical Results of Cost Optimization
Our aim in this subsection is to determine the optimal service rates , , and that minimize the total expected cost function under various operational scenarios. The unit cost coefficients are prescribed as
Customer balking is modeled by the quadratic function with the system capacity fixed at . Unless otherwise specified, the parameters remain constant throughout the analysis. For the numerical optimization, we implement the Bat Algorithm, a population-based metaheuristic, with a population size of and a maximum number of iterations .
Figures 8 and 9 show that, over the plotted ranges, the total expected cost function appears convex in the neighborhood of its minimum. These figures illustrate that there exist specific values of the service rates , and that minimize the total expected cost function for the selected model parameters.
According to Table 5, as the arrival rate increases, we observe that the optimal service rates at the primary and secondary stations, and , increase, whereas the auxiliary-rate decreases. Consequently, the optimal expected cost
is strictly increasing in . This behavior is natural: higher inflow intensifies congestion, enlarging the expected system size and, holding the optimality conditions fixed, raising the total expected cost. By contrast, when the impatience (reneging) rate increases, all three optimal rates , , and rise, and so does the objective . This accords with intuition: greater impatience penalizes delay more heavily, incentivizing faster service to mitigate reneging, but the higher service intensities themselves entail larger operating costs, thereby elevating the overall optimum. Moreover, the presence of balking amplifies these effects. In models with balking, the optimal rates , , and , together with the resulting optimal cost , exceed their counterparts in otherwise identical systems without balking. Thus, classical impatience behaviors (balking and reneging) unambiguously worsen total expected cost by compelling more aggressive (and costlier) service policies. The comparative statics reported in Table 5 are consistent with operational practice and provide actionable guidance for real-time implementations: appropriate tuning of the service rates , , and can hedge against higher arrival intensities and customer impatience, while explicitly accounting for balking to avoid systematic under-provision of capacity.
6.3. Discussions
On the compact feasible set , the mapping is continuous (indeed locally Lipschitz) since it is a smooth composition of linear recursions in the stationary probabilities. Hence, a global minimizer exists. The terms price server phases (idle/WV/regular), penalizes congestion, penalizes behaviorally lost work, accounts for blocked demand, and proxies energy/labor with rework overhead.
We initialize the bats uniformly over the feasible cone , project after each move to enforce the ordering, and use logarithmic step sizes for the rates to improve exploration across scales. A small amount of Gaussian jitter on the best incumbent accelerates escape from flat basins. We report the best feasible incumbent after T iterations and confirm robustness through five independent runs.
We adapted the Bat algorithm in three essential ways to reflect the mathematical structure of our queuing system. First, the decision variables are constrained by the intrinsic service-rate ordering
and rather than introducing penalty terms, every candidate solution is projected directly onto this feasible cone (after proposing moves in log-rate space to preserve positivity and scale-invariance). Second, fitness evaluation is based not on simulation but on the exact recursion formulas derived from the Laplace–Stieltjes transform of the system, with special analytic limits at the singular points to maintain numerical stability and continuity. Third, because the resulting objective is deterministic, piecewise-smooth, and nonconvex, we modified the Bat dynamics to add controlled jitter around the incumbent best solution in order to escape plateaus, while keeping acceptance and loudness schedules less aggressive than in the stochastic case. These adaptations ensure that the search respects the system’s structural constraints, that each evaluation is mathematically exact and stable, and that the algorithm remains effective on the nonconvex landscape induced by the working-vacation queue with balking, reneging, and feedback.
We would like to stress that the numerical results in Section 6 are based on exact recursive formulations of the stationary distribution and therefore are not subject to sampling variability as in simulation-based studies. Consequently, conventional statistical tests are not directly applicable. To ensure robustness, however, we performed several independent optimization runs with different initializations and obtained highly consistent solutions, indicating that the reported outcomes are stable with respect to the metaheuristic search. The performance measures themselves are deterministic functionals of the stationary probabilities, so their accuracy hinges on numerical stability rather than statistical fluctuations, which has been carefully verified in the implementation.
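The adaptations described above can be sketched in a few lines of Python. This is a simplified illustration and not the authors' R implementation: the ordering mu_v <= mu_2 <= mu_1, the bounds `LO` and `HI`, the separable `toy_cost` (which stands in for the recursion-based cost), and all tuning constants are assumptions made for the example. Feasibility is restored by clipping and sorting, which is a simple repair step used here in place of a true Euclidean projection onto the cone.

```python
import math
import random

LO, HI = 0.1, 10.0  # hypothetical bounds on every service rate

def repair(z):
    """Clip log-rates to the box, then sort so that mu_1 >= mu_2 >= mu_v."""
    lo, hi = math.log(LO), math.log(HI)
    return sorted((min(max(v, lo), hi) for v in z), reverse=True)

def toy_cost(z):
    """Stand-in objective in log-rate space; its minimum is 16 at (4, 3, 1)."""
    mu1, mu2, muv = (math.exp(v) for v in z)
    return 16.0 / mu1 + mu1 + 9.0 / mu2 + mu2 + 1.0 / muv + muv

def bat_search(cost, n_bats=20, n_iter=200, seed=0,
               alpha=0.95, gamma=0.05, r0=0.5, jitter=0.1):
    rng = random.Random(seed)
    lo, hi = math.log(LO), math.log(HI)
    pos = [repair([rng.uniform(lo, hi) for _ in range(3)])
           for _ in range(n_bats)]
    vel = [[0.0] * 3 for _ in range(n_bats)]
    fit = [cost(p) for p in pos]
    loud = [1.0] * n_bats    # loudness A_i
    pulse = [0.0] * n_bats   # pulse emission rate r_i
    b = min(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[b][:], fit[b]
    history = [best_fit]
    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = rng.random()  # frequency drawn in [0, 1]
            # Pull the bat toward the incumbent best, in log-rate space.
            vel[i] = [v + (xb - x) * freq
                      for v, x, xb in zip(vel[i], pos[i], best)]
            cand = [x + v for x, v in zip(pos[i], vel[i])]
            if rng.random() > pulse[i]:
                # Local walk: Gaussian jitter around the incumbent best.
                scale = jitter * sum(loud) / n_bats
                cand = [xb + scale * rng.gauss(0.0, 1.0) for xb in best]
            cand = repair(cand)
            f = cost(cand)
            if f <= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, f
                loud[i] *= alpha
                pulse[i] = r0 * (1.0 - math.exp(-gamma * t))
            if f < best_fit:
                best, best_fit = cand[:], f
        history.append(best_fit)  # best-so-far value, never increases
    return [math.exp(v) for v in best], best_fit, history

if __name__ == "__main__":
    for seed in range(5):  # independent replications
        rates, value, _ = bat_search(toy_cost, seed=seed)
        print(seed, [round(r, 3) for r in rates], round(value, 4))
```

Working in log-rate space keeps every candidate strictly positive without extra constraints, and recording the best-so-far value per iteration gives a direct way to inspect when the search has stabilized across replications.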
The convergence of the proposed optimization procedure must be understood in a numerical rather than purely analytical sense. Because the underlying cost functional is evaluated exactly (through the backward recursions and analytic limits at singular points), the sequence of best-so-far solutions generated by the Bat algorithm, although produced by a randomized search, exhibits monotone improvement and eventually stabilizes within a narrow tolerance. However, as is well known for population-based metaheuristics, global optimality cannot be guaranteed theoretically; what can be claimed is convergence to a numerically stable and high-quality local optimum, with robustness reinforced by projection into the ordered feasible cone and by independent replications of the search. From a computational perspective, the recursion structure makes each fitness evaluation linear in the buffer size, so the optimization remains tractable: in practice, for realistic system dimensions, the entire optimization completes within seconds to minutes on standard hardware. This ensures that the solution time is fully acceptable for decision-support and design purposes, even though a theoretical guarantee of global optimality is absent.
In practical environments such as call centers, healthcare units, or flexible manufacturing, the model would be implemented by first calibrating its structural parameters—arrival intensity, patience (reneging) distributions, balking probabilities, service-time laws under regular versus working-vacation modes, and feedback frequencies—directly from operational data. Concretely, this requires access to high-resolution transactional logs (arrival timestamps, abandonment times, service completions, switch-overs to reduced-capacity states, and post-service return behavior). Standard statistical procedures (maximum likelihood, nonparametric hazard estimation, survival analysis) would then be used to fit the inter-arrival and service-time distributions as well as the patience and balking functions. With these empirically derived inputs, the queuing model can be parameterized and the optimization framework applied to support staffing, scheduling of reduced-capacity periods, and service-level decisions tailored to the specific application domain.
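As one concrete instance of the calibration step, the Python sketch below estimates an exponential patience (reneging) rate by maximum likelihood from right-censored waiting records: customers who were served before abandoning contribute waiting time but no abandonment event. The function name and the sample data are hypothetical, and the exponential-patience assumption itself would need to be checked against the data.

```python
def exponential_patience_mle(waits, abandoned):
    """MLE of an exponential reneging rate from right-censored waits.

    waits[i]     : time customer i spent waiting (until abandonment or service)
    abandoned[i] : True if the wait ended in abandonment, False if censored
    The estimator is (number of abandonments) / (total observed waiting time).
    """
    if len(waits) != len(abandoned):
        raise ValueError("waits and abandoned must have the same length")
    exposure = sum(waits)
    if exposure <= 0.0:
        raise ValueError("total waiting time must be positive")
    events = sum(1 for flag in abandoned if flag)
    return events / exposure

if __name__ == "__main__":
    # Hypothetical log extract: four customers, two of whom abandoned.
    waits = [2.0, 1.0, 3.0, 4.0]
    abandoned = [True, False, True, False]
    print(exponential_patience_mle(waits, abandoned))
```

Discarding the censored records instead of counting their waiting time in the denominator would overstate the reneging rate, which is why the exposure sum runs over all customers.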
The adoption of the Bat Algorithm was motivated by the structure of our optimization problem, which involves a deterministic yet piecewise-smooth objective derived from LST-based recursions with removable singularities. Such a landscape is nonconvex and may exhibit flat regions, making derivative-free methods particularly suitable. The Bat Algorithm, with its built-in balance of global exploration and adaptive local exploitation, proves efficient when coupled with our log-space parameterization and projection scheme enforcing . Although alternative metaheuristics (e.g., GA, PSO, DE) could also be applied and are expected to produce comparable optima, the difference would primarily concern convergence speed and robustness. Our choice therefore reflects adequacy and computational efficiency rather than exclusivity, and the methodology remains transferable to other metaheuristic frameworks.
7. Model Application in Practice
– We consider a flexible manufacturing facility that must concurrently support make-to-order (MTO) jobs, that is, customer-specific production released upon demand, and opportunistic make-to-stock (MTS) runs that replenish standard inventory. The shop comprises two heterogeneous machines (servers) operating under a first-in, first-out (FIFO) discipline. Both machines primarily process MTO work. Incoming MTO orders that find both machines busy enter a finite buffer of size N; an arrival that encounters a full buffer is blocked and lost; otherwise, it joins the queue.
– To sustain high utilization during light-load periods while preserving responsiveness to MTO demand, one machine (Server 2) may enter a working vacation (WV) whenever it becomes idle. During a WV, Server 2 continues to process jobs at a deliberately reduced service rate (for example, under a low-power or maintenance mode). The other machine (Server 1) remains fully available for MTO work at rate . This mechanism reflects the widely used modeling idea that a server need not switch completely off; instead, it can operate at a lower productive rate rather than halt altogether [30,78,79]. At the completion of a WV, Server 2 adapts to the system state: if the system is empty, it immediately initiates a new WV, capturing extended low-load spells; otherwise, it returns to normal operation and serves at its regular MTO rate . This interruption/continuation logic, standard in the WV literature, captures practical rules that preserve responsiveness when backlog exists [78,79].
– Customer impatience is represented through two complementary behaviors. Balking allows a newly arriving order to decline entry when congestion is visible; in a finite-capacity setting, this is modeled by a state-dependent joining probability that depends on the observed system size i, generalizing the classical economics-of-balking perspective [80]. Reneging captures abandonment by customers who have joined but lose patience while waiting; we posit an exponential patience (hazard) rate while in the queue, following standard models of queues with abandonment [81,82,83]. Finally, we incorporate Bernoulli feedback at service completion: with probability , a job is routed back to the queue for rework (e.g., remedial processing or quality corrections), whereas with probability , it departs permanently. This abstraction is classical for rework loops and repeated service attempts [84,85].
– In aggregate, the model integrates four operational realities (finite capacity, working vacations with interruption, impatience in the form of balking and reneging, and rework via Bernoulli feedback) within a unified and tractable framework. The purpose is to support design decisions such as sizing the buffer N, tuning the aggressiveness of WV operation (choice of and the vacation policy), and quantifying the trade-offs among loss, abandonment, throughput, and cycle time in a two-server, heterogeneous MTO–MTS environment [86].
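To make the rework loop concrete, the following Python sketch computes the expected number of service passes per job under Bernoulli feedback. It is a simplification: it assumes that a fed-back job always receives another service, so it ignores the possibility that such a job reneges or is blocked on re-entry, and the name `feedback_prob` is a hypothetical stand-in for the feedback probability of the model.

```python
def expected_service_passes(feedback_prob):
    """Mean number of service passes when each completion feeds back
    independently with probability feedback_prob (geometric law)."""
    if not 0.0 <= feedback_prob < 1.0:
        raise ValueError("feedback_prob must lie in [0, 1)")
    return 1.0 / (1.0 - feedback_prob)

def expected_passes_by_series(feedback_prob, terms=10_000):
    """Same quantity from the defining series sum_k k (1 - q) q^(k - 1)."""
    q = feedback_prob
    return sum(k * (1.0 - q) * q ** (k - 1) for k in range(1, terms + 1))

if __name__ == "__main__":
    for q in (0.0, 0.1, 0.2, 0.5):
        print(q, expected_service_passes(q), expected_passes_by_series(q))
```

Under this simplification, a feedback probability of 0.2 inflates the service workload per admitted job by a factor of 1.25, which gives a quick sense of how rework consumes capacity before impatience and blocking are taken into account.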
While the present work is developed in a purely theoretical framework, the incorporation of empirical operational logs would offer a natural and valuable extension. Such data could serve to calibrate the impatience and feedback parameters—typically idealized in analytical models—by confronting them with observed inter-arrival, patience, and retry patterns. This would not only validate the adequacy of the exponential-type assumptions but also reinforce the practical relevance of the theoretical findings without altering their mathematical consistency.
In large-scale systems, the cumulative cost of repeated fitness evaluations may pose challenges; however, this can be overcome through closed-form recursions, analytic treatment of singularities, and projection-based feasibility enforcement, which together ensure numerical stability and preserve scalability without simulation overhead.
8. Conclusions and Perspectives
We developed a steady-state analysis for a finite-buffer queue with two heterogeneous servers, impatience in the form of balking and reneging, Bernoulli feedback, and multiple working vacations. Leveraging the supplementary-variable technique and a tailored recursion, we derived balance equations and system-size distributions at both pre-arrival and arbitrary epochs. From these, we obtained standard performance measures, including blocking and reneging probabilities, mean queue length, mean sojourn time, throughput, and server utilizations. On the managerial side, we formulated an economic objective that trades off congestion, abandonment, and capacity costs, and we optimized the decision variables using the Bat Algorithm. A numerical study implemented in R demonstrated how service heterogeneity, vacation parameters, buffer size, feedback probability, and impatience intensities jointly shape delay, loss, and total cost, thereby providing actionable guidance for the design of call centers, flexible manufacturing facilities, and telecommunication access networks.
Our analysis assumes exponential service times and a renewal input process. While these choices encompass a broad range of applications, richer service-time models and strongly correlated arrivals would require additional structure such as phase-type or Markov-modulated input. A further limitation is the assumption of FCFS with no routing asymmetry; incorporating priority or skill-based dispatching under heterogeneity would improve fidelity in many service contexts.
Perspectives. Several extensions are both impactful and technically tractable. One avenue replaces exponential service with phase-type distributions, yielding a counterpart amenable to matrix-analytic and matrix-geometric methods while retaining the WV/feedback/impatience structure [66,87]. Another introduces nonstationarity and burstiness through Markov-modulated or batch Markovian arrivals, or via time-varying staffing, following classical MAP/MMPP/BMAP frameworks and time-dependent staffing models [88,89,90,91]. Reliability considerations can be incorporated by adding breakdown and repair states coupled with WV control and impatience/feedback, connecting to network performance bounds and product-form insights [92,93]. Endogenizing dispatching and staffing with heterogeneous speeds—for example, via skill-based or priority routing—opens the door to the joint design of routing and WV control, building on established service-system modeling [94,95]. Multi-class demand with class-dependent balking, reneging, and feedback would enable analysis of fairness and service-level guarantees using scheduling and prioritization tools [95]. On the data side, renewal laws and impatience hazards can be estimated from operational logs, and feedback/WV parameters can be calibrated by likelihood or Bayesian approaches that align with the supplementary-variable framework [96,97]. Learning-augmented control—combining analytical models with online policy learning to adapt service rates, admission, or vacation durations under stability constraints—is a promising direction, with index policies and reinforcement learning furnishing principled mechanisms [98,99]. Risk-sensitive and robust design objectives, such as CVaR or chance constraints, would address tail performance and model uncertainty [100,101], while rare-event techniques and large deviations can quantify extreme blocking and delay beyond average-case metrics [102,103,104]. 
Finally, computational acceleration exploiting sparsity and Toeplitz structure in the balance equations—together with state-of-the-art matrix-analytic libraries and fast matrix-exponential routines—would scale the numerics to large buffers and richer phase structures [87,105].
Palm–time viewpoint. Across these extensions, maintaining an explicit Palm–time conversion under non-Poisson input is essential, since performance targets often mix arrival-based and time-based metrics. This separation prevents inadvertent PASTA assumptions and leads to correct comparisons of customer-experienced versus system-averaged quantities.
On the case . When the buffer is unbounded, positive recurrence is no longer guaranteed. A Foster–Lyapunov function yields
so stability obtains whenever the effective service capacity, accounting for the WV mix and reneging, dominates the effective admission rate. Our focus on finite N ensures positive recurrence without additional assumptions.
Closing remark. By unifying heterogeneity, impatience, feedback, and working vacations within a renewal-input framework, this study provides analytical foundations and deployable levers for modern service operations. The outlined directions connect the model to established methodological pillars and point toward robust, data-informed, and learning-enabled designs.
Conceptualization, A.G.; methodology, A.G.; validation, A.G. and S.B.; formal analysis, A.G.; investigation, A.G.; writing—original draft preparation, A.G. and S.B.; writing—review and editing, A.G. and S.B. All authors have read and agreed to the published version of the manuscript.
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
The authors gratefully acknowledge the Editor-in-Chief, the Associate Editor, and the three anonymous referees for their insightful comments and valuable suggestions, which greatly enhanced both the clarity and the overall quality of this work.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Notation for the queuing model with working vacations, balking, and reneging.
| Primitives and Parameters | |
| External arrival rate; mean inter-arrival time | |
| Inter-arrival cdf, density, and Laplace–Stieltjes transform. | |
| N | Total capacity (in service + waiting). |
| Join/balking probability on seeing i customers. | |
| Exponential impatience (reneging) rate per waiting customer. | |
| Departure vs. feedback probability at service completion. | |
| Regular service rates of Server 1 and Server 2 | |
| Server 2 service rate during a working vacation. | |
| Working-vacation termination rate. | |
| State processes and stationary objects | |
| Total customers in system at time t (queue + in service). | |
| Server 2 state: 0 = idle on WV; 1 = busy on WV; 2 = busy (regular). | |
| Remaining inter-arrival time at t. | |
| Stationary version of | |
| LST of | |
| Boundary (rate) probability at | |
| Arbitrary-epoch stationary probability of | |
| Pre-arrival (Palm) probability; | |
| Performance measures | |
| Fractions of time: Server 2 idle on WV/busy on WV/busy regular. | |
| Loss probability on arrival: | |
| Mean system size: | |
| Mean sojourn time: | |
| Effective joining rate. | |
| Average rates of balking and reneging. | |
| Cost model | |
| Unit cost coefficients used in the economic objective. | |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).