1. Introduction

In the era of Industry 4.0, rapid advancements in computer and next-generation information technologies have led to the maturation of intelligent factories, smart vehicles, and advanced aerospace systems. Traditional sectors like maritime transport have also begun embracing intelligent development, resulting in the emergence of smart ships [1]. A smart ship is a modernized vessel that acquires real-time operational data through sensors, communication links, and other intelligent means, and that uses computer and automatic control technologies to operate intelligently in navigation and management, with the aim of being safe, environmentally friendly, economical, and reliable.

However, compared to smart cars and the aerospace industry, the progress of ship intelligence has been relatively slow. This is mainly due to the unique characteristics of ships, such as complex sailing environments, limited communication conditions, and the high specialization of traditional systems. On one hand, ship electronic systems have long relied on specialized industry protocol standards [2]. Although these standards offer a high degree of standardization, the protocols are ill-suited to meet the connectivity requirements of modern, general-purpose intelligent devices [3]. During the process of ship digitization, intelligent systems need to integrate diverse data and functional modules. However, communication between systems continues to depend on heterogeneous protocols, resulting in inefficient information exchange and inadequate system interoperability [4]. Therefore, a novel ship communication network architecture is necessary to support heterogeneous data interactions among intelligent terminals [5]. On the other hand, the marine environment presents significant challenges to ship information systems. Especially under harsh sea conditions, the reliability of information transmission becomes particularly crucial. Therefore, smart ships urgently need highly reliable communication network protocols to guarantee their safe operation [6].

Traditional ships employ communication technologies such as the National Marine Electronics Association (NMEA) protocol based on the Controller Area Network (CAN) [2], as well as network protocols like the Modbus protocol. These protocols cannot meet the demands of intelligent operations. In advanced ships, the large amount of data produced by complex devices creates high demands for bandwidth and for varied transmission capabilities. Ethernet technology, known for its high bandwidth and low transmission cost, is now being used in intelligent vehicles, smart factories, aerospace, and many other smart applications. However, standard Ethernet cannot handle real-time and critical data streams and lacks high-reliability redundancy and fault-tolerance mechanisms. Consequently, Ethernet technology in existing ship information systems is mainly used for non-critical business data transmission [7]. Therefore, intelligent ships urgently need Ethernet protocol technologies that support real-time services and redundant transmission.

To tackle these challenges, new networking technologies have emerged. Software-Defined Networking (SDN) is a novel network architecture that decomposes the traditional distributed hardware network into a centralized control plane and a data forwarding plane, representing a new generation of network technology programmable via software [8]. The software-programmable control plane can be applied to large networks for unified path planning and dynamic scheduling. Its programmable data plane supports custom data structures, enabling the unification and transformation of data from different protocols and facilitating communication between heterogeneous systems [9].

Additionally, Time-Sensitive Networking (TSN), based on the Medium Access Control (MAC) of Ethernet technology, has been developed to address the lack of real-time processing and traffic scheduling in traditional Ethernet [10]. TSN is a collection of protocols proposed by the IEEE 802.1 working group for industrial internet applications. Originating from the Audio Video Bridging (AVB) protocol responsible for real-time audio and video transmission, TSN now includes time-sensitive transmission protocols supporting various data streams, such as high-precision time synchronization, traffic shaping and scheduling, redundancy fault tolerance, and network configuration protocols [11]. TSN technology has been widely applied in autonomous driving systems of intelligent vehicles [12] and shows promising prospects in advanced aerospace [13].

In summary, to meet the needs of smart ships for heterogeneous communication protocol integration and for real-time, reliable transmission of high-priority data, this paper proposes a network architecture for smart ship communication systems based on SDN and TSN technologies, termed Ship Software-Defined Time-Sensitive Networking (SSDTSN). In this architecture, the control plane adopts SDN technology to converge the heterogeneous protocols of the many devices on board and to handle configuration tasks such as path selection, while the data plane adopts TSN-based Ethernet technology to ensure reliable, real-time data transmission.

Currently, TSN’s time synchronization and traffic scheduling protocols guarantee a real-time communication performance. Meanwhile, its redundancy fault-tolerance protocol enhances transmission reliability by replicating high-priority traffic across multiple paths, ensuring seamless redundancy [14]. However, the TSN protocol itself does not provide guidelines for selecting these reliable multipaths. Therefore, an efficient path selection algorithm is required to realize the redundancy fault tolerance mechanism of TSN. Given that TSN is a protocol standard for the MAC layer in the data link layer in the Ethernet model, the selection of multiple paths in the redundancy mechanism of a data plane TSN switch depends on the path selection algorithms implemented in the SDN control layer. The realization of highly reliable redundant transmission over TSN in the SSDTSN architecture requires the selection of multiple data transmission paths based on various link characteristics. Therefore, this paper focuses on data link layer path selection based on the SDN data plane.

Existing research primarily utilizes shortest path algorithms like Shortest Path First (SPF) for SDN network path selection, with limited studies employing deep reinforcement learning (DRL) for the optimal selection of multiple redundant paths in basic SDN architectures. Therefore, within the proposed SSDTSN framework, this paper designs a ship redundant path selection algorithm based on a Dueling Double Deep Q-Network (D3QN) and a Graph Convolutional Network (GCN). The GCN is a type of neural network designed for processing and analyzing graph-structured data [15]. GCNs leverage the structural details of network topology to model the relationships and dependencies among nodes, such as network switches and links, within a communication network. Integrating the GCN with the D3QN enables the algorithm to learn the optimal path selection strategy through reinforcement learning while exploiting the inherent graph structure of ship communication networks for more informed and efficient decision making. The algorithm selects multiple redundant paths for transmitting high-priority traffic to meet smart ships’ low-latency and high-reliability requirements.

The main contributions of this paper are as follows:

1. By integrating SDN and TSN technologies, we propose a novel network architecture for smart ship information systems.
2. We establish a switch path selection model for ship software-defined networks and design a path selection algorithm based on a D3QN and GCN, along with a ship redundant multipath selection algorithm.
3. Through simulations, we validate the convergence and effectiveness of the proposed algorithm. Experimental results indicate that the algorithm surpasses existing methods in terms of latency, packet loss, and bandwidth utilization in the simulated network topology.

The remainder of this paper is organized as follows: Section 2 reviews the current status of path selection algorithms under software-defined networks. Section 3 presents the optimized architecture of ship software-defined time-sensitive networks and the modeling of switch path selection. Section 4 details the design of the path selection algorithm based on a D3QN and the ship redundant multipath selection algorithm. Section 5 provides an experimental evaluation of the proposed algorithm. Finally, Section 6 concludes this paper and discusses future work.

2. Related Work

The mainstream path selection algorithms for SDN and TSN can be broadly categorized into the following types: traditional shortest path optimization algorithms, algorithms based on deep learning and machine learning, algorithms based on reinforcement learning, and algorithms based on DRL [16].

A. Traditional Shortest Path Optimization Algorithms

In Ethernet path selection, the predominant algorithms are route optimization methods based on Shortest Path First (SPF) [17]. These algorithms primarily use a single metric, such as bandwidth utilization or transmission latency, as the link cost and employ Dijkstra or Bellman–Ford algorithms to select the shortest path [18]. However, due to the reliance on a single metric, the selected path may not be optimal under comprehensive conditions and can easily lead to link congestion. In recent years, extensions of SPF that have aimed to reduce congestion have incorporated multiple metrics in their research. Heuristic algorithms [19], genetic Ant Colony (ACO) algorithms [20], and Simulated Annealing algorithms [21] can effectively select better paths and alleviate network congestion. Nevertheless, intelligent optimization algorithms introduce path selection latencies owing to their computational complexity. Therefore, traditional shortest path optimization algorithms exhibit limitations when addressing complex, multi-objective network optimization problems. There is a necessity to explore more efficient algorithms to meet the performance demands of contemporary networks.
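To make the single-metric SPF baseline concrete, the following is a minimal sketch of Dijkstra's algorithm using one link cost. The switch names, the latency values in milliseconds, and the `shortest_path` helper are illustrative assumptions and are not taken from the cited works. For an undirected topology each link would be listed in both directions; the example lists only the forward direction for brevity.

```python
import heapq

def shortest_path(graph, source, target):
    """Return (path, cost) of the cheapest path, or (None, inf) if unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break  # the first time the target is popped, its cost is final
        if cost > dist[node]:
            continue  # stale heap entry left over from an earlier relaxation
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

# Hypothetical three-switch topology with per-link latency as the only metric.
links_ms = {"s1": {"s2": 1.0, "s3": 4.0}, "s2": {"s3": 1.0}, "s3": {}}
path, cost = shortest_path(links_ms, "s1", "s3")
```

Because only one metric enters `link_cost`, a link that is fast but heavily loaded still looks attractive, which is the congestion risk noted above.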

B. Routing Algorithms Based on Deep Learning and Machine Learning

In SDN-based communication systems, deep learning and machine learning have been applied to improve performance in areas such as routing selection and scheduling. Ampratwum et al. [22] proposed a framework based on deep neural networks to identify the quality of service requirements of flows and provide the necessary routing strategies; tests showed that it met latency requirements faster than heuristic algorithms. Awad et al. [23] developed a machine-learning-based multipath routing framework that selects routing schemes based on network states and routing requests, addressing the multipath routing problem in SDN with link constraints and flow rule space constraints. Azzouni et al. [24] utilized Long Short-Term Memory (LSTM) recurrent neural networks and deep neural networks to learn traffic characteristics. They used supervised learning methods to choose routing paths, which helped to improve network throughput and lower routing costs. However, routing optimization algorithms that rely on supervised learning need large-scale datasets to be collected and labeled, leading to high implementation costs and an inability to meet the performance needs of data transmission [25]. Consequently, although deep learning and machine learning methodologies offer theoretical advantages, their widespread adoption in practical network environments is constrained by high implementation costs and a dependence on large-scale datasets.

C. Routing Algorithms Based on Reinforcement Learning

Reinforcement learning is a dynamic optimization algorithm where agents make decisions through continuous interactions with the environment, making it suitable for solving routing optimization problems.

Q-learning is a widely adopted reinforcement learning algorithm that identifies optimal strategies through environmental interactions. The Q-value estimates the expected return of taking a specific action in a given state [26]. A Q-table, structured as a two-dimensional matrix, systematically stores the Q-values for all possible state-action pairs, encompassing the full range of states and their corresponding actions. Using Q-values and the Q-table, Q-learning effectively evaluates and updates state-action pairs’ values. This iterative process enables the algorithm to converge on the optimal Q-value function, facilitating the determination of optimal decision-making policies [27].
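The tabular update described above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the learning rate `alpha`, the discount `gamma`, and the state and action labels (switches and output ports) are assumptions chosen for the example.

```python
from collections import defaultdict

def q_learning_update(q_table, state, action, reward, next_state, next_actions,
                      alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to q_table[(state, action)] and return it."""
    # Best estimated return reachable from the next state (0 if it has no actions).
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]

# The Q-table maps (state, action) pairs to Q-values; unseen pairs start at 0.
q_table = defaultdict(float)
new_q = q_learning_update(q_table, "s1", "port_a", reward=1.0,
                          next_state="s2", next_actions=["port_a", "port_b"])
```

The table holds one entry per state-action pair, which is why its size grows quickly as the network and its set of possible states expand.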

Houda Hassen et al. [28] explored the deployment of a Q-learning algorithm to solve routing optimization from the perspective of minimizing latency. By combining greedy and SoftMax methods, they proposed an improved Q-learning algorithm based on congestion avoidance. Rischke et al. [29] introduced the QR-SDN algorithm based on Q-learning, which can be applied to multipath routing between terminal switch pairs while maintaining flow integrity. Casas-Velasco et al. [30] proposed an intelligent routing algorithm based on reinforcement learning, using link state information to make routing decisions to adapt to dynamic traffic and meet quality of service requirements. However, smart ships require fine-grained control over multi-source heterogeneous data streams, especially for the real-time and reliable transmission of critical data such as course control, emergency braking and steering, and alarm information. As intelligent maritime networks expand, the implementation of reinforcement learning algorithms requires increased storage capacity to maintain extensive information such as system states, action sets, and reward values within the Q-table. The expansion of data volumes has resulted in prolonged query times, which in turn have caused latency in path selection, elevated network latency, and an uptick in packet loss rates. Therefore, although reinforcement-learning-based routing algorithms demonstrate potential in dynamic network environments, their application is constrained in networks characterized by high complexity and stringent real-time requirements. To address these challenges and fulfill practical application needs, it is essential to optimize the algorithmic structure.

D. Routing Algorithms Based on DRL

In complex, large-scale, or continuous environments, the state and action spaces can become exceedingly vast, causing the Q-table to expand dramatically and resulting in prohibitive storage and computational costs. To address this limitation, DRL employs deep neural networks, called Q-networks, to approximate the Q-value function [31]. These Q-networks enable efficient learning and decision making in intricate environments by handling extensive state and action spaces without enumerating them.

W. Liu et al. [32] employed Deep Q-Networks (DQNs) and a Deep Deterministic Policy Gradient (DDPG) to construct deep-reinforcement-learning-based routing (DRL-R). Compared to Open Shortest Path First (OSPF), DRL-R achieved shorter flow completion times, a higher throughput, better load balancing, and improved robustness, with the DDPG outperforming the DQN. D. Xia et al. [33] proposed a routing algorithm based on DRL for heterogeneous factory networks. Utilizing a Double DQN, they implemented action selection and evaluation through different value functions to address the overestimation problem. Shinde et al. [34] introduced a DRL algorithm called Advantage Actor–Critic (A2C), where the advantage function measures how much better an action is compared to other actions in a given situation.

In DRL algorithms applied to path selection problems, the DQN and its variants, which are particularly suited for discrete action spaces, are predominantly utilized. The DQN algorithm is an algorithm based on Q-learning that replaces the traditional Q-table with a deep neural network to estimate the value function of state-action pairs [31]. However, the original DQN faces certain issues, leading to the development of several variants, such as a Double DQN and Dueling DQN [35].

The Double DQN addresses the problem of Q-value overestimation in a traditional DQN by introducing an online network and a target network. In this setup, the online network is responsible for selecting the next action. In contrast, the target network evaluates the Q-value of that action, thus avoiding the bias that arises when action selection and value evaluation use the same network. The Dueling DQN, on the other hand, optimizes the network structure by decomposing the Q-value function into a state value function and an advantage function. This decomposition enhances the policy’s decision-making capability and improves learning efficiency.
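The difference between the two target computations can be illustrated with a small sketch. The Q-value lists stand in for the outputs of the online and target networks on the next state; the numbers, the `done` flag, and the function names are hypothetical.

```python
def vanilla_dqn_target(reward, gamma, target_q_next, done=False):
    """Select and evaluate the next action with the same (target) network."""
    return reward if done else reward + gamma * max(target_q_next)

def double_dqn_target(reward, gamma, online_q_next, target_q_next, done=False):
    """Select the next action with the online network, evaluate it with the target network."""
    if done:
        return reward
    best_action = max(range(len(online_q_next)), key=online_q_next.__getitem__)
    return reward + gamma * target_q_next[best_action]

# Hypothetical next-state Q-values for three candidate actions.
online_q_next = [1.0, 3.0, 2.0]
target_q_next = [0.5, 1.5, 4.0]
y_vanilla = vanilla_dqn_target(1.0, 0.9, target_q_next)
y_double = double_dqn_target(1.0, 0.9, online_q_next, target_q_next)
```

In this example the vanilla target follows the largest, possibly overestimated, target-network value, while the Double DQN target uses the target network only to score the action the online network prefers.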

Combining the strengths of a Double DQN and Dueling DQN, the D3QN enhances the algorithm performance [36]. The D3QN architecture includes two streams: one outputs the state value, and the other outputs the advantage value for each action. By merging these two streams to compute the final Q-value, the D3QN provides a more accurate value function estimation, reduces the problem of Q-value overestimation, and results in a more stable training process, faster convergence, and better policy performance. Therefore, this paper introduces a D3QN into the smart ship network system and investigates the reliability path selection problem for SDN-converged TSN networks.

3. Optimized Architecture of Ship Networks and Modeling of Switch Path Selection

3.1. Design of Optimized Architecture for Smart Ship Networks

This section constructs an industrial Ethernet architecture for ships that supports heterogeneous multi-protocols based on SDN and TSN technologies, namely the SSDTSN. In this architecture, SDN technology decomposes the network into a control plane and a data plane. The DRL algorithms are deployed in the control plane, while TSN technology is embedded into the switches in the data plane. The data plane interfaces with sensors, the Automatic Identification System (AIS), Global Positioning System (GPS), and other terminal devices through intelligent gateways. The control plane connects to upper-layer applications via the Application Programming Interface (API), enabling upper-level application control or 5G remote intelligent control.

3.1.1. Architecture of SSDTSN

In [32], the authors integrated DRL with SDN to establish the DRL-R architecture. The DRL agent, deployed on the SDN controller, continuously interacts with the network from a global perspective, thereby enabling optimized routing decisions. Building upon this framework and incorporating TSN, this paper proposes a novel SDN architecture specifically tailored for ship applications. The SSDTSN architecture is illustrated in Figure 1. First, the network controller—the SDN control plane—acquires the network topology, link information, and traffic data through data plane switches.

It obtains link latency information from the TSN switches via their time synchronization protocol, the generalized Precision Time Protocol (gPTP), and gathers bandwidth and packet information of links and ports through periodic statistical reports. The DRL module in the control plane uses this global view of the data plane, obtained via SDN, to learn the real-time state of data transmission over network links. It then issues new switch flow table entries to control data forwarding paths and traffic scheduling.

After the periodic issuance of control information, the data perception layer collects navigation data and ship-specific data through intelligent terminal devices. The heterogeneous integration layer converts various ship communication protocols into Virtual Local Area Network (VLAN) Ethernet data frames supporting TSN via intelligent gateways. In the data forwarding layer, TSN switches transmit data from ship intelligent systems based on TSN’s time synchronization, traffic scheduling, and redundancy fault-tolerance mechanisms.

The proposed algorithm is primarily applied within the data forwarding layer to implement the redundancy fault-tolerance mechanism. Specifically, the SDN control plane selects and distributes both working and redundant paths. Then, utilizing TSN’s redundancy protocol IEEE 802.1CB, data streams are redundantly replicated and transmitted.

3.1.2. Example of SSDTSN Topology

The China Classification Society (CCS) has published intelligent ship standards that categorize a vessel’s systems into six primary domains: intelligent navigation, intelligent hull, intelligent machinery, intelligent energy efficiency management, intelligent cargo management, and intelligent integrated platform [1]. Building upon the proposed SSDTSN architecture, this paper introduces a novel network topology for intelligent ships. An example of the ship’s SSDTSN topology is illustrated in Figure 2, where each system is connected to the ship’s communication network through its respective TSN switch. The SDN controller centrally manages the TSN switches of each independent system, ensuring unified control and seamless integration across the vessel’s intelligent systems.

3.2. Modeling of Path Selection Problems

Based on the optimized smart ship network architecture, this section systematically models the path selection problem by introducing key parameters such as transmission latency, bandwidth utilization, and packet loss rate to ensure efficient and reliable data transmission.

3.2.1. Parameter Definition

In smart ship communication networks, based on an existing study, the network topology model is an undirected graph $G = (V, E)$ [37], where the set of TSN switch nodes is represented as $V = \{v_{1}, v_{2}, \dots, v_{M}\}$, with $M$ being the number of nodes. The set of links between nodes is represented as $E = \{e_{1}, e_{2}, \dots, e_{N}\}$, where $N$ is the number of links. For data flow $d_{i}$, the source node is $d_{s}$ and the destination node is $d_{d}$. The set of $J$ reachable paths is $P = \{p_{1}, p_{2}, \dots, p_{J}\}$, where the probability of transmitting via path $p_{j}$ is $w_{j}$. Additional definition parameters are shown in Table 1.

Definition 1

(average path transmission latency). During path transmission, if packet loss occurs, data retransmission is required, and the path transmission latency is related to the packet loss rate. Therefore, the path latency includes the initial transmission latency plus the retransmission latency associated with the packet loss rate.

For any single data flow $d_{i}$ , the link latency it experiences from $d_{s}$ to $d_{d}$ is $L_{i}$ . After traversing h links, the total path latency $L_{h}$ is

(1) $L_{h} = \sum_{i = 1}^{h} L_{i}$

The average transmission latency of K data flows is

(2) $\bar{L} = \sqrt{\frac{\sum_{h \in K} L_{h}^{2}}{|K|}} = \sqrt{\frac{\sum_{h \in K} {(\sum_{i \in h} L_{i})}^{2}}{|K|}}$
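Equations (1) and (2) can be checked numerically with a short sketch. Note that Equation (2) is a root-mean-square over the flows, so flows with large path latency weigh more heavily than in an arithmetic mean. The per-link latency values (in milliseconds) and the function names are illustrative assumptions.

```python
import math

def path_latency(link_latencies):
    """Equation (1): total latency of a path is the sum of its link latencies."""
    return sum(link_latencies)

def rms_latency(flows):
    """Equation (2): root-mean-square of the path latencies of all flows."""
    totals = [path_latency(links) for links in flows]
    return math.sqrt(sum(t * t for t in totals) / len(totals))

# Two hypothetical flows: one crosses two links, the other a single link.
flows_ms = [[1.0, 2.0], [4.0]]
l_bar = rms_latency(flows_ms)  # sqrt((3^2 + 4^2) / 2)
```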

Definition 2

(average packet loss rate). The SDN controller obtains link packet loss rates through statistical data. For any data flow $d_{i}$, the packet loss rate of each link $i$ among the $h$ links it traverses from $d_{s}$ to $d_{d}$ is $R_{i}$. A packet is delivered only if it survives every link, so the packet loss rate of the transmission path for $d_{i}$ is the complement of the product of the per-link delivery probabilities:

(3) $R_{h} = 1 - \prod_{i \in h} (1 - R_{i})$

The average packet loss rate of K data flows is

(4) $\bar{R} = \sqrt{\frac{\sum_{h \in K} R_{h}^{2}}{|K|}} = \sqrt{\frac{\sum_{h \in K} {(1 - \prod_{i \in h} (1 - R_{i}))}^{2}}{|K|}}$
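As a hedged numerical illustration, the sketch below assumes that losses on different links are independent and takes the path loss rate as one minus the product of the per-link delivery probabilities $(1 - R_{i})$. The loss values and function names are hypothetical.

```python
import math

def path_loss_rate(link_loss_rates):
    """Probability that a packet is lost somewhere along the path."""
    delivered = math.prod(1.0 - r for r in link_loss_rates)
    return 1.0 - delivered

def rms_loss_rate(flows):
    """Root-mean-square of the path loss rates of all flows."""
    rates = [path_loss_rate(links) for links in flows]
    return math.sqrt(sum(r * r for r in rates) / len(rates))

# Hypothetical flows: one over links with 10% and 20% loss, one over a lossless link.
flows_loss = [[0.1, 0.2], [0.0]]
r_path = path_loss_rate(flows_loss[0])  # 1 - 0.9 * 0.8
r_bar = rms_loss_rate(flows_loss)
```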

Definition 3

(bandwidth load coefficient of variation). The SDN controller obtains statistical information from switch node ports, including the number of bytes received $Rx$ and the number of bytes transmitted $Tx$ during the statistical time $Δt$. The used bandwidth is calculated as

(5) $U_{used} = \frac{Rx + Tx}{|Δt|}$

Let B be the total bandwidth; then, the bandwidth utilization is

(6) $U_{e} = \frac{U_{used}}{B}$

The average bandwidth utilization of the K data flows is

(7) $\bar{U} = \sqrt{\frac{\sum_{i \in K} {(\sum_{e \in p_{j}} U_{e})}^{2}}{|K|}}$

In [33], the authors employ the coefficient of variation (CV) to quantify the degree of data dispersion. This paper uses the same measure to assess the load balance across all paths. The standard deviation σ and the mean μ of the bandwidth occupied by the paths determine the bandwidth load coefficient of variation, $U_{cv}$. The lower the value of $U_{cv}$, the more balanced the load. The bandwidth load coefficient of variation of the network is

(8) $\begin{matrix} U_{cv} = \frac{σ_{\forall U_{used}}}{μ_{\forall U_{used}}} & = \frac{\sqrt{\frac{1}{N} \sum_{i = 1}^{N} (U_{used, i}^{2} - μ_{\forall U_{used}}^{2})}}{μ_{\forall U_{used}}} \\ μ_{\forall U_{used}} & = \frac{1}{N} \sum_{i = 1}^{N} U_{used, i} \end{matrix}$
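The quantities in Equations (5), (6), and (8) can be computed from port counters as in the following sketch. The counter values, the units (bytes per second, with capacity in the same unit), and the function names are assumptions made for the example.

```python
import math

def used_bandwidth(rx_bytes, tx_bytes, interval_s):
    """Equation (5): bytes moved through the port per unit of statistical time."""
    return (rx_bytes + tx_bytes) / abs(interval_s)

def utilization(used, capacity):
    """Equation (6): fraction of the total bandwidth B that is in use."""
    return used / capacity

def load_cv(used_values):
    """Equation (8): standard deviation of the used bandwidth divided by its mean."""
    n = len(used_values)
    mean = sum(used_values) / n
    # (1/N) * sum(U_i^2 - mean^2) equals the population variance.
    variance = sum(u * u - mean * mean for u in used_values) / n
    return math.sqrt(max(variance, 0.0)) / mean

u_used = used_bandwidth(rx_bytes=600, tx_bytes=400, interval_s=2.0)
u_e = utilization(u_used, capacity=1000.0)
u_cv = load_cv([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # mean 5, std 2
```

A perfectly balanced network, where every link carries the same load, gives a coefficient of variation of zero.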

3.2.2. Problem Modeling

In SDN integrated with a TSN network architecture, it is necessary to select two paths to achieve the real-time and reliable transmission of critical service flows. Both the working path and the redundant path should choose links with low latency, a low packet loss rate, and low bandwidth utilization for transmission:

(9) $\begin{matrix} min (μ \bar{U} + ν \bar{L} + λ \bar{R}) \end{matrix}$

(10) $\begin{matrix} s . t . \sum_{p_{k} \in P} w_{k} = 1, \forall k \in J, \end{matrix}$

(11) $\begin{matrix} w_{k} \geq 0, \forall p_{k} \in P, \forall k \in J, \end{matrix}$

(12) $\begin{matrix} L_{h} < L_{max}, \end{matrix}$

(13) $\begin{matrix} U_{used} < B, \end{matrix}$

(14) $\begin{matrix} μ + ν + λ = 1, 0 \leq μ, ν, λ \leq 1 . \end{matrix}$

Among these, Equation (9) is the optimization objective, a weighted sum of the average bandwidth utilization, latency, and packet loss rate that can be tuned to different network requirements.

Equation (10) requires that the selection probabilities of all forwarding paths of a data flow between the source node and the destination node sum to one.

Equation (11) indicates that the probability of selecting a forwarding path is non-negative.

Equation (12) stipulates that the transmission latency of a data flow does not exceed the data transmission latency threshold $L_{max}$.

Equation (13) specifies that the bandwidth used by the data flows on a link cannot exceed the total link bandwidth.

Equation (14) constrains the weight values for network bandwidth utilization, network transmission latency, and network packet loss rate, respectively.
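The objective and constraints in Equations (9) to (14) can be expressed as a small evaluation routine. This is only a sketch: it assumes the three averaged metrics have already been scaled to comparable ranges, which the formulation does not state, and the function names and numbers are hypothetical.

```python
def weighted_cost(u_bar, l_bar, r_bar, mu, nu, lam):
    """Equation (9) objective, with the weight constraint of Equation (14) enforced."""
    if min(mu, nu, lam) < 0 or abs(mu + nu + lam - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return mu * u_bar + nu * l_bar + lam * r_bar

def is_feasible(path_probs, path_latencies, l_max, used_bandwidths, capacity):
    """Check the constraints of Equations (10) to (13) for a candidate selection."""
    probs_ok = (all(w >= 0 for w in path_probs)
                and abs(sum(path_probs) - 1.0) < 1e-9)    # Equations (10), (11)
    latency_ok = all(l < l_max for l in path_latencies)     # Equation (12)
    bandwidth_ok = all(u < capacity for u in used_bandwidths)  # Equation (13)
    return probs_ok and latency_ok and bandwidth_ok

cost = weighted_cost(u_bar=0.5, l_bar=0.2, r_bar=0.1, mu=0.2, nu=0.5, lam=0.3)
feasible = is_feasible([0.5, 0.5], [3.0, 4.0], l_max=5.0,
                       used_bandwidths=[10.0, 20.0], capacity=100.0)
```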

4. DRL of Redundant Multipath Selection Algorithms

This section details the design of the redundant multipath selection algorithm, based on a D3QN and on the network model and problem formulation proposed above.

4.1. Problem Description

Under limited network bandwidth and a fixed network architecture, finding the best transmission path with the minimum latency and the minimum packet loss rate is an NP-complete problem. To address it, the path selection problem is modeled as a Markov Decision Process (MDP) $(S, A, P, R, γ)$ [26] and solved with the D3QN algorithm.

(1) State: In this paper, we utilize the data plane switch network topology, switch port data, traffic information, and other data obtained by the SDN controller to identify all transmission paths $P_{t}$, transmission traffic $D_{t}$, bandwidth utilization $U_{t}$, packet loss $R_{t}$, and path latency $L_{t}$. Together these form the raw graph feature information. Subsequently, the GCN is employed to process the network topology and features, thereby enhancing the quality of the state representation and the effectiveness of the learned policy. The resulting state space is as follows:

(15) $S = \{s_{t} \mid s_{t} = \mathrm{GCN} [D_{t}, P_{t}, L_{t}, U_{t}, R_{t}] = \mathrm{GCN} \left[\begin{matrix} d_{1}, d_{2}, \dots, d_{k} \\ p_{d_{1}}, p_{d_{2}}, \dots, p_{d_{k}} \\ L_{d_{1}}, L_{d_{2}}, \dots, L_{d_{k}} \\ U_{e_{1}}, U_{e_{2}}, \dots, U_{e_{N}} \\ R_{d_{1}}, R_{d_{2}}, \dots, R_{d_{k}} \end{matrix}\right]\}$

(2) Action: The D3QN senses the network state information, then obtains the real-time state of each link in the network, and calculates the optimal forwarding path of the data flow, denoted as $a_{t}$ . The transfer path action of each data stream either remains unchanged or a new path P is selected:

(16) $A = \{a_{t} | a_{t} = π (s_{t}) = [\begin{matrix} d_{1}, d_{2}, \dots, d_{k} \\ p_{d_{1}}, p_{d_{2}}, \dots, p_{d_{k}} \end{matrix}]\}$

(3) State transition: after taking action $a_{t}$, the network state changes and yields new network topology feature information, which is processed using the GCN to obtain the next state $s_{t + 1}$. The transition probability is

(17) $P (s_{t + 1} ∣ s_{t}, a_{t}) = P r (s_{t + 1} = G C N [D_{t + 1}, P_{t + 1}, L_{t + 1}, U_{t + 1}, R_{t + 1}] ∣ s_{t}, a_{t})$

(4) Reward: the D3QN takes into account the bandwidth load coefficient of variation, the average network transmission latency, and the average network packet loss rate, and expresses the reward function $r_{t}$ as follows:

(18) $R (s_{t}, a_{t}) = \frac{1}{\exp (μ U_{cv}) + \exp (ν {\bar{L}}_{t + 1}) + \exp (λ {\bar{R}}_{t + 1})}$

$μ, ν, λ \in [0, 1]$ are adjustable weights with $μ + ν + λ = 1$. When $μ = 1$, only the bandwidth load balance is optimized; when $ν = 1$, only the transmission latency of the data streams is optimized; and when $λ = 1$, only the packet loss of the data streams is optimized.
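Equation (18) can be evaluated directly, as in the sketch below. Following the equation, $μ$ multiplies $U_{cv}$, $ν$ multiplies the average latency, and $λ$ multiplies the average packet loss rate. The metric values are hypothetical and assumed to be non-negative and pre-scaled.

```python
import math

def reward(u_cv, l_bar, r_bar, mu, nu, lam):
    """Equation (18): reciprocal of the sum of exponentials of the weighted metrics."""
    return 1.0 / (math.exp(mu * u_cv) + math.exp(nu * l_bar) + math.exp(lam * r_bar))

# With all metrics at zero each exponential equals 1, so the reward peaks at 1/3.
best = reward(0.0, 0.0, 0.0, mu=0.2, nu=0.5, lam=0.3)
# Increasing the average latency lowers the reward.
worse = reward(0.0, 1.0, 0.0, mu=0.2, nu=0.5, lam=0.3)
```

Because every exponential term is at least 1 for non-negative metrics, the reward is bounded between 0 and 1/3, and it decreases monotonically as any weighted metric grows.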

(5) Discount: a constant value between 0 and 1 that weighs the importance of current rewards against future rewards, defined as follows:

(19) $γ \in [0, 1]$

4.2. Path Selection Algorithm Based on D3QN Fusion GCN

To optimize path selection within smart ship communication systems, this section introduces an algorithm that integrates a D3QN with a GCN, thereby enabling highly reliable and real-time path selection.

4.2.1. The Optimal Path Selection Process Based on D3QN

The specific learning and training process of the D3QN optimization algorithm is as follows.

For experience replay and state selection, the D3QN samples randomly from an experience replay buffer to break the correlation between consecutive samples and improve the stability of training. In addition, a maximum hop count is set to rule out inappropriate path selections.
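A minimal sketch of these two mechanisms is shown below. The transition layout `(state, action, reward, next_state, done)`, the class name, and the `within_hop_limit` helper are illustrative assumptions rather than the paper's implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation of transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def within_hop_limit(path, max_hops):
    """A path given as a list of switches uses len(path) - 1 hops."""
    return len(path) - 1 <= max_hops

buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(state=step, action=0, reward=1.0, next_state=step + 1, done=False)
batch = buffer.sample(2)
```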

For the action selection strategy, the Q-network uses the $ϵ$-greedy strategy, a fundamental action selection method in reinforcement learning, where $ϵ$ is the exploration rate that balances exploration and exploitation. With probability $ϵ$ the agent explores a random action that might yield a higher reward, and otherwise it selects the best-known action [38]:

(20) $a_{t} = \begin{cases} \text{random action}, & \text{with probability } ϵ \\ \arg\max_{a} Q (s_{t}, a; θ), & \text{with probability } 1 - ϵ \end{cases}$
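Equation (20) corresponds to the following short routine. The list `q_values` stands in for the Q-network output for the current state, with one entry per candidate path; the names and values are hypothetical.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, otherwise greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

q_values = [0.1, 0.9, 0.3]
greedy_action = epsilon_greedy(q_values, epsilon=0.0)
random_action = epsilon_greedy(q_values, epsilon=1.0)
```

In practice the exploration rate is often decayed over training so that the agent explores early and exploits later; whether the paper does so is not stated here.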

The Q-network of a D3QN adopts the Dueling structure [39], which divides the network into a value stream and an advantage stream that estimate the value $V (s)$ of each state and the advantage $A (s, a)$ of each action, respectively. The formula for the Q-value is as follows:

(21) $Q (s, a; θ, α, β) = V (s; θ, β) + (A (s, a; θ, α) - \frac{1}{|A|} \sum_{a^{'}} A (s, a^{'}; θ, α))$

where $θ$ represents the shared network parameters; $α$ and $β$ are independent parameters of the advantage and value streams, respectively; and $|A|$ is the size of the action space.
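The aggregation in Equation (21) can be illustrated with plain numbers standing in for the outputs of the two streams. The values and the function name are hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Equation (21): Q(s, a) = V(s) + (A(s, a) - mean over actions of A(s, a'))."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + (a - mean_advantage) for a in advantages]

# Hypothetical stream outputs for a state with three candidate actions.
q = dueling_q_values(state_value=2.0, advantages=[1.0, 2.0, 3.0])
```

Subtracting the mean advantage makes the decomposition identifiable: the Q-values of a state always average to its state value, so the two streams cannot trade a constant offset between them.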

According to the Bellman equation [40], the target Q-value $Y_{t}$ is estimated as

(22) $Y_{t} = r_{t} + γ \max_{a^{'}} Q^{'} (s_{t + 1}, a^{'}; θ^{'})$

where $r_{t}$ is the immediate reward, $γ$ is the discount factor, $Q^{'}$ is the Q-value of the target network, and $θ^{'}$ is the parameter of the target network. The loss function uses the Mean Square Error (MSE) to measure the difference between the predicted Q-value and the target Q-value:

(23) $L (θ) = E [{(Y_{t} - Q (s_{t}, a_{t}; θ))}^{2}]$
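A small numerical sketch of Equations (22) and (23) follows. The `done` flag for terminal transitions is an assumption that does not appear in the paper, and the function names are hypothetical; the default discount factor of 0.99 is taken from Table 3.

```python
def td_target(reward, next_q_target, gamma=0.99, done=False):
    """Target Y_t = r_t + gamma * max over a' of Q'(s_{t+1}, a')."""
    if done:
        # Assumption: no bootstrapping from a terminal state
        return reward
    return reward + gamma * max(next_q_target)


def mse_loss(targets, predictions):
    """Mean of the squared differences between targets and predicted Q-values."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)
```

For example, a reward of 1.0 with target-network values `[0.0, 2.0]` and `gamma=0.5` gives a target of 2.0.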

The parameter $θ$ of the Q-network is updated by gradient descent:

(24) $θ \leftarrow θ - α \cdot \nabla_{θ} L (θ)$

where $α$ here denotes the learning rate (not the advantage-stream parameter of Equation (21)).

Regarding the update of the target network, the parameter $θ^{'}$ of the target network is copied from the main Q-network only at regular training intervals, $θ^{'} \leftarrow θ$, in order to reduce the correlation between the main network and the target network. This keeps the target Q-value stable and lets the predicted Q-value converge iteratively toward the target Q-value.
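The gradient step of Equation (24) and the periodic hard update of the target network can be sketched with parameters held in plain lists. The interval `TARGET_UPDATE_INTERVAL` is a hypothetical value, since the paper does not report it, and the function names are illustrative.

```python
import copy

TARGET_UPDATE_INTERVAL = 100  # hypothetical; not specified in the paper


def sgd_step(params, grads, lr):
    """One gradient descent step: theta <- theta - lr * gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


def maybe_sync_target(step, online_params, target_params,
                      interval=TARGET_UPDATE_INTERVAL):
    """Copy the online parameters into the target network every `interval` steps."""
    if step % interval == 0:
        return copy.deepcopy(online_params)
    return target_params
```

Between synchronizations the target parameters stay frozen, which is what keeps the target Q-value stable.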

4.2.2. Optimal Path Selection Algorithm Based on D3QN Fused with GCN

In [41], the authors introduced a D3QN into vehicular networks to address resource allocation challenges. Building upon their network model, this study leverages the D3QN as a foundational framework and incorporates the GCN to extract network feature values. Consequently, we propose a novel optimal path selection algorithm tailored for SSDTSN, named Graph-enhanced D3QN (G-D3QN). The workflow of the proposed algorithm is depicted in Figure 3. The proposed algorithm is detailed through its pseudocode in Algorithm 1.

The algorithm acquires the original state information of the network topology through the SDN controller and then processes it with a GCN. The GCN extracts topology and path features and outputs high-dimensional embedded representations of the network, which serve as the state space of the D3QN. The D3QN algorithm then computes the optimal forwarding paths of the data streams.
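To make the feature-extraction step concrete, the sketch below implements one graph convolution layer in plain Python. The paper does not state its propagation rule, so this assumes the widely used symmetric-normalized form $\mathrm{ReLU}(D^{-1/2} (A + I) D^{-1/2} X W)$; the function name and the use of per-node feature vectors are also assumptions.

```python
import math


def gcn_layer(adjacency, features, weights):
    """One GCN propagation step on dense Python lists."""
    n = len(adjacency)
    # Add self-loops so that each node keeps its own features
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization by the node degrees
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    n_in, n_out = len(weights), len(weights[0])
    # Aggregate neighbor features: norm @ features
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(n_in)] for i in range(n)]
    # Linear transform followed by ReLU: max(0, agg @ weights)
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]
```

Stacking two such layers would give each node an embedding that reflects its two-hop neighborhood, which is the kind of representation passed to the D3QN as state.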

4.3. Redundant Multipath Selection Algorithm for Smart Ship

Utilizing the previously established SSDTSN framework in conjunction with the G-D3QN algorithm, this section proposes an innovative redundant multipath selection technique for intelligent ship communication networks. This technique seeks to increase fault tolerance and secure seamless communication, thereby elevating the overall reliability and robustness of data transmissions in smart maritime environments.

4.3.1. Smart Ship Data Flow Priority Classification

Building on existing research in autonomous driving traffic classification [12] and ship traffic studies [37], this paper thoroughly classifies data flows. The traffic priorities are detailed in Table 2.

In ship information systems, diverse data streams impose varying transmission performance requirements, necessitating specialized transmission strategies. Additionally, the redundancy mechanisms inherent in TSN require allocating supplementary communication paths, which mandates that data flows be prioritized based on their significance. Consequently, the redundant transmission paths examined in this study are primarily allocated to high-priority traffic, ensuring the reliability of critical data flows. This classification considers the unique characteristics of marine vessels, the specific attributes of data streams, and their respective performance demands. Doing so ensures that the transmission processes within intelligent ships adhere to the traffic priority standards established for time-sensitive networks.
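One way to represent this classification in software is a small lookup structure. The sketch below copies a subset of the names and priorities from Table 2; the `FlowClass` type and the rule that only high-priority flows receive a redundant path are simplifying assumptions (the text says redundant paths are primarily, not exclusively, allocated to high-priority traffic).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowClass:
    name: str
    traffic_type: str
    priority: str  # "High", "Medium", or "Low", as in Table 2


FLOW_CLASSES = [
    FlowClass("Navigation data transmission", "Isochronous", "High"),
    FlowClass("Engine control and monitoring", "Cyclic Sync", "High"),
    FlowClass("Safety alarm system", "Alarms/Events", "High"),
    FlowClass("Sensor data collection", "Cyclic Async", "Medium"),
    FlowClass("Non-critical communication", "Best Effort", "Low"),
]


def needs_redundant_path(flow):
    """Simplified rule: only high-priority flows are duplicated."""
    return flow.priority == "High"


redundant_flows = [f.name for f in FLOW_CLASSES if needs_redundant_path(f)]
```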

4.3.2. Introduction to Redundant Multipath Selection Algorithms

In networks implementing TSN technology, data flows can be sent through multiple redundant pathways, enabling uninterrupted communication [28]. This study presents a novel redundant multipath selection algorithm specifically designed for maritime communication systems. Within the SSDTSN architecture, TSN switches periodically provide updates on flow characteristics and transmission processes. The SDN control plane employs the G-D3QN algorithm to select both primary and alternative paths. These path flow tables are then disseminated to the TSN switches, which oversee the redundant transmission of high-priority traffic. This approach enhances fault tolerance and ensures continuous communication, thereby boosting the reliability and robustness of data transmission in intelligent maritime settings. The structure of the algorithm is shown in Figure 4.

The redundant multipath selection algorithm starts by configuring the SDN controller and TSN switches to collect real-time network topology information, such as path delays and switch port statuses. This step is crucial because it provides a clear picture of the current network conditions and forms the foundation for optimal path selection. After gathering this information, the SDN controller uses Algorithm 1 to determine the working path by selecting the route with the lowest latency that also meets acceptable packet loss rates, ensuring the efficient and reliable transmission of critical data services. Next, to enhance fault tolerance, the controller chooses a redundant path that overlaps with the working path as little as possible, ideally using different physical links or nodes to avoid single points of failure. Once both paths are determined, they are loaded into flow tables and distributed to the TSN switches in the data plane. Finally, high-priority traffic is sent simultaneously over both the working and redundant paths, so that communication remains uninterrupted even if one of the paths fails or degrades. Algorithm 2 outlines the pseudocode of the proposed algorithm.

Algorithm 1 Path Selection Algorithm Based on D3QN Fused with GCN

Require:
Network topology information, including time latency, packet loss rate, and bandwidth occupancy rate
Ensure:
The end-to-end traffic transmission path

1: GCN preprocessing: Input the related network topology information into the graph convolutional neural network to obtain the topological graph feature information
2: D3QN initialization: Initialize the experience replay buffer D, the online network parameter $θ$, and the target network parameter $θ^{'}$
3: for episode = 1 to M do
4:     Initialize state $s_{1}$
5:     Initialize action $a_{1}$
6:     for $t = 1$ to T do
7:         Select action $a_{t}$ according to the $ϵ$-greedy strategy
8:         Execute action $a_{t}$, observe the new state $s_{t + 1}$, and receive reward $r_{t}$
9:         Store transition $(s_{t}, a_{t}, r_{t}, s_{t + 1})$ in D
10:        Update the online network:
11:            Sample a batch of experiences from D
12:            Calculate the target $Y_{t} = r_{t} + γ Q^{'} (s_{t + 1}, \arg\max_{a} Q (s_{t + 1}, a; θ); θ^{'})$
13:            Calculate the loss function $L = {(Y_{t} - Q (s_{t}, a_{t}; θ))}^{2}$
14:            Minimize the loss L by gradient descent
15:        if it is time to update the target network then
16:            Update target network parameters: $θ^{'} \leftarrow θ$
17:        end if
18:    end for
19: end for
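The target in step 12 of Algorithm 1 is the Double DQN form: the online network chooses the next action and the target network evaluates it. A minimal sketch of that computation is given below, with Q-values passed in as plain lists; the function name is hypothetical.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99):
    """Y_t = r_t + gamma * Q'(s_{t+1}, argmax_a Q(s_{t+1}, a; theta); theta')."""
    # Action selection uses the online network
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Action evaluation uses the target network
    return reward + gamma * next_q_target[best_action]
```

Decoupling selection from evaluation in this way is what reduces the overestimation that arises when a single network both picks and scores the maximizing action.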

Algorithm 2 Redundant Multipath Selection Algorithm for Smart Ships

Require:
Network topology, including node and link information, and end-to-end node information
Ensure:
Working path and redundant path

1: Initialize the SDN controller and TSN switches to obtain path latency and switch port data
2: Use Algorithm 1 to determine the working path: select the path with the lowest latency and an acceptable packet loss rate
3: Use Algorithm 1 to determine the redundant path: select the path that overlaps least with the working path
4: The SDN controller sends the traffic forwarding instructions (flow tables) to the TSN switches in the data plane
5: High-priority traffic is transmitted over both paths simultaneously
6: end
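The selection logic of Algorithm 2 can be illustrated on a toy topology. Note that this sketch replaces the learned G-D3QN policy with an exhaustive search over loop-free paths, considers latency only (not packet loss), and uses a hypothetical four-switch topology and hop limit, so it shows the selection criteria rather than the proposed algorithm itself.

```python
def simple_paths(graph, src, dst, max_hops):
    """Yield loop-free paths from src to dst that use at most max_hops links."""
    stack = [[src]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == dst:
            yield path
        elif len(path) - 1 < max_hops:
            for neighbor in graph[node]:
                if neighbor not in path:
                    stack.append(path + [neighbor])


def path_latency(graph, path):
    """Sum of the link latencies along the path."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


def links_of(path):
    """Set of undirected links used by a path."""
    return {frozenset(link) for link in zip(path, path[1:])}


def select_paths(graph, src, dst, max_hops=6):
    """Return (working, redundant); redundant is None if no alternative exists."""
    candidates = list(simple_paths(graph, src, dst, max_hops))
    if not candidates:
        return None, None
    # Working path: lowest total latency
    working = min(candidates, key=lambda p: path_latency(graph, p))
    others = [p for p in candidates if p != working]
    if not others:
        return working, None

    def overlap(p):
        return len(links_of(working) & links_of(p))

    # Redundant path: fewest shared links, ties broken by latency
    redundant = min(others, key=lambda p: (overlap(p), path_latency(graph, p)))
    return working, redundant


# Hypothetical topology; values are link latencies in microseconds
topology = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "C": 1, "D": 1},
    "C": {"A": 2, "B": 1, "D": 2},
    "D": {"B": 1, "C": 2},
}
working_path, redundant_path = select_paths(topology, "A", "D")
```

Here the working path is A-B-D and the redundant path is the fully link-disjoint A-C-D, so a single link failure cannot interrupt both copies of a flow.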

5. Experimental Evaluation

In this section, we establish a virtual simulation environment to validate the effectiveness of the proposed algorithm and compare its performance with other algorithms. By configuring experimental parameters such as the simulated topology of the ship communication system, flow latency, bandwidth, and packet loss, we first test the convergence and effectiveness of our algorithm. Subsequently, using this simulation environment, we compare the performance of our algorithm with the DQN algorithm, the ACO algorithm, and the OSPF algorithm in terms of latency and the packet loss rate under both single-path and dual-path scenarios.

5.1. Experimental Configuration

In this section, we developed a virtual simulation environment utilizing the Python-based NetworkX library [42], drawing on existing studies [43,44]. This environment facilitated the flexible implementation of network topologies and accelerated the training process. The experimental topology streamlines the intelligent maritime network system by reducing it to essential switching nodes, resulting in a network structure composed of multiple switching nodes and data terminals. This simplification allows us to concentrate on the network’s core functionalities without the computational burden of modeling the entire complexity of real-world intelligent ship systems. By decreasing complexity, we can more effectively analyze and interpret the results, a common strategy in network simulation research to balance realism with computational feasibility [45].

Experimental Hardware Configuration: GPU: NVIDIA GeForce RTX 4090; CPU: Intel Core i7-14700KF; Memory: 128 GB; Operating System: Windows 11 64-bit.

Experimental Environment Configuration: The simulation environment is developed using the PyTorch framework, which offers dynamic computation graphs and GPU acceleration, making it highly suitable for deep learning applications [46]. We employed a GCN to extract feature values for each path, specifically targeting latency, bandwidth utilization, and packet loss rate. GCNs are exceptionally effective for processing graph-structured data because they can capture the complex relationships and interactions between nodes and edges within the network [47]. By leveraging a GCN, we achieve comprehensive representations of the network topology and its associated metrics, enhancing our ability to analyze and optimize network performance.

5.2. Learning Parameter Settings

Following the experimental configuration, this section details the specific settings of the learning parameters.

The GCN is designed with a two-layer fully connected structure, with each layer comprising 16 neurons. This architecture effectively captures and processes the graph-structured data inherent in the network topology while maintaining model simplicity. Similarly, the D3QN uses a two-layer fully connected structure with 64 neurons in the hidden layer to accommodate the larger state-action spaces typical in reinforcement learning tasks, thereby enabling more precise policy learning.

The Rectified Linear Unit (ReLU) is employed as the activation function due to its ability to introduce non-linearity and mitigate vanishing gradient issues, enhancing the stability and efficiency of the training process. For optimization, the Adam optimizer is chosen for its adaptive learning rate capabilities, which improve the optimization efficiency and convergence speed, especially in deep learning models with numerous parameters.

To balance exploration and exploitation, an $ϵ$-greedy strategy is implemented. The initial exploration rate $ϵ$ is set to 1.0, encouraging extensive exploration of the action space during the early training stages. The rate then decays exponentially by a factor of 0.999 until it reaches its minimum of 0.01. In this way, the model explores thoroughly at first to discover good policies and gradually shifts toward exploiting the learned strategies, which improves convergence.
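The decay schedule can be worked through numerically. The sketch below uses the values from Table 3 and assumes that the factor 0.999 is applied once per training step, which the paper does not state explicitly; the function names are hypothetical.

```python
EPS_START, EPS_DECAY, EPS_MIN = 1.0, 0.999, 0.01  # values from Table 3


def epsilon_at(step):
    """Exploration rate after `step` multiplicative decays, floored at EPS_MIN."""
    return max(EPS_MIN, EPS_START * EPS_DECAY ** step)


def steps_until_floor():
    """Number of decays needed before epsilon first reaches the minimum."""
    eps, steps = EPS_START, 0
    while eps > EPS_MIN:
        eps *= EPS_DECAY
        steps += 1
    return steps
```

Under this per-step assumption the floor is reached after roughly 4600 decays, that is, within the 6000 training steps, so the final part of training runs almost entirely in exploitation mode.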

Additional hyperparameters are listed in Table 3 [33,48].

Each experiment uses a single random seed and interacts with the environment for 6000 steps. The effectiveness and convergence of the proposed algorithm are verified by varying the weight values $(μ, ν, λ)$ and the learning rate $α$.

5.3. Algorithm Validation

As shown in Figure 5, when using the same $μ$ , $ν$ , $λ$ values but different learning rates, a learning rate of 0.05 leads to more significant network latency optimization.

In Figure 6, we observe the reward values for different $μ$ , $ν$ , $λ$ weight settings at a learning rate of 0.05. Good optimization can be achieved when $μ = 0.4$ , $ν = 0.4$ , and $λ = 0.2$ .

In the initial stage of training, data transmission in the network yields poor latency performance, and the reward value of the proposed algorithm decreases. As the number of training steps increases, the algorithm gradually gains experience, responds to network changes, and makes dynamic adjustments. The average network latency performance gradually improves, and the reward value rises and eventually converges.

Figure 7 shows the results of the two tests for the D3QN at a learning rate of 0.05 with weight values of $μ = 0.4$ , $ν = 0.4$ , and $λ = 0.2$ compared to the DQN. It can be concluded that the D3QN converges faster and achieves a better reward value.

5.4. Algorithmic Performance Evaluation

Based on the simulation topology, the proposed algorithm is compared with the DQN, ACO, and OSPF. In the single-path case, the G-D3QN selects one path for transmission; in the dual-path case, two paths are selected using the G-D3QN-based ship redundant multipath selection algorithm. Both cases are referred to simply as the G-D3QN below. To demonstrate the proposed algorithm’s advantages, the performance of the four algorithms is compared separately for single-path and dual-path transmission. Table 4 summarizes the average latency, average packet loss rate, and average load factor for data stream transmission.

Average delay analysis: Single-path scenario: The G-D3QN method exhibits the lowest average delay among all approaches when operating on a single path. The DQN and ACO show a comparable performance, with slightly higher delays than the G-D3QN, while OSPF records the highest average delay. This indicates that the G-D3QN achieves an optimal delay performance in single-path conditions. Dual-path scenario: the G-D3QN outperforms the other methods in a dual-path setup, maintaining the lowest average delay. The DQN follows, with ACO exhibiting marginally higher delays and OSPF again showing the highest delays. These results demonstrate that the G-D3QN consistently provides a superior delay performance even when utilizing dual paths.

Average packet loss rate analysis: Single-path scenario: the G-D3QN achieves the lowest packet loss rate in single-path transmission, with the DQN and ACO presenting similar, slightly higher rates and OSPF recording the highest loss rate. This suggests that the G-D3QN is particularly effective in minimizing packet loss when using a single path. Dual-path scenario: In dual-path conditions, the G-D3QN continues to lead with the lowest packet loss rate, followed by the DQN and ACO, which have comparable rates slightly above the G-D3QN, and OSPF, which maintains the highest loss rate. These findings further confirm the efficacy of the G-D3QN in enhancing network reliability through reduced packet loss.

Load balancing analysis: Single-path scenario: The G-D3QN method demonstrates the lowest load CV in single-path transmissions, indicating a more uniform load distribution. The DQN and ACO exhibit similar CV values, while OSPF records the highest CV. A lower CV signifies better load balancing, highlighting the G-D3QN’s superior performance. Dual-path scenario: Even in dual-path scenarios, the G-D3QN maintains the lowest CV value, with the DQN and ACO showing similar, slightly higher CVs and OSPF again having the highest CV. This also underscores the G-D3QN’s capability to achieve enhanced load balancing under dual-path conditions.
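For reference, the coefficient of variation used as the load-balancing metric is the standard deviation of the link loads divided by their mean. The sketch below assumes the population standard deviation and a list of per-link bandwidth utilizations as input; the paper's exact definition may differ.

```python
import statistics


def load_cv(link_utilizations):
    """Coefficient of variation: standard deviation divided by the mean."""
    mean = statistics.fmean(link_utilizations)
    if mean == 0:
        raise ValueError("CV is undefined when the mean load is zero")
    return statistics.pstdev(link_utilizations) / mean
```

A perfectly balanced network such as `[0.5, 0.5, 0.5]` gives a CV of zero, while concentrating the same total load on fewer links raises it.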

Overall performance: The G-D3QN approach consistently outperforms all other methods across all evaluated metrics, demonstrating its comprehensive advantages in reducing delay and packet loss rates while improving load balancing. The DQN method ranks second, surpassing both ACO and OSPF, which underscores the effectiveness of deep reinforcement learning in network optimization. Although ACO shows marginal improvements over OSPF in specific metrics, its overall performance remains inferior to the G-D3QN and DQN. OSPF performs the worst across all metrics, highlighting the limitations of traditional routing protocols in meeting real-time and reliability requirements.

Performance comparison between single and dual paths: For all evaluated methods, the average delay and average packet loss rates are significantly lower in dual-path configurations than in single-path setups. This indicates that adopting a dual-path strategy can substantially enhance network performance. Additionally, the load CV values are reduced in dual-path scenarios, reflecting a more balanced load distribution and more efficient utilization of network resources.

The G-D3QN method, based on deep reinforcement learning, demonstrates an exceptional performance in enhancing network metrics by effectively reducing delay and packet loss rates and achieving superior load balancing. Implementing dual-path redundant transmissions further improves the network performance. These results validate the proposed G-D3QN-based redundant multipath selection algorithm’s capability to facilitate a low-latency and highly reliable transmission of high-priority data flows within the SSDTSN architecture.

5.4.1. G-D3QN Single- and Dual-Path Performance Comparison

Figure 8 compares the latency, packet loss rate, and load factor of G-D3QN-based single-path transmission with those of dual-path transmission using the same algorithm. Compared with a single path, the dual path reduces the average latency by about 12.39%, the average packet loss rate by about 98.72%, and the average load factor by about 4.68%.
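The relative reductions can be recomputed from the rounded averages in Table 4 with a one-line formula. The helper name and dictionary keys below are hypothetical. Values obtained this way come out close to, but not identical with, the percentages quoted above, which may have been derived from the underlying per-flow measurements rather than from the table averages.

```python
def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Averages for the proposed method, copied from Table 4
single = {"latency_us": 134.0661, "loss_pct": 1.3075, "load_cv": 0.6454}
dual = {"latency_us": 116.8674, "loss_pct": 0.01726, "load_cv": 0.6174}

reductions = {key: relative_reduction(single[key], dual[key]) for key in single}
```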

5.4.2. Multi-Algorithm Latency Performance Comparison

The single- and dual-path transmission latency performance comparison of the four algorithms is shown in Figure 9. For single-path transmission, the G-D3QN algorithm reduces the latency by about 1.04% on average compared with the DQN algorithm, by 0.62% compared with the ACO algorithm, and by 5.80% compared with the OSPF algorithm. For dual-path transmission, the G-D3QN algorithm reduces the latency by about 0.93% on average compared with the DQN algorithm, by 3.10% compared with the ACO algorithm, and by 7.85% compared with the OSPF algorithm.

5.4.3. Multi-Algorithm Packet Loss Performance Comparison

The packet loss performance comparison of the four algorithms for single- and dual-path transmission is shown in Figure 10. For single-path transmission, the G-D3QN algorithm reduces the packet loss rate by about 0.68% on average compared with the DQN algorithm, by 0.38% compared with the ACO algorithm, and by about 11.30% compared with the OSPF algorithm. For dual-path transmission, the G-D3QN algorithm reduces the packet loss rate by about 0.58% on average compared with the DQN algorithm, by 0.57% compared with the ACO algorithm, and by about 16.63% compared with the OSPF algorithm.

5.4.4. Multi-Algorithm Load Performance Comparison

The comparison of the load coefficients of variation (load factors) of the four algorithms for single- and dual-path transmission is shown in Figure 11. For single-path transmission, the G-D3QN algorithm reduces the load factor by about 1.71% on average compared with the DQN algorithm, by 1.05% compared with the ACO algorithm, and by about 18.40% compared with the OSPF algorithm. For dual-path transmission, the G-D3QN algorithm reduces the load factor by about 2.35% on average compared with the DQN algorithm, by 0.98% compared with the ACO algorithm, and by about 15.81% compared with the OSPF algorithm.

Therefore, the path selection algorithm based on the G-D3QN significantly outperforms the DQN, ACO, and OSPF algorithms in both single-path and dual-path configurations. It optimizes the transmission latency, enhances the transmission reliability, and improves bandwidth resource utilization. The ship redundant multipath selection algorithm proposed in this paper can help the smart shipboard communication system reduce network latency and packet loss and promote network load balancing.

6. Conclusions

Smart ships require extensive data interaction between various intelligent sensors and multiple intelligent systems. Among these, high-priority data transmissions require guarantees of high real-time performance and high reliability. This paper proposes a novel network architecture integrating SDN and TSN to meet the communication needs of smart ships. Additionally, a redundant multipath selection algorithm based on a D3QN is developed to generate a working path and a redundant path that meet the latency and packet loss rate requirements; these paths are distributed through the SDN control plane to the TSN switches in the data plane. By utilizing TSN’s inherent redundancy mechanisms and the selection of multiple redundant paths, high-priority traffic is transmitted simultaneously across multiple paths. This approach guarantees high-reliability transmission. Experiments show that the proposed redundant multipath selection algorithm can reduce latency and the packet loss rate and can serve as a reference for solving reliability problems in smart ships.

However, this study has several limitations. The experimental evaluations were primarily conducted in simulated environments without validation in actual maritime communication networks, which may not fully capture the complexities and dynamic nature of real-world settings. Additionally, integrating wireless communication technologies such as 5G and 6G within maritime communication infrastructures was not extensively explored, potentially limiting the algorithm’s applicability across a broader range of maritime scenarios. Furthermore, although the algorithm effectively reduces latency and packet loss rates, its computational complexity and resource demands pose challenges for scalability in more extensive and complex networks. Future research should focus on enhancing the algorithm’s scalability and efficiency to better handle intricate network environments. Lastly, the algorithm’s performance improvements are sensitive to specific weighting parameters, indicating that its optimization effectiveness may vary depending on different contexts and requirements. This sensitivity underscores the necessity for greater flexibility in the algorithm’s design and additional validation to ensure adaptability to diverse operational conditions and demands.

In this study, simulations were predominantly utilized to conduct experimental analyses. The subsequent phase will focus on developing a hardware simulation environment to facilitate verification and deployment testing on actual marine vessels. Furthermore, with the integration of advanced wireless communication technologies such as 5G and 6G alongside SDN and TSN, future models will seek to establish a unified architecture for both wireless and wired network transmissions. This unified framework is designed to address the increasingly diverse communication requirements of smart ships, thereby enhancing the efficiency, flexibility, and reliability of maritime communication systems.

Author Contributions

Y.X.: conceptualization, formal analysis, methodology, supervision, and writing—review and editing. S.H.: conceptualization, data curation, investigation, methodology, project administration, resources, software, writing—original draft, and writing—review and editing. Z.Z.: conceptualization, investigation software, and validation. J.X.: investigation, resources, and writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The simulation code for this article is available online at: https://github.com/xiutest/SSDTSN (accessed on 30 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

NMEA	National Marine Electronics Association
CAN	Controller Area Network
SDN	Software-Defined Networking
TSN	Time-Sensitive Networking
D3QN	Double Dueling Deep Q-Network
GCN	Graph Convolutional Network
API	Application Programming Interface
VLAN	Virtual Local Area Network
AIS	Automatic Identification System
GPS	Global Positioning System
SSDTSN	Ship Software-Defined Time-Sensitive Networking
ACO	Ant Colony Optimization
LSTM	Long Short-Term Memory
DQN	Deep Q-Network
DDPG	Deep Deterministic Policy Gradient
A2C	Advantage Actor–Critic
ReLU	Rectified Linear Unit
MSE	Mean Square Error
MDP	Markov Decision Process
OSPF	Open Shortest Path First
gPTP	generalized Precision Time Protocol


Figures and Tables

Figure 1. SSDTSN architecture diagram.

Figure 2. Topology diagram of SSDTSN.

Figure 3. G-D3QN schematic diagram.

Figure 4. Redundant multipath selection algorithms structure.

Figure 5. G-D3QN. (a) Reward value at different learning rates; (b) latency at different learning rates.

Figure 6. G-D3QN reward values under different weights.

Figure 7. Comparison of D3QN and DQN training reward values. (a) test1; (b) test2.

Figure 8. G-D3QN single path vs dual path. (a) Latency; (b) packet loss; (c) load CV.

Figure 9. Latency evaluation of four algorithms. (a) Single path; (b) dual path.

Figure 10. Packet loss evaluation of four algorithms. (a) Single path; (b) dual path.

Figure 11. Load CV evaluation of four algorithms. (a) Single path; (b) dual path.

Table 1

The notation definition.

Notation	Definition
$d_{i}$	Data flow
P	End-to-end transmission path
$p_{i}$	Transmission links of flow $d_{k}$ at time t
$L_{h}$	Transmission latency of flow $d_{k}$ at time t
$\bar{L}$	Average transmission latency of J data flows
$U_{used}$	Used bandwidth including the number of bytes received $R_{d}$ and sent $T_{d}$ during statistical time $Δ t$
B	Total path bandwidth
$\bar{U}$	Average bandwidth utilization of J data flows
$U_{c v}$	Bandwidth load coefficient of variation
$R_{k}$	The packet loss rate of transmission path for $d_{k}$
$\bar{R}$	Average packet loss rate of J data flows

Table 2

Communication data types and characteristics.

Data	Type	Cycle	Requirements	Reliability	Priority
Navigation data transmission: real-time radar and AIS info	Isochronous	50 µs–2 ms	Strict time limit, real-time	High	High
Engine control and monitoring: engine parameters, power system control signals	Cyclic Sync	100 µs–2 ms	Max latency, real time	High	High
Safety alarm system: fire, leakage, emergency stop notifications	Alarms/Events	Async/sudden	Max latency, real time	High	High
Sensor data collection: environmental sensors, equipment status	Cyclic Async	2–20 ms	Max latency	High	Medium
System configuration and maintenance: equipment parameters, fault diagnostics	Config/Diag	Async/sudden	Bandwidth	Medium	Medium
Network management and device control: topology management, start/stop instructions	Network control	Cyclic	Bandwidth	High	Medium
Non-critical communication: crew chat, non-critical data	Best Effort	Async/sudden	None	Low	Low
Surveillance video transmission: surveillance cameras, entertainment system	Video	Async/sudden	Max latency	Low	Low
Voice communication: intercom system, public broadcasting	Audio/Voice	Async/sudden	Max latency	Low	Low

Table 3

Algorithm parameter settings.

Parameter	Value
Learning rate	0.1, 0.01, 0.001, 0.0001
Training steps	6000
Batch size (D3QNAgent)	512
$γ$ discount factor	0.99
Experience replay buffer size	20,000
$ϵ$-greedy	Initial 1.0, decay rate 0.999, minimum 0.01
$(μ, ν, λ)$	(0.2, 0.7, 0.1), (0.3, 0.5, 0.2), (0.4, 0.4, 0.2), (0.5, 0.3, 0.2), (0.7, 0.2, 0.1)

Table 4

Performance comparison of different methods.

Method	Single-Path Average Time Latency (µs)	Dual-Path Average Time Latency (µs)	Single-Path Average Packet Loss Rate (%)	Dual-Path Average Packet Loss Rate (%)	Single-Path Load CV	Dual-Path Load CV
D3QN	134.0661	116.8674	1.3075	0.01726	0.6454	0.6174
DQN	135.9177	117.9687	1.3163	0.01736	0.6562	0.6310
ACO	135.2319	121.2392	1.3133	0.01735	0.6535	0.6216
OSPF	142.9973	126.5977	1.4780	0.02074	0.7872	0.7356

References

1. China Classification Society. Rules for Intelligent Ships 2024; China Classification Society: Beijing, China, 2023.

2. Tran, K.; Keene, S.; Fretheim, E.; Tsikerdekis, M. Marine Network Protocols and Security Risks. J. Cybersecur. Priv.; 2021; 1, pp. 239-251. [DOI: https://dx.doi.org/10.3390/jcp1020013]

3. Hao, G.; Xiao, W.; Huang, L.; Chen, J.; Zhang, K.; Chen, Y. The Analysis of Intelligent Functions Required for Inland Ships. J. Mar. Sci. Eng.; 2024; 12, 836. [DOI: https://dx.doi.org/10.3390/jmse12050836]

4. Yu, Z. Research on Multi-Source Heterogeneous Data Fusion Method for Intelligent Ship Navigation Test. Proceedings of the 2023 IEEE 3rd International Conference on Data Science and Computer Application (ICDSCA); Dalian, China, 27–29 October 2023; pp. 514-516. [DOI: https://dx.doi.org/10.1109/ICDSCA59871.2023.10393517]

5. Tang, Y.; Shao, N. Design and Research of Integrated Information Platform for Smart Ship. Proceedings of the 2017 4th International Conference on Transportation Information and Safety (ICTIS); Banff, AB, Canada, 8–10 August 2017; pp. 37-41. [DOI: https://dx.doi.org/10.1109/ICTIS.2017.8047739]

6. Kim, J.; Son, J. Design of an Integrated Telecom System for Performance Enhancement on Smart Ships. J. Theor. Appl. Inf. Technol.; 2021; 99, [DOI: https://dx.doi.org/10.21203/rs.3.rs-752288/v1]

7. Liu, S.; Xing, B.; Li, B.; Gu, M. Ship Information System: Overview and Research Trends. Int. J. Nav. Archit. Ocean Eng.; 2014; 6, pp. 670-684. [DOI: https://dx.doi.org/10.2478/IJNAOE-2013-0204]

8. Prajapati, A.; Sakadasariya, A.; Patel, J. Software Defined Network: Future of Networking. Proceedings of the 2018 International Conference on Intelligent Computing and Sustainable System (ICISS); Coimbatore, India, 19–20 January 2018; pp. 1351-1354. [DOI: https://dx.doi.org/10.1109/ICISC.2018.8399028]

9. Hauser, F.; Häberle, M.; Merling, D.; Lindner, S.; Gurevich, V.; Zeiger, F.; Frank, R.; Menth, M. A Survey on Data Plane Programming with P4: Fundamentals, Advances, and Applied Research. J. Netw. Comput. Appl.; 2023; 212, 103561. [DOI: https://dx.doi.org/10.1016/j.jnca.2022.103561]

10. Xu, Y.; Shang, J.; Tang, H. Recent Trends of In-Vehicle Time Sensitive Networking Technologies, Applications and Challenges. China Commun.; 2023; 20, pp. 30-55. [DOI: https://dx.doi.org/10.23919/JCC.ea.2021-0888.202302]

11. Maletić, Ž.; Mlađen, M.; Ljubojević, M. A Survey on the Current State of Time-Sensitive Networks Standardization. Proceedings of the 2023 10th International Conference on Electrical, Electronic and Computing Engineering (IcETRAN); East Sarajevo, Bosnia and Herzegovina, 5–8 June 2023; pp. 1-6. [DOI: https://dx.doi.org/10.1109/IcETRAN59631.2023.10192167]

12. Xu, Y.; Huang, J. A Survey on Time-Sensitive Networking Standards and Applications for Intelligent Driving. Processes; 2023; 11, 2211. [DOI: https://dx.doi.org/10.3390/pr11072211]

13. Fiori, T.; Lavacca, F.; Valente, F.; Eramo, V. Proposal and Investigation of a Lite Time Sensitive Networking Solution for the Support of Real Time Services in Space Launcher Networks. IEEE Access; 2024; 12, pp. 10664-10680. [DOI: https://dx.doi.org/10.1109/ACCESS.2024.3353466]

14. IEEE Std 802.1CB-2017 IEEE Standard for Local and Metropolitan Area Networks–Frame Replication and Elimination for Reliability; IEEE: New York, NY, USA, 2017; [DOI: https://dx.doi.org/10.1109/IEEESTD.2017.8091139]

15. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph Neural Networks: A Review of Methods and Applications. AI Open; 2020; 1, pp. 57-81. [DOI: https://dx.doi.org/10.1016/j.aiopen.2021.01.001]

16. Let, G.; Pratap, C.; Jagannath, D.; Dolly, D.; Evangeline, L. Software-Defined Networking Routing Algorithms: Issues, QoS and Models. Wirel. Pers. Commun.; 2023; 131, pp. 1631-1661. [DOI: https://dx.doi.org/10.1007/s11277-023-10516-y]

17. Sheu, J.; Zeng, Q.; Jagadeesha, R.; Chang, Y. Efficient Unicast Routing Algorithms in Software-Defined Networking. Proceedings of the 2016 European Conference on Networks and Communications (EuCNC); Athens, Greece, 27–30 June 2016; pp. 377-381. [DOI: https://dx.doi.org/10.1109/EuCNC.2016.7561066]

18. Abe, J.; Mantar, H.; Yayimli, A. k-Maximally Disjoint Path Routing Algorithms for SDN. Proceedings of the 2015 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery; Xi’an, China, 17–19 September 2015; pp. 499-508. [DOI: https://dx.doi.org/10.1109/CyberC.2015.45]

19. Tao, J.; Shen, Y.; Yan, Y.; Wu, Y.; Zhang, Y.; Wan, J. A Distributed Heuristic Multicast Algorithm Based on QoS Implemented by SDN. Proceedings of the 2017 3rd IEEE International Conference on Computer and Communications (ICCC); Chengdu, China, 13–16 December 2017; pp. 23-29. [DOI: https://dx.doi.org/10.1109/CompComm.2017.8322508]

20. Jing, S.; Muqing, W.; Yong, B.; Min, Z. An Improved GAC Routing Algorithm Based on SDN. Proceedings of the 2017 3rd IEEE International Conference on Computer and Communications (ICCC); Chengdu, China, 13–16 December 2017; pp. 173-176. [DOI: https://dx.doi.org/10.1109/CompComm.2017.8322535]

21. Lin, C.; Wang, K.; Deng, G. A QoS-Aware Routing in SDN Hybrid Networks. Procedia Comput. Sci.; 2017; 110, pp. 242-249. [DOI: https://dx.doi.org/10.1016/j.procs.2017.06.091]

22. Owusu, A.; Nayak, A. A Framework for QoS-Based Routing in SDNs Using Deep Learning. Proceedings of the 2020 International Symposium on Networks, Computers and Communications (ISNCC); Montreal, QC, Canada, 20–22 October 2020; pp. 1-6. [DOI: https://dx.doi.org/10.1109/ISNCC49221.2020.9297225]

23. Awad, M.; Ahmed, M.; Almutairi, A.; Ahmad, I. Machine Learning-Based Multipath Routing for Software Defined Networks. J. Netw. Syst. Manag.; 2021; 29, 18. [DOI: https://dx.doi.org/10.1007/s10922-020-09583-4]

24. Azzouni, A.; Boutaba, R.; Pujolle, G. NeuRoute: Predictive Dynamic Routing for Software-Defined Networks. Proceedings of the 2017 13th International Conference on Network and Service Management (CNSM); Tokyo, Japan, 26–30 November 2017; pp. 1-6. [DOI: https://dx.doi.org/10.23919/CNSM.2017.8256059]

25. Kato, N.; Fadlullah, Z.; Mao, B.; Tang, F.; Akashi, O.; Inoue, T.; Mizutani, K. The Deep Learning Vision for Heterogeneous Network Traffic Control: Proposal, Challenges, and Future Perspective. IEEE Wirel. Commun.; 2016; 24, pp. 146-153. [DOI: https://dx.doi.org/10.1109/MWC.2016.1600317WC]

26. Shakya, A.; Pillai, G.; Chakrabarty, S. Reinforcement Learning Algorithms: A Brief Survey. Expert Syst. Appl.; 2023; 231, 120495. [DOI: https://dx.doi.org/10.1016/j.eswa.2023.120495]

27. Szepesvári, C. Markov Decision Processes. Algorithms for Reinforcement Learning; Springer International Publishing: Cham, Switzerland, 2010; pp. 1-10. [DOI: https://dx.doi.org/10.1007/978-3-031-01551-9_1]

28. Hassen, H.; Meherzi, S.; Jemaa, Z. Improved Exploration Strategy for Q-Learning Based Multipath Routing in SDN Networks. J. Netw. Syst. Manag.; 2024; 32, 25. [DOI: https://dx.doi.org/10.1007/s10922-024-09804-0]

29. Rischke, J.; Sossalla, P.; Salah, H.; Fitzek, F.; Reisslein, M. QR-SDN: Towards Reinforcement Learning States, Actions, and Rewards for Direct Flow Routing in Software-Defined Networks. IEEE Access; 2020; 8, pp. 174773-174791. [DOI: https://dx.doi.org/10.1109/ACCESS.2020.3025432]

30. Casas-Velasco, D.; Rendon, O.; da Fonseca, N. Intelligent Routing Based on Reinforcement Learning for Software-Defined Networking. IEEE Trans. Netw. Serv. Manag.; 2020; 18, pp. 870-881. [DOI: https://dx.doi.org/10.1109/TNSM.2020.3036911]

31. Wang, H.; Liu, N.; Zhang, Y.; Feng, D.; Huang, F.; Li, D.; Zhang, Y. Deep Reinforcement Learning: A Survey. Front. Inf. Technol. Electron. Eng.; 2020; 21, pp. 1726-1744. [DOI: https://dx.doi.org/10.1631/FITEE.1900533]

32. Liu, W.; Cai, J.; Chen, Q.; Wang, Y. DRL-R: Deep Reinforcement Learning Approach for Intelligent Routing in Software-Defined Data-Center Networks. J. Netw. Comput. Appl.; 2021; 177, 102865. [DOI: https://dx.doi.org/10.1016/j.jnca.2020.102865]

33. Xia, D.; Wan, J.; Xu, P.; Tan, J. Deep Reinforcement Learning-Based QoS Optimization for Software-Defined Factory Heterogeneous Networks. IEEE Trans. Netw. Serv. Manag.; 2022; 19, pp. 4058-4068. [DOI: https://dx.doi.org/10.1109/TNSM.2022.3208342]

34. Jinesh, N.; Shinde, S.; Narayan, D. Deep Reinforcement Learning-Based QoS Aware Routing in Software Defined Networks. Proceedings of the 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT); Delhi, India, 6–8 July 2023; pp. 1-6. [DOI: https://dx.doi.org/10.1109/ICCCNT56998.2023.10308049]

35. Gayatri Shivani, M.; Subba Rao, S.; Sujatha, C. A Survey on Deep Recurrent Q Networks. Intelligent Systems and Machine Learning; Nandan Mohanty, S.; Garcia Diaz, V.; Satish Kumar, G. Springer: Cham, Switzerland, 2023; pp. 251-261. [DOI: https://dx.doi.org/10.1007/978-3-031-35078-8_21]

36. Özalp, R.; Varol, N.; Taşci, B.; Uçar, A. A Review of Deep Reinforcement Learning Algorithms and Comparative Results on Inverted Pendulum System. Machine Learning Paradigms: Advances in Deep Learning-Based Technological Applications; Tsihrintzis, G.; Jain, L. Springer: Cham, Switzerland, 2020; pp. 237-256. [DOI: https://dx.doi.org/10.1007/978-3-030-49724-8_10]

37. Aslam, S.; Michaelides, M.; Herodotou, H. Internet of Ships: A Survey on Architectures, Emerging Applications, and Challenges. IEEE Internet Things J.; 2020; 7, pp. 9714-9727. [DOI: https://dx.doi.org/10.1109/JIOT.2020.2993411]

38. Amin, M.; Othman, M. Re-Exploration of ϵ-Greedy in Deep Reinforcement Learning. RiTA 2020. Lecture Notes in Mechanical Engineering; Chew, E.; Majeed, A.P.P.A.; Liu, P.; Platts, J.; Myung, H.; Kim, J.; Kim, J.-H. Springer: Singapore, 2021; [DOI: https://dx.doi.org/10.1007/978-981-16-4803-8_27]

39. Wang, Z.; Schaul, T.; Hessel, M.; Van Hasselt, H.; Lanctot, M.; De Freitas, N. Dueling Network Architectures for Deep Reinforcement Learning. Proceedings of the 33rd International Conference on Machine Learning; New York, NY, USA, 20–22 June 2016; pp. 1995-2003.

40. Luong, N.; Hoang, D.; Gong, S.; Niyato, D.; Wang, P.; Liang, Y.C.; Kim, D. Applications of Deep Reinforcement Learning in Communications and Networking: A Survey. IEEE Commun. Surv. Tutorials; 2019; 21, pp. 3133-3174. [DOI: https://dx.doi.org/10.1109/COMST.2019.2916583]

41. Jiang, F.; Li, Y.; Sun, C.; Wang, C. Dueling Double Deep Q-Network Based Computation Offloading and Resource Allocation Scheme for Internet of Vehicles. Proceedings of the 2023 IEEE Wireless Communications and Networking Conference (WCNC); Glasgow, UK, 26–29 March 2023; pp. 1-6. [DOI: https://dx.doi.org/10.1109/WCNC55385.2023.10118937]

42. Igual, L.; Seguí, S. Network Analysis. Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications; Springer International Publishing: Cham, Switzerland, 2024; pp. 151-174. [DOI: https://dx.doi.org/10.1007/978-3-031-48956-3_8]

43. Yang, L.; Wei, Y.; Yu, F.; Han, Z. Joint Routing and Scheduling Optimization in Time-Sensitive Networks Using Graph-Convolutional-Network-Based Deep Reinforcement Learning. IEEE Internet Things J.; 2022; 9, pp. 23981-23994. [DOI: https://dx.doi.org/10.1109/JIOT.2022.3188826]

44. Xu, J.; Wang, Y.; Zhang, B.; Ma, J. A Graph Reinforcement Learning Based SDN Routing Path Selection for Optimizing Long-Term Revenue. Future Gener. Comput. Syst.; 2024; 150, pp. 412-423. [DOI: https://dx.doi.org/10.1016/j.future.2023.09.017]

45. Fujimoto, R.; Riley, G.; Perumalla, K. Wire–Line Network Simulation. Network Simulation; Springer International Publishing: Cham, Switzerland, 2007; pp. 19-25. [DOI: https://dx.doi.org/10.1007/978-3-031-79977-8_3]

46. Mishra, P. Introduction to Neural Networks Using PyTorch. PyTorch Recipes: A Problem-Solution Approach to Build, Train and Deploy Neural Network Models; Apress: Berkeley, CA, USA, 2023; pp. 117-133. [DOI: https://dx.doi.org/10.1007/978-1-4842-8925-9_4]

47. Nie, M.; Chen, D.; Wang, D. Reinforcement Learning on Graphs: A Survey. IEEE Trans. Emerg. Top. Comput. Intell.; 2023; 7, pp. 1065-1082. [DOI: https://dx.doi.org/10.1109/TETCI.2022.3222545]

48. Chen, G.; Sun, J.; Zeng, Q.; Jing, G.; Zhang, Y. Joint Edge Computing and Caching Based on D3QN for the Internet of Vehicles. Electronics; 2023; 12, 2311. [DOI: https://dx.doi.org/10.3390/electronics12102311]

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Traditional network architectures in smart ship communication systems struggle to efficiently manage the integration of heterogeneous sensor data. Additionally, conventional end-to-end transmission algorithms that rely on single-metric and single-path selection are inadequate in fulfilling the high reliability and real-time transmission requirements essential for high-priority service data. This inadequacy results in increased latency and packet loss for critical control information. To address these challenges, this study proposes an innovative ship network framework that synergistically integrates Software-Defined Networking (SDN) and Time-Sensitive Networking (TSN) technologies. Central to this framework is the introduction of a redundant multipath selection algorithm, which leverages Double Dueling Deep Q-Networks (D3QNs) in conjunction with Graph Convolutional Networks (GCNs). Initially, an optimization function encompassing transmission latency, bandwidth utilization, and packet loss rate is formulated within a software-defined time-sensitive network transmission framework tailored for smart ships. The proposed D3QN-GCN-based algorithm effectively identifies optimal working and redundant paths for TSN switches. These dual-path configurations are then disseminated by the SDN controller to the TSN switches, enabling the TSN’s inherent reliability redundancy mechanisms to facilitate the simultaneous transmission of critical service flows across multiple paths. Experimental evaluations demonstrate that the proposed algorithm exhibits robust convergence characteristics and significantly outperforms existing algorithms in terms of reducing network latency and packet loss rates. Furthermore, the algorithm enhances bandwidth utilization and promotes balanced network load distribution. This research offers a novel and effective solution for shipboard switch path selection, thereby advancing the reliability and efficiency of smart ship communication systems.
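As a rough illustration of the idea summarised in the abstract (a cost built from latency, bandwidth utilization, and packet loss rate, followed by the choice of a working path and a redundant path), the sketch below scores candidate paths and picks a link-disjoint pair. It is not the D3QN-GCN algorithm the paper proposes: the greedy selection, the weights, the `max_latency_ms` normaliser, the link-disjointness requirement, and all names (`Path`, `links_of`, `path_cost`, `select_working_and_redundant`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """A candidate end-to-end path (hypothetical structure, not from the paper)."""
    links: frozenset      # set of (switch_a, switch_b) hops
    latency_ms: float     # end-to-end transmission latency
    utilization: float    # bandwidth utilization along the path, 0..1
    loss_rate: float      # packet loss rate, 0..1


def links_of(*switches):
    """Turn a sequence of switch names into the set of consecutive hops."""
    return frozenset(zip(switches, switches[1:]))


def path_cost(p, w_latency=0.4, w_util=0.3, w_loss=0.3, max_latency_ms=10.0):
    """Weighted sum of the three metrics; lower is better.

    The weights and the latency normaliser are illustrative assumptions.
    """
    latency_norm = min(p.latency_ms / max_latency_ms, 1.0)
    return w_latency * latency_norm + w_util * p.utilization + w_loss * p.loss_rate


def select_working_and_redundant(paths):
    """Greedy stand-in for the learned policy.

    Returns the cheapest path as the working path and the cheapest path that
    shares no link with it as the redundant path (None if no such path exists).
    """
    ranked = sorted(paths, key=path_cost)
    working = ranked[0]
    for candidate in ranked[1:]:
        if working.links.isdisjoint(candidate.links):
            return working, candidate
    return working, None


# Toy four-switch topology with three candidate paths from s1 to s4.
a = Path(links_of("s1", "s2", "s4"), latency_ms=2.0, utilization=0.2, loss_rate=0.01)
b = Path(links_of("s1", "s2", "s3", "s4"), latency_ms=3.0, utilization=0.3, loss_rate=0.01)
c = Path(links_of("s1", "s3", "s4"), latency_ms=4.0, utilization=0.3, loss_rate=0.02)

working, redundant = select_working_and_redundant([a, b, c])
```

In this toy case path `b` is cheaper than `c` but shares the hop `s1`-`s2` with the working path `a`, so the sketch skips it and returns `c` as the redundant path. A learned policy such as the one described in the abstract would replace the greedy ranking; only the overall shape (score paths, then hand a working/redundant pair to the controller) is meant to carry over.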

Details

Title: Redundant Path Optimization in Smart Ship Software-Defined Networking and Time-Sensitive Networking Networks: An Improved Double-Dueling-Deep-Q-Networks-Based Approach

Author: Xu, Yanli; He, Songtao; Zhou, Zirui; Xu, Jingxin

First page: 2214

Publication year: 2024

Publication date: 2024

Publisher: MDPI AG

e-ISSN: 2077-1312

Source type: Scholarly Journal

Language of publication: English

DOI: https://doi.org/10.3390/jmse12122214

ProQuest document ID: 3149661151
