1. Introduction
As 5th-generation (5G) communication technologies become widely deployed, terrestrial networks can provide high-bandwidth, low-delay communication services to users within the coverage of base stations [1]. However, in remote areas where base stations are not deployed, or where base stations have been destroyed by natural disasters, terrestrial networks usually cannot meet users' communication needs. Satellite network infrastructure is rarely damaged by natural disasters and has wide coverage [2–4], so it is usually regarded as an essential complement to terrestrial networks. The satellite-terrestrial integrated network (STIN) offers wide coverage and high flexibility and is compatible with the existing 5G network. It is therefore a reliable paradigm for providing Internet services and is receiving much attention from researchers [5–8]. In particular, when users in aviation or maritime scenarios cannot communicate through terrestrial networks, STINs can provide them with communication services.
Satellite routing is an important technique in STINs and has been studied extensively. Because early satellite constellations were small and their network structures simple, existing satellite routing algorithms were developed from terrestrial routing algorithms such as OSPF [9], RIP [10], and AODV [11]. However, most of these algorithms depend on the shortest path or minimum cost, so the satellite network is prone to congestion during routing. In addition, with the expansion of satellite networks (e.g., Starlink), these routing algorithms cannot converge quickly within the limited communication time, which seriously degrades communication performance and wastes satellite communication resources at the same time. On the other hand, most work on satellite routing ignores the impact of users' requested resources on network performance. Liu et al. [12] proposed a segment-routing-based load-balancing scheme to control the traffic of LEO satellite networks. Qi et al. [13] improved users' quality of service (QoS) by jointly optimizing rate control and routing in LEO satellite networks. Considering the traffic distribution density in different areas, the authors in [14] proposed a distributed routing algorithm based on traffic prediction. However, these algorithms do not take into account the impact of the current user's route on subsequent users' routes, which degrades network performance. Moreover, given the limited power and computing resources of LEO satellites, it is not appropriate to deploy these algorithms on board. Designing an efficient satellite routing algorithm is therefore very challenging.
Recently, machine learning and deep learning [15, 16] have been used extensively, and some researchers have begun to apply them to network communication problems [17–19]. The authors in [20] regarded the satellite network topology as a series of snapshots and used a particle swarm optimization algorithm for routing within each snapshot. However, deep learning is an approximation method and is not well suited to sequential decision problems. Reinforcement learning, by contrast, is a trial-and-error method: the agent interacts with the environment, receives a corresponding reward, and this reward guides it toward the best strategy. Reinforcement learning is well suited to sequential decision problems and can achieve better results than human beings [21]; it has been applied in resource allocation [22], capacity management [23], and combinatorial optimization [24, 25]. Compared with other reinforcement learning algorithms, Q-learning is simple and efficient, converges quickly, and is well suited to discrete problems. Inspired by the above works, we regard the satellite routing problem as a turn-based game, model it as a finite-state Markov decision process, and put forward a Q-learning-based routing algorithm to solve it.
In this paper, we mainly investigate the satellite routing problem in STINs. Choosing a path from a source node to a destination node can be regarded as a turn-based game, which is a finite-state Markov decision process. We therefore model the routing problem as a Markov decision process and define its state space, action space, and reward function. We propose the QLRA algorithm to make full use of satellite network resources and improve users' quality of service. In addition, to accelerate the convergence of QLRA, we propose a split-based speed-up convergence strategy and design a speed-up Q-learning-based routing algorithm (SQLRA). Moreover, we update the Q-value from back to front to further improve the convergence speed. The main contributions of this paper are summarized as follows:
(1) We model satellite routing as a Markov decision process and define its action space, state space, and reward function, and we propose a Q-learning-based satellite routing algorithm (QLRA). QLRA selects optimal paths according to the current states of the satellite network and the users' requested resources
(2) To address the slow convergence of QLRA, we analyse the problem and propose a split-based speed-up convergence strategy. Based on QLRA, we design a speed-up Q-learning-based routing algorithm (SQLRA). In addition, we update the Q-value from back to front to further improve the convergence speed of SQLRA. Experimental results show that SQLRA converges faster than QLRA
(3) Our proposed SQLRA algorithm makes full use of network resources while meeting users' requirements. Numerical simulation results show that SQLRA effectively enhances network performance compared with other algorithms
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the network model and problem formulation. Section 4 presents the Q-learning-based routing algorithm and its speed-up variant. Section 5 evaluates the performance of SQLRA in two different scenarios and analyses and discusses the experimental results. Section 6 concludes the paper and outlines future research issues.
2. Related Work
Satellite network routing has been studied extensively. To reduce link congestion and load imbalance, Liu et al. [26] proposed an iterative Dijkstra algorithm to optimize satellite communication paths. Traditional LEO satellite routing ignores link delay, which leads to an incomplete evaluation of network performance; to solve this problem, a delay-aware satellite routing algorithm was proposed in [27]. The authors in [28] proposed a routing algorithm based on cooperative game theory to address propagation delay and traffic load imbalance in LEO satellite networks. Jiang et al. [29] designed a routing algorithm based on fuzzy theory to meet the multilevel needs of users. By leveraging orbit prediction information, Pan et al. [30] put forward a dynamic on-demand routing scheme to reduce routing convergence time and communication overhead. Hao et al. [31] proposed an energy-aware, load-balancing routing strategy to support different user communication services.
Much work has also been done on STINs as a reliable communication paradigm. To improve satellite power utilization, the authors in [32] proposed a data offloading scheme in STINs that jointly allocates satellite power and resources. Zhang et al. [33] used edge computing techniques to improve the QoS of STINs. To reduce the cost of gateway deployment and data routing in STINs, the authors in [34] proposed a joint satellite gateway deployment and routing scheme. Xu et al. [35] proposed a hybrid routing algorithm to realize the seamless integration of STINs. The authors in [36] presented an end-to-end routing method based on a heuristic strategy to improve the QoS of STINs.
Reinforcement learning is an effective method for sequential decision problems and has been applied in many fields. Liu et al. [37] used Q-learning for content caching in dynamic cloud content distribution networks. To improve the efficiency of the Internet of Things, Pan et al. [38] used Q-learning to identify blocked links. A Q-learning method was proposed in [39] to improve network performance and reduce the energy consumption of wireless sensor networks. Qiao et al. [40] proposed a deep-reinforcement-learning-based joint optimization scheme for cache content placement and bandwidth resource allocation in the Internet of Vehicles. Although Q-learning is widely used, there is little research on Q-learning-based routing in satellite networks. In this work, we use reinforcement learning to solve the satellite routing problem.
3. System Model and Problem Formulation
3.1. Network Model
The STIN considered in this paper is shown in Figure 1. It is composed of a terrestrial network and a LEO satellite network. The terrestrial network consists of base stations, routers, satellite gateways, and user terminals, and the satellite network consists of a large number of LEO satellites. The terrestrial network connects to the satellite network with the aid of the satellite gateways. Because the satellites in a constellation move at high speed, each satellite is connected only to its neighbour satellites in the same orbit or in adjacent orbits. Inter-satellite links are bidirectional; the specific structure is shown in Figure 2.
[figure omitted; refer to PDF]
When a user communicates with a peer, the system first determines whether the peer can be reached through the terrestrial network. If so, the data is transmitted directly through the terrestrial network. Otherwise, the terrestrial network forwards the data to the LEO satellites through the satellite gateways, and the LEO satellite network relays the data to the peer. With the increasing number of satellites, existing satellite routing algorithms become unsuitable, which seriously degrades satellite network performance. Therefore, we focus on the satellite routing problem in this paper.
Considering that the topology of the satellite network changes over time, and inspired by reference [20], we divide the whole operation time into a series of time slices, each corresponding to a topology snapshot.
The number of snapshots is related to the number of orbits and the number of satellites in each orbit, and the time interval of each snapshot is determined accordingly.
Here, we use an undirected graph G = (V, E) to model the satellite network in each snapshot, where V denotes the set of satellite nodes and E denotes the set of inter-satellite links.
When transmitting data, it is necessary to find an optimal path from the source satellite to the destination satellite according to the current link states of the network. Assume that satellite 0 is the source and satellite 5 is the destination in Figure 3. There are multiple paths from satellite 0 to satellite 5. However, as a large number of users access the satellite network, load imbalance exhausts network resources and causes satellite links to congest prematurely, seriously degrading network performance. For example, at the beginning, the optimal path from satellite 0 to satellite 5 in Figure 3 is 0-3-4-5. As the number of users increases, link 0-3 becomes congested because its bandwidth is consumed, so the next user cannot include link 0-3 in its path. The path from satellite 0 to satellite 5 therefore changes from 0-3-4-5 to 0-1-4-5 and finally to 0-1-4-3-5. The specific path-changing process is shown in Figure 3.
[figure omitted; refer to PDF]
In Figure 3, the dotted lines with different colours represent different selected paths. From Figure 3, we see that the optimal path from satellite 0 to satellite 5 changes gradually with the consumption of communication resources of satellite links.
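The path-changing behaviour above can be reproduced with a small experiment. The sketch below uses a hypothetical 6-node topology consistent with the paths named in Figure 3 (the link costs are made up for illustration) and models congestion by simply removing an exhausted link before rerouting:

```python
import heapq

def dijkstra(graph, src, dst):
    """Return the minimum-cost path src -> dst, or None if unreachable."""
    dist = {src: 0}
    prev = {}
    pq = [(0, src)]
    seen = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    if dst not in dist:
        return None
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]

# Hypothetical 6-satellite topology consistent with Figure 3; weights are illustrative costs
edges = {(0, 1): 2, (0, 3): 1, (1, 4): 1, (3, 4): 1, (3, 5): 3, (4, 5): 1}

def build(edges):
    """Expand the edge dict into a bidirectional adjacency list."""
    g = {}
    for (u, v), w in edges.items():
        g.setdefault(u, []).append((v, w))
        g.setdefault(v, []).append((u, w))
    return g

print(dijkstra(build(edges), 0, 5))   # [0, 3, 4, 5]
edges.pop((0, 3))                     # link 0-3 congested
print(dijkstra(build(edges), 0, 5))   # [0, 1, 4, 5]
edges.pop((4, 5))                     # link 4-5 congested
print(dijkstra(build(edges), 0, 5))   # [0, 1, 4, 3, 5]
```

Each removal forces the router onto the next-cheapest surviving path, reproducing the 0-3-4-5 → 0-1-4-5 → 0-1-4-3-5 sequence of Figure 3.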
3.2. Problem Formulation
We assume that the bandwidth capacity of each link is limited, and we define utility functions of the available bandwidth, propagation delay, bit error rate, and available time of each link.
Our goal is to maximize the utility of all users by considering the bandwidth, the delay, the bit error rate, and the available time of network links in the process of routing.
4. A Satellite Routing Algorithm Based on Reinforcement Learning
Reinforcement learning mainly consists of an agent and an environment. The agent interacts with the environment and learns the optimal strategy according to the environment's feedback. The reinforcement learning framework is shown in Figure 4: in the current state, the agent takes an action, receives a corresponding reward from the environment, and moves to the next state.
[figure omitted; refer to PDF]
In STINs, the environment is the time-varying set of satellite link states, and the agent is deployed in the ground control center. In the route discovery phase, the agent chooses a valid action according to the user's request and jumps to the satellite whose index equals the action value. The environment gives the agent a corresponding reward, and the satellite link states change simultaneously. The agent interacts with the environment until a path from the source satellite to the destination satellite is selected. We model the routing process as a Markov decision process (MDP) represented by the tuple (S, A, R), whose elements are defined as follows:
(1) State Space. The satellite link state considered in this paper includes the available bandwidth, propagation delay, bit error rate, and available time, and the state of a link is defined by these four quantities.
(2) Action Space. In the satellite network, an action describes the process of the agent moving from one satellite to another. For example, taking an action whose value is j means that the agent moves from its current satellite to satellite j.
(3) Reward Value. Rewards motivate the agent to search for the optimal strategy. The agent obtains rewards from the states of the satellite links, and different link states yield different rewards. To avoid the impact of differently scaled rewards on the accuracy of the results, we normalize them with a min-max operation.
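As a concrete illustration of the min-max normalization, the sketch below combines the four link metrics into a single reward. The value ranges and equal weights are our own assumptions for the example, not the paper's actual Equations (18)–(22):

```python
def min_max(x, lo, hi):
    """Normalize x into [0, 1]; guard against a degenerate range."""
    return 0.0 if hi == lo else (x - lo) / (hi - lo)

def link_reward(bw, delay, ber, vis,
                bw_rng=(0, 100), d_rng=(1, 50), ber_rng=(1e-9, 1e-5), v_rng=(0, 600),
                w=(0.25, 0.25, 0.25, 0.25)):
    """Combine four normalized link metrics into one reward.
    Delay and bit error rate are inverted: lower raw values mean a higher reward."""
    r_bw  = min_max(bw, *bw_rng)          # more bandwidth -> better
    r_d   = 1 - min_max(delay, *d_rng)    # less delay -> better
    r_ber = 1 - min_max(ber, *ber_rng)    # lower bit error rate -> better
    r_v   = min_max(vis, *v_rng)          # longer visible time -> better
    return w[0] * r_bw + w[1] * r_d + w[2] * r_ber + w[3] * r_v

# a good link scores higher than a congested, lossy one
print(link_reward(80, 10, 1e-7, 500), link_reward(20, 40, 1e-6, 100))
```

Because every normalized term lies in [0, 1] and the weights sum to one, the combined reward is also bounded in [0, 1], so no single metric dominates by scale alone.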
We use the state-action value Q(s, a) to evaluate the quality of taking action a in state s under a given strategy.
Choosing different strategy functions yields different state-action values. Our goal is to find the best strategy function so that the agent chooses the appropriate action in each state.
According to Equation (25), we maximize the state-action value to obtain the optimal strategy.
4.1. Path Checking Algorithm
When the communication resources of satellite links are exhausted, the links become congested. If there is no path from the source node to the destination node, the routing algorithm cannot find a suitable path. To avoid this situation, we first judge whether a reachable path to the destination exists before searching for a route. If no such path exists, links in the satellite network are congested or disconnected, and the routing algorithm stops searching for paths, reducing the waste of computing resources. The path checking algorithm is described in pseudocode below.
Algorithm 1: Path checking algorithm (PCA).
Input: start_node, end_node, graph.
Output: the flag which indicates whether there is a valid path from start_node to end_node.
1. Initialize flag = 0, path = {}.
2. Get the neighbours of start_node based on the network structure graph, neighbours_list.
3. Let path = path ∪ {start_node}.
4. while neighbours_list is not empty:
5.   Pop a node from neighbours_list, node.
6.   if node not in path:
7.     if node == end_node:
8.       path = path ∪ {node}.
9.       flag = 1.
10.      break.
11.    end if
12.    else:
13.      Get the neighbours of node, node_neighbours.
14.      Add node_neighbours to the list neighbours_list.
15.      path = path ∪ {node}.
16.    end else
17.  end if
18. end while
19. return flag.
where ∪ denotes the set union operation.
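The pseudocode above is essentially a breadth-first reachability test. A compact Python equivalent is sketched below; the adjacency-list topology is illustrative only:

```python
from collections import deque

def pca(start_node, end_node, graph):
    """Breadth-first reachability check: return 1 if end_node is reachable
    from start_node in the adjacency-list graph, else 0."""
    visited = {start_node}
    queue = deque(graph.get(start_node, []))
    while queue:
        node = queue.popleft()
        if node in visited:
            continue
        if node == end_node:
            return 1
        visited.add(node)
        queue.extend(graph.get(node, []))   # expand this node's neighbours
    return 0

# illustrative adjacency lists for a 5-satellite snapshot
graph = {0: [1, 3], 1: [0, 4], 3: [0, 4, 5], 4: [1, 3, 5], 5: [3, 4]}
print(pca(0, 5, graph))   # 1: a path exists
print(pca(0, 7, graph))   # 0: node 7 is not in the network
```

Running the check first lets the router skip the whole Q-learning phase when the destination is unreachable, which is exactly the resource saving the text describes.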
4.2. Q-Learning-Based Routing Algorithm (QLRA)
Q-learning is a value-based reinforcement learning method that uses a Q-table to store state-action values. Because Q-learning is model-free, it needs no prior knowledge during learning. In addition, Q-learning learns the optimal strategy by trial and error, which makes it well suited to dynamic satellite networks. We therefore use Q-learning to select the optimal route. The main idea is to first initialize the Q-table, then repeatedly select an action from the current state with an ε-greedy strategy, let the environment move to the next state, and update the state-action value according to Equation (26).
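For reference, the standard tabular Q-learning update, which Equation (26) presumably instantiates, is

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

where α is the learning rate and γ is the discount factor.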
In the training process, we use an ε-greedy strategy, shown in Equation (27), to avoid falling into a local optimum. The strategy achieves a trade-off between exploration and exploitation, where ε is the exploration probability.
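An ε-greedy selector in the spirit of Equation (27) can be sketched as follows; the Q-row values here are made up for the demonstration:

```python
import random

def epsilon_greedy(q_row, valid_actions, epsilon=0.1, rng=random):
    """With probability epsilon, explore a random valid action;
    otherwise exploit the action with the highest current Q-value."""
    if rng.random() < epsilon:
        return rng.choice(valid_actions)
    return max(valid_actions, key=lambda a: q_row[a])

random.seed(0)
q_row = {1: 0.2, 3: 0.9, 4: 0.5}   # hypothetical Q-values for one state
picks = [epsilon_greedy(q_row, [1, 3, 4], epsilon=0.1) for _ in range(100)]
print(picks.count(3))   # mostly the greedy action 3, with occasional exploration
```

Raising ε increases exploration early in training; annealing it toward zero lets the agent settle on the learned policy.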
We first judge whether a feasible path exists. If so, the optimal path is selected by the Q-learning routing algorithm, and then the reward matrix and the satellite network structure are updated. The specific Q-learning-based routing algorithm is described as follows:
Algorithm 2: Q-learning-based routing algorithm (QLRA).
Input: start_node, end_node, graph, R, users_req_list.
Output: the optimal paths.
1. Initialize Q-table, α, γ, ε, Episodes=M.
2. for user_req in users_req_list:
3. flag = PCA (start_node, end_node, graph).
4. if flag ==1:
5. for i=1 to Episodes:
6. current_state = start_node.
7. while current_state!= end_node:
8. Select the action a based on Eq. (27).
9. Get the corresponding reward value generated by each parameter according to Eq. (18), (19), (20) and (21).
10. Obtain the total reward r based on Eq. (22).
11. The agent moves to the next state s'.
12. Update the Q-table based on Eq. (26).
13. Let current_state = s'.
14. end while
15. end for
16. Select the optimal path from the converged Q-table based on Eq. (25).
17. Update the reward matrix R based on the consumption of link resources.
18. Update the structure of satellite network graph based on the consumption of link resources.
19. end if
20. else:
21. There is no path from start_node to end_node.
22. break.
23. end else
24.end for
where the function PCA in QLRA represents the path checking algorithm proposed above.
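A minimal runnable sketch of the QLRA idea is given below, stripped of the paper's bandwidth bookkeeping and multi-metric reward: the reward of a hop is just the negative link cost, and the topology, costs, and hyperparameters are our own assumptions:

```python
import random

# Toy topology: weighted bidirectional links (costs are illustrative only)
links = {(0, 1): 2, (0, 3): 1, (1, 4): 1, (3, 4): 1, (3, 5): 3, (4, 5): 1}
adj = {}
for (u, v), w in links.items():
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w

def qlra(src, dst, adj, episodes=2000, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Tabular Q-learning route search: the destination is terminal, and the
    greedy policy over the converged Q-table gives the selected path."""
    rng = random.Random(seed)
    Q = {u: {v: 0.0 for v in nbrs} for u, nbrs in adj.items()}
    for _ in range(episodes):
        s = src
        for _ in range(50):                 # cap the episode length
            if s == dst:
                break
            acts = list(adj[s])
            # epsilon-greedy action selection (explore vs. exploit)
            a = rng.choice(acts) if rng.random() < eps else max(acts, key=lambda v: Q[s][v])
            r = -adj[s][a]                  # reward = negative link cost
            nxt = 0.0 if a == dst else max(Q[a].values())
            Q[s][a] += alpha * (r + gamma * nxt - Q[s][a])
            s = a
    path, s = [src], src
    while s != dst and len(path) <= len(adj):   # follow the greedy policy
        s = max(Q[s], key=Q[s].get)
        path.append(s)
    return path

print(qlra(0, 5, adj))   # converges to the minimum-cost path [0, 3, 4, 5]
```

After each user is routed, the full algorithm would additionally deduct the consumed bandwidth from the chosen links and update the reward matrix, as in steps 17–18 of Algorithm 2.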
4.3. Speed-Up Q-Learning-Based Routing Algorithm (SQLRA)
In the process of selecting the next hop, the agent may jump from the current state back to a previous state, which produces repeated and invalid sequences in the selected path. When selecting a path from the source satellite node to the destination satellite node, such invalid sequences waste learning episodes, as illustrated in Figure 5.
[figure omitted; refer to PDF]
Although QLRA uses the ε-greedy strategy to select effective actions during learning, it still generates invalid sequences. To avoid routing loops or the ping-pong effect, we must prevent the agent from jumping from the current state to a previously visited state when it selects an action. To this end, we propose a split-based speed-up convergence strategy; the specific split process is shown in Figure 6.
[figure omitted; refer to PDF]
Similar to a broadcast mechanism, we split the satellite network according to the neighbour information of the nodes. As shown in Figure 6, starting from the destination node, the network is split layer by layer according to each node's distance from the destination.
Traditional reinforcement learning updates the Q-values from front to back along the explored path. In contrast, we update the Q-values from back to front, starting from the destination node, which further improves the convergence speed of SQLRA.
Algorithm 3: Speed-up Q-learning-based routing algorithm (SQLRA).
Input: start_node, end_node, graph, R, users_req_list.
Output: the optimal paths.
1. Initialize Q-table, α, γ, Episodes=M.
2. for user_req in users_req_list:
3. flag = PCA (start_node, end_node, graph).
4. if flag ==1:
5. nodes_list = BFS(graph, end_node)
6. for i=1 to Episodes:
7. for node in nodes_list:
8. if node != end_node:
9. neighbours = Neighbour(node).
10. for neighbour in neighbours:
11. Get the corresponding reward value generated by each parameter according to Eq. (18), (19), (20) and (21).
12. Obtain the total reward r based on Eq. (22).
13. Update Q(node, neighbour) based on Eq.(26).
14. end for
15. end if
16. end for
17. end for
18. Select the optimal path from the converged Q-table based on Eq. (25).
19. Update the reward matrix R based on the consumption of link resources.
20. Update the structure of satellite network graph based on the consumption of link resources.
21. end if
22. else:
23. There is no path from start_node to end_node.
24. break.
25. end else
26.end for
[figure omitted; refer to PDF]
where the function BFS in SQLRA denotes breadth-first search: starting from end_node, we obtain a traversal sequence of the whole network. The function Neighbour returns the neighbour information of each node from the satellite network structure.
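The split strategy plus the back-to-front update can be read as a value-iteration-style sweep over every (node, neighbour) pair in BFS order from the destination. A minimal sketch on the same illustrative topology as before follows; it omits the full SQLRA's resource bookkeeping, and the costs and parameters are assumptions:

```python
from collections import deque

# Same toy topology as before; costs are illustrative
links = {(0, 1): 2, (0, 3): 1, (1, 4): 1, (3, 4): 1, (3, 5): 3, (4, 5): 1}
adj = {}
for (u, v), w in links.items():
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w

def bfs_order(adj, end_node):
    """Traversal sequence of the whole network starting from end_node."""
    order, seen, queue = [], {end_node}, deque([end_node])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order

def sqlra(src, dst, adj, sweeps=30, alpha=1.0, gamma=0.9):
    """Sweep every (node, neighbour) pair in BFS order from the destination,
    so value estimates propagate back to front in few iterations."""
    Q = {u: {v: 0.0 for v in nbrs} for u, nbrs in adj.items()}
    for _ in range(sweeps):
        for node in bfs_order(adj, dst):
            if node == dst:
                continue                       # the destination is terminal
            for nb in adj[node]:
                r = -adj[node][nb]
                nxt = 0.0 if nb == dst else max(Q[nb].values())
                Q[node][nb] += alpha * (r + gamma * nxt - Q[node][nb])
    path, s = [src], src
    while s != dst and len(path) <= len(adj):
        s = max(Q[s], key=Q[s].get)
        path.append(s)
    return path

print(sqlra(0, 5, adj))   # [0, 3, 4, 5] after only a few sweeps
```

Because every state-action pair is updated once per sweep, in back-to-front order, values near the destination stabilize first and propagate outward, which is why far fewer iterations are needed than with episode-by-episode QLRA exploration.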
Figure 8 shows the convergence speed of SQLRA and QLRA. We observe from Figure 8 that SQLRA converges faster than QLRA: SQLRA needs about 30 episodes to converge, while QLRA needs about 60. The main reason is that our split-based speed-up convergence strategy reduces invalid sequences during routing. In addition, we update the Q-values from back to front, which further accelerates convergence.
[figure omitted; refer to PDF]
(B) Average delay analysis
We show in Figure 10 the average delay with different numbers of users. As the number of users increases, the average delay of all algorithms becomes larger. The curve of QLAODV grows slowly compared with the other algorithms: QLAODV prefers paths with fewer hops and selects a new path only when the current path is disconnected or saturated by user consumption, so it does not change paths frequently, which explains why its curve is mostly flat at the beginning. QSR, OSPF, ACO, and SQLRA all consider link delay in their cost functions, so they prefer paths with lower delay. As the number of users increases, the current paths can no longer meet user demands, and these algorithms begin to select new paths whose delay exceeds that of the previously selected ones; this is why their average delay increases. Furthermore, SQLRA performs better than ACO, mainly because ACO often settles on a suboptimal solution while SQLRA mostly attains the global optimum.
[figure omitted; refer to PDF]
(C) Average bit error rate analysis
Figure 11 plots the average bit error rate with a varying number of users. As the number of users increases, the average bit error rate of all algorithms trends upward. Because these algorithms consider the bit error rate of satellite links when selecting paths, the paths with lower bit error rates are selected first; as link resources are consumed by the growing number of users, the algorithms begin to choose new paths with higher bit error rates. SQLRA has the best performance: even with the largest number of users, its average bit error rate remains the lowest.
[figure omitted; refer to PDF]
(D) Average visible time analysis
We show in Figure 12 the variation of the average visible time with the number of users. We observe from Figure 12 that for QSR, OSPF, ACO, and SQLRA, the average visible time shows a descending trend to varying degrees as the number of users increases. Since the visible time has an important impact on satellite network performance, these algorithms prefer paths with long visible times at the beginning; as more users access the network, the current paths can no longer satisfy the requests, and the algorithms select new paths with shorter visible times, which explains the gradual decrease. QLAODV does not consider link visible time when selecting paths, so it does not prefer links with long visible times, and its average visible time curve shows no downward trend.
[figure omitted; refer to PDF]
(2) Performance in communication scenario with different node pairs
The users' requests in this scenario have different source and destination satellite nodes. Since the destination nodes differ among users, we use the cumulative average throughput, cumulative average delay, cumulative average bit error rate, and cumulative average visible time as metrics, which evaluate the overall performance of the algorithms more clearly and reasonably.
(A) Cumulative average throughput analysis
Figure 13 shows the relationship between the cumulative average throughput and the number of users. As the number of users increases, the cumulative average throughput of all algorithms rises to varying degrees. SQLRA has the best performance and QLAODV the worst: when selecting the next hop, SQLRA considers the impact of the next link on the current link and attains the global optimum, whereas QLAODV ignores the visible time of links, which greatly affects network performance. OSPF considers the available bandwidth, bit error rate, delay, and visible time when selecting the next hop, but its greedy strategy easily falls into a local optimum. Compared with OSPF, ACO is initialized with a random strategy and approaches the global optimum as far as possible during routing.
[figure omitted; refer to PDF]
(B) Cumulative average delay analysis
Figure 14 presents the change of the cumulative average delay with the number of users. As the number of users increases, the cumulative average delay of all algorithms increases: because users have different destinations, the algorithms select different paths, and as the number of paths grows, the total delay of the selected paths grows with it. SQLRA performs better than the other algorithms because, when a user's request is satisfied, it tends to choose the path with less delay and almost always attains the global optimum. The performance of QSR, OSPF, and ACO is almost identical: although they all consider link delay in their cost functions, they rely on heuristic strategies to select the optimal path.
[figure omitted; refer to PDF]
(C) Cumulative average bit error rate analysis
Figure 15 shows the cumulative average bit error rate with a varying number of users. QSR performs worst and OSPF best; although SQLRA is not the best, it is close to OSPF. Moreover, QLAODV is no longer the worst performer, in contrast to Figure 11. The main reason is that, because users' paths differ, QLAODV chooses a new path whenever a new user arrives, which reduces the probability of selecting a path with a high bit error rate; in addition, QLAODV considers the bit error rate when selecting a new path.
[figure omitted; refer to PDF]
(D) Cumulative average visible time analysis
Figure 16 shows the cumulative average visible time with different numbers of users. As the number of users increases, the cumulative average visible time of all algorithms rises to varying degrees, and SQLRA performs better than the others. When the number of users is 5, ACO and OSPF perform equally well, but as the number of users grows, ACO outperforms OSPF because OSPF falls into local optima more easily. In this second scenario, the source and destination nodes requested by users differ, so as the number of paths increases, the total visible time of the paths also increases, and the cumulative average visible time of the selected paths grows gradually.
[figure omitted; refer to PDF]
By testing our proposed SQLRA algorithm in two different scenarios, we find that it not only performs better than the other algorithms but is also robust. In practice, we train SQLRA at the ground control center, which reduces the consumption of satellite computing and storage resources. SQLRA is therefore well suited to dynamic satellite networks.
6. Conclusions
In this paper, we investigated the routing problem in STINs, where the choice of satellite routes has an essential impact on users' QoS and on satellite network performance. We modelled the routing problem as a finite-state Markov decision process and proposed a routing algorithm based on Q-learning (QLRA). To address QLRA's slow convergence, we proposed a split-based speed-up convergence strategy, designed a speed-up Q-learning-based routing algorithm (SQLRA), and adopted a back-to-front update scheme to further improve its convergence speed. We evaluated SQLRA in two different scenarios; experimental results show that it achieves the best communication performance among the compared routing algorithms.
Although SQLRA outperforms other routing algorithms in both scenarios, it lacks online learning ability: when the number of users changes or some satellites fail, SQLRA must retrain its model. In future research, we plan to use deep neural networks to design a routing algorithm with online learning ability, so that when the number of users or the satellite link states change, the existing model can be updated with as little retraining time as possible. Traffic scheduling and satellite handoff management are also future research issues.
Acknowledgments
This work was supported by the National Science Foundation of China (No. 61772385).
[1] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, P. Popovski, "Five disruptive technology directions for 5G," IEEE Communications Magazine, vol. 52 no. 2, pp. 74-80, DOI: 10.1109/MCOM.2014.6736746, 2014.
[2] H. Yao, L. Wang, X. Wang, Z. Lu, Y. Liu, "The space-terrestrial integrated network: an overview," IEEE Communications Magazine, vol. 56 no. 9, pp. 178-185, DOI: 10.1109/MCOM.2018.1700038, 2018.
[3] N. U. L. Hassan, C. Huang, C. Yuen, A. Ahmad, Y. Zhang, "Dense small satellite networks for modern terrestrial communication systems: benefits, infrastructure, and technologies," IEEE Wireless Communications, vol. 27 no. 5, pp. 96-103, DOI: 10.1109/MWC.001.1900394, 2020.
[4] J. P. Choi, C. Joo, "Challenges for efficient and seamless space-terrestrial heterogeneous networks," IEEE Communications Magazine, vol. 53 no. 5, pp. 156-162, DOI: 10.1109/MCOM.2015.7105655, 2015.
[5] A. Guidotti, B. Evans, M. Di Renzo, "Integrated satellite-terrestrial networks in future wireless systems," International Journal of Satellite Communications and Networking, vol. 37 no. 2, pp. 73-75, DOI: 10.1002/sat.1292, 2019.
[6] B. Di, H. Zhang, L. Song, Y. Li, G. Y. Li, "Ultra-dense LEO: integrating terrestrial-satellite networks into 5G and beyond for data offloading," IEEE Transactions on Wireless Communications, vol. 18 no. 1, pp. 47-62, DOI: 10.1109/TWC.2018.2875980, 2019.
[7] K. An, T. Liang, "Hybrid satellite-terrestrial relay networks with adaptive transmission," IEEE Transactions on Vehicular Technology, vol. 68 no. 12, pp. 12448-12452, DOI: 10.1109/TVT.2019.2944883, 2019.
[8] A. A. Dowhuszko, J. Fraire, M. Shaat, A. Pérez-Neira, "LEO satellite constellations to offload optical terrestrial networks in placement of popular content in 5G edge nodes," 2020 22nd International Conference on Transparent Optical Networks (ICTON), 2020.
[9] B. Fortz, M. Thorup, "Optimizing OSPF/IS-IS weights in a changing world," IEEE Journal on Selected Areas in Communications, vol. 20 no. 4, pp. 756-767, DOI: 10.1109/JSAC.2002.1003042, 2002.
[10] C. Hedrick, "Routing information protocol," Request for Comments, vol. 1058, 1988.
[11] S. Anamalamudi, A. R. Sangi, M. Alkatheiri, A. M. Ahmed, "AODV routing protocol for cognitive radio access based Internet of Things (IoT)," Future Generation Computer Systems-the International Journal of Escience, vol. 83, pp. 228-238, DOI: 10.1016/j.future.2017.12.060, 2018.
[12] W. Liu, Y. Tao, L. Liu, "Load-balancing routing algorithm based on segment routing for traffic return in LEO satellite networks," IEEE Access, vol. 7, pp. 112044-112053, DOI: 10.1109/ACCESS.2019.2934932, 2019.
[13] X. Qi, B. Zhang, Z. Qiu, "Joint rate control and load-balancing routing with QoS guarantee in LEO satellite networks," IEICE Transactions on Communications, vol. E103-B no. 12, pp. 1477-1489, 2020.
[14] Z. Na, Z. Pan, X. Liu, Z. Deng, Z. Gao, Q. Guo, "Distributed routing strategy based on machine learning for LEO satellite network," Wireless Communications & Mobile Computing, vol. 2018, DOI: 10.1155/2018/3026405, 2018.
[15] M. Majd, R. Safabakhsh, "Correlational convolutional LSTM for human action recognition," Neurocomputing, vol. 396, pp. 224-229, DOI: 10.1016/j.neucom.2018.10.095, 2020.
[16] S. Ren, K. He, R. Girshick, J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39 no. 6, pp. 1137-1149, DOI: 10.1109/TPAMI.2016.2577031, 2017.
[17] C. X. Jiang, H. J. Zhang, Y. Ren, Z. Han, K. C. Chen, L. Hanzo, "Machine learning paradigms for next-generation wireless networks," IEEE Wireless Communications, vol. 24 no. 2, pp. 98-105, DOI: 10.1109/MWC.2016.1500356WC, 2017.
[18] N. Kato, Z. M. Fadlullah, B. Mao, F. Tang, O. Akashi, T. Inoue, K. Mizutani, "The deep learning vision for heterogeneous network traffic control: proposal, challenges, and future perspective," IEEE Wireless Communications, vol. 24 no. 3, pp. 146-153, DOI: 10.1109/MWC.2016.1600317WC, 2017.
[19] F. Tang, B. Mao, Z. M. Fadlullah, N. Kato, O. Akashi, T. Inoue, K. Mizutani, "On removing routing protocol from future wireless networks: a real-time deep learning approach for intelligent traffic control," IEEE Wireless Communications, vol. 25 no. 1, pp. 154-160, DOI: 10.1109/MWC.2017.1700244, 2018.
[20] R. F. Dhila, T. M. Hamdani, A. M. Alimi, "A multi objective particles swarm optimization algorithm for solving the routing pico-satellites problem," 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1402-1407, 2012.
[21] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, "Human-level control through deep reinforcement learning," Nature, vol. 518 no. 7540, pp. 529-533, DOI: 10.1038/nature14236, 2015.
[22] N. Zhao, Y.-C. Liang, D. Niyato, Y. Pei, M. Wu, Y. Jiang, "Deep reinforcement learning for user association and resource allocation in heterogeneous cellular networks," IEEE Transactions on Wireless Communications, vol. 18 no. 11, pp. 5141-5152, DOI: 10.1109/TWC.2019.2933417, 2019.
[23] C. Jiang, X. Zhu, "Reinforcement learning based capacity management in multi-layer satellite networks," IEEE Transactions on Wireless Communications, vol. 19 no. 7, pp. 4685-4699, DOI: 10.1109/TWC.2020.2986114, 2020.
[24] V. Kirilin, A. Sundarrajan, S. Gorinsky, R. K. Sitaraman, "RL-cache: learning-based cache admission for content delivery," IEEE Journal on Selected Areas in Communications, vol. 38 no. 10, pp. 2372-2385, DOI: 10.1109/JSAC.2020.3000415, 2020.
[25] R. Solozabal, J. Ceberio, A. Sanchoyerto, L. Zabala, B. Blanco, F. Liberal, "Virtual network function placement optimization with deep reinforcement learning," IEEE Journal on Selected Areas in Communications, vol. 38 no. 2, pp. 292-303, DOI: 10.1109/JSAC.2019.2959183, 2020.
[26] J. Liu, R. Z. Luo, T. Huang, C. W. Meng, "A load balancing routing strategy for LEO satellite network," IEEE Access, vol. 8, pp. 155136-155144, DOI: 10.1109/ACCESS.2020.3017615, 2020.
[27] S. Geng, S. Liu, Z. Fang, S. Gao, "An optimal delay routing algorithm considering delay variation in the LEO satellite communication network," Computer Networks, vol. 173, article 107166, DOI: 10.1016/j.comnet.2020.107166, 2020.
[28] S. Wei, H. Cheng, M. Liu, M. Ren, "Optimal strategy routing in LEO satellite network based on cooperative game theory," International Conference on Space Information Network, pp. 159-172.
[29] Z. Jiang, C. Liu, S. He, C. Li, Q. Lu, "A QoS routing strategy using fuzzy logic for NGEO satellite IP networks," Wireless Networks, vol. 24 no. 1, pp. 295-307, DOI: 10.1007/s11276-016-1326-8, 2018.
[30] T. Pan, T. Huang, X. Li, Y. Chen, W. Xue, Y. Liu, "OPSPF: orbit prediction shortest path first routing for resilient LEO satellite networks," ICC 2019-2019 IEEE International Conference on Communications (ICC), 2019.
[31] L. Hao, P. Ren, Q. Du, "Satellite QoS routing algorithm based on energy aware and load balancing," 2020 12th International Conference on Wireless Communications and Signal Processing, pp. 685-690, 2020.
[32] Z. Ji, S. Wu, C. Jiang, D. Hu, W. Wang, "Energy-efficient data offloading for multi-cell satellite-terrestrial networks," IEEE Communications Letters, vol. 24 no. 10, pp. 2265-2269, DOI: 10.1109/LCOMM.2020.3003671, 2020.
[33] Z. Zhang, W. Zhang, F.-H. Tseng, "Satellite mobile edge computing: improving QoS of high-speed satellite-terrestrial networks using edge computing techniques," IEEE Network, vol. 33 no. 1, pp. 70-76, DOI: 10.1109/MNET.2018.1800172, 2019.
[34] N. Torkzaban, A. Gholami, J. S. Baras, C. Papagianni, "Joint satellite gateway placement and routing for integrated satellite-terrestrial networks," ICC 2020-2020 IEEE International Conference on Communications (ICC), 2020.
[35] H. Xu, D. Li, M. Liu, G. Han, C. Xu, "A hybrid routing algorithm in terrestrial-satellite integrated network," 2020 IEEE/CIC International Conference on Communications in China (ICCC), pp. 90-95, 2020.
[36] Q. Guo, R. Gu, T. Dong, J. Yin, Z. Liu, L. Bai, Y. Ji, "SDN-based end-to-end fragment-aware routing for elastic data flows in LEO satellite-terrestrial network," IEEE Access, vol. 7, pp. 396-410, DOI: 10.1109/ACCESS.2018.2885473, 2019.
[37] Y. Liu, D. Lu, G. Zhang, J. Tian, W. Xu, "Q-learning based content placement method for dynamic cloud content delivery networks," IEEE Access, vol. 7, pp. 66384-66394, DOI: 10.1109/ACCESS.2019.2917564, 2019.
[38] S. Pan, P. Li, D. Zeng, S. Guo, G. Hu, "A Q-learning based framework for congested link identification," IEEE Internet of Things Journal, vol. 6 no. 6, pp. 9668-9678, DOI: 10.1109/JIOT.2019.2930459, 2019.
[39] Z. Wei, F. Liu, Y. Zhang, J. Xu, J. Ji, Z. Lyu, "A Q-learning algorithm for task scheduling based on improved SVM in wireless sensor networks," Computer Networks, vol. 161, pp. 138-149, DOI: 10.1016/j.comnet.2019.06.006, 2019.
[40] G. Qiao, S. Leng, S. Maharjan, Y. Zhang, N. Ansari, "Deep reinforcement learning for cooperative content caching in vehicular edge computing and networks," IEEE Internet of Things Journal, vol. 7 no. 1, pp. 247-257, DOI: 10.1109/JIOT.2019.2945640, 2020.
[41] M. Alhabo, L. Zhang, N. Nawaz, "GRA-based handover for dense small cells heterogeneous networks," IET Communications, vol. 13 no. 13, pp. 1928-1935, DOI: 10.1049/iet-com.2018.5938, 2019.
[42] C. Wu, K. Kumekawa, T. Kato, "Distributed reinforcement learning approach for vehicular ad hoc networks," IEICE Transactions on Communications, vol. E93-B no. 6, pp. 1431-1442, 2010.
[43] T. Li, H. Zhou, H. Luo, S. Yu, "SERvICE: a software defined framework for integrated space-terrestrial satellite communication," IEEE Transactions on Mobile Computing, vol. 17 no. 3, pp. 703-716, DOI: 10.1109/TMC.2017.2732343, 2018.
[44] H. Alayed, F. Dahan, T. Alfakih, H. Mathkour, M. Arafah, "Enhancement of ant colony optimization for QoS-aware web service selection," IEEE Access, vol. 7, pp. 97041-97051, DOI: 10.1109/ACCESS.2019.2927769, 2019.
Copyright © 2021 Yabo Yin et al. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”).
Abstract
Satellite-terrestrial integrated network (STIN) is an indispensable component of the Next Generation Internet (NGI) due to its wide coverage, high flexibility, and seamless communication services. It uses the satellite network to provide communication services to users who cannot communicate directly through the terrestrial network. However, existing satellite routing algorithms ignore the users’ requested resources and the states of the satellite network. Therefore, these algorithms cannot effectively manage network resources during routing, causing the satellite network to become congested prematurely. To solve this problem, we model the routing problem in the satellite network as a finite-state Markov decision process and formulate it as a combinatorial optimization problem. Then, we put forth a Q-learning-based routing algorithm (QLRA). By maximizing users’ utility, our proposed QLRA algorithm is able to select optimal paths according to the dynamic characteristics of the satellite network. Considering that the convergence speed of QLRA is slowed by routing loops and the ping-pong effect during routing, we propose a split-based speed-up convergence strategy and design a speed-up Q-learning-based routing algorithm, termed SQLRA. In addition, we update the
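The core idea the abstract summarizes — treating hop-by-hop path selection as a finite-state Markov decision process and solving it with Q-learning — can be sketched minimally as follows. The toy topology, link costs, function names, and reward design below are illustrative assumptions for exposition only, not the paper’s actual QLRA state and utility formulation:

```python
import random

# Toy inter-satellite topology as an adjacency list; link costs stand in
# for the per-link metrics (e.g., delay, load) that a routing agent would
# observe. Purely illustrative.
TOPOLOGY = {
    "S0": {"S1": 1.0, "S2": 2.0},
    "S1": {"S0": 1.0, "S3": 1.0},
    "S2": {"S0": 2.0, "S3": 1.0},
    "S3": {"S1": 1.0, "S2": 1.0},
}

def train_q_routing(topology, dst, episodes=2000, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning: Q[node][next_hop] estimates the (negative)
    remaining path cost to dst when forwarding via next_hop."""
    q = {n: {nb: 0.0 for nb in nbrs} for n, nbrs in topology.items()}
    starts = [n for n in topology if n != dst]
    for _ in range(episodes):
        node = random.choice(starts)
        for _ in range(20):  # cap episode length to break routing loops
            if node == dst:
                break
            nbrs = topology[node]
            if random.random() < eps:                # epsilon-greedy exploration
                nxt = random.choice(list(nbrs))
            else:
                nxt = max(q[node], key=q[node].get)  # exploit best-known hop
            reward = -nbrs[nxt]                      # cost of traversing the link
            future = 0.0 if nxt == dst else max(q[nxt].values())
            q[node][nxt] += alpha * (reward + gamma * future - q[node][nxt])
            node = nxt
    return q

def greedy_path(q, src, dst, max_hops=10):
    """Follow the learned greedy policy from src to dst."""
    path, node = [src], src
    while node != dst and len(path) <= max_hops:
        node = max(q[node], key=q[node].get)
        path.append(node)
    return path
```

For example, `greedy_path(train_q_routing(TOPOLOGY, "S3"), "S0", "S3")` recovers the cheaper route S0→S1→S3 over S0→S2→S3. The episode-length cap in the training loop is a crude stand-in for the paper’s split-based strategy: without some mechanism to cut off routing loops and ping-pong forwarding, such episodes dominate early training and slow convergence.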