Full text

Turn on search term navigation

This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

1. Introduction

With the rapid development of artificial intelligence technology, intelligence has become an important direction of the development of various application fields, and machine learning has made significant progress in many fields, especially the success of AlphaGo and AlphaZero technology in Go human-computer game, making machine learning method to solve traditional problems become a new way [1].

Wireless ad hoc network is to build the communication network coverage within the application scenario area according to the communication support requirements of users and application scenarios and provide users with random access communication channels. The main work is to reasonably select the deployment location of mobile nodes and the connection relationship between mobile nodes [2]. Therefore, the deployment process of wireless ad hoc mobile nodes can be analogized to the game playing process of both sides of Go game, mobile game. The nodes and users can be regarded as the black and white pieces of Go, and the grid terrain area in the application scene can be regarded as the board of Go. On this basis, we explore the application of machine learning in wireless ad hoc network and build a deep reinforcement learning model of wireless ad hoc network based on the AlphaZero algorithm, so as to realize the intelligent deployment of mobile node location. The key and difficulty of the method are to generate a large number of high-quality sample data through the continuous self-play of the model. In the formal scenario application, the best deployment probability of mobile nodes can be predicted by sampling the sample data for supervised learning. The important foundation of machine learning comes from the sample data. Compared with the sample collection of chess game data, the data accumulation of wireless ad hoc network and other application fields is less, and the amount of data available is limited. After the model algorithm is built, the focus is how to generate a large number of high-quality and type rich data samples through the model of self-play to guide the model optimization and realize the best prediction of the deployment location and networking of mobile nodes in practical application.

2. Analysis of Traditional Algorithm Model

At present, for small and typical wireless ad hoc networks, it can be realized by mature network topology automatic planning technology; for large and heterogeneous mesh networks and wireless ad hoc networks under special application scenarios, it mainly uses network planning tool software to carry out static planning in advance and dynamically fine tune and optimize the adaptation in practical application.

2.1. Review of Traditional Algorithms

At present, the algorithm model of multiobjective heterogeneous wireless network mainly abstracts the connection relationship between mobile nodes and nodes into points and lines. Based on graph theory, it simulates, analyzes, and optimizes by setting approximate assumptions and constraints and designing various channel models, traffic models, wireless models, and link models. It mainly includes 4 categories: (a) Optimize design algorithms around network hierarchical structure, topological structure, etc., and design and improve algorithms for different types of network structures such as mesh, tree, and star. For example, for the problem of dense and dense network structures, a K-means-based algorithm is the proposed random graph topology generation algorithm and hierarchical structure topology generation algorithm; for the construction of wireless sensor network plane topology structure, a network topology optimization algorithm based on a Voronoi diagram is proposed; for multihop packet wireless network structure, the shortest route tree table based on node switching is adopted. The method updates and optimizes the network topology. Aiming at the problem of network structure loss, taking the node degree and connectivity as constraints proposes a hierarchical network topology planning algorithm. In addition, it also focuses on network node mobility and network connectivity, links one or more indicators such as reliability and network capacity, sets approximate assumptions and constraints, and analyzes and studies the network topology by constructing various channel models, traffic models, mobility models, and link models. Typical algorithms include the Minimum Spanning Tree Algorithm (MST), Shortest Path Algorithm (SPT), Delaunay Triangulation Algorithm (DT), and Voronoi Diagram Algorithm [3, 4]. (b) Construct a multiobjective optimization function that focuses on multiobjective optimization combination algorithms that focus on user communication requirements, network coverage, deployment costs, network service quality, and other specific communication requirements. For example, around network connectivity, network fault tolerance, network throughput, and other index requirements, respectively, construct a wireless mesh network topology control model based on the conflict domain; around the wireless mesh backbone network topology throughput, propose a minimum spanning tree and conflict load joint topology control algorithm; propose a network topology planning method based on probability statistics based on indicators such as network coverage and network connectivity; based on network index systems and weights, propose a network topology planning method based on performance and effectiveness evaluation [5]. (c) Optimize artificial intelligence methods and strategies, focusing on solving multiobjective heterogeneous wireless network planning problems to build algorithm models, mainly including optimized search space algorithms, random search algorithms, intelligent algorithms, and improvements and combination algorithms based on the above algorithm models. For example, based on the heuristic search algorithm model, the network nodes and links are designed with the goals of optimizing the path loss in the wireless link and optimizing node deployment; abstract the node deployment problem in the network topology as the K center point problem in geometric mathematics. Optimize the links between nodes by constructing an improved particle swarm algorithm model; optimize network connectivity and network coverage by constructing simulated annealing algorithm models and genetic algorithm models; optimize wireless mesh network nodes by constructing an ant colony algorithm model. Optimize deployment; use the tabu search algorithm model to design a global optimization combination strategy and taboo table for the deployment of wireless mesh network nodes to achieve the global optimization of the network topology. Through a greedy algorithm, simulated annealing algorithm, tabu table algorithm, heuristic search algorithm, ant colony algorithm, particle swarm optimization algorithm, genetic algorithm, etc., there is intelligent algorithm optimization, improvement, and reorganization [6]. (d) Draw lessons from the concepts of complex networks, super networks, and fields in physics, and explore and study cross-domain cross-combination algorithms around the importance of network nodes and edges, information, and interaction between nodes. For example, the concepts of field and hypergraph in physics are introduced into complex networks and hypernetworks, and the network topology is optimized through the process of information interaction between nodes. In addition, in the application of artificial intelligence methods, it is proposed to use deep learning to intelligently plan wireless ad hoc network topology [7, 8].

2.2. Shortage of Traditional Algorithm

The static preplanning method is often used to optimize the topology design algorithm. The model is set for the fixed scene; the index system is greatly constrained by the conditions and cannot be adjusted flexibly and dynamically according to the scene changes effectively. (a) The accuracy of the network cannot be quantitatively evaluated, especially for large-scale network applications. The dynamic adjustment adaptability of the model is not enough, and the planning time is too long. Based on the multiobjective combination modeling method, restricted by the constraints, it can only be implemented according to different communication means or groups. The number of indicators that can be concerned is limited, and it cannot give full consideration to all indicator systems. (b) The accuracy of networking cannot be quantitatively evaluated. In the index optimization decision-making, it is easy to lose one or the other, and the universality is not high. Based on genetic algorithm, artificial bee colony algorithm, particle swarm algorithm, and other artificial intelligence network topology optimization algorithms are mostly aimed at the convergence speed, accuracy, and robustness of the model, which are commonly used in simulation analysis and verification, and the real transformation application is still relatively small. (c) The research of network topology based on complex network theory cannot fully solve the problem of multinetwork interaction in wireless ad hoc networks and cannot fully represent the characteristics of network structure and effectively reflect the function of node type.

3. Model of Algorithm

Based on the analogy between wireless ad hoc network and board games such as Go [9, 10], and referring to the AlphaZero algorithm framework, a deep reinforcement learning algorithm for wireless ad hoc network in typical application scenarios with full visibility is constructed [11].

3.1. Algorithm Principle

According to the principle of the AlphaZero algorithm [12], a deep neural network model $p, v = f_{θ} s$ with parameters $θ$ and the evaluation index system of network system effect are constructed $z \in - 1, 1$ . Taking the user location and the location state distribution of mobile nodes as the input $s$ , the deployment location probability $p$ of mobile nodes and the evaluation $v$ of network system effect of wireless ad hoc network are output. As shown in formula (1), MCTS search is used for heuristic search and optimization through the interaction of neural network and MCTS: on the one hand, MCTS is guided to perform heuristic search according to the maximum deployment location probability predicted by a neural network; on the other hand, the maximum location probability predicted by MCTS search reacts on the weight update of neural network and forecasts the current again [13]. The next best deployment location of mobile nodes in grid map is to maximize the similarity between the prediction probability of optional location neural network and the search probability of MCTS and to minimize the difference between the network deployment effect $v_{t}$ and the deployment success $z_{t}$ [14]. $\begin{matrix} (1) & l = {z - v}^{2} - π^{2} \log p + c θ^{2} . \end{matrix}$

Specifically, according to formula (2), the gradient descent method is used to update the parameters $θ$ of the neural network in every $t$ iterations. According to formula (3), the weights of the neural network are updated and the sample data set is optimized $s, p, v$ . In the formal deployment, the weights of the neural network are optimized by the sample set to guide the prediction of the maximum location probability of mobile nodes, and the wireless ad hoc network with high quality and satisfying requirements is gradually generated. Figure 1 is shown the mathematical model of deploying mobile nodes. $\begin{matrix} (2) & ∆ ρ \propto \frac{\partial v_{θ} s}{\partial θ} z - v_{θ} s, \\ (3) & ∆ θ \propto \frac{\partial \log p_{ρ} a_{t} ∣ s_{t}}{\partial θ} z_{t} . \end{matrix}$

[figure omitted; refer to PDF]

Through the nesting and looping of the above submodules, the weights of the neural network are updated and iterated step by step, so as to continuously improve the accuracy and scientificity of the neural network function fit and prediction results.

5.3.2. Model Training

The change of model loss function reflects the construction effect of reinforcement learning iterative feedback mechanism. The change of loss function and cross-entropy reflects the interaction of neural network and heuristic search algorithm. The change of model optimization convergence can be seen through the change of loss function and cross-entropy. Figure 8 shows the changes in the model loss function and the cross-entropy in the strategy network in the four grid sizes in the case of 1000 training samples. The specific analysis is as follows.

[figure omitted; refer to PDF]

Overall analysis: under the four grid sizes, although the loss function and strategy network cross-entropy fluctuate in sample training batches, the overall trend of change is gradually decreasing. It shows that the model training process is designed reasonably and the method is feasible.

Loss function analysis: in $6 \times 6$ , $8 \times 8$ , and $10 \times 10$ grid sizes, after model sampling training is 500 times, 700 times, and 800 times, the loss function basically remains at about 2.0, 2.4, and 2.5., which shows that the basic training of the model is completed under the above three grid sizes. However, after the model has been sampled and trained 1000 times under the $16 \times 16$ grid size, the loss function still continues to change and has not stabilized. It can be judged that the model can converge quickly and show a trend of gradual improvement under small grid sizes; as the number of grids increases, model convergence and parameter tuning take longer, and the training effect is not obvious.

Strategy function cross-entropy analysis: the strategy function cross-entropy directly reflects the difference between the actual probability and the expected probability in the training of the communication guarantee unit. Similar to the change of the loss function, the cross-entropy of the function gradually decreases and tends to stabilize under the first three grid sizes, indicating that as the sample sampling batch increases, the neural network predicts the probability of the deployment location of the communication guarantee unit from random. Approximately equal-probability gradually showed a trend of uneven distribution step by step, and the result prediction gradually showed a trend, indicating that the model gradually tends to converge, and the design of the interaction mechanism between neural network and heuristic search in network topology planning is feasible.

5.3.3. Design Aspects of Evaluation Indicators and Evaluation Methods

In this study, we adopted the method of regularly testing the effect of model network topology planning. The effect of model network topology planning was analyzed every 20 training sample batches. Analyze and evaluate the effect of network topology planning to test the effectiveness of the proposed evaluation method. Figure 9 shows that the model is tested for 20 network topology planning results after every 20 sampling training during the model training process. The specific analysis is as follows.

[figure omitted; refer to PDF]

Evaluation index analysis: the network topology planning results under the four grid sizes meet the average network connectivity times of 17/20, 12/20, 10/20, and 6/20. It shows that the model can train the optimization model through the whole network connectivity index and realizes the improvement of the prediction accuracy probability of the communication guarantee unit site selection. It shows that the index decomposition method is adopted and the method of using the whole network connectivity index as the model evaluation index is feasible.

Evaluation method analysis: the effect of the evaluation method in machine learning cannot be directly observed, but the horizontal comparison and analysis of the network topology planning results under the four grid sizes show that the evaluation method is basically feasible and available. At the same time, as the number of grids increases, the number of times that the network topology planning result meets full connectivity gradually decreases, which indicates that the efficiency of constructing the connectivity relationship between node pairs in the evaluation method increases. When the model has a large solution space, the realization of the function requires further improvement methods.

6. Conclusion

Compared with traditional planning methods, the AI deep learning method does not solidify knowledge in an algorithm model but achieves the abstraction and understanding of knowledge through self-learning, reduces the interference of human factors, and improves scientificity. The application of artificial intelligence technology is based on large-scale sample data. In this study, by migrating the AlphaZero algorithm framework, a new mobile node deployment algorithm model under the condition of complete intervisibility is constructed as the specific practice exploration of artificial intelligence in the typical application of mobile wireless ad hoc network. The difficulty and key work are how to generate rich and comprehensive sample data through self-play, so the design of the self-play process is the core and foundation. Referring to the main methods of the AlphaZero algorithm and comparing the different points in the mapping between Go and wireless ad hoc network, this study solves the difficult problems that affect the training sample data generation, such as model optimization, data collection, and model convergence, realizes the migration application of AlphaZero technology in wireless ad hoc network under the condition of full visibility, and provides the next step for the model exploration and application in complex terrain. The basis of the research is given.

References

[1] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, "Mastering chess and shogi by self-play with a general reinforcement learning algorithm," 2017. https://arxiv.org/abs/1712.01815

[2] P. Santi, "Topology control in wireless ad hoc and sensor networks," ACM Computing Surveys, vol. 37 no. 2, pp. 164-194, DOI: 10.1145/1089733.1089736, 2005.

[3] E. Amaldi, A. Capone, M. Cesana, I. Filippini, F. Malucelli, "Optimization models and methods for planning wireless mesh networks," Computer Networks the International Journal of Computer & Telecommunications Networking, vol. 52 no. 11, pp. 2159-2171, 2008.

[4] H. Kim, E. C. Park, S. K. Noh, S. B. Hong, "Angular MST-based topology control for multi-hop wireless ad hoc networks," ETRI Journal, vol. 30 no. 2, pp. 341-343, DOI: 10.4218/etrij.08.0207.0249, 2008.

[5] A. Noack, P. B. Bok, S. Kruck, "Evaluating the impact of transmission power on QoS in wireless mesh networks," 2011 Proceedings of 20th International Conference on Computer Communications and Networks (ICCCN), .

[6] M. E. Newman, S. H. Strogatz, D. J. Watts, "Random graphs with arbitrary degree distributions and their applications," Physical review E, vol. 64 no. 2,DOI: 10.1103/PhysRevE.64.026118, 2001.

[7] S. Sakamoto, E. Kulla, T. Oda, M. Ikeda, L. Barolli, F. Xhafa, "A comparison study of simulated annealing and genetic algorithm for node placement problem in wireless mesh networks," Journal of Mobile Multimedia, vol. 9 no. 2, pp. 101-110, 2013.

[8] N. N. G. Le HD, N. H. Dinh, N. D. Le, V. T. Le, "Optimizing gateway placement in wireless mesh networks based on ACO algorithm," International Journal of Computer & Communication Engineering, vol. 2 no. 2, pp. 45-53, 2013.

[9] O. E. David, N. S. Netanyah, "End-to-end deep neural network for automatic learning in chess," International Conference on Artificial Neural Networks, pp. 88-96, .

[10] C. Clark, A. J. Storkey, "Training deep convolutional neural networks to play Go," International conference on machine learning, vol. 37, pp. 1766-1774, .

[11] X. Zou, R. Yang, C. Yin, Z. Nie, H. Wang, "Deploying tactical communication node vehicles with AlphaZero algorithm," IET Communications, vol. 14, 2019.

[12] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, "Mastering the game of Go without human knowledge," Nature, vol. 550, pp. 354-359, 2017.

[13] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, pp. 484-489, 2016.

[14] R. Coulom, "Efficient selectivity and backup operators in Monte-Carlo tree search," International conference on computers and games, pp. 72-83, .

[15] C. H. Liu, S. Y. Kuo, D. T. Lee, C. S. Lin, J. H. Weng, S. Y. Yuan, "Obstacle-avoiding rectilinear Steiner tree construction: a Steiner-point-based algorithm," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 31 no. 7, pp. 1050-1060, 2012.

[16] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, "Human-level control through deep reinforcement learning," Nature, vol. 518 no. 7540, pp. 529-533, DOI: 10.1038/nature14236, 2015.

[17] D. Silver, L. Newnham, D. Barker, S. Weller, J. McFall, "Concurrent reinforcement learning from customer interactions," International conference on machine learning, vol. 28, pp. 924-932, .

[18] C. Finn, P. Christiano, P. Abbeel, S. Levine, "A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models [EB/OL]," 2016. https://arxiv.org/abs/1611.03852

[19] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, K. Kavukcuoglu, "Asynchronous methods for deep reinforcement learning," International conference on machine learning, vol. 48, pp. 1928-1937, .

[20] S. Loffe, C. Szegedy, "Batch normalization: accelerating deep network training by reducing internal covariate shift," International conference on machine learning, vol. 37, pp. 448-456, .

[21] D. Perez, P. Rohlfshagen, S. M. Lucas, "Monte-Carlo tree search for the physical travelling salesman problem," European Conference on the Applications of Evolutionary Computation, pp. 255-264, .

[22] A. Doerr, N. D. Ratliff, J. Bohg, M. Toussaint, S. Schaal, "Direct loss minimizaton inverse optimal control," Molecular Ecology, vol. 23 no. 10, pp. 2602-2618, 2015.

Word count: 3292

Show less

Copyright © 2021 Huitao Wang et al. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Deep reinforcement learning is one kind of machine learning algorithms which uses the maximum cumulative reward to learn the optimal strategy. The difficulty is how to ensure the fast convergence of the model and generate a large number of sample data to promote the model optimization. Using the deep reinforcement learning framework of the AlphaZero algorithm, the deployment problem of wireless nodes in wireless ad hoc networks is equivalent to the game of Go. A deployment model of mobile nodes in wireless ad hoc networks based on the AlphaZero algorithm is designed. Because the application scenario of wireless ad hoc network does not have the characteristics of chessboard symmetry and invariability, it cannot expand the data sample set by rotating and changing the chessboard orientation. The strategy of dynamic updating learning rate and the method of selecting the latest model to generate sample data are used to solve the problem of fast model convergence.

Details

Title

Research on the Difficulty of Mobile Node Deployment’s Self-Play in Wireless Ad Hoc Networks Based on Deep Reinforcement Learning

Author

Wang, Huitao¹

; Yang, Ruopeng¹; Yin, Changsheng¹; Zou, Xiaofei¹; Wang, Xuefeng²

¹ College of Information and Communication, National University of Defense Technology, No. 45 JieFang Park Road, Wuhan Hubei, China
² College of Army Logistics, No. 20 North 1st Road, University Town, Shapingba District, Chongqing, China

Editor

KI-IL Kim

Publication year

2021

Publication date

2021

Publisher

John Wiley & Sons, Inc.

e-ISSN

15308677

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1155/2021/4361650

ProQuest document ID

2503353021

Research on the Difficulty of Mobile Node Deployment’s Self-Play in Wireless Ad Hoc Networks Based on Deep Reinforcement Learning

Jump to:

Full text

Abstract

Details

Suggested sources