1. Introduction
An intelligent water transportation system (IWTS) monitors and manages ships sailing on the water [1]. Ship recognition, which recognizes ship targets sailing or moored on the water, is a key component of IWTS in smart cities. In contrast to a traditional intelligent road transportation system (IRTS) [2], IWTS is confronted with (1) various types of ships, (2) high deployment cost, and (3) complex settings. Thus, ship recognition is more challenging than vehicle recognition. To achieve ship recognition, a series of approaches [3–6] have been investigated and explored.
In terms of the technology used, ship identification can be roughly divided into two categories: (1) ship recognition assisted by specialized equipment [3], such as synthetic aperture radar (SAR) and the automatic identification system (AIS), and (2) ship recognition through common equipment [4], such as cameras. In general, the former covers a wider scope and is suitable for marine or coastal settings but incurs a high deployment cost. The latter does not require ships to install any extra equipment, which keeps costs low for ship operators. Besides, some small vessels do not install AIS to save money or turn off AIS to avoid monitoring. Furthermore, SAR is effective at capturing images of large ships but is unable to capture images of small vessels. Thus, recent ship recognition systems [4, 5, 7, 8] employ cameras to capture images of ships and adopt computer vision techniques [9, 10] to implement ship recognition.
Although ship recognition systems based on computer vision have an advantage in deployment cost and small-target recognition, they still provide only coarse-grained services. In other words, prior ship recognition systems [4–8] only support ship target detection and classification, which fails to satisfy the increasing requirements of IWTS, such as tracking a given ship. Ship tracking provides crucial on-site microscopic kinematic traffic information, which benefits traffic flow analysis, ship safety enhancement, traffic control, etc. [11, 12].
Given a query image of a monitored person, person reidentification (Re-ID) aims to retrieve that specific person from other images or video sequences; it can be treated as a subproblem of image retrieval [13]. Inspired by person Re-ID [13] and vehicle Re-ID [14], we adopt Re-ID technology to recognize ships and retrieve a given ship. Unlike person and vehicle reidentification, which have been studied thoroughly, ship reidentification (ship Re-ID) is still underexplored, and it differs from them in three respects: (1) Shape difference: unlike the coherent shape of the human body or a vehicle, the shapes of different ships, and in particular of different ship types, vary greatly. (2) Size difference: in person Re-ID, human bodies are roughly the same size, so input images can share the same resolution; the sizes of different ships, however, can differ enormously, for example, between a canoe and a steamship, which makes it difficult for a network to learn from inputs of widely varying resolution. (3) Appearance change: unlike a person, who usually wears the same clothes across the cameras that capture them, a ship's appearance can change within a short time, for instance, when deck facilities are moved from their original position.
Considering the above-mentioned problems, we propose a Fine-Grained Ship Retrieval NETwork (FGSR-Net) to address the ship Re-ID problem. Specifically, to address the differences in shape and size, FGSR-Net employs a pyramid structure to extract features from the input image, which can handle inputs of different resolutions. To handle appearance change, we design an occlusion attention mechanism that produces an occlusion map indicating the regions in which two images remain consistent. With the help of the occlusion map, we can compute the similarity of the areas that have not changed and, separately, of the areas that have changed. Combining these two similarities, we can retrieve the corresponding ship given a query ship image.
To facilitate research on ship Re-ID, we also propose a new large-scale fine-grained ship retrieval dataset (named FGSR) that consists of 30,000 images of 1,000 ships captured in the wild. We set up high-definition cameras on river banks to capture images of passing ships and contacted the ship owners to ask whether we could use the images for research purposes only.
In summary, we make three contributions:
(i) We construct a new large-scale fine-grained ship retrieval dataset (named FGSR) that consists of 30,000 images of 1,000 ships captured in the wild for the intelligent water transportation system in smart cities
(ii) We propose a novel fine-grained ship reidentification network (FGSR-Net) to address the ship Re-ID problem. FGSR-Net adopts a pyramid structure to address the shape and size problem, while it also employs an occlusion attention module to tackle the problem of appearance change
(iii) Our extensive experiments show that FGSR-Net outperforms existing state-of-the-art methods from related fields
2. Related Work
The work related to this paper covers ship target recognition and target reidentification; we review the existing schemes for each below.
2.1. Ship Target Recognition
Ship target recognition usually relies on SAR images, AIS data, or video streaming. Karabayır et al. [5] pointed out the importance of a training library and proposed a ship target recognition scheme based on a k-nearest-neighbour (KNN) classifier. Chaturvedi et al. [3] integrated SAR images and AIS data to identify ships in an area, which required ships to install radar and AIS equipment; however, some illegal ships deliberately shut off their radar or AIS systems. More recently, researchers have utilized video streaming and deep learning to recognize ships. To overcome the influence of ship background, object occlusion, and variations of weather and light conditions on target ship recognition, Zhao et al. [4] proposed a two-stage neural network (DCNet) to detect and recognize ships from video streaming, where one neural network detects ships and the other recognizes them. Cao et al. [6] also adopted two neural networks to recognize ships, where a convolutional neural network (CNN) extracts ship image features and a KNN-SVM is trained to recognize ships. In contrast to DCNet, Fu et al. [7] realized ship target recognition based on a single network, faster regions with CNN (F-RCNN), which requires only a single stage to recognize a target. After that, Fu et al. [15] further improved the detection accuracy of F-RCNN; specifically, in [15], they extract target features using ResNet [16] and optimize F-RCNN [17] with batch normalization layers. Considering the complex marine environment, Zou et al. [8] combined hard example mining with F-RCNN and replaced VGG16 with ResNet, as in [15]. Cao et al. [6] adopted the single-stage target recognition network YOLO to recognize ships and analyze ship behaviours.
Different from these ship target recognition methods, which only extract the location information of ships from the input images, our method can differentiate individual ships through reidentification.
2.2. Ship Reidentification
Ship reidentification is closely related to person reidentification (Re-ID). Geng et al. [13] treated the person Re-ID task as a classification/identification task and a verification task and adopted a classification/identification loss and a verification loss to train a classification subnet and a verification subnet. Varior et al. [18] used a Siamese network to implement person Re-ID by comparing the similarity of two photos. Intuitively, it is easier to produce discriminative representations by combining the global and local features of a person. Wang et al. [19] proposed a multibranch deep network architecture, the Multiple Granularity Network (MGN), which consists of a global branch and two local branches; MGN is regarded as a state-of-the-art method for person Re-ID. Huang et al. [14] applied person Re-ID technology to vehicle reidentification and designed a deep feature fusion with multiple granularity (DFFMG) method, which consists of one branch for global feature representations, two for vertical local feature representations, and two for horizontal ones.
In this paper, we extend person Re-ID technology to ship reidentification and carefully design a novel multiple-granularity network. Different from existing Re-ID methods built with prior knowledge of humans, our method explicitly considers the characteristics (shape, size, and appearance) of ships in its model design.
3. Our Proposed Dataset and Method
3.1. FGSR: A New Large-Scale Fine-Grained Ship Retrieval Dataset
3.1.1. Camera Selection
In this work, to obtain high-resolution images of ship targets, we select the Hikvision DS-2DY9240IX-A(T5) camera to capture the images. We report its key parameters in Table 1.
Table 1
Key parameters of the adopted camera.
Category | Parameter | Value
Camera | Image sensor | 1/1.8″ progressive scan CMOS
Camera | Shutter speed | 1/1 s to 1/30,000 s
Camera | Focus | Auto; semiauto; manual
Camera | WDR | 140 dB WDR
Camera | Digital zoom | 16x
Camera | Optical zoom | 40x
Lens | Focal length | 6.0 mm to 240 mm
Lens | Zoom speed | Approx. 5.6 s
Lens | Aperture | F1.2
Illuminator | IR distance | Up to 400 m
PTZ | Movement range (pan) | 360°
PTZ | Movement range (tilt) | +40° to −90°
PTZ | Presets | 300
PTZ | Patrol scan | 8 patrols
PTZ | Pattern scan | Pattern scans
3.1.2. The Difficulties of Dataset Collection
For the ship Re-ID task, there is no available dataset to support research. A large dataset is difficult to collect for three reasons. (1) Privacy: images of a specific ship are private, so we can collect them only with the owner's permission. (2) Cost: collecting ship images is expensive; some ships sail far from shore, so expensive high-definition cameras are needed to capture them. (3) Time: we need to set up cameras at specific positions and wait to snapshot ships on the water surface.
3.1.3. The Design Scheme of Dataset Collection
In the dataset collection stage, to reduce labour costs, we design an automatic ship capture system (ASCS), whose diagram is shown in Figure 1. In ASCS, we deploy two cameras, i.e., a global camera and a local camera. Specifically, the global camera monitors a large area of the shipping lane, while the local camera focuses on the detailed texture of the ship targets detected by the global camera. We employ the YOLO detector [20] as the ship detection engine for the global camera. The positions of the two cameras are carefully calibrated so that the local camera can localize a ship from the detected position provided by the global camera. To cover the entire waterway, more than one ASCS can be set up. The detailed ship images captured by the local cameras are stored in our database for further processing.
[figure(s) omitted; refer to PDF]
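To make the hand-off between the two cameras concrete, the sketch below outlines the ASCS control loop in Python. The detector callable and the camera helpers (grab_frame, aim_at, save_snapshot) are hypothetical placeholders for illustration only, not an actual camera SDK or the exact system used here.

```python
# Minimal sketch of the ASCS hand-off logic described above. The detector and
# the PTZ-control helpers are hypothetical placeholders, not a real camera SDK.
import time

def run_ascs(global_camera, local_camera, detector, interval=1.0):
    """Poll the global camera; when a ship is detected, point the local camera at it."""
    while True:
        frame = global_camera.grab_frame()              # wide view of the shipping lane
        for (x1, y1, x2, y2, score) in detector(frame): # YOLO-style boxes with confidence
            if score < 0.5:                             # confidence threshold (assumed)
                continue
            cx, cy = (x1 + x2) / 2, (y1 + y2) / 2       # bounding-box centre in the global view
            local_camera.aim_at(cx, cy)                 # pre-calibrated global-to-PTZ mapping
            local_camera.save_snapshot()                # store the close-up for the dataset
        time.sleep(interval)
```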
3.1.4. The Detail of the Collected Dataset
Due to the lack of datasets for ship retrieval, we deploy 50 ASCSs near different waterways and collect about one million images of ship targets for further usage. The captured images are in high resolution.
[figure(s) omitted; refer to PDF]
3.1.5. Experimental Setting
We extract the representations of these images using the proposed FGSR-Net, after which we split them into two sets, i.e., a query set and a gallery set. In our experimental setting, the gallery set contains 23,156 ship images, while there are 2,748 feature vectors in the query set.
3.2. FGSR-Net: A Multioriented Ship Reidentification Network
To address the challenges mentioned above, we propose a fine-grained ship reidentification network (FGSR-Net) that contains three modules targeting these problems, i.e., the variant size and shape of ships and the changeable areas of ship images. The last module, the multibranch identity module, not only captures the global information of ship targets but also recognizes their details in both the horizontal and vertical directions.
3.3. Overview
In this work, we propose a novel fine-grained ship reidentification network (FGSR-Net) to address the ship Re-ID problem. Its three modules constitute the main contribution of this paper. (1) Pyramid fusion module: since the size and shape of different ships vary greatly, to enable the model to handle inputs of different resolutions, we utilize a feature extraction convolutional network to produce pyramid feature maps that represent different levels of spatial information; a pyramid fusion module then aggregates these feature maps of different sizes to obtain a semantically strong representation. (2) Occlusion module: we predict an occlusion map that indicates the areas that can change across time slots, so that comparison concentrates mainly on the unchangeable areas of the ship. (3) Multibranch identity module: considering the physical shape of a ship, we capture texture information in different orientations, i.e., a global branch captures the whole ship target, while a vertical branch and a horizontal branch recognize details of ship targets from the vertical and horizontal perspectives, respectively.
3.4. Pyramid Fusion Module
As mentioned above, differences in the size and shape of ship targets are the main obstacles to ship Re-ID. Inspired by FPN [21], we employ a similar structure to address the different size and shape problems.
As shown in Figure 3, the feature extraction convolutional layers first extract a pyramid of feature maps
[figure(s) omitted; refer to PDF]
From the former stage, while the first stage is the original feature maps
where the UP (·) means the upsample operator, while the
With a coarser-resolution feature map
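As a concrete illustration of this top-down fusion, the following PyTorch sketch implements an FPN-style pyramid fusion consistent with the description above; the 256-channel output width and the use of nearest-neighbour upsampling are assumptions for illustration, not values taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidFusion(nn.Module):
    """FPN-style top-down fusion of multi-level backbone features (a sketch;
    channel widths and upsampling mode are assumed, not taken from the paper)."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):                              # feats: low-level -> high-level
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 1, 0, -1):          # top-down pathway
            up = F.interpolate(laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
            laterals[i - 1] = laterals[i - 1] + up         # add the upsampled coarser map
        return [s(x) for s, x in zip(self.smooth, laterals)]
```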
3.5. Occlusion Module
Different from the person Re-ID problem, the appearance of ship targets can change across time slots. For example, facilities can be moved on the ship, and the cargo carried by a ship may vary from time to time. This phenomenon makes ship retrieval difficult. To tackle this problem, we propose an occlusion module that produces an occlusion map identifying the areas of a ship target that are changeable.
Figure 5 shows the architecture of the proposed occlusion module. It consists of two major modules: a transformer-based spatial feature extractor that extracts long-range occlusion-aware features, and a direction-aware attention module that models the spatial correlation of the input feature from a direction-based perspective. The input feature of the occlusion module is first forwarded to a flatten layer to obtain
[figure(s) omitted; refer to PDF]
After obtaining the embedded patches, we feed all the patches into a transformer encoder [22], which consists of
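The following PyTorch sketch illustrates this step: the spatial feature map is flattened into per-location tokens and passed through a small transformer encoder. The encoder depth, number of heads, and learned positional embedding size are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class OcclusionTransformer(nn.Module):
    """Sketch of the long-range feature extractor: flatten the spatial map into
    tokens and run a small transformer encoder (depth/heads/embedding are assumed)."""
    def __init__(self, channels=256, depth=2, heads=8, max_tokens=4096):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, max_tokens, channels))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)    # (B, H*W, C): one token per spatial location
        tokens = tokens + self.pos[:, :h * w]
        tokens = self.encoder(tokens)            # long-range, occlusion-aware interactions
        return tokens.transpose(1, 2).reshape(b, c, h, w)   # back to a spatial map
```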
After the long-range occlusion-aware features are extracted, they are fed into the direction-aware attention module. As shown in Figure 6, this module aims to produce a spatial occlusion map that indicates the changeable area of ship targets. At the very beginning, we first utilize a
[figure(s) omitted; refer to PDF]
In practice, a ship is photographed as a roughly rectangular object. Thus, we can utilize the strip pooling technique [23] to average all the feature values in a row or a column, as shown in Figure 6. The outputs of the horizontal strip pooling and vertical strip pooling can be written as
With the obtained feature vectors, we apply a matrix multiplication to fuse them and produce the occlusion map, which indicates the areas that can easily change across time slots:
With the horizontal and vertical strip pooling layers, the network can easily investigate the inherent structure of the ship target. Thanks to the long kernel shape along one dimension and the narrow kernel shape along the other, the produced occlusion map can capture long-range context while still preserving local details.
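A minimal PyTorch sketch of this direction-aware attention is given below: row-wise and column-wise strip pooling followed by an outer product that forms the spatial occlusion map. The 1x1 channel projection and the sigmoid normalisation are assumptions, not details stated in the paper.

```python
import torch
import torch.nn as nn

class DirectionAwareOcclusion(nn.Module):
    """Sketch of the direction-aware attention: strip pooling along rows and
    columns, fused by a matrix (outer) product into an occlusion map."""
    def __init__(self, channels=256):
        super().__init__()
        self.proj = nn.Conv2d(channels, 1, kernel_size=1)  # collapse channels (assumed)
        self.h_pool = nn.AdaptiveAvgPool2d((None, 1))      # average each row (horizontal strip)
        self.v_pool = nn.AdaptiveAvgPool2d((1, None))      # average each column (vertical strip)

    def forward(self, x):                   # x: (B, C, H, W)
        s = self.proj(x)                    # (B, 1, H, W)
        h_vec = self.h_pool(s)              # (B, 1, H, 1)
        v_vec = self.v_pool(s)              # (B, 1, 1, W)
        occ = torch.sigmoid(h_vec @ v_vec)  # outer product -> (B, 1, H, W) occlusion map
        return occ
```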
3.6. Multibranch Identity Module
In the person Re-ID problem, the most important thing is to produce a discriminative representation for each instance. Inspired by MGN [19], we apply a multibranch structure to produce representations with respect to different orientations, i.e., horizontal and vertical.
As shown in Figure 3, we mask the feature
We illustrate the structure of the multibranch identity module in Figure 7. As shown in Figure 7, it contains three branches, i.e., a global branch, a horizontal branch, and a vertical branch. The global branch aims to capture the global information of ship targets, while the other two branches attempt to recognize the details of ship targets from the horizontal and vertical perspectives, respectively. Similar to MGN [19], the output of the preceding occlusion module is fed into these three branches.
[figure(s) omitted; refer to PDF]
Here, we report the settings of these three branches in Table 2.
Table 2
Structure of multibranch identity module. Here, our input image is
Branch | Part no. | Map size | Dims | Feature |
Global | 1 | 256 | ||
Horizontal | 3 | |||
Vertical | 3 |
In the global branch, we first utilize convolution layers to downsample the input feature; then, a global max-pooling layer is applied on the downsampled feature map, while a
The second and third branches (the horizontal branch and the vertical branch) have a structure similar to the global branch. Specifically, we do not downsample the input features but uniformly split them into several parts along the horizontal/vertical orientation to maintain proper receptive-field areas for local features. Then, we use the same subsequent structure to learn the local feature representations as is used for the global features.
In addition to splitting the feature map at the beginning, we also downsample the feature map to obtain a global feature representation for each of the last two branches (i.e.,
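The sketch below illustrates a simplified, single-backbone version of the multibranch identity module: one global feature, branch-level global features for the two local branches, and three horizontal plus three vertical part features, each reduced to 256 dimensions. The channel widths and the shared input feature are assumptions made for brevity; in the full model each branch would have its own copy of the deeper layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiBranchIdentity(nn.Module):
    """Simplified sketch of the three-branch head: global + horizontal/vertical
    strip features, each reduced to 256-d (layer settings are assumptions)."""
    def __init__(self, channels=256, parts=3, dim=256):
        super().__init__()
        self.parts = parts
        n_feats = 3 + 2 * parts  # 3 branch-level features + 3 horizontal + 3 vertical parts
        self.reduce = nn.ModuleList(nn.Conv2d(channels, dim, 1) for _ in range(n_feats))

    def forward(self, x):                                  # x: occlusion-masked feature (B, C, H, W)
        feats = [F.adaptive_max_pool2d(x, 1)]              # global branch
        feats += [F.adaptive_max_pool2d(x, 1),             # branch-level pooling for the two
                  F.adaptive_max_pool2d(x, 1)]             # local branches (identical here only
                                                           # because the backbone is shared)
        feats += list(torch.chunk(F.adaptive_max_pool2d(x, (self.parts, 1)),
                                  self.parts, dim=2))      # horizontal strips
        feats += list(torch.chunk(F.adaptive_max_pool2d(x, (1, self.parts)),
                                  self.parts, dim=3))      # vertical strips
        return [r(f).flatten(1) for r, f in zip(self.reduce, feats)]  # list of (B, 256) vectors
```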
3.7. Objective Functions
Here, to boost the learning of discriminative feature representation, we mainly utilize the widely used loss functions in the reidentification task to act as our objective functions, i.e., the softmax classification loss
3.7.1. Softmax Classification Loss
At the very beginning, we classify the feature representations of
3.7.2. Triplet Loss
After obtaining the dimension-reduced global features of the three branches, we apply a triplet loss on these three global features to learn a more diverse representation for each individual ship target. This loss function is formulated as follows:
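A minimal sketch of how the two objectives could be combined in PyTorch is shown below; the triplet margin and the equal weighting of the classification and triplet terms are assumptions for illustration, not values reported in the paper.

```python
import torch
import torch.nn as nn

# Cross-entropy (softmax classification) term on each classifier output plus a
# triplet term on the branch-level global features. Margin and weighting are assumed.
ce = nn.CrossEntropyLoss()
tri = nn.TripletMarginLoss(margin=1.2)

def reid_loss(logits_list, anchor_feats, pos_feats, neg_feats, labels):
    """logits_list: classifier outputs per reduced feature; the *_feats lists hold
    the three branch-level global features of anchor/positive/negative samples."""
    l_cls = sum(ce(logits, labels) for logits in logits_list) / len(logits_list)
    l_tri = sum(tri(a, p, n) for a, p, n in zip(anchor_feats, pos_feats, neg_feats)) / len(anchor_feats)
    return l_cls + l_tri
```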
3.7.3. Ship Retrieval
After obtaining the features from the three branches of FGSR-Net, we concatenate them (i.e.,
Let
Here, we use the cosine distance to measure the similarity of two features. Thus, we can obtain the similarity between the query and any feature in the gallery
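The retrieval step can be sketched as follows: both the query and the gallery features are L2-normalised so that a dot product gives the cosine similarity, and the gallery is ranked by that similarity. In our setting, query_feat would be the concatenated branch features of the query image and gallery_feats the matrix of stored gallery representations.

```python
import torch
import torch.nn.functional as F

def retrieve(query_feat, gallery_feats, topk=5):
    """Rank gallery features by cosine similarity to the query (a sketch of the
    retrieval step using the concatenated branch features)."""
    q = F.normalize(query_feat, dim=-1)       # (D,)
    g = F.normalize(gallery_feats, dim=-1)    # (N, D)
    sims = g @ q                              # cosine similarity for every gallery item
    scores, indices = sims.topk(topk)
    return indices, scores                    # indices of the most similar gallery ships
```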
3.8. Experimental Evaluations
In this work, we propose a large-scale fine-grained ship retrieval dataset (FGSR dataset) to evaluate our proposed method. In this section, we conduct extensive experiments to verify its effectiveness.
3.8.1. Dataset and Metric
In our experiments, we follow previous Re-ID work [19] and report the cumulative matching characteristics (CMC) at rank-1, rank-5, rank-10, rank-20, and rank-40, together with the mean average precision (mAP), on our proposed dataset.
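For reference, the sketch below computes CMC at the listed ranks and mAP from a query-by-gallery similarity matrix; it assumes the plain protocol without any camera-based filtering, which is an assumption about the evaluation details.

```python
import numpy as np

def cmc_map(similarity, query_ids, gallery_ids, ranks=(1, 5, 10, 20, 40)):
    """CMC at the given ranks and mAP, computed per query from a similarity matrix."""
    num_q = similarity.shape[0]
    cmc_hits = np.zeros(len(ranks))
    aps = []
    for i in range(num_q):
        order = np.argsort(-similarity[i])                 # most similar gallery items first
        matches = gallery_ids[order] == query_ids[i]       # boolean: correct identity or not
        hit_positions = np.where(matches)[0]
        if len(hit_positions) == 0:                        # no correct match for this query
            aps.append(0.0)
            continue
        cmc_hits += [hit_positions[0] < r for r in ranks]  # first correct match within rank r
        precisions = [(k + 1) / (pos + 1) for k, pos in enumerate(hit_positions)]
        aps.append(float(np.mean(precisions)))             # average precision for this query
    return cmc_hits / num_q, float(np.mean(aps))
```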
3.8.2. Implementation Details
In this work, we implement the whole framework in PyTorch. One GeForce RTX 3090 GPU is used to run all the experiments. In our proposed network, the convolutional layers before the pyramid fusion module are borrowed from ResNet50 [16], and we extract the output of each block of ResNet50 to form our pyramid feature maps. Their sizes are
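As an illustration of this feature extraction step, the snippet below pulls the outputs of the four residual stages from a standard torchvision ResNet-50; the example input resolution and the choice of return nodes are assumptions consistent with "the output of each block" (pretrained weights would normally be loaded as well).

```python
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

# Build a backbone that returns the output of each residual stage of ResNet-50.
backbone = create_feature_extractor(
    resnet50(weights=None),   # pretrained ImageNet weights would usually be loaded here
    return_nodes={"layer1": "c2", "layer2": "c3", "layer3": "c4", "layer4": "c5"},
)

x = torch.randn(1, 3, 384, 384)            # example input resolution (assumed)
feats = backbone(x)                        # dict of feature maps at four scales
pyramid = [feats["c2"], feats["c3"], feats["c4"], feats["c5"]]
print([tuple(f.shape) for f in pyramid])   # channel widths 256 / 512 / 1024 / 2048
```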
3.9. Compare Methods
In this section, since there is no approach for ship Re-ID, we compare our method with several popular object Re-ID methods, i.e., MGN [19], OSNet [24], VANet [25], and VehicleNet [26].
(i) MGN. The Multiple Granularity Network (MGN) contains three branches: a global branch that captures the global information of the human body and two local branches that split the feature map into horizontal stripes to extract local detail representations
(ii) OSNet. Omniscale network (OSNet) designs a residual block composed of multiple convolutional streams, each detecting features at a certain scale. Also, it introduces a unified aggregation gate mechanism to dynamically fuse multiscale features with input-dependent channel-wise weight
(iii) VANet. The viewpoint-aware network (VANet) aims to learn two metrics for similar viewpoints and different viewpoints in two feature spaces, respectively. The former (within-space constraint) forces positive pairs closer than negative pairs in each feature space, while the latter (cross-space constraint) does the same when pairs lie in different feature spaces
(iv) VehicleNet. In VehicleNet, they design a simple yet effective two-stage progressive approach to learning more robust visual representation from their proposed dataset
In this work, we train the above methods on our proposed dataset and report the results to compare them with our proposed method.
3.10. Comparison to the State-of-the-Art Methods
In this section, we first compare our method with the current Re-ID methods (i.e., MGN [19], OSNet [24], VANet [25], and VehicleNet [26]) on our proposed dataset, and the results are reported in Table 3. We can see that our method obtains the best results on Rank-1 as well as mAP. It is notable that our method achieves 100% in Rank-40.
Table 3
Comparison of our algorithm with other methods on the collected dataset.
Method | Rank-1 | Rank-5 | Rank-10 | Rank-20 | Rank-40 | mAP |
MGN [19] | 83.6 | 85.4 | 87.5 | 93.6 | 96.9 | 83.5 |
OSNet [24] | 86.3 | 88.4 | 93.2 | 96.7 | 98.3 | 85.4 |
VANet [25] | 82.1 | 86.3 | 90.4 | 93.6 | 96.8 | 80.4 |
VehicleNet [26] | 85.3 | 87.9 | 91.3 | 95.6 | 98.3 | 84.6 |
Ours | 94.3 | 95.7 | 98.4 | 99.5 | 100 | 92.4 |
Compared with the original MGN, our method gains 8.9% in mAP because we split the feature map into several stripes along two different orientations, while MGN only splits feature maps along one orientation. Moreover, compared with all the baselines, our method can effectively explore the inherent structure of the ship target and obtains the best results. In short, these results verify the effectiveness of our proposed method.
Figure 8 shows the retrieval results of our proposed method for a query image. Our method can retrieve images of the same ship from multiple views, which demonstrates the effectiveness of our proposed direction-aware modules.
[figure(s) omitted; refer to PDF]
3.11. Ablation Studies
3.11.1. The Effectiveness of Pyramid Fusion Module
To evaluate the effectiveness of the pyramid fusion module, we remove it from the whole framework, so that the feature extraction convolutional layers output only one level of feature map, which is then fed into the occlusion module. The results are reported in Table 4.
Table 4
The ablation study of our proposed method. We denote Trans. as the transformer module in our proposed occlusion module.
Method | Rank-1 | Rank-5 | Rank-10 | Rank-20 | Rank-40 | mAP |
Ours w/o PFM | 90.4 | 92.8 | 94.3 | 96.2 | 98.3 | 90.4 |
Ours w/o OM | 91.8 | 92.4 | 95.6 | 97.2 | 99.3 | 91.3 |
Ours w/o MBIM | 89.5 | 91.2 | 94.0 | 95.8 | 97.5 | 90.0 |
Ours w/o trans. | 90.3 | 92.5 | 94.4 | 96.6 | 98.1 | 90.8 |
Ours | 94.3 | 95.7 | 98.4 | 99.5 | 100 | 92.4 |
We find that the variant without the pyramid fusion module ("Ours w/o PFM") underperforms our full method ("Ours") on all metrics; e.g., our full method achieves a significant improvement of 3.9% over "Ours w/o PFM" on Rank-1. These results indicate that the pyramid fusion module plays an important role in our framework: with it, our approach can address the variant size and shape problem of ship Re-ID.
3.11.2. The Effectiveness of Occlusion Module
To better identify identical ship targets, we propose an occlusion module that estimates an occlusion map representing the changeable areas of a ship image. By masking these areas, the remaining areas are stable, so we can rely on them to identify a ship image. We report the results of our method without the occlusion module ("Ours w/o OM") in Table 4; the occlusion module yields stable improvements for our method. These results indicate the effectiveness of our proposed occlusion module.
Table 5
Results with different settings on our proposed dataset. “+” means the concatenation operation, while “All” in the last column indicates that we concatenate all subfeatures to form the ship representation.
Representation | All | |||||||
Rank-1 | 75.1 | 74.3 | 74.9 | 86.4 | 88.2 | 83.5 | 90.3 | 94.3 |
Rank-5 | 76.2 | 76.0 | 75.8 | 88.6 | 90.1 | 84.1 | 93.4 | 95.7 |
Rank-10 | 80.4 | 83.1 | 82.1 | 90.7 | 94.2 | 88.3 | 96.9 | 98.4 |
mAP | 73.6 | 73.2 | 73.5 | 83.8 | 86.4 | 81.5 | 88.8 | 92.4 |
3.11.3. The Effectiveness of Multibranch Identity Module
To verify the multibranch identity module in our method, we replace it with 3 convolutional layers that finally output a 1024-dimensional feature vector to represent the corresponding ship image. The results are shown in Table 4. We find that the performance degrades when the multibranch identity module is removed from our full method; for example, mAP drops from 92.4% to 90.0%.
These results show that extracting representative feature vectors along different orientations helps to better distinguish different ships.
3.11.4. The Effectiveness of Each Subfeature
In this section, we report the results of different combinations of the extracted features, i.e.,
3.11.5. The Influence of Different Number Blocks
In this work, we split the occlusion-masked feature map into 3 blocks along each of the two orientations. To evaluate the effect of the number of blocks, we conduct related experiments and show the results in Figure 9. The experiments show that we obtain the best results when the number of blocks is 3 in both orientations.
[figure(s) omitted; refer to PDF]
3.12. Speed Evaluation
In addition to accuracy, we also evaluate the retrieval speed of our proposed method; the results are reported in Table 6. From Table 6, we see that our retrieval method can search for a similar target in the gallery at low time cost. For example, for Rank-1, it takes around 3 s to return the retrieval result from 30,000 images. Thus, we argue that the proposed retrieval method is efficient.
Table 6
The speed of retrieving a ship target (Unit: second).
Scale | 5 k | 10 k | 20 k | 30 k |
Rank-1 | 0.8 | 1.5 | 2.3 | 3.0 |
Rank-5 | 1.2 | 1.9 | 2.6 | 3.6 |
Rank-10 | 1.6 | 2.4 | 3.0 | 3.8 |
4. Conclusion
In this paper, we study ship retrieval methods for intelligent water transportation systems in smart cities. To this end, we construct a new large-scale fine-grained ship retrieval dataset (named FGSR) that consists of 30,000 images of 1,000 ships captured in the wild. Besides, we propose a novel fine-grained ship reidentification network (FGSR-Net) based on MGN, which consists of three important modules: the pyramid fusion module, the occlusion module, and the multibranch identity module. With the proposed method, we can address the variant size and shape problem and produce more discriminative feature representations. Our extensive experiments show that FGSR-Net outperforms existing state-of-the-art methods from related fields.
[1] M. Mohaimenuzzaman, S. M. Monzurur Rahman, M. Alhussein, G. Muhammad, K. Abdullah al Mamun, "Enhancing safety in water transport system based on Internet of Things for developing countries," International Journal of Distributed Sensor Networks, vol. 12 no. 2,DOI: 10.1155/2016/2834616, 2016.
[2] M. A. Sotelo, F. J. Rodriguez, L. Magdalena, "Virtuous: vision-based road transportation for unmanned operation on urban-like scenarios," In: IEEE Transactions on Intelligent Transportation Systems, vol. 5 no. 2, pp. 69-83, 2004.
[3] S. K. Chaturvedi, C. S. Yang, K. Ouchi, P. Shanmugam, "Ship recognition by integration of SAR and AIS," In: The Journal of Navigation, vol. 65 no. 2, DOI: 10.1017/S0373463311000749, 2012.
[4] H. Zhao, W. Zhang, H. Sun, B. Xue, "Embedded deep learning for ship detection and recognition," Future Internet, vol. 11 no. 2,DOI: 10.3390/fi11020053, 2019.
[5] O. Karabayır, U. Saynak, M. Z. Kartal, A. F. Coşkun, T. O. Gulum, B. S. Batı, "Synthetic-Range-Profile-Based training library construction for ship target recognition purposes of scanning radar systems," IEEE Transactions on Aerospace and Electronic Systems, vol. 56 no. 4, pp. 3231-3245, DOI: 10.1109/TAES.2020.2972249, 2020.
[6] X. Cao, S. Gao, L. Chen, Y. Wang, "Ship recognition method combined with image segmentation and deep learning feature extraction in video surveillance," Multimedia Tools and Applications, vol. 79 no. 13-14, pp. 9177-9192, DOI: 10.1007/s11042-018-7138-3, 2020.
[7] F. Huixuan, Y. Li, Y. Wang, P. Li, "Maritime ship targets recognition with deep learning," In: 2018 37th Chinese control conference (CCC), pp. 9297-9302, DOI: 10.23919/ChiCC.2018.8484085, .
[8] J. Zou, W. Yuan, Y. Menghong, "Maritime target detection of intelligent ship based on faster R-CNN," In: 2019 Chinese automation congress (CAC), pp. 4113-4117, .
[9] K.-K. Tseng, R. Zhang, C.-M. Chen, M. M. Hassan, "DNetUnet: a semi-supervised CNN of medical image segmentation for super-computing AI service," The Journal of Supercomputing, vol. 77 no. 4, pp. 3594-3615, DOI: 10.1007/s11227-020-03407-7, 2021.
[10] E. K. Wang, C.-M. Chen, M. M. Hassan, A. Almogren, "A deep learning based medical image segmentation technique in Internet-of-Medical-Things domain," In: Future Generation Computer Systems, vol. 108, pp. 135-144, DOI: 10.1016/j.future.2020.02.054, 2020.
[11] G. Vivone, P. Braca, J. Horstmann, "Knowledge-based multitarget ship tracking for HF surface wave radar systems," In: IEEE Transactions on Geoscience and Remote Sensing, vol. 53 no. 7, pp. 3931-3949, DOI: 10.1109/TGRS.2014.2388355, 2015.
[12] X. Chen, S. Wang, C. Shi, H. Wu, J. Zhao, J. Fu, "Robust ship tracking via multi-view learning and sparse representation," In: The Journal of Navigation, vol. 72 no. 1, pp. 176-192, DOI: 10.1017/S0373463318000504, 2019.
[13] M. Geng, Y. Wang, T. Xiang, Y. Tian, "Deep transfer learning for person re-identification," 2016. http://arxiv.org/abs/1611.05244
[14] P. Huang, R. Huang, J. Huang, R. Yangchen, Z. He, X. Li, J. Chen, "Deep feature fusion with multiple granularity for vehicle re-identification," In: CVPR Workshops, pp. 80-88, 2019.
[15] F. Huixuan, Y. Li, Y. Wang, L. Han, "Maritime target detection method based on deep learning," In: 2018 IEEE International Conference on Mechatronics and Automation (ICMA). IEEE, pp. 878-883, DOI: 10.1109/ICMA.2018.8484727, .
[16] K. He, X. Zhang, S. Ren, J. Sun, "Deep residual learning for image recognition," In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778, .
[17] R. Girshick, "Fast R-CNN," In: Proceedings of the IEEE international conference on computer vision, pp. 1440-1448, .
[18] R. R. Varior, M. Haloi, G. Wang, "Gated Siamese convolutional neural network architecture for human re-identification," In: European Conference on Computer Vision, pp. 791-808, DOI: 10.1007/978-3-319-46484-8_48, 2016.
[19] G. Wang, Y. Yuan, X. Chen, J. Li, X. Zhou, "Learning discriminative features with multiple granularities for person re-identification," In: Proceedings of the 26th ACM international conference on Multimedia, pp. 274-282, DOI: 10.1145/3240508.3240552, .
[20] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, "You only look once: unified, real-time object detection," In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779-788, .
[21] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, "Feature pyramid networks for object detection," In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2117-2125, .
[22] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, "Attention is all you need," In: Advances in neural information processing systems, vol. 30, 2017.
[23] Q. Hou, L. Zhang, M. M. Cheng, J. Feng, "Strip pooling: rethinking spatial pooling for scene parsing," In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4003-4012, .
[24] K. Zhou, Y. Yang, A. Cavallaro, T. Xiang, "Omni-scale feature learning for person re-identification," In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3702-3712, .
[25] R. Chu, Y. Sun, Y. Li, Z. Liu, C. Zhang, Y. Wei, "Vehicle re-identification with viewpoint-aware metric learning," In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8282-8291, .
[26] Z. Zheng, T. Ruan, Y. Wei, Y. Yang, T. Mei, "VehicleNet: learning robust visual representation for vehicle re-identification," IEEE Transactions on Multimedia, vol. 23, pp. 2683-2693, DOI: 10.1109/TMM.2020.3014488, 2020.
Copyright © 2022 Yunting Xian et al. This work is licensed under the Creative Commons Attribution 4.0 License (http://creativecommons.org/licenses/by/4.0/).
Abstract
Ship reidentification is an important part of water transportation systems in smart cities. Existing research lacks a large-scale fine-grained ship retrieval dataset collected in the wild, and existing ship recognition solutions mainly focus on ship target identification rather than fine-grained ship reidentification. Furthermore, previous ship target identification systems are usually based on synthetic aperture radar (SAR) images, automatic identification system (AIS) data, or video streaming, which are confronted with expensive deployment costs, such as the installation cost of SAR and AIS and the communication and storage overhead. Indeed, ship reidentification benefits traffic monitoring, navigation safety, vessel tracking, etc. To address these problems, we propose a new large-scale fine-grained ship retrieval dataset (named FGSR) that consists of 30,000 images of 1,000 ships captured in the wild. Besides, to tackle the difficulty of spatial-temporal inconsistency in ship identification in the wild, we design a multioriented ship reidentification network named FGSR-Net that consists of three modules addressing different crucial problems. The pyramid fusion module aims to address the variant size and shape of ship targets, the occlusion module attempts to detect the unchanged areas of ship images, and the multibranch identity module generates discriminative feature representations for ship targets from different orientations. Experimental evaluations on the FGSR dataset show the effectiveness and efficiency of our proposed FGSR-Net: the mean average precision of ship reidentification is around 92.4%, and FGSR-Net takes only about 3 seconds to return retrieval results from 30,000 images.