1. Introduction
The ability to accurately enumerate wildlife populations is fundamental to many conservation and natural resource management programs [1,2]. For instance, accurately determining a species' population size is essential for drawing up effective plans to protect endangered species [3], monitoring the movement and behaviour of migratory animals [4], managing populations of invasive species within eradication programs [5,6], and optimising the protection of wildlife during natural disasters. However, detecting wildlife is a challenging task. Wildlife moves and is often camouflaged against its background [7], and the environmental background is often “cluttered”, consisting of a complex interplay of elements. Despite the difficulty of detection, land managers require accurate data to achieve their objectives without negatively impacting wildlife. Traditional wildlife surveying methods, including diurnal searches, nocturnal spotlighting [8,9], detection dogs, radio or satellite collars, camera traps [9], hand-held thermal imagers [10], and crewed aerial surveys [11], all have limitations. In addition, they are often time-consuming, labour intensive, and expensive, and some are hazardous [1].
More recently, uncrewed aerial vehicles (UAVs) or drones have become commercially available and used in diverse applications. UAVs have been used in conjunction with sensors such as colour and thermal cameras to monitor forests and detect animals such as livestock [12,13], kangaroos [14], deer [15,16], fur seals [17], sea turtles [18], monkeys [19], and koalas [20,21]. In these studies, the presence of animals was determined manually, by visually examining recorded imagery. However, while drones and cameras offer a new way of monitoring wildlife, manually inspecting the imagery is time consuming, tedious, and expensive, especially when there is a large amount of imagery [22].
The use of UAVs to assist in the automated detection of wildlife is not new. They have been used to detect animals such as wild turkeys [23], rabbits and chickens [24,25], cows [26], white-tailed deer [27], hippopotamus [28], seals [29], greater kudus, gemsboks, and hartebeests [1,30]. A more detailed literature review on the automated detection of wildlife using UAVs can be found in [31].
Automated deep-learning techniques can identify and extract critical features from imagery to detect objects of interest [32,33,34,35]. Supervised deep-learning techniques have historically shown promising results in wildlife and object detection within aerial imagery [23,26,36,37,38]. However, these techniques have limitations. Firstly, they require large amounts of training data, which are not always available for specific objects of interest. In addition, the available training data are sometimes noisy, requiring techniques to eliminate the noisy labels before training, as deep neural networks can easily overfit noisy labels, leading to significant degradation in results [34]. Although fine-tuning and data-augmentation techniques have significantly reduced the amount of training data required [39], a relatively large amount of data is still needed. Secondly, as neural networks use contextual features (i.e., multiple neighbouring pixels), the training data must contain enough object features (i.e., objects must occupy sufficient pixels) for the network to learn the patterns of interest. This can be a significant problem in datasets where the object of interest is very small (occupying fewer than 10 pixels) and occlusion further reduces its visible size. Consequently, neural-network algorithms such as Faster R-CNN [40] and YOLO [41] may not be able to learn the key features of objects with such small signatures and can therefore fail during the training stage. While results may improve when the images are scaled up, processing times also increase.
Other factors that affect detection include the type of camera used and its resolution. For example, in thermal images, small objects generally exhibit even fewer features than their visible (colour) counterparts. This is because thermal cameras typically have lower resolutions than visible cameras, and contiguous pixels are often not as thermally distinct from one another as the information in a colour image. Additionally, the key features of small objects are typically lost while passing through the pooling and convolutional layers of a deep neural network. For instance, a 32 × 32 pixel object will be represented (at most) as 1 pixel after five processing steps by a pooling layer in the VGG16 network, which means it could easily be missed if processed using more layers [42]. Therefore, there is a need for techniques that can detect objects with a low number of features and do not require large amounts of training data.

Many image-processing techniques have been inspired by biology to address a diversity of applications. Flying insects undertake complex tasks such as detecting and chasing objects against highly cluttered backgrounds under a wide variety of weather and environmental lighting conditions [43,44,45,46,47,48]. Despite the small size, weight, and power draw of an insect brain, and its limited number of neurons [47], these animals perform such complex tasks easily in real time. This remarkable capability has encouraged scientists to study the vision systems of insects and develop models that are inspired by or directly mimic them [47,49,50,51]. These bio-inspired vision methods can outperform traditional image-processing techniques in complex environments [52], and many of these bio-inspired techniques do not require training or prior knowledge of a scene's lighting conditions [52]. Therefore, one of the aims of this paper is to develop and test such a method for wildlife detection and compare it to existing techniques.

The impetus for this research was a need for the accurate detection of koalas to mitigate potential harm that could arise from forest operations. Koalas are an iconic Australian species, but due to a range of factors, including habitat fragmentation, disease, vehicle strikes, poor genetic diversity, and bushfires [53], their populations have declined significantly. In 2022, the Australian government declared the koala endangered across much of its range under the Environment Protection and Biodiversity Conservation Act 1999. While koalas are listed as stable in the location of this study, there has been a concerted effort across Australia to protect the species, not only in native forests but also where they establish themselves in forest plantations.
Unlike conventional object-detection tasks, detecting wildlife in complex forest environments is challenging due to several factors, such as dense trees, varying weather conditions, and the ability of animals to camouflage themselves within the environment [54]. The use of colour imagery for koala detection is problematic as koalas are typically hidden by tree foliage and the species blends in with its surroundings, meaning colour contrast is low [55]. UAVs and thermal imaging technologies have been used widely in forestry and wildlife conservation [56]. Recently, several researchers have used UAVs and thermal cameras to detect koalas [7,39,55,57]. The results are promising when compared to traditional methods as they offer comparable detection accuracy, in addition to improved safety and surveying speeds [39]. However, these studies were conducted under temporal constraints (early in the morning) and in cold weather [39,55,58], which ensured a stronger contrast between the koala, a warm-blooded animal, and the surrounding trees and foliage, and consequently increased the likelihood of accurate estimates of the koala population. From a practical point of view, however, land managers, such as forest companies, desire the ability to accurately assess population numbers during any season, under all weather conditions, and at any time of the day [20,21]. Such a technology will help land managers to achieve their objectives without compromising the welfare of wildlife in the area.
To enable a better understanding of the challenges associated with koala detection in infrared aerial imagery, the influence of occlusion by the canopy was examined using a combination of simulated and real data. The effect of the relative geometry of the koalas and the drone/camera on the probability of detection was also analysed. This study aimed to investigate the effectiveness of automated object-detection systems for identifying koalas in eucalyptus plantations in southwest Victoria, Australia.
The main contributions of this paper are as follows:
An accurate ground-truth dataset of aerial high-dynamic-range thermal imagery containing koalas in eucalyptus plantations. The dataset, known as Koala InfraRed Aerial Imagery (Kirai), comprises images from four flights at three different locations. Acknowledging the shortage of such field data and the need for further research, the dataset is publicly available to the research community (https://github.com/LaithAlShimaysawee/KiraiDataset, accessed on 10 October 2024).
The introduction of a new object-detection method and its comparison to 10 existing state-of-the-art object-detection techniques for the detection of koalas in eucalyptus plantations.
A pilot study on the effect of time of the day on automated koala detection in infrared aerial imagery and recommendations for future research.
An analysis of the effect of occlusion by tree canopy on koala detection (and likely arboreal mammals in general) using a combination of simulated and real-world data.
Recommendations regarding the effect of drone/camera angles on the probability of detection.
The remainder of this paper is organised into two parts. Part 1 (Section 2) describes the methodology of real-data collection, the comparative detection techniques and their experimental settings, the performance metrics, and the results of these detection techniques. Part 2 (Section 3) describes the methodology and results of further experiments using real and simulated data to study the effects of several parameters on koala detection. These parameters are environmental, such as the temperature, tree canopy structure, and the koala's position within the tree, as well as flight parameters, such as the drone's flight altitude and the camera's depression angle.
2. Part 1: Evaluation of Several Detection Techniques on Koala Detection
2.1. Methodology of Part 1
This section describes the equipment used in the field trials and the recorded datasets. It also describes a benchmark of existing object-detection algorithms, the experimental settings, and the evaluation criteria used in the study.
2.1.1. Camera and Drone
The model of infrared camera used was an ICI-8640 P-series (ICI, Beaumont, TX, USA), which has a spectral band of 7–14 µm [59]. It had a pixel depth (dynamic range) of 14 bits encoded in a 16-bit wrapper. The camera was attached to a bespoke payload, designed by researchers at the University of South Australia, which captured the high-dynamic-range (HDR) raw thermal images at a frame rate of 10 Hz (note, this has subsequently been upgraded to 30 Hz). The image resolution was 640 × 512 pixels. The focal length of the lens was 12.5 mm with manual focus. This translated to a field of view (FOV) of about 50° × 37.5°. The payload was mounted on a DJI Ronin gimbal (DJI, Shenzhen, China) to provide spatial stability and was carried by a DJI Matrice 600 drone (DJI, Shenzhen, China) [60].
2.1.2. Survey Site and Real-Data Collection
Three datasets were recorded from two sites in southwestern Victoria. The first site was known to have a high population of koalas and comprised around three hectares (300 m × 100 m). Two flights were conducted at this site, one at 10:30 a.m. and one at 11:30 a.m., both on 14 November 2019. The data recorded from these flights are referred to as datasets A and B, respectively. Figure 1A,B shows two image samples from each dataset. The second site was around five hectares (300 m × 168 m). One flight was conducted at this site at 11:15 a.m. on 12 November 2019. The data recorded from this site are referred to as dataset C. Figure 1C shows two image samples from this dataset. A key reason for performing the flight missions later in the morning was to test the performance of the methods at a time that is more challenging for conventional infrared monitoring. The forest industry could be harvesting at any time of the day, and in all seasons, so it was important that some data were recorded during the warmer part of the day. On both trial days, the wind speed was around 20 km/h and the ambient temperature was about 12–17 °C. It should be noted that the data were collected around the middle of November, which is the end of spring in Australia. The drone flew above the sites in a lawn-mower pattern at an altitude above the ground of approximately 60 m (35 m above the tree tops), with a constant forward speed of 8 m/s. The flight path was designed to have a minimum of 50% side overlap (at ground level) between images from adjacent transects. Datasets A, B, and C comprised 2770, 2530, and 4025 image frames, respectively. It took the drone between 8 and 15 min to cover the two sites. The location of each image was recorded by the drone using a real-time kinematic (RTK) carrier-phase differential global positioning system (GPS).
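To illustrate how the 50% side overlap constrains transect spacing for this camera and altitude, the following back-of-the-envelope MATLAB calculation uses the field of view and flight height quoted above; it is an illustrative sketch only, not the flight-planning procedure actually used in the trials.

```matlab
% Illustrative ground-footprint and transect-spacing calculation for the survey
% geometry described above (not the planning tool actually used in the trials).
H_agl  = 60;                               % altitude above ground level (m)
hfov   = deg2rad(50);                      % across-track field of view (rad)
footprint = 2 * H_agl * tan(hfov/2);       % across-track footprint at ground level (~56 m)
overlap   = 0.5;                           % minimum required side overlap
spacing   = footprint * (1 - overlap);     % maximum transect spacing (~28 m)
```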
To create the ground truth, prior to conducting the trials, the trunk of every tree was marked with a unique identifier (ID) using chalk. Then, during the trial, eight independent expert koala spotters individually recorded the location of every koala they found by the ID of the occupied tree, the time of day, and GPS coordinates. On average, spotters took 1.5 ± 0.5 and 3.0 ± 1.13 h (mean ± standard deviation) to survey the two sites, respectively. Upon completion of all ground surveys, a list of every koala detected by every spotter was compiled, and the location of every found koala was independently verified through visual re-inspection of the sites using the tree IDs. It should be noted that, although unlikely, it is possible there were koalas detected by neither the eight ground observers nor the drone-mounted sensor. The key purpose of pre-labelling all trees in a study area was to ensure the spotters' findings remained independent, i.e., they did not share information about the koalas' locations. Whilst it would have been more time-efficient for spotters to simply mark each tree where they spotted a koala, this would have prevented spotters from conducting each survey "blindly". The second reason for labelling the trees was that finding the GPS location of trees under the canopy is a notoriously challenging problem. The location uncertainty in such a task derives from many factors, including the attenuation and scattering of the GPS signal, multi-path caused by the trees and ground, the poor geometry of the visible satellites, and the increased jitter in the tracking loops (lower-accuracy pseudo-range measurements) caused by the lower signal-to-noise ratio of the GPS signal. In addition, spotters generally report the location at which they are standing (looking up at the koala) rather than a careful (vector offset) estimate of the centre of the tree or the location beneath the koala.
The ground truth for the images from each dataset was created by first constructing an orthomosaic image from the infrared imagery for each flight using Agisoft Metashape 2.1.3 software [61]. Then, using the GPS coordinates and tree IDs provided by the spotters as a guide, the koalas' locations were manually marked on the orthomosaic. These locations were then manually adjusted to ensure all relevant aerial and ground-based observations corresponded with one another, noting that all koalas detected by ground-based spotters were clinging to a tree, i.e., no koalas were spotted on the ground. In other words, the locations of all detected koalas in the infrared imagery were extracted, and the (x, y) pixel positions within each frame containing a koala were manually confirmed.
2.1.3. Comparative Methods
A new object-detection method was introduced and compared to ten existing state-of-the-art object-detection techniques using three infrared datasets containing koalas in eucalyptus plantations.
The introduced technique is known as the Multiscale Object Bio-Inspired Vision Line Scanner (MOBIVLS) and draws inspiration from the visual pathways of flying insects. This vision system has many stages, each designed for different tasks, but in general the object-detection pipeline is composed of the photoreceptor cells (PRCs), lamina monopolar cells (LMCs), rectified transient cells (RTCs), and elementary small target motion detectors (ESTMDs) [47,49,62]. The photoreceptor cells are responsible for adapting to light changes. This adaptation helps to effectively compress high-dynamic-range images without losing important information. It also enhances the contrast between objects of interest and their surroundings using temporal processing, improving object separation by up to 70% [63]. The lamina monopolar cells (LMCs) are responsible for removing spatial and temporal redundancy in the signal passed downstream of the photoreceptor cells [49,64,65,66]. By removing the redundant information, the contrast of objects of interest in the scene can be enhanced, leading to better object discrimination [66,67]. The rectified transient cells (RTCs) are responsible for dividing the bipolar signal coming from the LMCs into ON (positive) and OFF (negative) channels [68,69,70]. Each channel adapts to the polarity change (decreasing or increasing) in the illumination [69,70,71]. The adaptation is fast when the signal is increasing (de-polarisation) and slow when the signal is decreasing (re-polarisation) [69]. This allows for the suppression of rapid signal variations (potential background clutter) and permits only large changes (potential objects of interest) in the signal [47,72]. The elementary small target motion detectors (ESTMDs) can detect and discriminate objects of interest by correlating signals from the ON and OFF channels of the RTC stage [62,70,73]. The correlation process involves delaying the ON signal with a temporal low-pass filter and multiplying it by the OFF signal to detect bright objects [62,74]. For dark-object detection, the OFF signal is delayed and multiplied by the ON signal [47,70]. The MOBIVLS has been used to detect small, dim objects, such as drones, at long ranges (for more details, see [75]). The MOBIVLS processes an input image by first applying two cross-directional line scanners. Then, the output of each line scanner is split into a positive signal (the “ON” channel) and an inverted negative signal (the “OFF” channel) using a half-wave rectifier inspired by the rectified transient cells (RTCs) of the insect brain [70]. These ON–OFF channels are then passed through a multiple shift register, multiplication processing, and accumulative addition. The final detection map is generated after applying an adaptive threshold that allows the object of interest to be detected and most of the clutter suppressed [75]. Figure 2 shows a block diagram of the detection process.
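To make the ON/OFF correlation idea more concrete, the following MATLAB fragment is a minimal, illustrative sketch of half-wave rectification into ON and OFF channels followed by a delayed-channel multiplication and an adaptive threshold. It is not the MOBIVLS implementation: the spatial high-pass and low-pass filters below merely stand in for the LMC and temporal-delay stages, the input file name is assumed, and all parameter values are arbitrary placeholders.

```matlab
% Illustrative sketch only: ON/OFF channel correlation in the spirit of an
% ESTMD-style detector. Not the MOBIVLS implementation; filter sizes and the
% threshold constant are arbitrary placeholders.
I = im2double(imread('frame.png'));        % single LWIR frame (assumed file name)

% LMC-like stage: remove local redundancy with a spatial high-pass filter.
bipolar = I - imgaussfilt(I, 4);

% RTC-like stage: half-wave rectify into ON (positive) and OFF (negative) channels.
onCh  = max(bipolar, 0);
offCh = max(-bipolar, 0);

% ESTMD-like stage: delay one channel (here a spatial low-pass filter stands in
% for the temporal delay) and multiply it by the other channel.
brightMap = imgaussfilt(onCh, 2) .* offCh;   % delayed ON x OFF -> bright objects
darkMap   = imgaussfilt(offCh, 2) .* onCh;   % delayed OFF x ON -> dark objects

% Adaptive threshold: keep responses well above the frame's background statistics.
resp   = brightMap;
detMap = resp > (mean(resp(:)) + 4 * std(resp(:)));
```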
The first six comparative techniques have been widely used to detect dim and small objects in infrared images. These techniques are as follows: Average Absolute Gray Difference (AAGD) [77], Improved Average Absolute Gray Difference (IAAGD) [78], High Boost Multiscale Local Contrast Measure (HB-MLCM) [79], Improved Local Contrast Measure (ILCM) [80], Multiscale Local Contrast Measure (MLCM) [81], and Multiscale Patch Contrast Measure (MPCM) [82]. In general, each of these methods computes the contrast between an object of interest and its surroundings by sliding a window around the input image vertically and horizontally, where the window centre is intended to be the object of interest. The MPCM method has the ability to detect bright and dark objects. The ILCM focuses on improving the processing speed but could negatively affect the detection performance. The HB-MLCM uses a high boost filter as a preprocessing step before applying the MLCM technique to enhance objects of interest and suppress noise and background clutter. The AAGD enhances the object of interest region by taking the difference between the average of a central window (potential object of interest region) and the average of surrounding pixels. The IAAGD solves some of the limitations of the AAGD method. It has the ability to differentiate between bright and dark objects of interest and reduce false alarms when there are sharp edges in the imagery.
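As a concrete illustration of the sliding-window contrast idea shared by these methods, the MATLAB sketch below computes an AAGD-style map: the difference between the mean of a central window and the mean of the surrounding ring at each pixel. The window sizes and file name are illustrative assumptions, and the published methods add multiscale processing and further refinements not shown here.

```matlab
% Illustrative AAGD-style local contrast map (not the published implementations).
I = im2double(imread('frame.png'));          % input LWIR frame (assumed file name)

inner = 5;   outer = 15;                     % illustrative window sizes (pixels)
muInner  = imfilter(I, fspecial('average', inner), 'replicate');   % central-window mean
sumOuter = imfilter(I, ones(outer), 'replicate');
sumInner = imfilter(I, ones(inner), 'replicate');
muRing   = (sumOuter - sumInner) / (outer^2 - inner^2);            % surrounding-ring mean

contrastMap = muInner - muRing;              % high values indicate bright small objects
```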
The next four comparative methods used machine-learning or neural-network techniques, and each has recently been used to detect wildlife, including koalas. The first two were the Faster Region Convolutional Neural Network (Faster R-CNN) [40] and You Only Look Once (YOLOv2) [41], which have been widely used in object detection, including wildlife detection [83,84]. Faster R-CNN is a two-stage detector: the first stage proposes regions in the imagery likely to contain objects of interest, and the second stage classifies the objects within these regions. YOLO is a one-stage detection network that skips the region-proposal stage and applies the detection stage directly to the imagery. The third technique was the Template Matching Binary Mask (TMBM), which has been used to detect koalas in small areas [55]. The fourth technique was a combination of the outputs (i.e., detection maps) of both Faster R-CNN and YOLOv2. In this approach, each detector processes the image sequence independently, and the final output for each image is the average of the detection maps of both techniques. This idea was recently used to detect koalas in native forests [39] and eucalyptus plantations [7] and to detect Rusa deer (Cervus timorensis) [22]. It is called the Combined 2DCNN method, where 2DCNN refers to two deep convolutional neural networks. Combining the detection maps of both models helps to reduce false alarms in the final output, as false alarms are less likely to coincide across the two models, whereas true detections are more likely to coincide.
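The map-combination step of the Combined 2DCNN approach amounts to a per-pixel average of the two detectors' score maps, as in the minimal MATLAB sketch below; the score maps here are random placeholders and the 0.5 threshold is an arbitrary example.

```matlab
% Illustrative map combination for the Combined 2DCNN idea. mapA and mapB stand
% in for the per-pixel detection score maps (values in [0,1]) produced
% independently by the two detectors for the same frame.
mapA = rand(512, 640);               % placeholder score map from detector 1
mapB = rand(512, 640);               % placeholder score map from detector 2
combinedMap = (mapA + mapB) / 2;     % average the two detection maps
detections  = combinedMap > 0.5;     % example global threshold on the combined map
```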
2.1.4. Experimental Settings
All detection techniques were programmed and run in MATLAB (R2020b). The values of the parameters used by each technique were empirically tuned to obtain maximal true positive detection rates for our datasets. Table 1 shows the parameter settings. Three distinct sets of data were used in this work: training, validation, and testing. The training data were used to retrain the pretrained networks, while the validation data were used to check the status of the training in real time. Both the training and validation sets were collected from a separate location and at a different time to the testing data to ensure there was no contamination with the test data. This separate dataset was split into 80% training and 20% validation. Once trained, all algorithms were benchmarked using the three testing datasets previously described: datasets A, B, and C. The neural networks (Faster R-CNN and YOLOv2) needed to be modified for koala detection. Faster R-CNN and YOLOv2 were both trained to detect koalas (one class) by fine-tuning the weights of a VGG16 feature extraction network, which was pre-trained on more than a million images from the ImageNet database [85]. The fine-tuning was conducted using 1239 instances of koalas from 770 images: 80% of the samples were used for training and 20% for validation. These images were recorded using the same drone and IR payload but from a different site that was not part of the three test datasets. The background of this dataset is quite similar to that of the three test datasets, and it will also be made publicly available. The three test datasets were not used for training because this would introduce bias: for each koala at a site, some of its instances would appear in the training, validation, and test splits, making the experiment improper.
Standard data-augmentation techniques were used to improve koala detection accuracy by randomly horizontally and vertically flipping the training data to generate four versions of each sample. The raw images were normalised and stored as PNG images (bit depth of 16) so they could be processed by the neural networks. Deep-learning techniques are usually used to detect 'large' objects (roughly 32 × 32 pixels or more) [42,86,87]. This is because the key features of an object of interest smaller than this are typically lost while passing through the pooling and convolutional layers of a deep neural network: a 32 × 32 pixel object will be represented (at most) as 1 pixel after five processing steps by a pooling layer in the VGG16 network, which means it could easily be missed if passed through more layers [42]. Therefore, the input images were divided into tiles. Each tile was scaled up and processed so that the deep-learning algorithms (i.e., Faster R-CNN and YOLOv2) could successfully detect small objects, i.e., objects smaller than 32 × 32 pixels. This was achieved as follows. Two versions of Faster R-CNN and YOLO were implemented. First, to reduce the required processing load and match the input layer of the VGG16 network, the input image was divided into nine tiles of 224 × 224 pixels. The tiles were then processed individually by the neural networks and a detection map constructed by combining the processing outputs of all tiles. Unfortunately, due to the small size of the koalas in our dataset, neither CNN could be trained to correctly detect them. The image was therefore divided into 80 tiles of 64 × 64 pixels, and each tile was scaled up to 224 × 224 pixels to match the size of the minimum input layer of the VGG16 network. These 80 tiles were then processed individually by the neural networks and a detection map constructed by combining the processing outputs of all tiles. The need to run so many tiles through the neural-network classifiers for each image greatly increased the processing time for these detectors compared to the other methods. The computer used in this processing was an Alienware M15 laptop (Dell, Miami, FL, USA) with a Core i7-9750H 2.6 GHz CPU (Intel, Santa Clara, CA, USA), 16 GB of DDR4 memory, and an NVIDIA GeForce RTX 2060 GPU (NVIDIA, Santa Clara, CA, USA).
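The 64 × 64 tiling and upscaling procedure can be sketched as follows. This is an illustrative MATLAB outline only: `detector` stands in for a pretrained Faster R-CNN or YOLOv2 detector object loaded beforehand, the file name is assumed, and the mapping of per-tile detections back to full-frame coordinates is simplified to detection centres.

```matlab
% Illustrative sketch of the 64x64 tiling / upscaling scheme described above.
% 'detector' stands in for the trained Faster R-CNN or YOLOv2 detector object.
I = imread('frame.png');                 % 640 x 512 thermal frame (assumed file name)
tileSz = 64;  netSz = 224;
[rows, cols] = size(I);
detMap = zeros(rows, cols);

for r = 1:tileSz:rows
    for c = 1:tileSz:cols
        tile   = I(r:min(r+tileSz-1, rows), c:min(c+tileSz-1, cols));
        tileUp = imresize(tile, [netSz netSz]);            % scale up to the network input size
        [bboxes, scores] = detect(detector, repmat(tileUp, 1, 1, 3));
        for k = 1:size(bboxes, 1)
            % Map the detection centre from tile (224x224) back to full-frame coordinates.
            cx = bboxes(k,1) + bboxes(k,3)/2;   cy = bboxes(k,2) + bboxes(k,4)/2;
            x  = min(max(round(c - 1 + cx * tileSz/netSz), 1), cols);
            y  = min(max(round(r - 1 + cy * tileSz/netSz), 1), rows);
            detMap(y, x) = max(detMap(y, x), scores(k));
        end
    end
end
```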
2.1.5. Performance Metrics
Equations (1) were used to evaluate the performance of all 11 comparative methods. These metrics have been widely used for the evaluation of detection techniques of objects including wildlife [1,7,23,83,88]:
$$\mathrm{Recall\ (TPR)} = \frac{TP}{TP + FN}, \quad \mathrm{FPR} = \frac{FP}{FP + TN}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (1)$$

where TP is the number of true positives (correct detections), FN is the number of false negatives (missed detections), FP is the number of false positives (incorrect detections), and TN is the number of true negatives (correct non-detections); a true negative represents a patch of pixels that is not part of a correct detection. For all algorithms, a detection was deemed correct (a TP) if the distance between the centroid of the detection and the ground truth was less than or equal to six pixels (six pixels was approximately equal to 48 cm on the ground or 24 cm at the top of the trees). Two evaluation curves were computed: the receiver operating characteristic (ROC) curve and the recall vs. (1-precision) curve. These curves were computed by changing the global detection threshold from one to zero. The equal error rate (EER) and area under the ROC curve (AUROC) were also computed. The EER represents the point on the recall vs. (1-precision) curve where the recall and precision values are equal, i.e., the number of missed detections (FN) equals the number of false detections (FP). The AUROC, as its name suggests, is the area under the ROC curve; it evaluates the detection rate of a technique over a range of FPR values rather than at a single FPR. The AUROC was computed using a linear FPR axis restricted to low false positive rates, as regions of higher false positives in the ROC curve are generally impractical in real-world applications.

In addition, the average detectability per koala was computed. This has some advantages over the more basic overall/total number of koala detections. The average detectability per koala refers to the number of images in which each unique koala is detected divided by the number of images in which that koala is present. As the same koala can be observed in multiple images, the average detectability per koala allows a more nuanced examination of how many times the same koala could be detected by each method (relative to the total number of possible detections), as opposed to whether or not the technique was able to detect the koala only a limited number of times.
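A short MATLAB sketch of how these metrics, including the average detectability per koala, can be computed from accumulated counts is given below; all counts and per-koala frame tallies are placeholder example values.

```matlab
% Illustrative computation of the Equation (1) metrics and the average
% detectability per koala. The counts below are placeholder example values.
TP = 120; FP = 30; FN = 40; TN = 5000;              % example accumulated counts
detectedPerKoala = [10  4  7];                      % frames in which each unique koala was detected
presentPerKoala  = [15 12  9];                      % frames in which each unique koala was present

recall    = TP / (TP + FN);                         % true positive rate
FPR       = FP / (FP + TN);                         % false positive rate
precision = TP / (TP + FP);
F1        = 2 * precision * recall / (precision + recall);
avgDetectability = mean(detectedPerKoala ./ presentPerKoala);
```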
2.2. Results of Part 1
Figure 3(a1–3,b1–3,c1–3) shows the ROC and recall vs. (1-precision) curves, as well as the AUROC and EER, for all 11 comparative detection methods (AAGD, IAAGD, HB-MLCM, ILCM, MLCM, MPCM, TMBM, Faster R-CNN, YOLOv2, Combined 2DCNN, and the MOBIVLS) when tested on datasets A, B, and C, respectively. Figure 3(d1–3) shows the overall results, which were computed by treating the datasets as a single entity. The MOBIVLS significantly outperformed all existing methods in terms of detection and false alarm rates, irrespective of the dataset.
Figure 3(d3) shows the overall percentage of area under the ROC curve (AUROC) and the overall equal error rate (EER) for the 11 comparative methods. For the ROC curves, the recall (TPR) axis spans 0–1 and the FPR axis is restricted to the low-false-positive region described in Section 2.1.5.
Combining the Faster R-CNN and YOLOv2 [7,39] through averaging significantly improved the results. The overall AUROC and EER of the Combined 2DCNN were 46.4% and 64.8%, respectively: an increase in AUROC of 24.8% and 18.8% over the individual Faster R-CNN and YOLOv2 models, respectively, along with an increase in EER of 18.5% and 16.5%. However, the AUROC for the MOBIVLS was 73.9%, a 27.5% improvement over the next best method, the Combined 2DCNN. The EER result for the MOBIVLS was 77.9%, a 13.1% improvement over the Combined 2DCNN.
Table 2 shows the overall results of the evaluation metrics described in Section 2.1.5 at two FPR levels, (a) and (b), for all of the techniques on datasets A, B, and C. It can be seen that the MOBIVLS outperformed all other detection techniques tested at both FPR levels. Individual results tables for each dataset can be seen in Appendix A (Table A1, Table A2 and Table A3).
At the lower FPR level, the MOBIVLS provided an F1 score of 39.8%, compared to an F1 score of 26.3% for the next best technique, the MLCM. Allowing the higher FPR level, the F1 score was 76.1% (MOBIVLS) vs. 48.6% (ILCM). The MOBIVLS also showed a higher likelihood of detecting more observations of the same koala, as its average detectability per koala was 67.5%, as opposed to 28.2% for the next-best method, the ILCM, at the same FPR level. Overall, the MOBIVLS was able to detect 51 of the 56 koalas present at this false alarm rate, while the best performing alternative method (MLCM) only detected 41 and the Combined 2DCNN just 17.
In all cases, and using all metrics, the proposed MOBIVLS algorithm outperformed every other approach tested, often by a wide margin. The AUROC and EER metrics indicated that the second-best technique was the Combined 2DCNN and the third-best was the MLCM (see Figure 3(d3)). However, the recall, F1 score, koala count, and average detectability per koala metrics indicate the second- and third-best techniques were the MLCM and the ILCM, while the performance of the deep-learning techniques (Faster R-CNN, YOLOv2, and Combined 2DCNN) was the worst (Table A1 and Table 2). This is because the evaluation criteria in Table A1 and Table 2 were computed at a single FPR level, whereas the AUROC metric reflects the performance over a range of FPR values, and the EER reflects the performance when the number of false negatives equals the number of false positives. As the performance of the Combined 2DCNN, the best performing supervised-learning technique, surpassed that of the dim-object-detection techniques at higher FPR values (see Figure 3(d1)), the AUROC and EER metrics better reflect its overall performance.
Also, although the deep-learning techniques provided better AUROC and EER results than the dim-object-detection techniques, they require large amounts of training data and high computational power and are relatively slow. This slow speed was primarily due to the small size of the koalas and hence the need to run multiple tiles through the detector for each image. Table 2 shows the average processing times for the MATLAB implementations of the AAGD, IAAGD, HB-MLCM, ILCM, MLCM, MPCM, TMBM, Faster R-CNN, YOLOv2, and Combined 2DCNN methods. The average processing time of the unoptimised MATLAB implementation of the MOBIVLS was 0.8 s per frame. Moreover, these evaluations were computed using koalas that have a detectable heat signature (as in Figure 4a,b). In other words, detections of koalas that are fully occluded (as in Figure 4c), and thus have no heat signature, were counted as false alarms. This is consistent with the literature, where true detections are only computed for koalas that display a thermal signature [39].
Figure 5 shows detection maps for the eleven detection methods compared (AAGD, IAAGD, HB-MLCM, ILCM, MLCM, MPCM, TMBM, Faster R-CNN, YOLOv2, Combined 2DCNN (Faster R-CNN and YOLOv2), and MOBIVLS), applied to the image samples shown in Figure 1B. Figures of detection maps for the eleven detection methods applied to the image samples, Figure 1A,C, can be seen in Appendix A (see Figure A1 and Figure A2). For the first ten detection methods, the koalas were either completely missed, detected with low confidence, and/or had many false alarms. The Faster R-CNN and YOLOv2 performed better than the AAGD, IAAGD, HB-MLCM, ILCM, MLCM, MPCM, and TMBM techniques, and combining the results using Combined 2DCNN provided even better performance. This is because the number of false alarms decreased during the combination process. However, the Combined 2DCNN method required the koalas be detected by both the Faster R-CNN and YOLOv2 to effect higher confidence than the individual approaches. For example, as can be seen from Figure 5, where the koala in frame two was missed by YOLOv2 but detected by Faster R-CNN, the combined technique does not improve the final output. The result is a low confidence detection with many false alarms. By contrast, the MOBIVLS detected all koalas in both frames with no or very few false alarms.
It is also interesting to examine the differences and similarities between the datasets. Figure 6 shows a histogram of observed brightness temperatures for all three datasets. Datasets A and B were recorded at the same site but one hour apart (and the ground truth for both was the same, i.e., it was known that the koalas did not move between flights). Dataset C was recorded at a different site on a different day, but at approximately the same time of day as B. The mean brightness temperatures for datasets A and B were 14.2 °C and 17.4 °C, with standard deviations of 1.35 °C and 2.64 °C, respectively. The drone, camera/payload, flight paths, forward velocity, and observation altitudes were the same for all three. However, although both the A and B datasets were recorded at the same site using the same flight paths, the images in these two datasets were not taken at exactly the same locations. Even though the A and B datasets were acquired only an hour apart, there was a slight increase in temperature (3.2 °C ± 1.3 °C) over this period. This temperature difference led to a reduction in performance, indicated by a drop in AUROC and EER of 7–20% and 5–17%, respectively, for most methods. The exceptions were the Faster R-CNN, which fell by 35% and 24%, and the AAGD, which improved slightly (by less than 1%); see Table A1 and Table A2 and Figure 3(a1–3,b1–3). This is consistent with the literature, which indicates that time of day plays an important part in the performance of automated koala detection techniques that rely on long-wave infrared (LWIR) imagery. Data that are not acquired early in the morning have proven challenging for koala detection techniques [20], as the relative temperature between the tree canopy and the animals is lower when surveys are conducted later in the day, and specular reflection from ground objects generates large numbers of false alarms [39,55,57]. However, as mentioned earlier, there is a demand for a detection method that is effective at all times of the day.
In summary, the results in this section indicate that the MOBIVLS shows promising results when compared to the existing baseline techniques. In addition, the results show that time of day plays an important role in degrading the performance of detection techniques. In the next sections, analyses of real and simulated data are presented to investigate the effects of several environmental parameters and flight settings on the detection results.
3. Part 2: Effects of Environmental and Flight Parameters on Koala Detection
Trees have an impact on the probability of koala detection for two main reasons. Firstly, the koala’s signature can be partially or fully occluded by canopy structures (tree stems, branches, foliage). This reduces the apparent size of the koala. Secondly, intervening canopy can (through attenuation) reduce the contrast of a koala’s heat signature relative to other objects. This reduces the ability to distinguish the koala from its background.
Several parameters relate to the first effect. These include tree structure, tree separation distance, vegetation density, koala altitude, drone altitude above canopy, and camera depression angle. Regarding the attenuating effect, there are several more complex and interdependent parameters. These include time of day, season, weather, temperature, humidity, moisture, atmospheric attenuation, emissivity of tree canopy (leaves, wood, etc.), koala emissivity, soil/ground emissivity, effect of wind (motion), and the sensitivity of the infrared camera. Moreover, these parameters are also dependent upon parameters related to the first effect. Simulating all these parameters with high fidelity would be a challenging task and is beyond the scope of this study. The goal here is to understand the broad impact of koala occlusion on the probability of detection, and to determine flight options that may eliminate, or at least reduce, such effects. Several experiments were conducted to examine this.
3.1. Methodology of Part 2
3.1.1. Real-Data Analysis
Figure 4 shows several examples of koalas manually extracted from datasets A, B, and C, where their thermal signatures are (a) minimally attenuated/occluded; (b) somewhat attenuated/occluded, either in apparent size, contrast, or both; and (c) fully occluded. The individual sub-images in (a), (b), and (c) are 'zoomed' crops taken from the original 640 × 512 pixel LWIR images. As can be seen, the two occlusion effects discussed in the previous section (the reduction in the apparent size of koalas and the attenuation of the contrast between koalas and their surroundings) are interdependent. Thus, a koala can appear to be full size while its contrast with its surroundings is low, and vice versa. Figure 7 shows a histogram of apparent koala size and of koala vs. background intensity contrast for the three datasets, as well as a 2D histogram of both parameters.
Figure 8 shows a sequence of images taken as the drone flew almost directly along a row of trees containing a koala. In this instance, the animal was fully occluded in frames 4 and 5; partially occluded in frames 3 and 6; and minimally (or not) occluded in frames 1, 2, 7, and 8. The main images show the koala in a set of 640 × 512 pixel images. The inset in the lower right of each picture shows a zoomed section around the koala.
Datasets A, B, and C comprise 2770, 2530, and 4025 frames, respectively, with koalas potentially detectable in 1482, 1356, and 412 frames, respectively (note: the potential detectability of the koalas was derived from the geometry of the drone and the spotter/ground-truth locations of the koalas). Based on the geometry alone (i.e., regardless of whether or not it was possible to manually confirm the koala detection due to occlusion), these frames contained 2336, 2175, and 412 koala instances. There were 25 unique koalas in datasets A and B and 6 in dataset C. Based on manual verification, in 40%, 42%, and 36% of these instances, the koala heat signatures were completely blocked and could not be visually identified in datasets A, B, and C, respectively. These ratios were computed by checking the koala locations in the frames containing koalas and declaring full occlusion when no thermal signature was visible (as in Figure 4c). The average overall full occlusion ratio for the three datasets was 40%, i.e., if a koala was present in 100 images, its heat signature could be expected to be detectable in only 60 of these images.
Figure 9 shows the attenuation (full occlusion) probability across the horizontal (across-track) field of view (HFOV) and the vertical (along-track) field of view (VFOV), respectively, for the three datasets and the average overall results. It can be seen that koalas are more likely to be fully occluded near the centre of the image, with the average probability of attenuation increasing by 27% and 13% with respect to the HFOV and VFOV, respectively. This indicates koalas are less likely to be detected when they are in trees directly beneath the drone. However, this finding was based on our datasets, which employed drone collection strategies with the camera at nadir. To generalise the statement, more data need to be collected at different sites, times of the day, seasons, and weather conditions, with the camera at different angles, and in plantations as well as in native forest.
3.1.2. Simulation Analysis
A series of simple simulation experiments were conducted to analyse the effect of different flight configurations on the probability of detecting koalas. In these experiments, different simulated tree structures, tree separation distances, and koala heights were used with a variety of camera tilt angles and flight altitudes. The tree structure, koala altitude, and drone-camera flight settings were simulated as follows:
Tree structure: Trees in mature eucalyptus plantations are mainly composed of straight stems with an average height of 30 m, a diameter of 30 cm, and uniform spacing between trees of approximately 3 m. Figure 10 shows two images taken at different eucalyptus plantations. The structure of trees in native forests is very different, being far more variable.
Figure 11 shows a graphical representation of four simulated tree structures. These images show (a) tree stems; (b) stems and branches; (c) stems, branches, and foliage of 1 m diameter; and (d) stems, branches, and heavier foliage of 2 m diameter. Each has a progressively greater occluding effect on koala detection. In the simulation, tree stems are represented as a uniform lattice of 30 m high vertical cylinders of 30 cm diameter, and branches are represented by four cylinders of 1 m length and 15 cm diameter at an elevation angle of 45°, oriented as a cross at the top of each tree. Foliage is represented as a spherical blob, with a diameter of 1 m for light foliage and 2 m for heavy foliage. Any ray travelling from the drone to the koala that penetrates or grazes a tree structure was considered to fully attenuate the koala's signature. In other words, the occlusion model was binary, with no partial attenuation (of the kind described by the Beer–Lambert law) or partial spatial diminution represented.
In reality, the structure of tree canopy and branches is highly complex and depends on many (interdependent) environmental factors. It is thus challenging to realistically simulate all such elements, and such an undertaking is beyond the scope of this paper. However, whilst somewhat unrepresentative of the physical world, the model suffices as a rough indicator in terms of the impact that trees, branches, and foliage have on the occlusion of koala signatures.
Koala altitude: Koalas were modelled as spheres, radius 25 cm, at three heights (see Figure 11). They were placed at altitudes of 25 m, 15 m, and 0 m. These altitudes represent near the top of a 30 m tree, which is where koalas are often found (in the foliage), just below the main foliage (near the middle of the tree), and on the ground next to a tree.
Drone-camera flight settings: The simulated flight altitude was varied from 40 m up to 100 m over trees of 30 m height, i.e., 10 to 50 m above the canopy. The camera depression angle was varied from 0 (horizontal) to 90° (nadir) in 10° increments. The camera had a 50 × 50° field of view with a 640 × 640 pixel image resolution.
The trigonometric relationship between the drone altitude (H), camera depression angle (θ), koala altitude (h), and the koala size in the captured images is depicted in Figure 12 and Equations (2) and (3).

$$R = \frac{H - h}{\sin(\theta + \varphi)} \qquad (2)$$

$$S \approx \frac{2r}{R} \cdot \frac{N}{\Theta_{\mathrm{FOV}}} \qquad (3)$$

where R refers to the distance between the camera and the koala (in m), H refers to the flight height (in m), h refers to the koala height (in m), θ refers to the depression angle (in degrees), φ refers to the camera incident angle within the field of view (in degrees), S is the apparent koala size in the image (in pixels), r is the koala radius (in m), N is the number of pixels across the field of view, and Θ_FOV is the field of view (in radians) (see Figure 12).
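As a numerical illustration of Equations (2) and (3), the MATLAB sketch below evaluates the apparent koala size at the centre of the field of view (φ = 0) using the camera parameters from the simulation; the variable names and tabulated ranges are illustrative.

```matlab
% Illustrative evaluation of Equations (2) and (3) at the centre of the field
% of view (phi = 0). Parameter values follow the simulation description above.
H     = 40:10:100;             % flight altitude above ground (m)
h     = 25;                    % koala altitude on the tree (m)
theta = deg2rad(10:10:90);     % camera depression angle (rad)
phi   = 0;                     % centre of the field of view
r     = 0.25;                  % koala radius (m)
N     = 640;                   % pixels across the field of view
FOV   = deg2rad(50);           % field of view (rad)

[HH, TT] = meshgrid(H, theta);
R = (HH - h) ./ sin(TT + phi);       % Equation (2): camera-to-koala slant range (m)
S = (2 * r ./ R) * (N / FOV);        % Equation (3): apparent koala size (pixels)
% S is a 9 x 7 matrix: rows index the depression angle, columns the altitude.
```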
Figure 10. Two images taken at different eucalyptus plantations where tree stems are straight and the spacing between trees is uniform. [Figure omitted.]
Figure 11. Four types of simulated tree structures that enabled the occluding effects of tree stems, branches, and foliage to be examined. The tree structure comprised (a) tree stems, (b) branches, (c) foliage of 1 m diameter (light canopy), and (d) foliage of 2 m diameter (heavy canopy). Tree stems were represented as a uniform lattice of vertical cylinders, 30 m high and with a diameter of 30 cm. Branches were represented by four cylinders of 1 m length and 15 cm diameter oriented at 45° (elevation angle) to the horizontal and distributed as a cross at the top of each tree. Foliage was represented as a spherical blob, with a diameter of 1 m for light canopy and 2 m for heavy canopy. The koala was simulated as a sphere of radius 25 cm at three different altitudes: 25 m, 15 m, and 0 m. (Note: this is only an illustrative sketch, so the dimensions are not to scale.) [Figure omitted.]
Figure 12. A sketch demonstrating the mathematical relationship between the flight height (H), camera depression angle (θ), koala altitude (h), camera incident angle within the field of view (φ), and the koala size in the captured images. [Figure omitted.]
3.2. Results of Part 2
The simulation was implemented using the Unity Engine 5 software [89]. The simulation environment was composed of a square grid of 9 × 9 trees, with a simulated koala attached to the central tree. For each experiment, the camera overflew the tree grid in a lawn-mower pattern, with a 2 m separation between adjacent transects (the approximate angular separation between transects was 2.9° and 1.9° for 40 m and 60 m flight altitudes, respectively). The transects were set close enough to ensure that the environment was captured from different perspectives across the camera's along-track and across-track fields of view. The along- and across-track extrema of the flight transects were such that the tree containing the simulated koala was visible in all of the images. The camera captured an image of the scene every 2 m in the along-track direction.
To avoid any bias, each simulation was repeated eight times, with the koala oriented on a different side of the central tree each time (north, north-east, east, south-east, south, south-west, west, and north-west). For each experiment, the probability of koala detection (true positive rate) was then computed as per Equation (1). The detector used to extract the koala from the images was a simple threshold filter. Occlusion was modelled geometrically: ray paths were considered to travel from the centre of the drone to each koala pixel, and any ray that intersected a tree structure (stem, branch, or foliage) was considered to block the path to that part of the koala entirely. In other words, the model represented binary occlusion (no intensity attenuation of the signature), with only spatial diminution represented.
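The binary occlusion test used in this kind of simulation can be sketched as a simple segment-against-sphere intersection check, as in the MATLAB function below. This is an illustrative stand-in for the Unity ray test described above (stems and branches would need an analogous segment-to-cylinder test), and all names and inputs are illustrative assumptions.

```matlab
% Illustrative binary occlusion test: does the camera-to-koala segment pass
% through (or graze) any foliage sphere? Stems/branches need an analogous
% segment-to-cylinder test. Positions are 1x3 row vectors; names are stand-ins.
function occluded = rayBlocked(camPos, koalaPos, sphereCentres, sphereRadius)
    d = koalaPos - camPos;                     % segment direction (not normalised)
    occluded = false;
    for k = 1:size(sphereCentres, 1)
        v = sphereCentres(k, :) - camPos;
        t = max(0, min(1, dot(v, d) / dot(d, d)));   % closest point on the segment
        closestPt = camPos + t * d;
        if norm(sphereCentres(k, :) - closestPt) <= sphereRadius
            occluded = true;                   % ray penetrates or grazes the sphere
            return;
        end
    end
end
```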
The first simulation investigated the effect of tree separation distance on the probability of koala detection. Figure 13 shows the probability of detection using 1 m, 3 m, and 5 m separation distances for the four different tree structures (stems only, stems with branches, and stems with 1 m and 2 m foliage, as detailed previously) as a function of camera depression angle range (0–90°), three koala altitudes (25 m, 15 m, and 0 m on a tree height of 30 m), and a drone height of 40 m, i.e., 10 m above the canopy. Despite the simplicity of the simulation, the results show several trends that should be considered when planning for aerial koala surveys.
First, in the absence of foliage, koalas are less occluded when they are close to the tree tops. This is self-evident as the occlusion is caused by the stems alone. Second, the tree separation distance had a significant impact on koala occlusion, and this effect increased when koalas were closer to the ground. In other words, koalas are more likely to be missed due to occlusion in high-density plantations or forests. Third, the camera tilt angle should be at nadir (in Figure 13, nadir is a depression angle of 90°), except in high tree density areas with heavy foliage, where the camera should be tilted in the range of 40–60° to view beneath the 'impenetrable' canopy. In Figure 13a, as the heavy foliage was simply represented as a sphere of 2 m diameter that fully occludes anything behind it (binary occlusion), and the tree separation distance was 1 m, any koala close to the top of the trees (but below the canopy) could not be detected when the camera was at nadir. However, in Figure 13b,c, where the koala was at lower altitudes (15 m and 0 m, respectively) and therefore located far below the foliage, its detectability improved relative to Figure 13a. In all cases in Figure 13a–c, for the scenario of heavy foliage and a 1 m tree separation distance, detection was optimal when tilting the camera in the range of 40–60°.
In Figure 13d,e,g,h, it can be seen that the probability of detection drops around a depression angle of 40°, especially in the scenario of heavy foliage (seen most clearly in Figure 13d). This drop in detectability is an artefact of the simple geometric model used in the simulation, arising from the interplay between the camera tilt angle and the location of the koalas relative to the canopy structure and tree separation distance. The simplicity of the model precludes more nuanced findings being reported, as these are likely artefacts of the model used. Nevertheless, key messages may be drawn: (a) for readily penetrable foliage scenarios, orienting the camera at nadir offers the highest detection probability (lowest occlusion), unless the trees are packed very tightly together; (b) tree stems alone (no foliage) can have a dramatic impact on koala occlusion, even for koalas high up in the trees; and (c) aside from when trees are very densely packed (1 m separation distance), the effect of light foliage (branches and 1 m foliage) relative to tree stems alone is relatively modest.
The second simulation experiment investigated the effect of flight height on the probability of koala detection/occlusion. Figure 14 shows the probability of detection for drone flights 10 m, 30 m, and 50 m above the canopy, for a 3 m tree stem separation distance and the four different foliage structures used previously. Once again, the detection probability is plotted as a function of camera depression angle for three koala altitudes (25 m, 15 m, and 0 m) on trees of 30 m, and the detector used to extract the koala from the captured images was (again) a simple threshold filter. Whilst the probability of detection due to occlusion did not change much as a function of drone altitude, it should be noted that the size of a koala's signature decreased as the range increased. Figure 15 provides an estimate of koala size in pixels, ignoring any occlusion effect of trees, etc. (note: 25 pixels means 25 × 25 pixels, not 5 × 5 pixels). In this figure, the koala is represented as a sphere of 25 cm radius; the drone altitude (H) varied between 40 m and 100 m; the camera depression angle (θ) from 0 to 90°; and the koala altitude (h) was 25 m, 15 m, or 0 m on a 30 m tree. The results shown are for the centre of the camera's field of view (φ = 0°). The trigonometric relationship between these factors is depicted in Figure 12 and Equations (2) and (3). The effect of signature diminution had a more significant impact on the probability of detection in the real datasets than in the simulated datasets, as not every detection technique can detect koala signatures 'cleanly', i.e., if the signatures are small (less than about 5 × 5 pixels) and detected, they are likely accompanied by a large number of false alarms (not simulated in this model). Note also that techniques such as CNN-based detectors cannot generally detect such small signatures, whilst for others, such as the MOBIVLS, small objects are not so problematic.
Figure 15a shows that, for koalas close to the tree tops, their signature is more than 5 × 5 pixels for flight altitudes up to 70 m above the canopy and camera depression angles of 80–90° and that, as the depression angle decreases, the flight height required to keep the koala at or above 5 × 5 pixels also decreases in a roughly linear fashion, i.e., for a depression angle of 20° the maximum altitude is about 20 m. For koalas on or near the ground (Figure 15c), the flight altitude required to keep the koala signature at or above 5 × 5 pixels decreases to 40 m above canopy, and once again, as the depression angle decreases, the flight altitude required to keep the koala size at or above 5 × 5 pixels decreases, albeit this time to 20 m above the canopy for a depression angle of 50°.
Figure 16 shows how koala signature size varies as a function of camera zenith/depression angle. It can be seen that the apparent koala size varies as a function of image location and is related to the camera angle, an effect that is quite pronounced for shallow (near horizontal) angles (0–40°).
To obtain an insight into how foliage might attenuate the koala signature, simplistic calculations were conducted in which the rate of attenuation, α, is assumed to have a linear relationship to the integrated path length of a ray as it passes through foliage blocking the direct line of sight between the camera and the koala (note: this relationship is often known as the Beer–Lambert law). Although this assumption is not a perfect representation of physical reality, it is a reasonable one that offers insight into the effect. Figure 17 shows the results of this simple attenuation experiment. The camera height was 40 m (10 m above the canopy); the koala altitude was 25 m; the foliage was represented by a sphere of radius 1 m; and the tree separation distances were 1 m, 3 m, and 5 m, as shown in Figure 17a, b, and c, respectively.
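A minimal MATLAB sketch of this path-length-based attenuation model is given below: the relative attenuation along one camera-to-koala ray is taken as α times the chord length of the ray inside a single foliage sphere. The function name, inputs, and the restriction to a single sphere are illustrative assumptions, not the simulation code used for Figure 17.

```matlab
% Illustrative Beer-Lambert-style attenuation through one foliage sphere:
% relative attenuation = alpha * (path length of the ray inside the sphere).
% Positions are 1x3 row vectors; names and inputs are illustrative stand-ins.
function atten = foliageAttenuation(camPos, koalaPos, centre, R, alpha)
    d = (koalaPos - camPos) / norm(koalaPos - camPos);   % unit ray direction
    v = centre - camPos;
    dPerp = norm(v - dot(v, d) * d);       % perpendicular distance from sphere centre to ray
    if dPerp >= R
        pathLen = 0;                       % ray misses the foliage sphere
    else
        pathLen = 2 * sqrt(R^2 - dPerp^2); % chord length inside the sphere
    end
    atten = alpha * pathLen;               % relative attenuation (linear in path length)
end
```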
Figure 17d, e, and f depict the relative attenuation corresponding to Figure 17a, b, and c, respectively. The x-axis represents the horizontal distance between the camera and the koala. The values on the x- and y-axes are shown as a percentage of the foliage radius (R) so that they are more physically meaningful. Attenuation values may be computed by multiplying the value on the y-axis by an appropriate value of α. As the separation distance between trees decreased, the attenuation effect of the foliage increased, and vice versa, and heavier foliage had a greater attenuating impact on the koala signature. The artefacts/spikes, especially in Figure 17d, are related to the simplicity of the geometric model used in the simulation, which precludes more nuanced findings being reported, as these are likely artefacts of the model used.
Nevertheless, key conclusions may be drawn: (a) there is an inverse relationship between the separation distance between trees and the attenuation effect of foliage on koala heat signatures, and (b) aside from when trees are very densely packed (1 m separation distance), koala signatures tend to be attenuated by canopy structure when the koala is directly beneath the drone and are less likely to be attenuated when it is off to the side of the flight path. This is consistent with the findings from the real data (see Figure 9), where it was noted that koala signatures were more likely to be occluded (less detectable) when they were close to the centre of the image than towards its sides. However, this was based on our datasets and drone collection strategies (camera at nadir), and to generalise the statement, more data need to be collected and processed from different sites (plantations and native forests), times of the day, seasons, and weather conditions.
4. Discussion
This paper examined the performance of the newly proposed Multiscale Object Bio-Inspired Vision Line Scanner (MOBIVLS) technique [75] against ten existing detection techniques on three LWIR datasets of koalas in eucalyptus plantations. The ten techniques comprise six LWIR small-object-detection techniques widely used in the literature to detect dim and small objects [77,78,79,80,81,82] and four computer-vision and CNN-based techniques that have previously been used to detect different objects (including koalas) [7,39,40,41,55]. The results show that the MOBIVLS method significantly outperforms all other methods tested in this study (including the deep-learning techniques) and that it was able to detect koalas in high-clutter environments with low false alarm rates. This was despite the observations being made during the day rather than in the early morning, when the temperature differential makes detection simpler. It is also noted that MOBIVLS does not need to be pre-trained [75]. Results could be further improved by using a tracker or data association between sequences of frames [7,23,39,55], and this will be investigated in future work.
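The comparisons above are based on ROC-style analysis of each detector's output. As an illustration only, the sketch below thresholds per-candidate detection scores to obtain TPR/FPR pairs and integrates the curve to obtain an AUROC-type figure; how candidate detections and ground-truth labels are extracted from the detection maps is not reproduced here, and the scores used are toy values rather than the paper's data.

```python
import numpy as np

def roc_points(scores, labels, thresholds):
    """TPR and FPR of per-candidate detection scores at each threshold.

    `labels` are 1 for a true koala and 0 for a clutter response; how candidates
    are extracted from the detection maps is not reproduced in this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    tpr = np.array([(pos >= t).mean() for t in thresholds])
    fpr = np.array([(neg >= t).mean() for t in thresholds])
    return tpr, fpr

def auroc(tpr, fpr, max_fpr=1.0):
    """Trapezoidal area under the ROC curve, optionally restricted to
    FPR <= max_fpr and normalised by max_fpr so the result stays in [0, 1]."""
    order = np.argsort(fpr)
    fpr, tpr = fpr[order], tpr[order]
    keep = fpr <= max_fpr
    f, t = fpr[keep], tpr[keep]
    return float(np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2.0) / max_fpr)

# Toy example with hypothetical scores (not the paper's data).
scores = [0.9, 0.8, 0.75, 0.6, 0.4, 0.35, 0.2, 0.1]
labels = [1,   1,   0,    1,   0,   0,    1,   0]
tpr, fpr = roc_points(scores, labels, thresholds=np.linspace(0.0, 1.0, 101))
print(round(auroc(tpr, fpr), 3))
```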
Time of day is known to have a significant impact on the performance of automated koala detection techniques using LWIR imagery due to the diurnal effects of solar heating, and it therefore needs to be considered when planning field trials. Preliminary results show that a one-hour time difference between two datasets recorded at the same site (using the same drone, camera payload, flight path, altitude, etc.) was associated with a 3 °C ± 1 °C increase in the mean recorded brightness temperature. This small increase in temperature appears to have degraded the performance of the automatic detection techniques, as measured by the two performance metrics reported, by 7–20% and 5–17%, respectively. High summer temperatures reduce the contrast between koalas and their surroundings and therefore make them more difficult to detect. Moreover, as the environment heats up, false detections are also likely to increase as more heat reflections are mistaken for koalas. This is shown when comparing the results of datasets A and B, which comprise images of the same site recorded one hour apart, with a corresponding temperature change between them. It is noted, however, that one sample of data is insufficient to draw conclusions; to thoroughly investigate the effect of temperature/time of day, field trials need to be carried out over several days, in different months of the year (i.e., different seasons), and at multiple sites.
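The contrast referred to here follows the description given for Figure 7: the mean intensity of koala pixels minus the mean of the surrounding background pixels within a small patch around the koala. A minimal sketch of that computation is given below; the patch half-width and the toy image are assumptions for illustration only.

```python
import numpy as np

def koala_background_contrast(image, koala_mask, patch_halfwidth=10):
    """Mean koala intensity minus mean of the surrounding background pixels.

    `image` is a 2D array of brightness-temperature values and `koala_mask`
    a boolean array marking koala pixels. The patch half-width is an assumed
    value; the paper computes contrast within a small patch around each koala.
    """
    image = np.asarray(image, dtype=float)
    ys, xs = np.nonzero(koala_mask)
    cy, cx = int(ys.mean()), int(xs.mean())          # koala centroid
    y0, y1 = max(cy - patch_halfwidth, 0), min(cy + patch_halfwidth + 1, image.shape[0])
    x0, x1 = max(cx - patch_halfwidth, 0), min(cx + patch_halfwidth + 1, image.shape[1])
    patch = image[y0:y1, x0:x1]
    patch_mask = koala_mask[y0:y1, x0:x1]
    return patch[patch_mask].mean() - patch[~patch_mask].mean()

# Toy example: a warm 3 x 3 'koala' embedded in a cooler background.
img = np.full((64, 64), 20.0)
mask = np.zeros_like(img, dtype=bool)
img[30:33, 30:33], mask[30:33, 30:33] = 23.0, True
print(koala_background_contrast(img, mask))          # -> 3.0
```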
The results also indicate that occlusion by tree stems and canopy structure has a significant impact on the potential detection of koalas, with the animals fully occluded in up to 40% of images known to contain koalas. However, this finding was based on a limited number of datasets, and to generalise the statement, more datasets need to be examined.
Simple simulation experiments were conducted to examine how different flight configurations influence occlusion and hence the probability of koala detection. Despite the simplicity of the model employed, the experiments offer some useful insights for planning aerial koala surveys. Firstly, the position of the koala on the tree strongly influences occlusion, with detection probability decreasing as koalas get closer to the ground. Secondly, the camera should be pointed at nadir (depression angle of 90°), or at least within the range of 80–90°, except in cases of high-density trees with heavy foliage; in those cases, tilting the camera to 40–60° should provide better results. Although there are studies in the literature that used nadir [39,55] and oblique imagery [7,55] to detect koalas, there is no quantitative assessment or comparison as to which mode provides better detection results. This simulation therefore provides a step towards understanding the effect of camera tilt angle on koala detection.
Our simulation experiments are consistent with the extant literature regarding the effect of flight altitude on koala detection, finding that the probability of detection increases when flying at lower altitudes, i.e., closer to the tree tops [55,90]. This is to be expected, as flying at lower altitudes ensures koalas occupy more pixels in the captured images, which reduces the likelihood of a detector missing them. However, flying at low altitudes is not always possible (or safe), as trees are usually not of uniform height and terrain is not always even, especially in natural forests. In general, flight altitudes of more than 40 m above the canopy are not recommended, as the mathematical calculations suggest that, for the camera configuration used in this study, koalas near the ground would then occupy fewer than 5 × 5 pixels in the captured images, with koalas near the tree tops also approaching this limit, and this assumes there is no occlusion. In reality, the effects of foliage will reduce the effective size of the koala, and other environmental effects (weather, temperature, humidity, etc.) may diminish the signatures even further.
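As an indication of how such occlusion and detection-probability estimates can be produced, the sketch below runs a crude line-of-sight test between randomly drawn drone positions and a koala through a regular grid of spherical canopy elements. It is a simplified stand-in for the simulation used in this work (which also modelled tree stems and branches); all dimensions, the sphere-only canopy model, and the sampling strategy are assumptions.

```python
import math
import random

def segment_blocked(p0, p1, centre, radius):
    """True if the straight segment p0 -> p1 passes through a sphere."""
    d = [q - p for p, q in zip(p0, p1)]
    f = [p - c for p, c in zip(p0, centre)]
    a = sum(x * x for x in d)
    b = 2.0 * sum(x * y for x, y in zip(f, d))
    c = sum(x * x for x in f) - radius * radius
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        return False
    sq = math.sqrt(disc)
    t0, t1 = (-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)
    return t0 < 1.0 and t1 > 0.0                      # intersection overlaps the segment

def detection_probability(koala, canopy, radius, flight_height, n_trials=2000, seed=1):
    """Fraction of random drone positions with an unobstructed view of the koala.

    A crude stand-in for the paper's simulation: the drone is drawn uniformly
    over a 60 m x 60 m area at a fixed height, and the koala counts as
    detectable only when no canopy sphere blocks the camera-koala line.
    """
    rng = random.Random(seed)
    clear = 0
    for _ in range(n_trials):
        cam = (rng.uniform(-30.0, 30.0), rng.uniform(-30.0, 30.0), flight_height)
        if not any(segment_blocked(cam, koala, c, radius) for c in canopy):
            clear += 1
    return clear / n_trials

# Illustrative plantation (all dimensions are assumptions): 30 m tall trees on a
# 3 m grid with 1 m radius foliage spheres at the crown; koala at 25 m by a stem.
canopy = [(float(x), float(y), 30.0) for x in range(-15, 16, 3) for y in range(-15, 16, 3)]
koala = (0.5, 0.0, 25.0)
print(round(detection_probability(koala, canopy, radius=1.0, flight_height=40.0), 2))
```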
5. Conclusions
This paper presents contributions in the area of object detection, with a particular emphasis on detecting wildlife in low-contrast, high-clutter environments. A biologically inspired object-detection algorithm, known as the Multiscale Object Bio-Inspired Vision Line Scanner, or MOBIVLS, is presented and tested on three datasets. MOBIVLS can process single images and does not require high-frame-rate temporal information, unlike other biologically inspired vision techniques.
The main focus of part 1 of this paper is koala detection using long-wave infrared imagery. Despite the koalas' small size and low contrast against their surroundings in high-clutter environments, MOBIVLS outperformed all of the state-of-the-art detection techniques to which it was compared, for all datasets examined. It achieved a 74% area under the receiver operating characteristic curve (AUROC), a measure of average detection probability, representing a 27% improvement over the next best method when used to detect koalas in aerial infrared images. MOBIVLS dynamically adapts to a range of environments without the need for training.
In part 2, in addition to applying MOBIVLS to infrared aerial datasets containing koalas, several other issues associated with koala detection in infrared aerial images were examined using a combination of simulation and real data. These included the nature of koala occlusion by tree structures, the effects of plantation properties on occlusion (and hence the probability of koala detection), and the effect of different flight parameters. The main findings indicate that koala occlusion by tree structure can have a significant impact on the potential detection of koalas, with koalas fully occluded in up to 40% of images in which they would otherwise be expected to be detected. However, this finding was based on a limited number of datasets, and to generalise the statement, more data need to be collected and processed for different sites, stand properties, times of day, seasons, and weather conditions. For the best results, based on the simple geometric model used to examine the matter, the drone’s camera should generally be set to nadir view (90°), or at least in the range of 80–90°. In certain circumstances, such as very high-density trees (1 m spacing) or heavy foliage, setting a camera tilt angle in the range of 40–60° and flying closer to the tree tops can achieve better results than nadir.
In future work, we plan to apply the MOBIVLS approach to the detection of a wider variety of wildlife in forestry areas and to investigate how the work could be adapted for detecting objects of interest in urban environments.
Author Contributions: Conceptualisation, L.A.H.A.-S., A.F. and R.S.A.B.; methodology, L.A.H.A.-S., R.S.A.B. and A.F.; software, L.A.H.A.-S.; formal analysis, L.A.H.A.-S.; investigation, L.A.H.A.-S.; data curation, L.A.H.A.-S.; visualisation, L.A.H.A.-S.; validation, A.F. and R.S.A.B.; resources, A.F., R.S.A.B., D.W. and M.F.S.; writing—original draft preparation, L.A.H.A.-S.; writing—review and editing, R.S.A.B., A.F., D.W. and M.F.S.; supervision, A.F., R.S.A.B. and D.W.; project administration, A.F., R.S.A.B. and D.W.; funding acquisition, A.F., R.S.A.B., D.W. and M.F.S. All authors have read and agreed to the published version of the manuscript.
Institutional Review Board Statement: The animal study protocol was approved by the Institutional Review Board (or Ethics Committee) of Deakin University (protocol code Deakin AEC B25-2019).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data are contained within the article.
Acknowledgments: The authors would like to thank everyone involved in the field trials, especially Steven Andriolo of EyeSky and the team of koala spotters.
Conflicts of Interest: The authors declare no conflicts of interest.
The following abbreviations are used in this manuscript:
UAV/UAVs | Uncrewed Aerial Vehicle/s |
Kirai | Koala InfraRed Aerial Imagery |
HDR | High Dynamic Range |
RTK | Real-Time Kinematic |
GPS | Global Positioning System |
ID/IDs | Identifier/s |
AAGD | Average Absolute Gray Difference |
IAAGD | Improved Average Absolute Gray Difference |
MLCM | Multiscale Local Contrast Measure |
HB-MLCM | High Boost Multiscale Local Contrast Measure |
ILCM | Improved Local Contrast Measure |
MPCM | Multiscale Patch Contrast Measure |
R-CNN | Region Convolutional Neural Network |
YOLO | You Only Look Once |
TMBM | Template Matching Binary Mask |
2DCNN | Two Deep Convolutional Neural Networks |
MOBIVLS | Multiscale Object Bio-Inspired Vision Line Scanner |
RTC | Rectified Transient Cells |
TPR | True Positive Rate |
FPR | False Positive Rate |
TP | True Positive |
FP | False Positive |
TN | True Negative |
FN | False Negative |
EER | Equal Error Rate |
ROC | Receiver Operating Characteristic |
AUROC | Area Under Receiver Operating Characteristic |
Avgkdet | AVeraGe DETectability per Koala |
°C | degree Celsius |
LWIR | Long-Wave InfraRed |
IR | InfraRed |
2D | Two Dimensions |
FOV | Field Of View |
HFOV | Horizontal (across-track) Field Of View |
VFOV | Vertical (along-track) Field Of View |
ka | Koala Altitude |
fh | Flight Height |
da | Depression Angle |
dca | Distance between camera and koala |
Camera incident angle of the field of view | |
R | Foliage Radius |
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 1. Two image samples from each dataset (A–C). Each image contains a koala in a eucalyptus plantation, highlighted by a red box.
Figure 2. Block diagram of the Multiscale Object Bio-Inspired Vision Line Scanner (MOBIVLS). The × symbol refers to a Hadamard product [76], while the + symbol refers to normal array addition. Images output from the LMC stages show that the object contrast has been enhanced in the direction of the scanning. PRC, LMC, and RTC stand for photoreceptor cell, lamina monopolar cell, and rectified transient cell, respectively, which are inspired by the insect vision pathway.
Figure 3. Evaluation curves for 11 comparative koala detection techniques (AAGD, IAAGD, HB-MLCM, ILCM, MLCM, MPCM, TMBM, Faster R-CNN, YOLOv2, Combined 2DCNN, and the MOBIVLS): (a1–d1) show the receiver operating characteristic (ROC) curves (TPR vs. FPR); (a2–d2) show the recall vs. (1-precision) curves; and (a3–d3) show the [formula omitted] and [formula omitted] percentages. The [formula omitted] range over which the [formula omitted] calculations were computed was (0–[formula omitted]), while the [formula omitted] range used was (0–1). The upper three rows show the results from datasets A–C, respectively, with the last row showing the overall (average) results. In all cases, the proposed MOBIVLS algorithm outperformed all of the other approaches tested.
Figure 4. Instances of different koalas from datasets A, B, and C, where (a) shows instances of minimally attenuated koala heat signatures, (b) shows instances of somewhat attenuated koala heat signatures (either in apparent size, contrast, or both), and (c) shows instances of koala heat signatures fully attenuated/occluded. The individual sub-images in (a–c) are ‘zoomed’ [formula omitted] pixel patches taken from the original 640 × 512 pixel LWIR images. Koalas are approximately central in each image.
Figure 5. Detection maps generated by 11 comparative object detection methods applied to infrared images. For each detection method, images I and II show processed samples for the raw data shown in Figure 1B. In each case, more false detections were generated, and/or the koala response was of lower contrast than for the MOBIVLS method (best viewed in digital format).
Figure 6. Temperature distributions for the three datasets. Datasets A and B were recorded at the same site one hour apart; dataset C was recorded at a similar time of day to B but at a different location. The x-axis represents the brightness temperature values captured by the IR camera. The y-axis represents the probability of occurrence (note: the summation of frequency for all values of each dataset is equal to one).
Figure 7. (a) shows a 2D histogram of koala sizes versus their average contrast with respect to their surrounding background for all instances of koala detections in datasets A, B, and C. (b,c) show histograms of koala sizes and their average contrast with their surrounding background, respectively. The data are drawn from all 3250 detections in datasets A, B, and C. The average contrast between koala and background was computed by differencing the average intensity of koala pixels and the average of their surrounding background pixels within the [formula omitted] pixel patches immediately around the koalas.
Figure 8. Sequence of images showing the same koala for different occlusion cases. The red, yellow, and green bounding boxes show cases when the koala is fully, partially, and not obscured by tree canopies or trunks, respectively. The UAV flew from left to right, so the position of the koala in the image sequence appears to move from right to left. Zoomed-in regions around the koala ([formula omitted] pixels) are shown in the lower right of each image for enhanced clarity.
Figure 9. The attenuation probability with respect to the camera horizontal (across-track) field of view (HFOV) (a) and vertical (along-track) field of view (VFOV) (b) for the three datasets and the average overall results. The y-axis represents the attenuation probability computed based on the accumulation of koala observations within images, and the x-axis represents the across-track or along-track field of view, where the HFOV was 50° (−24.5° to 24.5°) and the VFOV was 37.5° (−18.25° to 18.25°).
Figure 13. The true positive rates (probability of detection, y-axis) for a range of simulation experiments using 1 m, 3 m, and 5 m tree separation distances (upper (a–c), middle (d–f), and bottom (g–i) rows, respectively) for four types of tree structure (tree stems alone (dark blue lines), tree stems with branches (light blue lines), tree stems with foliage of 1 m diameter (green lines), and tree stems with heavier foliage of 2 m diameter (orange lines)), camera depression angle range (0–90°) (x-axis), and three koala altitudes (25 m (left column), 15 m (centre column) and 0 m (right hand column)). Tree height was 30 m and drone height was 40 m, i.e., 10 m above the canopy. Simulated koalas were always placed adjacent to a tree stem.
Figure 14. True positive rates (probability of detection) of the simulation experiments that make use of 3 m tree separation distances for four types of tree structure (tree stems only, branches, foliage of 1 m diameter, and heavier foliage of 2 m diameter), camera depression angle range (0–90°), three koala altitudes (left hand column: 25 m, centre column: 15 m, right hand column: 0 m) on a tree of 30 m height and flight heights of (upper row: 10 m (a–c), centre row: 30 m (d–f), lower row: 50 m (g–i)) above canopy (i.e., 40 m, 60 m, 80 m above ground).
Figure 15. An estimate of koala size in pixels with respect to flight height range (40–100 m), camera depression angle range (0–90°), and centre of the camera (field of view ([formula omitted]): 0°) for different koala altitudes: (a) 25 m, (b) 15 m, (c) 0 m. The equations used in these calculations (see Equations (2) and (3)) provide an estimate of koala size (in pixels) in an ideal case and ignore any occlusion effect of trees or the surrounding environment.
Figure 16. Koala size (in pixels) as a function of incident camera angles and field of view for different koala and flight altitudes. The estimated koala sizes were computed with respect to flight height above the canopy (upper row (a–c): 10 m, centre row (d–f): 30 m, lower row (g–i): 50 m), camera depression angle range (0–90°), koala altitude (left hand column: 25 m, centre column: 15 m, right hand column: 0 m), and camera field of view, [formula omitted] = −25° to 25°. The equations used in these calculations (Equations (2) and (3)) provide an estimate of koala size for ideal cases only and ignore the occluding effects of trees and their environment.
Figure 17. Simulation-based estimate of the amount of foliage obstructing the direct line between a camera and a koala. The heavier the foliage, the more a koala signature was attenuated, and vice versa. Therefore, foliage attenuation (y-axis) may be used as a surrogate for foliage density (note: although this is not an accurate representation of real-world data, it does give a rough indication of the attenuating effect of foliage on koala signatures). The camera height was fixed at 40 m (10 m above the canopy); the koala was at a height of 25 m; the foliage was represented as a sphere of radius (R) 1 m; and the tree separation distance was 1 m, 3 m, and 5 m, as shown in (a–c), respectively (note: these are only illustrative sketches, so the dimensions are not to scale). (d–f) show the amount of foliage attenuation obstructing the direct line between the camera and the koala for the figure directly above each of them. The x-axis represents the horizontal distance between the camera and the koala host tree.
Parameter settings of different methods.
No. | Methods | Parameter Settings |
---|---|---|
1 | AAGD [ | Inner window scales: 3, 5, 7, 9, 11 pixels |
2 | IAAGD [ | Outer window scales: 21, 21, 21 |
3 | HB-MLCM [ | 21 pixels |
4 | ILCM [ | Window scale: 10 pixels, step size: 1 pixel |
5 | MLCM [ | Object window scales: 3, 5, 7, 9, 11 |
6 | MPCM [ | |
7 | TMBM [ | Template Processing: subtracting |
constant (C): 0.2, threshold (T): 0.6 | ||
Scoring threshold: 0.8, pixel intensity | ||
threshold: 0.1 | ||
8 | Faster R-CNN | Backbone: VGG16, batch size: 8, |
[ | epoch: 100, initial learning rate: | |
9 | YOLOv2 | Backbone: VGG16, batch size: 32, |
[ | epoch: 100, initial learning rate: | |
10 | Combined 2DCNN | 2DCNN are Faster R-CNN and YOLOv2, |
[ | parameter settings: same as No. 8–9 | |
11 | MOBIVLS [ | Object scales 3–11 pixels |
The overall results of the three datasets, which were computed by treating the datasets as a single entity. Several performance metrics were computed for different object-detection techniques at (a)
No. | Methods | Recall (%) | F1 Score (%) | Koala Count | Avgkdet (%) | Time | ||||
---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | |||
1 | AAGD [ | 3.6 | 11 | 6.6 | 19.1 | 9 | 24 | 3.6 | 11.1 | 0.06 |
2 | IAAGD [ | 9.4 | 25.9 | 15.6 | 40 | 17 | 35 | 8.7 | 23.7 | 0.18 |
3 | HB-MLCM [ | 6.7 | 20.9 | 11.8 | 33.5 | 16 | 28 | 6.5 | 19.7 | 0.06 |
4 | ILCM [ | 14.3 | | 23.1 | | 24 | 36 | 11.9 | | 0.11 |
5 | MLCM [ | | 29.6 | | 44.9 | 26 | 41 | 16 | 28.1 | 0.37 |
6 | MPCM [ | 8 | 20.4 | 13.6 | 33 | 16 | 32 | 7.5 | 18.9 | 0.25 |
7 | TMBM [ | 8.2 | 22.3 | 13.4 | 35.5 | 14 | 35 | 7.6 | 20.6 | 0.053 |
8 | Faster R-CNN | 0.4 | 4.3 | 0.5 | 5 | 3 | 13 | 0.5 | 4.6 | 14 |
[ | ||||||||||
9 | YOLOv2 | 0.6 | 6 | 0.8 | 7.6 | 3 | 17 | 0.7 | 7.2 | 3.5 |
[ | ||||||||||
10 | Combined 2DCNN | 1.4 | 13.7 | 1.8 | 17.7 | 3 | 17 | 1.6 | 15.7 | 17.5 |
[ | ||||||||||
11 | MOBIVLS [ | | | | | 32 | 51 | | | 0.8 |
Appendix A
Results from Dataset A of several performance metrics for different object-detection techniques at (a)
No. | Methods | Recall (%) | F1 Score (%) | Koala Count | Avgkdet (%) | ||||
---|---|---|---|---|---|---|---|---|---|
| | | | | | | | ||
1 | AAGD [ | 3.1 | 10.5 | 5.8 | 18.5 | 4 | 12 | 4.7 | 15 |
2 | IAAGD [ | 10.2 | 29 | 16.7 | 43.7 | 9 | 19 | 12.2 | 33.6 |
3 | HB-MLCM [ | 7.5 | 24.2 | 13 | 37.9 | 9 | 16 | 8.8 | 28 |
4 | ILCM [ | | 44 | | | 15 | 20 | 23 | |
5 | MLCM [ | 21.2 | 33.6 | 32.3 | 49.5 | 15 | 22 | 23 | 36.4 |
6 | MPCM [ | 9.7 | 25.6 | 16 | 39.8 | 8 | 17 | 11.5 | 29.8 |
7 | TMBM [ | 9.9 | 26.5 | 15.8 | 40.9 | 8 | 21 | 11.2 | 29.8 |
8 | Faster R-CNN | 1.7 | 16.6 | 2.3 | 22.6 | 1 | 11 | 1.8 | 18.2 |
[ | |||||||||
9 | YOLOv2 | 1.5 | 15.4 | 2.3 | 22.8 | 1 | 14 | 1.9 | 18.6 |
[ | |||||||||
10 | Combined 2DCNN | 2.1 | 20.7 | 2.6 | 26.2 | 1 | 10 | 2.3 | 23.4 |
[ | |||||||||
11 | MOBIVLS [ | | | | 83 | 17 | 25 | | 82 |
Figure A1. Detection maps generated by 11 comparative object-detection methods applied to infrared images. For each detection method, images I and II show processed samples for the raw data shown in Figure 1A. In each case, more false detections were generated, and/or the koala response was of lower contrast than for the MOBIVLS method (best viewed in digital format).
Figure A2. Detection maps generated by 11 comparative object-detection methods applied to infrared images. For each detection method, images I and II show processed samples for the raw data shown in Figure 1C. In each case, more false detections were generated, and/or the koala response was of lower contrast than for the MOBIVLS method (best viewed in digital format).
Results from Dataset B of several performance metrics for different object-detection techniques at (a)
No. | Methods | Recall (%) | F1 Score (%) | Koala Count | Avgkdet (%) | ||||
---|---|---|---|---|---|---|---|---|---|
| | | | | | | | ||
1 | AAGD [ | 5.1 | 13.8 | 9.3 | 23.6 | 5 | 12 | 6.6 | 18.4 |
2 | IAAGD [ | 10.2 | 27.3 | 16.8 | 41.73 | 7 | 15 | 12.3 | 32.8 |
3 | HB-MLCM [ | 7.2 | 20.5 | 12.6 | 33.1 | 5 | 11 | 9.2 | 25.9 |
4 | ILCM [ | 12.6 | | 20.9 | | 8 | 14 | 15.1 | |
5 | MLCM [ | | 28 | | 43 | 9 | 15 | | 33.9 |
6 | MPCM [ | 7.8 | 19.5 | 13.3 | 31.8 | 7 | 14 | 10.2 | 25.1 |
7 | TMBM [ | 7.2 | 20.8 | 11.9 | 33.6 | 5 | 13 | 8.8 | 25.1 |
8 | Faster R-CNN | 0.3 | 2.5 | 0.3 | 2.6 | 1 | 1 | 0.3 | 2.9 |
[ | |||||||||
9 | YOLOv2 | 0.4 | 4.1 | 0.5 | 4.6 | 1 | 2 | 0.4 | 4.4 |
[ | |||||||||
10 | Combined 2DCNN | 1.1 | 10.6 | 1.4 | 14.2 | 1 | 6 | 1.1 | 11.4 |
[ | |||||||||
11 | MOBIVLS [ | | | | | 12 | 21 | | |
Results from Dataset C of several performance metrics for different object-detection techniques at (a)
No. | Methods | Recall (%) | F1 Score (%) | Koala Count | Avgkdet (%) | ||||
---|---|---|---|---|---|---|---|---|---|
| | | | | | | | ||
1 | AAGD [ | 0 | 0 | – | – | 0 | 0 | 0 | 0 |
2 | IAAGD [ | 2.2 | 5.8 | 4.2 | 10.5 | 1 | 1 | 1.9 | 4.9 |
3 | HB-MLCM [ | 2.1 | 6.3 | 3.9 | 11.3 | 1 | 1 | 1.7 | 5.3 |
4 | ILCM [ | 0.5 | 1.9 | 1 | 3.6 | 1 | 2 | 0.4 | 1.8 |
5 | MLCM [ | | | | | 2 | 3 | | |
6 | MPCM [ | 1 | 2.9 | 2 | 5.5 | 1 | 1 | 0.9 | 2.4 |
7 | TMBM [ | 3.2 | 8 | 6 | 14.4 | 1 | 1 | 3 | 7.1 |
8 | Faster R-CNN | 0.3 | 3.1 | 0.3 | 3.4 | 1 | 1 | 0.4 | 3.6 |
[ | |||||||||
9 | YOLOv2 | 0.7 | 6.8 | 0.8 | 7.8 | 1 | 1 | 0.8 | 8.3 |
[ | |||||||||
10 | Combined 2DCNN | 0.9 | 8.9 | 1.1 | 10.6 | 1 | 1 | 1.1 | 11 |
[ | |||||||||
11 | MOBIVLS [ | 21 | | | | 3 | 5 | | |
References
1. Kellenberger, B.; Marcos, D.; Tuia, D. Detecting mammals in UAV images: Best practices to address a substantially imbalanced dataset with deep learning. Remote Sens. Environ.; 2018; 216, pp. 139-153. [DOI: https://dx.doi.org/10.1016/j.rse.2018.06.028]
2. Linchant, J.; Lisein, J.; Semeki, J.; Lejeune, P.; Vermeulen, C. Are unmanned aircraft systems (UASs) the future of wildlife monitoring? A review of accomplishments and challenges. Mammal Rev.; 2015; 45, pp. 239-252. [DOI: https://dx.doi.org/10.1111/mam.12046]
3. Kamminga, J.; Ayele, E.; Meratnia, N.; Havinga, P. Poaching detection technologies—A survey. Sensors; 2018; 18, 1474. [DOI: https://dx.doi.org/10.3390/s18051474] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29738501]
4. Sanders, C.E.; Mennill, D.J. Acoustic monitoring of nocturnally migrating birds accurately assesses the timing and magnitude of migration through the Great Lakes. Condor Ornithol. Appl.; 2014; 116, pp. 371-383. [DOI: https://dx.doi.org/10.1650/CONDOR-13-098.1]
5. Ulhaq, A.; Khan, A. Pest Animal’s Detection and Habitat Identification in Low-resolution Airborne Thermal Imagery. Preprints; 2020; 2020090480. [DOI: https://dx.doi.org/10.20944/preprints202009.0480.v2]
6. Hanson, C.C.; Jolley, W.J.; Smith, G.; Garcelon, D.K.; Keitt, B.S.; Little, A.E.; Campbell, K.J. Feral cat eradication in the presence of endemic San Nicolas Island foxes. Biol. Invasions; 2015; 17, pp. 977-986. [DOI: https://dx.doi.org/10.1007/s10530-014-0784-0]
7. Hamilton, G.; Corcoran, E.; Denman, S.; Hennekam, M.E.; Koh, L.P. When you can’t see the koalas for the trees: Using drones and machine learning in complex environments. Biol. Conserv.; 2020; 247, 108598. [DOI: https://dx.doi.org/10.1016/j.biocon.2020.108598]
8. Wilmott, L.; Cullen, D.; Madani, G.; Krogh, M.; Madden, K. Are koalas detected more effectively by systematic spotlighting or diurnal searches?. Aust. Mammal.; 2019; 41, pp. 157-160. [DOI: https://dx.doi.org/10.1071/AM18006]
9. EPBC Act Referral Guidelines for the Vulnerable Koala. 2014; Available online: http://environment.gov.au/system/files/resources/dc2ae592-ff25-4e2c-ada3-843e4dea1dae/files/koala-referral-guidelines.pdf (accessed on 10 October 2024).
10. Focardi, S.; De Marinis, A.M.; Rizzotto, M.; Pucci, A. Comparative evaluation of thermal infrared imaging and spotlighting to survey wildlife. Wildl. Soc. Bull.; 2001; 29, pp. 133-139.
11. Lethbridge, M.; Stead, M.; Wells, C. Estimating kangaroo density by aerial survey: A comparison of thermal cameras with human observers. Wildl. Res.; 2019; 46, pp. 639-648. [DOI: https://dx.doi.org/10.1071/WR18122]
12. Ocholla, I.A.; Pellikka, P.; Karanja, F.N.; Vuorinne, I.; Odipo, V.; Heiskanen, J. Livestock detection in African rangelands: Potential of high-resolution remote sensing data. Remote Sens. Appl. Soc. Environ.; 2024; 33, 101139. [DOI: https://dx.doi.org/10.1016/j.rsase.2024.101139]
13. Barrios, D.B.; Valente, J.; van Langevelde, F. Monitoring mammalian herbivores via convolutional neural networks implemented on thermal UAV imagery. Comput. Electron. Agric.; 2024; 218, 108713. [DOI: https://dx.doi.org/10.1016/j.compag.2024.108713]
14. Brunton, E.A.; Leon, J.X.; Burnett, S.E. Evaluating the Efficacy and Optimal Deployment of Thermal Infrared and True-Colour Imaging When Using Drones for Monitoring Kangaroos. Drones; 2020; 4, 20. [DOI: https://dx.doi.org/10.3390/drones4020020]
15. Lyu, H.; Qiu, F.; An, L.; Stow, D.; Lewison, R.; Bohnett, E. Deer survey from drone thermal imagery using enhanced faster R-CNN based on ResNets and FPN. Ecol. Inform.; 2024; 79, 102383. [DOI: https://dx.doi.org/10.1016/j.ecoinf.2023.102383]
16. Witczuk, J.; Pagacz, S.; Zmarz, A.; Cypel, M. Exploring the feasibility of unmanned aerial vehicles and thermal imaging for ungulate surveys in forests-preliminary results. Int. J. Remote Sens.; 2018; 39, pp. 5504-5521. [DOI: https://dx.doi.org/10.1080/01431161.2017.1390621]
17. Gooday, O.J.; Key, N.; Goldstien, S.; Zawar-Reza, P. An assessment of thermal-image acquisition with an unmanned aerial vehicle (UAV) for direct counts of coastal marine mammals ashore. J. Unmanned Veh. Syst.; 2018; 6, pp. 100-108. [DOI: https://dx.doi.org/10.1139/juvs-2016-0029]
18. Rees, A.F.; Avens, L.; Ballorain, K.; Bevan, E.; Broderick, A.C.; Carthy, R.R.; Christianen, M.J.; Duclos, G.; Heithaus, M.R.; Johnston, D.W. et al. The potential of unmanned aerial systems for sea turtle research and conservation: A review and future directions. Endanger. Species Res.; 2018; 35, pp. 81-100. [DOI: https://dx.doi.org/10.3354/esr00877]
19. Kays, R.; Sheppard, J.; Mclean, K.; Welch, C.; Paunescu, C.; Wang, V.; Kravit, G.; Crofoot, M. Hot monkey, cold reality: Surveying rainforest canopy mammals using drone-mounted thermal infrared sensors. Int. J. Remote Sens.; 2019; 40, pp. 407-419. [DOI: https://dx.doi.org/10.1080/01431161.2018.1523580]
20. Beranek, C.T.; Roff, A.; Denholm, B.; Howell, L.G.; Witt, R.R. Trialling a real-time drone detection and validation protocol for the koala (Phascolarctos cinereus). Aust. Mammal.; 2020; 43, pp. 260-264. [DOI: https://dx.doi.org/10.1071/AM20043]
21. Witt, R.R.; Beranek, C.T.; Howell, L.G.; Ryan, S.A.; Clulow, J.; Jordan, N.R.; Denholm, B.; Roff, A. Real-time drone derived thermal imagery outperforms traditional survey methods for an arboreal forest mammal. PLoS ONE; 2020; 15, e0242204. [DOI: https://dx.doi.org/10.1371/journal.pone.0242204]
22. Sudholz, A.; Denman, S.; Pople, T.; Brennan, M.; Amos, M.; Hamilton, G. A comparison of manual and automated detection of Rusa deer (Rusa timorensis) from RPAS-derived thermal imagery. Wildl. Res.; 2021; 49, pp. 46-53. [DOI: https://dx.doi.org/10.1071/WR20169]
23. Kassim, Y.M.; Byrne, M.E.; Burch, C.; Mote, K.; Hardin, J.; Larsen, D.R.; Palaniappan, K. Small object bird detection in infrared drone videos using mask R-CNN deep learning. Electron. Imaging; 2020; 2020, pp. 85-1-85-8. [DOI: https://dx.doi.org/10.2352/ISSN.2470-1173.2020.8.IMAWM-085]
24. Christiansen, P.; Steen, K.A.; Jørgensen, R.N.; Karstoft, H. Automated detection and recognition of wildlife using thermal cameras. Sensors; 2014; 14, pp. 13778-13793. [DOI: https://dx.doi.org/10.3390/s140813778]
25. Steen, K.A.; Villa-Henriksen, A.; Therkildsen, O.R.; Green, O. Automatic detection of animals in mowing operations using thermal cameras. Sensors; 2012; 12, pp. 7587-7597. [DOI: https://dx.doi.org/10.3390/s120607587] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/22969362]
26. Longmore, S.; Collins, R.; Pfeifer, S.; Fox, S.; Mulero-Pázmány, M.; Bezombes, F.; Goodwin, A.; De Juan Ovelar, M.; Knapen, J.; Wich, S. Adapting astronomical source detection software to help detect animals in thermal images obtained by unmanned aerial systems. Int. J. Remote Sens.; 2017; 38, pp. 2623-2638. [DOI: https://dx.doi.org/10.1080/01431161.2017.1280639]
27. Chrétien, L.P.; Théau, J.; Ménard, P. Visible and thermal infrared remote sensing for the detection of white-tailed deer using an unmanned aerial system. Wildl. Soc. Bull.; 2016; 40, pp. 181-191. [DOI: https://dx.doi.org/10.1002/wsb.629]
28. Lhoest, S.; Linchant, J.; Quevauvillers, S.; Vermeulen, C.; Lejeune, P. How many hippos (HOMHIP): Algorithm for automatic counts of animals with infra-red thermal imagery from UAV. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci.; 2015; 40, [DOI: https://dx.doi.org/10.5194/isprsarchives-XL-3-W3-355-2015]
29. Seymour, A.; Dale, J.; Hammill, M.; Halpin, P.; Johnston, D. Automated detection and enumeration of marine wildlife using unmanned aircraft systems (UAS) and thermal imagery. Sci. Rep.; 2017; 7, 45127. [DOI: https://dx.doi.org/10.1038/srep45127] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28338047]
30. Rey, N.; Volpi, M.; Joost, S.; Tuia, D. Detecting animals in African Savanna with UAVs and the crowds. Remote Sens. Environ.; 2017; 200, pp. 341-351. [DOI: https://dx.doi.org/10.1016/j.rse.2017.08.026]
31. Corcoran, E.; Winsen, M.; Sudholz, A.; Hamilton, G. Automated detection of wildlife using drones: Synthesis, opportunities and constraints. Methods Ecol. Evol.; 2021; 12, pp. 1103-1114. [DOI: https://dx.doi.org/10.1111/2041-210X.13581]
32. Lahoz-Monfort, J.J.; Magrath, M.J. A comprehensive overview of technologies for species and habitat monitoring and conservation. BioScience; 2021; 71, pp. 1038-1062. [DOI: https://dx.doi.org/10.1093/biosci/biab073] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34616236]
33. Ma, Z.; Dong, Y.; Xia, Y.; Xu, D.; Xu, F.; Chen, F. Wildlife Real-Time Detection in Complex Forest Scenes Based on YOLOv5s Deep Learning Network. Remote Sens.; 2024; 16, 1350. [DOI: https://dx.doi.org/10.3390/rs16081350]
34. Zhang, R.; Cao, Z.; Yang, S.; Si, L.; Sun, H.; Xu, L.; Sun, F. Cognition-Driven Structural Prior for Instance-Dependent Label Transition Matrix Estimation. IEEE Trans. Neural Netw. Learn. Syst.; 2024; [DOI: https://dx.doi.org/10.1109/TNNLS.2023.3347633] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38190682]
35. Petso, T.; Jamisola, R.S., Jr.; Mpoeleng, D. Review on methods used for wildlife species and individual identification. Eur. J. Wildl. Res.; 2022; 68, 3. [DOI: https://dx.doi.org/10.1007/s10344-021-01549-4]
36. Portmann, J.; Lynen, S.; Chli, M.; Siegwart, R. People detection and tracking from aerial thermal views. Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA); Hong Kong, China, 31 May–7 June 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1794-1800.
37. Teutsch, M.; Muller, T.; Huber, M.; Beyerer, J. Low resolution person detection with a moving thermal infrared camera by hot spot classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops; Columbus, OH, USA, 23–28 June 2014; pp. 209-216.
38. Bondi, E.; Fang, F.; Hamilton, M.; Kar, D.; Dmello, D.; Choi, J.; Hannaford, R.; Iyer, A.; Joppa, L.; Tambe, M. et al. Spot poachers in action: Augmenting conservation drones with automatic detection in near real time. Proceedings of the AAAI Conference on Artificial Intelligence; New Orleans, LA, USA, 2–7 February 2018; Volume 32.
39. Corcoran, E.; Denman, S.; Hanger, J.; Wilson, B.; Hamilton, G. Automated detection of koalas using low-level aerial surveillance and machine learning. Sci. Rep.; 2019; 9, 3208. [DOI: https://dx.doi.org/10.1038/s41598-019-39917-5] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30824795]
40. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv; 2015; arXiv: 1506.01497[DOI: https://dx.doi.org/10.1109/TPAMI.2016.2577031]
41. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Honolulu, HI, USA, 21–26 July 2017; pp. 7263-7271.
42. Nguyen, N.D.; Do, T.; Ngo, T.D.; Le, D.D. An Evaluation of Deep Learning Methods for Small Object Detection. J. Electr. Comput. Eng.; 2020; 2020, 3189691. [DOI: https://dx.doi.org/10.1155/2020/3189691]
43. Egelhaaf, M.; Kern, R. Vision in flying insects. Curr. Opin. Neurobiol.; 2002; 12, pp. 699-706. [DOI: https://dx.doi.org/10.1016/S0959-4388(02)00390-2]
44. Srinivasan, M.V.; Zhang, S.; Reinhard, J.; Chahl, J. Vision, perception, navigation and ‘cognition’ in honeybees and applications to aerial robotics. Biochem. Biophys. Res. Commun.; 2021; 564, pp. 4-17. [DOI: https://dx.doi.org/10.1016/j.bbrc.2020.09.052]
45. Egelhaaf, M.; Boeddeker, N.; Kern, R.; Kurtz, R.; Lindemann, J.P. Spatial vision in insects is facilitated by shaping the dynamics of visual input through behavioral action. Front. Neural Circuits; 2012; 6, 108. [DOI: https://dx.doi.org/10.3389/fncir.2012.00108]
46. Halupka, K.J.; Wiederman, S.D.; Cazzolato, B.S.; O’Carroll, D.C. Discrete implementation of biologically inspired image processing for target detection. Proceedings of the 2011 Seventh International Conference on Intelligent Sensors, Sensor Networks and Information Processing; Adelaide, Australia, 6–9 December 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 143-148.
47. Wiederman, S.D.; Brinkworth, R.S.; O’Carroll, D.C. Bio-inspired target detection in natural scenes: Optimal thresholds and ego-motion. Biosensing; 2008; 7035, 70350Z.
48. Serres, J.R.; Viollet, S. Insect-inspired vision for autonomous vehicles. Curr. Opin. Insect Sci.; 2018; 30, pp. 46-51. [DOI: https://dx.doi.org/10.1016/j.cois.2018.09.005] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30553484]
49. Brinkworth, R.S.; O’Carroll, D.C. Robust models for optic flow coding in natural scenes inspired by insect biology. PLoS Comput. Biol.; 2009; 5, e1000555. [DOI: https://dx.doi.org/10.1371/journal.pcbi.1000555] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19893631]
50. Griffiths, D.; Scoleri, T.; Brinkworth, R.S.; Finn, A. Pixel-wise infrared tone remapping for rapid adaptation to high dynamic range variations. Electro-Opt. Infrared Syst. Technol. Appl. XVI; 2019; 11159, 111590V.
51. Uzair, M.; Brinkworth, R.S.; Finn, A. Insect-inspired small moving target enhancement in infrared videos. Proceedings of the 2019 Digital Image Computing: Techniques and Applications (DICTA); Perth, Australia, 2–4 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1-8.
52. Poursoltan, S.; Brinkworth, R.; Sorell, M. Biologically-inspired pre-compression enhancement of video for forensic applications. Proceedings of the 2013 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA); Sharjah, United Arab Emirates, 12–14 February 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 1-6.
53. Koalas and Community: How Koalas Can Help us to Reduce Our Carbon Emissions, University of South Australia. 2022; Available online: https://www.unisa.edu.au/research/sustainable-infrastructure-and-resource-management/our-research/smart-communities/koalas-and-community--how-koalas-can-help-us-to-reduce-our-carbon-emissions/ (accessed on 10 October 2024).
54. Su, X.; Zhang, J.; Ma, Z.; Dong, Y.; Zi, J.; Xu, N.; Zhang, H.; Xu, F.; Chen, F. Identification of Rare Wildlife in the Field Environment Based on the Improved YOLOv5 Model. Remote Sens.; 2024; 16, 1535. [DOI: https://dx.doi.org/10.3390/rs16091535]
55. Gonzalez, L.F.; Montes, G.A.; Puig, E.; Johnson, S.; Mengersen, K.; Gaston, K.J. Unmanned aerial vehicles (UAVs) and artificial intelligence revolutionizing wildlife monitoring and conservation. Sensors; 2016; 16, 97. [DOI: https://dx.doi.org/10.3390/s16010097]
56. Hu, Q.; Zhang, L.; Drahota, J.; Woldt, W.; Varner, D.; Bishop, A.; LaGrange, T.; Neale, C.M.; Tang, Z. Combining Multi-View UAV Photogrammetry, Thermal Imaging, and Computer Vision Can Derive Cost-Effective Ecological Indicators for Habitat Assessment. Remote Sens.; 2024; 16, 1081. [DOI: https://dx.doi.org/10.3390/rs16061081]
57. Meynink, R.; Bourne, T. Koala Survey-UAV, Forest and Wood Products Australia. 2017; Available online: https://fwpa.com.au/wp-content/uploads/2018/09/Koala_UAV_Final_Report_VNC389-1516.pdf (accessed on 10 October 2021).
58. Winsen, M.; Denman, S.; Corcoran, E.; Hamilton, G. Automated Detection of Koalas with Deep Learning Ensembles. Remote Sens.; 2022; 14, 2432. [DOI: https://dx.doi.org/10.3390/rs14102432]
59. Infrared Camera ICI 8640-p-Series. Available online: https://infraredcameras.com/product/8640-p-series/ (accessed on 16 December 2023).
60. DJI Matrice 600. Available online: https://www.dji.com/au/matrice600/ (accessed on 29 November 2022).
61. Agisoft Metashape Software Professional Edition. Available online: https://www.agisoft.com/ (accessed on 8 October 2021).
62. Melville-Smith, A.; Finn, A.; Brinkworth, R.S. Enhanced micro target detection through local motion feedback in biologically inspired algorithms. Proceedings of the 2019 Digital Image Computing: Techniques and Applications (DICTA); Perth, Australia, 2–4 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1-8.
63. Brinkworth, R.S.; Mah, E.L.; Gray, J.P.; O’Carroll, D.C. Photoreceptor processing improves salience facilitating small target detection in cluttered scenes. J. Vis.; 2008; 8, 8. [DOI: https://dx.doi.org/10.1167/8.11.8]
64. van Hateren, J.H. A theory of maximizing sensory information. Biol. Cybern.; 1992; 68, pp. 23-29. [DOI: https://dx.doi.org/10.1007/BF00203134]
65. Van Hateren, J. Processing of natural time series of intensities by the visual system of the blowfly. Vis. Res.; 1997; 37, pp. 3407-3416. [DOI: https://dx.doi.org/10.1016/S0042-6989(97)00105-3]
66. Drews, M.S.; Leonhardt, A.; Pirogova, N.; Richter, F.G.; Schuetzenberger, A.; Braun, L.; Serbe, E.; Borst, A. Dynamic signal compression for robust motion vision in flies. Curr. Biol.; 2020; 30, pp. 209-221. [DOI: https://dx.doi.org/10.1016/j.cub.2019.10.035]
67. Srinivasan, M.V.; Pinter, R.; Osorio, D. Matched filtering in the visual system of the fly: Large monopolar cells of the lamina are optimized to detect moving edges and blobs. Proc. R. Soc. B Biol. Sci.; 1990; 240, pp. 279-293.
68. James, J.V.; Cazzolato, B.S.; Grainger, S.; Wiederman, S.D. Nonlinear, neuronal adaptation in insect vision models improves target discrimination within repetitively moving backgrounds. Bioinspir. Biomim.; 2021; 16, 066015. [DOI: https://dx.doi.org/10.1088/1748-3190/ac2988]
69. Wiederman, S.; Shoemaker, P.; O’carroll, D. Biologically inspired small target detection mechanisms. Proceedings of the 2007 3rd International Conference on Intelligent Sensors, Sensor Networks and Information; Melbourne, VIC, Australia, 3–6 December 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 269-273.
70. Wiederman, S.D.; Shoemaker, P.A.; O’Carroll, D.C. A model for the detection of moving targets in visual clutter inspired by insect physiology. PLoS ONE; 2008; 3, e2784. [DOI: https://dx.doi.org/10.1371/journal.pone.0002784]
71. Jansonius, N.; Van Hateren, J. Fast temporal adaptation of on-off units in the first optic chiasm of the blowfly. J. Comp. Physiol. A; 1991; 168, pp. 631-637. [DOI: https://dx.doi.org/10.1007/BF00224353]
72. Wiederman, S.D.; Shoemaker, P.A.; O’Carroll, D.C. Correlation between OFF and ON channels underlies dark target selectivity in an insect visual system. J. Neurosci.; 2013; 33, pp. 13225-13232. [DOI: https://dx.doi.org/10.1523/JNEUROSCI.1277-13.2013]
73. Nordström, K.; O’Carroll, D.C. Small object detection neurons in female hoverflies. Proc. R. Soc. B Biol. Sci.; 2006; 273, pp. 1211-1216. [DOI: https://dx.doi.org/10.1098/rspb.2005.3424]
74. Melville-Smith, A.; Finn, A.; Uzair, M.; Brinkworth, R.S. Exploration of motion inhibition for the suppression of false positives in biologically inspired small target detection algorithms from a moving platform. Biol. Cybern.; 2022; 116, pp. 661-685. [DOI: https://dx.doi.org/10.1007/s00422-022-00950-9]
75. Al-Shimaysawee, L.A.H. Automated Detection and Geolocation of Objects in Low Contrast and High Clutter Environments Using Biologically Inspired Vision Techniques. Ph.D. Thesis; University of South Australia: Adelaide, Australia, 2023.
76. Horn, R.A. The Hadamard product. Proc. Symp. Appl. Math.; 1990; 40, pp. 87-169.
77. Deng, H.; Sun, X.; Liu, M.; Ye, C.; Zhou, X. Infrared small-target detection using multiscale gray difference weighted image entropy. IEEE Trans. Aerosp. Electron. Syst.; 2016; 52, pp. 60-72. [DOI: https://dx.doi.org/10.1109/TAES.2015.140878]
78. Aghaziyarati, S.; Moradi, S.; Talebi, H. Small infrared target detection using absolute average difference weighted by cumulative directional derivatives. Infrared Phys. Technol.; 2019; 101, pp. 78-87. [DOI: https://dx.doi.org/10.1016/j.infrared.2019.06.003]
79. Shi, Y.; Wei, Y.; Yao, H.; Pan, D.; Xiao, G. High-boost-based multiscale local contrast measure for infrared small target detection. IEEE Geosci. Remote Sens. Lett.; 2017; 15, pp. 33-37. [DOI: https://dx.doi.org/10.1109/LGRS.2017.2772030]
80. Han, J.; Ma, Y.; Zhou, B.; Fan, F.; Liang, K.; Fang, Y. A robust infrared small target detection algorithm based on human visual system. IEEE Geosci. Remote Sens. Lett.; 2014; 11, pp. 2168-2172.
81. Chen, C.P.; Li, H.; Wei, Y.; Xia, T.; Tang, Y.Y. A local contrast method for small infrared target detection. IEEE Trans. Geosci. Remote Sens.; 2013; 52, pp. 574-581. [DOI: https://dx.doi.org/10.1109/TGRS.2013.2242477]
82. Wei, Y.; You, X.; Li, H. Multiscale patch-based contrast measure for small infrared target detection. Pattern Recognit.; 2016; 58, pp. 216-226. [DOI: https://dx.doi.org/10.1016/j.patcog.2016.04.002]
83. Peng, J.; Wang, D.; Liao, X.; Shao, Q.; Sun, Z.; Yue, H.; Ye, H. Wild animal survey using UAS imagery and deep learning: Modified Faster R-CNN for kiang detection in Tibetan Plateau. ISPRS J. Photogramm. Remote Sens.; 2020; 169, pp. 364-376. [DOI: https://dx.doi.org/10.1016/j.isprsjprs.2020.08.026]
84. Wei, R.; He, N.; Lu, K. YOLO-mini-tiger: Amur Tiger Detection. Proceedings of the 2020 International Conference on Multimedia Retrieval; Dublin, Ireland, 8–11 June 2020; pp. 517-524.
85. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv; 2014; arXiv: 1409.1556
86. Pham, P.; Nguyen, D.; Do, T.; Ngo, T.D.; Le, D.D. Evaluation of deep models for real-time small object detection. Proceedings of the International Conference on Neural Information Processing; Guangzhou, China, 14–18 November 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 516-526.
87. Qi, M.; Zhu, B.; Wang, H.; Xie, B.; Xiang, F.; Cheng, Z. Evaluation of real-time object detection model based on small targets. Proceedings of the 9th International Symposium on Advanced Optical Manufacturing and Testing Technologies: Optoelectronic Materials and Devices for Sensing and Imaging; Chengdu, China, 26–29 June 2018; International Society for Optics and Photonics: Bellingham, WA USA, 2019; Volume 10843, 108430M.
88. Goyette, N.; Jodoin, P.M.; Porikli, F.; Konrad, J.; Ishwar, P. Changedetection.net: A new change detection benchmark dataset. Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; Providence, RI, USA, 16–21 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 1-8.
89. Unity Engine Software. Available online: https://unity.com/ (accessed on 13 October 2021).
90. Baxter, P.W.; Hamilton, G. Learning to fly: Integrating spatial ecology with unmanned aerial vehicle surveys. Ecosphere; 2018; 9, e02194. [DOI: https://dx.doi.org/10.1002/ecs2.2194]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
Effective detection techniques are important for wildlife monitoring and conservation applications and are especially helpful for species that live in complex environments, such as arboreal animals like koalas (Phascolarctos cinereus). The implementation of infrared cameras and drones has demonstrated encouraging outcomes, regardless of whether the detection was performed by human observers or automated algorithms. In the case of koala detection in eucalyptus plantations, there is a risk to spotters during forestry operations. In addition, fatigue and tedium associated with the difficult and repetitive task of checking every tree means automated detection options are particularly desirable. However, obtaining high detection rates with minimal false alarms remains a challenging task, particularly when there is low contrast between the animals and their surroundings. Koalas are also small and often partially or fully occluded by canopy, tree stems, or branches, or the background is highly complex. Biologically inspired vision systems are known for their superior ability in suppressing clutter and enhancing the contrast of dim objects of interest against their surroundings. This paper introduces a biologically inspired detection algorithm to locate koalas in eucalyptus plantations and evaluates its performance against ten other detection techniques, including both image processing and neural-network-based approaches. The nature of koala occlusion by canopy cover in these plantations was also examined using a combination of simulated and real data. The results show that the biologically inspired approach significantly outperformed the competing neural-network- and computer-vision-based approaches by over 27%. The analysis of simulated and real data shows that koala occlusion by tree stems and canopy can have a significant impact on the potential detection of koalas, with koalas being fully occluded in up to 40% of images in which koalas were known to be present. Our analysis shows the koala’s heat signature is more likely to be occluded when it is close to the centre of the image (i.e., it is directly under a drone) and less likely to be occluded off the zenith. This has implications for flight considerations. This paper also describes a new accurate ground-truth dataset of aerial high-dynamic-range infrared imagery containing instances of koala heat signatures. This dataset is made publicly available to support the research community.
1 UniSA STEM, University of South Australia, Mawson Lakes, SA 5095, Australia;
2 College of Science and Engineering, Flinders University, Tonsley, SA 5042, Australia;