1. Introduction
Camera networks for surveillance applications play a key role in ensuring the safety of public gatherings [1,2,3,4]. Security applications in crowded scenarios have to deal with a variety of factors that can lead to critical situations [5,6,7]. In such scenarios, a camera network must be able to record local events as well as ensure global coverage of the area of interest [8].
Ensuring both coverage of the whole monitoring area and good video quality of moving individuals is challenging with non-reconfigurable (fixed) cameras [5,9]. A high number of fixed cameras would provide the required coverage of the scene, but at a high cost. Moreover, fixed cameras, especially those with a large field of view (FoV) or a fisheye lens, would also capture areas of the scene where no pedestrians are present, thus creating an excessive amount of irrelevant data.
Reconfigurable cameras can dynamically adapt their parameters, such as FoV, resolution and position. For example, pan-tilt-zoom (PTZ) cameras and cameras mounted on unmanned aerial vehicles (UAVs) can dynamically adapt their position and FoV. Such cameras make it possible to greatly reduce the number of devices in the network while optimising coverage and target resolution given the current state of the crowded scene. The goal is to ensure a resolution sufficient for common tasks such as face recognition in critical areas, while providing adequate video quality elsewhere. In recent years, UAVs have received particular attention as a flexible and effective platform for the surveillance of crowd gatherings [10,11]. Reinforcement learning approaches have shown great potential for the optimisation of distributed camera networks [2,12,13,14,15]. However, they have not been applied to the dynamic coverage of crowded scenes.
In [9], we proposed a greedy approach to control the trade-off between covering as much of the area of interest as possible (global coverage) and focusing on the most crowded parts of the scene (people coverage). In that work, each camera follows a decentralised greedy empirical policy, aiming to optimise coverage performance in its local neighbourhood. In this paper, we introduce a novel decentralised approach based on reinforcement learning (RL), which allows every camera to learn how to optimise coverage performance. Both approaches rely on estimating the state of the crowd by merging the observations from individual cameras at a global level, while each camera locally decides on its next state. Both the RL and greedy approaches allow the cooperative use of fixed, PTZ and UAV-mounted cameras, which can track and survey a crowd relying only on cooperation and map sharing, without using classical tracking-by-detection algorithms.
Our approach aims at guaranteeing the best possible coverage of the scene, exploiting the trade-off between global coverage and people coverage. For this goal, we employ different cameras, namely, fixed cameras, PTZ and UAV-based cameras, which have different features and capabilities. Using multiple heterogeneous cameras enriches the coverage of an area of interest by providing different points of view and possible camera configurations, thus increasing the reliability of the collected data. Being able to reconfigure camera parameters, such as position and FoV, allows our network to seamlessly work in both static and dynamic scenarios in which people move continuously in the environment.
Our contribution can be summarised as (1) a policy to trade off global coverage against people coverage, which can be fine-tuned for different camera types, (2) a new metric to evaluate the performance of the surveillance task, (3) a greedy framework to track the crowd flow based on a cooperative approach, (4) a distributed machine learning framework based on reinforcement learning (RL) for covering crowded areas and (5) a 3D simulator of crowd behaviours, based on [16], and of heterogeneous camera networks.
The remainder of this paper is organized as follows: Section 2 discusses related work. Section 3 describes our greedy approach for camera reconfiguration. Section 3.7 introduces the evaluation metrics and Section 3.8 discusses our RL-based approach. Section 4 presents the results of our simulation study, and Section 5 provides concluding remarks together with a discussion of potential future work.
2. Related Work
Cooperative video surveillance research aims to drastically reduce human supervision [17,18,19]. This is implemented by allowing cooperative cameras to share real-time information among themselves in order to capture events and to guarantee global coverage of the area of interest [1,2,3]. When observing a crowded scenario, the state of the scene evolves dynamically and the camera network should be able to reconfigure and cover events as they happen. By their nature, events generated by moving pedestrians are unique and often cannot be reproduced, making it difficult to test and evaluate different camera network configurations and policies.
Leveraging simulators and virtual environments is an effective way to deal with these limitations. Virtualisation paradigms have been exploited both in camera surveillance [5,6] and in crowd analysis [9,20].
In camera surveillance, fixed cameras can be used together with reconfigurable cameras such as UAV-based and PTZ cameras [5,6,21]. PTZs can dynamically set their parameters to optimise the coverage of areas of interest, progressively scanning a wide area or zooming in on events of interest. These cameras have been employed in particular to cooperatively track pedestrians, for example, in [21,22,23,24].
UAVs have been employed for civil and military tasks, such as environmental pollution monitoring, agriculture monitoring and management of natural disaster rescue operations [25,26,27]. Military applications also involve surveillance, but their use in common crowd surveillance scenarios is limited because of regulations.
According to [28], the key features of a distributed network for crowd surveillance are (1) locating and re-identifying a pedestrian across multiple cameras, (2) tracking people, (3) recognising and detecting local and global crowd behaviour, (4) clustering and recognising actions and (5) detecting abnormal behaviours. To achieve these features, the following issues need to be tackled: how to fuse information coming from multiple cameras, how to perform crowd behaviour analysis tasks, how to learn crowd behaviour patterns and how to cover an area with a particular focus on key events.
Reinforcement learning approaches [29] have been applied to distributed systems in the context of surveillance for different purposes. Hatanaka et al. [12] investigated the optimal theoretical coverage that a network of PTZ cameras can achieve in an unknown environment. In [2,13], online tracking applications using reinforcement learning are shown to outperform static heterogeneous camera configurations. Khan et al. [14] employed reinforcement learning for resource management and power consumption optimisation in distributed camera systems. In [15], dynamic alignment of PTZ cameras is exploited to learn coverage optimisation. Although RL has demonstrated its effectiveness in camera networks, dynamic coverage of crowded scenes using UAVs has not been tackled yet.
Recently, Altahir et al. [7] solved the camera placement problem using predefined risk maps that assign a higher coverage priority to certain areas. In [30], distributed Particle Swarm Optimisation (PSO) is employed to maximise the geometric coverage of the scene. Vejdanparast et al. [31] focused on selecting the best zoom level for redundant coverage of risky areas using a distributed camera network.
3. Method
In this section, we introduce the key elements of our method. First, the observation model for the environment establishes a relation between an observation and its confidence. Next, the camera types and their features are described in detail. Finally, we describe how the greedy reconfiguration policy and the RL-based approach exploit the network-wide trade-off between global coverage and crowd resolution.
3.1. Observation Model
The region of interest C, which has to be surveyed, is divided into a uniform grid of I × J square cells, where the indexes i ∈ {1, 2, …, I} and j ∈ {1, 2, …, J} of each cell c_{i,j} ∈ C represent the position of the cell in the grid. We assume a scenario evolving at discrete time steps t = 0, 1, 2, …, t_end. At each time step, the network is able to gather the observation of the scene to be monitored, process it and share it with the other camera nodes. Given the observation, each camera is able to compute its next position. For this purpose, we define
- an observation vector O, which represents the number of pedestrians detected for each cell c_{i,j} ∈ C;
- a spatial confidence vector S, which describes the confidence of the measures for each cell c_{i,j} ∈ C. Our spatial confidence depends only on the relative geometric position of the observing camera and the observed cell;
- a temporal confidence vector L^t, which depends on the time passed since the cell was last observed; and
- an overall confidence vector F^t, which depends on the temporal and spatial confidences.
The observation vector is defined as

O = {o_{1,1}, o_{1,2}, …, o_{i,j}, …, o_{I,J}}.    (1)

The value o_{i,j} for each cell c_{i,j} is given as

o_{i,j} = ped / PED_max   if ped ≤ PED_max
o_{i,j} = 1               if ped > PED_max    (2)

where ped is the current number of pedestrians in a cell and PED_max is the threshold for the number of pedestrians above which a cell is considered crowded. PED_max can be manually tuned depending on the application. Crowded cells should be monitored with a higher resolution.
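The per-cell observation value above can be sketched as follows; the function name and the PED_max value of 2 are illustrative choices, not part of the paper's implementation.

```python
# Illustrative crowding threshold; the paper tunes PED_max per application.
PED_MAX = 2

def observation_value(ped: int, ped_max: int = PED_MAX) -> float:
    """o_{i,j}: pedestrian count normalised by the crowding threshold, clipped at 1."""
    return min(ped / ped_max, 1.0)
```

Counts below the threshold map linearly to (0, 1]; any cell at or above the threshold saturates at 1 and is treated as crowded.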
Occlusion of targets is one of the main challenges in crowded scenarios. We assume that our camera network is able to robustly detect a pedestrian when its head is captured with a resolution of at least 24 × 24 pixels, which is in line with the lower bound for common face detection algorithms [32].
For each cell, a spatial confidence vector is defined as

S = {s_{1,1}, s_{1,2}, …, s_{i,j}, …, s_{I,J}}    (3)

where the value 0 < s_{i,j} ≤ 1 is bounded and decreases as the distance between the observing camera and the cell of interest c_{i,j} increases. The actual value of the spatial confidence s_{i,j} in a given cell depends on the type of the observing camera and is described in Section 3.2.
Similarly, a temporal confidence vector is defined as

L^t = {l_{1,1}^t, l_{1,2}^t, …, l_{i,j}^t, …, l_{I,J}^t}.    (4)

Each value l_{i,j}^t is defined as

l_{i,j}^t = 1 − (t − t_{i,j}^0) / T_MAX   if t − t_{i,j}^0 ≤ T_MAX
l_{i,j}^t = 0                             if t − t_{i,j}^0 > T_MAX    (5)

where t_{i,j}^0 is the most recent time instant in which cell c_{i,j} was observed and T_MAX is the time interval after which the confidence drops to zero. The value l_{i,j}^t decays over time if no new observation o_{i,j} of cell c_{i,j} becomes available.
Given the spatial and temporal confidence metrics, the overall confidence vector is defined as

F^t = {f_{1,1}^t, f_{1,2}^t, …, f_{i,j}^t, …, f_{I,J}^t}    (6)

with

f_{i,j}^t = s_{i,j} · l_{i,j}^t.    (7)

Thus, for each cell c_{i,j} we have an observation o_{i,j} with an overall confidence f_{i,j}^t. The confidence value varies between 0 and 1, where 1 represents the highest possible confidence. If a sufficient number of cameras is available to cover all cells concurrently, the overall confidence vector is F^I = {1, …, 1}.
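The temporal decay and the resulting overall confidence can be sketched for a single cell as follows (function names are illustrative; t_max corresponds to T_MAX above):

```python
def temporal_confidence(t: float, t0: float, t_max: float) -> float:
    """l_{i,j}^t: linear decay from 1 at the last observation time t0 to 0 after t_max."""
    age = t - t0
    return 1.0 - age / t_max if age <= t_max else 0.0

def overall_confidence(s: float, t: float, t0: float, t_max: float) -> float:
    """f_{i,j}^t = s_{i,j} * l_{i,j}^t: spatial confidence damped by observation age."""
    return s * temporal_confidence(t, t0, t_max)
```

A cell observed with spatial confidence 0.8 two steps ago, with T_MAX = 4, yields an overall confidence of 0.8 × 0.5 = 0.4.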
3.2. Camera Models
We briefly describe the models adopted for the three camera types: fixed cameras, PTZ cameras and UAV-based cameras. We assume that all fixed and PTZ cameras are mounted at a fixed height. Similarly, UAV-based cameras fly at a fixed altitude, which also helps to reduce the computational complexity of the problem.
3.2.1. Fixed Cameras
Fixed cameras (see Figure 1a) provide a confidence map which gradually decreases as the distance from the camera increases. Let (x, y) be a point in space at a distance d from a fixed camera; the value of the spatial confidence s(x, y) is defined as

s(x, y) = 1 − d / d_max   if d < d_max
s(x, y) = 0               if d ≥ d_max    (8)

where d_max is the distance from the camera beyond which the spatial confidence is zero. The confidence value s_{i,j} of cell c_{i,j} is then defined as

s_{i,j} = max { s(x, y) : (x, y) ∈ c_{i,j} }.    (9)
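A minimal sketch of the fixed-camera spatial confidence, assuming d is the Euclidean distance from the camera on the ground plane:

```python
def spatial_confidence_fixed(d: float, d_max: float) -> float:
    """s(x, y) for a fixed camera: 1 at the camera, falling linearly to 0 at d_max."""
    return 1.0 - d / d_max if d < d_max else 0.0
```

The cell-level confidence is then the maximum of this value over all points of the cell, i.e., it is determined by the point of the cell closest to the camera.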
3.2.2. PTZ Cameras
PTZ cameras are modeled similarly to fixed cameras, with the additional capability to dynamically change the FoV (see Figure 1c).
PTZ cameras are able to pan, tilt and zoom between 9 different configurations and cover an area of 180°, as shown in Figure 1c,d.
Figure 1c shows how a PTZ camera can achieve different configurations using only the pan movement along the horizontal axis. Each confidence map is defined as that of a fixed camera. In Figure 1d, the camera zooms in on an area further away, which has three effects: the FoV decreases, the confidence in the zoomed area increases and the confidence in the other areas decreases. Let (x, y) be a point in the scene at distance d from the camera; the value of the spatial confidence s(x, y) for a zoomed PTZ camera is defined as

s(x, y) = 0                               if d < d_0
s(x, y) = (d_max − d) / (d_max − d_0)     if d_0 ≤ d < d_max
s(x, y) = 0                               if d ≥ d_max    (10)

where d_max is the distance from the camera beyond which the spatial confidence is zero and d_0 is the closest distance captured in the FoV.
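The zoomed-PTZ confidence profile can be sketched in the same way; the closed form (d_max − d)/(d_max − d_0) used here is algebraically equivalent to the linear expression above, giving confidence 1 at the near edge d_0 of the zoomed FoV:

```python
def spatial_confidence_ptz_zoom(d: float, d0: float, d_max: float) -> float:
    """Zoomed PTZ: 0 before the near edge d0, 1 at d0, falling linearly to 0 at d_max."""
    if d < d0 or d >= d_max:
        return 0.0
    return (d_max - d) / (d_max - d0)
```

With d0 = 0 this reduces to the fixed-camera profile, which matches the PTZ being modelled as a fixed camera when not zoomed.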
3.2.3. UAV-Based Cameras
For UAV-based cameras, the FoV projection on the ground plane differs from the previous models, as shown in Figure 1b. The spatial confidence of a point (x, y) at a distance d from the UAV is computed as

s(x, y) = 1 − d / d_uav   if d < d_uav
s(x, y) = 0               if d ≥ d_uav    (11)

where d_uav is the distance beyond which the confidence of the observation drops below the threshold g above which we consider an observation reliable.
3.3. Reconfiguration Objective
The objective of the heterogeneous camera network is to guarantee the coverage of the scene while focusing on densely populated areas. The priority metric defines the importance of observing each cell. A high value indicates that the cell is crowded or that we have low confidence in its current state, thus requiring an action.
In order to formalise the reconfiguration objective, a priority vector P is defined as

P^t = {p_{1,1}^t, p_{1,2}^t, …, p_{i,j}^t, …, p_{I,J}^t}.    (12)

The priority of each cell is defined as

p_{i,j}^t = α · o_{i,j}^t + (1 − α) · f_{i,j}^I    (13)

where 0 ≤ α ≤ 1 is a weighting factor to tune the configuration and f_{i,j}^I represents the predefined ideal confidence for the cell.
The objective G of each camera is to minimise the distance between the confidence vector and the priority vector

G = min { ||F^{t+1} − P^t|| }    (14)

where α can vary between 0 and 1:

min { F^{t+1} − F^I }   if α = 0
min { F^{t+1} − O^t }   if α = 1.    (15)

Setting α = 1 causes the network to focus on observing densely populated areas only, with no incentive to explore unknown cells. In contrast, α = 0 causes the network to focus on global coverage only, without regard to the crowd density of the cells.
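The priority vector and the objective above can be sketched at grid level, assuming NumPy arrays over the I × J cells (function names are illustrative):

```python
import numpy as np

def priority(obs: np.ndarray, conf_ideal: np.ndarray, alpha: float) -> np.ndarray:
    """p^t = alpha * o^t + (1 - alpha) * f^I, element-wise over the cell grid."""
    return alpha * obs + (1.0 - alpha) * conf_ideal

def objective(conf_next: np.ndarray, prio: np.ndarray) -> float:
    """G: norm of the gap between the next confidence vector and the priority vector."""
    return float(np.linalg.norm(conf_next - prio))
```

At the extremes, alpha = 1 makes the priority equal to the observation vector and alpha = 0 makes it equal to the ideal confidence, reproducing the two limiting cases above.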
3.4. Reconfiguration Objectives: Custom Policies
The policy presented in [9] and reported in Section 3.3 suffers from two main limitations:
- The reconfiguration objectives are the same for the different camera types, namely UAVs and PTZs. In the real world, UAVs have a higher deployment and movement cost than PTZs, while providing more degrees of freedom for reconfiguration.
- The priority maps do not share information about camera type and position between different cameras. Especially in the case of UAVs, this can lead to a superposition of different cameras, which decreases the network performance.
We propose two approaches to tackle these limitations.
The first approach, called split priority, uses different priority vectors for different types of cameras, namely UAVs and PTZs. This allows different values of α for UAVs and PTZs, enabling different roles, such as ensuring a wider coverage with UAVs while the PTZs focus on target areas, or vice versa. The two priority vectors P_PTZ^t and P_UAV^t are defined as

P_PTZ^t = α_PTZ · O^t + (1 − α_PTZ) · (1 − F^I)    (16)

and

P_UAV^t = α_UAV · O^t + (1 − α_UAV) · (1 − F^I).    (17)

The second approach, called position-aware UAVs, aims at solving the superposition issue, which arises from the different UAVs not being aware of each other's positions. The vector P_UAV^t is modified as follows:

P_UAV^t = α_UAV · O^t + (1 − α_UAV) · (1 − F^I) + U^t    (18)
where U^t is a position vector containing a value u_{i,j} for each cell, which can take two values:

u_{i,j} = 0    if no UAV is in (i, j)
u_{i,j} = −1   if a UAV is in (i, j).    (19)

By doing so, the cell priority is kept low wherever a UAV is present, thus penalising the locations occupied by other UAVs. In order not to penalise its own current position, each UAV (UAV_k) updates its priority vector p_{UAVk,i,j}^t by recovering its own contribution to U^t, adding 1 at its current position:

p_{UAVk,i,j}^t = p_{UAV,i,j}^t + 1   if UAV_k is in (i, j)
p_{UAVk,i,j}^t = p_{UAV,i,j}^t       otherwise.    (20)

Finally, every UAV rescales its priority from the range [−1, 1] to the range [0, 1] so that it is compatible with the cost function in Equation (14) to be minimised.
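The position-aware UAV priority can be sketched as below; this is an illustrative reading of the procedure above, assuming the penalty vector marks every UAV-occupied cell (including the UAV's own) before the UAV restores its own contribution:

```python
import numpy as np

def uav_priority(obs, conf_ideal, alpha_uav, uav_cells, own_cell):
    """Position-aware UAV priority sketch: penalise UAV-occupied cells,
    restore the UAV's own cell, then rescale from [-1, 1] to [0, 1]."""
    u = np.zeros_like(obs)
    for cell in uav_cells:              # all UAV positions, including this UAV's own
        u[cell] = -1.0
    p = alpha_uav * obs + (1.0 - alpha_uav) * (1.0 - conf_ideal) + u
    p[own_cell] += 1.0                  # undo the penalty at this UAV's own position
    return (p + 1.0) / 2.0              # normalise [-1, 1] -> [0, 1]
```

Cells occupied by other UAVs end up with a lower normalised priority than free cells, so a UAV greedily minimising the cost tends to steer away from its peers.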
3.5. Update Function
At each time step t, the network has knowledge of the current observation vector O^t, the spatial confidence vector S^t, the temporal confidence vector L^t and the overall confidence vector F^t. In order to progress to the next time step t + 1, an update function for these vectors is required.
The temporary spatial confidence vector S_temp^{t+1} is determined by the geometry of the cameras at time t + 1. For each cell, the value s_temp,i,j^{t+1} is the maximum spatial confidence value over all cameras observing the cell (i, j). Cells that are not covered by any camera have a spatial confidence value of 0.
We estimate the temporal confidence vector as follows: L_time^{t+1} is computed by applying Equation (5) to each element of L^t. Another temporary temporal confidence vector, L_new^{t+1}, is computed by setting to 1 all cells currently observed and to 0 all the others. With the estimated temporal and spatial vectors, we compute two estimates of the overall confidence vector:

F_time^{t+1} = S^t · L_time^{t+1}    (21)

and

F_new^{t+1} = S_temp^{t+1} · L_new^{t+1}.    (22)

The new overall confidence vector is then computed as

F^{t+1} = max { F_new^{t+1}, F_time^{t+1} }   ∀ (i, j).    (23)

For each cell (i, j) in which f_new^{t+1} > f_time^{t+1}, we also update the last time the cell was observed, t_{i,j}^0 = t + 1, and the corresponding entry of the observation vector o_{i,j}^t.
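A sketch of one update step, assuming unit time steps so that the temporal confidence loses 1/T_MAX per step (array names are illustrative; `covered` is a boolean mask of the cells observed at t + 1):

```python
import numpy as np

def update_confidence(s_prev, l_prev, s_next_geom, covered, t_max):
    """One confidence-update step: decay the stored temporal confidence, rebuild
    confidence for freshly covered cells, and keep the element-wise maximum."""
    l_time = np.clip(l_prev - 1.0 / t_max, 0.0, None)  # one step of linear decay
    f_time = s_prev * l_time                           # aged observations
    f_new = s_next_geom * covered.astype(float)        # fresh observations (l = 1)
    return np.maximum(f_new, f_time)
```

Cells where the fresh estimate wins would also have their last-observed time and observation entry refreshed, as described above.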
3.6. Local Camera Decision: Greedy Approach
In our approach, all the information vectors described in Section 3.1 are shared and known to all cameras. Each camera locally decides its next position using a greedy approach to minimise the cost defined in Equation (14) in its neighbourhood.
At each time step, each PTZ and UAV-based camera selects a neighbourhood that can be explored. The UAV's neighbourhood is defined as a square centred at the cell where the drone is currently positioned (see Figure 1b). The PTZ neighbourhood is a rectangle covering the space in front of the camera, as shown in Figure 1c.
For each cell in the neighbourhood, we centre a window W of size N_w × N_w on the cell and store in the cell the aggregated value

c_W = Σ_{(i,j) ∈ W} ||f_{i,j}^{t+1} − p_{i,j}^t||.    (24)

The UAV then moves towards the cell in its neighbourhood with the largest c_W, and the PTZ steers its FoV to be centred on that cell. If two or more cells have the same value of c_W, the camera selects one of them at random.
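The greedy local decision can be sketched as below, assuming a precomputed mapping from candidate cells to their aggregated cost c_W (names are illustrative):

```python
import numpy as np

def greedy_next_cell(cost_map, neighbourhood, rng=None):
    """Pick the neighbourhood cell with the largest aggregated cost c_W,
    breaking ties uniformly at random (cost_map maps cell -> c_W)."""
    rng = rng or np.random.default_rng()
    best = max(cost_map[c] for c in neighbourhood)
    ties = [c for c in neighbourhood if cost_map[c] == best]
    return ties[int(rng.integers(len(ties)))]
```

A UAV would move to the returned cell, while a PTZ would steer its FoV to centre on it.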
3.7. Evaluation Metrics
We define the global coverage metric (GCM) for evaluating the network coverage capability as

GCM(t) = (1 / (I · J)) · Σ_{c_{i,j} : f_{i,j}^t > g} 1    (25)

with g being the threshold above which a cell is considered covered. We then average the results over the whole duration of the observation as

GCM_avg = (1 / (t_end + 1)) · Σ_{t=0}^{t_end} GCM(t).    (26)

We define the people coverage metric (PCM) for evaluating the network's capability to cover pedestrians in the scene as

PCM_tot = (1 / totalPeople) · Σ_{person ∈ c_{i,j} : f_{i,j}^t > p} 1    (27)

with p being the threshold above which a person is considered covered.
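Both metrics can be sketched directly from their definitions (illustrative function names; `person_cell_conf` holds, for each person, the confidence of the cell they occupy):

```python
import numpy as np

def gcm(conf: np.ndarray, g: float) -> float:
    """Fraction of grid cells whose overall confidence exceeds the threshold g."""
    return np.count_nonzero(conf > g) / conf.size

def pcm(person_cell_conf, p: float) -> float:
    """Fraction of people standing in cells whose confidence exceeds the threshold p."""
    return sum(1 for f in person_cell_conf if f > p) / len(person_cell_conf)
```

GCM weights every cell equally, while PCM weights cells by how many people they contain, which is what makes the two metrics trade off against each other.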
3.8. Reinforcement Learning
On the one hand, an approach based on reinforcement learning presents a few advantages over a greedy approach, such as better performance and the ability to plan over a longer horizon, since the decision of each agent depends not only on the last observation but also on past observations. On the other hand, reinforcement learning requires a training phase, which is not needed in the case of an empirical greedy approach.
Our novel reinforcement learning approach is based on a set of UAV-based cameras. We focus on UAV-based cameras as they constitute a very challenging scenario, given the high number of degrees of freedom involved. Using our predefined observation and priority models (Section 3.1 and Section 3.4), we control each UAV-based camera with an RL agent, which replaces the greedy local decision approach of Section 3.6.
We rely on the vanilla ML Agents reinforcement learning network provided by [33] for our deployment with UAVs. We use Soft Actor Critic (SAC) [34] as the backbone of our RL method. We define (see Figure 2):
- a set of states S, which encode the local visual observation of each UAV,
- a set of possible actions A that each UAV can choose to perform at the next time step and
- a set of rewards R, which depend on the observation vector O^t and its related confidence F^t.
At each timestep t, the agent is provided with a visual observation embedded in S_t ∈ S, as shown in Figure 3. The visual observation consists of a texture containing a visualisation of the priority vector P^t, centred on the drone position, with a size of 11 × 11 cells. The visual observation is embedded in the state vector S_t of each agent's internal neural network. Each pixel and colour channel of the visual information is normalised to the range [0, 0.1]. Based on the state S_t, the agent selects an action A_t ∈ A. A_t is composed of all possible positions the drone can travel to in the observed window S_t. With the state-action pair (S_t, A_t), the time t is incremented to t + 1, the environment transitions to a new state S_{t+1} ∈ S and a reward R_{t+1} ∈ R is provided to the agent. Our reward is computed as
Reward = (1 − α) · ΔGCM_t + α · PCM_t    (28)

where α can be set at training time to obtain the same effect described in Section 3.3. The two terms are defined as

ΔGCM_t = GCM(t) − GCM(t − 1)    (29)

and

PCM_t = (1 / totalCurrentPeople) · Σ_{person ∈ c_{i,j} : f_{i,j}^t > p} 1    (30)

which is the instantaneous people coverage metric.
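A sketch of the reward computation, with the coverage-gain term weighted by (1 − α) so that α = 0 rewards pure global coverage and α = 1 rewards pure people coverage, matching the behaviour described in Section 3.3 (the function name is illustrative):

```python
def rl_reward(gcm_t: float, gcm_prev: float, pcm_t: float, alpha: float) -> float:
    """Training reward: coverage gain weighted by (1 - alpha) plus the
    instantaneous people coverage weighted by alpha."""
    return (1.0 - alpha) * (gcm_t - gcm_prev) + alpha * pcm_t
```

Note that the coverage term rewards the change in GCM between consecutive steps, so the agent is paid for gaining coverage rather than for merely holding it.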
For training, we set T_MAX = 1 s and run each episode for 50 timesteps, so that the drone experiences loss of coverage early and can improve on it. An episode is completed when the whole map is covered or when the timestep limit has been reached.
4. Experimental Results
For the experiments, we define an environment of size 60 × 60 m². The scene is square-shaped, with people passing by, cars and vegetation. Pedestrians can enter and exit the scene from any point around the square. Each cell c_{i,j} is a square of 1 × 1 m². In this environment, 2 fixed cameras, 2 UAVs and 2 PTZs are positioned as shown in Figure 4a. Sample images of the environment from a PTZ and a UAV-based camera are shown in Figure 4b,c, respectively. For our experiments, we simulate the movement of 400 pedestrians crossing the scene with the following parameters:
- T_MAX = 3 s
- PED_max = 2
- d_max = 10 m
- fixed and PTZ camera height = 5 m
- UAV-based camera height = 7 m
4.1. Quantitative Results
In this section, we present the quantitative results obtained with our 4 different approaches (greedy, split priority, position aware and RL-based) in the simulated environment. The goal is to evaluate the capabilities of the system to survey a crowded scene using the metrics defined in Section 3.7. We run 33 different simulation experiments with varying values of g, p andα.
The same simulation setup (initial camera positions and number of pedestrians in the scene) is used to evaluate the four approaches: the greedy approach (experiments (1–6), Table 1), the split priority and position-aware approaches (experiments (10–18), Table 2) and the reinforcement learning based approach (experiments (19–24), Table 3). Experiments (7–9) feature a single group of 10 pedestrians moving across the map and are used to show the ability of our approach to track people in the scene.
The values g and p indicate the thresholds above which we consider a cell and a pedestrian, respectively, to be reliably observed. A threshold of 0.2 indicates that our observation is at most 2.4 seconds old when taken with a spatial confidence equal to 1. A threshold of 0.01 selects the cells and pedestrians about which we have a minimum level of information.
As a baseline, we assume that all 6 cameras are unable to change their configurations. In this case, they cover 6% of the entire area with g = 0.2 and 12% with g = 0.01.
Table 1 summarises the results obtained with our greedy approach [9]. In experiments (3) and (6), α is set to 1, causing our camera network to focus only on observing pedestrians, with no incentive to explore new areas of the environment. In experiments (1) and (4), α is set to 0, maximising coverage regardless of the position of the pedestrians. In experiments (2) and (5), α is set to 0.5, aiming to balance coverage and pedestrian tracking in crowded areas. We observe that experiments (3) and (6) yield the lowest values of GCM, which is expected since the network focuses on pedestrians. They also yield the lowest PCM scores, because the cameras have no incentive to explore new areas.
Experiments (7–9) are conducted using a directional crowd (Figure 4b). When the network focuses only on observation in (9), it obtains the best result in terms of PCM and the worst in terms of global coverage (GCM). As expected, we obtain the best results in terms of environment coverage (GCM) in experiments (1) and (4). Since the crowd is uniformly distributed in space, we also obtain the best results in terms of PCM. In experiments (2) and (5), where the network combines global coverage and crowd monitoring, the system underperforms compared with the scenes where α = 0 and α = 1.
Table 2 summarises the results obtained using our split priority approach. Splitting the priority for the different camera types shows that UAVs play a key role when they are allowed to focus on the global observation of the scene (experiments (10–12)). Otherwise, the performance of the whole network decreases (experiments (13–18)).
Both the greedy and split priority methods experience a decrease in performance when they have to focus on observing the more densely populated areas. When α_UAV = 1, the UAVs tend to overlap and cover the same zone, with a loss in overall performance, as shown in experiments (13–18).
To fix this issue, we developed the position-aware method, whose results are reported in Table 2. With this methodology, which includes knowledge of the UAV positions, the performance improves. The influence on the GCM with α_UAV = 0 is almost negligible, while for greater values the improvement is clearly visible in both metrics (experiments (10–18)).
With this methodology the problem of overlapping UAVs is solved and this leads to performance improvements, as UAVs collect information in different regions.
In Table 3, we report the results obtained using our RL-based approach. Our approach (experiments (19–21)) outperforms the greedy approach (experiments (1–3)) when the parameters g and p are set to 0.2. The RL method is thus more effective in long-term scenarios, when the temporal decay of the observations is slower and allows for longer-term planning. On the other hand, when g and p are set to 0.01, i.e., when cameras reconfigure rapidly and with a lower confidence threshold, the greedy approach (experiments (4–6)) is more effective than the RL-based approach (experiments (22–24)).
4.2. Qualitative Results
In this section, we present the qualitative results obtained with our model in the simulated environment. The goal is to demonstrate how our system is able to follow the crowd relying only on the detection of pedestrians in still frames rather than on classical tracking algorithms.
For this purpose, we simulate a single group of five pedestrians crossing the scene from bottom left to top right, as shown in the sequence depicted in Figure 5. The UAV is able to closely follow the pedestrians in the environment, scoring PCM = 70.4% and GCM = 3.2%, as shown in Figure 6. Figure 7 shows how the observation, priority and confidence maps are updated over time to guide the UAV in the tracking scenario.
5. Conclusions
In this paper, we have presented two camera reconfiguration approaches for crowd monitoring: a greedy approach and an RL-based one for UAV-mounted cameras. Our methods allow heterogeneous camera networks to focus on high target resolution or wide coverage. Although based on simplified assumptions for camera modelling and control, our approach is able to trade off coverage and resolution of the network in a resource-efficient way. We have demonstrated how different cameras can be used in different manners to optimise the effectiveness of our method. In future work, we aim to test our approach in the real world to show its potential. Moreover, more camera features will be modelled in our framework, such as the UAVs' limited flight time.
| ID | g and p | α | GCM | PCM |
|---|---|---|---|---|
| 1 | 0.2 | 0 | 12.4% | 17.4% |
| 2 | 0.2 | 0.5 | 14.3% | 20.5% |
| 3 | 0.2 | 1 | 10.4% | 13.5% |
| 4 | 0.01 | 0 | 42.9% | 47.6% |
| 5 | 0.01 | 0.5 | 30.3% | 33.1% |
| 6 | 0.01 | 1 | 22.9% | 28.2% |
| 7 | 0.01 | 0 | 43.1% | 45.6% |
| 8 | 0.01 | 0.5 | 28.7% | 54.4% |
| 9 | 0.01 | 1 | 26.1% | 61.2% |
| ID | g and p | α_PTZ | α_UAV | Split Priority GCM | Split Priority PCM | Position Aware GCM | Position Aware PCM |
|---|---|---|---|---|---|---|---|
| 10 | 0.2 | 0 | 0 | 15.6% | 18.8% | 15.5% | 20.3% |
| 11 | 0.2 | 0.5 | 0 | 16.7% | 18.8% | 16.7% | 19.1% |
| 12 | 0.2 | 1 | 0 | 16.8% | 18.5% | 16.6% | 20.6% |
| 13 | 0.2 | 0 | 0.5 | 11.3% | 14.4% | 15.5% | 20.7% |
| 14 | 0.2 | 0.5 | 0.5 | 11.5% | 14.3% | 16.7% | 21.8% |
| 15 | 0.2 | 1 | 0.5 | 11.5% | 12.0% | 16.5% | 21.2% |
| 16 | 0.2 | 0 | 1 | 11.3% | 11.6% | 15.5% | 20.4% |
| 17 | 0.2 | 0.5 | 1 | 11.5% | 14.0% | 16.3% | 19.1% |
| 18 | 0.2 | 1 | 1 | 11.5% | 11.2% | 16.1% | 20.4% |
| ID | g and p | α | GCM | PCM |
|---|---|---|---|---|
| 19 | 0.2 | 0 | 14.2±0.1% | 12.2±0.2% |
| 20 | 0.2 | 0.5 | 14.7±0.3% | 13.6±0.5% |
| 21 | 0.2 | 1 | 11.7±0.5% | 13.0±0.9% |
| 22 | 0.01 | 0 | 26.1±2.4% | 25.23±4.2% |
| 23 | 0.01 | 0.5 | 26.5±1.1% | 24.0±2.0% |
| 24 | 0.01 | 1 | 24.4±0.9% | 20.8±1.1% |
Author Contributions
Conceptualization, N.B. and B.R.; methodology, N.B.; software, A.X. and N.B.; validation, A.X. and N.B.; formal analysis, N.C.; investigation, N.C.; resources, N.C. and B.R.; writing-original draft preparation, N.B. and A.X.; writing-review and editing, N.C. and B.R.; visualization, N.B.; supervision, F.D.N. and B.R.; project administration, F.D.N.; funding acquisition, F.D.N. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Conflicts of Interest
The authors declare no conflict of interest.
© 2020. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Crowd surveillance plays a key role in ensuring safety and security in public areas. Surveillance systems traditionally rely on fixed camera networks, which suffer from limitations such as restricted coverage of the monitored area, video resolution and analytic performance. On the other hand, a smart camera network provides the ability to reconfigure the sensing infrastructure by incorporating active devices such as pan-tilt-zoom (PTZ) cameras and UAV-based cameras, thus enabling the network to adapt over time to changes in the scene. We propose a new decentralised approach for network reconfiguration, where each camera dynamically adapts its parameters and position to optimise scene coverage. Two policies for decentralised camera reconfiguration are presented: a greedy approach and a reinforcement learning approach. In both cases, cameras are able to locally control the state of their neighbourhood and dynamically adjust their position and PTZ parameters. When crowds are present, the network balances global coverage of the entire scene against high resolution for the crowded areas. We evaluate our approach in a simulated environment monitored with fixed, PTZ and UAV-based cameras.





