VIDEO-BASED HUMAN BEHAVIOUR INTERPRETATION AND MODELLING WITH THE STATBOX CONCEPT

Abstract

Detecting people in videos or images in order to interpret their behaviour is nowadays an important part of safety and security monitoring. Detecting persons in videos is more challenging than detecting other moving objects, such as animals or vehicles. This paper reviews recent work on STATBOX, a tool under development for spatial analysis and modelling. The STATBOX concept is based on collecting video data for a given territory. The video can be recorded with different devices, such as mobile phones or surveillance cameras, and statistical data can also be obtained from video files. Each video must contain geolocation and time information, as well as other markers that make it possible to determine the precise direction of the information flow in the video. All information about moving objects is extracted from each video frame. The information extracted for each detected object includes its contours, proportions, textures and markers, as well as geolocation information (spatial and time data). This information is then compared with information collected by other devices using a classification engine, which allows the object movements in the given territory to be reconstructed. Using STATBOX minimizes the cost of spatial analysis in the urban environment, and it offers the flexibility to create adequate geosimulation models automatically: agent-based models built from GAML patterns and GIS data. So far, the STATBOX concept has been used successfully for pedestrian counting with recognition in a city park for flow simulation and analysis, and on a campus for student behaviour analysis and movement flow simulation. This shows that the STATBOX concept can contribute to a better understanding of human behaviour based on video data, and its enhanced automation can bring great benefits in areas such as public security surveillance, traffic control, pedestrian flow analysis and crisis management.


Keywords: human behaviour, STATBOX, geosimulation, GIS, agent-based modelling

(ProQuest: ... denotes formulae omitted.)

INTRODUCTION

Many research papers address moving object detection in videos captured by a static or moving camera. With the increased demand for safety and security monitoring, intelligent surveillance, including the interpretation of human behaviour, has received a lot of attention [1], [2]. Systems that detect people in images can be used in surveillance, driver assistance, image indexing and similar applications [3]. Intelligent surveillance in particular has a wide range of applications, such as event detection, behaviour description, tracking and identification of moving objects, and object counting.

As noted above, detecting people in images is more challenging than detecting other objects: people are articulated objects that can take on a variety of shapes, and they dress in a variety of colours and garment types, which makes it difficult for colour-based or fine-scale edge-based techniques to work well.

Video sequences provide more information than still images about how objects and scenes change over time [4].

Effective techniques for human detection are of special interest in computer vision, since many applications involve people's locations and movements. Two main approaches to human detection have been explored. The first class of methods uses a generative process in which detected parts of the human body are combined according to a prior human model; the second class relies on purely statistical analysis that combines a set of low-level features within a detection window to classify the window as containing a human or not [5], [3]. Some authors propose using grids of Histograms of Oriented Gradients (HOG) descriptors, or object segmentation with background construction and foreground extraction, for human detection [6], [1]. Various methods for analysing video and still images adopt common image interpretation keys, namely tone, texture, pattern and colour, for feature matching [5], [7], [8].
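To make the statistical, feature-based approach concrete, the sketch below computes the core ingredient of a HOG descriptor: a magnitude-weighted histogram of unsigned gradient orientations for one cell. It is a simplified illustration, not the method of [6] in full: votes are assigned to a single bin (the original interpolates between neighbouring bins), and block normalisation and the sliding detection window are omitted. The function name is hypothetical; grayscale patches are plain 2-D lists.

```python
import math


def hog_cell_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees).

    patch: 2-D list of grayscale values; the one-pixel border is skipped so
    that centred differences can be computed for every visited pixel.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            # Centred [-1, 0, 1] derivative masks
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            # Hard bin assignment (a simplification of the interpolated voting in [6])
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# A vertical edge: all gradient energy (40.0) falls into the 0-degree bin
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
print(hog_cell_histogram(vertical_edge))
```

In a full detector, many such cell histograms are concatenated over a detection window and passed to a classifier, which is exactly the "set of low-level features within a detection window" described above.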

Recently, considerable interest has been shown in hierarchical classification structures, i.e. classifiers that combine several other classifiers, for example the Adaptive Combination of Classifiers built from Support Vector Machine classifiers [3].
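The structure of such a hierarchical classifier can be sketched with linear scoring functions. In [3] the component classifiers and the combination classifier are learned support vector machines; here, as a simplifying assumption, all weights are hand-set and hypothetical, so only the two-level structure (component scores feeding a second-level classifier) is illustrated.

```python
def linear_score(weights, bias, features):
    """Score of a linear classifier; positive values mean 'human'."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def hierarchical_classify(component_features, component_models, combiner):
    """Two-level classification: component classifiers feed a combination classifier.

    component_features: one feature vector per body component (e.g. head, legs)
    component_models:   one (weights, bias) pair per component
    combiner:           (weights, bias) of the second-level classifier
    """
    # Level 1: each component classifier scores its own part of the window
    scores = [linear_score(w, b, feats)
              for (w, b), feats in zip(component_models, component_features)]
    # Level 2: the combination classifier decides from the component scores
    w2, b2 = combiner
    return linear_score(w2, b2, scores) > 0


# Hypothetical, hand-set parameters for two components
models = [([1.0, 0.0], 0.0), ([0.0, 1.0], -0.5)]
combiner = ([1.0, 1.0], -1.0)
print(hierarchical_classify([[2.0, 0.0], [0.0, 1.0]], models, combiner))  # True
```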

A review of the literature shows that pedestrian movement models have been developed since the 1970s and are used to explain and predict macro-, meso- and micro-level movement patterns. Several approaches to simulating and modelling pedestrian movement are possible [9].

Automata-based modelling tools offer many advantages for simulating urban phenomena in space. The decentralized structure of systems such as Cellular Automata (CA) and Multi-Agent Systems (MAS), and their ability to handle individual spatial and non-spatial elements directly, offer many benefits for model builders [10], [11]. Automata-based models allow very detailed representation of phenomena such as the movement of pedestrians, the behaviour of vehicles and the migration of households.

Geosimulation builds on object-oriented programming. Geographic Information Systems (GIS) and spatial analysis provide a range of methods for handling, interpreting and producing data for geosimulation.

GIS and simulation can be connected through loose coupling, i.e. indirect connections: data may be generated in GIS and then fed into an urban simulation, and the output generated by the simulation can then be fed back into GIS to be visualized or to have spatial analysis performed on it. A Geographic Automata System [11], in contrast, integrates spatial data and process models in a single system.
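The loose-coupling round trip (GIS data out, simulation, results back into GIS) can be sketched with GeoJSON text as the exchange format. Everything here is an illustrative assumption: the function names, the exit identifiers and the toy simulation rule (one pedestrian per step sent to the exits in turn) are hypothetical and stand in for a real GIS export and a real simulator.

```python
import json


def export_from_gis(features):
    """GIS side: serialise point features {id: (x, y)} as GeoJSON text."""
    return json.dumps({
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature",
             "geometry": {"type": "Point", "coordinates": list(xy)},
             "properties": {"id": fid}}
            for fid, xy in features.items()
        ],
    })


def run_simulation(geojson_text, steps):
    """Simulation side: a toy rule that sends one pedestrian per step to the
    exits in turn and counts arrivals per exit."""
    exits = [f["properties"]["id"] for f in json.loads(geojson_text)["features"]]
    counts = {}
    for step in range(steps):
        exit_id = exits[step % len(exits)]
        counts[exit_id] = counts.get(exit_id, 0) + 1
    return counts


def import_into_gis(geojson_text, counts):
    """GIS side again: attach the simulation output as an attribute for mapping."""
    data = json.loads(geojson_text)
    for feature in data["features"]:
        feature["properties"]["pedestrians"] = counts.get(feature["properties"]["id"], 0)
    return data


layer = export_from_gis({"exit_A": (0.0, 0.0), "exit_B": (1.0, 1.0)})
result = import_into_gis(layer, run_simulation(layer, steps=5))
```

The two systems only share the file format, which is what makes the coupling loose: either side can be replaced without touching the other.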

METHOD

The system consists of several STATBOX units and a main server. Each STATBOX unit monitors the given territory by means of video surveillance. When a visual change appears, the system assumes that there is an object whose movement has to be monitored, and the sensor saves the video data for later analysis. After a certain time period, all raw data from each sensor is sent to the main server, which collects data from all sensors.
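The "visual change" trigger can be sketched as simple frame differencing. This is a minimal illustration under stated assumptions, not the STATBOX implementation: frames are small grayscale 2-D lists, and the function names and the threshold of 10 grey levels are hypothetical.

```python
def mean_abs_difference(frame_a, frame_b):
    """Mean absolute per-pixel difference between two grayscale frames."""
    total = sum(abs(a - b) for row_a, row_b in zip(frame_a, frame_b)
                for a, b in zip(row_a, row_b))
    return total / (len(frame_a) * len(frame_a[0]))


def frames_with_change(frames, threshold=10.0):
    """Indices of frames that differ visibly from the previous frame; a unit
    would start saving video for later analysis at these points."""
    return [i for i in range(1, len(frames))
            if mean_abs_difference(frames[i - 1], frames[i]) > threshold]


empty = [[0] * 4 for _ in range(4)]
person = [[0, 0, 0, 0], [0, 200, 200, 0], [0, 200, 200, 0], [0, 0, 0, 0]]
print(frames_with_change([empty, empty, person, person]))  # [2]
```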

Once the raw data has been collected, the Statistics Data Processor processes it and extracts all object movements. All moving objects are compared with each other by the hierarchical classification engine and then stored in the database. When the comparison process is complete, clustering analysis is performed on the movement data stored in the database.

The STATBOX simulation process then builds a simulation model of the analysed territory from the clustering results, the GPS information from the sensors and the available map data.

To build the simulation model, the concept of agentification is used to connect the simulation with the geographical data. An important prerequisite is the existence of a GIS model. Although GIS models provide a high level of detail, they are static. To analyse dynamics in urban space, additional modelling methods are needed, and cellular automata (CA) based methods are useful for this purpose. However, CA modelling can be used in the urban environment only when the required level of detail is comparatively low; it cannot be applied to tasks involving detail down to the level of individual houses and streets, because the urban environment usually has an irregular built structure. Modelling tasks that require a high level of detail can instead be solved with microscopic modelling methods, i.e. discrete event-based modelling, multi-agent simulation (MAS), or geosimulation [11] based on the concept of Geographic Automata Systems (GAS), which tightly couple spatial data and process models within a single, integrated system. Geosimulation is concerned with automata-based methodologies for simulating discrete, dynamic, and action-oriented spatial systems, combining cellular automata and multi-agent systems in a spatial context.
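To show what a cellular automaton means for pedestrian movement, the sketch below applies one hypothetical local rule to a regular grid: the rule-184 traffic automaton run independently on each row, with an open right boundary acting as an exit. The grid itself illustrates the limitation mentioned above, since every cell has the same size and shape regardless of the irregular built structure.

```python
def ca_step(grid):
    """One synchronous update of a simple pedestrian cellular automaton.

    grid: list of rows, 1 = pedestrian, 0 = empty. Each pedestrian moves one
    cell to the right if that cell is empty in the current state (rule 184
    applied per row); pedestrians in the last column leave through the exit.
    """
    new_grid = []
    for row in grid:
        n = len(row)
        new_row = [0] * n
        for x in range(n):
            if row[x] == 1:
                if x == n - 1:
                    continue  # leaves the simulation area
                if row[x + 1] == 0:
                    new_row[x + 1] = 1  # step forward into the free cell
                else:
                    new_row[x] = 1      # blocked: wait in place
        new_grid.append(new_row)
    return new_grid


corridor = [[1, 1, 0, 1]]
print(ca_step(corridor))  # [[1, 0, 1, 0]]
```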

Formally, according to [11], a Geographic Automata System (GAS) G may be defined as consisting of seven components:

$$G \sim (K;\ S, T_S;\ L, M_L;\ N, R_N) \qquad (1)$$

where

$K$ - set of types of automata featured in the GAS,

$S$ - set of states of the automata,

$T_S$ - set of state transition rules, used to determine how automata states should change over time,

$L$ - the georeferencing conventions that dictate the location of automata in the system,

$M_L$ - the movement rules for automata, governing changes in their location in time,

$N$ - the neighbours of the automata and their relations to them,

$R_N$ - the neighbourhood rules that govern changes in the automata's relations to other automata in time.
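The seven components of the GAS definition in [11] (K, S, T_S, L, M_L, N, R_N) map naturally onto a small data structure. The sketch below is a minimal illustration under our own assumptions: locations are integer grid cells standing in for the georeferencing convention L, all rules are evaluated on the current snapshot before any automaton is updated, and the example rules (walk east, exit at column 3, neighbours within one cell) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Location = Tuple[int, int]  # grid cell; stands in for the georeferencing convention L


@dataclass
class Automaton:
    kind: str                      # an element of K
    state: str                     # an element of S
    location: Location             # position under convention L
    neighbours: List[str] = field(default_factory=list)  # current relations N


@dataclass
class GeographicAutomataSystem:
    types: Set[str]                                                     # K
    states: Set[str]                                                    # S
    transition: Callable[[Automaton, GeographicAutomataSystem], str]     # T_S
    movement: Callable[[Automaton, GeographicAutomataSystem], Location]  # M_L
    neighbourhood: Callable[[Automaton, GeographicAutomataSystem], List[str]]  # R_N
    automata: Dict[str, Automaton] = field(default_factory=dict)

    def step(self) -> None:
        """Synchronous update: evaluate all rules on the current snapshot, then apply."""
        updates = {
            aid: (self.transition(a, self), self.movement(a, self), self.neighbourhood(a, self))
            for aid, a in self.automata.items()
        }
        for aid, (state, location, neighbours) in updates.items():
            a = self.automata[aid]
            a.state, a.location, a.neighbours = state, location, neighbours


def adjacent(a: Automaton, gas: GeographicAutomataSystem) -> List[str]:
    """Hypothetical R_N: neighbours are automata within Manhattan distance 1."""
    return [aid for aid, b in gas.automata.items()
            if b is not a
            and abs(b.location[0] - a.location[0]) + abs(b.location[1] - a.location[1]) <= 1]


gas = GeographicAutomataSystem(
    types={"pedestrian"}, states={"walking", "exited"},
    transition=lambda a, g: "exited" if a.location[0] >= 3 else "walking",
    movement=lambda a, g: (a.location[0] + 1, a.location[1]),
    neighbourhood=adjacent,
    automata={"p1": Automaton("pedestrian", "walking", (2, 0)),
              "p2": Automaton("pedestrian", "walking", (3, 0))},
)
gas.step()
```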

A geosimulation model consists of static elements (houses, industrial buildings, roads, pathways and other structures) and moving objects such as cars and people. Some moving objects can carry other objects (e.g. people travelling in cars). Moving objects can move between static objects, stay in them, leave them or move inside them.
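The containment relations described above (moving objects entering, staying in and leaving static objects, and moving objects carrying others) can be represented with two small classes. The class and method names are hypothetical; `eq=False` makes the classes compare by identity, so two pedestrians with the same attributes are still distinct objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class MovingObject:
    name: str
    passengers: List["MovingObject"] = field(default_factory=list)  # e.g. people in a car
    inside: Optional["StaticObject"] = None                          # where it currently is


@dataclass(eq=False)
class StaticObject:
    name: str  # house, industrial building, road, pathway, ...
    occupants: List[MovingObject] = field(default_factory=list)

    def enter(self, obj: MovingObject) -> None:
        """Move obj into this static object, leaving its previous one first."""
        if obj.inside is not None:
            obj.inside.leave(obj)
        self.occupants.append(obj)
        obj.inside = self

    def leave(self, obj: MovingObject) -> None:
        self.occupants.remove(obj)
        obj.inside = None


pathway, house = StaticObject("pathway"), StaticObject("house")
walker = MovingObject("walker")
car = MovingObject("car", passengers=[MovingObject("passenger")])
pathway.enter(walker)
pathway.enter(car)
house.enter(walker)  # the walker leaves the pathway and enters the house
```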

STATISTICS DATA PROCESSOR

Input objects are the objects through which moving objects enter the simulation area; based on this data, the moving objects are created in the simulation model. Output objects are the objects through which moving objects exit the simulation area and are thus deleted from the simulation system. Distribution objects are the main static objects at which moving objects choose their next destination, i.e. another main static object. The main task of the Statistics Data Processor is to collect and classify the data used for simulation. Meta information is prepared for each object, starting with the detection of the object by background segmentation.
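The paper does not name a specific background segmentation method. As one common option, the sketch below maintains an exponential running-average background model and marks pixels that deviate from it as foreground; the smoothing factor alpha = 0.05, the threshold of 30 grey levels and the function names are assumed values for illustration only.

```python
def update_background(background, frame, alpha=0.05):
    """Exponential running average of the scene: B <- (1 - alpha) * B + alpha * F."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(row_b, row_f)]
            for row_b, row_f in zip(background, frame)]


def foreground_mask(background, frame, threshold=30):
    """True where the frame differs from the background model by more than threshold."""
    return [[abs(f - b) > threshold for b, f in zip(row_b, row_f)]
            for row_b, row_f in zip(background, frame)]


background = [[100.0] * 3 for _ in range(2)]
frame = [[100, 100, 200], [100, 100, 100]]
mask = foreground_mask(background, frame)          # only the bright pixel is foreground
background = update_background(background, frame)  # the model slowly absorbs the change
```

A small alpha keeps a briefly passing person out of the background model, while slow changes such as lighting are gradually absorbed.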

The image of the segmented object is divided into parts, and each part is analysed separately. Specific features are extracted for each object, including the colour histogram and the orientation and size of texture texels obtained with a Gabor filter. The texture of an object is a partial or fully sequential change of colour and its intensity; texture therefore makes it possible to distinguish image regions that are related to each other by relatively well-defined rules [12].
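Gabor-based texture analysis can be illustrated with a pure-Python kernel. This is a sketch with assumed parameters (sigma = 2, wavelength = 4 pixels, aspect ratio gamma = 0.5), not the configuration used in STATBOX: the kernel is the real part of the standard Gabor function, and the orientation theta with the strongest response to a patch indicates the dominant texel orientation. A bank over several wavelengths would, in the same way, indicate texel size.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a Gabor kernel (size x size, size odd) oriented at theta radians."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotated coordinates; the cosine carrier varies along x_r (direction theta)
            x_r = x * math.cos(theta) + y * math.sin(theta)
            y_r = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_r ** 2 + gamma ** 2 * y_r ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * x_r / wavelength + psi))
        kernel.append(row)
    return kernel


def gabor_response(patch, theta, sigma=2.0, wavelength=4.0):
    """Magnitude of the correlation between a square patch and a same-sized kernel."""
    kernel = gabor_kernel(len(patch), sigma, theta, wavelength)
    return abs(sum(p * k for row_p, row_k in zip(patch, kernel)
                   for p, k in zip(row_p, row_k)))


# Stripes whose intensity varies along x with a period of 4 pixels
stripes = [[-1, 0, 1, 0, -1] for _ in range(5)]
responses = {round(math.degrees(t)): gabor_response(stripes, t)
             for t in (0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4)}
```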

In addition, information on specific markers, such as bicycles, carry-on bags or textures, is identified and added.

...(2)

From the data acquired in this process, the meta information of the object image is created according to formula (2). As the object moves, each movement is captured in the video as a new image, so new meta information is obtained for every movement; together, these form a meta information data group for the particular object.

The meta information of each exiting object is compared with the meta information recorded by the entry units. The comparison takes into account GIS information on the minimum time needed to get from one object to another, which reduces the amount of data to be compared.
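The GIS-based pruning can be sketched as follows. The sketch assumes that each detection is reduced to an (object id, STATBOX unit, time) tuple and that the minimum travel times between units have already been derived from the GIS path network; the function name, the data layout and the sample values are hypothetical.

```python
def candidate_pairs(entries, exits, min_travel_time):
    """Pair entry and exit detections only when the exit is physically reachable.

    entries, exits:  (object_id, statbox_unit, time_s) tuples
    min_travel_time: {(entry_unit, exit_unit): shortest walking time in seconds,
                      derived from the GIS path network}
    """
    pairs = []
    for entry_id, entry_unit, entry_time in entries:
        for exit_id, exit_unit, exit_time in exits:
            minimum = min_travel_time.get((entry_unit, exit_unit))
            if minimum is not None and exit_time - entry_time >= minimum:
                pairs.append((entry_id, exit_id))  # only these go to the classifier
    return pairs


entries = [("a", "S1", 0), ("b", "S1", 100)]
exits = [("x", "S2", 120)]
print(candidate_pairs(entries, exits, {("S1", "S2"): 60}))  # [('a', 'x')]
```

Object "b" would have needed at least 60 s to reach unit S2 but was seen there only 20 s after entering, so the pair is never passed to the more expensive meta information comparison.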

Data comparison is done using machine learning methods: the input (entry) meta information is used as the training data set, and the output (exit) meta information forms the result data set. To reduce the number of errors, an additional comparison of the meta information is carried out after classification using decision trees.

The entry time and the STATBOX unit that registered the entry, together with the exit time and the STATBOX unit that registered the exit, are saved as the object's time and movement direction in the given territory.

All object movement directions and times are then processed using the kNN clustering method.
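The clustering step is only named in the text. Because k-nearest neighbours is normally a classification technique, the sketch below assumes a k-means clustering of travel times per movement direction instead; it is one plausible interpretation, with hypothetical unit names and times. Each matched object is reduced to the record described above (entry unit and time, exit unit and time).

```python
from collections import defaultdict


def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of numbers; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, clusters


def travel_times_by_direction(records):
    """records: (entry_unit, entry_time_s, exit_unit, exit_time_s) per matched object."""
    by_direction = defaultdict(list)
    for entry_unit, entry_time, exit_unit, exit_time in records:
        by_direction[(entry_unit, exit_unit)].append((exit_time - entry_time,))
    return by_direction


records = [("S1", 0, "S2", 60), ("S1", 10, "S2", 72),
           ("S1", 20, "S2", 320), ("S1", 30, "S2", 335)]
times = travel_times_by_direction(records)[("S1", "S2")]
centroids, clusters = kmeans(times, k=2)
```

For this direction the two centroids separate pedestrians who pass straight through (about 61 s) from those who linger (about 302 s), which is the kind of average passing time per cluster that the simulation model needs.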

EXAMPLE

Vermanes park in Riga was selected as the site for the approbation of STATBOX and for model creation. The park is not only a green recreation zone but also a transit route for many pedestrians. It is a compact, enclosed area with a total of 11 exits.

The research process is divided into three steps:

1. Collection of statistical data at the exits;

2. Approbation of the STATBOX concept;

3. Creation of an adequate geosimulation model, and determination of the behaviour model with and without object recognition.

For approbation, video was recorded simultaneously at two places in the park (see the white points in Figure 3). The Statistics Data Processor recognises objects in the video file, extracts them from the video frames and creates the metadata needed to compare them with objects already saved by any of the STATBOX units (see Figure 2, video motion (a) and (b)). The metadata is created from a complex histogram built from several parts of the object (see Figure 2, video motion (c)). Object comparison is based on a feed-forward neural network. In the current example, the first (input) layer contains 192 neurons, the second (hidden) layer has 16 neurons, and the output layer has 2 neurons. The recognised objects from all STATBOX units are compared with each other. For example, for the object in Figure 2, video motion 2, 207 images were recognised in one video and 152 in the other. The comparison process gave positive results in 80.92% of cases. We then identify average passing times and clusters of moving objects (see Table 1).
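The histogram-plus-network comparison can be sketched end to end. Several details are our assumptions, not stated in the text: the object is split into 3 horizontal parts, each described by a joint RGB histogram with 4 bins per channel (3 x 64 = 192 values, which happens to match the 192 inputs); the network input is the element-wise absolute difference of two descriptors; the hidden layer uses a sigmoid and the 2 outputs are softmax scores for (same object, different object). The weights are random and untrained; training on labelled same/different pairs is omitted, so the scores carry no meaning yet.

```python
import math
import random


def part_histogram(pixels, bins_per_channel=4):
    """Normalised joint RGB histogram of one object part (4 * 4 * 4 = 64 bins)."""
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        hist[(r // step) * bins_per_channel ** 2 + (g // step) * bins_per_channel + b // step] += 1
    return [h / len(pixels) for h in hist] if pixels else hist


def object_descriptor(rows, n_parts=3):
    """Split the object's pixel rows (top to bottom) into horizontal parts and
    concatenate their histograms: 3 parts * 64 bins = 192 values."""
    descriptor = []
    for i in range(n_parts):
        part_rows = rows[i * len(rows) // n_parts:(i + 1) * len(rows) // n_parts]
        descriptor.extend(part_histogram([px for row in part_rows for px in row]))
    return descriptor


def dense(inputs, weights, biases, activation):
    """Fully connected layer: activation(W x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def compare(desc_a, desc_b, layers):
    """192 -> 16 -> 2 forward pass on |a - b|; softmax scores for (same, different)."""
    (w1, b1), (w2, b2) = layers
    x = [abs(a - b) for a, b in zip(desc_a, desc_b)]
    hidden = dense(x, w1, b1, lambda z: 1.0 / (1.0 + math.exp(-z)))  # sigmoid
    logits = dense(hidden, w2, b2, lambda z: z)                      # linear output
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
layers = [init_layer(192, 16, rng), init_layer(16, 2, rng)]
rows = [[(255, 0, 0)]] * 3 + [[(0, 0, 255)]] * 3 + [[(0, 255, 0)]] * 3  # red, blue, green bands
scores = compare(object_descriptor(rows), object_descriptor(rows), layers)
```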

The next task is to create XML specifications for each block from the statistics database, the GIS information and additional specifications. The automated simulation approach is based on the GAMA simulation platform [13] and its modelling language, GAML.

The simulation model is created from the XML block specifications using patterns: models are prepared in advance as GAML patterns, in which the changeable places are replaced with information on the GIS model objects.
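Pattern-based model generation can be sketched with the Python standard library: an XML block specification is parsed, and its values are substituted into a GAML pattern whose changeable places are `$`-placeholders. Both the XML layout and the GAML pattern below are hypothetical illustrations, not the actual STATBOX specifications or patterns.

```python
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical XML specification of one model block
BLOCK_SPEC = """
<block name="park_flows">
  <param name="exits_file">../includes/exits.shp</param>
  <param name="nb_pedestrians">120</param>
</block>
"""

# Hypothetical GAML pattern; $-placeholders mark the changeable places
GAML_PATTERN = Template("""model ${model_name}

global {
    file exits_shape <- file("${exits_file}");
    int nb_pedestrians <- ${nb_pedestrians};
}
""")


def build_model(spec_xml, pattern):
    """Fill the pattern's placeholders with the values from one XML block spec."""
    block = ET.fromstring(spec_xml.strip())
    values = {p.get("name"): p.text for p in block.findall("param")}
    values["model_name"] = block.get("name")
    return pattern.substitute(values)  # raises KeyError if a placeholder is missing


gaml_source = build_model(BLOCK_SPEC, GAML_PATTERN)
```

`Template.substitute` fails loudly when the specification lacks a value for some placeholder, which is a useful check that every changeable place in a pattern has actually been filled.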

RESULTS

DISCUSSION AND CONCLUSION

Understanding objects in video data is of particular interest because automating this task benefits public security surveillance, traffic control and pedestrian flow analysis.

In their everyday environment, pedestrians tend to show the same basic behaviour: they generally try to find the shortest and easiest way to reach their destination.
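In a simulation model, this shortest-path behaviour is commonly expressed by routing each agent along the shortest path of the walkway network. The sketch below uses Dijkstra's algorithm on a hypothetical park path graph (node names and lengths in metres are invented for illustration; edges are directed for brevity).

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on {node: {neighbour: length_m}}; returns (length, path)."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, length in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (dist + length, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical park path network (lengths in metres)
paths = {
    "gate_N": {"fountain": 120, "playground": 90},
    "playground": {"fountain": 60, "gate_S": 200},
    "fountain": {"gate_S": 110},
}
print(shortest_path(paths, "gate_N", "gate_S"))  # (230.0, ['gate_N', 'fountain', 'gate_S'])
```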

The STATBOX concept has been used successfully for pedestrian counting with recognition in a city park for flow simulation and analysis, on a campus for student behaviour analysis and movement flow simulation, and for measuring the load on public open spaces and analysing flows in 8 neighbourhoods of Riga, in order to identify and develop potential public open spaces.

The main challenge in using the STATBOX data acquisition module to analyse human behaviour in large parks is obtaining data from the different activity locations, for example children's playgrounds, active sports areas, walking areas and green areas.

Geosimulation is a time-consuming, resource-intensive and expensive process. The STATBOX solution, however, allows even a person without specific modelling knowledge to create a dynamic spatial analysis model of the real situation. The benefits of the presented solution are its comparatively low cost, quick access to the data and an adequate geosimulation model. Adaptive software has been developed for video data processing, and it considerably accelerates the collection and analysis of statistical data. The innovative elements of this solution include automated processing of statistical data obtained from video and automatic creation of the geosimulation model.

The solution has many possible uses, ranging from planning for small businesses to monitoring at airports and border crossing points. It offers a full cycle from data acquisition to the model, which can be used for the spatial development planning of an area or for security checks in crisis situations.

Geographic Information Systems (GIS) and spatial data infrastructures are today considered a mature technology. The user community, as well as decision and policy makers, have realized the importance of making sound decisions based on information derived from properly designed geospatial databases. GIS-based simulation and spatial analyses are used to characterize spatially related variables within a digital environment and help produce informative visualizations and simulation models.

An additional, innovative way to acquire video or image data is to allow citizens to contribute data they have collected themselves, which can be managed using web-based crowdsourcing and public participation platforms. Moreover, mobile technologies such as smartphones allow on-site geodata acquisition, for instance in the form of geo-referenced images combined with a supplementary textual description.

Collecting user-generated data in collaborative processes has its advantages. However, crowd-sourced data poses a challenge that has to be dealt with: preserving people's privacy when handling user-generated and partly personal data.

REFERENCES

[1] Hu W.C., Chen C.H., Chen C.M., Chen T.Y. Effective Moving Object Detection from Videos Captured by a Moving Camera. In: Pan J.S., Snasel V., Corchado E., Abraham A., Wang S.L. (Eds) Intelligent Data Analysis and its Applications, Volume I. Advances in Intelligent Systems and Computing, Vol. 297, Taiwan, Springer, 2014.

[2] Wren C., Azarbayejani A., Darrell T., Pentland A. Pfinder: Real-Time Tracking of the Human Body. IEEE Transactions on Pattern Analysis and Machine Intelligence, USA, Vol. 19, No. 7, pp. 780-785, 1997.

[3] Mohan A., Papageorgiou C., Poggio T. Example-Based Object Detection in Images by Components. IEEE Transactions on Pattern Analysis and Machine Intelligence, USA, Vol. 23, No. 4, pp. 349-361, 2001.

[4] Kalirajan K., Sudha M., Rajeshkumar V., Jamaesha S.S. Adaptive Visual Tracking System Using Artificial Intelligence. In: Proceedings of the IEEE/OSA/IAPR International Conference on Informatics, Electronics and Vision (ICIEV '12), Bangladesh, pp. 954-957, IEEE, 2012.

[5] Schwartz W.R., Kembhavi A., Harwood D., Davis L.S. Human Detection Using Partial Least Squares Analysis. IEEE 12th International Conference on Computer Vision, USA, pp. 24-31, 2009.

[6] Dalal N., Triggs B. Histograms of Oriented Gradients for Human Detection. In: Schmid C., Soatto S., Tomasi C. (Eds) International Conference on Computer Vision & Pattern Recognition (CVPR '05), USA, IEEE Computer Society, Vol. 1, pp. 886-893, 2005.

[7] Arun P. A CNN Hybrid Approach towards Automatic Image Registration. Geodesy and Cartography, Lithuania, Vol. 39, No. 3, pp. 121-128, 2013.

[8] Wyawahare M.V., Patil P.M., Abhyankar H.K. Image Registration Techniques: An Overview. International Journal of Signal Processing, Image Processing and Pattern Recognition, India, Vol. 2, No. 3, pp. 11-28, 2009.

[9] Helbing D., Molnar P., Farkas I., Bolay K. Self-Organizing Pedestrian Movement. Environment and Planning B: Urban Analytics and City Science, Germany, Vol. 28, No. 3, pp. 361-383, 2001.

[10] Batty M., Jiang B., Thurstain-Goodwin M. Local Movement: Agent-Based Models of Pedestrian Flows. CASA Working Papers 4, Centre for Advanced Spatial Analysis (UCL), UK, pp. 87, 1998.

[11] Benenson I., Torrens P.M. Geosimulation: Automata-Based Modeling of Urban Phenomena. John Wiley & Sons Ltd., USA, ISBN-10: 0-470-84349-7 (H/B), pp. 283, 2005.

[12] Acharya T., Ray A.K. Image Processing: Principles and Applications. John Wiley & Sons, Inc., Hoboken, New Jersey, USA, 2005.

[13] Taillandier P., Drogoul A. From GIS Data to GIS Agents, Modelling with the GAMA Simulation Platform. Technical Forum Group on Agent and Multi-agent-based Simulation, 1st meeting, Paris, France, 2010.
