This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
Over the last few decades, human activity analysis has developed rapidly and received increasing attention in many fields, such as intelligent surveillance, human-computer interaction, and elder care management [1, 2]. Human activity can be categorized by complexity as partial body action [3], simple action [4], interaction activity [5, 6], or group activity [7]. Motivated by the activity classes drawn from [5, 6], this paper focuses on recognizing six complex two-person interactions: kicking, pointing, pushing, punching, exchanging an object, and shaking hands.
Much research has been done on two-person interactions [5–10], examining the kinds of complex action relationships and human features necessary for recognition. For example, [5] took into account whether one person’s hand is above another’s shoulder or whether one person’s foot is near another’s torso. Reference [6] used head-pose, arm-pose, leg-pose, and overall body-pose estimation for both people. However, these processes are complex and time-consuming, and the recognition results might not be as accurate as a particular application requires. This paper proposes a new definition of interactions based on one person’s behavior, called Positive Action. In this method, one person’s action plays the key role in an interaction; thus, two-person interaction recognition can be simplified into Positive Action recognition. This approach is simpler than traditional methods, saves computing time, and improves recognition results.
The recent proliferation of a cheap but effective depth sensor, the Microsoft Kinect [11], has created more opportunities for quantitative analysis of complex human activities. Compared with a traditional video camera, the Kinect has the advantage of synchronous acquisition of color and depth images; with the use of depth maps, 3D information about a scene from a particular point of view...





