Human Behavior Identification and Recognition System

The student article explains an artificial intelligence-based system to detect human body patterns and movements for multiple targets in real-time to recognize their behaviors and classify them as either normal or abnormal.

02 Aug, 2022. 14 min read

This article is a part of our University Technology Exposure Program. The program aims to recognize and reward innovation from engineering students and researchers across the globe.

Introduction

This paper presents research on capturing human behavior. The topic is important and exciting because it draws heavily on artificial intelligence, specifically image processing and machine learning. Human behavior is complex to understand since it is influenced by many factors, such as culture, actions, environment and emotions. Machines can now be designed and programmed to analyze and identify human behavior in much the same way a human does.

Previous research in [4] proposes using the Hidden Markov Model (HMM) technique to recognize human behavior and to classify the different behaviors. We go a step further by applying an amended, similar approach that identifies human behavior and classifies it as either normal or abnormal. The proposed approach uses the VDMT, HMM and Artificial Neural Network (ANN) methods to perform the three core phases of human behavior identification and recognition:

i) detection,

ii) recognition, and

iii) classification

The proposed system is intended for underprivileged people in underprivileged communities. It is expected to detect human body patterns and movements in real time, not just for a single person but for a group of people or multiple targets, recognize their behaviors, and classify each behavior as either normal or abnormal. The results must be accurate, reliable and robust, and the system must be effective and efficient. The targets are above 95% accuracy for human body detection, above 90% for behavior recognition, and over 95% for behavior classification. Figure 1 shows a high-level representation of the proposed system.

Figure 1 System representation

Human behavior recognition is an area of growing interest according to various surveys [5]. Due to increased global security concerns, there is an ever-increasing need for intelligent human behavior detection and recognition systems in public places such as sports arenas, shopping malls, schools, airports, homes for the elderly and hospitals [3]. The main objectives are to detect and recognize events of interest and then classify the type of behavior being exhibited. Human behavior can be classified as either normal or abnormal. Abnormal behavior refers to behavior that may seem out of place, uncommon, suspicious or irregular. Accurate and precise classification is challenging because human behavior is complex: different people perform the same task in different ways. It is therefore desirable to use intelligent techniques that can learn what normal behavior is and then distinguish between normal and abnormal behavior. Such a system could replace existing video surveillance techniques, which usually rely on human labor. Human labor has several limitations [2]. Human operators are subject to fatigue and distraction, and there is a high probability that they will fail to identify abnormal behavior accurately. Furthermore, quality human labor does not come cheap. These limitations waste human resources and reduce the efficiency of surveillance networks [4]. The proposed system seeks to address these problems: the design, modeling and implementation of an intelligent system should eliminate the problems exhibited by existing surveillance methods. The following section presents a brief literature review of the technology.

Literature Review

In 2012, Popoola and Wang [5] carried out an in-depth review of existing techniques for video-based human behavior recognition (HBR). They reported a growing interest in video-based HBR, as shown in Figure 2.

Figure 2 Growing Interest in HBR [5]

The main interest in applying HBR is to identify abnormal activities [5], not to learn the different types of activities taking place in a scenario, such as walking, jumping or interacting. Apart from its common use in security monitoring, HBR has a wide variety of applications, including monitoring the elderly, the disabled or, more generally, the underprivileged members of society. This is the target audience of this project. The review shows that existing techniques usually come from disciplines such as:

  • Data Mining

  • Statistical approaches

  • Machine learning

  • Spectral theory

  • Information theory

There are three main stages in video-based HBR. The first is to detect the presence of humans in a scenario. Detection is done using sensors or cameras to acquire the data. The information from the sensors or cameras is then used to extract features, which represent events and are used to construct behavioral models. Feature extraction is key to obtaining highly accurate and precise behavioral models. Most of the literature uses object tracking and object motion trajectories because of their simplicity. Tracking techniques, however, suffer from occlusion and shadows [5], and usually fail in very crowded public places. This problem is addressed by using low-level features and their statistics to describe human behaviors [5].

The second stage is to recognize the type of interaction (single or group) along with the motions, poses, movements and gestures involved. Once detection and recognition are achieved, behavioral classification is the final stage, which determines normal and abnormal behavior.

In 2013, Maierdan, Watanabe and Maeyama introduced an HMM that is able to recognize human behavior. They state that the HMM is a very robust model for classification [4]. Their main focus is to demonstrate the effectiveness of HMM combined with data processing techniques. HMM is a set of statistical models used to characterize the different properties of a signal [4]. An overview of their approach is shown in Figure 3.

Figure 3: Proposed Technique by [4]

A more recent survey from 2016 proposes an HBR method based on the k-nearest neighbor (k-NN) approach [2]. This technique aims to improve on previous methods that perform HBR based on motion analysis. The input to the system is a video sequence, which passes through human detection, then human tracking and finally human action recognition. For human tracking, background subtraction by GMM is used to obtain the part of the image that is of interest. The Kalman filter is then used to track a moving human. Human beings are hard to monitor because complex and unpredictable movements may occur; there is no single defined way of moving, as it depends on the individual. The survey states that the Kalman filter is able to handle constraints such as distance, regularity of speed, rigidity and smoothness of motion. Classification is then performed by the k-NN method. It differs from conventional ANN techniques in that k-NN determines the class of a new object by assigning it the majority class of the k objects most similar to it in the learning base [2]. The method is quite simple; however, the recognition rate obtained experimentally was only 71.1%.
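The majority-vote rule described above can be illustrated with a minimal Python sketch. It is not taken from the surveyed work: the two-dimensional motion features, the labels and the function name `knn_classify` are hypothetical, and Euclidean distance is assumed as the similarity measure.

```python
from collections import Counter
import math

def knn_classify(query, training_set, k=3):
    """Return the majority label among the k training samples closest to query.

    training_set is a list of (feature_vector, label) pairs.
    """
    # Sort the training samples by Euclidean distance to the query point.
    by_distance = sorted(training_set, key=lambda item: math.dist(query, item[0]))
    # Count the labels of the k nearest samples and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical features: (mean speed, vertical displacement) per tracked person.
samples = [
    ((1.0, 0.1), "walking"),
    ((1.2, 0.2), "walking"),
    ((0.9, 0.0), "walking"),
    ((3.0, 0.3), "running"),
    ((3.2, 0.4), "running"),
    ((2.8, 0.2), "running"),
]

predicted = knn_classify((2.9, 0.3), samples, k=3)
print(predicted)
```

Because the rule only stores and compares samples, it needs no training phase, which is consistent with the simplicity noted in the survey.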

In 2009, a technique based on variable-length Markov models (VLMM) was proposed [6]. Human behavior usually consists of a series of atomic actions [6]. The technique has two steps. First, a posture selection algorithm is developed based on a modified shape context matching technique. This forms a codebook that converts input postures into discrete symbol sequences for subsequent processing [6]. A VLMM is then used to learn the type of posture involved. The VLMM is then converted to an HMM so that atomic actions are recognized with more robustness and efficiency. Experimental results show that posture labeling achieved 85% accuracy, with a quicker computation time. When compared with two existing techniques, the proposed method achieved 100% accuracy in recognizing and classifying the input posture [6]. The technique demonstrates the effectiveness of HMM, which is able to update automatically on a continuous basis. However, the paper only classifies the type of human action being performed.

In 2018, Zhou Zhigang, Duan Guangxue, Lei Huan, Zhou Guangbing, Wang Nan and Yang Wenjie [1] proposed a method to improve the poor robustness and low accuracy of 2D image recognition. The method is a double-branch deep convolutional neural network that includes the Hopcroft-Karp algorithm. It outputs the human skeletal sequence diagram, and behaviors are identified by multi-class support vector machines. Seven simple human behaviors were tested, including standing, walking, waving, bending, squatting and sitting. The experimental results show good accuracy and robustness: walking and running have a recognition rate of 83%, while the others are above 90%. Detection time for all behaviors is less than 179 ms, and recognition time for bending, waving, squatting and sitting is 110 ms [1]. The model has good real-time performance for detecting human behavior and identifies and detects the human body quickly and accurately.

In 2014, Tsung-Han Tsai and Chih-Hao Chang presented an algorithm for tracking multiple targets, such as a crowd or group of people. Tracking situations are classified into the no-match case, only-one case, match case, split case and occlusion case. The object bounding box and object velocity are the two pieces of information needed to track people across successive frames in those cases [3]. The algorithm uses Multi-Model Background Maintenance, Run-length Encoding and Object Labeling. The experimental results show an average correspondence match accuracy above 94%, indicating a high level of tracking performance. When the foreground is the same as the background, the algorithm can still resolve the detection error [3].

Xian Tang [7] published research in 2009 on a system that uses HMM and ANN methods for automatic speech recognition (ASR). Combining the two methods improved the flexibility and real-time performance of recognition. The combination is effective and efficient, producing accurate results with a reported accuracy of 95.9% and an inaccuracy of 1.17%. The combined approach can be used in other applications such as human behavior recognition or word recognition.

In [9], a real-time camera-based sign language recognition system is proposed and implemented. The system applies image processing to hand gestures, followed by feature extraction techniques to verify the gesture. Different classification techniques and logics are applied to classify the images, and the results are compared experimentally. Conditional classification is also tested for accuracy and compared with previous results. Almohaimeed and Prince [10] discuss the various techniques that can be used to detect, represent and track individuals. Their review shows the advantages of using background subtraction and blob analysis for detection, and the power of the particle filter for tracking.

The reviewed materials showcase the various proposed methods as well as their efficiency. There is an abundance of literature, but the team only considers techniques that are achievable within the proposed timeline for this project. Several papers propose methods that use HMM because of its dynamic nature: it has been shown that it can change its statistical properties on a continuous basis. HMM is also very robust and reliable according to the literature. Methods that involve tracking techniques state that the major drawbacks are occlusion and shadows. Methods that are simple to implement do not have a very good recognition rate, as shown in [2].

Also, some techniques are only able to detect a single individual before carrying out recognition and classification. Such techniques are of no use in a practical situation. Some of the literature proposes the use of sensors to detect human beings in a scenario. Although this methodology is relatively simple, the use of sensors is not ideal, because sensors respond to physical characteristics. Furthermore, one has to take into account the number of sensors and their exact positioning in a particular situation. The best methods identified in the literature review appear to be those that use HMM and those that use ANN. They have high rates of recognition for both single and group interactions. However, ANN techniques deal with activity or posture recognition, whereas HMM deals with human behavior recognition, which is the major objective of this project. The team has decided to integrate the concepts of HMM and ANN to perform the three phases of HBR: detection, recognition and classification.

HBR is a daunting task. There is no “all-purpose” algorithm that can be used across the various applications of HBR [5]. The design depends on specifications such as budget, time, target audience and the availability of human resources. The methodology is presented next.

System Model

The proposed system includes three main modules, namely Detection, Recognition and Classification. Figure 4 presents a high-level block diagram of the system.

Figure 4 Overview of System Model

Input data is obtained from a high-definition camera interfaced with an Arduino microcontroller, and the output data is processed and classified as either normal or abnormal behavior. Input signals go through three major phases in order to classify human behavior. The phases are described below:

I. Detection – In this phase, the goal is to detect the presence of a human being, develop a model of the human being and to apply tracking techniques to track the motion/trajectory of the human being from the input video frames in the environment. Video tracking techniques will be applied here. 

II. Recognition – The next phase is to understand and define the actions/activities/behaviors that the detected individuals are engaged in. The concept of HMM is applied to develop and assign a behavioral model to each individual that is detected. The behavioral models contain the information that describes the actions/activities/behaviors occurring in the target environment.

III. Classification – The final phase is to further process the detected patterns using ANN to determine whether the human behavior is to be classified as normal or abnormal. Figure 5 presents the overall architectural-level system diagram.

Figure 5 System architecture
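To make the classification phase more concrete, the sketch below trains the smallest possible stand-in for an ANN, a single sigmoid neuron, to separate normal from abnormal behavior. This is an illustration under stated assumptions rather than the network used in the proposed system: a practical ANN would have hidden layers, and the two input features (fraction of frames spent lying down, normalized speed), the training data and the names `train_neuron` and `classify` are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_neuron(samples, labels, epochs=2000, lr=0.5):
    """Fit a single sigmoid neuron with batch gradient descent (cross-entropy loss)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            # Prediction error for this sample drives the gradient.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def classify(x, weights, bias):
    """Label a feature vector as abnormal when the neuron output is at least 0.5."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return "abnormal" if p >= 0.5 else "normal"

# Hypothetical features: (fraction of frames lying down, normalized speed).
features = [
    (0.0, 0.3), (0.1, 0.4), (0.05, 0.2), (0.0, 0.5),    # normal
    (0.9, 0.0), (0.8, 0.1), (0.7, 0.9), (0.95, 0.05),   # abnormal
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_neuron(features, labels)
print(classify((0.85, 0.05), weights, bias))
print(classify((0.02, 0.35), weights, bias))
```

In the proposed system the inputs to this phase would come from the behavioral models produced in the recognition phase rather than from hand-picked features.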

A brief description of the system submodules is provided below: 

  • Background Subtraction via Mixture of Gaussians – given a video frame, we split the foreground and background of the frame. The foreground typically contains all objects that are in motion, while the background contains the static objects, hence the name background subtraction.

  • Blob detection – this technique is used to relate regions of pixels. If pixels have similar properties, they are classed as one object. The pixels obtained from background subtraction are thus classified by their weights and covariance. If there are similarities, blob detection classes them together, allowing humans to be detected in the background-subtracted frames.

  • Tracking – the main objective is to estimate the state of the human with respect to time. By analogy, to estimate a person's heartbeat one would use a heart rate sensor to obtain measurements of it. Here, the SIR (sequential importance resampling) particle filter uses the measurements to estimate the actual path of the person.

  • HMM – the state estimated by the SIR particle filter is said to be hidden or unknown. For example, a person may be stationary in the target environment, but their actual state is unknown because the person may be standing still, sitting down, lying down etc. Human behavior recognition requires that the actual behavior/activity/action be known, so Hidden Markov Models can be used here. Given a set of observations (obtained by the SIR particle filter) as well as a training data set containing the prior probabilities of each behavior occurring, HMM can be used as a classifier to determine the actual behavior/action/activity that the detected individual may be engaged in.

  • ANN – Finally, ANN is used to classify the developed behavioral model from the HMM phase and make decisions on whether it is normal or abnormal, which is the desired output of the proposed algorithm. In the next section, we present experimental results and discussion. 
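As an illustration of the tracking submodule listed above, the following sketch implements a basic SIR (bootstrap) particle filter for a single coordinate of a moving person. It is a minimal sketch and not the authors' implementation: the random-walk motion model, the Gaussian measurement noise, the parameter values and the synthetic data are all assumptions chosen for the example.

```python
import math
import random

def sir_particle_filter(observations, n_particles=500,
                        process_std=1.0, obs_std=2.0, seed=0):
    """Estimate a 1-D position over time with a SIR (bootstrap) particle filter."""
    rng = random.Random(seed)
    # Initialise the particles around the first observation.
    particles = [rng.gauss(observations[0], obs_std) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: move each particle with an assumed random-walk motion model.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Update: weight each particle by the Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: the weighted mean of the particles.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: draw a new particle set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

# Synthetic example: a person moving at constant speed, observed with noise.
data_rng = random.Random(1)
true_path = [0.5 * t for t in range(40)]
noisy = [x + data_rng.gauss(0.0, 2.0) for x in true_path]

estimates = sir_particle_filter(noisy)
mean_abs_error = sum(abs(e - x) for e, x in zip(estimates, true_path)) / len(true_path)
print(round(mean_abs_error, 2))
```

The sequence of estimates plays the role of the observations that are later passed to the HMM; a two-dimensional image position would use the same predict, update and resample steps with vector-valued particles.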

Implementation Results 

The proposed system was implemented in software using the computer vision toolbox, and the simulation results obtained were analyzed. Video samples are loaded and then pre-processed to improve the quality of the video frames. Background subtraction is then applied to segment each video frame and separate the foreground from the background. The foreground usually contains all objects that are in motion, so it is the area of interest. A morphological filter is then used to eliminate any unwanted pixels from the subtracted image. Finally, blob detection is used to determine which regions of pixels are related or have similar characteristics. Figures 6, 7, 8 and 9 demonstrate the procedure of background subtraction with blob analysis for the detection of human beings.

Figure 6 Video frame sample

Figure 7 Performing segmentation to subtract video frame background

Figure 8 Applying morphological filter to remove video frame noise

Figure 9 Applying BLOB analysis to detect human in video frame
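The detection steps shown in Figures 6 to 9 can be sketched in plain Python on a small synthetic frame. This is a simplified illustration, not the toolbox code used for the reported results: a single Gaussian per pixel stands in for the full mixture of Gaussians, the morphological filter is assumed to be an opening with a 3x3 window, and blobs are found with 4-connected component labeling. All function names and the test frame are hypothetical.

```python
from collections import deque

def foreground_mask(frame, bg_mean, bg_std, k=2.5):
    """Flag pixels more than k standard deviations away from the background model."""
    return [
        [1 if abs(p - m) > k * s else 0 for p, m, s in zip(row, mean_row, std_row)]
        for row, mean_row, std_row in zip(frame, bg_mean, bg_std)
    ]

def _window(mask, r, c):
    """Values in the 3x3 window centred on (r, c); pixels outside the image count as 0."""
    rows, cols = len(mask), len(mask[0])
    return [
        mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
        for rr in (r - 1, r, r + 1)
        for cc in (c - 1, c, c + 1)
    ]

def erode(mask):
    """Keep a pixel only if its whole 3x3 window is foreground."""
    return [[int(all(_window(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]

def dilate(mask):
    """Set a pixel if any pixel in its 3x3 window is foreground."""
    return [[int(any(_window(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]

def find_blobs(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected regions."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over the connected foreground region.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# Synthetic 8x8 frame: flat background, one 4x4 moving object, one noise pixel.
size = 8
bg_mean = [[10.0] * size for _ in range(size)]
bg_std = [[1.0] * size for _ in range(size)]
frame = [row[:] for row in bg_mean]
for r in range(2, 6):
    for c in range(3, 7):
        frame[r][c] = 50.0
frame[0][0] = 50.0  # isolated noise pixel

raw = foreground_mask(frame, bg_mean, bg_std)
cleaned = dilate(erode(raw))  # morphological opening removes the noise pixel
boxes = find_blobs(cleaned)
print(boxes)
```

The opening step removes the isolated pixel while restoring the larger object to its original extent, so only one bounding box is reported for the frame.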

The state of moving subjects is obtained using the SIR particle filter and suitable sensors capable of detecting changes in the state of human behavior. These changes in state are the observations, which are the required input to the HMM. A training set, which should contain features of human behavior, is matched against the observations. The behavior with the highest probability given the training set is then considered to be the actual behavior of the detected human beings. This is the supervised learning phase. Unsupervised learning has also been considered, in which no training set is used to identify the hidden state that has the highest probability.
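One common way to find the most probable hidden behaviors for a sequence of observations is Viterbi decoding, sketched below. The source does not state which decoding algorithm was used, so this is an assumption, and the two behaviors, the quantized observations ("still", "moving") and all probabilities are hypothetical values that a training set would normally supply.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
            column[s] = (prob, prev)
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Hypothetical model parameters (in practice estimated from a training set).
states = ("standing", "walking")
start_p = {"standing": 0.5, "walking": 0.5}
trans_p = {
    "standing": {"standing": 0.8, "walking": 0.2},
    "walking": {"standing": 0.2, "walking": 0.8},
}
emit_p = {
    "standing": {"still": 0.9, "moving": 0.1},
    "walking": {"still": 0.2, "moving": 0.8},
}

# Hypothetical observations derived from the tracker output.
observed = ["still", "still", "moving", "moving", "moving"]
decoded = viterbi(observed, states, start_p, trans_p, emit_p)
print(decoded)
```

For long sequences the products of probabilities become very small, so a practical implementation would typically work with log-probabilities instead.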

Conclusion

The paper proposes a behavioral model for identifying and recognizing human behavior. The proposed method applies background subtraction via Gaussian mixture models together with blob detection. The changes in the state of human movement are estimated using a SIR particle filter. Human behavior recognition is performed using HMM.

HMM requires a set of observations (from the particle filter) and estimates of prior probabilities (from the training set) in order to correctly match hidden states to actual states. The last processing phase performs classification to determine whether there are any abnormal behaviors or human activities. Computer simulation results showed that the proposed system is promising.

References 

[1] Z. Zhigang, D. Guangxue, L. Huan, Z. Guangbing, W. Nan, Y. Wenjie. (2018, June). “The 30th Chinese Control and Decision Conference”. Human Behavior Recognition Method Based on Double-Branch Deep Convolution Neural Network. [Online]. pp. 5520-5524.

[2] K.C Chang, P.K Liu, C.S Yu. (2016, May). “2016 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW)”. Design of Real-Time Video Streaming and Object Tracking System for Home Care Services. [Online]. pp. 1-2. Available: https://ieeexplore-ieee-org.ezproxy.usp.ac.fj/document/7521004

[3] N. Jaouedi, N. Boujnah, O. Htiwich and M. Bouhlel, "Human Behavior Recognition to Human Behavior Analysis", 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), 2016. [Accessed 12 March 2019].

[4] T.H Tsai, C.H Chang. (2014). A High Performance Object Tracking Technique with An adaptive Search Method in Surveillance System. [Online].

[5] M. Maierdan, K. Watanabe and S. Maeyama, "Human Behavior Recognition System Based on 3-Dimensional Clustering Methods", in International Conference on Control, Automation and Systems, 2013, pp. 1-5. 

[6] M. Saito, K. Kitaguchi, H. Nishida and M. Hashimoto, "Human Behavior Recognition using Regression Models", in ICROS-SICE International Joint Conference, Japan, 2009, pp. 1-4. 

[7] Yu-Ming Liang, Sheng-Wen Shih, A. Chun-Chieh Shih, H. Liao and Cheng-Chung Lin, "Learning Atomic Human Actions Using Variable-Length Markov Models", IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 1, pp. 268-280, 2009. Available: 10.1109/tsmcb.2008.2005643. 

[8] X. Tang, “Hybrid Hidden Markov Model and Artificial Neural Network for Automatic Speech Recognition,” 2009 Pacific-Asia Conference on Circuits, Communications and Systems, 2009. 

[9] Kumar, Amit and Assaf, Mansour and Mehta, Utkal V. (2017) Real time classification of American sign language for finger spelling purpose. In: Internet of Vehicles – Technologies and Services. Lecture Notes in Computer Science, 10036 . Springer, Switzerland, pp. 128-137. ISBN 978-3-319-51968-5 

[10] N. Almohaimeed and M. Prince, "A Comparative Study of Different Object Tracking Methods in a Video", International Journal of Computer Applications, vol. 41, pp. 2-8, 2019. Available: http://www.researchgate.com. [Accessed 6 April 2019].
