Automating productivity monitoring is crucial for improving the construction industry. Measuring productivity requires identifying which worker is working on which object and the relationship between them. Understanding human-object interaction at scale from surveillance video remains a significant challenge on construction sites. Existing vision-based studies focus only on object detection or activity recognition and do not recognize workers, objects, and actions simultaneously. As a result, managers cannot measure worker productivity effectively. To address this issue, this study applies human-object interaction (HOI) detection, which combines an object detection task using Faster R-CNN with an interaction prediction task using graph neural networks (GNNs). The interactions with the formwork structure fall into two groups: productive actions (installing, preparing, and transporting) and non-productive actions (no interaction). Our model achieves mAP scores of 0.674, 0.556, and 0.632 for the local area, global area, and average area of the objects, respectively, suggesting that it can monitor construction productivity effectively. Future studies could incorporate additional information, such as temporal context and workers' body postures, to further improve the HOI model for productivity monitoring.