Abstract
As governments, education departments, and academic accreditation bodies encourage schools to develop evidence-based decision-making and innovation systems, learning analytics has shown clear advantages in decision support and teaching evaluation. By integrating algorithms and techniques from artificial intelligence and machine learning, learning analytics has achieved higher accuracy. To recognize students' classroom behaviors such as standing up, sitting up, and raising hands, and to improve recognition accuracy and recall, experiments are conducted on multi-modal data comprising human key point information and RGB images. To strengthen the model's feature extraction capability, features extracted by the improved ResNet-50 and EfficientNet-B0 models are combined by bilinear fusion, which further improves recognition accuracy.
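To make the bilinear fusion step concrete, the following is a minimal, framework-free sketch rather than the implementation used in this work. It assumes each backbone has already produced a pooled feature vector; the function name `bilinear_fuse`, the toy feature values, and the signed square root and L2 normalization steps (common in bilinear pooling, but not stated above) are all illustrative assumptions.

```python
import math


def bilinear_fuse(feat_a, feat_b, eps=1e-12):
    """Fuse two feature vectors via their flattened outer product.

    Hypothetical helper: the normalization steps are a common convention
    for bilinear pooling and are assumed here, not taken from the paper.
    """
    # Outer product: every pairwise interaction between the two branches.
    outer = [a * b for a in feat_a for b in feat_b]
    # Signed square root dampens very large activations.
    signed = [math.copysign(math.sqrt(abs(v)), v) for v in outer]
    # L2 normalization gives the fused vector unit length.
    norm = math.sqrt(sum(v * v for v in signed))
    return [v / (norm + eps) for v in signed]


# Toy stand-ins for pooled features from the two backbones (made-up values).
resnet_feat = [1.0, -4.0]
effnet_feat = [4.0, 0.0, 1.0]
fused = bilinear_fuse(resnet_feat, effnet_feat)
```

The fused vector has length `len(feat_a) * len(feat_b)`, so in practice the backbone features are often reduced in dimension before fusion to keep the classifier input manageable.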