Abstract
Facial expression recognition (FER) is of great value in affective computing. However, many existing FER models have difficulty distinguishing primary features from secondary ones. To address this, we propose a FER model based on the semantic bottleneck attention module (BAM). The key change is that the input image is processed through two branches. BAM uses spatial attention to construct feature maps that highlight important feature regions and suppress secondary features. The mobile inverted bottleneck convolution (MBConv) module incorporates stochastic depth, which reduces training time and improves model performance: by randomly skipping some layers during training, stochastic depth lowers the risk of overfitting. Global average pooling replaces the fully connected layer, compressing each feature map (FM) into a single value to obtain compact, high-purity image features. In experiments on the Extended Cohn-Kanade (CK+), Static Facial Expressions in the Wild (SFEW), and Real-world Affective Faces Database (RAF-DB) datasets, our model achieves accuracies of 98.33%, 61.14%, and 89.43%, respectively, outperforming three advanced comparison methods.
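The three mechanisms named above (spatial attention gating, stochastic depth in a residual block, and global average pooling) can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation. Assumptions: feature tensors are plain nested lists laid out as C channels of H x W maps; `residual_fn` is a hypothetical stand-in for an MBConv branch; and the spatial mask is a parameter-free simplification, whereas the standard BAM design learns its spatial map with dilated convolutions and combines it with a channel-attention branch.

```python
import math
import random
from typing import Callable, List

# Hypothetical plain-list layout: C channels, each an H x W map.
FeatureMaps = List[List[List[float]]]


def spatial_attention_gate(fmaps: FeatureMaps) -> FeatureMaps:
    """Simplified spatial gate in BAM's residual form F + F * M.

    Assumption: M is the sigmoid of the channel-wise mean at each position,
    a stand-in for BAM's learned spatial branch.
    """
    channels = len(fmaps)
    height, width = len(fmaps[0]), len(fmaps[0][0])
    mask = [
        [
            1.0 / (1.0 + math.exp(-sum(fmaps[c][i][j] for c in range(channels)) / channels))
            for j in range(width)
        ]
        for i in range(height)
    ]
    # Residual gating: positions with a larger mask value are amplified more.
    return [
        [[fmaps[c][i][j] * (1.0 + mask[i][j]) for j in range(width)] for i in range(height)]
        for c in range(channels)
    ]


def stochastic_depth_block(
    x: List[float],
    residual_fn: Callable[[List[float]], List[float]],
    survival_prob: float,
    training: bool,
    rng: random.Random,
) -> List[float]:
    """Residual block with stochastic depth.

    Training: the residual branch is skipped with probability
    1 - survival_prob, leaving only the identity shortcut.
    Inference: the branch always runs and is scaled by survival_prob
    to match its expected contribution during training.
    """
    if training:
        if rng.random() >= survival_prob:
            return list(x)  # branch dropped: identity only
        branch = residual_fn(x)
        return [a + b for a, b in zip(x, branch)]
    branch = residual_fn(x)
    return [a + survival_prob * b for a, b in zip(x, branch)]


def global_average_pool(fmaps: FeatureMaps) -> List[float]:
    """Collapse each channel's H x W map to its mean, giving a length-C vector."""
    pooled = []
    for fmap in fmaps:
        total = sum(sum(row) for row in fmap)
        count = sum(len(row) for row in fmap)
        pooled.append(total / count)
    return pooled
```

For example, at inference with `survival_prob=0.5`, an input `[1.0, 2.0]` and a branch that doubles its input give `[2.0, 4.0]`. Pooling has no trainable weights, which is one reason global average pooling is less prone to overfitting than the fully connected layer it replaces.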