Cover Image for System.Linq.Enumerable+EnumerablePartition`1[System.Char]

Neural Semantic Video Analysis

OAI: oai:igi-global.com:331072 DOI: 10.4018/978-1-6684-7366-5.ch068

Abstract

Videos are a rich form of data intended for capturing, storing, and communicating information. The availability of inexpensive and accessible video-capturing sensors in smartphones, handheld cameras, and consumer security cameras has exponentially increased global video footage generation over the past decade. Since video is a popular form of widely consumed and produced data, it is essential to develop automated systems to analyze and identify relevant information within the large body of video material. This chapter demonstrates how the emergence of neural networks, including CNNs and transformers, has revolutionized semantic video analysis. Through convolutional filters, spatial patterns can be captured at the pixel level through this type of neural network. The learning capability of CNN-based models has been exceeded more recently by self-attention-based models. Both CNN-based and transformer-based semantic video analysis models take advantage of transfer learning, self-supervised learning, and more to compensate for the lack of large, supervised video datasets.