An Approach to DNA Sequence Classification Through Machine Learning

OAI: oai:igi-global.com:299963 DOI: 10.4018/IJRQEH.299963
Published by: IGI Global

Abstract

Machine learning (ML) has been instrumental in supporting optimal decision making from relevant historical data, including in the domain of bioinformatics. In bioinformatics, distinguishing natural genes from genes affected by disease, called invalid genes, is a very complex task. To determine the applicability of a new protein through genomic research, DNA sequences need to be classified. The current work identifies classes of DNA sequences using a machine learning algorithm. These classes depend on the sequence of nucleotides: even a fractional mutation in the sequence produces a corresponding change in the class. Each numeric label representing a class is linked to a gene family, such as G protein-coupled receptors, tyrosine kinases, and synthases. In this paper, the authors applied the classification algorithm to three datasets to identify which gene class each sequence belongs to. They converted the sequences into substrings of a defined length. This 'k value' sets the length of each substring, and splitting a sequence in this way is one approach to analyzing it.
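The substring conversion described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say whether the substrings overlap, so the overlapping, stride-1 window used here is an assumption, and the function name `kmers` and the example sequence are hypothetical rather than taken from the paper.

```python
def kmers(sequence: str, k: int) -> list[str]:
    """Split a DNA sequence into overlapping substrings of length k.

    Assumption: a sliding window with stride 1. The abstract only says
    that sequences are cut into substrings of a defined length k.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()  # normalize case so 'acgt' and 'ACGT' match
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Hypothetical example sequence, not data from the paper.
print(kmers("ATGCATGCA", 4))
# prints ['ATGC', 'TGCA', 'GCAT', 'CATG', 'ATGC', 'TGCA']
```

Under this overlapping scheme, a single-nucleotide substitution alters at most k of the resulting substrings, which is one way to see how a small mutation can shift the features a classifier receives.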