A Modified Markov-Based Maximum-Entropy Model for POS Tagging of Odia Text

OAI: oai:igi-global.com:286690 DOI: 10.4018/IJDSST.286690

Abstract

Part-of-speech (POS) tagging, a vital step in many natural language processing (NLP) tasks, has received little attention for Odia, a computationally under-developed language. This paper proposes a robust hybrid POS tagger for Odia. Given the rich morphology of the language and the lack of a sufficiently large annotated text corpus, the tagger combines machine learning with linguistic rules. It is trained on a tagged text corpus from the tourism domain and achieves a perceptible improvement in results. Appreciable performance is also observed on news articles from varied domains. In experiments on Odia, the proposed algorithm outperforms existing methods such as rule-based, hidden Markov model (HMM), maximum entropy (ME), and conditional random field (CRF) taggers.