In this article we discuss the Apache OpenNLP POS tagger with an example. The example is a Maven-based project that uses the en-pos-maxent.bin model file to tag parts of speech. We use the WhitespaceTokenizer provided by OpenNLP to tokenize the text.
What is Part-of-Speech Tagging
According to Wikipedia, POS tagging is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.
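POS Tags and Their Meanings
Each tag the model emits is a short code for a part of speech. The English OpenNLP models use the Penn Treebank tag set, for example NN for a singular noun and VB for a base-form verb.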
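A small lookup can make the tag codes easier to read when inspecting tagger output. The following is a minimal sketch covering only a subset of the Penn Treebank tags; the class name PennTagGlossary and the method describe are hypothetical helpers written for this illustration and are not part of OpenNLP.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps a subset of Penn Treebank tags to their meanings.
class PennTagGlossary {

    private static final Map<String, String> MEANINGS = new LinkedHashMap<>();

    static {
        MEANINGS.put("CC", "Coordinating conjunction");
        MEANINGS.put("CD", "Cardinal number");
        MEANINGS.put("DT", "Determiner");
        MEANINGS.put("IN", "Preposition or subordinating conjunction");
        MEANINGS.put("JJ", "Adjective");
        MEANINGS.put("MD", "Modal");
        MEANINGS.put("NN", "Noun, singular or mass");
        MEANINGS.put("NNS", "Noun, plural");
        MEANINGS.put("NNP", "Proper noun, singular");
        MEANINGS.put("PRP", "Personal pronoun");
        MEANINGS.put("RB", "Adverb");
        MEANINGS.put("VB", "Verb, base form");
        MEANINGS.put("VBD", "Verb, past tense");
        MEANINGS.put("VBG", "Verb, gerund or present participle");
        MEANINGS.put("VBN", "Verb, past participle");
        MEANINGS.put("VBP", "Verb, non-3rd person singular present");
        MEANINGS.put("VBZ", "Verb, 3rd person singular present");
    }

    // Returns the meaning of a tag, or a fallback message for tags not in this subset.
    static String describe(String tag) {
        return MEANINGS.getOrDefault(tag, "Unknown tag: " + tag);
    }

    public static void main(String[] args) {
        System.out.println(describe("NN"));   // Noun, singular or mass
        System.out.println(describe("VBZ"));  // Verb, 3rd person singular present
    }
}
```

Tags outside this subset fall through to the "Unknown tag" message, so the full tag set should be consulted before relying on the lookup.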
Maven Dependencies for OpenNLP
pom.xml

<dependencies>
    <dependency>
        <groupId>org.apache.opennlp</groupId>
        <artifactId>opennlp-tools</artifactId>
        <version>1.8.1</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
</dependencies>
Implementing POS Tagging using Apache OpenNLP
The following class takes a chunk of text as an input parameter and tags each word. We first use a sentence detector to split the paragraph into multiple sentences, and then tag each sentence with the OpenNLP POS tagger. Sentence detection is covered in detail in the separate article on Sentence Detector.
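WhitespaceTokenizer uses white space to tokenize the input text. en-pos-maxent.bin is the maxent model with tag dictionary. Both en-pos-maxent.bin and en-sent.bin are loaded from the root of the classpath, so in a Maven project they belong in src/main/resources.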
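To see what whitespace tokenization means for the tagger's input, here is a minimal sketch that approximates it with the Java standard library only. It is not the OpenNLP implementation; WhitespaceSplitSketch and tokenize are hypothetical names used for illustration. The point it shows is that punctuation stays attached to the neighbouring word.

```java
import java.util.Arrays;

// Hypothetical stand-in for whitespace tokenization, using only the JDK.
class WhitespaceSplitSketch {

    // Splits on runs of whitespace; punctuation is NOT separated from words.
    static String[] tokenize(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("A test suite can also contain other test suites.");
        // prints [A, test, suite, can, also, contain, other, test, suites.]
        System.out.println(Arrays.toString(tokens));
    }
}
```

Because the last token is "suites." with the period attached, the tagger sees the word and its punctuation as one unit. If that matters for your text, a tokenizer that separates punctuation may be a better fit.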
package com.devglan;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.sentdetect.SentenceDetector;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.WhitespaceTokenizer;

import java.io.IOException;
import java.io.InputStream;

/**
 * Created by only2dhir on 11-07-2017.
 */
public class POSTaggingExample {

    POSTaggerME tagger = null;
    POSModel model = null;

    // Loads the POS model from the classpath and builds the tagger from it.
    public void initialize(String lexiconFileName) {
        // try-with-resources closes the stream once the model has been read
        try (InputStream modelStream = getClass().getResourceAsStream(lexiconFileName)) {
            if (modelStream == null) {
                System.out.println("Model not found on classpath: " + lexiconFileName);
                return;
            }
            model = new POSModel(modelStream);
            tagger = new POSTaggerME(model);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }

    // Splits the text into sentences, tokenizes each one and prints TAG:word pairs.
    public void tag(String text) {
        initialize("/en-pos-maxent.bin");
        try {
            // tagger stays null if the model could not be loaded
            if (tagger != null) {
                String[] sentences = detectSentences(text);
                for (String sentence : sentences) {
                    String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(sentence);
                    String[] tags = tagger.tag(whitespaceTokenizerLine);
                    for (int i = 0; i < whitespaceTokenizerLine.length; i++) {
                        String word = whitespaceTokenizerLine[i].trim();
                        String tag = tags[i].trim();
                        System.out.print(tag + ":" + word + " ");
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Splits a paragraph into sentences using the en-sent.bin model.
    public String[] detectSentences(String paragraph) throws IOException {
        try (InputStream modelIn = getClass().getResourceAsStream("/en-sent.bin")) {
            final SentenceModel sentenceModel = new SentenceModel(modelIn);
            SentenceDetector sentenceDetector = new SentenceDetectorME(sentenceModel);
            String[] sentences = sentenceDetector.sentDetect(paragraph);
            for (String sent : sentences) {
                System.out.println(sent);
            }
            return sentences;
        }
    }
}
Testing OpenNLP POS Tagger
The following JUnit test class exercises the tagger class.
package com.devglan;

import org.junit.Test;

/**
 * Created by only2dhir on 11-07-2017.
 */
public class POSTaggerTest {

    @Test
    public void tag() {
        POSTaggingExample tagging = new POSTaggingExample();
        tagging.tag("If you have several test classes, you can combine them into a test suite. Running a test suite executes all test classes in that suite in the specified order. A test suite can also contain other test suites");
    }
}
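The test above only prints to the console, so it passes as long as nothing throws. One possible way to make the result checkable is to build the TAG:word pairs as a list and return them instead of printing. The sketch below shows just that pairing step with the standard library; TaggedOutputSketch and pair are hypothetical names, and the tags in main are hand-written for illustration rather than produced by the model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pairs tokens with tags in the same "TAG:word" form the example prints.
class TaggedOutputSketch {

    static List<String> pair(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            pairs.add(tags[i].trim() + ":" + tokens[i].trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hand-written tags for illustration only; a real run would take them from tagger.tag(tokens).
        String[] tokens = {"Running", "a", "test"};
        String[] tags = {"VBG", "DT", "NN"};
        // prints VBG:Running DT:a NN:test
        System.out.println(String.join(" ", pair(tokens, tags)));
    }
}
```

With a method like this, a JUnit test could assert on the returned list instead of relying on console output.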
Output
Conclusion
I hope this article gave you what you were looking for. If you have anything to add or share, please post it in the comment section below.