In this tutorial we will discuss the Stanford NLP POS Tagger with an example. We will create a simple project in the Eclipse IDE with Maven as the build tool and look at how Stanford NLP can be used to tag parts of speech. We will use the MaxentTagger class provided by Stanford to tag POS with the english-left3words-distsim.tagger model.
What is Part-of-Speech Tagging
According to Wikipedia, POS tagging is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, that is, its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.
Different POS Tags and Their Meanings
The following are the POS tags with their corresponding meanings.
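As a rough illustration, a few of the most common tags can be kept in a small lookup table. The sketch below assumes the tagger model emits Penn Treebank tags, which is the tag set the English Stanford models are trained on. The class name PosTagGlossary and the describe method are hypothetical helpers made up for this example, and the list is deliberately partial, not the full tag set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for this tutorial: a partial glossary of Penn Treebank tags.
class PosTagGlossary {

    // Insertion-ordered so the glossary prints in the order listed here.
    private static final Map<String, String> MEANINGS = new LinkedHashMap<>();

    static {
        MEANINGS.put("CC", "Coordinating conjunction");
        MEANINGS.put("DT", "Determiner");
        MEANINGS.put("IN", "Preposition or subordinating conjunction");
        MEANINGS.put("JJ", "Adjective");
        MEANINGS.put("MD", "Modal");
        MEANINGS.put("NN", "Noun, singular or mass");
        MEANINGS.put("NNS", "Noun, plural");
        MEANINGS.put("NNP", "Proper noun, singular");
        MEANINGS.put("PRP", "Personal pronoun");
        MEANINGS.put("RB", "Adverb");
        MEANINGS.put("VB", "Verb, base form");
        MEANINGS.put("VBD", "Verb, past tense");
        MEANINGS.put("VBG", "Verb, gerund or present participle");
        MEANINGS.put("VBP", "Verb, non-3rd person singular present");
        MEANINGS.put("VBZ", "Verb, 3rd person singular present");
    }

    // Returns the meaning of a tag, or a fallback for tags missing from this partial list.
    static String describe(String tag) {
        return MEANINGS.getOrDefault(tag, "Unknown tag");
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : MEANINGS.entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

A call such as PosTagGlossary.describe("NN") can then be used to print a readable label next to each tag the tagger returns.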
Project Structure
Maven Dependencies for Stanford NLP
pom.xml
<dependencies>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.8.0</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
</dependencies>
Implementing POS Tagging using Stanford NLP
Following is the class that takes text as an input parameter and tags each word. If you are looking for an OpenNLP tagger instead, see the Apache OpenNLP POS Tagger Example.
MaxentTagger is the main class for users to run, train, and test the part-of-speech tagger. Here we initialize MaxentTagger with the constructor that takes the location of a trained tagger model file as its argument, in this case english-left3words-distsim.tagger.
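package com.devglan;

import edu.stanford.nlp.tagger.maxent.MaxentTagger;

import java.io.IOException;

public class TaggerExample {

    public void tag(String text) throws IOException, ClassNotFoundException {
        // Load the trained model. This assumes the model file can be found at this path,
        // for example in the project root.
        MaxentTagger maxentTagger = new MaxentTagger("english-left3words-distsim.tagger");
        // tagString returns the text as whitespace-separated word_TAG tokens.
        String tag = maxentTagger.tagString(text);
        String[] eachTag = tag.trim().split("\\s+");
        System.out.println("Word " + "Stanford tag");
        System.out.println("----------------------------------");
        for (String token : eachTag) {
            // Split on the last underscore so words that contain "_" stay intact.
            int sep = token.lastIndexOf('_');
            if (sep < 0) {
                continue; // not a word_TAG token, e.g. empty input
            }
            System.out.println(token.substring(0, sep) + " " + token.substring(sep + 1));
        }
    }
}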
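The step that splits the tagged string into words and tags can be tried on its own, without the Stanford library on the classpath. The following is a minimal sketch, assuming the tagger output uses the default word_TAG format with an underscore separator. TaggedTokenParser is a hypothetical helper name, and the input string in main is a hand-written sample in that format, not actual tagger output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns "word_TAG word_TAG ..." into {word, tag} pairs.
class TaggedTokenParser {

    static List<String[]> parse(String tagged) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            // Use the last underscore so a word such as "snake_case" is not cut in half.
            int sep = token.lastIndexOf('_');
            if (sep <= 0) {
                continue; // skip empty tokens and tokens without a word part
            }
            pairs.add(new String[] { token.substring(0, sep), token.substring(sep + 1) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hand-written sample input, assumed to mirror the tagger's output format.
        String sample = "The_DT cat_NN sat_VBD";
        for (String[] pair : parse(sample)) {
            System.out.println(pair[0] + " " + pair[1]);
        }
    }
}
```

Returning the pairs instead of printing them directly makes the logic easy to check in a unit test, and the printing can stay in the caller.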
Testing the Stanford POS Tagger
Following is the test class to test the tagger class.
package com.devglan;

import org.junit.Test;

import java.io.IOException;

public class TaggerTest {

    @Test
    public void tag() throws IOException, ClassNotFoundException {
        TaggerExample tagging = new TaggerExample();
        tagging.tag("If you have several test classes, you can combine them into a test suite.");
    }
}
Output
Conclusion
I hope this article covered what you were looking for. If you have anything to add or share, please do so in the comment section below.