In this article we will discuss OpenNLP named entity recognition (NER) in a Maven and Eclipse project. We will use the NameFinderME class provided by OpenNLP with different pre-trained model files such as en-ner-location.bin, en-ner-person.bin and en-ner-organization.bin.
What is Named Entity Recognition
As per Wikipedia, named-entity recognition (NER) is a subtask of information extraction that seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
Eclipse Project Structure
Maven Dependency
<dependencies>
    <dependency>
        <groupId>org.apache.opennlp</groupId>
        <artifactId>opennlp-tools</artifactId>
        <version>1.8.1</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
</dependencies>
Apache OpenNLP Named Entity Recognition
OpenNLP provides many pre-trained models, such as en-ner-person.bin, en-ner-location.bin, en-ner-organization.bin and en-ner-time.bin, to detect named entities such as persons, locations and organizations in a piece of text. The complete list of pre-trained models is available from the OpenNLP project's model downloads.
OpenNLP provides a common way to detect all these named entities. First, we load the pre-trained model file and use it to instantiate a TokenNameFinderModel object. Following is an example.
InputStream inputStream = getClass().getResourceAsStream("/en-ner-person.bin");
TokenNameFinderModel model = new TokenNameFinderModel(inputStream);
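One thing to watch for when loading models this way: getResourceAsStream returns null when the file is not on the classpath, and handing a null stream to the model constructor tends to produce a much less helpful error. The following is a minimal sketch of a guard for that case. The class name ModelStreams and its open method are hypothetical helpers for illustration, not part of OpenNLP.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

final class ModelStreams {

    /**
     * Opens a classpath resource and fails with a clear message
     * instead of returning a null stream.
     */
    static InputStream open(String resourcePath) throws FileNotFoundException {
        InputStream in = ModelStreams.class.getResourceAsStream(resourcePath);
        if (in == null) {
            throw new FileNotFoundException("Model not found on classpath: " + resourcePath);
        }
        return in;
    }

    public static void main(String[] args) {
        // Assumes the model file sits at the classpath root (e.g. src/main/resources).
        try (InputStream in = open("/en-ner-person.bin")) {
            System.out.println("Model stream opened");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The returned stream can then be passed to the TokenNameFinderModel constructor as in the snippet above.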
Next, we initialise the NameFinderME class and call its find() method to locate the entities. The method works on tokens rather than raw text, so the text must be tokenised first. Following is an example.
NameFinderME nameFinder = new NameFinderME(model);
String[] tokens = tokenize(paragraph);
Span nameSpans[] = nameFinder.find(tokens);
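Each Span returned by find() refers to a range of token positions, where getStart() is the first token and getEnd() is one past the last token, so an entity such as "New York" covers two tokens. The sketch below shows how such ranges map back to text. To keep it runnable without the model files, it uses a hypothetical TokenSpan class as a stand-in for opennlp.tools.util.Span, and the token array and spans are hand-written assumptions rather than real model output.

```java
import java.util.ArrayList;
import java.util.List;

final class SpanText {

    /** Hypothetical stand-in for OpenNLP's Span: start inclusive, end exclusive. */
    static final class TokenSpan {
        final int start;
        final int end;
        final String type;

        TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Joins the tokens covered by each span into one string per entity. */
    static List<String> coveredText(String[] tokens, List<TokenSpan> spans) {
        List<String> result = new ArrayList<>();
        for (TokenSpan span : spans) {
            StringBuilder sb = new StringBuilder();
            for (int i = span.start; i < span.end; i++) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(tokens[i]);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"Charlie", "Brown", "lives", "in", "New", "York", "."};
        List<TokenSpan> spans = new ArrayList<>();
        spans.add(new TokenSpan(0, 2, "person"));
        spans.add(new TokenSpan(4, 6, "location"));
        System.out.println(coveredText(tokens, spans)); // prints [Charlie Brown, New York]
    }
}
```

With real OpenNLP spans the same loop applies using getStart() and getEnd(). The library also appears to ship a static Span.spansToStrings(Span[], String[]) helper for this joining; check the Javadoc for your version before relying on it.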
Finding Names Using OpenNLP
Based on the above understanding, following is the complete code to find names in a text using OpenNLP.
package com.devglan;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Created by only2dhir on 15-07-2017.
 */
public class NameFinder {

    public void findName(String paragraph) throws IOException {
        // Load the pre-trained person model; try-with-resources closes the stream.
        try (InputStream inputStream = getClass().getResourceAsStream("/en-ner-person.bin")) {
            TokenNameFinderModel model = new TokenNameFinderModel(inputStream);
            NameFinderME nameFinder = new NameFinderME(model);
            String[] tokens = tokenize(paragraph);
            Span[] nameSpans = nameFinder.find(tokens);
            for (Span s : nameSpans) {
                // A name can cover several tokens, so join start (inclusive) to end (exclusive).
                System.out.println(String.join(" ", Arrays.copyOfRange(tokens, s.getStart(), s.getEnd())));
            }
        }
    }

    public String[] tokenize(String sentence) throws IOException {
        // Load the pre-trained tokenizer model and split the text into tokens.
        try (InputStream inputStreamTokenizer = getClass().getResourceAsStream("/en-token.bin")) {
            TokenizerModel tokenModel = new TokenizerModel(inputStreamTokenizer);
            TokenizerME tokenizer = new TokenizerME(tokenModel);
            return tokenizer.tokenize(sentence);
        }
    }
}
Finding Location Name using Apache OpenNLP
Similar to the name finder, following is an example to identify locations in a text using OpenNLP.
package com.devglan;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Created by only2dhir on 15-07-2017.
 */
public class LocationFinder {

    public void findLocation(String paragraph) throws IOException {
        // Load the pre-trained location model; try-with-resources closes the stream.
        try (InputStream inputStreamNameFinder = getClass().getResourceAsStream("/en-ner-location.bin")) {
            TokenNameFinderModel model = new TokenNameFinderModel(inputStreamNameFinder);
            NameFinderME locFinder = new NameFinderME(model);
            String[] tokens = tokenize(paragraph);
            Span[] nameSpans = locFinder.find(tokens);
            for (Span span : nameSpans) {
                // A location can cover several tokens, so join start (inclusive) to end (exclusive).
                String location = String.join(" ", Arrays.copyOfRange(tokens, span.getStart(), span.getEnd()));
                System.out.println("Position - " + span.toString() + " LocationName - " + location);
            }
        }
    }

    public String[] tokenize(String sentence) throws IOException {
        // Load the pre-trained tokenizer model and split the text into tokens.
        try (InputStream inputStreamTokenizer = getClass().getResourceAsStream("/en-token.bin")) {
            TokenizerModel tokenModel = new TokenizerModel(inputStreamTokenizer);
            TokenizerME tokenizer = new TokenizerME(tokenModel);
            return tokenizer.tokenize(sentence);
        }
    }
}
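The person and location finders differ only in the model file they load, so one possible way to avoid duplicating the class is to pick the model resource from the entity type. The following is a small sketch of that lookup, assuming the model files follow the en-ner-<type>.bin naming used in this article and sit at the classpath root. NerModels and resourceFor are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class NerModels {

    // Entity types named in this article; extend the list for other model files you have.
    private static final List<String> KNOWN_TYPES =
            Arrays.asList("person", "location", "organization", "time");

    /** Returns the classpath resource for an entity type, e.g. "/en-ner-location.bin". */
    static String resourceFor(String entityType) {
        if (!KNOWN_TYPES.contains(entityType)) {
            throw new IllegalArgumentException("Unknown entity type: " + entityType);
        }
        return "/en-ner-" + entityType + ".bin";
    }

    public static void main(String[] args) {
        System.out.println(resourceFor("person"));   // prints /en-ner-person.bin
        System.out.println(resourceFor("location")); // prints /en-ner-location.bin
    }
}
```

A single finder method could then take the entity type as a parameter and pass the returned path to getResourceAsStream in place of the hard-coded file name.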
Testing the Application
Following are some test cases to detect named entities using Apache OpenNLP.
NERTester.java
package com.devglan;

import org.junit.Test;

/**
 * Created by only2dhir on 15-07-2017.
 */
public class NERTester {

    @Test
    public void nameFinderTest() throws Exception {
        NameFinder nameFinder = new NameFinder();
        nameFinder.findName("Where is Charlie and Mike.");
    }

    @Test
    public void locationFinderTest() throws Exception {
        LocationFinder locFinder = new LocationFinder();
        locFinder.findLocation("Charlie is in California but I don't know about Mike.");
    }
}
Output
Conclusion
I hope this article served what you were looking for. If you have anything to add or share, please share it in the comment section below.