This tutorial covers setting up Apache OpenNLP with Maven in Eclipse or IntelliJ IDEA. We will build an example using the Sentence Detector component provided by Apache OpenNLP. For this purpose we will use the en-sent.bin file, a model that is trained on OpenNLP training data. So let us get started.
What is NLP
NLP here stands for Natural Language Processing, the field concerned with enabling computers to analyze and work with human language. It should not be confused with Neuro-Linguistic Programming, which shares the same abbreviation.
There are many existing NLP libraries available online, such as NLTK, OpenNLP, and Stanford CoreNLP, that come with models already trained for the most common NLP tasks. In this post we will discuss OpenNLP and provide a basic example of detecting sentences with OpenNLP using Maven and the Eclipse IDE.
Project Structure
Maven Dependency
opennlp-tools: Provides concrete implementations of NLP algorithms such as sentence splitting and POS tagging.
    <groupId>com.devglan</groupId>
    <artifactId>open-nlp-demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.opennlp</groupId>
            <artifactId>opennlp-tools</artifactId>
            <version>1.8.1</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
Implementing OpenNLP SentenceDetector
SentenceDetector detects sentence boundaries in raw text. OpenNLP provides a predefined model, en-sent.bin, which is trained to identify sentences in English text. We have this file, en-sent.bin, inside the resources folder, so it can be loaded from the root of the classpath. Once the model is loaded, we can call sentDetect() to split a paragraph into sentences.
SentenceDetectorDemo.java

    package com.devglan;

    import opennlp.tools.sentdetect.SentenceDetector;
    import opennlp.tools.sentdetect.SentenceDetectorME;
    import opennlp.tools.sentdetect.SentenceModel;

    import java.io.IOException;
    import java.io.InputStream;

    /**
     * Created by only2dhir on 08-07-2017.
     */
    public class SentenceDetectorDemo {

        public String[] detectSentence(String paragraph) throws IOException {
            // Load the pre-trained model from the classpath root.
            // try-with-resources closes the stream even if loading fails.
            final SentenceModel sentenceModel;
            try (InputStream modelIn = getClass().getResourceAsStream("/en-sent.bin")) {
                sentenceModel = new SentenceModel(modelIn);
            }

            SentenceDetector sentenceDetector = new SentenceDetectorME(sentenceModel);

            // Split the paragraph into individual sentences.
            String[] sentences = sentenceDetector.sentDetect(paragraph);
            for (String sent : sentences) {
                System.out.println(sent);
            }
            return sentences;
        }
    }
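One thing to watch when loading the model this way: getResourceAsStream() returns null when the resource is not on the classpath, so a misplaced en-sent.bin tends to surface later as a less descriptive error. The following is a minimal sketch of a fail-fast helper using only the JDK. The class name ModelResources and the method open() are hypothetical names chosen for illustration and are not part of OpenNLP.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

public class ModelResources {

    // Hypothetical helper: opens a classpath resource and fails fast with a
    // descriptive message instead of returning a null stream to the caller.
    public static InputStream open(String resourcePath) throws IOException {
        InputStream in = ModelResources.class.getResourceAsStream(resourcePath);
        if (in == null) {
            throw new FileNotFoundException("Model not found on classpath: " + resourcePath);
        }
        return in;
    }

    public static void main(String[] args) {
        // Prints either a success message or the descriptive error,
        // depending on whether en-sent.bin is on the classpath.
        try (InputStream in = open("/en-sent.bin")) {
            System.out.println("Model stream opened");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The stream returned by open() could then be passed to the SentenceModel constructor in place of the direct getResourceAsStream() call.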
Implementing OpenNLP SentencePosDetector
OpenNLP also provides a way to detect the positions of sentences in raw text. We can use sentPosDetect() to get the start and end offsets of each sentence as Span objects. Following is an example.
SentencePosDetectorDemo.java

    package com.devglan;

    import opennlp.tools.sentdetect.SentenceDetector;
    import opennlp.tools.sentdetect.SentenceDetectorME;
    import opennlp.tools.sentdetect.SentenceModel;
    import opennlp.tools.util.Span;

    import java.io.IOException;
    import java.io.InputStream;

    /**
     * Created by only2dhir on 08-07-2017.
     */
    public class SentencePosDetectorDemo {

        public Span[] detectSentencePos(String paragraph) throws IOException {
            // Load the pre-trained model from the classpath root.
            // try-with-resources closes the stream even if loading fails.
            final SentenceModel sentenceModel;
            try (InputStream modelIn = getClass().getResourceAsStream("/en-sent.bin")) {
                sentenceModel = new SentenceModel(modelIn);
            }

            SentenceDetector sentenceDetector = new SentenceDetectorME(sentenceModel);

            // Each Span holds the start and end offsets of one sentence.
            Span[] spans = sentenceDetector.sentPosDetect(paragraph);
            for (Span span : spans) {
                System.out.println(span);
            }
            return spans;
        }
    }
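To make the idea of sentence positions more concrete, here is a small JDK-only sketch of how a start/end offset pair maps back to the sentence text. A Span exposes its offsets through getStart() and getEnd(), with the end offset being one past the last character. The class SpanOffsets, the method covered(), and the offsets below are hand-written for illustration; they are not output from the model.

```java
public class SpanOffsets {

    // Returns the text covered by a [start, end) offset pair,
    // where 'end' is exclusive (one past the last character).
    public static String covered(String text, int start, int end) {
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "First sentence. Second one.";
        // Hand-written offsets for illustration only; not produced by a model.
        int[][] offsets = { {0, 15}, {16, 27} };
        for (int[] o : offsets) {
            System.out.println("[" + o[0] + ".." + o[1] + ") " + covered(text, o[0], o[1]));
        }
    }
}
```

With real detector output, the same slicing could be applied as paragraph.substring(span.getStart(), span.getEnd()).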
Testing the Application
Following are some test cases that detect sentences and their positions using Apache OpenNLP.
SentenceDetectorTest.java

    package com.devglan;

    import opennlp.tools.util.Span;
    import org.junit.Assert;
    import org.junit.Test;

    import java.io.IOException;

    /**
     * Created by only2dhir on 08-07-2017.
     */
    public class SentenceDetectorTest {

        // Sample paragraph shared by both tests.
        private static final String PARAGRAPH =
                "If you have several test classes, you can combine them into a test suite. "
                + "Running a test suite executes all test classes in that suite in the specified order. "
                + "A test suite can also contain other test suites.";

        @Test
        public void testDetectSentence() throws IOException {
            SentenceDetectorDemo sentenceDetector = new SentenceDetectorDemo();
            String[] sentences = sentenceDetector.detectSentence(PARAGRAPH);
            Assert.assertTrue(sentences != null && sentences.length > 0);
        }

        @Test
        public void testDetectSentencePos() throws IOException {
            SentencePosDetectorDemo sentenceDetector = new SentencePosDetectorDemo();
            Span[] spans = sentenceDetector.detectSentencePos(PARAGRAPH);
            Assert.assertTrue(spans != null && spans.length > 0);
        }
    }
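The tests above only check that something was returned. A stricter test could also verify that the reported offsets are sensible. The sketch below is JDK-only and assumes, as an expectation rather than a documented guarantee, that the detector returns non-empty spans in text order without overlaps. SpanChecks and wellFormed() are hypothetical names, and the offsets are hand-written.

```java
public class SpanChecks {

    // Returns true when every [start, end) pair is non-empty, lies inside
    // the text, and starts at or after the end of the previous pair.
    public static boolean wellFormed(int textLength, int[][] offsets) {
        int previousEnd = 0;
        for (int[] o : offsets) {
            int start = o[0];
            int end = o[1];
            if (start < previousEnd || end <= start || end > textLength) {
                return false;
            }
            previousEnd = end;
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "First sentence. Second one.";
        // Hand-written offsets for illustration only; not produced by a model.
        int[][] ordered = { {0, 15}, {16, 27} };
        int[][] overlapping = { {0, 15}, {10, 27} };
        System.out.println(wellFormed(text.length(), ordered));
        System.out.println(wellFormed(text.length(), overlapping));
    }
}
```

In a JUnit test, the offsets could be filled from span.getStart() and span.getEnd() and the result passed to Assert.assertTrue().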
Output
Conclusion
I hope this article served what you were looking for. If you have anything that you want to add or share, please share it in the comment section below.