Java Parser API to Extract Data

Java API to Parse & Extract Text (Raw & Formatted) with Metadata from Documents, Presentations, Zip Archives & Emails


GroupDocs.Parser for Java


GroupDocs.Parser for Java is an API for building business applications that need to parse raw, structured and formatted text. It can also retrieve file metadata from supported formats. GroupDocs.Parser for Java extracts text and metadata even from password-protected files in popular formats, including spreadsheets, presentations, PDFs, ZIP archives and more.


GroupDocs.Parser for Java Features



Count word occurrences statistically in single or multiple documents


Extract text without installing a document reader


Fetch text from a file or stream


Pull out formatted text from a document


Use fast or standard text extraction mode


Extract text from password-protected documents


Fetch formatted text from within emails & attachments programmatically


Extract text from single or multiple pages of a OneNote document


Extract text from a simple PDF file or a PDF Portfolio document


Get data from the forms in a PDF document


Extract text from a specific PowerPoint slide


Obtain formatted text from a PowerPoint presentation


Extract raw or formatted text from cells, rows and columns of an Excel spreadsheet


Extract raw or formatted text from a Word document


Get formatted tables from a Word document


Get text from a Word document in HTML format


Extract a single sentence or the whole text from EPUB, CHM, Markdown & FB2 files


Extract the table of contents from EPUB & CHM documents


Extract highlighted text from documents


Pull out text with its content structure intact


Retrieve text areas from documents for text analysis


Obtain metadata from supported document formats


Extract all or selected images from supported formats


Rotate extracted image(s)


Extract text from files within ZIP archives & OST containers


Fetch data from email containers (Exchange Web Services, POP3, IMAP)


Extract text from database containers quickly and reliably


Find simple text, whole words & regular expressions in documents


Look for simple text and regular expressions in EPUB & FB2 files


Search & extract highlighted expressions in documents


Extract text with the plain text formatter (simple & ASCII)


Customize plain text formatting with edges, angles and intersections


Fetch & format text with the Markdown formatter


Apply the Markdown formatter to fonts, hyperlinks, headings, lists & tables


Get text with the HTML formatter


Apply the HTML formatter to paragraphs, hyperlinks, fonts, headings, lists & tables
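The search features listed above (simple text, whole word and regular expression) differ mainly in how the pattern is anchored. As a hedged illustration only, the plain-Java sketch below does not call GroupDocs.Parser: it assumes the document text has already been extracted into a String and uses only java.util.regex. The class TextSearchSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of GroupDocs.Parser) that searches
 * already-extracted text and returns the start offset of each match.
 */
public class TextSearchSketch {

    /** Simple search: every literal occurrence, even inside longer words. */
    public static List<Integer> findSimple(String text, String keyword) {
        return findRegex(text, Pattern.quote(keyword));
    }

    /** Whole-word search: the literal keyword must sit between word boundaries. */
    public static List<Integer> findWholeWord(String text, String keyword) {
        return findRegex(text, "\\b" + Pattern.quote(keyword) + "\\b");
    }

    /** Regular-expression search: every match of an arbitrary pattern. */
    public static List<Integer> findRegex(String text, String regex) {
        List<Integer> positions = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            positions.add(matcher.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String text = "Parse the parser output; a parsed file is easy to parse.";
        System.out.println(findSimple(text, "parse"));        // prints [10, 27, 50]
        System.out.println(findWholeWord(text, "parse"));     // prints [50]
        System.out.println(findRegex(text, "[Pp]arse\\w*"));  // prints [0, 10, 27, 50]
    }
}
```

The searches are case-sensitive as written; passing Pattern.CASE_INSENSITIVE to Pattern.compile would be one way to relax that.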
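Customizing plain text output with edges, angles and intersections comes down to choosing which character is drawn at each position of a table border. The sketch below is a hypothetical, plain-Java illustration of that idea, not the GroupDocs.Parser formatter API: the class PlainTableSketch and its constructor parameters are assumptions, "angle" is interpreted as the character at both ends of each border line, and every row is assumed to have the same number of cells.

```java
/**
 * Hypothetical sketch (not the GroupDocs.Parser formatter API) that renders
 * table rows as plain text with configurable edge, angle and intersection characters.
 */
public class PlainTableSketch {

    private final char horizontalEdge; // fills border lines
    private final char verticalEdge;   // separates cells within a row
    private final char angle;          // drawn at both ends of every border line
    private final char intersection;   // drawn where a border line crosses a column divider

    public PlainTableSketch(char horizontalEdge, char verticalEdge, char angle, char intersection) {
        this.horizontalEdge = horizontalEdge;
        this.verticalEdge = verticalEdge;
        this.angle = angle;
        this.intersection = intersection;
    }

    /** Renders rows (assumed rectangular) with a border line above, between and below them. */
    public String render(String[][] rows) {
        if (rows.length == 0) {
            return "";
        }
        int columns = rows[0].length;
        int[] widths = new int[columns];
        for (String[] row : rows) {
            for (int c = 0; c < columns; c++) {
                widths[c] = Math.max(widths[c], row[c].length());
            }
        }
        String border = borderLine(widths);
        StringBuilder out = new StringBuilder(border).append('\n');
        for (String[] row : rows) {
            out.append(verticalEdge);
            for (int c = 0; c < columns; c++) {
                // One space of padding on each side, cell left-aligned to the column width
                out.append(' ').append(padRight(row[c], widths[c])).append(' ').append(verticalEdge);
            }
            out.append('\n').append(border).append('\n');
        }
        return out.toString();
    }

    private String borderLine(int[] widths) {
        StringBuilder line = new StringBuilder().append(angle);
        for (int c = 0; c < widths.length; c++) {
            line.append(String.valueOf(horizontalEdge).repeat(widths[c] + 2));
            line.append(c == widths.length - 1 ? angle : intersection);
        }
        return line.toString();
    }

    private static String padRight(String s, int width) {
        return s + " ".repeat(width - s.length());
    }

    public static void main(String[] args) {
        PlainTableSketch table = new PlainTableSketch('-', '|', '+', '*');
        System.out.print(table.render(new String[][] {
            {"Format", "Pages"},
            {"PDF", "12"}
        }));
    }
}
```

With the characters chosen in main, the output is:

    +--------*-------+
    | Format | Pages |
    +--------*-------+
    | PDF    | 12    |
    +--------*-------+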

Extracting Text from a Document

Extracting text from a document with the GroupDocs.Parser for Java API is simple and takes only a few lines of code.
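Once text has been extracted, it can be post-processed, for example to count word occurrences across one or more documents, as the feature list mentions. The sketch below is a hedged, plain-Java illustration of such a count; it does not call GroupDocs.Parser and assumes each document's text is already available as a String. The class WordStatsSketch is hypothetical, and its tokenization rule (runs of letters, digits and apostrophes, lower-cased) is an assumption.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of GroupDocs.Parser) for word statistics over extracted text. */
public class WordStatsSketch {

    // Assumed word definition: a run of Unicode letters, digits or apostrophes
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}']+");

    /** Counts case-insensitive word occurrences across one or more texts, sorted by word. */
    public static Map<String, Integer> countWords(String... texts) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String text : texts) {
            Matcher matcher = WORD.matcher(text);
            while (matcher.find()) {
                String word = matcher.group().toLowerCase(Locale.ROOT);
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The parser reads the file.", "The file is parsed.");
        // prints {file=2, is=1, parsed=1, parser=1, reads=1, the=3}
        System.out.println(counts);
    }
}
```

A TreeMap keeps the output sorted alphabetically; a HashMap would be faster for large vocabularies if ordering does not matter.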

Support and Learning Resources


GroupDocs.Parser offers document parsing APIs for other popular development environments as listed below: