Java Parser API to Extract Data

Java API to parse & extract images and text with metadata from documents, presentations, archives & emails.


GroupDocs.Parser for Java


GroupDocs.Parser for Java is a text, image and metadata extraction API for building business applications that parse raw, structured and formatted text. It also retrieves file metadata from supported formats. With GroupDocs.Parser for Java you can extract text and metadata from password-protected files in all popular formats, including Word processing documents, Excel spreadsheets, PowerPoint presentations, OneNote, PDF files and ZIP archives.


GroupDocs.Parser for Java Features



Count Word Occurrence Statistics for Single or Multiple Documents


Extract Text and Metadata from Excel Worksheets and Presentation Templates


Fetch Text from a File or Stream, Without Installing Document Reader


Pull Out Formatted Text from a Document Using Fast or Standard Text Extraction Mode


Detect the Media Type of Password-Protected XML Documents & Extract Text from Them


Fetch Formatted Text from within Emails & Attachments Programmatically


Draw Out Text from Single or Multiple Pages of a OneNote Document


Pull out Text from Simple PDF File or a PDF Portfolio Document


Obtain Formatted Text from PowerPoint Presentation or Draw out Text from Specific Slide


Extract Raw or Formatted Text from Cells, Rows and Columns of an Excel Spreadsheet


Gather Raw or HTML Formatted Text from Word Documents & Extract Highlighted Text from Documents


Get Data from PDF Forms & Obtain Formatted Tables from a PDF or Word Document


Pull Out Single Sentence or Whole Text from EPUB, CHM, Markdown & FB2 Files


Extract the Table of Contents from EPUB & CHM Documents


Retrieve Text Areas from Documents for Analysis & Pull Out Text with Its Content Structure Intact


Obtain Metadata from Supported Document Formats


Draw Out All or Selected Images from Supported Formats & Rotate Extracted Image(s)


Extract Text from Files within ZIP Archives & OST Containers


Fetch Data from Email Container (Exchange Web Server, POP3, IMAP)


Take Out Text from Database Containers in a Fast, Reliable and Efficient Manner


Find Simple Text, Whole Word & Regular Expression within Documents


Prepare Document Template, Extract Data from Document and Analyze Data Fields & Tables


Search & Extract Highlighted Expressions in Documents


Pull out Text with Plain Text Formatter (Simple & ASCII) or Custom Formatting with Edges, Angles, & Intersections


Fetch & Format Text (Font, Hyperlinks, Headings, Lists & Tables) with Markdown Formatter


Get Text with HTML Formatter & Apply Formatter to Paragraph, Hyperlink, Font, Headings, Lists & Tables


Move Table Layout & Detect Tables in a Rectangular Area by Column Separators
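
Two of the features above, word occurrence counting and whole-word search, can be illustrated on text that has already been extracted. The following is a minimal sketch using only the Java standard library; it does not call the GroupDocs.Parser API, and the class name ExtractedTextSearch and its methods are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of GroupDocs.Parser.
public class ExtractedTextSearch {

    // Counts how often each word occurs in the text, ignoring case.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        while (m.find()) {
            counts.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    // Counts whole-word matches of a literal term, ignoring case.
    static int countWholeWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        int matches = 0;
        while (m.find()) {
            matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from a document.
        String text = "Parser extracts text. The parser also extracts metadata.";
        System.out.println(countWords(text));
        System.out.println(countWholeWord(text, "parser"));
        System.out.println(countWholeWord(text, "extract"));
    }
}
```

In this sketch a whole-word search for "extract" does not match "extracts", which is the difference between whole-word and simple substring search.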

Get Text with Plain Text or HTML Formatters

With GroupDocs.Parser for Java, you can apply various formatters to extracted text. The Plain Text Formatter supports both Simple and ASCII modes. The HTML Formatter applies formatting to paragraphs, hyperlinks, fonts, headings, lists and tables.
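
The idea of interchangeable formatters can be sketched in plain Java. The example below is an assumption-laden illustration, not the GroupDocs.Parser API: the TextFormatter interface, the PlainFormatter and HtmlFormatter classes, and the render method are all hypothetical names, and only headings, paragraphs and hyperlinks are covered.

```java
// Hypothetical formatter sketch for illustration only; not the GroupDocs.Parser API.
public class FormatterSketch {

    interface TextFormatter {
        String heading(String text);
        String hyperlink(String text, String url);
        // The inline content is assumed to be already formatted.
        String paragraph(String inline);
    }

    // Emits plain text: one line per element, links shown as "text (url)".
    static final class PlainFormatter implements TextFormatter {
        public String heading(String text) { return text + "\n"; }
        public String hyperlink(String text, String url) { return text + " (" + url + ")"; }
        public String paragraph(String inline) { return inline + "\n"; }
    }

    // Emits HTML tags and escapes characters that are special in HTML.
    static final class HtmlFormatter implements TextFormatter {
        public String heading(String text) { return "<h1>" + escape(text) + "</h1>"; }
        public String hyperlink(String text, String url) {
            return "<a href=\"" + escape(url) + "\">" + escape(text) + "</a>";
        }
        public String paragraph(String inline) { return "<p>" + inline + "</p>"; }

        private static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;")
                    .replace(">", "&gt;").replace("\"", "&quot;");
        }
    }

    // Renders the same small document with whichever formatter is supplied.
    static String render(TextFormatter f) {
        return f.heading("Report")
                + f.paragraph("See " + f.hyperlink("the docs", "https://example.com") + ".");
    }

    public static void main(String[] args) {
        System.out.println(render(new PlainFormatter()));
        System.out.println(render(new HtmlFormatter()));
    }
}
```

The same content passes through either formatter unchanged; only the markup around it differs, which is the design point behind choosing a formatter at extraction time.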



GroupDocs.Parser offers document parsing APIs for other popular development environments as listed below: