Java Parser API to Extract Data

Java API to Parse & Extract Text (Raw & Formatted) with Metadata from Documents, Presentations, Zip Archives & Emails

GroupDocs.Parser for Java

GroupDocs.Parser for Java is an API for building business applications that parse raw, structured, and formatted text. It also retrieves file metadata from supported formats. With it you can extract text and metadata from password-protected files in popular formats, including spreadsheets, presentations, PDFs, and ZIP archives.

GroupDocs.Parser for Java Features

Count word occurrences across one or more documents

Extract text without installing a document reader

Fetch text from a file or stream

Extract formatted text from a document

Use fast or standard text extraction mode

Extract text from password-protected documents

Fetch formatted text from emails & their attachments programmatically

Extract text from one or more pages of a OneNote document

Extract text from a simple PDF file or a PDF Portfolio document

Get data from the forms in a PDF document

Extract text from a specific PowerPoint slide

Obtain formatted text from a PowerPoint presentation

Extract raw or formatted text from cells, rows, and columns of an Excel spreadsheet

Gather raw or formatted text from a Word document

Get a formatted table from a Word document

Get text from a Word document in HTML format

Extract a single sentence or the whole text from EPUB, CHM, Markdown & FB2 files

Extract the table of contents from EPUB & CHM documents

Extract highlighted text from documents

Extract text with its content structure intact

Retrieve text areas from documents for text analysis

Obtain metadata from supported document formats

Extract all or selected images from supported formats

Rotate extracted images

Extract text from files within ZIP archives & OST containers

Fetch data from email containers (Exchange Web Services, POP3, IMAP)

Extract text from database containers quickly and reliably

Search documents for simple text, whole words & regular expressions

Search EPUB & FB2 files for simple text and regular expressions

Search for & extract highlighted expressions in documents

Extract text with the plain text formatter (simple & ASCII)

Apply custom formatting with edges, angles, and intersections to plain text

Fetch & format text with the Markdown formatter

Apply the Markdown formatter to fonts, hyperlinks, headings, lists & tables

Get text with the HTML formatter

Apply the HTML formatter to paragraphs, hyperlinks, fonts, headings, lists & tables
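
To make the word-count and search features listed above more concrete, here is a minimal sketch of what they involve once text has been extracted. It uses only the JDK and is not the GroupDocs.Parser API: the class ExtractedTextStats and its methods are hypothetical names chosen for illustration, and the sketch assumes the extracted text is already available as plain Java strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of GroupDocs.Parser): statistics and search
// over text that has already been extracted from documents.
final class ExtractedTextStats {

    // A "word" here is a run of Unicode letters or digits (an assumption).
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // Counts case-insensitive word occurrences across one or more texts.
    static Map<String, Integer> countWords(List<String> texts) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String text : texts) {
            Matcher m = WORD.matcher(text);
            while (m.find()) {
                counts.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Counts occurrences of a literal phrase, optionally as a whole word only.
    static int countMatches(String text, String phrase, boolean wholeWord) {
        String literal = Pattern.quote(phrase); // treat the phrase as plain text
        return countRegexMatches(text, wholeWord ? "\\b" + literal + "\\b" : literal);
    }

    // Counts case-insensitive matches of a regular expression.
    static int countRegexMatches(String text, String regex) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        int matches = 0;
        while (m.find()) {
            matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList(
                "Invoice 42: the total is due.",
                "The invoice total was paid.");
        System.out.println(countWords(texts));
        System.out.println(countMatches("The total, in totality.", "total", true));
        System.out.println(countRegexMatches("Call 555-0100 or 555-0199", "\\d{3}-\\d{4}"));
    }
}
```

The three methods mirror the simple-text, whole-word, and regular-expression search modes named in the list. How the library itself exposes them may differ.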

Extracting Text from a Document

Extracting text from a document with the GroupDocs.Parser for Java API takes only a few lines of code.

Support and Learning Resources

GroupDocs.Parser offers document parsing APIs for other popular development environments as listed below: