GroupDocs.Parser for Java is a text, image and metadata extraction API that supports more than 50 popular document types and helps you build business applications that parse raw, structured and formatted text. It also parses documents using predefined templates, which lets you extract complex data from invoices and other typical documents with speed and accuracy. GroupDocs.Parser for Java enables you to extract text and metadata from password-protected files in all popular formats, including Word processing documents, Excel spreadsheets, PowerPoint presentations, OneNote, PDF files and ZIP archives.
Compute Word Occurrence Statistics for Single or Multiple Documents
Extract Text and Metadata from Excel Spreadsheets and PowerPoint Presentation Templates
Fetch Text from a File or Stream, Without Installing a Document Reader
Pull Out Formatted Text from a Document Using Fast or Standard Text Extraction Mode
Detect the Media Type of Password-Protected XML Documents & Extract Text from Them
Fetch Formatted Text from PowerPoint Presentations, Emails & Attachments Programmatically
Extract Text from Single or Multiple Pages of a OneNote Document
Pull Out Raw Text from a Simple PDF File or a PDF Portfolio Document
Extract Data from PDF, MS Word, Excel and Presentation Documents
Extract Raw or Formatted Text from Cells, Rows and Columns of an Excel Spreadsheet
Gather Raw or HTML Formatted Text from Word Document & Excerpt Highlighted Text from Documents
Get Data from PDF Forms & Obtain Formatted Tables from a PDF or Word Document
Pull Out a Single Sentence or the Whole Text from EPUB, CHM, Markdown & FB2 Files
Excerpt Table of Contents from Databases, PDF, EPUB, CHM & Word Processing Documents
Retrieve Text Area from Documents for Analysis & Pull Out text with its Content Structure Intact
Obtain Metadata from Supported Document Formats
Extract All or Selected Images from Supported Formats & Rotate Extracted Image(s)
Extract Text from Files within Zip Archives & OST Containers – Detect Media Types for Zip Container Items
Fetch Data from Email Containers (Exchange Web Services, POP3, IMAP)
Take Out Text from Database Containers in a Fast, Reliable and Efficient Manner
Find Simple Text, Whole Word & Regular Expression within Documents
Prepare Document Template, Extract Data from Document and Analyze Data Fields & Tables
Search & Extract Highlighted Expressions in Documents
Pull Out Text with the Plain Text Formatter (Simple & ASCII) or Custom Formatting with Edges, Angles & Intersections
Fetch & Format Text (Font, Hyperlinks, Headings, Lists & Tables) with Markdown Formatter
Get Text with HTML Formatter & Apply Formatter to Paragraph, Hyperlink, Font, Headings, Lists & Tables
Move Table Layout & Detect Tables in a Rectangular Area by Column Separators
Extract Text from Shapes, WordArt Objects & Text Boxes within Microsoft Office File Formats
Extract Images to Files – Save to JPG, PNG, GIF, BMP or WEBP Formats
Extract Text from Email Servers and Databases via JDBC
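Several of the features above (word occurrence statistics, and searching for simple text, whole words and regular expressions) operate on text once it has been extracted. The following is a minimal sketch of those ideas using only the Java standard library. It is not the GroupDocs.Parser API: the class `TextSearchSketch` and its methods are hypothetical names chosen for illustration, and the input strings stand in for text that an extractor would have produced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of GroupDocs.Parser): works on already-extracted text.
class TextSearchSketch {

    // Counts case-insensitive word occurrences across one or more extracted texts.
    static Map<String, Integer> countWords(List<String> texts) {
        Map<String, Integer> counts = new TreeMap<>();
        Pattern word = Pattern.compile("[\\p{L}\\p{N}]+");
        for (String text : texts) {
            Matcher m = word.matcher(text);
            while (m.find()) {
                counts.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the start offsets of whole-word, case-insensitive matches of a literal term.
    static List<Integer> findWholeWord(String text, String term) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE);
        return findAll(text, p);
    }

    // Returns the start offsets of every match of an arbitrary regular expression.
    static List<Integer> findAll(String text, Pattern pattern) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Placeholder strings standing in for text extracted from two documents.
        List<String> texts = List.of("Invoice 42: total due", "The invoice total is 42");
        System.out.println(countWords(texts));
        System.out.println(findWholeWord("total totals subtotal total", "total"));
        System.out.println(findAll("INV-001 and INV-027", Pattern.compile("INV-\\d{3}")));
    }
}
```

Quoting the search term with `Pattern.quote` keeps characters such as `.` or `(` from being interpreted as regex syntax in the whole-word search, while `findAll` accepts a caller-supplied pattern for the regular-expression case.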
With GroupDocs.Parser for Java, you can apply various formatters to the extracted text. You can pull text with the Plain Text Formatter, in both its Simple and ASCII forms. You can also get text with the HTML Formatter and apply formatting to paragraphs, hyperlinks, fonts, headings, lists and tables.
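To illustrate what a formatter does, here is a rough sketch in plain Java that renders the same elements (a heading, a hyperlink and a list item) as plain text, Markdown or HTML. It does not use or reproduce the GroupDocs.Parser formatter classes: the `TextFormatter` interface, the `FormatterSketch` class and the three constants are hypothetical and exist only to show the idea of swapping the output format behind one interface.

```java
// Hypothetical illustration (not the GroupDocs.Parser API): one interface, three output formats.
class FormatterSketch {

    interface TextFormatter {
        String heading(String text, int level);
        String hyperlink(String text, String url);
        String listItem(String text);
    }

    // Plain text: no markup, the link target is shown in parentheses.
    static final TextFormatter PLAIN = new TextFormatter() {
        public String heading(String text, int level) { return text; }
        public String hyperlink(String text, String url) { return text + " (" + url + ")"; }
        public String listItem(String text) { return "- " + text; }
    };

    // Markdown: '#' headings, [text](url) links, '-' list items.
    static final TextFormatter MARKDOWN = new TextFormatter() {
        public String heading(String text, int level) { return "#".repeat(level) + " " + text; }
        public String hyperlink(String text, String url) { return "[" + text + "](" + url + ")"; }
        public String listItem(String text) { return "- " + text; }
    };

    // HTML: tags around escaped content.
    static final TextFormatter HTML = new TextFormatter() {
        public String heading(String text, int level) {
            return "<h" + level + ">" + escape(text) + "</h" + level + ">";
        }
        public String hyperlink(String text, String url) {
            return "<a href=\"" + escape(url) + "\">" + escape(text) + "</a>";
        }
        public String listItem(String text) { return "<li>" + escape(text) + "</li>"; }
    };

    // Escapes the characters that are significant in HTML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        for (TextFormatter f : new TextFormatter[] {PLAIN, MARKDOWN, HTML}) {
            System.out.println(f.heading("Totals", 2));
            System.out.println(f.hyperlink("Docs & API", "https://example.com/?a=1&b=2"));
            System.out.println(f.listItem("First"));
        }
    }
}
```

Keeping the escaping inside the HTML variant means the calling code passes the same raw strings to every formatter and never needs to know which output format is in use.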