Parser for Java

Parse documents via Java API

Extract data from documents and images on any platform using our flexible APIs and app-based solutions for programmers and end users.

Version 23.11 released

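import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.TextReader;

// Create an instance of the Parser class (fileName is the path to the source document)
try (Parser parser = new Parser(fileName)) {
    // Extract the document text into a reader
    try (TextReader reader = parser.getText()) {
        // Print the text; a null reader means text extraction isn't supported
        System.out.println(reader == null
                ? ""
                : reader.readToEnd());
    }
}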

<dependency>
    <groupId>com.groupdocs</groupId>
    <artifactId>groupdocs-parser</artifactId>
    <version>23.11</version>
</dependency>

GroupDocs.Parser Overview

API for performing document parsing in Java applications

Extract data from documents

The Java API retrieves text, metadata, and images from a wide range of file formats, including Office documents, emails with attachments, and archives. The extracted content can then feed applications such as data analysis, search engine indexing, or content management systems.

Parse documents

Extract hyperlinks, tables, QR codes, barcodes, and data from PDF forms. You can also parse specific information from documents using custom templates.

Customizing results

The Java API returns data in several forms: raw, structured, HTML, or Markdown. It also offers search functionality for locating specific words or phrases within the text of documents.

Platform independence

GroupDocs.Parser for Java supports the following operating systems, frameworks and package managers

Supported file formats

GroupDocs.Parser for Java supports operations with the following file formats.

Microsoft Office formats

Word: DOCX, DOC, DOCM, DOT, DOTX, DOTM, RTF
Excel: XLSX, XLS, XLSM, XLSB, XLTM, XLT, XLTX, XLAM, SXC, SpreadsheetML
PowerPoint: PPT, PPTX, PPS, PPSX, PPSM, POT, POTM, POTX, PPTM

Images & Other Formats

Portable: PDF
Images: JPG, BMP, PNG, TIFF, GIF, DICOM, WEBP
Other office formats: ODT, OTT, OTS, ODS, ODP, OTP, ODG

Other formats

Web: HTML, MHTML
Archives: ZIP, TAR, 7Z
Ebooks: CHM, EPUB, FB2, MOBI
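
Because extraction works only for the formats listed above, an application may want to check a file's extension before opening it. The following is a minimal sketch in plain Java and is not part of the GroupDocs.Parser API: the class name SupportedFormats and the method isListed are hypothetical, and the extension set is simply copied from the lists on this page (SpreadsheetML is omitted because it has no single extension).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of GroupDocs.Parser: checks a file name
// against the extensions listed on this page.
public class SupportedFormats {
    private static final Set<String> EXTENSIONS = new HashSet<>(Arrays.asList(
            // Word
            "docx", "doc", "docm", "dot", "dotx", "dotm", "rtf",
            // Excel (SpreadsheetML is omitted: no single extension)
            "xlsx", "xls", "xlsm", "xlsb", "xltm", "xlt", "xltx", "xlam", "sxc",
            // PowerPoint
            "ppt", "pptx", "pps", "ppsx", "ppsm", "pot", "potm", "potx", "pptm",
            // Portable and images
            "pdf", "jpg", "bmp", "png", "tiff", "gif", "dicom", "webp",
            // Other office formats
            "odt", "ott", "ots", "ods", "odp", "otp", "odg",
            // Web, archives, ebooks
            "html", "mhtml", "zip", "tar", "7z", "chm", "epub", "fb2", "mobi"));

    // Returns true when the file name ends with one of the listed extensions.
    public static boolean isListed(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(ext);
    }

    public static void main(String[] args) {
        System.out.println(isListed("report.PDF")); // true
        System.out.println(isListed("notes.txt"));  // false
    }
}
```

An extension check is only a cheap pre-filter. Whether a given file can actually be parsed still depends on its content and on the library itself.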

GroupDocs.Parser features

Extract data from PDFs, Office Documents, and Images swiftly and accurately.

Extract text

Extract textual information from various file formats such as office documents, PDF files and images for easy readability and analysis.

Extract images

Retrieve images embedded in office documents and PDF files for convenient access and reuse.

Scan QR Codes

Detect and decode QR codes present within office documents, PDF files, or visual content for efficient information retrieval.

Extract data from email attachments and archives

Gather valuable information from email messages, file attachments, and compressed data sources for effective analysis and utilization.

Extract tables

Identify and extract tabular data from PDF documents for organized analysis and use.

Extract hyperlinks

Locate and extract hyperlinks and email addresses within office documents or PDF files for efficient access.

Parse PDF Forms

PDF forms are digital documents with fillable fields that let users enter information electronically. The Java API can extract the data from these fields for further processing.

Parse data by templates

Create custom templates and use them with the Java API to parse specific information from PDF files, simplifying data extraction.

Search text in documents

Quickly locate specific words or patterns within documents.
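
To illustrate what locating words or patterns means in practice, here is a minimal sketch that runs a regular expression over text that has already been extracted, for example a string returned by reader.readToEnd(). It uses only the standard java.util.regex package. The class TextSearch and the method findAll are hypothetical names for this illustration and are not the GroupDocs.Parser search API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration of pattern search over already-extracted text.
// TextSearch and findAll are hypothetical names, not GroupDocs.Parser API.
public class TextSearch {
    // Returns "position: match" for every case-insensitive occurrence of the regex.
    public static List<String> findAll(String text, String regex) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        while (matcher.find()) {
            hits.add(matcher.start() + ": " + matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample text standing in for the content of a parsed document
        String text = "Invoice 2023-11 was paid. Invoice 2023-12 is pending.";
        for (String hit : findAll(text, "invoice \\d{4}-\\d{2}")) {
            System.out.println(hit);
        }
        // prints:
        // 0: Invoice 2023-11
        // 26: Invoice 2023-12
    }
}
```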

Code sample

Some use cases of typical GroupDocs.Parser for Java operations

Extract images from PDF documents

With the Java API, extracting images from a document takes only a few steps: open the file, request its images, and save each one.

Extract images from PDF documents in Java

import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageImageArea;

// Create an instance of the Parser class
try (Parser parser = new Parser(fileName)) {
    // Extract images
    Iterable<PageImageArea> images = parser.getImages();
    // A null result means image extraction isn't supported for this file
    if (images != null) {
        int imageIndex = 0;
        // Iterate over images
        for (PageImageArea image : images) {
            // Save the image to a file named after its index
            image.save(String.format("%d%s", imageIndex, image.getFileType().getExtension()));
            // Advance the index so each image gets its own file
            imageIndex++;
        }
    }
}

Extract barcodes from images

With the Java API, extracting barcodes takes only a few steps: open the file, check that barcode extraction is supported, and iterate over the results.

Extract barcodes from images

import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageBarcodeArea;

// Create an instance of the Parser class
try (Parser parser = new Parser(fileName)) {
    // Check if the file supports barcode extraction
    if (parser.getFeatures().isBarcodes()) {
        // Extract barcodes from the file
        Iterable<PageBarcodeArea> barcodes = parser.getBarcodes();
        // Iterate over barcodes
        for (PageBarcodeArea barcode : barcodes) {
            // Print the page index
            System.out.println("Page: " + barcode.getPage().getIndex());
            // Print the barcode value
            System.out.println("Value: " + barcode.getValue());
        }
    }
}
