Java API to Parse & Extract Images from Excel, Word, PowerPoint, PDF & Other Documents or Their Pages

The GroupDocs.Parser for Java API allows programmers to extract images from PDF, DOC, DOCX, PPT, PPTX, EML, MSG, XLS, XLSX, CSV, ODT, RTF & EPUB documents, or from individual document pages, inside Java applications.

Learn How to Extract Images from Documents or a Specific Page via Java API

An image is worth a thousand words, and images are hard to ignore when creating engaging content in today's visual world. They communicate information efficiently and grab the reader's attention. It is often necessary to pull images out of documents, journals, or presentations and reuse them elsewhere. GroupDocs.Parser for Java is a powerful API that helps software developers build solutions for parsing and extracting images or other information from numerous document types. It also supports saving images in PNG, JPEG, WebP, GIF, BMP, and other formats. The API supports popular document formats such as PDF, Microsoft Office formats (Word: DOC, DOCX; PowerPoint: PPT, PPTX; Excel: XLS, XLSX), LibreOffice formats, emails, ebooks, and many more. It also supports more advanced document-parsing features, including extracting plain and structured text, searching text by keywords, and extracting metadata, images, containers, and attachments.

Extract images from documents in Java

GroupDocs.Parser for Java makes it easy for Java developers to extract images from a document in a few steps.

Instantiate a Parser object for the initial document;
Call the getImages method to obtain a collection of image objects;
Check that the collection isn't null (null means image extraction isn't supported for the document);
Iterate through the collection and read each image's size, type, and content.

Learn more about image extraction

How to extract images from documents using Java example code

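// Extract images from documents using the GroupDocs.Parser API
// Requires: import com.groupdocs.parser.Parser; import com.groupdocs.parser.data.PageImageArea;
// Create an instance of the Parser class for the source document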
try (Parser parser = new Parser(Constants.SampleImagesPdf)) {
    // Extract images
    Iterable<PageImageArea> images = parser.getImages();
    // Check if images extraction is supported
    if (images == null) {
        System.out.println("Images extraction isn't supported");
        return;
    }
    // Iterate over images
    for (PageImageArea image : images) {
        // Print a page index, rectangle and image type:
        System.out.printf("Page: %d, R: %s, Type: %s%n", image.getPage().getIndex(), image.getRectangle(), image.getFileType());
    }
}
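
The loop above visits every image in the document. To work with the images of one specific page, the same iteration can be narrowed by page index. The following is a minimal sketch of that filtering step only, written against the standard library so it runs without the GroupDocs.Parser dependency. The ImageInfo class and the onPage method are hypothetical stand-ins introduced here for illustration; in real code the loop would read image.getPage().getIndex() from PageImageArea, as in the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PageImageFilter {

    // Hypothetical stand-in for the values the example above reads from
    // PageImageArea (page index and file type). Not part of GroupDocs.Parser.
    static final class ImageInfo {
        final int pageIndex;
        final String fileType;

        ImageInfo(int pageIndex, String fileType) {
            this.pageIndex = pageIndex;
            this.fileType = fileType;
        }
    }

    // Returns the images whose page index equals pageIndex.
    // A null source (extraction not supported) yields an empty list,
    // mirroring the null check in the example above.
    static List<ImageInfo> onPage(Iterable<ImageInfo> images, int pageIndex) {
        List<ImageInfo> result = new ArrayList<>();
        if (images == null) {
            return result;
        }
        for (ImageInfo image : images) {
            if (image.pageIndex == pageIndex) {
                result.add(image);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data standing in for the result of parser.getImages()
        List<ImageInfo> all = Arrays.asList(
                new ImageInfo(0, "png"),
                new ImageInfo(1, "jpeg"),
                new ImageInfo(1, "png"));

        for (ImageInfo image : onPage(all, 1)) {
            System.out.printf("Page: %d, Type: %s%n", image.pageIndex, image.fileType);
        }
    }
}
```

Returning an empty list for a null source keeps callers free of a separate null check, at the cost of no longer distinguishing "not supported" from "no images on this page"; keep the explicit check if that distinction matters.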

System Requirements

GroupDocs.Parser for Java is supported on all major platforms and operating systems. Before running the code above, make sure the following prerequisites are installed on your system.

Operating Systems: Microsoft Windows, Linux, macOS
Development Environments: NetBeans, IntelliJ IDEA, Eclipse, etc.
Frameworks
Download the latest version of GroupDocs.Parser for Java from Maven