Java API to Parse & Extract Images from Excel, Word, PowerPoint, PDF & Other Document Pages

GroupDocs.Parser for Java API allows programmers to extract images from PDF, DOC, DOCX, PPT, PPTX, EML, MSG, XLS, XLSX, CSV, ODT, RTF & EPUB documents, or from individual document pages, inside Java applications.



Learn How to Extract Images from Documents or a Specific Page via Java API

An image is worth a thousand words, and images are hard to ignore when creating engaging content in today’s visual world. They convey information effectively and grab the user’s attention. It is often necessary to pull images out of documents, journals, or presentations and reuse them elsewhere. GroupDocs.Parser for Java is an API that helps software developers build solutions for parsing documents and extracting images or other information from numerous document types. It also supports saving images in PNG, JPEG, WebP, GIF, BMP, and other formats. The API supports popular document formats such as PDF, Microsoft Office formats (Word: DOC, DOCX; PowerPoint: PPT, PPTX; Excel: XLS, XLSX), LibreOffice formats, emails, ebooks, and many more. It also supports advanced features such as document parsing, plain and structured text extraction, text search by keyword, and extraction of metadata, images, containers, and attachments.

Extract images from documents in Java

GroupDocs.Parser for Java makes it easy for Java developers to extract images from a document in a few steps.

  • Instantiate a Parser object for the source document;
  • Call the getImages method to obtain a collection of image objects;
  • Check that the collection isn’t null (null means image extraction isn’t supported for the document);
  • Iterate through the collection and get the sizes, image types and image contents.

How to extract images from documents using Java example code

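// Extract images from documents using GroupDocs.Parser API
// Imports needed at the top of the source file:
//   import com.groupdocs.parser.Parser;
//   import com.groupdocs.parser.data.PageImageArea;
// Create an instance of Parser class (Constants.SampleImagesPdf is the path to a sample PDF)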
try (Parser parser = new Parser(Constants.SampleImagesPdf)) {
    // Extract images
    Iterable<PageImageArea> images = parser.getImages();
    // Check if images extraction is supported
    if (images == null) {
        System.out.println("Images extraction isn't supported");
        return;
    }
    // Iterate over images
    for (PageImageArea image : images) {
        // Print a page index, rectangle and image type:
        System.out.println(String.format("Page: %d, R: %s, Type: %s", image.getPage().getIndex(), image.getRectangle(), image.getFileType()));
    }
}
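
The example above prints information about each image; the last step in the list ("image contents") usually means writing those contents somewhere. The following is a minimal sketch of that step using only the standard Java library. It assumes each image's content is available as an InputStream (for example from a getter such as image.getImageStream(), which is an assumption here and should be checked against the API reference) and that a file extension can be derived from the image type. The class name ImageStreamSaver, both method names and the file naming scheme are hypothetical and not part of GroupDocs.Parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

// Hypothetical helper, not part of GroupDocs.Parser: writes image streams to disk.
class ImageStreamSaver {

    // Builds a name such as "page-0-image-3.png". The naming scheme is an
    // illustrative choice; a leading dot in the extension is tolerated.
    static String fileNameFor(int pageIndex, int imageNumber, String extension) {
        String ext = extension.startsWith(".") ? extension.substring(1) : extension;
        return String.format("page-%d-image-%d.%s",
                pageIndex, imageNumber, ext.toLowerCase(Locale.ROOT));
    }

    // Copies the stream to outputDir/fileName, creating the directory if needed.
    // Closes the stream and returns the number of bytes written.
    static long save(InputStream content, Path outputDir, String fileName) {
        try (InputStream in = content) {
            Files.createDirectories(outputDir);
            return Files.copy(in, outputDir.resolve(fileName),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // Rethrow unchecked so the helper can be called inside a simple loop.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in bytes; in real use the stream would come from an extracted image.
        byte[] fakeImage = {(byte) 0x89, 'P', 'N', 'G'};
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "extracted-images");
        long written = save(new ByteArrayInputStream(fakeImage), dir, fileNameFor(0, 1, ".PNG"));
        System.out.println(written + " bytes written to " + dir);
    }
}
```

Inside the loop over images, a call along the lines of ImageStreamSaver.save(stream, outputDir, ImageStreamSaver.fileNameFor(pageIndex, n, extension)) would then store each image under a predictable name, with the stream, page index and extension taken from the image object.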

System Requirements

GroupDocs.Parser for Java APIs are supported on all major platforms and operating systems. Before executing the code above, please make sure that you have the following prerequisites installed on your system.

  • Operating Systems: Microsoft Windows, Linux, macOS
  • Development Environments: NetBeans, IntelliJ IDEA, Eclipse, etc.
  • Frameworks
  • Download the latest version of GroupDocs.Parser for Java from Maven

Why Use GroupDocs.Parser for Java

  • Plain text extraction from any supported document
  • Document parsing via user-defined templates
  • Full support for structured text extraction
  • Text search by keyword as well as by regular expression
  • Extract formatted text, metadata, images, containers, and attachments
  • Extract table of contents for some supported document formats
  • Parse form data from PDF documents
  • Extract hyperlinks from the document

Live Demos - Extract images from documents Online

Extract images from documents right now by visiting the GroupDocs.Parser Live Demos website. The live demo has the following benefits.

  • No need to download the API
  • No need to write any code
  • Just upload the source file
  • Get a download link to save the file

Extract Images From Other Document Formats

The Java document parsing and image extraction API supports many file formats. Some of the popular formats are listed below.

  • DOC (Microsoft Word Binary Format)
  • DOCM (Microsoft Word 2007 Macro File)
  • DOCX (Office 2007+ Word Document)
  • DOT (Microsoft Word Template Files)
  • DOTM (Microsoft Word 2007+ Template File)
  • DOTX (Microsoft Word Template File)
  • EPUB (Open eBook File)
  • HTML (Hyper Text Markup Language)
  • MHT (MHTML Web Archive)
  • MHTML (Web Page Archive Format)
  • ODP (OpenDocument Presentation Format)
  • ODS (OpenDocument Spreadsheet)
  • ODT (OpenDocument Text File Format)
  • ONE (OneNote Document)
  • OTP (OpenDocument Presentation Template)
  • OTT (OpenDocument Text Template)
  • PDF (Portable Document Format)
