GroupDocs.Parser for Java

Extract images from PDF using Java

Retrieve embedded images from files such as PDF, Word, Excel, and more using GroupDocs.Parser in your Java development environment.

How to extract images from PDF in Java

Follow these steps to extract images from PDF documents using GroupDocs.Parser in your Java application:

1. Create a Parser instance and load the PDF file.
2. Extract image data from the loaded document.
3. Use or export the extracted images as needed.

import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageImageArea;

// Create a Parser instance and load the PDF (this snippet runs inside a method)
try (Parser parser = new Parser("input.pdf"))
{
    // Collect all image elements embedded in the document
    Iterable<PageImageArea> images = parser.getImages();

    // getImages() returns null when image extraction isn't supported for this document
    if (images == null) {
        return;
    }

    // Print the page index, bounding rectangle, and file type of each image
    for (PageImageArea image : images) {
        System.out.printf("Page: %d, R: %s, Type: %s%n", image.getPage().getIndex(),
            image.getRectangle(), image.getFileType());
    }
}

<dependencies>
    <dependency>
        <groupId>com.groupdocs</groupId>
        <artifactId>groupdocs-parser</artifactId>
        <version>24.9</version>
    </dependency>
</dependencies>
<repositories>
    <repository>
        <id>repository.groupdocs.com</id>
        <name>GroupDocs Repository</name>
        <url>https://repository.groupdocs.com/repo/</url>
    </repository>
</repositories>

More document parsing capabilities

In addition to image extraction, GroupDocs.Parser extracts text, links, metadata, and structured data for further processing and analysis.

Extract images and content from documents

Works with a variety of formats

Extract images from different document types including PDF, DOCX, PPTX, XLSX, and image formats like PNG, JPEG, and GIF.

Maintain image clarity and resolution

Extracted images keep their original resolution and file type unless you choose a different output format when saving them.

Flexible configuration options

Narrow the extraction to the images you need, for example by file type, size, or page index.
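
As a rough illustration of this kind of filtering, the sketch below applies page, type, and size rules in application code. It does not use GroupDocs.Parser itself: `ImageInfo` and `ImageFilters` are hypothetical stand-ins, and in a real application you would copy the relevant values (page index, file type, dimensions) out of each extracted image before filtering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the properties of an extracted image.
// This is NOT a GroupDocs.Parser type.
final class ImageInfo {
    final int pageIndex;     // page the image was found on
    final String extension;  // file type, e.g. "png" or "jpg"
    final int width;         // width in pixels
    final int height;        // height in pixels

    ImageInfo(int pageIndex, String extension, int width, int height) {
        this.pageIndex = pageIndex;
        this.extension = extension;
        this.width = width;
        this.height = height;
    }
}

// Hypothetical helper: small, composable filter rules.
final class ImageFilters {
    private ImageFilters() {}

    // Keep images located on the given page.
    static Predicate<ImageInfo> onPage(int pageIndex) {
        return img -> img.pageIndex == pageIndex;
    }

    // Keep images of the given file type (case-insensitive).
    static Predicate<ImageInfo> ofType(String extension) {
        return img -> img.extension.equalsIgnoreCase(extension);
    }

    // Keep images at least as large as the given dimensions.
    static Predicate<ImageInfo> atLeast(int minWidth, int minHeight) {
        return img -> img.width >= minWidth && img.height >= minHeight;
    }

    // Apply a rule to a possibly-null sequence of images.
    static List<ImageInfo> select(Iterable<ImageInfo> images, Predicate<ImageInfo> rule) {
        List<ImageInfo> selected = new ArrayList<>();
        if (images == null) {
            return selected;
        }
        for (ImageInfo image : images) {
            if (rule.test(image)) {
                selected.add(image);
            }
        }
        return selected;
    }
}
```

Rules can be combined with the standard `Predicate` methods, for example `ImageFilters.onPage(1).and(ImageFilters.ofType("png"))`.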

Extract and save images from PDF files

This example shows how to extract images from a PDF document and save each one as a separate file in a local directory.

Java

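import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageImageArea;
import com.groupdocs.parser.options.ImageFormat;
import com.groupdocs.parser.options.ImageOptions;

// Open the PDF file with Parser (this snippet runs inside a method)
try (Parser parser = new Parser("input.pdf"))
{
    // Get the images from the document; null means image extraction isn't supported
    Iterable<PageImageArea> images = parser.getImages();
    if (images == null) {
        return;
    }

    // Set the output format for the saved images (PNG here)
    ImageOptions options = new ImageOptions(ImageFormat.Png);

    // Save each image as <number>.png. Constants.getOutputFilePath is a helper
    // defined outside this snippet that turns a file name into an output path.
    int imageNumber = 0;
    for (PageImageArea image : images)
    {
        image.save(Constants.getOutputFilePath(String.format("%d.png", imageNumber)), options);
        imageNumber++;
    }
}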
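
The example calls `Constants.getOutputFilePath`, which is not defined in the snippet. A minimal stand-in, assuming you simply want numbered files inside an output directory, might look like the sketch below. `OutputPaths` and both of its methods are hypothetical names, not part of GroupDocs.Parser, and the signature differs from the original helper by taking the directory explicitly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of GroupDocs.Parser.
final class OutputPaths {
    private OutputPaths() {}

    // Ensures the output directory exists and returns the absolute path
    // of fileName inside it.
    static String getOutputFilePath(Path outputDir, String fileName) {
        try {
            Files.createDirectories(outputDir);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot create " + outputDir, e);
        }
        return outputDir.resolve(fileName).toAbsolutePath().toString();
    }

    // Builds a zero-padded file name such as "007.png" so that files
    // sort in extraction order.
    static String numberedName(int index, String extension) {
        return String.format("%03d.%s", index, extension);
    }
}
```

Inside the loop, the save call would then become something like `image.save(OutputPaths.getOutputFilePath(Paths.get("output"), OutputPaths.numberedName(imageNumber, "png")), options);`, where the `output` directory name is only an example.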

What is GroupDocs.Parser for Java?

GroupDocs.Parser is a parsing API for Java developers. It extracts images, text, links, and structured elements from a range of file formats, including DOCX, XLSX, PDF, PNG, and JPG, without needing external libraries or applications.
