Extract Images from PDF, DOCX, PPTX, MSG, XLSX Documents & Pages via C# .NET API

The GroupDocs.Parser .NET API allows programmers to extract images from PDF, DOC, DOCX, PPT, PPTX, EML, MSG, XLS, XLSX, CSV, ODT, RTF & EPUB documents, or from individual pages of a document.


How to Extract Images from documents via .NET?

Images can convey information that is hard to express in words. They grab the reader's attention and make difficult concepts easier to explain. While reading documents or journals, or viewing presentations, we often come across interesting images and want to save them. GroupDocs.Parser for .NET is a powerful API that helps developers build applications that extract images from different types of documents and save them in PNG, JPEG, WebP, GIF, BMP and other formats. The API supports both text and image extraction from some of the most commonly used file formats, such as PDF, emails, ebooks, Microsoft Office formats including Word (DOC, DOCX), PowerPoint (PPT, PPTX) and Excel (XLS, XLSX), LibreOffice formats, and many more. It also supports document parsing, plain and structured text extraction, keyword text search, and extraction of metadata, images, containers, and attachments.

Extract images from documents in .NET

GroupDocs.Parser for .NET makes it easy for C# developers to extract images from a document in a few simple steps.

  • Instantiate a Parser object for the source document;
  • Call the GetImages method to obtain a collection of image objects;
  • Check that the collection isn’t null (a null value means image extraction isn’t supported for the document);
  • Iterate through the collection and get the size, type, and contents of each image.

How to extract images from documents using C# example code

// Extract images from documents using GroupDocs.Parser API
// Required namespaces: System, System.Collections.Generic,
// GroupDocs.Parser, GroupDocs.Parser.Data
// filePath holds the path to the source document
// Create an instance of Parser class
using (Parser parser = new Parser(filePath)) {
    // Extract images
    IEnumerable<PageImageArea> images = parser.GetImages();
    // Check if images extraction is supported
    if (images == null) {
        Console.WriteLine("Images extraction isn't supported");
        return;
    }
    // Iterate over images
    foreach (PageImageArea image in images) {
        // Print the page index, rectangle and image type
        Console.WriteLine("Page: {0}, R: {1}, Type: {2}", image.Page.Index, image.Rectangle, image.FileType);
    }
}
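
The example above only prints information about each image. Since the API can also save extracted images in formats such as PNG, the following is a minimal sketch of writing every image to disk. The class name ImageExport, the helper BuildFileName, the output folder, and the file naming scheme are assumptions made for illustration and are not part of the library. The sketch also assumes that PageImageArea.Save accepts a target path together with an ImageOptions instance, so check the signature against the API reference for your version.

```csharp
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

public static class ImageExport
{
    // Hypothetical helper: builds a file name such as "page0_image3.png"
    // from the page index reported by the API and a running image counter.
    public static string BuildFileName(int pageIndex, int imageNumber)
    {
        return string.Format("page{0}_image{1}.png", pageIndex, imageNumber);
    }

    // Saves every extracted image as a PNG file in outputDir.
    // Returns the number of images saved, or -1 when image
    // extraction isn't supported for the document.
    public static int SaveAllAsPng(string filePath, string outputDir)
    {
        using (Parser parser = new Parser(filePath))
        {
            IEnumerable<PageImageArea> images = parser.GetImages();
            if (images == null)
            {
                return -1;
            }

            Directory.CreateDirectory(outputDir);

            // Assumption: ImageOptions selects the output format on save.
            ImageOptions options = new ImageOptions(ImageFormat.Png);
            int count = 0;
            foreach (PageImageArea image in images)
            {
                count++;
                string target = Path.Combine(outputDir, BuildFileName(image.Page.Index, count));
                image.Save(target, options);
            }
            return count;
        }
    }
}
```

Calling ImageExport.SaveAllAsPng("sample.pdf", "images") would then write one PNG file per extracted image into the images folder. Both arguments are placeholder values.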

System Requirements

GroupDocs.Parser for .NET is supported on all major platforms and operating systems. Before running the code above, please make sure that the following prerequisites are installed on your system.

  • Operating Systems: Microsoft Windows, Linux, macOS
  • Development Environments: Microsoft Visual Studio, Xamarin, MonoDevelop
  • Frameworks
  • Download the latest version of GroupDocs.Parser for .NET from NuGet

Why Use GroupDocs.Parser for .NET

  • Plain text extraction from any supported document
  • Document parsing via user-defined templates
  • Full support for structured text extraction
  • Text search by keyword as well as by regular expression
  • Extract formatted text, metadata, images, containers, and attachments
  • Extract table of contents for some supported document formats
  • Parse form data from PDF documents
  • Extract hyperlinks from the document

Live Demos - Extract images from documents Online

Extract images from documents right now by visiting the GroupDocs.Parser Live Demos website. The live demo has the following benefits:

  • No need to download the API
  • No need to write any code
  • Just upload the source file
  • Get a download link to save the file

Extract Images From Other Document Formats

A .NET API for parsing documents and extracting images from many file formats. Some of the popular supported formats are listed below.

  • DOC (Microsoft Word Binary Format)
  • DOCM (Microsoft Word 2007 Macro File)
  • DOCX (Office 2007+ Word Document)
  • DOT (Microsoft Word Template Files)
  • DOTM (Microsoft Word 2007+ Template File)
  • DOTX (Microsoft Word Template File)
  • EPUB (Open eBook File)
  • HTML (Hyper Text Markup Language)
  • MHT (MHTML Web Archive)
  • MHTML (Web Page Archive Format)
  • ODP (OpenDocument Presentation Format)
  • ODS (OpenDocument Spreadsheet)
  • ODT (OpenDocument Text File Format)
  • ONE (OneNote Document)
  • OTP (OpenDocument Presentation Template)
  • OTT (OpenDocument Text Template)
  • PDF (Portable Document Format)
