GroupDocs.Parser for .NET

Parse EPUB documents using C#

Efficiently extract text, metadata, tables, and images from EPUB, PDF, Word, Excel, and image files using GroupDocs.Parser in your .NET projects.

NuGet Download

Start Free Trial

Steps to extract data from EPUB in C#

Follow these steps to parse content from EPUB documents in your .NET apps using GroupDocs.Parser:

1. Load the EPUB document using a Parser instance.
2. Extract the desired content such as text, tables, or metadata.
3. Verify that the extracted data is valid.
4. Use the parsed output in your downstream processing, automation, or business systems.

using System;
using System.IO;
using GroupDocs.Parser;

// Load your document into Parser
using (Parser parser = new Parser("input.epub"))
{
    // Extract all text content from the file
    using (TextReader reader = parser.GetText())
    {
        // GetText() returns null when text extraction is unavailable for the format;
        // otherwise, use the extracted text in your application
        Console.WriteLine(reader == null
            ? "Text extraction is unsupported for this format"
            : reader.ReadToEnd());
    }
}
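
The verification step listed above is not spelled out on this page. As a minimal sketch, assuming that "valid" simply means the reader is non-null and yields some non-whitespace text, a small helper could look like the following. ExtractionChecks and TryReadText are hypothetical names, not part of GroupDocs.Parser; the helper relies only on System.IO.TextReader, the type that parser.GetText() returns in the sample above.

```csharp
using System;
using System.IO;

public static class ExtractionChecks
{
    // Hypothetical helper (not a GroupDocs.Parser API).
    // Returns true and the extracted text when the reader is non-null
    // and produces at least one non-whitespace character.
    public static bool TryReadText(TextReader reader, out string text)
    {
        text = null;
        if (reader == null)
        {
            // Assumption: a null reader means extraction is unsupported.
            return false;
        }

        string content = reader.ReadToEnd();
        if (string.IsNullOrWhiteSpace(content))
        {
            return false;
        }

        text = content;
        return true;
    }
}
```

Inside the using block of the sample above, one might call ExtractionChecks.TryReadText(reader, out string text) and pass text downstream only when it returns true. Real validation rules will depend on what your application expects from the document.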

dotnet add package GroupDocs.Parser


Comprehensive document parsing capabilities

GroupDocs.Parser goes beyond reading text: it also supports barcode extraction, image parsing, metadata access, and structured data processing for advanced automation and data analysis.

Document content extraction and parsing capabilities

Support for diverse file content types

Extract data including text, images, tables, and fields from document formats like PDF, Word, Excel, HTML, and more.

Work with both scanned and digital files

Parse data from scanned documents and born-digital files alike, with support for OCR and layout-aware extraction.

Configurable extraction parameters

Adjust parsing logic with flexible options like page range selection, region targeting, and field detection templates.
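
Page range selection could be sketched as follows. This is an illustration under stated assumptions, not the library's own mechanism: PageRange and ReadPages are hypothetical names, and the per-page reader is passed in as a delegate so the sketch runs without the library installed. With GroupDocs.Parser, the page count would presumably come from parser.GetDocumentInfo().PageCount and the delegate from the parser.GetText(int pageIndex) overload; verify both against the API reference for your version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PageRange
{
    // Hypothetical helper: reads the text of pages firstPage..lastPage
    // (zero-based, inclusive), clamped to the pages that actually exist.
    public static List<string> ReadPages(
        int pageCount, int firstPage, int lastPage, Func<int, TextReader> getPageText)
    {
        var pages = new List<string>();
        int start = Math.Max(0, firstPage);
        int end = Math.Min(lastPage, pageCount - 1);

        for (int p = start; p <= end; p++)
        {
            // A null reader is treated as "no text available for this page".
            using (TextReader reader = getPageText(p))
            {
                pages.Add(reader == null ? string.Empty : reader.ReadToEnd());
            }
        }

        return pages;
    }
}
```

For example, PageRange.ReadPages(info.PageCount, 0, 2, i => parser.GetText(i)) would read the first three pages, assuming info holds the result of parser.GetDocumentInfo(). Keeping the range logic separate from the parser call makes the clamping easy to check on its own.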

How to parse PDF using templates

This example shows how to extract structured data from a PDF using a predefined parsing template with GroupDocs.Parser.

C#

// Load the PDF file with the Parser class
using (Parser parser = new Parser("input.pdf"))
{
    // Parse the document by the template
    DocumentData data = parser.ParseByTemplate(GetTemplate());

    // ParseByTemplate returns null when template parsing is not supported for the format
    if (data == null)
    {
        return;
    }

    // Process the extracted fields
    for (int i = 0; i < data.Count; i++)
    {
        Console.Write(data[i].Name + ": ");
        // Only text areas expose Text; other areas (such as the "details" table) are reported as non-text
        PageTextArea area = data[i].PageArea as PageTextArea;
        Console.WriteLine(area == null ? "Not a text area" : area.Text);
    }
}

private static Template GetTemplate()
{
    // Create detector parameters for 'Details' table
    TemplateTableParameters detailsTableParameters = 
        new TemplateTableParameters(new Rectangle(new Point(35, 320), new Size(530, 55)), null);

    TemplateItem[] templateItems = new TemplateItem[]
    {
        new TemplateTable(detailsTableParameters, "details", null)
    };

    Template template = new Template(templateItems);
    return template;
}

About GroupDocs.Parser for .NET API

GroupDocs.Parser is a feature-rich document parsing API designed for .NET developers. It supports extracting plain and structured text, metadata, images, tables, and barcodes from popular formats like PDF, DOCX, XLSX, PPTX, and more, all without additional software dependencies.

Ready to get started?

Download GroupDocs.Parser for free or get a trial license for full access!

Useful resources

Explore documentation, code samples, and community support to enhance your experience.

Supported formats for data extraction

GroupDocs.Parser enables parsing across a broad set of document and image formats. Explore the supported file types commonly used in data extraction workflows.

Parse PDF (Portable Document Format)
Parse DOCX (Office 2007+ Word Document)
Parse PPTX (Open XML Presentation Format)
Parse XLSX (Open XML Workbook)
Parse TXT (Text file)
Parse RTF (Rich Text Format)
Parse XML (eXtensible Markup Language)