.NET API to Extract Document Data

Extract images, raw or formatted text and metadata from documents, spreadsheets, presentations, emails & archives from within .NET apps.

GroupDocs.Parser for .NET is a text, metadata and image extraction API for business applications developed using C#, ASP.NET, and other .NET technologies. It supports extraction of raw, formatted & structured text as well as metadata from files in supported formats. With GroupDocs.Parser for .NET, your applications can also parse password-protected documents in popular formats, such as word processing documents, Excel spreadsheets, PowerPoint presentations, OneNote, PDF files and ZIP archives.

GroupDocs.Parser for .NET Features

Statistically Count Word Occurrence in Single or Multiple Files

Extract Text and Metadata from Excel Worksheets and Presentation Templates

Extract Text Content from a File or Stream without Installing Document Reader

Get Formatted Text from a Document using Fast or Standard Text Extraction Mode

Detect the Media Type of Password-Protected XML Documents & Pull Text from Them

Programmatically Get Formatted Text from Within Emails & Attachments

Draw Out Text from Single or Multiple Pages of OneNote Document

Extract Data from PDF, MS Word, Excel and Presentation Documents

Extract Data from the PDF Forms & Take Out Text from Simple PDF File or a PDF Portfolio Document

Get Formatted Text from PowerPoint Presentation or Extract Text from a Specific Slide

Gather Raw or Formatted Text from Cells, Rows, and Columns from Excel Spreadsheet

Extract Raw or HTML Formatted Text from Word Document

HTML Formatter Supports Formatting of Paragraph, Hyperlink, Font, Headings, Lists & Tables

Pull Out Single Sentence or Whole Text from EPUB, CHM, Markdown & FB2 Files

Excerpt Table of Contents from Databases, PDF, EPUB, CHM & Word Processing Documents

Pull Out Text with its Content Structure Intact & Excerpt Highlighted Text from Documents

Obtain Text Area from Documents for Analysis & Draw out Metadata from Supported Document Formats

Obtain All or Selected Images from Supported Formats & Rotate Extracted Image(s)

Take Out Text from Files within ZIP Archives & OST Containers & Detect File Types of ZIP Container Items

Get Data from Email Container (Exchange Web Services, POP3, IMAP)

Search Simple Text, Whole Word & Regular Expression within Documents

Prepare Document Template, Extract Data from Document and Analyze Data Fields & Tables

Search and Extract Highlighted Expressions in Documents

Get Text with Plain Text Formatter (Simple & ASCII) or with Markdown Formatter

Markdown Formatter Supports Formatting of Font, Hyperlinks, Headings, Lists & Tables

Perform Custom Formatting with Edges, Angles, and Intersections to Format Plain Text

Move Table Layout & Detect Tables in a Rectangular Area by Column Separators

Extract Text from Shapes, WordArt Objects & Text Boxes within Microsoft Office File Formats

Extract Images to Files – Save to JPG, PNG, GIF, BMP or WEBP Formats

Extracting Text from a Document

Extracting text from a document with the GroupDocs.Parser for .NET API takes just a few lines of code:

using System;
using System.IO;
using GroupDocs.Parser;

// Create an instance of the Parser class for the source document
using (Parser parser = new Parser("sample.docx"))
{
  // Extract the document text into a reader
  using (TextReader reader = parser.GetText())
  {
    // Print the text from the document
    // If text extraction isn't supported for this format, reader is null
    Console.WriteLine(reader == null ? "Text extraction isn't supported." : reader.ReadToEnd());
  }
}
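
Once the text is available as a TextReader, it can be post-processed with ordinary .NET code. The sketch below illustrates one way to count word occurrences, in the spirit of the word statistics feature listed above. It is a minimal illustration, not the library's own implementation: the WordFrequency class and its Count method are hypothetical names that are not part of GroupDocs.Parser, and the definition of a word (a run of letters or digits, compared case-insensitively) is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of GroupDocs.Parser.
public static class WordFrequency
{
    // Counts how often each word occurs in the text supplied by the reader.
    // Assumption: a "word" is a run of Unicode letters or digits, and
    // words that differ only in letter case are counted together.
    public static Dictionary<string, int> Count(TextReader reader)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        // A null reader means there is no text to count
        if (reader == null)
        {
            return counts;
        }

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (Match match in Regex.Matches(line, @"[\p{L}\p{N}]+"))
            {
                // TryGetValue leaves current at 0 when the word is new
                counts.TryGetValue(match.Value, out int current);
                counts[match.Value] = current + 1;
            }
        }

        return counts;
    }
}
```

Passing the reader returned by parser.GetText() to WordFrequency.Count would yield the counts for a single document. Merging the dictionaries from several documents is one possible way to cover multiple files.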

Support and Learning Resources

GroupDocs.Parser offers document parsing APIs for other popular development environments.