.NET API to Extract Document Data

Extract images, raw or formatted text and metadata from documents, spreadsheets, presentations, emails & archives from within .NET apps.

.NET Text extraction API

GroupDocs.Parser for .NET

 

GroupDocs.Parser for .NET is a text extraction API for business applications built with C#, ASP.NET, and other .NET technologies. It extracts raw, formatted, and structured text, as well as metadata, from files in supported formats. With GroupDocs.Parser for .NET, your applications can also parse password-protected documents in popular formats such as spreadsheets, presentations, PDFs, ZIP archives, and more.


GroupDocs.Parser for .NET Features

 

 

Get Word Occurrence Statistics for Single or Multiple Files

Extract Text Content from a File or Stream without Installing a Document Reader

Get Formatted Text from a Document Using Fast or Standard Text Extraction Mode

Detect the Media Type of Password-Protected XML Documents & Extract Text from Them

Programmatically Get Formatted Text from Emails & Attachments

Extract Text from Single or Multiple Pages of a OneNote Document

Extract Text from a Simple PDF File or a PDF Portfolio Document

Extract Data from PDF Forms & Obtain Formatted Tables from a PDF or Word Document

Get Formatted Text from a PowerPoint Presentation or Extract Text from a Specific Slide

Gather Raw or Formatted Text from Cells, Rows, and Columns of an Excel Spreadsheet

Extract Raw or HTML-Formatted Text from a Word Document

HTML Formatter Supports Formatting of Paragraphs, Hyperlinks, Fonts, Headings, Lists & Tables

Extract a Single Sentence or the Whole Text from EPUB, CHM, Markdown & FB2 Files

Extract the Table of Contents from EPUB & CHM Documents

Extract Text with Its Content Structure Intact & Extract Highlighted Text from Documents

Obtain Text Areas from Documents for Analysis & Extract Metadata from Supported Document Formats

Obtain All or Selected Images from Supported Formats & Rotate Extracted Image(s)

Extract Text from Files within ZIP Archives, OST Containers & Database Containers

Get Data from Email Containers (Exchange Web Services, POP3, IMAP)

Search for Simple Text, Whole Words & Regular Expressions within Documents

Search and Extract Highlighted Expressions in Documents

Get Text with Plain Text Formatter (Simple & ASCII) or with Markdown Formatter

Markdown Formatter Supports Formatting of Fonts, Hyperlinks, Headings, Lists & Tables

Perform Custom Formatting with Edges, Angles, and Intersections to Format Plain Text

Extracting Text from a Document

Extracting text from a document with the GroupDocs.Parser for .NET API takes only a few lines of code. The same call accepts either a file path or an open stream.

Extracting Text from a Document using C#


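using System;
using System.IO;
// Extractor is provided by the GroupDocs.Parser library;
// add the using directive that matches your installed version.

string doc = "sample.docx";

// Extract text directly from a file path
Console.WriteLine(Extractor.Default.ExtractText(doc));

// Extract text from a stream; the using block disposes the stream afterwards
using (Stream stream = File.OpenRead(doc))
{
    Console.WriteLine(Extractor.Default.ExtractText(stream));
}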
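
Extracted text is an ordinary string, so it can be post-processed with standard .NET code. As a rough sketch of the word-occurrence counting mentioned in the feature list, the snippet below tallies words in text that has already been extracted. It uses only the .NET base class library. The helper name CountWords and the sample string are hypothetical and are not part of GroupDocs.Parser, whose built-in statistics feature may work differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of GroupDocs.Parser): counts word
// occurrences, ignoring case, in text that has already been extracted.
static Dictionary<string, int> CountWords(string text)
{
    var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    // \w+ matches runs of letters, digits and underscores
    foreach (Match m in Regex.Matches(text, @"\w+"))
    {
        counts[m.Value] = counts.TryGetValue(m.Value, out int n) ? n + 1 : 1;
    }

    return counts;
}

// Stand-in for text returned by an extraction call
string extracted = "Invoice total due. Invoice date: today.";

// Print the words, most frequent first
foreach (var pair in CountWords(extracted).OrderByDescending(p => p.Value))
{
    Console.WriteLine($"{pair.Key}: {pair.Value}");
}
```

To count across several files, one option is to extract each file's text in turn and merge the resulting dictionaries.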

Support and Learning Resources

 

GroupDocs.Parser offers document parsing APIs for other popular development environments as listed below: