.NET API to Extract Document Data

Extract Text (Raw & Formatted) and Metadata from Documents, Spreadsheets, Presentations, Emails & ZIP Archives


GroupDocs.Parser for .NET


GroupDocs.Parser for .NET is a text extraction API for business applications built with C#, ASP.NET, and other .NET technologies. It extracts raw, formatted, and structured text, as well as metadata, from files in supported formats. With GroupDocs.Parser for .NET, your applications can also parse password-protected documents in popular formats such as spreadsheets, presentations, PDFs, and ZIP archives.


GroupDocs.Parser for .NET Features



Count word occurrences in a single file or across multiple files


Extract text content without installing a document reader


Extract text from a file or stream


Get formatted text from a document


Choose between fast and standard text extraction modes


Pull text from password-protected documents


Programmatically get formatted text from within emails & attachments


Extract text from one or more pages of a OneNote document


Extract text from a simple PDF file or a PDF Portfolio document


Extract data from the forms in a PDF document


Extract text from a specific PowerPoint slide


Get formatted text from a PowerPoint presentation


Gather raw or formatted text from cells, rows, and columns of an Excel spreadsheet


Extract raw or formatted text from a Word document


Obtain formatted tables from a Word document


Get text from a Word document in HTML format


Extract a single sentence or the whole text from EPUB, CHM, Markdown & FB2 files


Extract the table of contents from EPUB & CHM documents


Extract highlighted text from documents


Pull out text with its content structure intact


Obtain text areas from documents for text analysis


Extract metadata from supported document formats


Obtain all or selected images from supported formats


Rotate extracted image(s)


Extract text from files within ZIP archives & OST containers


Get data from email containers (Exchange Web Services, POP3, IMAP)


Extract text from database containers in a fast, reliable, and efficient manner


Search for simple text, whole words & regular expressions within documents


Search for simple text and regular expressions in EPUB & FB2 files


Search and extract highlighted expressions in documents


Get text with the plain text formatter (simple & ASCII)


Perform custom formatting with edges, angles, and intersections to format plain text


Extract text with the Markdown formatter


The Markdown formatter supports formatting of fonts, hyperlinks, headings, lists & tables


Obtain text with the HTML formatter


The HTML formatter supports formatting of paragraphs, hyperlinks, fonts, headings, lists & tables
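Some of the features above, such as counting word occurrences, operate on text that has already been extracted. The following is a minimal sketch of that idea in plain C#, assuming the extracted text is available as a string. `WordStatistics` and `CountWords` are hypothetical names used for illustration and are not part of GroupDocs.Parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class WordStatistics
{
    // Hypothetical helper (not a GroupDocs.Parser API): counts word
    // occurrences in already extracted text, ignoring letter case.
    public static Dictionary<string, int> CountWords(string text)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        // \w+ matches runs of letters, digits, and underscores.
        foreach (Match match in Regex.Matches(text, @"\w+"))
        {
            // TryGetValue leaves 'current' at 0 when the word is new.
            counts.TryGetValue(match.Value, out int current);
            counts[match.Value] = current + 1;
        }

        return counts;
    }

    static void Main()
    {
        // Sample text standing in for the output of a text extraction call.
        string extracted = "Parser extracts text. Text is then counted.";

        foreach (KeyValuePair<string, int> pair in CountWords(extracted))
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        }
    }
}
```

To count across multiple files, the same dictionary could be reused while looping over the text extracted from each file.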

Extracting Text from a Document

Extracting text from a document with the GroupDocs.Parser for .NET API takes just a few lines of code.

Extracting Text from a Document using C#

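string doc = "sample.docx";
// Extract text from the file (assumes the Parser class of current GroupDocs.Parser releases)
using (Parser parser = new Parser(doc))
    Console.WriteLine(parser.GetText()?.ReadToEnd());
// Extract text from the stream
using (Stream stream = File.OpenRead(doc))
using (Parser parser = new Parser(stream))
    Console.WriteLine(parser.GetText()?.ReadToEnd());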
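The same extraction pattern can be applied to the password-protected documents mentioned earlier. The sketch below is a complete console program under some assumptions: it relies on the `Parser` and `LoadOptions` classes as found in current GroupDocs.Parser releases, it assumes `GetText` returns null when text extraction is not supported for the format, and the file name and password are placeholders.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

class ProtectedDocumentExample
{
    static void Main()
    {
        // Placeholder file name and password, used only for illustration.
        string doc = "protected.docx";
        string password = "secret";

        // Assumption: LoadOptions carries the password and is passed
        // to the Parser constructor together with the file path.
        LoadOptions loadOptions = new LoadOptions(password);

        using (Parser parser = new Parser(doc, loadOptions))
        using (TextReader reader = parser.GetText())
        {
            // Assumption: a null reader means the format does not
            // support text extraction.
            Console.WriteLine(reader == null
                ? "Text extraction is not supported for this document."
                : reader.ReadToEnd());
        }
    }
}
```

Checking the reader for null before reading keeps the program from failing on formats that do not expose text.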

Support and Learning Resources


GroupDocs.Parser offers document parsing APIs for other popular development environments as listed below: