Extract Text from PDF in C#

Extract text from PDF with a few lines of .NET code.



How to extract text from PDF files using a .NET API

GroupDocs.Parser for .NET is a text, metadata, and image extraction API for business applications developed using C#, ASP.NET, and other .NET technologies. It supports extraction of raw, formatted, and structured text, as well as metadata, from files in the supported formats. With GroupDocs.Parser for .NET, your applications can also parse password-protected documents in popular formats such as Word processing documents, Excel spreadsheets, PowerPoint presentations, OneNote, PDF files, and ZIP archives.

GroupDocs.Parser API is a good fit for corporate solutions that need to extract text from files. The API is supported on all major operating systems and on the following frameworks: .NET Framework, .NET Standard, .NET Core, and Mono.

Extract text from PDF in .NET

GroupDocs.Parser for .NET lets C# developers extract text from a PDF file in a few steps:

  • Instantiate a Parser object for the source document;
  • Call the GetText method to obtain a TextReader object;
  • Check that the reader isn't null (a null reader means text extraction isn't supported for the document);
  • Read the text from the reader.

How to extract text from a PDF file using C#: example code

// Extract text from a PDF file using the GroupDocs.Parser API
using System;
using System.IO;
using GroupDocs.Parser;

// filePath is the path to the source PDF, supplied by the caller
// Create an instance of the Parser class
using (Parser parser = new Parser(filePath)) {
    // Extract the document text into a reader
    using (TextReader reader = parser.GetText()) {
        // If text extraction isn't supported, the reader is null;
        // otherwise print the whole text of the document
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
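
The same steps can be applied page by page rather than to the whole document at once. The following is a minimal sketch, not a definitive implementation: it assumes the installed GroupDocs.Parser version exposes parser.Features.Text, parser.GetDocumentInfo() with a PageCount property, and a GetText(int pageIndex) overload, so check these names against the API reference for your version. The class name PageTextExample and the default path "sample.pdf" are hypothetical placeholders.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

class PageTextExample
{
    static void Main(string[] args)
    {
        // Hypothetical input path; pass a real PDF path as the first argument.
        string filePath = args.Length > 0 ? args[0] : "sample.pdf";

        using (Parser parser = new Parser(filePath))
        {
            // Assumption: Features.Text reports whether text extraction is supported.
            if (!parser.Features.Text)
            {
                Console.WriteLine("Text extraction isn't supported");
                return;
            }

            // Assumption: GetDocumentInfo() returns an IDocumentInfo with PageCount.
            IDocumentInfo info = parser.GetDocumentInfo();

            for (int p = 0; p < info.PageCount; p++)
            {
                // Assumption: GetText(int) extracts the text of a single zero-based page.
                using (TextReader reader = parser.GetText(p))
                {
                    Console.WriteLine($"Page {p + 1}/{info.PageCount}");
                    Console.WriteLine(reader == null ? string.Empty : reader.ReadToEnd());
                }
            }
        }
    }
}
```

Checking feature support once up front avoids repeating the null check for every page, although the sketch still guards against a null reader in case a single page cannot be read.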

System Requirements

GroupDocs.Parser for .NET is supported on all major platforms and operating systems. Before running the code above, make sure the following prerequisites are in place on your system.

  • Operating Systems: Microsoft Windows, Linux, macOS
  • Development Environments: Microsoft Visual Studio, Xamarin, MonoDevelop
  • Frameworks: .NET Framework, .NET Standard, .NET Core, Mono
  • The latest version of GroupDocs.Parser for .NET, downloaded from NuGet

Why Use GroupDocs.Parser for .NET

  • Plain text extraction from any supported document
  • Document parsing via user-defined templates
  • Full support for structured text extraction
  • Text search by keyword or regular expression
  • Extraction of formatted text, metadata, images, containers, and attachments
  • Table of contents extraction for some supported document formats
  • Form data parsing for PDF documents
  • Hyperlink extraction from documents

Live Demos - Extract text from PDF Online

Extract text from a PDF file right now by visiting the GroupDocs.Parser Live Demos website. The live demo has the following benefits.

  • No need to download the API
  • No need to write any code
  • Just upload the source file
  • Get a download link to save the file

Extract Text From Other Document Formats

GroupDocs.Parser for .NET is a document parsing and text extraction API for document and image file formats. It can extract data from popular file formats, including those listed below.

  • PPSX (PowerPoint Slide Show)
  • PPT (Microsoft PowerPoint 97-2003)
  • PPTX (Open XML Presentation Format)
  • RTF (Rich Text Format)
  • TEX (LaTeX Source Document)
  • VDX (Visio Drawing XML File)
  • VSDM (Visio Macro-Enabled Drawing)
  • VSDX (Visio Drawing)
  • VSSM (Visio Macro-Enabled Stencil File)
  • VSSX (Visio Stencil File)
  • VSTM (Visio Macro-Enabled Drawing Template)
  • VSTX (Visio Drawing Template)
  • VSX (Visio Stencil XML File)
  • VTX (Visio Template XML File)
  • XLAM (Excel Macro-Enabled Add-In)
  • XLS (Microsoft Excel Spreadsheet (Legacy))
