Extract Tables from Excel, Word, PDF & PowerPoint Documents via C#.NET API

GroupDocs.Parser .NET API allows programmers to extract tables from PDF, DOC, DOCX, PPT, PPTX, EML, MSG, XLS, XLSX, CSV, ODT, RTF & EPUB documents or pages.



How to Extract Tables from DOC files via .NET API?

A table is a collection of cells arranged in rows and columns. Tables play an important role in storing and organizing detailed or complicated data so that users can read and view it easily. They can be used in many ways, such as making lists, comparing information, aligning data, grouping information, and highlighting trends or patterns in data. GroupDocs.Parser for .NET is a useful API that allows software programmers to develop solutions for extracting tables, text and images from various supported document formats, such as PDF, ebooks, Word (DOC, DOCX), PowerPoint (PPT, PPTX), Excel (XLS, XLSX), email (EML, MSG) and many more. The .NET API includes several important features for working with tables, such as extracting all tables from a document, extracting tables from a particular page, getting table cell data, getting the total number of table rows and columns, getting row height, printing the data of a table and many more.

Extract tables from DOC in .NET

GroupDocs.Parser for .NET makes it easy for C# developers to extract tables from a DOC file by implementing a few easy steps.

How to extract tables from DOC file using C# example code

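// Extract tables from DOC file using GroupDocs.Parser API
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;
using GroupDocs.Parser.Templates;

// Create an instance of Parser class (filePath is the path to the source DOC file)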
using (Parser parser = new Parser(filePath)) {
    // Check if the document supports table extraction
    if (!parser.Features.Tables) {
        Console.WriteLine("Document doesn't support table extraction.");
        return;
    }
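    // Create the layout of tables:
    // the first array holds the vertical separators (column boundaries),
    // the second array holds the horizontal separators (row boundaries)
    TemplateTableLayout layout = new TemplateTableLayout(
        new double[] { 50, 95, 275, 415, 485, 545 },
        new double[] { 325, 340, 365, 395 });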
    // Create the options for table extraction
    PageTableAreaOptions options = new PageTableAreaOptions(layout);
    // Extract tables from the document.
    IEnumerable<PageTableArea> tables = parser.GetTables(options);
    // Iterate over tables
    foreach (PageTableArea t in tables) {
        // Iterate over rows
        for (int row = 0; row < t.RowCount; row++) {
            // Iterate over columns
            for (int column = 0; column < t.ColumnCount; column++) {
                // Get the table cell
                PageTableAreaCell cell = t[row, column];
                if (cell != null) {
                    // Print the table cell text
                    Console.Write(cell.Text);
                    Console.Write(" | ");
                }
            }
            Console.WriteLine();
        }
        Console.WriteLine();
    }
}
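
The two arrays passed to the table layout in the example above are separator coordinates. To make their effect easier to picture, here is a minimal sketch in plain C# that turns such arrays into a grid of cell bounds. It assumes that each pair of neighbouring separators bounds one column or one row. LayoutGrid and BuildCells are hypothetical names used only for illustration; they are not part of GroupDocs.Parser, and the library's own layout handling may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, NOT part of GroupDocs.Parser. It only illustrates
// how two arrays of separator coordinates partition a page area into cells,
// assuming neighbouring separators bound one column (or row) each.
public static class LayoutGrid
{
    public static List<(int Row, int Column, double Left, double Top, double Right, double Bottom)>
        BuildCells(double[] verticalSeparators, double[] horizontalSeparators)
    {
        var cells = new List<(int Row, int Column, double Left, double Top, double Right, double Bottom)>();

        // N separators along an axis produce N - 1 cells along that axis.
        for (int row = 0; row + 1 < horizontalSeparators.Length; row++)
        {
            for (int column = 0; column + 1 < verticalSeparators.Length; column++)
            {
                cells.Add((
                    row,
                    column,
                    verticalSeparators[column],       // left edge
                    horizontalSeparators[row],        // top edge
                    verticalSeparators[column + 1],   // right edge
                    horizontalSeparators[row + 1]));  // bottom edge
            }
        }

        return cells;
    }
}
```

Under that assumption, the separators from the example (six vertical, four horizontal) describe a grid of 5 columns by 3 rows, so BuildCells returns 15 entries, and the first one spans 50 to 95 horizontally and 325 to 340 vertically.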

System Requirements

GroupDocs.Parser for .NET is supported on all major platforms and operating systems. Before running the code above, please make sure that the following prerequisites are installed on your system.

  • Operating Systems: Microsoft Windows, Linux, macOS
  • Development Environments: Microsoft Visual Studio, Xamarin, MonoDevelop
  • Frameworks
  • Download the latest version of GroupDocs.Parser for .NET from NuGet

Why Use GroupDocs.Parser for .NET

  • Plain text extraction from any supported document
  • Document parsing via user-defined templates
  • Full support for structured text extraction
  • Text searching via keyword as well as regular expression
  • Extract formatted text, metadata, images, containers, and attachments
  • Extract table of contents for some supported document formats
  • Parse form data from PDF documents
  • Extract hyperlinks from the document

Extract Tables From Other Document Formats

The .NET document parsing and table extraction API works with a range of file formats and images. Some of the popular file formats it can extract data from are listed below.

  • DOCM (Microsoft Word 2007 Macro File)
  • DOCX (Office 2007+ Word Document)
  • DOT (Microsoft Word Template Files)
  • DOTM (Microsoft Word 2007+ Template File)
  • DOTX (Microsoft Word Template File)
  • EPUB (Open eBook File)
  • HTML (Hyper Text Markup Language)
  • MHT (MHTML Web Archive)
  • MHTML (Web Page Archive Format)
  • ODP (OpenDocument Presentation Format)
  • ODS (OpenDocument Spreadsheet)
  • ODT (OpenDocument Text File Format)
  • ONE (OneNote Document)
  • OTP (OpenDocument Presentation Template)
  • OTT (OpenDocument Text Template)
  • PDF (Portable Document Format)
