Extract Tables from Excel, Word, PDF & PowerPoint Documents via Java API

GroupDocs.Parser Java API lets programmers extract tables from PDF, DOC, DOCX, PPT, PPTX, EML, MSG, XLS, XLSX, CSV, ODT, RTF & EPUB documents, either from the whole document or from individual pages.



How to Extract Tables from RTF Files via Java API?

A table is a collection of cells arranged in rows and columns. Tables play an important role in storing and organizing detailed or complex data so that users can read it easily. They can be used in many ways, such as making lists, comparing information, aligning data, grouping information, and highlighting trends or patterns in data.

GroupDocs.Parser for Java is an API that lets software developers build solutions for extracting tables, text, and images from many supported document formats, such as PDF, ebooks, Word (DOC, DOCX), PowerPoint (PPT, PPTX), Excel (XLS, XLSX), emails (EML, MSG), and many more. The Java API includes several features for working with tables: extracting all tables from a document, extracting tables from a particular page, getting table cell data, getting the number of rows and columns in a table, getting row heights, printing a table's data, and more.
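To make these operations concrete, here is a minimal, hypothetical in-memory table model written in plain Java. `SimpleTable`, its methods, and the page filter are illustrative assumptions and are not part of the GroupDocs.Parser API; they only show how row and column counts, cell access, per-page selection, and printing fit together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal table model used only to illustrate the operations
// described above; it is NOT the GroupDocs.Parser API.
public class SimpleTable {
    private final int pageIndex;           // page the table was found on
    private final List<List<String>> rows; // cell text, row by row

    public SimpleTable(int pageIndex, List<List<String>> rows) {
        this.pageIndex = pageIndex;
        this.rows = rows;
    }

    public int getPageIndex() { return pageIndex; }

    // Number of rows in the table
    public int getRowCount() { return rows.size(); }

    // Number of columns: the widest row, so ragged rows are tolerated
    public int getColumnCount() {
        int max = 0;
        for (List<String> row : rows) max = Math.max(max, row.size());
        return max;
    }

    // Cell text, or null when the row is shorter than the requested column
    public String getCell(int row, int column) {
        List<String> r = rows.get(row);
        return column < r.size() ? r.get(column) : null;
    }

    // Render the table as "cell | cell | " lines, skipping missing cells
    public String render() {
        StringBuilder sb = new StringBuilder();
        for (int row = 0; row < getRowCount(); row++) {
            for (int column = 0; column < getColumnCount(); column++) {
                String cell = getCell(row, column);
                if (cell != null) sb.append(cell).append(" | ");
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Keep only the tables that were found on the given page
    public static List<SimpleTable> onPage(List<SimpleTable> tables, int pageIndex) {
        List<SimpleTable> result = new ArrayList<>();
        for (SimpleTable t : tables) {
            if (t.getPageIndex() == pageIndex) result.add(t);
        }
        return result;
    }

    public static void main(String[] args) {
        List<SimpleTable> tables = List.of(
                new SimpleTable(0, List.of(List.of("Item", "Qty"), List.of("Pen", "2"))),
                new SimpleTable(1, List.of(List.of("Total", "2"))));
        // Print only the tables found on page 0
        for (SimpleTable t : onPage(tables, 0)) {
            System.out.print(t.render());
        }
    }
}
```

Running `main` prints only the table on page 0, one row per line with cells separated by ` | `. Returning `null` for a missing cell mirrors the null check that the extraction example below performs on each cell.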

Extract tables from RTF in Java

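GroupDocs.Parser for Java makes it easy for Java developers to extract tables from an RTF file in a few steps: open the document with the Parser class, check that it supports table extraction, describe the table layout, and then iterate over the extracted tables, rows, and cells.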

Java example: extract tables from an RTF file

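// Extract tables from an RTF file using the GroupDocs.Parser API
import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageTableArea;
import com.groupdocs.parser.data.PageTableAreaCell;
import com.groupdocs.parser.options.PageTableAreaOptions;
import com.groupdocs.parser.templates.TemplateTableLayout;

import java.util.Arrays;

public class ExtractTablesFromRtf {
    public static void main(String[] args) throws Exception {
        // Path to the input document (hypothetical; replace with your own RTF file)
        String filePath = "input.rtf";
        // Create an instance of the Parser class; try-with-resources closes it
        try (Parser parser = new Parser(filePath)) {
            // Check whether the document supports table extraction
            if (!parser.getFeatures().isTables()) {
                System.out.println("Document doesn't support table extraction.");
                return;
            }
            // Describe the table layout: the first list holds the vertical separators
            // (x coordinates of column borders), the second the horizontal separators
            // (y coordinates of row borders). These values were written for the
            // GroupDocs sample invoice; adjust them to match your document.
            TemplateTableLayout layout = new TemplateTableLayout(
                    Arrays.asList(50.0, 95.0, 275.0, 415.0, 485.0, 545.0),
                    Arrays.asList(325.0, 340.0, 365.0, 395.0));
            // Create the options for table extraction
            PageTableAreaOptions options = new PageTableAreaOptions(layout);
            // Extract tables from the document
            Iterable<PageTableArea> tables = parser.getTables(options);
            // Iterate over tables
            for (PageTableArea t : tables) {
                // Iterate over rows
                for (int row = 0; row < t.getRowCount(); row++) {
                    // Iterate over columns
                    for (int column = 0; column < t.getColumnCount(); column++) {
                        // Get the table cell (may be null, so check before use)
                        PageTableAreaCell cell = t.getCell(row, column);
                        if (cell != null) {
                            // Print the table cell text
                            System.out.print(cell.getText());
                            System.out.print(" | ");
                        }
                    }
                    System.out.println();
                }
                System.out.println();
            }
        }
    }
}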
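The `TemplateTableLayout` in the example above is built from two lists of coordinates: vertical separators for column borders and horizontal separators for row borders. Assuming each pair of adjacent separators bounds one column or row (an assumption about the layout model, not a documented GroupDocs guarantee), the stdlib-only sketch below shows how those values translate into a grid of cells, including row heights. `SeparatorGrid` is a hypothetical class for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of how separator coordinates could map to a grid of cells.
// ASSUMPTION: each pair of adjacent separators bounds one column (x) or row (y).
// This mirrors the idea behind TemplateTableLayout but is not GroupDocs code.
public class SeparatorGrid {
    private final List<Double> xs; // vertical separators, left to right
    private final List<Double> ys; // horizontal separators, top to bottom

    public SeparatorGrid(List<Double> xs, List<Double> ys) {
        if (xs.size() < 2 || ys.size() < 2) {
            throw new IllegalArgumentException("need at least two separators per axis");
        }
        this.xs = xs;
        this.ys = ys;
    }

    // n separators enclose n - 1 columns or rows
    public int columnCount() { return xs.size() - 1; }
    public int rowCount() { return ys.size() - 1; }

    // Distance between the two separators that bound a column or row
    public double columnWidth(int column) { return xs.get(column + 1) - xs.get(column); }
    public double rowHeight(int row) { return ys.get(row + 1) - ys.get(row); }

    // Cell rectangle as {left, top, width, height}
    public double[] cellBounds(int row, int column) {
        return new double[] { xs.get(column), ys.get(row), columnWidth(column), rowHeight(row) };
    }

    public static void main(String[] args) {
        // Same separator values as the extraction example
        SeparatorGrid grid = new SeparatorGrid(
                Arrays.asList(50.0, 95.0, 275.0, 415.0, 485.0, 545.0),
                Arrays.asList(325.0, 340.0, 365.0, 395.0));
        System.out.println(grid.rowCount() + " rows x " + grid.columnCount() + " columns");
        for (int row = 0; row < grid.rowCount(); row++) {
            System.out.println("row " + row + " height: " + grid.rowHeight(row));
        }
        System.out.println(Arrays.toString(grid.cellBounds(0, 1)));
    }
}
```

With the example's separators, `main` prints `3 rows x 5 columns`, row heights of 15.0, 25.0 and 30.0, and `[95.0, 325.0, 180.0, 15.0]` for the cell in row 0, column 1. Under this reading, adding one more value to either list adds one more column or row.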

System Requirements

GroupDocs.Parser for Java is supported on all major platforms and operating systems. Before running the code above, make sure the following prerequisites are installed on your system.

  • Operating Systems: Microsoft Windows, Linux, MacOS
  • Development Environments: NetBeans, IntelliJ IDEA, Eclipse, etc.
  • Frameworks
  • Download the latest version of GroupDocs.Parser for Java from Maven

Why Use GroupDocs.Parser for Java

  • Extract plain text from any supported document
  • Parse documents using user-defined templates
  • Full support for structured text extraction
  • Search text by keyword or regular expression
  • Extract formatted text, metadata, images, containers, and attachments
  • Extract the table of contents from some supported document formats
  • Parse form data from PDF documents
  • Extract hyperlinks from documents
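The difference between keyword search and regular-expression search can be pictured with a small stand-alone sketch that runs over text you have already extracted. It uses only `java.util.regex`; `TextSearchSketch` and its methods are hypothetical and do not represent GroupDocs.Parser's own search methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of keyword vs. regular-expression search over text that
// has already been extracted. Uses java.util.regex only, not GroupDocs.Parser.
public class TextSearchSketch {

    // Start positions of every case-insensitive occurrence of a literal keyword
    public static List<Integer> findKeyword(String text, String keyword) {
        // Pattern.quote makes regex metacharacters such as '.' match literally
        Pattern p = Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE);
        return positions(p.matcher(text));
    }

    // Start positions of every match of a regular expression
    public static List<Integer> findRegex(String text, String regex) {
        return positions(Pattern.compile(regex).matcher(text));
    }

    // Collect the start index of each successive match
    private static List<Integer> positions(Matcher m) {
        List<Integer> result = new ArrayList<>();
        while (m.find()) result.add(m.start());
        return result;
    }

    public static void main(String[] args) {
        String text = "Invoice 1001, invoice 1002";
        System.out.println(findKeyword(text, "invoice")); // [0, 14]
        System.out.println(findRegex(text, "\\d{4}"));    // [8, 22]
    }
}
```

Quoting the keyword keeps characters such as `.` literal, which is the practical difference between the two modes in this sketch.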

Extract Tables From Other Document Formats

The Java API parses documents and extracts tables from many file formats and images. Some of the popular formats are listed below.

  • VSDM (Visio Macro-Enabled Drawing)
  • VSDX (Visio Drawing)
  • VSSM (Visio Macro-Enabled Stencil File)
  • VSSX (Visio Stencil File)
  • VSTM (Visio Macro-Enabled Drawing Template)
  • VSTX (Visio Drawing Template)
  • VSX (Visio Stencil XML File)
  • VTX (Visio XML Drawing Template)
  • XLAM (Excel Macro-Enabled Add-In)
  • XLS (Microsoft Excel Spreadsheet, legacy format)
  • XLSB (Excel Binary Workbook)
  • XLSM (Macro-Enabled Spreadsheet)
  • XLSX (Open XML Workbook)
  • XLT (Excel 97-2003 Template)
  • XLTM (Excel Macro-Enabled Template)
  • XLTX (Excel Template)
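When batch-processing files, a caller might pre-filter by extension before opening each one. The hypothetical helper below checks a file name against the formats listed above only. Because that list is explicitly partial (PDF, Word, PowerPoint, and other formats from the introduction are missing), a `false` result does not mean a format is unsupported; `parser.getFeatures().isTables()` from the extraction example remains the runtime check that code relies on.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: checks a file name against the formats listed above.
// It only inspects the extension and is NOT part of the GroupDocs.Parser API;
// the runtime check in the extraction example is parser.getFeatures().isTables().
public class TableFormatCheck {
    private static final Set<String> LISTED = Set.of(
            "vsdm", "vsdx", "vssm", "vssx", "vstm", "vstx", "vsx", "vtx",
            "xlam", "xls", "xlsb", "xlsm", "xlsx", "xlt", "xltm", "xltx");

    // Lower-cased extension after the last dot, or "" when there is none
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // True only for extensions in the partial list above
    public static boolean isListedFormat(String fileName) {
        return LISTED.contains(extension(fileName));
    }

    public static void main(String[] args) {
        System.out.println(isListedFormat("Report.XLSX")); // true
        System.out.println(isListedFormat("notes.txt"));   // false
    }
}
```

Lower-casing with `Locale.ROOT` keeps the comparison stable regardless of the machine's default locale.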
