GroupDocs.Parser for Java

Retrieve table data from XML using Java

Seamlessly detect and extract tables from formats like PDF, DOCX, and XLSX with GroupDocs.Parser in your Java workflows.

How to retrieve tables from XML in Java

To parse tables from XML documents using GroupDocs.Parser, follow these easy steps in your Java environment:

  1. Create a Parser instance and load the target XML file.
  2. Verify that the file supports structured table extraction.
  3. Use the API to retrieve table elements from the document.
  4. Leverage the extracted data in analytics, reporting, or automation systems.
// Load the XML document that contains the tables
try (Parser parser = new Parser("input.xml"))
{
    // Verify that the document type allows table recognition
    if (!parser.getFeatures().isTables()) {
        System.out.println("Add logic for files that don’t support tables");
        return;
    }

    // Define rules for interpreting table structure
    TemplateTableLayout layout = new TemplateTableLayout(
            java.util.Arrays.asList(new Double[]{50.0, 95.0, 275.0, 415.0, 485.0, 545.0}),
            java.util.Arrays.asList(new Double[]{325.0, 340.0, 365.0, 395.0}));

    // Set parameters to extract tables
    PageTableAreaOptions options = new PageTableAreaOptions(layout);

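    // Run table extraction on the loaded document
    Iterable<PageTableArea> tables = parser.getTables(options);

    // Process each extracted table from the result
    for (PageTableArea t : tables)
    {
        // Report the size of each table; replace with your own processing
        System.out.println(t.getRowCount() + " rows, " + t.getColumnCount() + " columns");
    }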
}
<dependencies>
    <dependency>
        <groupId>com.groupdocs</groupId>
        <artifactId>groupdocs-parser</artifactId>
        <version>24.9</version>
    </dependency>
</dependencies>
<repositories>
    <repository>
        <id>repository.groupdocs.com</id>
        <name>GroupDocs Repository</name>
        <url>https://repository.groupdocs.com/repo/</url>
    </repository>
</repositories>

Advanced content extraction tools

Beyond reading tables, GroupDocs.Parser supports capturing plain text, visual elements, embedded metadata, and structured objects to enhance document processing tasks.

Extracting structured content and tabular data

Precise table parsing across formats

Support for extracting tables from standard document types like PDF, Word, Excel, and HTML with high accuracy.

Read tabular structures from diverse sources

Retrieve table data from spreadsheets, documents, and reports while preserving the structure and alignment.

Customizable table extraction settings

Control layout detection, manage headers and footers, and fine-tune extraction with flexible configuration options.
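
The layout used in the samples on this page is built from two lists of coordinates. The sketch below is a plain-Java illustration of how such separator lists translate into a grid, assuming the first list holds the x-coordinates of column boundaries and the second the y-coordinates of row boundaries. The class and method names (LayoutGridSketch, cellCount, cellSizes) are hypothetical helpers and not part of GroupDocs.Parser.

```java
import java.util.Arrays;
import java.util.List;

public class LayoutGridSketch {
    // N sorted separators enclose N - 1 cells (none if fewer than two separators).
    static int cellCount(List<Double> separators) {
        return Math.max(0, separators.size() - 1);
    }

    // Size of each cell along one axis: the gap between neighbouring separators.
    static double[] cellSizes(List<Double> separators) {
        double[] sizes = new double[cellCount(separators)];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = separators.get(i + 1) - separators.get(i);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Same coordinates as the samples on this page (assumed: x first, then y).
        List<Double> vertical = Arrays.asList(50.0, 95.0, 275.0, 415.0, 485.0, 545.0);
        List<Double> horizontal = Arrays.asList(325.0, 340.0, 365.0, 395.0);

        System.out.println("columns: " + cellCount(vertical));
        System.out.println("rows: " + cellCount(horizontal));
        System.out.println("column widths: " + Arrays.toString(cellSizes(vertical)));
        System.out.println("row heights: " + Arrays.toString(cellSizes(horizontal)));
    }
}
```

Under that assumption, the six vertical and four horizontal separators describe a grid of 5 columns and 3 rows, which is a quick sanity check to run before passing coordinates to the extraction options.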

Sample: extract tables from an Excel document

This example shows how to extract and loop through table content in an Excel (XLSX) file using GroupDocs.Parser.

Java

// Initialize Parser with the Excel file
try (Parser parser = new Parser("input.xlsx"))
{
    // Exit if table extraction isn’t supported for this document
    if (!parser.getFeatures().isTables())
    {
        return;
    }

    // Apply rules to locate table layout
    TemplateTableLayout layout = new TemplateTableLayout(
            java.util.Arrays.asList(new Double[]{50.0, 95.0, 275.0, 415.0, 485.0, 545.0}),
            java.util.Arrays.asList(new Double[]{325.0, 340.0, 365.0, 395.0}));

    // Configure settings for table extraction
    PageTableAreaOptions options = new PageTableAreaOptions(layout);

    // Invoke the extraction process
    Iterable<PageTableArea> tables = parser.getTables(options);

    // Loop over all parsed table structures
    for (PageTableArea t : tables)
    {
        // Iterate over each row within the table
        for (int row = 0; row < t.getRowCount(); row++)
        {
            // Process each cell in the current row
            for (int column = 0; column < t.getColumnCount(); column++) 
            {
                // Access and read the current cell's content
                PageTableAreaCell cell = t.getCell(row, column);
                if (cell != null)
                {
                    // Output the textual value of each table cell
                    System.out.print(cell.getText());
                    System.out.print(" | ");
                }
            }
            // End the output line once the row is complete
            System.out.println();
        }
    }
}
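
Once cell text has been read as in the loop above, a common next step is handing it to a reporting or analytics tool. The following is a minimal sketch, assuming the cell text has already been copied into a String[][] with null for missing cells. TableCsvSketch, escape, and toCsv are hypothetical names for illustration and are not part of GroupDocs.Parser.

```java
import java.util.ArrayList;
import java.util.List;

public class TableCsvSketch {
    // Quote a field when it contains a comma, quote, or line break; double any quotes.
    static String escape(String value) {
        if (value == null) {
            return ""; // missing cell becomes an empty field
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        String escaped = value.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Join each row's fields with commas and end every row with a newline.
    static String toCsv(String[][] cells) {
        StringBuilder csv = new StringBuilder();
        for (String[] row : cells) {
            List<String> fields = new ArrayList<>();
            for (String cell : row) {
                fields.add(escape(cell));
            }
            csv.append(String.join(",", fields)).append("\n");
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        // Stand-in for text collected from table cells
        String[][] cells = {
            {"Item", "Qty", "Note"},
            {"Widget, large", "2", null}
        };
        System.out.print(toCsv(cells));
    }
}
```

Keeping the export separate from the extraction loop means the same helper can be reused regardless of which document format the table came from.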

Introduction to GroupDocs.Parser for Java API

GroupDocs.Parser is a feature-rich content extraction API for Java platforms. It allows developers to accurately parse tables, text, graphics, links, and structured data from PDFs, Word documents, Excel sheets, PowerPoint presentations, and more—without requiring third-party plugins.

Ready to get started?

Download GroupDocs.Parser for free or get a trial license for full access!

Useful resources

Explore documentation, code samples, and community support to enhance your experience.

Document types supported for table extraction

GroupDocs.Parser provides reliable table detection across multiple file types. Here’s a list of the most widely supported document formats for extracting tables.

Temporary license tips

  1. Sign up with your work email. Free mail services are not allowed.
  2. Use the Get a temporary license button on the second step.