GroupDocs.Parser for .NET

Extract hyperlinks from XML using C#

Easily detect and extract URLs and hyperlinks from PDF, Word, Excel, and other document types using GroupDocs.Parser in your .NET applications.


Steps to extract hyperlinks from XML in C#

GroupDocs.Parser enables .NET developers to extract hyperlinks from XML files by following these simple steps:

Load the XML file using a Parser instance.
Check if the document supports hyperlink extraction.
Retrieve the list of hyperlinks from the document.
Loop through the results and work with the extracted URLs.


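using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

// Load the document containing hyperlinks using the Parser class
using (Parser parser = new Parser("input.xml"))
{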

    // Verify that the file supports hyperlink extraction
    if (!parser.Features.Hyperlinks)
    {
        Console.WriteLine("Hyperlink extraction is not available for the file");
        return;
    }

    // Retrieve and process the extracted hyperlinks
    IEnumerable<PageHyperlinkArea> hyperlinks = parser.GetHyperlinks();

    foreach (PageHyperlinkArea h in hyperlinks)
    {
        Console.WriteLine(h.Text);
        Console.WriteLine(h.Url);
    }
}
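The last step, working with the extracted URLs, often means cleaning them up before use. The following is a minimal sketch that relies only on the .NET base class library; `LinkFilter` and `NormalizeWebUrls` are hypothetical names introduced for illustration and are not part of GroupDocs.Parser. It keeps distinct absolute http/https URLs and drops empty values, relative references, and other schemes such as `mailto:`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of GroupDocs.Parser
public static class LinkFilter
{
    // Returns distinct absolute http/https URLs in first-seen order
    public static List<string> NormalizeWebUrls(IEnumerable<string> urls)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();

        foreach (string raw in urls)
        {
            // Skip null, empty, or whitespace-only values
            if (string.IsNullOrWhiteSpace(raw))
                continue;

            // Skip anything that does not parse as an absolute URI
            if (!Uri.TryCreate(raw.Trim(), UriKind.Absolute, out Uri uri))
                continue;

            // Keep web links only
            if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
                continue;

            // AbsoluteUri lower-cases the scheme and host, so trivially
            // different spellings of the same link collapse into one entry
            if (seen.Add(uri.AbsoluteUri))
                result.Add(uri.AbsoluteUri);
        }

        return result;
    }
}
```

Assuming the `hyperlinks` collection from the loop above, the helper could be called as `LinkFilter.NormalizeWebUrls(hyperlinks.Select(h => h.Url))`, which requires `using System.Linq;`.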

dotnet add package GroupDocs.Parser


Advanced document parsing capabilities

In addition to hyperlink extraction, GroupDocs.Parser allows you to extract text, metadata, images, and structured data—supporting powerful data processing workflows.
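As a rough sketch of text and metadata extraction, the snippet below uses `Parser.GetText()` and `Parser.GetMetadata()`. The file name `input.xml` is a placeholder, and the null checks assume that both methods return null when the feature is not supported for the loaded format; check the API reference for your installed version before relying on this.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

class Program
{
    static void Main()
    {
        // "input.xml" is a placeholder path
        using (Parser parser = new Parser("input.xml"))
        {
            // Extract the document text; assumed to be null if unsupported
            using (TextReader reader = parser.GetText())
            {
                Console.WriteLine(reader == null
                    ? "Text extraction is not supported for this file"
                    : reader.ReadToEnd());
            }

            // Extract metadata as name/value pairs; assumed to be null if unsupported
            IEnumerable<MetadataItem> metadata = parser.GetMetadata();
            if (metadata != null)
            {
                foreach (MetadataItem item in metadata)
                {
                    Console.WriteLine($"{item.Name}: {item.Value}");
                }
            }
        }
    }
}
```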

Hyperlink detection and document parsing

Hyperlink detection from documents

Quickly extract URLs and link annotations from documents like PDFs, Word files, spreadsheets, and more.

Support for web and embedded links

Detect and extract both standard web URLs and embedded document links across multiple formats.

Flexible parsing options

Customize extraction settings for scanning specific sections or pages to improve performance and accuracy.

How to extract hyperlinks from a PDF using link options

This code example shows how to extract the hyperlinks that fall within a specific rectangular area of a PDF page by passing custom options.

C#

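using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

// Initialize the Parser with the PDF document
using (Parser parser = new Parser("input.pdf"))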
{
    // Check if hyperlink extraction is supported
    if (!parser.Features.Hyperlinks)
    {
        return;
    }

    // Limit extraction to a rectangular page area:
    // upper-left corner at (380, 90), 150 wide and 50 high
    PageAreaOptions options = new PageAreaOptions(new Rectangle(new Point(380, 90), new Size(150, 50)));

    // Extract hyperlink data from the document
    IEnumerable<PageHyperlinkArea> hyperlinks = parser.GetHyperlinks(options);

    // Handle the list of extracted links
    foreach (PageHyperlinkArea h in hyperlinks)
    {
        Console.WriteLine(h.Text);
        Console.WriteLine(h.Url);
    }
}

About GroupDocs.Parser for .NET API

GroupDocs.Parser is a versatile document parsing API for .NET developers. It supports extracting hyperlinks, text, images, and structured content from various file formats such as PDF, Word, Excel, HTML, and more—without relying on external software.
