Java API to Extract Hyperlinks from Documents, Pages or a Particular Page Area

GroupDocs.Parser for Java makes developers' job easy by allowing them to extract hyperlinks from whole documents, individual pages, or a specific page area of PDF, DOCX, PPTX, EML, MSG, XLS, XLSX, CSV, RTF, EPUB, and many more formats.

How to Parse & Extract Hyperlinks from PST documents via Java API?

A hyperlink is a piece of text, an image, or an icon that points to an entire document or to a particular part within a document. Hyperlinks let users navigate to a web page or to another document, and it is often necessary to extract them from a document in order to access the external documents or web pages they reference. GroupDocs.Parser for Java is a document text extraction API that provides complete functionality for implementing text and metadata extraction solutions. It supports text and hyperlink extraction from PDF, emails, ebooks, Microsoft Office formats such as Word (DOC, DOCX), PowerPoint (PPT, PPTX), and Excel (XLS, XLSX), LibreOffice formats, and many more. It also offers several advanced document parsing features: extracting plain and structured text, searching text by keywords, and extracting metadata, images, containers, and attachments.
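Before following an extracted hyperlink, it usually helps to know what kind of target it points to. The sketch below is a minimal example that assumes only the JDK's java.net.URI class: it sorts a URL string into the kinds of targets described above, namely a web page, an e-mail address, a part of the current document, or another document. The HyperlinkTargets class, the TargetKind categories, and the rule that a bare "#fragment" refers to the current document are assumptions made for this illustration, not part of GroupDocs.Parser.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Minimal sketch: classify the target of an extracted hyperlink URL.
// The categories are illustrative assumptions, not part of GroupDocs.Parser.
public class HyperlinkTargets {

    enum TargetKind { WEB_PAGE, EMAIL, DOCUMENT_PART, OTHER_DOCUMENT, INVALID }

    static TargetKind classify(String url) {
        if (url == null || url.isBlank()) {
            return TargetKind.INVALID;
        }
        String trimmed = url.trim();
        URI uri;
        try {
            uri = new URI(trimmed);
        } catch (URISyntaxException e) {
            return TargetKind.INVALID; // not a syntactically valid URI reference
        }
        String scheme = uri.getScheme();
        if (scheme == null) {
            // Relative reference: "#section" points inside the current document,
            // anything else is treated as a path to another document.
            return trimmed.startsWith("#") ? TargetKind.DOCUMENT_PART : TargetKind.OTHER_DOCUMENT;
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "http":
            case "https":
                return TargetKind.WEB_PAGE;
            case "mailto":
                return TargetKind.EMAIL;
            default:
                return TargetKind.OTHER_DOCUMENT; // e.g. file: or other schemes
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://www.groupdocs.com/",
            "mailto:support@example.com",
            "#chapter-2",
            "docs/report.pdf"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

In practice the input would be the string returned by PageHyperlinkArea.getUrl(). Mapping every unrecognized scheme to OTHER_DOCUMENT is a deliberate simplification.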

Extract hyperlinks from PST in Java

GroupDocs.Parser for Java makes it easy for Java developers to extract hyperlinks from a PST file in a few simple steps:

Instantiate the Parser object for the source document;
Check whether the document supports hyperlink extraction;
Call the getHyperlinks method to obtain a collection of PageHyperlinkArea objects;
Iterate through the collection and get each hyperlink's text and URL.

Java example code to extract hyperlinks from a PST file

// Extract hyperlinks from a PST file using GroupDocs.Parser API
import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageHyperlinkArea;

public class ExtractHyperlinksFromPst {
    public static void main(String[] args) throws Exception {
        // Create an instance of Parser class ("sample.pst" is a placeholder path)
        try (Parser parser = new Parser("sample.pst")) {
            // Check if the document supports hyperlink extraction
            if (!parser.getFeatures().isHyperlinks()) {
                System.out.println("Document doesn't support hyperlink extraction.");
                return;
            }
            // Extract hyperlinks from the document
            Iterable<PageHyperlinkArea> hyperlinks = parser.getHyperlinks();
            // Iterate over hyperlinks
            for (PageHyperlinkArea h : hyperlinks) {
                // Print the hyperlink text
                System.out.println(h.getText());
                // Print the hyperlink URL
                System.out.println(h.getUrl());
                System.out.println();
            }
        }
    }
}
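The loop above prints every hyperlink in the order it is found, so a URL that appears several times in the document is printed several times. The following JDK-only sketch shows one way to group the results by URL while keeping first-seen order. The HyperlinkIndex and Link classes are hypothetical: Link stands in for the text/URL pair that PageHyperlinkArea exposes through getText() and getUrl(), so the example runs without the GroupDocs.Parser library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: group extracted hyperlinks by URL, keeping first-seen order.
// Link is a hypothetical stand-in for the text/URL pair that
// PageHyperlinkArea exposes through getText() and getUrl().
public class HyperlinkIndex {

    static final class Link {
        final String text;
        final String url;

        Link(String text, String url) {
            this.text = text;
            this.url = url;
        }
    }

    // Maps each distinct, non-empty URL to every link text that points at it.
    static Map<String, List<String>> groupByUrl(Iterable<Link> links) {
        Map<String, List<String>> byUrl = new LinkedHashMap<>();
        for (Link link : links) {
            if (link.url == null || link.url.isEmpty()) {
                continue; // skip links without a target
            }
            byUrl.computeIfAbsent(link.url, k -> new ArrayList<>()).add(link.text);
        }
        return byUrl;
    }

    public static void main(String[] args) {
        List<Link> links = List.of(
            new Link("GroupDocs", "https://www.groupdocs.com/"),
            new Link("Contact us", "mailto:support@example.com"),
            new Link("home page", "https://www.groupdocs.com/"));
        groupByUrl(links).forEach((url, texts) -> System.out.println(url + " <- " + texts));
    }
}
```

To use it with the parser, build a Link from h.getText() and h.getUrl() inside the loop instead of printing. LinkedHashMap is chosen over HashMap so that the grouped output follows the order in which the links appear in the document.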

System Requirements

GroupDocs.Parser for Java APIs are supported on all major platforms and operating systems. Before executing the code above, please make sure that you have the following prerequisites installed on your system.

Operating Systems: Microsoft Windows, Linux, macOS
Development Environments: NetBeans, IntelliJ IDEA, Eclipse, etc.
Frameworks
Download the latest version of GroupDocs.Parser for Java from Maven

Why Use GroupDocs.Parser for Java

Plain text extraction from all supported document formats
Document parsing via user-defined templates
Full support for structured text extraction
Text search by keyword or regular expression
Extract formatted text, metadata, images, containers, and attachments
Extract the table of contents for some supported document formats
Parse form data from PDF documents
Extract hyperlinks from documents