Java API to Extract Hyperlinks from Documents, Pages or a Particular Page Area

GroupDocs.Parser for Java makes developers' job easy by letting them extract hyperlinks from whole documents, individual pages, or a specific page area of PDF, DOCX, PPTX, EML, MSG, XLS, XLSX, CSV, RTF, EPUB and many other formats.



How to Parse & Extract Hyperlinks from PST documents via Java API?

A hyperlink is a piece of text, an image, or an icon that points to a whole document or to a specific location within a document. Hyperlinks let users navigate to a web page or another document, so it is often necessary to extract them from a document and use them to access the linked documents or web pages. GroupDocs.Parser for Java is a document text extraction API that provides complete functionality for implementing text and metadata extraction solutions. It supports text and hyperlink extraction from PDF, emails, e-books, Microsoft Office formats (Word: DOC, DOCX; PowerPoint: PPT, PPTX; Excel: XLS, XLSX), LibreOffice formats and many more. It also offers advanced parsing features such as plain and structured text extraction, keyword search, and extraction of metadata, images, containers and attachments.
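Once hyperlinks have been extracted, a common next step is deciding how each URL should be used, for example opening web links while handling `mailto:` links from e-mail messages separately. The sketch below is plain Java built only on `java.net.URI`; it is not part of GroupDocs.Parser, and the helper name `groupByScheme` and the `relative`/`invalid` group names are hypothetical choices for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, NOT part of GroupDocs.Parser: groups extracted hyperlink URLs
// by scheme (e.g. "https", "mailto") so each kind of link can be handled differently.
class HyperlinkSchemes {

    static Map<String, List<String>> groupByScheme(List<String> urls) {
        // LinkedHashMap keeps groups in the order they are first seen
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String url : urls) {
            String key;
            try {
                String scheme = new URI(url.trim()).getScheme();
                // Relative links such as "#summary" have no scheme
                key = (scheme == null) ? "relative" : scheme.toLowerCase(Locale.ROOT);
            } catch (URISyntaxException e) {
                // Text that is not a syntactically valid URI is kept aside
                key = "invalid";
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(url);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "https://example.com/report.pdf",
                "mailto:someone@example.com",
                "#summary",
                "http://example.org/page");
        System.out.println(groupByScheme(urls));
    }
}
```

`URI` only checks syntax; it does not contact the server, so a link that parses cleanly may still be unreachable.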

Extract hyperlinks from PST in Java

GroupDocs.Parser for Java makes it easy for Java developers to extract hyperlinks from a PST file in a few steps:

  • Instantiate a Parser object for the source document;
  • Check whether the document supports hyperlink extraction;
  • Call the getHyperlinks method to obtain a collection of PageHyperlinkArea objects;
  • Iterate through the collection and get each hyperlink's text and URL.

Java example code to extract hyperlinks from a PST file

// Extract hyperlinks from a PST file using the GroupDocs.Parser API
import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageHyperlinkArea;

public class ExtractHyperlinks {
    public static void main(String[] args) throws Exception {
        // Path to the source document (placeholder; replace with your own file)
        String filePath = "sample.pst";
        // Create an instance of the Parser class; try-with-resources closes it
        try (Parser parser = new Parser(filePath)) {
            // Check if the document supports hyperlink extraction
            if (!parser.getFeatures().isHyperlinks()) {
                System.out.println("Document doesn't support hyperlink extraction.");
                return;
            }
            // Extract hyperlinks from the document
            Iterable<PageHyperlinkArea> hyperlinks = parser.getHyperlinks();
            // Iterate over hyperlinks
            for (PageHyperlinkArea h : hyperlinks) {
                // Print the hyperlink text
                System.out.println(h.getText());
                // Print the hyperlink URL
                System.out.println(h.getUrl());
                System.out.println();
            }
        }
    }
}
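The title also mentions extracting hyperlinks from a single page or a particular page area, which the example above does not cover. GroupDocs.Parser performs that selection itself (consult its API reference for the page and page-area variants of getHyperlinks); the plain-Java sketch below only models the idea so the selection rule is easy to see. The `LinkArea` class, its top-left coordinate convention, and the rule that a link must lie entirely inside the rectangle are assumptions of this sketch, not the library's documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of page-area hyperlink selection, NOT the GroupDocs.Parser API.
class PageAreaFilter {

    // A hyperlink with its page index and bounding box. Coordinates are assumed to be
    // page units with the origin at the top-left corner of the page.
    static final class LinkArea {
        final int pageIndex;
        final double left, top, width, height;
        final String text, url;

        LinkArea(int pageIndex, double left, double top, double width, double height,
                 String text, String url) {
            this.pageIndex = pageIndex;
            this.left = left;
            this.top = top;
            this.width = width;
            this.height = height;
            this.text = text;
            this.url = url;
        }
    }

    // Returns the links on the given page whose bounding box lies entirely inside the
    // selection rectangle (x, y, width, height). Full containment is an assumption.
    static List<LinkArea> inArea(List<LinkArea> links, int pageIndex,
                                 double x, double y, double width, double height) {
        List<LinkArea> result = new ArrayList<>();
        for (LinkArea l : links) {
            boolean samePage = l.pageIndex == pageIndex;
            boolean inside = l.left >= x && l.top >= y
                    && l.left + l.width <= x + width
                    && l.top + l.height <= y + height;
            if (samePage && inside) {
                result.add(l);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<LinkArea> links = List.of(
                new LinkArea(0, 400, 100, 80, 12, "Pricing", "https://example.com/pricing"),
                new LinkArea(0, 50, 700, 120, 12, "Contact", "mailto:sales@example.com"),
                new LinkArea(1, 400, 100, 80, 12, "Docs", "https://example.com/docs"));
        // Select the 150 x 50 area at (380, 90) on the first page (index 0)
        for (LinkArea l : inArea(links, 0, 380, 90, 150, 50)) {
            System.out.println(l.text + " -> " + l.url);
        }
    }
}
```

Requiring full containment is one possible rule; to keep links that merely overlap the selected area, replace the `inside` check with a rectangle-intersection test.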

System Requirements

GroupDocs.Parser for Java APIs are supported on all major platforms and operating systems. Before running the code, make sure the following prerequisites are installed on your system.

  • Operating Systems: Microsoft Windows, Linux, macOS
  • Development Environments: NetBeans, IntelliJ IDEA, Eclipse, etc.
  • Frameworks
  • Download the latest version of GroupDocs.Parser for Java from Maven

Why Use GroupDocs.Parser for Java

  • Extract plain text from any supported document
  • Parse documents using user-defined templates
  • Full support for structured text extraction
  • Search text by keyword or regular expression
  • Extract formatted text, metadata, images, containers and attachments
  • Extract the table of contents from some of the supported formats
  • Parse form data from PDF documents
  • Extract hyperlinks from documents

Extract Hyperlinks From Other Document Formats

The Java document parsing and hyperlink extraction API works with many file formats and images, so you can extract data from other popular formats as well.
