A hyperlink is a piece of text, an image, or an icon that points to an entire document or to a specific part of a document. Hyperlinks let users navigate to a web page or document. It is often necessary to extract hyperlinks from a document and use them to access an external document or web page.
GroupDocs.Parser for Java is a document text extraction API that provides complete functionality for implementing text and metadata extraction solutions. It supports text and hyperlink extraction from PDF, emails, ebooks, Microsoft Office formats such as Word (DOC, DOCX), PowerPoint (PPT, PPTX), and Excel (XLS, XLSX), LibreOffice formats, and many more. It also supports several advanced features, including document parsing, plain and structured text extraction, text search by keyword, and extraction of metadata, images, containers, and attachments.
GroupDocs.Parser for Java makes it easy for Java developers to extract hyperlinks from a PST file in a few simple steps.
// Extract hyperlinks from PST file using GroupDocs.Parser API
// Create an instance of Parser class
// (Constants.HyperlinksPdf refers to a sample document path here; substitute the path to your own file)
try (Parser parser = new Parser(Constants.HyperlinksPdf)) {
    // Check if the document supports hyperlink extraction
    if (!parser.getFeatures().isHyperlinks()) {
        System.out.println("Document doesn't support hyperlink extraction.");
        return;
    }
    // Extract hyperlinks from the document
    Iterable<PageHyperlinkArea> hyperlinks = parser.getHyperlinks();
    // Iterate over hyperlinks
    for (PageHyperlinkArea h : hyperlinks) {
        // Print the hyperlink text
        System.out.println(h.getText());
        // Print the hyperlink URL
        System.out.println(h.getUrl());
        // Print a blank line between hyperlinks
        System.out.println();
    }
}
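Once the text and URL of each hyperlink have been read, a common next step is to keep only the links that can actually be opened as web pages. The following is a minimal sketch of that step using only the Java standard library. The `Link` class and the `webUrls` method are hypothetical names introduced for illustration; they are not part of GroupDocs.Parser and simply stand in for the values returned by `getText()` and `getUrl()`.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class HyperlinkFilter {

    /** Hypothetical holder for the text and URL read from one extracted hyperlink. */
    public static final class Link {
        public final String text;
        public final String url;

        public Link(String text, String url) {
            this.text = text;
            this.url = url;
        }
    }

    /** Returns the distinct absolute http/https URLs, in first-seen order. */
    public static List<String> webUrls(List<Link> links) {
        Set<String> seen = new LinkedHashSet<>();
        for (Link link : links) {
            if (link.url == null) {
                continue; // hyperlink without a target
            }
            try {
                URI uri = new URI(link.url.trim());
                String scheme = uri.getScheme();
                boolean isWeb = scheme != null
                        && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
                if (isWeb && uri.getHost() != null) {
                    seen.add(uri.toString());
                }
            } catch (URISyntaxException e) {
                // Skip values that are not syntactically valid URIs
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Sample data (assumed); in practice these would come from the extracted hyperlinks
        List<Link> links = Arrays.asList(
                new Link("Docs", "https://example.com/docs"),
                new Link("Mail us", "mailto:sales@example.com"),
                new Link("Docs again", "https://example.com/docs"),
                new Link("Broken", "not a url"));
        for (String url : webUrls(links)) {
            System.out.println(url);
        }
    }
}
```

In this sketch, mail links, malformed values, and duplicates are dropped, so the remaining list can be passed to whatever code opens or downloads the linked pages.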
GroupDocs.Parser for Java is supported on all major platforms and operating systems. Before running the code, please make sure that the following prerequisites are installed on your system.