How to Remove Text from RTF Files using Regular Expressions and Java

The GroupDocs.Redaction Java API lets you redact, hide, or remove sensitive text from word processing documents, worksheets, presentations, PDFs, and images using regular expressions.



What is Text Sanitization?

Text redaction, or sanitization, is the process of removing confidential or unwanted text from digital documents while leaving the rest of the document or the surrounding paragraph intact. Redaction helps users and organizations protect sensitive information by hiding it or removing it permanently. With the GroupDocs.Redaction Java API, you can redact, hide, or remove sensitive text from word processing documents, worksheets, presentations, PDFs, and raster image files. The API provides a wide range of options and methods for redacting private information in documents. It supports searching and redacting with regular expressions, textual redactions (exemption codes), graphical redactions (colored rectangles), and more. Download the API to automate your document redaction process and explore its basic and advanced features.
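
As a rough illustration of what a textual redaction does, the sketch below replaces every match of a pattern with an exemption code while leaving the surrounding sentence intact. It uses only the Java standard library and is not the GroupDocs.Redaction API; the class TextRedactionSketch, the method redact, the "[personal]" code, and the sample sentence are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextRedactionSketch {

    // Replaces every match of regex in text with the given exemption code.
    // Hypothetical helper for illustration; not part of GroupDocs.Redaction.
    public static String redact(String text, String regex, String exemptionCode) {
        // quoteReplacement keeps '$' and '\' in the code from being read as group references
        String replacement = Matcher.quoteReplacement(exemptionCode);
        return Pattern.compile(regex).matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String original = "Contact John Doe at john.doe@example.com for details.";
        // A deliberately simple e-mail pattern, good enough for this sample only
        System.out.println(redact(original, "[\\w.]+@[\\w.]+", "[personal]"));
    }
}
```

A graphical redaction follows the same idea, except that the matched area is covered with a colored rectangle instead of being replaced by text.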

Redact RTF using Regular Expressions in Java

GroupDocs.Redaction makes it easy to redact sensitive or private data from your documents. The most common redaction case is removing text from a document.

The following code applies a redaction to a particular part of a document using a regular expression. It replaces all numbers matching the pattern “AA BB CCCCCC” with a blue rectangle.

Remove Sensitive Data from RTF

  • Load the RTF file with the Redactor class
  • Create an instance of the RegexRedaction class with the pattern and a ReplacementOptions object
  • Apply the RegexRedaction object to the loaded document
  • Save the changes with SaveOptions and close the document


// For complete examples and data files, please go to https://github.com/groupdocs-search/GroupDocs.Redaction-for-Java
// Load the document
Document doc = Redactor.load(Utilities.mapSourceFilePath(FilePath));
try {
    // Replace every match of the regular expression with a blue rectangle
    doc.redactWith(new RegexRedaction("\\d{2}\\s*\\d{2}[^\\d]*\\d{6}", new ReplacementOptions(java.awt.Color.BLUE)));
    // Save the document in its original format, overwriting the original file
    SaveOptions so = new SaveOptions();
    so.setAddSuffix(false);
    so.setRasterizeToPDF(false);
    doc.save(so);
} finally {
    // Release the document even if redaction or saving fails
    doc.close();
}
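
To preview which strings the pattern would pick up before applying it to a real document, the same regular expression can be run through java.util.regex. This is a minimal sketch that does not use GroupDocs.Redaction at all; the class RegexPreview, the method findMatches, and the sample text are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexPreview {

    // Same pattern as in the redaction example: two digits, optional whitespace,
    // two digits, any run of non-digit characters, then six digits.
    public static final Pattern NUMBER_PATTERN =
            Pattern.compile("\\d{2}\\s*\\d{2}[^\\d]*\\d{6}");

    // Collects every substring of text that the pattern matches, in order.
    public static List<String> findMatches(String text) {
        List<String> matches = new ArrayList<String>();
        Matcher matcher = NUMBER_PATTERN.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample text containing two numbers in the "AA BB CCCCCC" shape
        String sample = "Card 12 34 567890 was replaced by 98 76-543210 last week.";
        for (String match : findMatches(sample)) {
            System.out.println(match);
        }
    }
}
```

Because `[^\\d]*` accepts any run of non-digit characters, the pattern also matches when words or punctuation sit between the second and third digit groups. If that is too broad for your documents, a stricter separator such as `\\s*` may be a better fit.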

System Requirements

GroupDocs.Redaction for Java APIs are supported on all major platforms and operating systems. For the complete guide, please see the system requirements page. Before executing the code above, please make sure that you have the following prerequisites installed on your system:

  • Operating Systems: Microsoft Windows, Linux, macOS
  • Development Environment: NetBeans, IntelliJ IDEA, Eclipse, etc.
  • Java Runtime Environment: J2SE 6.0 and above
  • Get the latest version of GroupDocs.Redaction for Java from Maven

Why Use GroupDocs.Redaction

  • Lets users add custom document formats and redaction types
  • No additional software is required to remove sensitive information
  • Ability to set a page range when rendering a document as PDF
  • Easy redaction of different types of metadata: author name, version, title, subject, description, and many more
  • Document information extraction: file type, page count, etc.


What is RTF File Format?

Introduced and documented by Microsoft, the Rich Text Format (RTF) is a method of encoding formatted text and graphics for use within applications. The format facilitates cross-platform document exchange with other Microsoft products, thus serving the purpose of interoperability. This capability makes it a standard for data transfer between word processing software, so content can be moved from one operating system to another without losing document formatting. Microsoft makes the file format specifications available for public download, and developers can use them as a reference.


Popular Redaction Options

  • Redact CSV Files (Comma Separated Values)
  • Redact DOC Files (Microsoft Word Binary Format)
  • Redact DOCM Files (Microsoft Word 2007 Macro File)
  • Redact DOCX Files (Office 2007+ Word Document)
  • Redact DOT Files (Microsoft Word Template Files)
  • Redact DOTM Files (Microsoft Word 2007+ Template File)
  • Redact DOTX Files (Microsoft Word Template File)
  • Redact PDF Files (Portable Document Format)
  • Redact POT Files (Microsoft PowerPoint Template Files)
  • Redact POTM Files (Microsoft PowerPoint Template File)
  • Redact PPS Files (PowerPoint Slide Show)
  • Redact PPSM Files (Macro-enabled Slide Show)
  • Redact PPSX Files (PowerPoint Slide Show)
  • Redact PPT Files (Microsoft PowerPoint 97-2003)
  • Redact PPTM Files (Macro-enabled Presentation File)
  • Redact PPTX Files (Open XML Presentation Format)
  • Redact XLS Files (Microsoft Excel Spreadsheet (Legacy))
  • Redact XLSM Files (Macro-enabled Spreadsheet)
  • Redact XLSX Files (Open XML Workbook)
  • Redact XLT Files (Excel 97 - 2003 Template)
  • Redact XLTM Files (Excel Macro-Enabled Template)
  • Redact XLTX Files (Excel Template)
