Document sanitization is the process of removing metadata from a document to avoid sensitive information falling into unauthorized people’s hands. Document metadata refers to invisible information in a document, such as its creation date, author’s name, revision history, and the comments exchanged by the author and editor.
Even when some of this data appears to have been deleted, it may still be stored within the file. Because metadata can contain sensitive information, removing it before distributing the document is crucial.
For example, a contract drawn up by a law office may have gone through several rounds of editing and revision. The author, editor, and everyone else involved may have left and resolved comments in the document before the contract was finalized. Before the contract is sent to the relevant parties, it has to be sanitized so that only the intended information goes out.
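To make the idea of hidden metadata concrete, here is a minimal Python sketch that lists the core properties stored inside a .docx file. It relies on the fact that a .docx file is a ZIP package whose docProps/core.xml part usually holds fields such as the author and creation date. The function names, the sample names, and the sample values are hypothetical, and the in-memory sample is a stripped-down package for illustration, not a complete Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Part of a .docx package that holds core properties such as author and dates.
CORE_PROPS_PART = "docProps/core.xml"


def read_core_properties(docx_file):
    """Return {property name: value} from a .docx path or file-like object."""
    with zipfile.ZipFile(docx_file) as package:
        if CORE_PROPS_PART not in package.namelist():
            return {}
        root = ET.fromstring(package.read(CORE_PROPS_PART))
    properties = {}
    for element in root:
        # Tags look like "{namespace}creator"; keep only the local name.
        name = element.tag.rsplit("}", 1)[-1]
        properties[name] = (element.text or "").strip()
    return properties


def build_sample_docx():
    """Build a tiny in-memory package with made-up metadata (illustration only)."""
    core_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<cp:coreProperties "
        'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/" '
        'xmlns:dcterms="http://purl.org/dc/terms/">'
        "<dc:creator>Jane Example</dc:creator>"
        "<cp:lastModifiedBy>John Example</cp:lastModifiedBy>"
        "<cp:revision>7</cp:revision>"
        "<dcterms:created>2024-01-15T09:30:00Z</dcterms:created>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(CORE_PROPS_PART, core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, value in read_core_properties(build_sample_docx()).items():
        print(f"{name}: {value}")
```

Running the same function on a real file path, for example `read_core_properties("contract.docx")` with a hypothetical file name, is a quick way to see what a recipient could learn without ever opening the document in Word.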
Document sanitization originally applied to printed documents. For instance, intelligence reports may contain classified information, such as names and locations. Before such a report can be distributed outside the intelligence department, it has to be sanitized: sensitive data is covered or removed from the report.
Below is an example of a publicly available sanitized document printed in 2015.
What Does It Mean to Sanitize a Document?
Today, document sanitization extends to digital files. In a nutshell, it means removing any classified or sensitive details from a document. The term is often used interchangeably with document redaction.
To sanitize documents effectively, start by asking, “What information should not leave our premises?”
That data can include passwords, email addresses, phone numbers, suppliers’ names, and side comments that others can take out of context. To be safe, it’s best to erase everything that is not intended to be distributed.
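As a rough illustration of that first pass, the Python sketch below scans plain text for a few kinds of sensitive values and replaces them outright. The patterns, labels, and function names are assumptions made for this example; they are deliberately simple, cover only email addresses, US-style phone numbers, and Social Security numbers, and cannot recognize things like passwords or out-of-context comments, which still need a human review.

```python
import re

# Deliberately simple, illustrative patterns; real data needs broader rules.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_sensitive(text):
    """Return a list of (label, matched text) pairs found in the text."""
    hits = []
    for label, pattern in PATTERNS.items():
        hits.extend((label, match.group()) for match in pattern.finditer(text))
    return hits


def scrub(text, placeholder="[REMOVED]"):
    """Replace every match with a placeholder so the original value is gone."""
    for pattern in PATTERNS.values():
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    sample = "Call 555-867-5309 or mail jane@example.com. SSN: 123-45-6789."
    print(find_sensitive(sample))
    print(scrub(sample))
```

The design point is that `scrub` replaces the matched text instead of hiding it, so the original value no longer exists in the output.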
Why Is Document Sanitization Important?
A significant advantage of the digital age is that errors can be deleted quickly. The drawback is that not everything can be erased entirely. Digital residue may remain stored and retrievable.
Even when you cover sensitive information on PDFs distributed digitally, there’s no guarantee that other people won’t find a way to see what’s behind the covered areas. Hence, document sanitization is more complicated for digital files than for printed materials.
A classic example is AT&T’s 2006 legal brief about its participation in National Security Agency (NSA) wiretapping. Some pages were redacted, meaning the text was covered in black, as shown in the screenshot below. However, some users could retrieve the redacted text by copying and pasting it in some PDF readers.
Proper document sanitization is vital to avoid divulging national, corporate, or personal secrets.
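The difference between covering text and removing it can be shown with a toy model. The Python sketch below is not how PDF files are actually structured, and every class and function name in it is hypothetical. It only illustrates the general failure: a box drawn on top of text changes what is displayed, while the text itself stays in the file for copy-and-paste or extraction tools to find.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy page model: text runs plus opaque boxes drawn over some of them."""

    text_runs: list = field(default_factory=list)
    overlays: list = field(default_factory=list)  # indexes of covered runs

    def visible_text(self):
        """What a reader sees on screen: covered runs appear as solid bars."""
        return " ".join(
            "#" * len(run) if index in self.overlays else run
            for index, run in enumerate(self.text_runs)
        )

    def extract_text(self):
        """What copy-and-paste or a text extractor gets: overlays are ignored."""
        return " ".join(self.text_runs)


def cover(page, index):
    """Unsafe 'redaction': draw a box over the run but keep its text."""
    page.overlays.append(index)


def remove(page, index, placeholder="[REDACTED]"):
    """Safer redaction: replace the run itself so nothing is left to extract."""
    page.text_runs[index] = placeholder


if __name__ == "__main__":
    page = Page(["The informant,", "Alex Doe,", "met the source."])
    cover(page, 1)
    print(page.visible_text())   # the name is hidden on screen
    print(page.extract_text())   # but still present in the data
    remove(page, 1)
    print(page.extract_text())   # now the name is gone
```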
How Do You Sanitize a Document?
In 2005, the NSA published its recommendations on sanitizing a Word document. Here’s a summary.
- Create a copy of the original document and make changes to that copy.
- Turn off track changes, comments, and other visible markup.
- Delete sensitive text, images, diagrams, and other information outright rather than merely covering them with shapes, since covered content remains in the file.
- Rename the document.
- Open a blank Word document and copy the contents of the revised file into it.
- Convert the Word document to PDF.
If you’re using Acrobat Professional, you can also run the Sanitize Document tool for added protection.
How to Turn Off Track Changes on a Word Document
To turn off tracking in Word, go to the Review tab and make sure that Track Changes is not highlighted or selected.
While on the Review tab, you can also disable markup by clicking the drop-down arrow and selecting No Markup.
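Keep in mind that No Markup changes only what is displayed. Tracked changes that have not been accepted or rejected, and comments that have not been deleted, remain stored in the file. The Python sketch below is one way to check a .docx file for such leftovers before sending it. It assumes the common (transitional) WordprocessingML namespace, in which tracked insertions and deletions appear as `w:ins` and `w:del` elements in word/document.xml and comments are stored in word/comments.xml. The function names and the sample content are hypothetical, and the in-memory sample is a stripped-down package, not a complete Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (transitional) used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_elements(xml_bytes, local_name):
    """Count elements with the given local name in the W_NS namespace."""
    root = ET.fromstring(xml_bytes)
    return len(list(root.iter(f"{{{W_NS}}}{local_name}")))


def find_residual_markup(docx_file):
    """Count tracked insertions, deletions, and comments left in a .docx."""
    counts = {"insertions": 0, "deletions": 0, "comments": 0}
    with zipfile.ZipFile(docx_file) as package:
        names = package.namelist()
        if "word/document.xml" in names:
            body = package.read("word/document.xml")
            counts["insertions"] = count_elements(body, "ins")
            counts["deletions"] = count_elements(body, "del")
        if "word/comments.xml" in names:
            comments = package.read("word/comments.xml")
            counts["comments"] = count_elements(comments, "comment")
    return counts


def build_sample_docx():
    """Build a tiny in-memory package with made-up leftovers (illustration only)."""
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        "<w:r><w:t>Final fee: </w:t></w:r>"
        '<w:del w:id="1" w:author="Editor">'
        "<w:r><w:delText>$900</w:delText></w:r></w:del>"
        '<w:ins w:id="2" w:author="Editor">'
        "<w:r><w:t>$1,200</w:t></w:r></w:ins>"
        "</w:p></w:body></w:document>"
    )
    comments_xml = (
        f'<w:comments xmlns:w="{W_NS}">'
        '<w:comment w:id="0" w:author="Editor">'
        "<w:p><w:r><w:t>Do not mention the old fee.</w:t></w:r></w:p>"
        "</w:comment></w:comments>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", document_xml)
        package.writestr("word/comments.xml", comments_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(find_residual_markup(build_sample_docx()))
```

If any count is above zero, accept or reject the remaining changes and delete the comments in Word, then check the file again.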
In today’s highly digital world, we send and receive essential files via email and instant messaging all the time. It may be helpful to pause and think about the data embedded in these documents.
Did you type your birthday, Social Security number, passwords, or other personal details into the document? If so, it’s crucial to sanitize the document before sending it. You wouldn’t want scammers and other threat actors to get hold of those details.
- Document sanitization refers to the process of removing sensitive information—visible or not—from a file before sending it to other people.
- Document metadata may contain classified data, including passwords, device details, and internal comments.
- Document sanitization and redaction are often used interchangeably, although sanitization also involves removing invisible information like metadata.
- It’s easier to sanitize printed materials than digital documents. You can cover classified text and images with a colored rectangle on printed materials.
- There are ways to retrieve deleted data or covered blocks of text from digital files.
- The NSA released some recommendations on effectively sanitizing a document before digital distribution.