February 19, 2019

Hidden information in your documents


By Tim Torian, Torian Group, Inc.


Imagine the scene: you are sitting at your desk copying and pasting some new information into a company template. All the old content has been removed and the new detail has been entered. The document is ready to go and gets sent out as an email attachment. But stop! Do you know that the document you have just sent may still contain all the confidential information you thought had been deleted?

When you save a Word document, it doesn’t just save the text. It also saves the layout of the page, the fonts used, and everything needed to recreate and edit the document. This information is called metadata: data about the data in the document. Newer versions of Word, WordPerfect, and other programs include more and more information with their documents. This can include version and revision information, that is, what you changed and deleted. When you send the document to someone, all this information goes with it. If they know how, they can see what you changed and decided to leave out.
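As a rough illustration of how much travels with a file, the sketch below lists a few of the properties stored alongside the text. It rests on an assumption not made in this article: that the file uses the newer zip-based Office format (.docx, .xlsx, .pptx) rather than the binary format of the Word versions discussed here. The function name read_core_properties is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of a zip-based Office file.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Friendly name -> element to look up (hypothetical selection for illustration).
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "revision": "cp:revision",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(path):
    """Return the core properties found in a .docx/.xlsx/.pptx file as a dict."""
    with zipfile.ZipFile(path) as package:
        try:
            xml_bytes = package.read("docProps/core.xml")
        except KeyError:
            return {}  # the package carries no core-properties part
    root = ET.fromstring(xml_bytes)
    found = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[name] = element.text
    return found
```

Calling read_core_properties on a file you are about to send would show, for example, whose name is recorded as the creator and who last saved it.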

In 2003 the UK Labour government made the serious mistake of releasing a document that contained metadata, thus revealing information about the 'original author', who embarrassingly turned out to be a university student.

When SCO Group, a litigious Lindon (Utah) software company, filed a breach-of-contract suit in Michigan against DaimlerChrysler in March 2004, it revealed a lot more than it intended. A CNET News reporter, poking through the Microsoft Word filing, discovered that the case had originally been drawn up as a suit against Bank of America in a California court.

In financial reporting documents such as spreadsheets, metadata can be saved with Microsoft Excel files. A review of the Fortune 1000 Web sites showed that 33% of these Web sites contained Microsoft Excel documents publicly posted either directly on the company's corporate Web site or linked to a third party site for SEC filings. When such documents are posted by accident, any potentially harmful metadata they contain can be easily viewed by anyone who downloads them.

Metadata can have a positive function when creating large documents. It can provide information on who has contributed to the document, any specific changes that have been made and information about the company, all of which is of great importance when considering key compliance areas such as access control, audit trails and archiving.

As we increasingly rely on Word documents and email as tools for collaboration and communication, the risks posed by metadata become increasingly apparent. The problem stems largely from a lack of awareness. Most people simply do not realize that using old documents as templates lets anyone tap into the hidden layers of the document and expose the history of the text.

Business owners are responsible for what leaves the office via electronic resources. With the ever-expanding nature of compliance regulations such as Sarbanes-Oxley and Basel II, managers have to be confident that the documents that leave the organization comply with company regulations. It is unrealistic to expect the owner to check every document that leaves the organization. The company relies on its workers to be aware of what they are sending out and of the potential security and legal implications of sending out documents full of metadata. It is therefore essential to educate your staff on the existence of metadata and how it can be used.

One useful, but potentially dangerous, feature of Word is "track changes." When turned on, it keeps a record of who made what alterations and when. Unless you carefully clean up the document, which can sometimes be done with a single mouse click, anyone receiving it can see the record of changes by using the "show markup" mode. This is what trapped SCO. Another feature lets you keep an audit trail by saving versions of a Word document as it goes through revisions. Unless all but the last version are deleted before the file is circulated, a recipient will be able to see them. The problem is that users must go to several different places within the application to remove different types of metadata. Here are the two most important checks:

  1. Before sending documents to others, turn on ‘Highlight Track Changes’ to check whether a history of changes exists.
  2. Before sending documents to others, click View-Comments in your Microsoft Office application to check whether a history of comments exists.
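The same two checks can be approximated in a script. The sketch below counts tracked insertions, tracked deletions, and comments. It assumes the newer zip-based .docx format, not the binary .doc files produced by the Word versions covered in this article, and the function name count_hidden_revisions is invented for the example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used inside .docx files.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_hidden_revisions(path):
    """Count tracked insertions, tracked deletions, and comments in a .docx file."""
    counts = {"insertions": 0, "deletions": 0, "comments": 0}
    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())
        if "word/document.xml" in names:
            body = ET.fromstring(package.read("word/document.xml"))
            # Tracked changes appear as w:ins and w:del elements in the body.
            counts["insertions"] = sum(1 for _ in body.iter(f"{{{W}}}ins"))
            counts["deletions"] = sum(1 for _ in body.iter(f"{{{W}}}del"))
        if "word/comments.xml" in names:
            comments = ET.fromstring(package.read("word/comments.xml"))
            # Each reviewer comment is stored as one w:comment element.
            counts["comments"] = sum(1 for _ in comments.iter(f"{{{W}}}comment"))
    return counts
```

Any non-zero count suggests the file still carries a revision or comment history and should be cleaned before it leaves the office. The sketch only looks at the main document body, so changes recorded in headers, footers, or footnotes would go unnoticed.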

One tool that helps protect your outgoing documents is Microsoft's Remove Hidden Data, a free add-in to Office XP and 2003. You can get it from the Microsoft Office Download site.


Commercial tools are also available. For Word, Excel and PowerPoint documents, the most widely used metadata scrubber is the Metadata Assistant, sold by Payne Consulting Group (www.payneconsulting.com, $79). A free demo version of the program will show you the metadata within a Word document, but won’t clean it out. Other metadata removal programs for the Microsoft suite of products include Out-of-Sight by SoftWise Consulting (www.softwise.net), ezClean from KKL Software (www.kkl.com) and Workshare Protect by Workshare (www.workshare.net). For Word only, there is Doc Scrubber (www.docscrubber.com).

For more detailed information on removing metadata from Word 97, 2000 or 2002, see, respectively, Microsoft Knowledge Base articles 223790, 237361 or 290945.

WordPerfect has a feature called Undo/Redo History. It can let you view hundreds of past changes, showing what text was cut, copied and even deleted from the document. To turn it off, click on Options, then uncheck “Save Undo/Redo items with document”. Unfortunately, there is no program for cleaning metadata from WordPerfect documents.

It's a better idea to avoid distributing Office files unless absolutely necessary. People need to share information, but they don't need to give the recipients access to the original documents. One way to do that is to use Adobe Systems' portable document format (PDF), which can be displayed in the free and ubiquitous Acrobat Reader. It retains the formatting of the original document but strips all hidden data. Adobe's program for creating PDFs, Acrobat 7.0 (about $150), has extensive document-management features but is overkill for many uses. However, many other products can convert Office files to PDF, including Macromedia's RoboPDF ($79) and ScanSoft's PDF Converter for Word ($50). Apple Computer's Mac OS X has PDF conversion built in.

While converting a file to PDF format will help strip out metadata from the original document, remember that PDF files can also contain their own metadata. This is usually basic information such as the name of the person who created the file, the date of creation, the file location, etc. Select File, then Document Properties to view the summary metadata information within a PDF file. In this same dialog box you can add further restrictions on how the document can be accessed, used, copied and printed in the Security Options settings.
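For a quick look without opening the dialog box, the summary fields can sometimes be spotted directly in the file's bytes. The sketch below is a crude heuristic, not a PDF parser: it assumes the information dictionary is stored uncompressed and unencrypted with plain parenthesized strings, which many files do not satisfy, and it simply reports the first match for each key. The function name scan_pdf_info is hypothetical.

```python
import re

# Standard keys of a PDF document information dictionary.
INFO_KEYS = (b"Title", b"Author", b"Creator", b"Producer", b"CreationDate", b"ModDate")


def scan_pdf_info(data):
    """Heuristically extract literal-string summary fields from raw PDF bytes."""
    found = {}
    for key in INFO_KEYS:
        # Match "/Key (value)" where the value has no unescaped parentheses.
        pattern = rb"/" + key + rb"\s*\(((?:\\.|[^\\()])*)\)"
        match = re.search(pattern, data)
        if match:
            found[key.decode("ascii")] = match.group(1).decode("latin-1")
    return found
```

To try it on a file, read the bytes with open(path, "rb").read() and pass them in. An empty result does not prove the file is free of metadata; it may only mean the fields are stored in a form this scan cannot see.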

To learn more about metadata, and how to manage it, go to Metadatarisk.org, a public service site offering information for IT professionals and business users interested in protecting their corporations from exposing confidential information.

Tim Torian has taught computer networking at the College of Sequoias and Cal Poly Extension. He has a BS in Computer Science, and has been consulting on computer networks for the past 18 years. His industry certifications include: Cisco CCNA and CCNI, Microsoft MCSE, and Novell CNE. He is president of Torian Group, Inc., which provides a full range of technology consulting services to local businesses, including computer services, networking, and custom software development. They can be reached at (559) 733-1940 or on the web at http://www.toriangroup.com

