Metadata XMP: Delivering Content in Context


By Mike Kadell and Aaron Schnarr, November 2015


“Metadata is data that describes the characteristics or properties of a document. It can be distinguished from the main contents of a document.” – XMP Specification (2005, p. 11)


Simply put, metadata is information about the content, not the content itself. For example, you may know nothing about the contents of a book, but you may know the book’s author and publisher.


A picture may be worth a thousand words, but context is king.

“If you bring me a dead cat, all I can tell you is that it’s dead and it was a cat. But if you hand me a dead cat and you tell me you found it in the middle of the road, what killed it? Hit by a car? Hit by a truck? Okay, so you find a dead cat in the kitchen of your favorite restaurant, what killed it? The chef? What are we talking about? Context! The difference between road-kill and a meal.”

– Prof. Lee Silver (California Institute of Technology, 1969), From the Earth to the Moon: Galileo Was Right (1998)


 Metadata, XMP, and PDF documents

Many of us are familiar with the basic metadata common to all PDF documents. Open any PDF file with Adobe Reader and go to

File > Properties > Description (shortcut: Ctrl-D)


There you will see the standard PDF metadata:

  • Title / Author / Subject / Keywords
  • Created / Modified / Application
  • PDF Producer / Version / File & Page size


Open the same file in Adobe Acrobat, and you have access to additional (XMP) metadata.

 

Definition:

XMP: Extensible Metadata Platform

       Extensible: able to be extended, extendable


In information technology, extensible describes something that is designed so that users or developers can expand or add to its capabilities.


The Additional Metadata dialog reveals more description fields:

  • Author Title
  • Description Writer
  • Copyright Status
  • Copyright Notice
  • Copyright Info URL


Selecting the Additional Metadata button opens a whole new world of information about the document.


The Advanced tab exposes the default XMP structure and lets you add your own custom XMP metadata.

 Default metadata under Advanced properties:

  • Dublin Core Properties
  • XMP Core Properties
  • PDF Properties
  • XMP Media Management Properties


 How to add your own XMP data

XMP is serialized using the Resource Description Framework (RDF) standard, expressed in XML. RDF is an open W3C standard and is fully detailed in the W3C document Resource Description Framework (RDF) Model and Syntax Specification.


Metadata is made up of a collection of properties corresponding to a resource.

  • Resources can be documents or portions of documents (e.g., pages).
  • Properties have a name and a value.
    • Used in the form "The property name of resource is property value."
    • Example: The creator of Star Wars is George Lucas.

 


A discussion of the XMP syntax and structure is beyond the scope of this article, but a simple example demonstrates the functionality:


       Figure: Input file StarWars.XMP and the appended metadata in the PDF
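To make the "creator of Star Wars is George Lucas" statement concrete, here is a minimal Python sketch that builds an XMP packet using only the standard library. It is not the original StarWars.XMP file, whose contents are not reproduced here; the helper names `q` and `build_xmp` are made up for illustration. It assumes the standard Dublin Core mapping, in which `dc:title` is a language alternative and `dc:creator` is an ordered list of names.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP, RDF, and Dublin Core.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Ask ElementTree to emit the conventional prefixes instead of ns0, ns1, ...
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, name):
    """Hypothetical helper: build a namespace-qualified tag name."""
    return f"{{{NS[prefix]}}}{name}"


def build_xmp(title, creators):
    """Hypothetical helper: return an XMP packet body as a string."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    # An empty rdf:about means "the resource this packet is embedded in".
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # Property "title" with value "Star Wars" (default-language entry).
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title

    # Property "creator" with an ordered list of names as its value.
    seq = ET.SubElement(ET.SubElement(desc, q("dc", "creator")), q("rdf", "Seq"))
    for name in creators:
        ET.SubElement(seq, q("rdf", "li")).text = name

    return ET.tostring(xmpmeta, encoding="unicode")


packet = build_xmp("Star Wars", ["George Lucas"])
print(packet)
```

A packet stored inside a file is normally also wrapped in `<?xpacket ...?>` processing instructions, which this sketch leaves out for brevity.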


Are there tools that help developers manage XMP data programmatically?


Absolutely! Toolkit by ActivePDF is a developer’s library with hundreds of methods and properties that can knock out almost any PDF-related task. Toolkit offers an entire object dedicated to managing XMP metadata through an easy-to-use SDK. With the XMP manager object, you can define your namespace, add fields to the XMP, get and set properties, and assign them at either the page level or the document level. For complete sample code that you can download and run, visit https://github.com/activepdf


Make your documents smarter! Use XMP data to greatly extend the context of the document. Include things such as job IDs, workflow states, and invoice numbers; patient, student, or customer names; loan IDs; or any other data that fits within your ERP system. Custom XMP data lets you classify and index documents for processing and archiving, so you can know what a document is about without having to expose its content.
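As a rough illustration of indexing by custom XMP data, the Python sketch below reads custom properties out of a packet without touching any page content. Everything specific in it is an assumption made for the example: the `acme` prefix, the `http://example.com/ns/workflow/1.0/` namespace URI, the property names, and the sample values are all invented, and the code uses only the Python standard library rather than the Toolkit API.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical custom namespace; a real one would be chosen by your organization.
ACME_NS = "http://example.com/ns/workflow/1.0/"

# Made-up sample packet carrying three custom properties.
PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:acme="http://example.com/ns/workflow/1.0/">
      <acme:JobID>J-1042</acme:JobID>
      <acme:WorkflowState>approved</acme:WorkflowState>
      <acme:InvoiceNumber>INV-2015-118</acme:InvoiceNumber>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def custom_properties(packet, namespace):
    """Return {property name: value} for simple properties in one namespace."""
    root = ET.fromstring(packet)
    wanted = f"{{{namespace}}}"
    props = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for child in desc:
            # Keep only elements that belong to the requested namespace.
            if child.tag.startswith(wanted):
                name = child.tag[len(wanted):]
                props[name] = (child.text or "").strip()
    return props


index_entry = custom_properties(PACKET, ACME_NS)
print(index_entry)
```

The sketch handles only simple properties written as child elements. RDF also allows simple properties to be written as attributes on `rdf:Description`, so a production reader would need to cover that form as well.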