Metadata in JPEG Files
Digital cameras and image manipulation programs add hidden data to JPEG files. For different reasons, one might want to remove these data before publishing the files on the Internet.
The JPEG file format is the most widely used format for storing and transmitting photographs on the Internet. In addition, a large number of digital cameras store pictures as JPEG files. However, many users are probably unaware that a JPEG file can contain other data besides the actual photograph.
The JPEG file format allows additional information, called "metadata", to be embedded in the file header. (Other image file formats can contain metadata, too.) The purpose of these metadata is to supply useful additional information along with the picture. Image manipulation programs and especially digital cameras take advantage of this feature.
Metadata can be embedded in different ways. A common way is to store them according to the Exif specification, which was created by the Japan Electronic Industry Development Association (JEIDA). Other popular specifications are the IPTC headers defined by the International Press Telecommunications Council (IPTC) and XMP, developed by Adobe Systems. More detailed information about these metadata formats, as well as descriptions of other metadata formats, can be found on ExifTool's Tag Names page.
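To make "stored in the file header" concrete: a JPEG file is a sequence of marker segments, and metadata usually sits in application segments (APP0 to APP15) or comment (COM) segments before the compressed image data. The sketch below walks those segments and labels the common metadata payloads by their opening bytes (Exif in APP1, XMP in APP1, IPTC inside Photoshop data in APP13). It is a minimal sketch assuming a well-formed file; the function names are hypothetical and it is not taken from ExifTool or any other tool.

```python
import struct

# Byte strings that commonly open the payload of a metadata segment.
SIGNATURES = {
    b"Exif\x00\x00": "Exif",
    b"http://ns.adobe.com/xap/1.0/\x00": "XMP",
    b"Photoshop 3.0\x00": "IPTC/Photoshop",
}


def segment_name(marker: int, payload: bytes) -> str:
    """Name a header segment; APPn and COM segments are where metadata lives."""
    if marker == 0xFE:
        return "COM"
    if 0xE0 <= marker <= 0xEF:
        name = f"APP{marker - 0xE0}"
        for signature, label in SIGNATURES.items():
            if payload.startswith(signature):
                return f"{name} ({label})"
        return name
    return f"0xFF{marker:02X}"


def list_segments(data: bytes) -> list:
    """Return (name, length) for each segment from SOI up to and including SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # padding byte before a marker
            pos += 1
            continue
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        segments.append((segment_name(marker, payload), length))
        if marker == 0xDA:  # SOS: compressed image data follows
            break
        pos += 2 + length
    return segments
```

Calling `list_segments(open("photo.jpg", "rb").read())` on a camera picture (the file name is only an example) would typically show an `APP1 (Exif)` entry near the top of the list.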
Among other things, the metadata section of a file can contain information about:
make and model of the digital camera
time and date the picture was taken
distance the camera was focused at
location information (GPS) where the picture was taken
small preview image (thumbnail) of the picture
firmware version, serial numbers, name and version of the image manipulation program, etc.
Should Metadata Be Removed?
If you intend to publish JPEG files on the Internet, you might want to remove all metadata to reduce the file size. Depending on what kinds of metadata are stored in the file, the reduction can range from a few bytes to several kilobytes. For example, if you have a website with metered bandwidth or visitors with dial-up modems, you might be interested in saving as many bytes as possible.
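Because metadata lives in separate header segments, it can be removed without re-encoding (and thus without degrading) the picture itself. The following is a minimal sketch, assuming a well-formed JPEG, that copies a file while dropping APPn and COM segments; the function name and the `keep` parameter are hypothetical. Note that this also removes the Exif orientation tag, so some pictures may afterwards display rotated, and that APP2 (ICC color profile) and APP14 (Adobe) segments can affect how colors are decoded, so a careful tool might keep them, e.g. with `keep=frozenset({0xE0, 0xE2, 0xEE})`.

```python
import struct


def strip_metadata(data: bytes, keep=frozenset({0xE0})) -> bytes:
    """Copy a JPEG file, dropping APPn and COM segments whose marker is not in `keep`.

    By default only APP0 (JFIF) is kept. Everything from the SOS marker onward
    (the compressed image data and the EOI marker) is copied unchanged.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # padding byte before a marker; dropped
            pos += 1
            continue
        if marker == 0xDA:  # SOS: copy the rest of the file verbatim
            out += data[pos:]
            return bytes(out)
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        is_metadata = marker == 0xFE or 0xE0 <= marker <= 0xEF
        if not is_metadata or marker in keep:
            out += data[pos:end]  # keep tables, frame header, etc.
        pos = end
    raise ValueError("no SOS marker found")
```

Comparing `len(original)` with `len(strip_metadata(original))` shows how many bytes the metadata occupied in a given file.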
Another reason to consider removing all metadata beforehand is that metadata can give away potentially sensitive information. This information can pose a threat to your privacy or to other legitimate interests (e.g. the interest of journalists in protecting their sources). The following fictitious and real-life examples illustrate the problematic nature of metadata:
Many digital cameras embed a small preview image (thumbnail) of the picture in the header of each JPEG file. This makes it possible to browse the pictures quickly. Not all image manipulation programs update this thumbnail along with the main picture. As a consequence, an edited picture may retain the original, unmodified version of the picture as an Exif datum. In some cases, this is merely inconvenient; in others, it can create a significant information leak. For example, a supposedly anonymized picture of a person may still reveal his or her identity in the thumbnail. Another, more embarrassing example is the case of television personality Cat Schwartz (known from TechTV). Schwartz had published a photograph of herself on her personal blog. Because the program she had used to edit the picture did not update the thumbnail, the thumbnail revealed considerably more of her than originally intended.
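Checking a picture for a stale thumbnail is straightforward, because Exif stores the thumbnail as a complete little JPEG inside the APP1 segment. Exif data is laid out like a TIFF file: the first image file directory (IFD0) describes the main picture, and the next one (IFD1) points to the thumbnail via the tags JPEGInterchangeFormat (0x0201, offset) and JPEGInterchangeFormatLength (0x0202, size). The sketch below follows that chain; it assumes both tags are stored as LONG values, as the Exif specification prescribes, takes the APP1 payload (without marker and length bytes) as input, and uses a hypothetical function name.

```python
import struct


def extract_exif_thumbnail(exif_payload: bytes):
    """Return the JPEG thumbnail referenced by IFD1 of an Exif APP1 payload, or None."""
    if not exif_payload.startswith(b"Exif\x00\x00"):
        raise ValueError("not an Exif APP1 payload")
    tiff = exif_payload[6:]  # all Exif offsets are relative to the TIFF header
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])  # little- or big-endian
    if order is None:
        raise ValueError("unknown TIFF byte order")
    magic, ifd0 = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")

    # IFD layout: 2-byte entry count, 12 bytes per entry, 4-byte offset of the next IFD.
    (count0,) = struct.unpack(order + "H", tiff[ifd0:ifd0 + 2])
    next_pos = ifd0 + 2 + 12 * count0
    (ifd1,) = struct.unpack(order + "I", tiff[next_pos:next_pos + 4])
    if ifd1 == 0:
        return None  # no IFD1, hence no thumbnail

    (count1,) = struct.unpack(order + "H", tiff[ifd1:ifd1 + 2])
    start = length = None
    for i in range(count1):
        entry = ifd1 + 2 + 12 * i
        tag, _type, _count, value = struct.unpack(order + "HHII", tiff[entry:entry + 12])
        if tag == 0x0201:    # JPEGInterchangeFormat: offset of the thumbnail
            start = value
        elif tag == 0x0202:  # JPEGInterchangeFormatLength: its size in bytes
            length = value
    if start is None or length is None:
        return None
    return tiff[start:start + length]
```

Writing the returned bytes to a separate `.jpg` file and opening it shows exactly what the thumbnail of a published picture still contains.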
The following real-life case happened in February 2006: The Washington Post published an interview with a computer hacker, titled "Invasion of the Computer Snatchers". The hacker had agreed to be interviewed only on the condition that he would not be identified by name or hometown. Along with the interview, a disguised picture of the hacker was published. Unfortunately, the picture contained IPTC metadata about the city and state where it was taken. Combined with the details mentioned in the article, this information might have made it possible to track down the hacker.
Other kinds of metadata could have posed a comparable threat: the Exif datum "location information (GPS) where the picture was taken" makes it possible to locate exactly the place where the picture was taken. The Exif datum "distance the camera was focused at" at least reveals how far the photographer stood from the photographed object; if the location of that object is known, this narrows down the photographer's position considerably.
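How little effort it takes to turn GPS metadata into a map position can be shown in a few lines. Exif stores GPSLatitude and GPSLongitude as three rational numbers (degrees, minutes, seconds), with GPSLatitudeRef ("N"/"S") and GPSLongitudeRef ("E"/"W") giving the hemisphere. The sketch below assumes the rationals have already been read as (numerator, denominator) pairs; the function name and the coordinates are hypothetical.

```python
from fractions import Fraction


def gps_to_decimal(dms, ref):
    """Convert an Exif GPS coordinate to signed decimal degrees.

    dms: three (numerator, denominator) pairs for degrees, minutes, seconds,
    as stored in GPSLatitude/GPSLongitude. ref: 'N', 'S', 'E' or 'W'.
    """
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    return float(-value if ref in ("S", "W") else value)


# Hypothetical values as they might appear in a photo's GPS data.
lat = gps_to_decimal([(48, 1), (51, 1), (2406, 100)], "N")
lon = gps_to_decimal([(2, 1), (21, 1), (756, 100)], "E")
print(f"{lat:.6f}, {lon:.6f}")  # 48.856683, 2.352100
```

The resulting pair can be pasted directly into any online map service.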
A fictitious example: Bill does not want to go to Uncle Linus's birthday party; he would rather go to a Rolling Stones concert. He tells his uncle that his boss wants him to work overtime to finish an important project. At the concert, Bill's friend Steve takes a picture of Bill, which Bill later publishes on his homepage. Weeks later, Uncle Linus visits Bill's homepage. He examines the Exif datum "time and date the picture was taken" and discovers that Bill did not work overtime but went to a concert on the day of the birthday party.
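Uncle Linus's check needs no special skills: the Exif tag DateTimeOriginal (0x9003) holds a plain text string of the form "YYYY:MM:DD HH:MM:SS", without a time zone. A minimal sketch, with a hypothetical function name and hypothetical dates for the story above:

```python
from datetime import date, datetime


def exif_datetime(value: str) -> datetime:
    """Parse an Exif date/time string such as '2006:07:15 21:30:05'."""
    # Exif ASCII values end with a NUL byte; strip it along with stray spaces.
    return datetime.strptime(value.strip("\x00 "), "%Y:%m:%d %H:%M:%S")


taken = exif_datetime("2006:07:15 21:30:05\x00")  # as read from the picture
party = date(2006, 7, 15)                         # day of the birthday party
print(taken.date() == party)  # True
```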
Fingerprint of Digital Cameras
Many users may also not know that digital cameras leave an individual fingerprint in each picture. This fingerprint makes it possible to reliably link pictures to the camera with which they were taken, in much the same way that forensic examiners can link bullets to the gun that fired them.
Professor Jessica Fridrich and two members of her research team at Binghamton University exploit the fact that every digital camera produces tiny imperfections (noise) in a picture. Due to unavoidable irregularities in the manufacturing process of the camera and its sensor, each camera has a characteristic noise pattern, even compared with other cameras of the same make and model. Although this digital noise is largely invisible to the human eye, Fridrich's team has developed algorithms that analyze the noise and thereby determine the camera's individual fingerprint. According to Fridrich, the technique is accurate 99.99 percent of the time. One limitation is that determining the fingerprint requires multiple pictures taken by the same camera; a single picture is not sufficient.
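Why multiple pictures are needed can be illustrated with a toy simulation of sensor-noise fingerprinting (in the literature often called PRNU, photo-response non-uniformity). Subtracting a smoothed copy of a picture leaves a noise residual; averaging the residuals of several pictures from one camera cancels the random part and keeps the camera's fixed pattern, which then correlates with residuals of new pictures from the same camera only. This is not Fridrich's algorithm: the "pictures" are one-dimensional, the fixed pattern is simply added rather than modelled as multiplicative, a moving average stands in for the wavelet-based denoiser used in published work, and all sizes and noise levels are arbitrary assumptions.

```python
import math
import random


def denoise(signal, window=5):
    """Moving-average filter; a crude stand-in for a proper denoising filter."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def residual(image):
    """Noise residual: the picture minus its denoised version."""
    return [p - d for p, d in zip(image, denoise(image))]


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def estimate_fingerprint(images):
    """Average the residuals of several pictures taken by the same camera."""
    residuals = [residual(img) for img in images]
    return [sum(column) / len(residuals) for column in zip(*residuals)]


# --- Simulation (all parameters are arbitrary) ---
rng = random.Random(1)
N = 2000  # pixels per toy "picture"


def make_camera():
    """Fixed per-camera pattern, standing in for manufacturing irregularities."""
    return [rng.gauss(0, 2) for _ in range(N)]


def take_picture(pattern):
    """Smooth random scene + the camera's fixed pattern + fresh random noise."""
    phase = rng.uniform(0, 2 * math.pi)
    return [100 + 50 * math.sin(2 * math.pi * i / 500 + phase) + k + rng.gauss(0, 2)
            for i, k in enumerate(pattern)]


camera_a, camera_b = make_camera(), make_camera()
fingerprint_a = estimate_fingerprint([take_picture(camera_a) for _ in range(10)])
same = correlation(fingerprint_a, residual(take_picture(camera_a)))
other = correlation(fingerprint_a, residual(take_picture(camera_b)))
print(f"same camera: {same:.2f}, other camera: {other:.2f}")
```

With these settings the correlation for the same camera is clearly positive, while for the other camera it stays close to zero; with only one picture in `estimate_fingerprint`, the random noise would mask much more of the fixed pattern.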
With the help of the fingerprint, it is possible to tell if a picture was taken by a certain camera. It is even possible to detect image tampering. While unchanged regions of a picture keep their digital fingerprint, regions that have been tampered with lose their characteristic noise. Even if a picture has been compressed to a smaller file size (e.g. to send it by email), the fingerprint remains detectable.
Whereas Fridrich needs multiple pictures for her analysis, a technique developed by Nasir Memon of Polytechnic University in Brooklyn requires only a single picture. Memon's technique relies on the fact that different digital camera manufacturers use different interpolation algorithms. Because each pixel of a typical camera sensor records only one color component (red, green, or blue) through a color filter, the camera uses an interpolation algorithm to estimate the missing components and thus give each pixel of the photograph its full color. As these algorithms leave telltale traces in the pictures and vary from company to company, Memon can match a picture to a camera brand with an accuracy of 90 percent.
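The kind of trace interpolation leaves can be shown with the simplest possible case: bilinear interpolation of the green channel on a Bayer color filter with an RGGB layout. Every interpolated value is exactly the mean of its four neighbours, so a periodic pattern of neighbour correlations appears that the original sensor readings do not have. This sketch only illustrates the principle; it is not Memon's method, real cameras use more elaborate (and proprietary) algorithms, and after JPEG compression an analysis has to estimate such correlations statistically rather than test for exact equality. The function names are hypothetical.

```python
import random


def bilinear_green(mosaic):
    """Bilinear demosaicing of the green channel of an RGGB Bayer mosaic.

    In this layout, green samples sit where (x + y) is odd. At interior red and
    blue sites, green is estimated as the mean of the four green neighbours.
    Border pixels are not interpolated, to keep the sketch short.
    """
    h, w = len(mosaic), len(mosaic[0])
    green = [row[:] for row in mosaic]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if (x + y) % 2 == 0:  # red or blue site
                green[y][x] = (mosaic[y - 1][x] + mosaic[y + 1][x]
                               + mosaic[y][x - 1] + mosaic[y][x + 1]) / 4
    return green


def neighbour_mean_score(channel, parity):
    """Fraction of interior pixels with (x + y) % 2 == parity equal to their 4-neighbour mean."""
    h, w = len(channel), len(channel[0])
    hits = total = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if (x + y) % 2 == parity:
                mean = (channel[y - 1][x] + channel[y + 1][x]
                        + channel[y][x - 1] + channel[y][x + 1]) / 4
                total += 1
                if abs(channel[y][x] - mean) < 1e-9:
                    hits += 1
    return hits / total


rng = random.Random(0)
raw = [[rng.random() for _ in range(32)] for _ in range(32)]  # simulated sensor readings
green = bilinear_green(raw)
print(neighbour_mean_score(green, 0))  # interpolated sites: 1.0
print(neighbour_mean_score(green, 1))  # original green samples
```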
Software tools that are capable of removing digital fingerprints do not seem to exist.
Digital Watermarks
Digital watermarking should be distinguished from digital fingerprints. Among other things, digital watermarking is used to prevent, or at least expose, the alteration of pictures. Digital cameras equipped with watermarking technology add an extra stream of identifying data, which is usually invisible, to each picture. If the picture is changed, these data, and therefore the digital watermark, are corrupted.
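The "corrupted on change" property can be illustrated with a textbook fragile watermark: a keyed hash of the picture's pixel values is written into the least significant bit of every pixel, so that any later change, even of a single bit, makes verification fail. This is a minimal sketch of that general technique, not the scheme of any particular camera; the key, the function names, and the assumption of uncompressed 8-bit pixel values are all hypothetical (lossy JPEG compression would destroy such a mark).

```python
import hashlib
import hmac

KEY = b"hypothetical-camera-secret"  # stands in for a key built into the camera


def _mark_bits(pixels, key):
    """HMAC-SHA256 over the pixels with their lowest bit cleared, as a list of 256 bits."""
    digest = hmac.new(key, bytes(p & 0xFE for p in pixels), hashlib.sha256).digest()
    return [(byte >> i) & 1 for byte in digest for i in range(8)]


def embed(pixels, key=KEY):
    """Write the mark bits, repeated cyclically, into every pixel's lowest bit."""
    bits = _mark_bits(pixels, key)
    return [(p & 0xFE) | bits[i % len(bits)] for i, p in enumerate(pixels)]


def verify(pixels, key=KEY):
    """True only if every pixel's lowest bit matches the hash of the other bits."""
    bits = _mark_bits(pixels, key)
    return all((p & 1) == bits[i % len(bits)] for i, p in enumerate(pixels))
```

Changing any higher bit alters the hash, and changing any lowest bit breaks the stored pattern, so in both cases `verify` returns False for the altered picture.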
Cameras with watermarking technology are mainly purchased by professionals who need to prove that the pictures they have taken are unaltered (e.g. crime scene investigators). Just like digital fingerprints, digital watermarks could make it possible to determine whether a picture was taken by a certain camera.
Although software tools that can remove digital watermarks do not seem to exist, digital watermarks are not really a problem. Simple countermeasures are to use only digital cameras without watermarking technology or cameras that allow watermarking to be disabled.