The Descriptive Camera works a lot like a regular camera: point it at a subject and press the shutter button to capture the scene. However, instead of producing an image, this prototype uses crowdsourcing to output a text description of the scene. Modern digital cameras capture gobs of "parsable" metadata about photos, such as the camera's settings, the location of the photo, and the date and time, but they don't output any information about the content of the photo. The Descriptive Camera outputs only the metadata about the content.
As we amass an incredible number of photos, it becomes increasingly difficult to manage our collections. Imagine if descriptive metadata about each photo could be appended to the image on the fly. Information about who is in each photo, what they're doing, and their environment could become incredibly useful for searching, filtering, and cross-referencing our photo collections. Of course, we don't yet have the technology that makes this a practical proposition, but the Descriptive Camera uses crowdsourcing to explore these possibilities.
Technology
The technology at the core of the Descriptive Camera is Amazon's Mechanical Turk API. It allows a developer to submit Human Intelligence Tasks (HITs) for workers on the internet to complete. The developer sets the guidelines for each task and designs the interface through which workers submit their results. The developer also sets the price they're willing to pay for the successful completion of each task. An approval and reputation system gives workers an incentive to deliver acceptable results. For faster and cheaper results, the camera can also be put into "accomplice mode," where it sends an instant message to another person instead. That IM contains a link to the picture and a form where they can enter a description of the image.
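To make the two routes a little more concrete, here is a minimal sketch of how a description request might be assembled for either a paid worker or an accomplice. It is not the Mechanical Turk request format and not the camera's actual code: the function names, the dictionary keys, the default price, and the timeout are all hypothetical, chosen only to mirror the ideas above (guidelines, a price, and a link to the picture).

```python
def build_description_task(image_url, reward_usd=0.50, timeout_minutes=10):
    """Bundle what a worker needs to describe one photo.

    All keys and default values here are illustrative assumptions,
    not fields of the real Mechanical Turk API.
    """
    if reward_usd <= 0:
        raise ValueError("a paid task needs a positive reward")
    return {
        "instructions": "Describe the scene in this photo in a few sentences.",
        "image_url": image_url,
        "reward_usd": round(reward_usd, 2),
        "max_assignments": 1,            # one description per photo
        "timeout_seconds": timeout_minutes * 60,
    }


def build_accomplice_message(image_url, form_url):
    """Compose the instant message sent in accomplice mode (hypothetical wording)."""
    return (
        "Please describe this photo: " + image_url + "\n"
        "Enter your description here: " + form_url
    )


def build_request(image_url, form_url, accomplice_mode=False):
    """Pick the accomplice route or the paid-worker route for one photo."""
    if accomplice_mode:
        return ("im", build_accomplice_message(image_url, form_url))
    return ("hit", build_description_task(image_url))
```

Keeping the two routes behind one function means the rest of the camera would not need to know who ends up writing the description.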
The camera itself is powered by the BeagleBone, an embedded Linux platform from Texas Instruments. Attached to the BeagleBone are a USB webcam, a thermal printer from Adafruit, a trio of status LEDs, and a shutter button. A series of Python scripts define the interface and bring together all the different parts, from capture and processing to error handling and the printed output. My mrBBIO module is used for GPIO control (the LEDs and the shutter button), and I used open-source command line utilities to communicate with Mechanical Turk. The device connects to the internet via Ethernet and gets power from an external 5 volt source, but I would love to make another version that's battery operated and uses wireless data. Ideally, the Descriptive Camera would look and feel like a typical digital camera.
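The flow from shutter press to printout can be sketched roughly as follows. This is an illustration under stated assumptions, not the camera's real scripts: the hardware is passed in as plain callables so the sketch runs without a BeagleBone, the LED names are made up, the 32-column line width is a guess at a typical receipt printer, and the fswebcam command is only built, never executed.

```python
import textwrap


def fswebcam_command(output_path, device="/dev/video0", resolution="640x480"):
    """Build (but do not run) an fswebcam call for a single still frame.

    The device path and resolution are assumed defaults.
    """
    return ["fswebcam", "-d", device, "-r", resolution, "--no-banner", output_path]


def format_for_printer(description, columns=32):
    """Collapse stray whitespace and wrap text to a narrow printer width."""
    return textwrap.wrap(" ".join(description.split()), width=columns)


def run_once(capture, describe, print_lines, set_led):
    """Handle one shutter press: capture, fetch a description, print it.

    capture() returns an image path, describe(path) returns text,
    print_lines(lines) sends wrapped lines to the printer, and
    set_led(name, on) switches a status LED. All four are hypothetical hooks.
    """
    set_led("busy", True)
    try:
        image_path = capture()
        description = describe(image_path)
        print_lines(format_for_printer(description))
        return True
    except Exception:
        # Any failure along the way is surfaced on the error LED.
        set_led("error", True)
        return False
    finally:
        set_led("busy", False)
```

Injecting the hardware hooks this way also makes it easy to exercise the error handling on a desktop machine before loading the scripts onto the board.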
Acknowledgements
Philip Heron's fswebcam was the easiest way I found to capture images from a USB webcam on the BeagleBone.
Without Dan Watts' writeup on how to get UART serial working on BeagleBone using Python, I would've found it very difficult to get the printer working.
Nuno Alves wrote an indispensable tutorial which helped me set my code up to run as a system service when the BeagleBone boots.
Dan O'Sullivan and my fellow students in Computational Cameras provided much-needed conceptual feedback for the project.