It is very rare for me to read about photography on blogs that have to do with computer science and programming, but this week I came across a very interesting study that combines computer science and picture organization methods. The study was done by Cornell University faculty and students and was presented at the 2009 edition of the International World Wide Web conference. It was entitled “Mapping the world’s photos” (See resources at the end for the PDF link).
What is this about?
While it was not a simple read, it was definitely captivating. I know I’m weird because I like reading research papers. Here is the abstract of the paper to give you an idea of what it is about.
“We investigate how to organize a large collection of geotagged photos, working with a dataset of about 35 million images collected from Flickr. Our approach combines content analysis based on text tags and image data with structural analysis based on geospatial data. We use the spatial distribution of where people take photos to define a relational structure between the photos that are taken at popular places. We then study the interplay between this structure and the content, using classification methods for predicting such locations from visual, textual and temporal features of the photos. We find that visual and temporal features improve the ability to estimate the location of a photo, compared to using just textual features. We illustrate using these techniques to organize a large photo collection, while also revealing various interesting properties about popular cities and landmarks at a global scale.”
35 million pictures?! That’s a lot of pictures, far more than any of us would ever have to deal with on our own computers at home.
What does this mean?
While I can’t claim that I have studied the paper in depth or that I understand all its implications, I believe that I can summarize a few things. Here’s my take on some aspects of this paper.
We all know how to tag our images and how to add image geotags. We have also learned that our cameras add EXIF image metadata that contains the date and time when each picture was taken. So, what happens when you put all this information together? Well, initially you get a large amount of data and that’s about it! But what happens if you divide all these pictures and data by individual photographers? This is when things start to get interesting…actually very interesting. If you are able to map all the pictures one photographer took at a particular tourist attraction and combine that with the time when each picture was taken, you can essentially follow that photographer as he or she was taking pictures.
Now, throw into the mix a very large number of pictures…about 35 million pictures from Flickr. Then you can create maps of cities with their popular landmarks. Not only that, but these maps also show the routes tourists take when visiting a city. You also start getting a really good sense of which landmarks are popular and how each landmark relates to the other spots around it.
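To make the "follow the photographer" idea a bit more concrete, here is a minimal sketch in Python. It is my own illustration and not the code used in the study. The sample records, the photographer names and the function name build_routes are all made up; the only thing taken from real life is the EXIF timestamp format (YYYY:MM:DD HH:MM:SS).

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample data: (photographer, EXIF timestamp, latitude, longitude).
# These values are invented for illustration; they are not from the study.
photos = [
    ("alice", "2009:04:20 11:40:00", 48.8530, 2.3499),  # near Notre Dame
    ("alice", "2009:04:20 10:15:00", 48.8584, 2.2945),  # near the Eiffel Tower
    ("bob",   "2009:04:21 09:00:00", 51.5007, -0.1246), # near Big Ben
    ("alice", "2009:04:20 14:05:00", 48.8606, 2.3376),  # near the Louvre
]

def build_routes(records):
    """Group photos by photographer and order each group by capture time."""
    routes = defaultdict(list)
    for owner, taken, lat, lon in records:
        # EXIF stores capture time as "YYYY:MM:DD HH:MM:SS"
        when = datetime.strptime(taken, "%Y:%m:%d %H:%M:%S")
        routes[owner].append((when, lat, lon))
    for owner in routes:
        routes[owner].sort()  # tuples sort by timestamp first
    return dict(routes)

routes = build_routes(photos)
```

Each entry in routes is then a time-ordered list of positions for one photographer, which is the raw material for drawing that person's path on a map.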
EXIF and IPTC are a powerful combination.
Here is another quote from the paper that shows what results can be obtained from combining EXIF and IPTC image metadata. This really shows that when you combine EXIF (geotags and time information) and IPTC (keywords and maybe geolocation) you can obtain powerful results…if you know how to put it all together and interpret it correctly.
“As researchers discovered a decade ago with large-scale collections of Web pages, studying the connective structure of a corpus at a global level exposes a fascinating picture of what the world is paying attention to. In the case of global photo collections, it means that we can discover, through collective behavior, what people consider to be the most significant landmarks both in the world and within specific cities; which cities are most photographed; which cities have the highest and lowest proportions of attention-drawing landmarks; which views of these landmarks are the most characteristic; and how people move through cities and regions as they visit different locations within them. These resulting views of the data add to an emerging theme in which planetary-scale datasets provide insight into different kinds of human activity — in this case those based on images; on locales, landmarks, and focal points scattered throughout the world; and on the ways in which people are drawn to them.”
Top ten most photographed cities in the world
Ten most used city tags in Flickr pictures:
Top seven most photographed landmarks on Earth
I will only present one more of their findings here, namely the most photographed landmarks. You can see all the other findings in their paper (see resources at the end). Seven most used tags on images of tourist attractions:
- eiffel – Eiffel Tower in Paris, France
- trafalgarsquare – Trafalgar Square in London, England
- tatemodern – The Tate Modern museum in London, England
- bigben – Big Ben in London, England
- notredame – Notre Dame in Paris, France
- londoneye – The London Eye in London, England
- empirestatebuilding – The Empire State Building in New York
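If you are curious how a "most used tags" list like this could be produced from your own keywords, here is a small Python sketch. It is only a simple frequency count and not the method used in the paper. The function name top_tags and the sample tag lists are made up for illustration.

```python
from collections import Counter

def top_tags(photo_tags, n):
    """Return the n most frequent tags across a collection of photos.

    photo_tags is a list with one list of tags per photo. Tags are
    lowercased, and a tag is counted at most once per photo.
    """
    counts = Counter()
    for tags in photo_tags:
        counts.update({tag.lower() for tag in tags})
    return counts.most_common(n)

# Hypothetical tags for four photos (invented for illustration)
sample = [
    ["eiffel", "paris"],
    ["eiffel", "Paris", "night"],
    ["bigben", "london"],
    ["eiffel"],
]

most_used = top_tags(sample, 2)
```

With real data you would feed in the keywords stored in each picture's IPTC metadata instead of the sample list.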
It is obvious that when dealing with such a quantity of pictures, free programs and even consumer software are not adequate. However, with the proper tools, we can see how powerful EXIF and IPTC image metadata can become if used together. And guess what: maybe my own Flickr pictures have been used in this study! So, they have become part of some very cool statistics…or even better, my pictures have become really important 🙂