AI can automatically generate stories using pictures

Jun 6, 2016 08:21 GMT

Microsoft’s Research division has developed a new artificial intelligence system that can automatically generate stories using your photos, trying not only to explain what exactly is happening in the pictures but also to detail the context and people’s feelings.

The process of developing storytelling capabilities for photos is based on AI systems that can identify objects after learning by example, as LiveScience explains. The researchers first configured the systems to analyze a series of similar images and then look online to identify more objects that fall in the same category.

Furthermore, the Microsoft Research team turned to Amazon Mechanical Turk, a crowdsourcing service, where people wrote descriptions of the scenes in a batch of photos. The AI systems learned from these descriptions and then generated new ones by matching different pictures with different descriptions.

A total of 8,100 photos were involved in the testing stage to determine how well the AI technology can create a story based on the information it previously learned.

Distinguishing between words

The new software should be able to advance from the typical photo recognition system that tells you something like this:

“This is a picture of a family; this is a picture of a cake; this is a picture of a dog; this is a picture of a beach”

to a more story-like description like this:

“The family got together for a cookout; they had a lot of delicious food; the dog was happy to be there; they had a great time on the beach; they even had a swim in the water.”

For the moment, however, the technology is still in its early stages, and it will take some time to improve, as researchers have already found issues they need to address before it can go mainstream.

The system still has a hard time distinguishing words, and in the early tests, it described everything as being “awesome.”

“All the people had a great time; everybody had an awesome time; it was a great day. Now maybe that's true, but we also want the system to focus on what's salient,” one of the researchers working on the technology explained.

Certainly, there’s plenty of room for improvement, and Microsoft hopes the same technology can be used not only on photos but also on videos and other multimedia content. It could help visually impaired people and have a number of other applications, including in social media.