With the help of Artificial Intelligence, a team of researchers at Microsoft (Microsoft AI) has developed a machine learning system that describes and captions images with high precision. This new system should be particularly useful for people with visual impairments.
Precision at the service of people with visual impairments
The accuracy of this new machine learning system (Microsoft AI) reportedly exceeds that of humans. This conclusion follows from an evaluation of the system on “nocaps” images, a term for images containing objects the machine has never seen before.
This innovative technology could be of great help to people with visual impairments. The system will complement screen-reading software, which until now has lacked a way to describe images. John Roach, CTO at Microsoft Digital Advisory Services, says:
“This is an important step in Microsoft’s drive to make its products and services inclusive and accessible to all users.”
Describe the content or action of an image
Describing and captioning images encountered while browsing the Internet is a major challenge: it requires a solution that can describe the content or action of an image. Lijuan Wang, research director at Microsoft’s lab, explains that the user needs to understand what is going on in the image. The system must therefore grasp the relationships between objects and actions in order to summarize and describe the image in natural language.
To develop this technology, the researchers used a technique similar to the way children are taught to read: associating an image with one or more words. With this approach, the researchers succeeded in exceeding human performance. The system can even describe so-called “nocaps” images, that is, images that are not part of the system’s training database. “Our challenge was really to know how to describe these new objects which were not present in our incoming data,” says Lijuan Wang.
Create image datasets with keywords
Lijuan Wang clarified that it is more efficient to build image datasets tagged with keywords rather than with full captions, which is why the team assembled a dataset of images associated with keywords. The team calls this a “visual vocabulary”; the AI exploits it to describe images containing objects it has never seen.
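As a loose illustration of this idea (not Microsoft’s actual model, which is a large pretrained vision-language network), one can picture the pipeline as: detect keyword tags in an image, look each tag up in a learned visual vocabulary, then compose the results into a sentence. A toy sketch in Python, with all names and data hypothetical:

```python
# Toy illustration of the "visual vocabulary" idea: keywords detected in an
# image are mapped to learned descriptions and composed into a caption.
# All entries here are hypothetical; the real system learns this mapping
# from millions of image–keyword pairs.
visual_vocabulary = {
    "accordion": "a musical instrument",
    "umbrella": "an object that shields from rain",
}

def caption(detected_tags):
    """Compose a naive caption from keyword tags found in an image."""
    known = [visual_vocabulary[t] for t in detected_tags if t in visual_vocabulary]
    if not known:
        return "an image with no recognized objects"
    return "an image showing " + " and ".join(known)

print(caption(["accordion", "umbrella"]))
# -> an image showing a musical instrument and an object that shields from rain
```

The point of keyword-level supervision is that such pairs are far cheaper to collect at scale than fully written captions, which is what lets the vocabulary cover objects absent from any captioned training set.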
This system is twice as precise as the model presented in 2015. The new system is integrated into the Azure Cognitive Services suite, which lets developers easily add this “cognitive functionality” to their applications. It is also integrated into the Seeing AI app and will be rolled out to Word, Outlook, and PowerPoint later this year.
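For developers, image description is exposed through the Azure Computer Vision “Describe Image” REST operation. A minimal sketch of how such a request could be assembled in Python (the endpoint and key below are placeholders, and the API version should be checked against the current Azure documentation):

```python
# Sketch of an Azure Computer Vision "Describe Image" request.
# ENDPOINT and KEY are placeholders; this only builds the request,
# it does not send it.
ENDPOINT = "https://my-resource.cognitiveservices.azure.com"  # placeholder
KEY = "<subscription-key>"  # placeholder

def build_describe_request(image_url, max_candidates=1):
    """Return (url, headers, params, body) for the Describe operation."""
    url = f"{ENDPOINT}/vision/v3.1/describe"
    headers = {
        "Ocp-Apim-Subscription-Key": KEY,
        "Content-Type": "application/json",
    }
    params = {"maxCandidates": max_candidates}
    body = {"url": image_url}
    return url, headers, params, body

# Sending it would look something like:
# import requests
# url, headers, params, body = build_describe_request("https://example.com/cat.jpg")
# resp = requests.post(url, headers=headers, params=params, json=body)
# caption = resp.json()["description"]["captions"][0]["text"]
```

The response contains ranked caption candidates with confidence scores, which an application such as a screen reader can surface to the user.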
Finally, here is Microsoft’s video, which explains in more detail how this system works: