The Vision APIs bring advanced image and video understanding to your bots. They are powered by state-of-the-art algorithms that let you process images or videos and get back information you can turn into actions. For example, you can use them to recognize objects, people’s faces, age, gender, or even feelings. The Vision APIs support a variety of image understanding features, such as identifying mature or explicit content, estimating dominant and accent colors, categorizing the content of images, and performing optical character recognition (OCR), as well as describing an image in complete English sentences. They also support several image and video processing capabilities, such as intelligently generating image or video thumbnails or stabilizing video output. You can play with the popular CaptionBot.ai to see some of the Vision APIs in action, or read the examples below for step-by-step instructions to get started.
There are four APIs available in Cognitive Services that can process images or videos:
- The Computer Vision API extracts rich information about objects and people in images, determines if the image contains mature or explicit content, and also processes text (OCR) in images.
- The Emotion API analyzes human faces and recognizes the emotions they express across eight categories of human emotion.
- The Face API detects human faces, compares them to similar faces, and can even organize people into groups according to visual similarity.
- The Video API analyzes and processes video to stabilize video output, detect motion, track faces, as well as intelligently generate a motion thumbnail summary of the video.
Use Cases for Bots
The Vision APIs are useful for any bot that receives images as input from users and wants to distill actionable information from them. Here are a few examples:
- You can use the Computer Vision API to understand objects or even celebrities in an image. For example, CaptionBot.ai uses the Computer Vision API to identify objects and people (including celebrities) in order to generate a human-readable caption for the image.
- You can use the Face API to detect faces, along with information about a person’s age, gender, and facial landmarks, and even match faces to similar ones, so your bot can respond appropriately to a user’s unique facial attributes.
- You can use the Emotion API to identify people’s emotions, so if a user uploads a sad selfie, the bot can reply with an appropriate message.
Before you get started, you need to obtain your own subscription key from the Microsoft Cognitive Services site. Our Getting Started guides (available for C# and Python) describe how to obtain the key and start making calls to the APIs. You can find detailed documentation about each API, including developer guides and API references by navigating to the Cognitive Services documentation site and selecting the API you are interested in from the navigation bar on the left side of the screen.
Example: Vision Bot
For our first example, we will build a simplified version of CaptionBot.ai. The Vision Bot can receive an image, either as an attachment or a URL, and then return a computer-generated caption for the image via the Computer Vision API. We will use the Bot Application .NET template as our starting point.
Chat with Vision bot
After you create your project with the Bot Application .NET template, install the Microsoft.ProjectOxford.Vision package from NuGet. Next, open the MessagesController.cs file and add the following namespaces.
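As a sketch, the using directives would look roughly like this (the exact set depends on the template version; the ProjectOxford namespaces come from the NuGet package, and System.Linq and System.Net.Http are needed by the later steps):

```csharp
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Bot.Connector;
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;
```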
In the same file, replace the code in the Post task with the snippet below. The code initializes the Computer Vision SDK classes that take care of most of the hard work.
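A minimal sketch of what the Post method looks like at this stage, assuming the v1-era template signature; the subscription-key placeholder and the greeting text are assumptions, and HandleSystemMessage is the helper generated by the template:

```csharp
public async Task<Message> Post([FromBody]Message message)
{
    if (message.Type == "Message")
    {
        // VisionServiceClient wraps the Computer Vision REST API.
        // Replace the placeholder with your own subscription key.
        var visionClient = new VisionServiceClient("<YOUR_SUBSCRIPTION_KEY>");

        // The image-reading and captioning logic from the next steps goes here.
        return message.CreateReplyMessage("Send me an image and I'll caption it for you.");
    }
    return HandleSystemMessage(message);
}
```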
Continue by adding the code below, which reads the image sent by the user as an attachment or a URL and sends it to the Computer Vision API for analysis.
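A hedged sketch of that step, assuming the `visionClient` and `message` variables described above. Requesting only the Description feature is an assumption to keep the example focused; the API also supports features such as Tags, Faces, and Adult:

```csharp
AnalysisResult analysisResult = null;
var features = new[] { VisualFeature.Description };

if (message.Attachments != null && message.Attachments.Any())
{
    // The user uploaded the image: download it and analyze the raw stream.
    using (var httpClient = new HttpClient())
    using (var imageStream = await httpClient.GetStreamAsync(message.Attachments.First().ContentUrl))
    {
        analysisResult = await visionClient.AnalyzeImageAsync(imageStream, features);
    }
}
else
{
    // Otherwise, treat the message text as a publicly reachable image URL.
    analysisResult = await visionClient.AnalyzeImageAsync(message.Text, features);
}
```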
Finally, add the following code to read the analysis results from the Computer Vision API and respond to the user.
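A sketch of that final step; the reply wording is an assumption, and the null checks guard against images for which the service produced no caption:

```csharp
string reply = "Sorry, I couldn't come up with a caption for that image.";
if (analysisResult?.Description?.Captions != null
    && analysisResult.Description.Captions.Length > 0)
{
    // Each caption comes with a confidence score between 0 and 1.
    var caption = analysisResult.Description.Captions[0];
    reply = $"I think it's {caption.Text} ({caption.Confidence:P0} confident).";
}
return message.CreateReplyMessage(reply);
```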
Example: Emotion Bot
For our second example, we will build an Emotion Bot that receives an image URL, detects whether there is at least one face in the image, and responds with the dominant emotion of that face. To keep the example simple, the bot will return the emotion for only one face and ignore any other faces in the image. The example requires the Microsoft.ProjectOxford.Emotion package, which can be obtained via NuGet.
Create a new project with the Bot Application .NET template and install the Microsoft.ProjectOxford.Emotion package from NuGet. Next, open the MessagesController.cs file and add the following namespaces.
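A sketch of the required using directives (the exact set depends on the template version; the ProjectOxford namespaces come from the NuGet package):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Bot.Connector;
using Microsoft.ProjectOxford.Emotion;
using Microsoft.ProjectOxford.Emotion.Contract;
```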
Then, replace the code in the Post task with the snippet below. The code reads the image URL from the user, sends it to the Emotion API, and replies to the user with the dominant emotion recognized for a face in the image, along with its confidence score.
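A minimal sketch of that method, assuming the v1-era template signature; the key placeholder, the reply wording, and the HandleSystemMessage fallback from the template are assumptions. Scores.ToRankedList() orders the eight emotion scores from highest to lowest, so the first entry is the dominant emotion:

```csharp
public async Task<Message> Post([FromBody]Message message)
{
    if (message.Type == "Message")
    {
        string reply = "I couldn't find a face in that image. Please send me an image URL.";

        // EmotionServiceClient wraps the Emotion REST API.
        // Replace the placeholder with your own subscription key.
        var emotionClient = new EmotionServiceClient("<YOUR_SUBSCRIPTION_KEY>");
        try
        {
            // RecognizeAsync returns one result per face detected in the image.
            var emotions = await emotionClient.RecognizeAsync(message.Text);
            if (emotions != null && emotions.Length > 0)
            {
                // Keep it simple: report only the top-scoring emotion of the first face.
                var topEmotion = emotions[0].Scores.ToRankedList().First();
                reply = $"That face looks mostly {topEmotion.Key.ToLower()} " +
                        $"({topEmotion.Value:P0} confident).";
            }
        }
        catch (Exception)
        {
            reply = "I couldn't process that. Please send me a valid image URL.";
        }
        return message.CreateReplyMessage(reply);
    }
    return HandleSystemMessage(message);
}
```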