Computer Vision

linking:: AI-900

Analyzing Images

The Computer Vision cognitive service uses pre-trained models to analyze images.

Resource

  • Computer Vision
  • Cognitive Services

Access

  • Key
  • Endpoint

Capabilities

  • Describing an image (highest confidence first)
  • Tagging visual features
  • Detecting objects (bounding box)
  • Detecting brands (bounding box, confidence)
  • Detecting faces (bounding box, age)
  • Categorizing an image
  • Detecting domain-specific content (confidence)
  • Optical character recognition
  • Detecting image types
  • Detecting image color schemes
  • Generating thumbnails
  • Moderating content
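
A minimal sketch of how the key, the endpoint, and a choice of capabilities come together in one request. It only builds the request and does not send it. The `v3.2` path segment, the placeholder endpoint and key, and the helper name `build_analyze_request` are assumptions for illustration; check the API version your resource actually exposes.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_analyze_request(endpoint, key, image_url, features):
    """Build (but do not send) an Analyze Image request.

    Assumes the v3.2 REST path; `features` are visual feature names
    such as "Description", "Tags", "Objects", "Brands" or "Faces".
    """
    query = urlencode({"visualFeatures": ",".join(features)}, safe=",")
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,  # the resource key
            "Content-Type": "application/json",
        },
    )


# Placeholder values; substitute your own endpoint and key.
request = build_analyze_request(
    "https://<your-resource>.cognitiveservices.azure.com",
    "<your-key>",
    "https://example.com/image.jpg",
    ["Description", "Tags", "Objects"],
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the JSON response then contains one section per requested feature.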

Image Classification

You can use a machine learning classification technique, typically a convolutional neural network (CNN), to predict which category, or class, something belongs to. Classification models use a set of inputs, called features, to calculate a probability score for each possible class and predict a label that indicates the most likely class that an object belongs to.
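
The last step, turning per-class probability scores into a predicted label, can be sketched as follows. The scores are made-up values and `predict_label` is a hypothetical helper, not part of any Azure SDK.

```python
def predict_label(scores):
    """Return the (label, probability) pair with the highest probability."""
    label = max(scores, key=scores.get)
    return label, scores[label]


# Made-up probability scores for one image.
scores = {"apple": 0.12, "banana": 0.03, "orange": 0.85}
label, probability = predict_label(scores)
print(label, probability)  # orange 0.85
```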

Resource

  • Custom Vision (training, prediction, or both)
  • Cognitive Services

Evaluation

  • Precision: What percentage of the class predictions made by the model were correct? For example, if the model predicted that 10 images are oranges, of which eight were actually oranges, then the precision is 0.8 (80%).
  • Recall: Of all the images that actually belong to a class, what percentage did the model correctly identify? For example, if there are 10 images of apples, and the model found 7 of them, then the recall is 0.7 (70%).
  • Average Precision (AP): An overall metric that takes into account both precision and recall.
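
The two worked examples above can be checked with a short calculation. This is a sketch using the standard definitions in terms of true positives, false positives, and false negatives; the function names are illustrative.

```python
def precision(true_positives, false_positives):
    """Share of the model's predictions for a class that were correct."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Share of the actual members of a class that the model found."""
    return true_positives / (true_positives + false_negatives)


# 10 images predicted as oranges, 8 of them really are oranges.
print(precision(true_positives=8, false_positives=2))  # 0.8

# 10 actual apple images, the model found 7 of them.
print(recall(true_positives=7, false_negatives=3))  # 0.7
```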

Access

  • Project ID
  • Model name
  • Prediction endpoint
  • Prediction key
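
A minimal sketch of how these four values fit into a single prediction request for an image URL. It builds the request without sending it. The `v3.0` path, the placeholder values, and the helper name `build_prediction_request` are assumptions; the Custom Vision portal shows the exact prediction URL for a published model.

```python
import json
from urllib.request import Request


def build_prediction_request(endpoint, key, project_id, model_name, image_url,
                             task="classify"):
    """Build (but do not send) a Custom Vision prediction request.

    `task` is "classify" for image classification or "detect" for
    object detection. Assumes the v3.0 prediction REST path.
    """
    url = (
        f"{endpoint.rstrip('/')}/customvision/v3.0/Prediction/"
        f"{project_id}/{task}/iterations/{model_name}/url"
    )
    body = json.dumps({"Url": image_url}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Prediction-Key": key, "Content-Type": "application/json"},
    )


# Placeholder values; substitute the ones from your own project.
request = build_prediction_request(
    "https://<your-resource>.cognitiveservices.azure.com",
    "<prediction-key>",
    "<project-id>",
    "<model-name>",
    "https://example.com/fruit.jpg",
)
print(request.full_url)
```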

Object Detection

A form of machine-learning-based computer vision in which a model is trained to recognize individual types of objects in an image and to identify their location in the image. Each detected object is returned with:

  • Class
  • Probability
  • Bounding box
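
A minimal sketch of working with these three values. The field names (`tagName`, `probability`, and a `boundingBox` with `left`, `top`, `width`, `height` as proportions of the image size) are modeled on the Custom Vision prediction response and should be treated as an assumption; the sample predictions and the 0.5 threshold are made up.

```python
def confident_detections(predictions, threshold=0.5):
    """Keep predictions whose probability meets the threshold."""
    return [p for p in predictions if p["probability"] >= threshold]


def to_pixel_box(box, image_width, image_height):
    """Convert a normalized box (0..1) to pixel (left, top, width, height)."""
    return (
        round(box["left"] * image_width),
        round(box["top"] * image_height),
        round(box["width"] * image_width),
        round(box["height"] * image_height),
    )


# Made-up predictions for a 640 x 480 image.
predictions = [
    {"tagName": "orange", "probability": 0.92,
     "boundingBox": {"left": 0.25, "top": 0.5, "width": 0.5, "height": 0.25}},
    {"tagName": "apple", "probability": 0.31,
     "boundingBox": {"left": 0.0, "top": 0.0, "width": 0.25, "height": 0.25}},
]

for p in confident_detections(predictions):
    print(p["tagName"], p["probability"], to_pixel_box(p["boundingBox"], 640, 480))
# orange 0.92 (160, 240, 320, 120)
```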

Resource

  • Custom Vision (training, prediction, or both; choosing both creates two separate resources)
  • Cognitive Services

Evaluation

  • Precision: What percentage of the class predictions made by the model were correct? For example, if the model predicted that 10 images are oranges, of which eight were actually oranges, then the precision is 0.8 (80%).
  • Recall: Of all the objects that actually belong to a class, what percentage did the model correctly identify? For example, if there are 10 images of apples, and the model found 7 of them, then the recall is 0.7 (70%).
  • Mean Average Precision (mAP): An overall metric that takes into account both precision and recall across all classes.
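
The "across all classes" part of mAP is simply an average of the per-class AP values. A short sketch, with made-up AP values and a hypothetical helper name:

```python
def mean_average_precision(ap_by_class):
    """Average the per-class Average Precision (AP) values."""
    return sum(ap_by_class.values()) / len(ap_by_class)


# Made-up AP values for three classes.
ap_by_class = {"apple": 0.5, "banana": 0.75, "orange": 1.0}
print(mean_average_precision(ap_by_class))  # 0.75
```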

Access

  • Project ID
  • Model name
  • Prediction endpoint
  • Prediction key

Face Analysis

Face detection involves identifying regions of an image that contain a human face, typically by returning bounding box coordinates that form a rectangle around the face.
Facial analysis uses facial landmarks to estimate details such as age or emotions.
Facial recognition identifies known individuals based on their facial features.

Services

  • Computer Vision, which offers face detection and some basic face analysis, such as determining age.
  • Video Indexer, which you can use to detect and identify faces in a video.
  • Face, which offers pre-built algorithms that can detect, recognize, and analyze faces.
  • Cognitive Services

Features of Face include face detection, face verification, finding similar faces, grouping faces based on similarities, and identifying people.

Access

  • Key
  • Endpoint

Improving accuracy

  • Image format - supported images are JPEG, PNG, GIF, and BMP
  • File size - 6 MB or smaller
  • Face size range - from 36 x 36 up to 4096 x 4096. Smaller or larger faces will not be detected
  • Other issues - face detection can be impaired by extreme face angles and by occlusion (objects blocking the face, such as sunglasses or a hand). Best results are obtained when the faces are full-frontal or as near as possible to full-frontal
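
The format, file size, and face size limits above can be checked before an image is submitted. This is a sketch only: the helper names are hypothetical, and treating 1 MB as 1024 x 1024 bytes is an assumption.

```python
SUPPORTED_FORMATS = {"JPEG", "PNG", "GIF", "BMP"}
MAX_FILE_BYTES = 6 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MIN_FACE_PIXELS = 36
MAX_FACE_PIXELS = 4096


def image_problems(image_format, file_bytes):
    """Return a list of reasons the image falls outside the stated limits."""
    problems = []
    if image_format.upper() not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {image_format}")
    if file_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 6 MB")
    return problems


def face_size_detectable(width, height):
    """True when a face's pixel size is within 36 x 36 to 4096 x 4096."""
    return (MIN_FACE_PIXELS <= width <= MAX_FACE_PIXELS
            and MIN_FACE_PIXELS <= height <= MAX_FACE_PIXELS)


print(image_problems("png", 2_000_000))   # []
print(image_problems("tiff", 7_000_000))  # two problems reported
print(face_size_detectable(30, 30))       # False
```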

Optical Character Recognition

OCR detects and reads text in images. A more advanced form is machine reading comprehension (MRC), in which an AI system not only reads the text characters but also uses a semantic model to interpret what the text is about.

Services

  • Computer Vision
  • Cognitive Services

Access

  • Key
  • Endpoint

OCR API

Suited to small amounts of text and runs synchronously. Returns a hierarchy of information that consists of bounding boxes for:

  • Regions in the image that contain text
  • Lines of text in each region
  • Words in each line of text
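
A sketch of walking that regions, lines, words hierarchy. The field names (`regions`, `lines`, `words`, `boundingBox`, `text`) are modeled on the OCR API's JSON response and are an assumption here; the sample result is made up.

```python
def extract_lines(ocr_result):
    """Return (bounding box, text) for every line in an OCR result."""
    lines = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append((line.get("boundingBox"), " ".join(words)))
    return lines


# Made-up result with one region, two lines.
sample = {
    "regions": [
        {
            "boundingBox": "20,30,300,80",
            "lines": [
                {"boundingBox": "20,30,300,35",
                 "words": [{"boundingBox": "20,30,120,35", "text": "Fresh"},
                           {"boundingBox": "150,30,170,35", "text": "Oranges"}]},
                {"boundingBox": "20,75,200,35",
                 "words": [{"boundingBox": "20,75,200,35", "text": "Sale"}]},
            ],
        }
    ]
}

for box, text in extract_lines(sample):
    print(box, text)
```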

Read API

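Suited to larger amounts of text and runs asynchronously. The initial call returns an operation ID, which is then used to retrieve results organized into: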

  • Pages - One for each page of text, including information about the page size and orientation.
  • Lines - The lines of text on a page.
  • Words - The words in a line of text.
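
The asynchronous pattern can be sketched as a polling loop. To keep the example self-contained, the HTTP call is passed in as a function (`get_status`, a hypothetical callable that fetches the operation's JSON). The status values and the `analyzeResult` / `readResults` field names are modeled on the v3.x Read API response and should be treated as assumptions.

```python
import time


def wait_for_read_result(get_status, operation_url, interval=1.0, max_attempts=30):
    """Poll a Read operation until it succeeds, fails, or times out."""
    for _ in range(max_attempts):
        result = get_status(operation_url)
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError("Read operation failed")
        time.sleep(interval)  # still "notStarted" or "running"
    raise TimeoutError("Read operation did not finish in time")


def lines_by_page(result):
    """Map each page number to the list of line texts on that page."""
    pages = result["analyzeResult"]["readResults"]
    return {page["page"]: [line["text"] for line in page["lines"]]
            for page in pages}


# Fake status function standing in for the real HTTP GET.
responses = iter([
    {"status": "running"},
    {"status": "succeeded",
     "analyzeResult": {"readResults": [
         {"page": 1, "lines": [{"text": "Hello world"}, {"text": "Second line"}]},
     ]}},
])
final = wait_for_read_result(lambda url: next(responses), "<operation-url>", interval=0)
print(lines_by_page(final))
```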

Form Recognizer

Uses prebuilt receipt models or custom models to match, process, and identify data.

Services

  • Form Recognizer
  • Cognitive Services

Custom Training

  • Images must be JPEG, PNG, BMP, PDF, or TIFF formats
  • File size must be less than 50 MB
  • Image size between 50 x 50 pixels and 10000 x 10000 pixels
  • For PDF documents, no larger than 17 inches x 11 inches
