Computer Vision

linking:: AI-900

Analyzing Images

The Computer Vision cognitive service uses pre-trained models to analyze images.

Resource

Computer Vision
Cognitive Services

Access

Key
Endpoint
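The key and endpoint are used together on every call. The Python sketch below builds a REST request to analyze an image but does not send it. Several details are assumptions based on the Computer Vision v3.2 REST API: the `vision/v3.2/analyze` path, the `visualFeatures` query parameter, and the `Ocp-Apim-Subscription-Key` header. Check which API version your resource exposes. `build_analyze_request` is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request

def build_analyze_request(endpoint: str, key: str, image_url: str,
                          features=("Description", "Tags", "Objects")) -> urllib.request.Request:
    """Build (but do not send) an Analyze Image request for an image at a URL."""
    query = urllib.parse.urlencode({"visualFeatures": ",".join(features)})
    # Assumed: v3.2 REST path; confirm against your resource's API version
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            # The resource key authenticates the call
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Example with placeholders; nothing is sent until urlopen is called
req = build_analyze_request("https://<your-resource>.cognitiveservices.azure.com",
                            "<your-key>", "https://example.com/photo.jpg")
# response = json.load(urllib.request.urlopen(req))
```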

Capabilities

Describing an image (candidate descriptions ranked by confidence, highest first)
Tagging visual features
Detecting objects (bounding box)
Detecting brands (bounding box, confidence)
Detecting faces (bounding box, age)
Categorizing an image
Detecting domain-specific content (confidence)
Optical character recognition
Detecting image types
Detecting image color schemes
Generating thumbnails
Moderating content

Image Classification

You can use a machine learning classification technique (a convolutional neural network, or CNN) to predict which category, or class, something belongs to. Classification models use a set of inputs, which we call features, to calculate a probability score for each possible class. They then predict a label that indicates the most likely class that an object belongs to.
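Here is a minimal sketch of that last step. It assumes the model's final layer produces one raw score per class and that a softmax converts those scores into probabilities. This is a common design for CNN classifiers, but the note does not say how Custom Vision works internally. `predict_class` and the sample scores are hypothetical.

```python
import math

def predict_class(scores: dict[str, float]) -> tuple[str, dict[str, float]]:
    """Turn raw per-class scores into probabilities and pick the top label."""
    # Subtract the max score for numerical stability before exponentiating (softmax)
    m = max(scores.values())
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    # The predicted label is the class with the highest probability
    best = max(probs, key=probs.get)
    return best, probs

best, probs = predict_class({"apple": 2.0, "banana": 0.5, "orange": 1.0})
print(best)  # apple
```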

Resource

Custom Vision (training, prediction, or both)
Cognitive Services

Evaluation

Precision: What percentage of the class predictions made by the model were correct? For example, if the model predicted that 10 images are oranges, of which eight were actually oranges, then the precision is 0.8 (80%).
Recall: What percentage of the actual members of a class did the model correctly identify? For example, if there are 10 images of apples, and the model found 7 of them, then the recall is 0.7 (70%).
Average Precision (AP): An overall metric that takes into account both precision and recall.
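The worked examples above reduce to two ratios. This sketch uses counts of true positives (TP), false positives (FP), and false negatives (FN). The function names are illustrative.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of the model's predictions for a class that were correct: TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)

def recall(true_positives: int, false_negatives: int) -> float:
    """Share of the actual members of a class that the model found: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)

# Oranges: 10 predicted, 8 correct -> 8 TP, 2 FP
print(precision(8, 2))  # 0.8
# Apples: 10 actual, 7 found -> 7 TP, 3 FN
print(recall(7, 3))     # 0.7
```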

Access

Project ID
Model name
Prediction endpoint
Prediction key
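This sketch shows how the four values fit together in a prediction call. Two details are assumptions based on the Custom Vision v3.0 prediction REST API: the `customvision/v3.0/Prediction/...` URL pattern and the `Prediction-Key` header. "Model name" is taken to mean the published iteration name. `build_prediction_request` is hypothetical. The same pattern would serve object detection with `task="detect"`.

```python
import urllib.request

def build_prediction_request(prediction_endpoint: str, prediction_key: str,
                             project_id: str, model_name: str,
                             image_bytes: bytes, task: str = "classify") -> urllib.request.Request:
    """Build (but do not send) a Custom Vision prediction request for raw image bytes."""
    # task is "classify" for image classification or "detect" for object detection
    if task not in ("classify", "detect"):
        raise ValueError("task must be 'classify' or 'detect'")
    # Assumed URL pattern from the v3.0 prediction API; model_name = published iteration name
    url = (f"{prediction_endpoint.rstrip('/')}/customvision/v3.0/Prediction/"
           f"{project_id}/{task}/iterations/{model_name}/image")
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={
            "Prediction-Key": prediction_key,  # the prediction key, not the training key
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )

# Example (placeholders):
# with open("fruit.jpg", "rb") as f:
#     req = build_prediction_request("https://<prediction-resource>.cognitiveservices.azure.com",
#                                    "<prediction-key>", "<project-id>", "<model-name>", f.read())
```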

Object Detection

A form of machine learning-based computer vision in which a model is trained to recognize individual types of objects in an image and to identify their location in the image. Each prediction includes:

Class
Probability
Bounding box
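One way to model a single prediction with those three parts is shown below, along with a filter that keeps only confident detections. Storing the box as `left`/`top`/`width`/`height` fractions of the image size is an assumption modeled on Custom Vision's normalized boxes. `Detection`, `to_pixels`, and `confident` are hypothetical names, and the 0.5 threshold is arbitrary.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # class
    probability: float  # confidence score, 0.0 to 1.0
    left: float         # bounding box, assumed as fractions of image width/height
    top: float
    width: float
    height: float

    def to_pixels(self, img_w: int, img_h: int) -> tuple[int, int, int, int]:
        """Convert the normalized box to pixel (left, top, width, height)."""
        return (round(self.left * img_w), round(self.top * img_h),
                round(self.width * img_w), round(self.height * img_h))

def confident(detections: list[Detection], threshold: float = 0.5) -> list[Detection]:
    """Keep detections at or above the threshold, most probable first."""
    kept = [d for d in detections if d.probability >= threshold]
    return sorted(kept, key=lambda d: d.probability, reverse=True)
```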

Resource

Custom Vision (training, prediction, or both; choosing both creates two separate resources)
Cognitive Services

Evaluation

Precision: What percentage of the class predictions made by the model were correct? For example, if the model predicted that 10 images are oranges, of which eight were actually oranges, then the precision is 0.8 (80%).
Recall: What percentage of the actual members of a class did the model correctly identify? For example, if there are 10 images of apples, and the model found 7 of them, then the recall is 0.7 (70%).
Mean Average Precision (mAP): An overall metric that takes into account both precision and recall across all classes.
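The sketch below shows one common way to compute these metrics. For each class, rank the predictions by probability and average the precision at each rank where a prediction is correct (non-interpolated AP). Then take the mean across classes to get mAP. It assumes each detection has already been judged correct or incorrect, typically by how much its bounding box overlaps the true one. Custom Vision's exact calculation may differ.

```python
def average_precision(ranked_hits: list[bool], num_positives: int) -> float:
    """Non-interpolated AP for one class.

    ranked_hits: that class's predictions sorted by descending probability,
    True where the prediction was correct. num_positives: actual objects of the class.
    Objects the model never found lower AP because we divide by num_positives.
    """
    if num_positives == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / num_positives

def mean_average_precision(ap_per_class: dict[str, float]) -> float:
    """mAP is the mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)

ap_apple = average_precision([True, False, True], num_positives=2)  # (1/1 + 2/3) / 2
ap_orange = average_precision([False, True], num_positives=2)      # (1/2) / 2
print(mean_average_precision({"apple": ap_apple, "orange": ap_orange}))  # about 0.54
```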

Access

Project ID
Model name
Prediction endpoint
Prediction key

Face Analysis

Face detection involves identifying regions of an image that contain a human face, typically by returning bounding box coordinates that form a rectangle around the face.
Facial analysis uses facial landmarks to infer details such as age or emotions.
Facial recognition identifies known individuals based on their facial features.
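A detected face's bounding box is usually returned as a position plus a size. This sketch converts such a box into its top-left and bottom-right corners. The `faceRectangle` shape, with `top`, `left`, `width`, and `height` in pixels, is an assumption modeled on the Face detection response. `rectangle_corners` is a hypothetical name.

```python
def rectangle_corners(face_rectangle: dict[str, int]) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return the (top-left, bottom-right) pixel corners of a face bounding box."""
    left, top = face_rectangle["left"], face_rectangle["top"]
    right = left + face_rectangle["width"]
    bottom = top + face_rectangle["height"]
    return (left, top), (right, bottom)

# Assumed shape of one detected face in a Face detection response
face = {"faceRectangle": {"top": 50, "left": 120, "width": 80, "height": 96}}
print(rectangle_corners(face["faceRectangle"]))  # ((120, 50), (200, 146))
```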

Services

Computer Vision, which offers face detection and some basic face analysis, such as determining age.
Video Indexer, which you can use to detect and identify faces in a video.
Face, which offers pre-built algorithms that can detect, recognize, and analyze faces.
Cognitive Services

Features of Face include face detection, face verification, finding similar faces, grouping faces based on similarities, and identifying people.

Access

Key
Endpoint

Improving accuracy

Image format - supported images are JPEG, PNG, GIF, and BMP
File size - 6 MB or smaller
Face size range - from 36 x 36 up to 4096 x 4096 pixels. Smaller or larger faces will not be detected.
Other issues - face detection can be impaired by extreme face angles and by occlusion (objects blocking the face, such as sunglasses or a hand). Best results are obtained when faces are full-frontal or as close as possible to full-frontal.
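These limits can be checked before an image is sent, or afterwards to explain why a face was missed. This is a minimal sketch. `face_image_problems` is hypothetical, and the face dimensions must come from elsewhere, for example a labeled test image. Treating 6 MB as 6 * 1024 * 1024 bytes is an assumption.

```python
import os

SUPPORTED_FORMATS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}
MAX_FILE_BYTES = 6 * 1024 * 1024  # 6 MB (assumed binary megabytes)
MIN_FACE, MAX_FACE = 36, 4096     # detectable face size, in pixels

def face_image_problems(filename: str, file_bytes: int, face_w: int, face_h: int) -> list[str]:
    """List the reasons, if any, that a face in this image may not be detected."""
    problems = []
    if os.path.splitext(filename)[1].lower() not in SUPPORTED_FORMATS:
        problems.append("unsupported format")
    if file_bytes > MAX_FILE_BYTES:
        problems.append("file larger than 6 MB")
    if not (MIN_FACE <= face_w <= MAX_FACE and MIN_FACE <= face_h <= MAX_FACE):
        problems.append("face outside 36x36 to 4096x4096 pixels")
    return problems
```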

Optical Character Recognition

Optical character recognition (OCR) lets an AI system read text in images. A related field is machine reading comprehension (MRC), in which an AI system not only reads the text characters but also uses a semantic model to interpret what the text is about.

Services

Computer Vision
Cognitive Services

Access

Key
Endpoint

OCR API

Suited to small amounts of text; runs synchronously and returns a hierarchy of bounding boxes for:

Regions in the image that contain text
Lines of text in each region
Words in each line of text
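Walking the region, line, and word hierarchy to rebuild plain text might look like the sketch below. The JSON keys (`regions`, `lines`, `words`, `boundingBox`, `text`) are assumptions modeled on the OCR API response. The sample data is made up, and `ocr_lines` is a hypothetical name.

```python
def ocr_lines(ocr_result: dict) -> list[str]:
    """Flatten an OCR-style result (regions -> lines -> words) into lines of text."""
    lines = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return lines

# Made-up sample; bounding boxes assumed as "left,top,width,height" strings
sample = {
    "regions": [{
        "boundingBox": "10,10,200,40",
        "lines": [
            {"boundingBox": "10,10,200,18",
             "words": [{"boundingBox": "10,10,60,18", "text": "Hello"},
                       {"boundingBox": "80,10,70,18", "text": "world"}]},
            {"boundingBox": "10,32,120,18",
             "words": [{"boundingBox": "10,32,120,18", "text": "AI-900"}]},
        ],
    }]
}
print(ocr_lines(sample))  # ['Hello world', 'AI-900']
```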

Read API

Suited to larger volumes of text; runs asynchronously and returns an operation ID, which you use to retrieve results organized as:

Pages - One for each page of text, including information about the page size and orientation.
Lines - The lines of text on a page.
Words - The words in a line of text.
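Because the Read API is asynchronous, a client submits the image, keeps the operation ID, and polls until the operation finishes. This is a minimal polling sketch. The status values (`notStarted`, `running`, `succeeded`, `failed`) are assumed from the Read API. `get_status` is a hypothetical stand-in for whatever call fetches the operation's current state.

```python
import time
from typing import Callable

def wait_for_read_result(get_status: Callable[[str], dict], operation_id: str,
                         poll_seconds: float = 1.0, max_polls: int = 30) -> dict:
    """Poll an asynchronous Read operation until it succeeds, fails, or times out.

    get_status (hypothetical) fetches the operation's current JSON as a dict.
    """
    for _ in range(max_polls):
        result = get_status(operation_id)
        status = result.get("status")
        if status == "succeeded":
            return result  # pages -> lines -> words are in here
        if status == "failed":
            raise RuntimeError(f"Read operation {operation_id} failed")
        time.sleep(poll_seconds)  # assumed still "notStarted" or "running"
    raise TimeoutError(f"Read operation {operation_id} did not finish")
```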

Form Recognizer

Uses prebuilt receipt models or custom models to match, process, and identify data in forms.

Services

Form Recognizer
Cognitive Services

Custom Training

Images must be in JPEG, PNG, BMP, PDF, or TIFF format
File size must be less than 50 MB
Image size between 50 x 50 pixels and 10000 x 10000 pixels
For PDF documents, no larger than 17 inches x 17 inches
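This sketch checks a candidate training file against the limits above. `training_file_problems` is hypothetical. It assumes PDF page sizes are supplied in points (72 per inch) and that 50 MB means 50 * 1024 * 1024 bytes.

```python
import os

ALLOWED = {".jpg", ".jpeg", ".png", ".bmp", ".pdf", ".tif", ".tiff"}
MAX_BYTES = 50 * 1024 * 1024  # file must be smaller than this (assumed binary MB)
MIN_PX, MAX_PX = 50, 10_000   # image dimension limits, in pixels
MAX_PDF_INCHES = 17
POINTS_PER_INCH = 72          # assumed unit for PDF page sizes

def training_file_problems(filename: str, size_bytes: int,
                           width: float, height: float) -> list[str]:
    """Check a training file; width/height are pixels for images, points for PDFs."""
    ext = os.path.splitext(filename)[1].lower()
    problems = []
    if ext not in ALLOWED:
        problems.append("unsupported format")
    if size_bytes >= MAX_BYTES:
        problems.append("file is not smaller than 50 MB")
    if ext == ".pdf":
        # Each page side must be at most 17 inches
        if max(width, height) / POINTS_PER_INCH > MAX_PDF_INCHES:
            problems.append("PDF page larger than 17 x 17 inches")
    elif not (MIN_PX <= width <= MAX_PX and MIN_PX <= height <= MAX_PX):
        problems.append("image outside 50x50 to 10000x10000 pixels")
    return problems
```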
