Image Annotation For Computer Vision Guide
Image Annotation for Computer Vision
A Guide to Labeling Visual Data for Your Machine Learning Project
The images you use to train, validate, and test your computer vision algorithms will
have a significant effect on the success of your AI project. Each image in your dataset
must be thoughtfully and accurately labeled to train an AI system to recognize objects
similar to the way a human can. The higher the quality of your annotations, the better
your machine learning models are likely to perform.
While the volume and variety of your image data are likely growing every day, getting
images annotated according to your specifications can be a challenge that slows your
project and, as a result, your speed to market. The choices you make about your image
annotation techniques, tools, and workforce are worth thoughtful consideration.
In this guide, we’ll cover image annotation for computer vision using
supervised learning.
First, we’ll explain image annotation in greater detail, introducing you to key terms and
concepts. Next, we’ll explore how image annotation is used for machine learning and
some of the techniques that are available for annotating visual data, including images
and videos.
Finally, we’ll share why decisions about your workforce are an important success
factor for any machine learning project. We’ll give you considerations for selecting the
right workforce, and you’ll get a short list of critical questions to ask a potential image
annotation service provider.
> You have visual data (i.e., images, videos) from imaging technology
that you want to prepare for use in training machine learning or deep
learning models.
> You have annotated visual data but it does not meet your project’s
quality requirements.
> You want to learn how you can use visual data to train high-performance
machine learning or deep learning models.
THE BASICS: IMAGE ANNOTATION FOR MACHINE LEARNING
In machine learning and deep learning, image annotation is the process of labeling or
classifying an image using text, annotation tools, or both, to show the data features you
want your model to recognize on its own. When you annotate an image, you are adding
metadata to a dataset.
Image annotation is a type of data labeling that is sometimes called tagging, transcribing,
or processing. You also can annotate videos continuously, as a stream, or frame by
frame.
Image annotation marks the features you want your machine learning system to
recognize, and you can use the images to train your model using supervised learning.
Image annotation is most commonly used to recognize objects and boundaries and
to segment images for instance, meaning, or whole-image understanding. For each of
these uses, it takes a significant amount of data to train, validate, and test a machine
learning model to achieve the desired outcome.
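As a minimal sketch, here is one way an annotated dataset might be divided for training, validation, and testing. The 70/15/15 ratio, the function name split_dataset, and the file names are illustrative assumptions, not recommendations from this guide.

```python
import random


def split_dataset(image_ids, train=0.70, val=0.15, seed=0):
    """Shuffle image IDs and split them into train, validation, and test subsets.

    Whatever is left after the train and validation shares goes to the test set.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_train = round(len(ids) * train)
    n_val = round(len(ids) * val)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


# Hypothetical file names standing in for your annotated images.
image_ids = [f"img_{i:04d}.jpg" for i in range(100)]
splits = split_dataset(image_ids)
print({name: len(items) for name, items in splits.items()})
```

Keeping the three subsets disjoint matters: an image that appears in both the training and test sets would make your model look better than it is.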
Both single images and multi-frame images, such as video, can be annotated for
machine learning.
These are the most common types of data used with image annotation:
> 2-D images and video (multi-frame), including data from cameras or
other imaging technology, such as an SLR (single-lens reflex) camera or
an optical microscope
> 3-D images and video (multi-frame), including data from cameras
or other imaging technology, such as electron, ion, or scanning probe
microscopes
How are images annotated?
You can annotate images using commercially available, open source, or freeware data
annotation tools. If you are working with a lot of data, you also will need a trained
workforce to annotate the images. Tools provide feature sets with various combinations
of capabilities, which your workforce can use to annotate images, multi-frame images,
or video. Video can be annotated as a stream or frame by frame.
Are there image annotation services?
Yes; there are image annotation services. If you are doing image annotation in-house or
using contractors, there are services that can provide crowdsourced or professionally
managed team solutions to assist with scaling your annotation process. We’ll address
this area in more detail later in this guide.
There are four primary types of image annotation you can use to train your computer
vision AI model.
Each type of image annotation is distinct in how it reveals particular features or areas within
the image. You can determine which type to use based on the data you want your algorithms
to consider.
1. Image Classification
Image classification is a form of image annotation that seeks to identify the presence of
similar objects depicted in images across an entire dataset. It is used to train a machine
to recognize an object in an unlabeled image that looks like an object in other labeled
images that you used to train the machine. Preparing images for image classification is
sometimes referred to as tagging.
Classification applies across an entire image at a high level. For example, an annotator
could tag interior images of a home with labels such as “kitchen” or “living room.” Or,
an annotator could tag images of the outdoors with labels such as “day” or “night.”
2. Object Recognition/Detection
Object recognition is a form of image annotation that seeks to identify the presence,
location, and number of one or more objects in an image and label them accurately.
It also can be used to identify a single object. By repeating this process with different
images, you can train a machine learning model to identify the objects in unlabeled
images on its own.
You can label different objects within a single image with object recognition-compatible
techniques, such as bounding boxes or polygons. For instance, you may have images
of street scenes, and you want to label trucks, cars, bikes, and pedestrians. You could
annotate each of these separately in the same image.
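To make the street-scene example concrete, here is a hedged sketch of what one annotation record could look like. The field names, the file name, and the [x_min, y_min, width, height] box convention are assumptions for illustration; your annotation tool will define its own export format.

```python
from collections import Counter

# Hypothetical annotation record for a single street-scene image.
# Each box is [x_min, y_min, width, height] in pixels (an assumed convention).
annotation = {
    "image": "street_0001.jpg",
    "objects": [
        {"label": "truck", "bbox": [12, 80, 220, 140]},
        {"label": "car", "bbox": [260, 130, 120, 70]},
        {"label": "car", "bbox": [400, 135, 110, 65]},
        {"label": "bike", "bbox": [530, 150, 40, 60]},
        {"label": "pedestrian", "bbox": [590, 120, 30, 90]},
    ],
}


def count_labels(record):
    """Count how many objects of each class were annotated in one image."""
    return Counter(obj["label"] for obj in record["objects"])


counts = count_labels(annotation)
print(dict(counts))
```

Because every object carries its own label and location, the same record captures presence, location, and number at once.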
3. Segmentation
There are three types of segmentation:
1. Semantic segmentation
2. Instance segmentation
3. Panoptic segmentation
You would use semantic segmentation when you want objects to be grouped, and
it is typically reserved for objects you don’t need to count or track across multiple
images, because the annotation may not reveal size or shape. For example, if you were
annotating images that included both the stadium crowd and the playing field at a
baseball game, you could annotate the crowd to segment the seating from the field.
2. Instance segmentation tracks the presence, location, count, size, and
shape of objects in an image. This type of image annotation is also referred to as object
class. Using the same example of images of a baseball game, you could label each
individual in the stadium and use instance segmentation to determine how many people
were in the crowd.
You can perform either semantic or instance segmentation pixel-wise, which means
every pixel inside the outline is labeled. You can also perform either one as boundary
segmentation, where only the border coordinates are counted.
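The difference between the two representations can be sketched in a few lines. The example below converts a boundary (a list of border coordinates) into a pixel-wise mask by testing each pixel center against the outline. The function names and the tiny 7 x 6 image are assumptions for illustration; production tools use faster rasterization routines.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is point (x, y) inside the closed polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(polygon, width, height):
    """Label every pixel whose center falls inside the boundary with 1, else 0."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]


# Boundary segmentation: only the border coordinates are stored.
boundary = [(1, 1), (5, 1), (5, 4), (1, 4)]

# Pixel-wise segmentation: every pixel inside the outline is labeled.
mask = polygon_to_mask(boundary, width=7, height=6)
for row in mask:
    print("".join(str(v) for v in row))
labeled_pixels = sum(sum(row) for row in mask)
```

The boundary form is compact and easy to edit, while the mask form states the label of every pixel explicitly.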
4. Boundary Recognition
Boundary recognition can be used to train a machine to identify lines and splines,
including traffic lanes, land boundaries, or sidewalks. Boundary recognition is particularly
important for safe operation of autonomous vehicles. For example, the machine learning
models used to guide drones must be trained to follow a particular course and
avoid potential obstacles, such as power lines.
How do you do image annotation?
To apply annotations to your image data, you will use a data annotation tool. The
availability of data annotation tools for image annotation use cases is growing fast.
Some tools are commercially available, while others are available via open source or
freeware. In most cases, you will have to customize and maintain an open source tool
yourself; however, there are tool providers that host open source tools.
If your project and resources allow it, you may wish to build your own image annotation
tool. This is generally the choice when existing tools don’t meet your requirements or
when you want to build into your tool features that you value as intellectual property
(IP). If you choose this route, be sure that you have the people and resources to maintain,
update, and make improvements to the tool over time.
There are many excellent tools available today for image annotation. Some tools are
narrowly optimized to focus on specific types of labeling, while others offer a broad mix
of capabilities to enable many different kinds of use cases. Making the choice between a
specialized tool or one with a wider set of features or functionality will depend on your
current and anticipated image annotation needs. Keep in mind that there is no tool that
can do it all, so you’ll want to choose a tool that you can grow into as your requirements
change.
Image annotation involves one or more of these techniques, which are supported by
your data annotation tool, depending on its feature sets.
1. Bounding box
2. Landmarking
3. Masking
4. Polygon
5. Polyline
6. Tracking
7. Transcription
1. Bounding box
This is an example of image annotation using a bounding box. The dog is the object
of interest.
2. Landmarking
This is used to plot characteristics in the data, such as with facial recognition to detect
facial features, expressions, and emotions. It is also used to annotate body position and
alignment, using pose-point annotations. In annotating images for sports analytics, for
example, you can determine where a baseball pitcher’s hand, wrist, and elbow are in
relation to one another while the pitcher throws the baseball.
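As a rough illustration of how landmark annotations can be used downstream, the sketch below computes the angle at the wrist from three pose points. The keypoint names and pixel coordinates are invented for the example, and joint_angle is a hypothetical helper, not part of any particular tool.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at point b, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("landmarks must be distinct points")
    # Clamp to guard against tiny floating-point overshoot outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


# Hypothetical (x, y) pixel coordinates placed by an annotator on one frame.
keypoints = {"hand": (120, 80), "wrist": (100, 80), "elbow": (100, 140)}

wrist_angle = joint_angle(keypoints["hand"], keypoints["wrist"], keypoints["elbow"])
print(f"Angle at the wrist: {wrist_angle:.1f} degrees")
```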
3. Masking
This is pixel-level annotation that is used to hide areas in an image and to reveal other
areas of interest. Image masking can make it easier to home in on certain areas of the
image.
4. Polygon
This is used to mark each vertex (corner point) of the target object and
annotate its edges. Polygons are used when objects are more irregular in shape, such as
houses, areas of land, or vegetation.
This is an example of image annotation using a polygon. The dog is the object of interest.
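To show why a polygon can describe an irregular object more tightly than a bounding box, here is a small sketch that compares the two. The L-shaped vertices are made up for the example, and the function names are assumptions.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def enclosing_box(vertices):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) around the vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical L-shaped object outlined by an annotator, vertex by vertex.
vertices = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]

area = polygon_area(vertices)
x_min, y_min, x_max, y_max = enclosing_box(vertices)
box_area = (x_max - x_min) * (y_max - y_min)
fill_ratio = area / box_area
print(f"polygon area={area}, box area={box_area}, fill ratio={fill_ratio:.2f}")
```

In this example a quarter of the bounding box is background, which is the kind of slack a polygon annotation avoids.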
5. Polyline
This plots continuous lines made of one or more line segments. Polylines are used when
working with open shapes, such as road lane markers, sidewalks, or power lines.
This is an example of image annotation using a polyline. The street’s lane line is the
object of interest.
6. Tracking
This is used to label and plot an object’s movement across multiple frames of video.
Some image annotation tools include interpolation, which allows an annotator to label
one frame, then skip to a later frame and move the annotation to the object’s new
position. Interpolation fills in the object’s movement in the interim frames that were
not annotated.
This is an example of image annotation using tracking. The car is the object of interest,
spanning multiple frames of video.
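The idea behind interpolation can be sketched as follows. This is a simple linear version, assuming boxes stored as (x, y, width, height); the function name and frame numbers are illustrative, and real tools may use more sophisticated motion models.

```python
def interpolate_boxes(start_frame, start_box, end_frame, end_box):
    """Linearly interpolate (x, y, width, height) boxes between two keyframes.

    Returns a dict mapping every frame number from start_frame to end_frame
    (inclusive) to its estimated box.
    """
    span = end_frame - start_frame
    if span <= 0:
        raise ValueError("end_frame must come after start_frame")
    boxes = {}
    for frame in range(start_frame, end_frame + 1):
        t = (frame - start_frame) / span  # 0.0 at the first keyframe, 1.0 at the last
        boxes[frame] = tuple(s + t * (e - s) for s, e in zip(start_box, end_box))
    return boxes


# The annotator labels the car on frame 10, skips ahead, and adjusts it on frame 14.
track = interpolate_boxes(10, (100, 50, 40, 20), 14, (140, 70, 40, 20))
for frame, box in track.items():
    print(frame, box)
```

Only two frames were labeled by hand here; the three frames in between are filled in automatically, and an annotator would still review them where the object changes speed or direction.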
7. Transcription
This is used to annotate text in images or video when there is multimodal information
(i.e., images and text) in the data.
> Employees: These are individuals on your payroll, full-time or part-time. This option
allows you to build in-house expertise and, typically, respond quickly to change. However,
often those tasked with annotation were not hired to do annotation. It becomes an
addition to their original job description, which means your employees are distracted
from the reason you hired them in the first place. Additionally, scaling an internal team
can be a challenge, as you bear the responsibility and expense of hiring, managing, and
training workers, as well as keeping churn low.
> Contractors: These are temporary or freelance workers whom you train to do the work.
Their domain knowledge of your use case can increase over time, and they have the
agility to incorporate changes quickly. With contractors, you often have the flexibility
to scale your team up or down as needed. However, as with employees, you will bear
the management burden and the responsibility for keeping worker churn low.
> Crowdsourcing: This is an anonymous, ad hoc source of labor. You use a third-party
platform to access large numbers of freelance workers at once, and typically users
of the platform volunteer to do the work you describe. Domain knowledge, or even
annotation experience, is limited, and you are never sure who is working on your data.
Quality tends to be lower with crowdsourced teams because the workers are not vetted
the same way they are with in-house, contracted, or managed teams.
> Managed teams: These are an outsourcing option. Team members are strategically
selected, trained, and professionally managed. You share your
requirements and annotation process, and they help you to scale it. Their domain
knowledge of your use case is likely to increase over time, and they are
likely to have the agility to incorporate changes to your image annotation process.
The advantages of outsourced, managed teams
There are three characteristics of outsourced, professionally managed teams that make
them an ideal choice for image annotation, particularly for machine learning use cases.
2. Agility
Machine learning is an iterative process. Your workflow and rules may change as you test
and validate your models and learn from their outcomes. A managed team of annotators
provides the flexibility to incorporate changes in data volume, task complexity, and
task duration. The more adaptive your workforce is, the more machine learning projects
you can work through. The best managed teams for image annotation can provide your
team with valuable insights about data features (that is, the properties, characteristics,
or classifications) that will be analyzed for patterns that help predict the target, or
answer what you want your model to predict.
3. Communication
Managed image annotation teams can use technology to create a closed feedback loop
with you that will establish reliable communication and collaboration between your
project team and annotators. Workers should be able to share what they’re learning as
they work with your data, so you can use their insights to adjust your approach.
The best image annotation teams
If you are building machine learning models, the primary reason you will need an image
annotation workforce is to achieve quality image annotation at scale. Using image
data to train machine learning models requires a lot of data. In fact, high-performance
machine learning and deep learning models require massive amounts of data labeled
to a high standard. For most AI project teams, that requires a human-in-the-loop approach.
The best image annotation teams are professionally managed teams that
can provide:
Expertise
This kind of expertise comes with experience doing the many types of annotations
described above, across multiple use cases, clients, and industries. Teams with expertise
have developed processes and workflow best practices. They also know which
annotation tool is best for a particular task or use case. Expertise is important to scaling
your process. Teams with expertise understand how to transform complex tasks into
distributed workflows that support high-quality image annotation.
Quality
Your machine learning models will only be as good as the data that trains them. The
best image annotation services monitor quality and can support, augment, or lead your
team’s quality-assurance efforts. Their domain knowledge and proficiency with your
rules, process, and use cases improve over time, as they work with your images and
learn how you want to resolve edge cases. All of these contribute to higher quality
image annotation and a better performing AI model.
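One common way to put a number on bounding-box quality, offered here as an assumption rather than a method this guide prescribes, is intersection over union (IoU) between a submitted box and a trusted reference box. The box convention (x_min, y_min, x_max, y_max), the coordinates, and the 0.5 review threshold below are all illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0


# Hypothetical reference box from a reviewer and a box submitted by an annotator.
reference = (10, 10, 50, 50)
submitted = (20, 20, 60, 60)

score = iou(reference, submitted)
REVIEW_THRESHOLD = 0.5  # assumed cut-off; choose one that fits your use case
needs_review = score < REVIEW_THRESHOLD
print(f"IoU={score:.3f}, needs review: {needs_review}")
```

A score of 1.0 means the boxes match exactly and 0.0 means they do not overlap, so a metric like this gives you and your annotation team a shared, measurable definition of "good enough."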
Agility
One constant in AI projects is change. Tasks, workflows, and use cases change. The
best services have experience with many kinds of image annotation. Their teams can
work with yours to manage task iterations as everyone learns during the process, so
you can make improvements that increase throughput and quality. They also can make
changes quickly to your image annotation process to counteract bias or to optimize
your model’s performance.
If you need an image annotation workforce, you may be overwhelmed by the options
available online. It can be challenging to evaluate image annotation services.
Here are questions to keep in mind when you’re speaking with an image
annotation service provider:
Expertise
1. What kind of images can your workforce annotate? How long has your workforce
been annotating images?
2. What types of annotations does your workforce have experience with? Does your
workforce have experience annotating data in my specific domain? (e.g., medical,
agriculture)
3. What tools can your workforce use? What if we have built our own, proprietary
image annotation tool - can you use that?
4. How quickly can you scale the work? What kind of experience does your team have
with a project like this?
Quality
6. What processes are in place to ensure high quality throughout the annotation
process?
7. How do you share quality metrics with our team? What happens when quality
measures aren’t met?
8. If workers change, who trains new team members? Describe how you transfer
context and domain knowledge as individuals transition on or off our image
annotation team.
Agility
9. How will our team communicate with your data labeling team?
10. How does your team handle changes to our annotations or workflow? How
quickly can changes be incorporated into our process?
11. Can you scale my image annotation work up or back, per our needs?
Contract Terms
12. What is your pricing model (e.g., per annotation, task, hour)?
14. How do changes in the scale of the work, task definition, or project scope change
pricing for our project? Can we revise task instructions without renegotiating our
contract?
Agility
We have experience with a wide variety of tasks and use cases, and we know how
to manage workflow changes. We put you directly in contact with a team lead, who
works alongside the team and communicates with you via a closed feedback loop.
This allows us to ensure task iterations, problems, and new use cases are managed
quickly.
Our monthly subscription model allows you to scale the work up or down as needed.
We don’t lock you into rigid contract terms or limit your speed to market by requiring
lengthy contract renegotiations if the work changes.
Together, we make a positive change.
Reviewers
Anthony Scalabrino, sales engineer at CloudFactory, a provider of professionally managed
teams for image annotation for computer vision.
Tristan Rouillard and Alex Wennman, who are co-founders at Hasty, an AI-powered
image annotation tooling provider that offers tools for a wide variety of use cases and
the flexibility to adapt the tool to support your workflow needs.
By marking the features you want your machine learning system to recognize, you
can use the images to train your model using supervised learning. Once your model is
deployed, you want it to be able to identify those features in images that have not been
annotated and, as a result, make a decision or take some action.
An image annotation tool is a software solution that can be used to label production-
grade image data for machine learning. While some organizations take a do-it-yourself
approach and build their own tools, there are many commercially-available image
annotation tools, as well as open source and freeware tools. Some tools are narrowly
optimized to focus on specific types of labeling, while others offer a broad mix of
capabilities to enable many different kinds of use cases. Making the choice between a
specialized tool or one with a wider set of features or functionality will depend on your
current and anticipated image annotation needs.
Amazon Mechanical Turk is an online platform that allows you to access crowdsourced
workers to do your image annotation work. You use Amazon’s platform to submit
the image annotations you need, and Amazon’s platform distributes that work to
anonymous workers. Also known as Amazon mTurk, this option is best for simple one-
time projects when your tasks can be easily communicated in writing once, without
having additional communication with annotators, and little to no domain expertise or
experience is required.
Are there image annotation services?
Yes; there are image annotation services. If you are doing image annotation in-house
or using contractors, there are services that can provide crowdsourced or managed-
team solutions to assist with scaling your process. The best image annotation services
can provide expertise, quality work, agility to evolve tasks and use cases, and a flexible
contracting model to scale work up or down as your needs require.
There are many excellent software tools available for image annotation. The tool you
choose will depend on four things:
1) The kind (e.g., image, video) of visual data you are working with;
2) The dimension of that data (i.e., 2-D, 3-D);
3) How you want the tool to be deployed (e.g., cloud, container, on-premise); and
4) The feature sets you want your tool to have (e.g., dataset management, annotation
methods, workforce management, data quality control, security).
In machine learning, an annotated image is one that has been labeled using text,
annotation tools, or both to show the data features you want your model to recognize
on its own. When you annotate an image, you are adding metadata to a dataset. Image
annotation is a type of data labeling that is sometimes called tagging, transcribing, or
processing. You also can annotate videos continuously, as a stream, or by frame.
Image annotation for machine learning is the process of labeling or classifying an image
using text, drawing tools, or both to show the data features you want your model to
recognize on its own. When you annotate an image, you are adding metadata to a
dataset. Image annotation is sometimes called data labeling, tagging, transcribing, or
processing. You also can annotate videos continuously, as a stream, or by frame.
To annotate images for deep learning, you can use commercially-available, open source
or freeware tools. If you are working with a lot of data, you likely will need a workforce
to assist. Tools provide feature sets with various combinations of capabilities, which
can be used to annotate images or video. There are image annotation services that
can provide crowdsourced or managed-team solutions to assist with scaling your
process. The process of image annotation for machine learning and for deep learning
is substantially the same, although the way algorithms are built and trained is different
with deep learning.
Image annotation involves using one or more of these techniques: bounding boxes,
landmarking, masking, polygons, polylines, tracking, or transcription. Techniques will be
supported by your annotation tool. Tools provide feature sets with various combinations
of capabilities, which can be used by your workforce to annotate images or video.
There are image annotation services that can provide crowdsourced or managed-team
solutions to assist with scaling your process.