
Create a dataset

To ensure a machine learning model you create performs well, you need to train it on a variety of images that cover the range of things your machine should be able to recognize.

To train a model, you need a dataset that meets the following criteria:

the dataset contains at least 15 images
at least 80% of the images have labels
for each selected label, at least 10 bounding boxes exist
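As a rough illustration, the three criteria above can be checked with a few lines of Python. This is only a sketch: `ImageRecord` and `check_dataset` are hypothetical names invented for this example, not part of the Viam SDK, and the sketch assumes that an image counts as labeled when it has at least one tag or one bounding box.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImageRecord:
    """Hypothetical stand-in for one dataset image; not a Viam SDK type."""
    tags: List[str] = field(default_factory=list)
    bbox_labels: List[str] = field(default_factory=list)  # one entry per bounding box


def check_dataset(images: List[ImageRecord], selected_labels: List[str]) -> Dict[str, bool]:
    """Evaluate the three training criteria listed above."""
    total = len(images)
    # Assumption: an image is "labeled" if it has any tag or any bounding box.
    labeled = sum(1 for img in images if img.tags or img.bbox_labels)
    box_counts = Counter(label for img in images for label in img.bbox_labels)
    return {
        "at_least_15_images": total >= 15,
        # Integer comparison avoids floating-point rounding at exactly 80%.
        "at_least_80_percent_labeled": total > 0 and labeled * 100 >= 80 * total,
        "at_least_10_boxes_per_label": all(
            box_counts[label] >= 10 for label in selected_labels
        ),
    }


# 12 images with one "dog" box each, plus 3 unlabeled images: 15 total, 80% labeled.
sample = [ImageRecord(bbox_labels=["dog"]) for _ in range(12)] + [
    ImageRecord() for _ in range(3)
]
report = check_dataset(sample, ["dog"])
print(report)
```

A dataset that fails any of the three checks needs more images or more annotations before training.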

This page explains how to create a dataset that meets these criteria for your training purposes.

Prerequisites

a machine connected to the Viam app

Add a new machine in the Viam app. On the machine’s page, follow the setup instructions to install viam-server on the computer you’re using for your project. Wait until your machine has successfully connected to the Viam app.

a camera, connected to your machine, to capture images

Follow the guide to configure a webcam or similar camera component.

Create a dataset

To create a dataset, use the Viam CLI or the Viam app:

Viam app
CLI

Open the DATASETS tab on the DATA page of the Viam app.
Click the + Create dataset button.
Enter a unique name for the dataset.
Click the Create dataset button to create the dataset.

First, install the Viam CLI and authenticate:
To download the Viam CLI on a macOS computer, install brew and run the following commands:
```
brew tap viamrobotics/brews
brew install viam
```
To download the Viam CLI on a Linux computer with the aarch64 architecture, run the following commands:
```
sudo curl -o /usr/local/bin/viam https://storage.googleapis.com/packages.viam.com/apps/viam-cli/viam-cli-stable-linux-arm64
sudo chmod a+rx /usr/local/bin/viam
```
To download the Viam CLI on a Linux computer with the amd64 (Intel x86_64) architecture, run the following commands:
```
sudo curl -o /usr/local/bin/viam https://storage.googleapis.com/packages.viam.com/apps/viam-cli/viam-cli-stable-linux-amd64
sudo chmod a+rx /usr/local/bin/viam
```
You can also install the Viam CLI using brew on Linux amd64 (Intel x86_64):
```
brew tap viamrobotics/brews
brew install viam
```
Download the binary and run it directly to use the Viam CLI on a Windows computer.
If you have Go installed, you can build the Viam CLI directly from source using the go install command:
```
go install go.viam.com/rdk/cli/viam@latest
```
To confirm viam is installed and ready to use, issue the viam command from your terminal. If you see help instructions, everything is correctly installed. If you do not see help instructions, add your local go/bin/* directory to your PATH variable. If you use bash as your shell, you can use the following command:
```
echo 'export PATH="$HOME/go/bin:$PATH"' >> ~/.bashrc
```
For more information, see the Viam CLI installation instructions.
Log in to the CLI.
Run the following command to create a dataset, replacing the <org-id> and <name> placeholders with your organization ID and a unique name for the dataset:
```
viam dataset create --org-id=<org-id> --name=<name>
```

You can add images to a dataset directly from a camera or vision component feed in the CONTROL or CONFIGURATION tabs of the Viam app.

To add an image directly to a dataset from a visual feed, complete the following steps:

Open the TEST panel of any camera or vision service component to view a feed of images from the camera.
Click the button marked with the camera icon to save the currently displayed image to a dataset.
Select an existing dataset.
Click Add to add the image to the selected dataset.
When you see a success notification that reads “Saved image to dataset”, you have successfully added the image to the dataset.

To view images added to your dataset, go to the DATA page’s DATASETS tab in the Viam app and select your dataset.

To capture a large number of images for training an ML model, capture and sync image data using the data management service with your camera.

Viam stores the images saved by capture and sync on the DATA page, but does not add the images to a dataset. We recommend you tag the images first and then use the CLI to add the tagged images to a dataset.

Tip

Once you have enough images, consider disabling data capture to avoid incurring fees for capturing large amounts of training data.

Once you’ve captured enough images for training, you must annotate them to train a model.

Annotate images

Use the interface on the DATA page to annotate your images. Always follow best practices when you label your images:

More data means better models: Incorporate as much data as you practically can to improve your model’s overall performance.
Include counterexamples: Include images with and without the object you’re looking to classify. This helps the model distinguish the target object from the background and reduces the chances of false positives by teaching the model what the object is not.
Avoid class imbalance: Don’t train excessively on one specific type or class. Make sure each category has a roughly equal number of images. For instance, if you’re training a dog detector, include images of various dog breeds to avoid bias towards one breed. An imbalanced dataset can lead the model to favor one class over others, reducing its overall accuracy.
Match training images to intended use case: Use images that reflect the quality and conditions of your production environment. For example, if you plan to use a low-quality camera in production, train with low-quality images. Similarly, if your model will run all day, capture images in daylight, nighttime, dusk, and dawn conditions.
Vary angles and distances: Include image examples from every angle and distance that you expect the model to handle.
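One way to spot class imbalance before training is to count how many images carry each tag. The following is a minimal sketch in plain Python. The helper names `tag_counts` and `imbalance_ratio` are hypothetical, and the input is assumed to be a list of per-image tag lists that you have exported or assembled yourself.

```python
from collections import Counter
from typing import Dict, Iterable, List


def tag_counts(image_tags: Iterable[List[str]]) -> Dict[str, int]:
    """Count how many images carry each tag."""
    counts: Counter = Counter()
    for tags in image_tags:
        counts.update(set(tags))  # count each tag at most once per image
    return dict(counts)


def imbalance_ratio(counts: Dict[str, int]) -> float:
    """Return largest class size divided by smallest; 1.0 means balanced."""
    if not counts:
        raise ValueError("no tags to compare")
    return max(counts.values()) / min(counts.values())


# Hypothetical tags for six images of a food display.
example = [["full"], ["full"], ["full"], ["empty"], ["average"], ["average"]]
counts = tag_counts(example)
ratio = imbalance_ratio(counts)
print(counts, ratio)
```

A ratio well above 1.0 suggests capturing more images of the underrepresented classes. What counts as too high depends on your use case.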

Viam enables you to annotate images for the following machine learning methods:

Classification
Object detection

Classification determines a descriptive tag or set of tags for an image. For example, classification could help you identify:

whether an image of a food display appears full, empty, or average
the quality of manufacturing output: good or bad
what combination of toppings exists on a pizza: pepperoni, sausage and pepper, or pineapple and ham and mushroom

Viam supports single and multiple label classification. To create a training set for classification, add tags that describe your images.

To tag an image:

Click on an image, then click the + next to the Tags option.
Add one or more tags to your image.

Repeat these steps for all images in the dataset.

Object detection identifies and determines the location of certain objects in an image. For example, object detection could help you identify:

how many pizza objects appear on a counter
the number of bicycle and pedestrian objects on a greenway
which plant objects are popular with deer in your garden

To create a training set for object detection, annotate bounding boxes to teach your model to identify objects that you want to detect in future images.

To label an object with a bounding box:

Click on an image, then click the Annotate button in the right side menu.
Choose an existing label or create a new label.
Hold the command key (on macOS) or the control key (on Linux and Windows), then click and drag on the image to create the bounding box.

Tip

Once created, you can move, resize, or delete the bounding box.

Repeat these steps for all images in the dataset.
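If you ever need to work with bounding boxes outside the annotation interface, note that annotation tools commonly express box corners as fractions of the image width and height rather than as raw pixels. This page does not state which representation Viam uses, so treat the normalized format below as an assumption. The sketch converts a pixel-space box to normalized coordinates, and `normalize_box` is a hypothetical helper, not an SDK function.

```python
from typing import Tuple


def normalize_box(
    x_min: int, y_min: int, x_max: int, y_max: int, width: int, height: int
) -> Tuple[float, float, float, float]:
    """Convert pixel corners to fractions of the image size in the range 0..1."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if not (0 <= x_min < x_max <= width and 0 <= y_min < y_max <= height):
        raise ValueError("box must be non-empty and lie inside the image")
    return (x_min / width, y_min / height, x_max / width, y_max / height)


# A box covering the central region of a 640x480 image.
box = normalize_box(160, 120, 480, 360, 640, 480)
print(box)  # (0.25, 0.25, 0.75, 0.75)
```

Normalized coordinates stay valid if the image is later resized, which is one reason the format is popular.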

Add tagged images to a dataset

Open the DATA page of the Viam app.
Navigate to the ALL DATA tab.
Use the checkbox in the upper left of each image to select labeled images.
Click the Add to dataset button, select a dataset, and click the Add … images button to add the selected images to the dataset.

Use the Viam CLI to filter images by label and add the filtered images to a dataset:

First, create a dataset, if you haven’t already.
If you just created a dataset, use the dataset ID output by the creation command. If your dataset already exists, run the following command to get a list of dataset names and corresponding IDs:
```
viam dataset list
```
Run the following command to add all images labeled with a subset of tags to the dataset, replacing the <dataset-id> placeholder with the dataset ID output by the command in the previous step:
```
viam dataset data add filter --dataset-id=<dataset-id> --tags=red_star,blue_square
```
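If you run this command from a script, it can help to build the argument list programmatically so that the tag list is always joined correctly. The following sketch only constructs the command shown above and does not call the CLI. `add_by_tags_command` is a hypothetical helper, and the dataset ID in the example is a placeholder.

```python
import shlex
from typing import List, Sequence


def add_by_tags_command(dataset_id: str, tags: Sequence[str]) -> List[str]:
    """Build the argument list for `viam dataset data add filter`."""
    if not dataset_id:
        raise ValueError("dataset_id is required")
    cleaned = [tag.strip() for tag in tags if tag.strip()]
    if not cleaned:
        raise ValueError("at least one tag is required")
    if any("," in tag for tag in cleaned):
        raise ValueError("tags must not contain commas")
    return [
        "viam", "dataset", "data", "add", "filter",
        f"--dataset-id={dataset_id}",
        f"--tags={','.join(cleaned)}",
    ]


cmd = add_by_tags_command("my-dataset-id", ["red_star", "blue_square"])
print(shlex.join(cmd))
```

To execute the command, you could pass `cmd` to `subprocess.run(cmd, check=True)` on a computer where the Viam CLI is installed and logged in.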

The following script adds all images captured from a certain machine to a new dataset. Complete the following steps to use the script:

Copy and paste the following code into a file named add_images_from_machine_to_dataset.py on your machine.

```python
import asyncio
from typing import List, Optional

from viam.rpc.dial import DialOptions, Credentials
from viam.app.viam_client import ViamClient
from viam.utils import create_filter

# Configuration constants: replace with your actual values
DATASET_NAME = ""  # a unique, new name for the dataset you want to create
ORG_ID = ""  # your organization ID, find in your organization settings
PART_ID = ""  # ID of the machine part that captured the target images, find in machine config
API_KEY = ""  # API key, find or create in your organization settings
API_KEY_ID = ""  # API key ID, find or create in your organization settings

# Adjust the maximum number of images to add to the dataset
MAX_MATCHES = 500


async def connect() -> ViamClient:
    """Establish a connection to the Viam client using API credentials."""
    dial_options = DialOptions(
        credentials=Credentials(
            type="api-key",
            payload=API_KEY,
        ),
        auth_entity=API_KEY_ID,
    )
    return await ViamClient.create_from_dial_options(dial_options)


async def fetch_binary_data_ids(data_client, part_id: str) -> List:
    """Fetch binary data metadata for a part and return the matching BinaryData objects."""
    data_filter = create_filter(part_id=part_id)
    all_matches = []
    last: Optional[str] = None

    print("Getting data for part...")

    # Request one page at a time; `last` is the pagination token for the next page.
    while len(all_matches) < MAX_MATCHES:
        print("Fetching more data...")
        data, _, last = await data_client.binary_data_by_filter(
            data_filter,
            limit=50,
            last=last,
            include_binary_data=False,
        )
        if not data:
            break
        all_matches.extend(data)

    # The final page can overshoot the cap, so trim the result to MAX_MATCHES.
    return all_matches[:MAX_MATCHES]


async def main() -> int:
    """Main execution function."""
    viam_client = await connect()
    try:
        data_client = viam_client.data_client

        matching_data = await fetch_binary_data_ids(data_client, PART_ID)

        print("Creating dataset...")

        try:
            dataset_id = await data_client.create_dataset(
                name=DATASET_NAME,
                organization_id=ORG_ID,
            )
            print(f"Created dataset: {dataset_id}")
        except Exception as e:
            print("Error creating dataset. It may already exist.")
            print("See: https://app.viam.com/data/datasets")
            print(f"Exception: {e}")
            return 1

        print("Adding data to dataset...")

        await data_client.add_binary_data_to_dataset_by_ids(
            binary_ids=[obj.metadata.binary_data_id for obj in matching_data],
            dataset_id=dataset_id,
        )

        print("Added files to dataset.")
        print(f"See dataset: https://app.viam.com/data/datasets?id={dataset_id}")
        return 0
    finally:
        # Close the connection on both the success and error paths.
        viam_client.close()


if __name__ == "__main__":
    # Propagate the return value of main() as the process exit code.
    raise SystemExit(asyncio.run(main()))
```

Fill in the placeholders with values for your own organization, API key, machine, and dataset.
Install the Viam Python SDK by running the following command:
```
pip install viam-sdk
```
Finally, run the following command to add the images to the dataset:
```
python add_images_from_machine_to_dataset.py
```
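The script above pages through results 50 at a time, passing the `last` token from each response into the next request until no data comes back or the `MAX_MATCHES` cap is reached. The following sketch isolates that loop so you can run it without credentials. `FakeDataClient` is a hypothetical in-memory stand-in written for this example, and its simplified `binary_data_by_filter` signature only imitates the real method.

```python
import asyncio
from typing import List, Optional, Tuple

PAGE_SIZE = 50


class FakeDataClient:
    """Hypothetical in-memory stand-in for the data client; not part of the Viam SDK."""

    def __init__(self, total: int) -> None:
        self._items = [f"image-{i}" for i in range(total)]

    async def binary_data_by_filter(
        self, limit: int, last: Optional[str]
    ) -> Tuple[List[str], int, str]:
        # Here the token is simply the index of the next unread item.
        start = int(last) if last else 0
        page = self._items[start:start + limit]
        return page, len(self._items), str(start + len(page))


async def fetch_up_to(client: FakeDataClient, max_matches: int) -> List[str]:
    """Collect pages until the data runs out or max_matches is reached."""
    matches: List[str] = []
    last: Optional[str] = None
    while len(matches) < max_matches:
        page, _, last = await client.binary_data_by_filter(limit=PAGE_SIZE, last=last)
        if not page:
            break  # an empty page means there is nothing left to fetch
        matches.extend(page)
    return matches[:max_matches]  # the final page may overshoot the cap


few = asyncio.run(fetch_up_to(FakeDataClient(120), 500))
capped = asyncio.run(fetch_up_to(FakeDataClient(120), 75))
print(len(few), len(capped))  # 120 75
```

The first call stops because the data runs out after three pages. The second stops because the cap is reached after two pages and the result is trimmed to 75 items.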
