how can I modify Dataset class to make the mask RCNN work with multiple objects?

I am currently working on instance segmentation. I follow these two tutorials:

  1. https://haochen23.github.io/2020/06/fine-tune-mask-rcnn-pytorch.html

  2. https://colab.research.google.com/github/dlmacedo/starter-academic/blob/master/content/courses/deeplearning/notebooks/pytorch/torchvision_finetuning_instance_segmentation.ipynb#scrollTo=mTgWtixZTs3X

However, these two tutorials work perfectly with one class like person + background. But in my case, I have two classes like a person and car + background. I didn't find any resources about making the Mask RCNN work with multiple objects.

Notice that:

  1. I am using PyTorch ( torchvision ), torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0

  2. I am using a Pascal VOC annotation

  3. i used segmentation class (not the XML file) + the images

and this is my dataset class

class PennFudanDataset(torch.utils.data.Dataset):
def __init__(self, root, transforms=None):
    self.root = root
    self.transforms = transforms
    # load all image files, sorting them to
    # ensure that they are aligned
    self.imgs = list(sorted(os.listdir(os.path.join(root, "img"))))
    self.masks = list(sorted(os.listdir(os.path.join(root, "imgMask"))))

def __getitem__(self, idx):
    # load images ad masks
    img_path = os.path.join(self.root, "img", self.imgs[idx])
    mask_path = os.path.join(self.root, "imgMask", self.masks[idx])
    img = Image.open(img_path).convert("RGB")
    # note that we haven't converted the mask to RGB,
    # because each color corresponds to a different instance
    # with 0 being background
    mask = Image.open(mask_path)

    mask = np.array(mask)
    # instances are encoded as different colors
    obj_ids = np.unique(mask)
    # first id is the background, so remove it
    obj_ids = obj_ids[1:]

    # split the color-encoded mask into a set
    # of binary masks
    masks = mask == obj_ids[:, None, None]

    # get bounding box coordinates for each mask
    num_objs = len(obj_ids)
    boxes = []
    for i in range(num_objs):
        pos = np.where(masks[i])
        xmin = np.min(pos[1])
        xmax = np.max(pos[1])
        ymin = np.min(pos[0])
        ymax = np.max(pos[0])
        boxes.append([xmin, ymin, xmax, ymax])

    boxes = torch.as_tensor(boxes, dtype=torch.float32)
    # there is only one class
    labels = torch.ones((num_objs,), dtype=torch.int64) 
    masks = torch.as_tensor(masks, dtype=torch.uint8)

    image_id = torch.tensor([idx])
    area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
    # suppose all instances are not crowd
    iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

    target = {}
    target["boxes"] = boxes
    target["labels"] = labels
    target["masks"] = masks
    target["image_id"] = image_id
    target["area"] = area
    target["iscrowd"] = iscrowd
    if self.transforms is not None:
        img, target = self.transforms(img, target)

    return img, target

def __len__(self):
    return len(self.imgs)

anyone can help me?

How many English words
do you know?
Test your English vocabulary size, and measure
how many words do you know
Online Test
Powered by Examplum