I am working on an image classification model in PyTorch. The setup is as follows:
My training instances are bags of images (one training instance = one bag), where each bag contains a varying number of images. Each bag carries a single label (0 or 1) indicating whether at least one of its images shows a certain property (in my case, a tumor). The objective is to learn a classifier that labels new bags as accurately as possible. However, the following problem occurs when I try to train my model: when I feed the bags one by one into the CNN architecture, the prediction (the probability that the bag has label 1) is instantly either 0 or 1.
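For concreteness, one training instance looks roughly like this (the bag size and names here are illustrative, not my actual data pipeline):

```python
import torch

# One bag = a variable number of images, stored as a single tensor
# with a leading batch dimension of 1. E.g. a bag of 12 tissue patches:
bag = torch.randn(1, 12, 3, 224, 224)   # (batch=1, num_images, C, H, W)
label = torch.tensor([1.0])             # 1 if at least one image is positive

# Inside the model, squeeze(0) turns the bag into a stack of instances:
instances = bag.squeeze(0)              # shape (12, 3, 224, 224)
```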
Now, before I delve into the code itself, which is quite long: I have a near-identical setup where I use digit images instead of tissue images. Each bag contains a number of MNIST-like images (just pictures of digits), and a bag gets a positive label (i.e. 1) if at least one of its images shows the digit 9. Strangely enough, this digit task works very well (the model clearly learns, and good classification performance is obtained in the end), even though the setup is nearly identical to the tissue one.
Below I post the code sections that differ between these two tasks. The differences are essentially the following: the input shapes of the bags differ, since the digit images are much smaller (28x28) with a single channel, while the tissue images are 224x224 with three channels. Therefore the convolutional layers also vary slightly in their specification.
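As a sanity check on the flattened feature sizes used below, here are the two convolutional stacks applied to dummy inputs of the respective sizes (the layer specifications are copied from the models that follow):

```python
import torch
import torch.nn as nn

# Tissue branch: 3-channel 224x224 input
tissue = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=4), nn.ReLU(), nn.MaxPool2d(2, stride=2),
    nn.Conv2d(4, 8, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2, stride=2),
)
# 224 -> 221 (conv k=4) -> 110 (pool) -> 108 (conv k=3) -> 54 (pool)
print(tissue(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 8, 54, 54])

# Digit branch: 1-channel 28x28 input
digit = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2, stride=2),
    nn.Conv2d(10, 20, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2, stride=2),
)
# 28 -> 24 (conv k=5) -> 12 (pool) -> 8 (conv k=5) -> 4 (pool)
print(digit(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 20, 4, 4])
```

So the `8 * 54 * 54` and `20 * 4 * 4` flatten sizes in the two models are consistent with the input resolutions.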
The first code section is for the tissue images; this is the model that won't learn:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    def __init__(self):
        super(Attention, self).__init__()
        self.L = 500
        self.D = 128
        self.K = 1
        self.feature_extractor_part1 = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=4),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2),
            nn.Conv2d(4, 8, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.feature_extractor_part2 = nn.Sequential(
            nn.Linear(8 * 54 * 54, self.L),
            nn.ReLU(),
        )
        self.attention = nn.Sequential(
            nn.Linear(self.L, self.D),
            nn.Tanh(),
            nn.Linear(self.D, self.K)
        )
        self.classifier = nn.Sequential(
            nn.Linear(self.L * self.K, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = x.squeeze(0)                      # (1, N, C, H, W) -> (N, C, H, W)
        H = self.feature_extractor_part1(x)
        H = H.view(-1, 8 * 54 * 54)           # flatten per-instance features
        H = self.feature_extractor_part2(H)
        A = self.attention(H)                 # (N, K) attention scores
        A = torch.transpose(A, 1, 0)          # (K, N)
        A = F.softmax(A, dim=1)               # normalize over instances
        M = torch.mm(A, H)                    # attention-weighted bag embedding
        print(M.shape)                        # debug output
        Y_prob = self.classifier(M)
        Y_hat = torch.ge(Y_prob, 0.5).float()
        return Y_prob, Y_hat, A
```
The second code section is for the digit images; this one works perfectly:
```python
class Attention(nn.Module):
    def __init__(self):
        super(Attention, self).__init__()
        self.L = 500
        self.D = 128
        self.K = 1
        self.feature_extractor_part1 = nn.Sequential(
            nn.Conv2d(1, 10, kernel_size=5),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2),
            nn.Conv2d(10, 20, kernel_size=5),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.feature_extractor_part2 = nn.Sequential(
            nn.Linear(20 * 4 * 4, self.L),
            nn.ReLU(),
        )
        self.attention = nn.Sequential(
            nn.Linear(self.L, self.D),
            nn.Tanh(),
            nn.Linear(self.D, self.K)
        )
        self.classifier = nn.Sequential(
            nn.Linear(self.L * self.K, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = x.squeeze(0)
        H = self.feature_extractor_part1(x)
        H = H.view(-1, 20 * 4 * 4)
        H = self.feature_extractor_part2(H)
        A = self.attention(H)
        A = torch.transpose(A, 1, 0)
        A = F.softmax(A, dim=1)
        M = torch.mm(A, H)
        Y_prob = self.classifier(M)
        Y_hat = torch.ge(Y_prob, 0.5).float()
        return Y_prob, Y_hat, A
```
What I have tried:
I have tried changing the dimensions of the convolutional layers and the learning rate, without success.
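For what it's worth, the "instantly 0 or 1" behavior would be consistent with the classifier's pre-sigmoid logit being very large in magnitude, since float32 sigmoid saturates to exactly 0 or 1 well before overflow. A minimal illustration (the logit value here is made up):

```python
import torch

# A large-magnitude pre-sigmoid logit saturates completely in float32:
logit = torch.tensor([200.0])
print(torch.sigmoid(logit))   # tensor([1.])
print(torch.sigmoid(-logit))  # tensor([0.])
```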