Revision by sindhuja: added 2990 characters in body

<ipython-input-64-08560ac86bab>:2: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  images_tensor = torch.tensor(images, requires_grad=True)
<ipython-input-64-08560ac86bab>:3: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  labels_tensor = torch.tensor(labels)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-65-49bfbb2b28f0> in <cell line: 20>()
 18     plt.show()
 19 
---> 20 show_attention_maps(X, y)

9 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in batch_norm(input, running_mean, running_var, weight, bias, training, momentum, eps)
   2480         _verify_batch_size(input.size())
   2481 
-> 2482     return torch.batch_norm(
   2483         input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
   2484     )

RuntimeError: running_mean should contain 1 elements not 64

I have tried changing the image size in preprocessing and switching the model from resnet18 to resnet152. My understanding from the research I have done is that the batch norm in the first layer now expects running statistics with 1 element, but it has 64, and I am not sure how that can be changed.
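For context, my assumption (based on the standard torchvision resnet18 layout) is that conv1 is immediately followed by bn1, whose running statistics were created for conv1's original 64 output channels, so a replacement conv1 that outputs only 1 channel no longer matches it. A rough sketch of what I mean (not from my notebook, just to illustrate the mismatch):

import torch
import torch.nn as nn

model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)

# bn1 was built for conv1's original 64 output channels
print(model.bn1.running_mean.shape)  # torch.Size([64])

# A conv1 with out_channels=1 feeds bn1 a 1-channel input, which is why the
# error says "running_mean should contain 1 elements not 64".
# One direction I am unsure about (untested): keep 64 output channels so the
# rest of the pretrained network still lines up.
model.conv1 = nn.Conv2d(15, 64, kernel_size=7, stride=2, padding=3, bias=False)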

My code is here:

model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)
import torch.nn as nn
new_conv1 = nn.Conv2d(15, 1, kernel_size=1, stride=1, padding=112)     
nn.init.constant_(new_conv1.weight, 1)
model.conv1 = new_conv1
model.eval()

for param in model.parameters():
    param.requires_grad = False

def show_attention_maps(X, y):
    X_tensor = torch.cat([preprocess(Image.fromarray(x)) for x in X], dim=0)
    y_tensor = torch.LongTensor(y)
    attention = compute_attention_maps(X_tensor, y_tensor, model)
    attention = attention.numpy()

    N = X.shape[0]
    for i in range(N):
        plt.subplot(2, N, i + 1)
        plt.imshow(X[i])
        plt.axis('off')
        plt.title(class_names[y[i]])
        plt.subplot(2, N, N + i + 1)
        plt.imshow(attention[i], cmap=plt.cm.gray)
        plt.axis('off')
        plt.gcf().set_size_inches(12, 5)
    plt.suptitle('Attention maps')
    plt.show()

show_attention_maps(X, y)

def compute_attention_maps(images, labels, model):
    images_tensor = torch.tensor(images, requires_grad=True)
    labels_tensor = torch.tensor(labels)
    predictions = model(images_tensor.unsqueeze(0))
    criterion = torch.nn.CrossEntropyLoss()
    loss = criterion(predictions, labels_tensor)
    model.zero_grad()
    loss.backward()
    gradients = images_tensor.grad
    attention_maps = torch.mean(gradients.abs(), dim=1)
    return attention_maps
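For reference, the copy pattern that the UserWarning above recommends would replace the first two lines of this function with something roughly like the following (just a sketch, assuming images is already a float tensor and labels a LongTensor, as they are when called from show_attention_maps):

    # clone/detach instead of torch.tensor(), as the UserWarning suggests
    images_tensor = images.clone().detach().requires_grad_(True)
    labels_tensor = labels.clone().detach()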

Thank you very much in advance.

Edit: I changed my question because I was able to solve my previous problem by changing the ResNet's conv1 (line 3 of the code above), and I am still trying to compute attention maps.

Revision by Christoph Rackwitz: edited tags

Original question by sindhuja:

attention map for an image

I am new to PyTorch. I want to use ImageNet images to understand how much each pixel contributes to the gradient. For this, I am trying to construct attention maps for my images. However, while doing so, I am encountering the following error:

RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[1, 15, 224, 224] to have 3 channels, but got 15 channels instead

My code is here: https://colab.research.google.com/drive/1oRSbYDxIsUIssFGs3iXzvKw3ziK1UHOl#scrollTo=ysoteG6DvsnR

Is there anything fundamental I am missing here? Do I have to do additional steps in preprocessing? Any help would be greatly appreciated and would help me learn.

Thank you very much in advance.
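For what it's worth, the [1, 15, 224, 224] shape in the error looks like five 3-channel images concatenated along the channel dimension. My rough understanding (a sketch, assuming preprocess returns a 3x224x224 tensor per image) of how a [N, 3, 224, 224] batch would be built instead:

# Assumption: preprocess() returns a 3x224x224 tensor for each image.
# torch.cat(..., dim=0) glues the channel dimensions together (5 * 3 = 15),
# while torch.stack(..., dim=0) adds a new batch dimension instead.
X_tensor = torch.stack([preprocess(Image.fromarray(x)) for x in X], dim=0)
print(X_tensor.shape)  # e.g. torch.Size([5, 3, 224, 224]) for 5 images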