I am new to PyTorch. I want to use ImageNet images to understand how much each input pixel contributes to the prediction, based on the gradient of the loss with respect to the input. For this, I am trying to construct attention (saliency) maps for my images. However, while doing so, I am encountering the following error:

    <ipython-input-64-08560ac86bab>:2: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
      images_tensor = torch.tensor(images, requires_grad=True)
    <ipython-input-64-08560ac86bab>:3: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
      labels_tensor = torch.tensor(labels)
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-65-49bfbb2b28f0> in <cell line: 20>()
         18     plt.show()
         19 
    ---> 20 show_attention_maps(X, y)

    9 frames
    /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in batch_norm(input, running_mean, running_var, weight, bias, training, momentum, eps)
       2480         _verify_batch_size(input.size())
       2481 
    -> 2482     return torch.batch_norm(
       2483         input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
       2484     )

    RuntimeError: running_mean should contain 1 elements not 64


I have tried changing the image size in preprocessing and switching the model from resnet18 to resnet152. My understanding from the research I have done is that the batch norm after the first conv layer is receiving an input with only 1 channel, while its running_mean still has 64 elements. I am not sure how that can be changed.
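As a sanity check (just a sketch, assuming the standard torchvision ResNet layout where conv1 feeds straight into bn1), comparing the output channels of my replacement conv with bn1's num_features seems to show the mismatch:

    import torch
    import torch.nn as nn

    model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)
    print(model.conv1.out_channels)  # 64 in the pretrained model
    print(model.bn1.num_features)    # 64 -- bn1's running_mean/running_var hold 64 values

    # After my replacement, conv1 outputs a single channel, which no longer
    # matches bn1 and (as far as I understand) triggers the RuntimeError above.
    model.conv1 = nn.Conv2d(15, 1, kernel_size=1, stride=1, padding=112)
    print(model.conv1.out_channels)  # 1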

My code is here: 

    import torch
    import torch.nn as nn
    import matplotlib.pyplot as plt
    from PIL import Image

    model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)
    # Replace the first conv layer (this is the change mentioned in the edit below).
    new_conv1 = nn.Conv2d(15, 1, kernel_size=1, stride=1, padding=112)
    nn.init.constant_(new_conv1.weight, 1)
    model.conv1 = new_conv1
    model.eval()

    for param in model.parameters():
        param.requires_grad = False

    # preprocess, class_names, X and y are defined elsewhere in the notebook;
    # a sketch of the preprocessing is shown after this code block.
    def show_attention_maps(X, y):
        X_tensor = torch.cat([preprocess(Image.fromarray(x)) for x in X], dim=0)
        y_tensor = torch.LongTensor(y)
        attention = compute_attention_maps(X_tensor, y_tensor, model)
        attention = attention.numpy()

        N = X.shape[0]
        for i in range(N):
            plt.subplot(2, N, i + 1)
            plt.imshow(X[i])
            plt.axis('off')
            plt.title(class_names[y[i]])
            plt.subplot(2, N, N + i + 1)
            plt.imshow(attention[i], cmap=plt.cm.gray)
            plt.axis('off')
            plt.gcf().set_size_inches(12, 5)
        plt.suptitle('Attention maps')
        plt.show()

    show_attention_maps(X, y)

    def compute_attention_maps(images, labels, model):
        images_tensor = torch.tensor(images, requires_grad=True)
        labels_tensor = torch.tensor(labels)
        predictions = model(images_tensor.unsqueeze(0))
        criterion = torch.nn.CrossEntropyLoss()
        loss = criterion(predictions, labels_tensor)
        model.zero_grad()
        loss.backward()
        gradients = images_tensor.grad
        attention_maps = torch.mean(gradients.abs(), dim=1)
        return attention_maps
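
For context, `preprocess` (used in `show_attention_maps` above) is roughly the standard ImageNet transform from the torchvision example; the resize/crop sizes are what I have been changing, but it is along these lines:

    from torchvision import transforms

    # Roughly the preprocessing I use (standard ImageNet normalization);
    # the resize/crop sizes are the part I have experimented with.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])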


Thank you very much in advance.

Edit: I changed my question because I was able to solve my previous problem by replacing the resnet's conv1 (the `new_conv1` line in my code above), and I am still trying to compute the attention maps.