Hi, I'm trying to use an updated wespeaker voice model like the one shown in the picture, but I can't adapt pyannote/audio/models/embedding/wespeaker/convert.py to it; it raises the following error. What should I change? #1772

Open
LiLiWangzz opened this issue Oct 13, 2024 · 1 comment

Comments

@LiLiWangzz

Hi, I'm trying to use an updated wespeaker voice model like the one shown in the picture. I followed pyannote/audio/models/embedding/wespeaker/convert.py but could not adapt it to this model: loading the checkpoint raises the error below. What do I need to change?

@hbredin
WESPEAKER
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ResNet:
Missing key(s) in state_dict: "conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean", "bn1.running_var", "layer1.0.conv1.weight", "layer1.0.bn1.weight", "layer1.0.bn1.bias", "layer1.0.bn1.running_mean", "layer1.0.bn1.running_var", "layer1.0.conv2.weight", "layer1.0.bn2.weight", "layer1.0.bn2.bias", "layer1.0.bn2.running_mean", "layer1.0.bn2.running_var", "layer1.1.conv1.weight", "layer1.1.bn1.weight", "layer1.1.bn1.bias", "layer1.1.bn1.running_mean", "layer1.1.bn1.running_var", "layer1.1.conv2.weight", "layer1.1.bn2.weight", "layer1.1.bn2.bias", "layer1.1.bn2.running_mean", "layer1.1.bn2.running_var", "layer1.2.conv1.weight", "layer1.2.bn1.weight", "layer1.2.bn1.bias", "layer1.2.bn1.running_mean", "layer1.2.bn1.running_var", "layer1.2.conv2.weight", "layer1.2.bn2.weight", "layer1.2.bn2.bias", "layer1.2.bn2.running_mean", "layer1.2.bn2.running_var", "layer2.0.conv1.weight", "layer2.0.bn1.weight", "layer2.0.bn1.bias", "layer2.0.bn1.running_mean", "layer2.0.bn1.running_var", "layer2.0.conv2.weight", "layer2.0.bn2.weight", "layer2.0.bn2.bias", "layer2.0.bn2.running_mean", "layer2.0.bn2.running_var", "layer2.0.shortcut.0.weight", "layer2.0.shortcut.1.weight", "layer2.0.shortcut.1.bias", "layer2.0.shortcut.1.running_mean", "layer2.0.shortcut.1.running_var", "layer2.1.conv1.weight", "layer2.1.bn1.weight", "layer2.1.bn1.bias", "layer2.1.bn1.running_mean", "layer2.1.bn1.running_var", "layer2.1.conv2.weight", "layer2.1.bn2.weight", "layer2.1.bn2.bias", "layer2.1.bn2.running_mean", "layer2.1.bn2.running_var", "layer2.2.conv1.weight", "layer2.2.bn1.weight", "layer2.2.bn1.bias", "layer2.2.bn1.running_mean", "layer2.2.bn1.running_var", "layer2.2.conv2.weight", "layer2.2.bn2.weight", "layer2.2.bn2.bias", "layer2.2.bn2.running_mean", "layer2.2.bn2.running_var", "layer2.3.conv1.weight", "layer2.3.bn1.weight", "layer2.3.bn1.bias", "layer2.3.bn1.running_mean", "layer2.3.bn1.running_var", "layer2.3.conv2.weight", "layer2.3.bn2.weight", "layer2.3.bn2.bias", 
"layer2.3.bn2.running_mean", "layer2.3.bn2.running_var", "layer3.0.conv1.weight", "layer3.0.bn1.weight", "layer3.0.bn1.bias", "layer3.0.bn1.running_mean", "layer3.0.bn1.running_var", "layer3.0.conv2.weight", "layer3.0.bn2.weight", "layer3.0.bn2.bias", "layer3.0.bn2.running_mean", "layer3.0.bn2.running_var", "layer3.0.shortcut.0.weight", "layer3.0.shortcut.1.weight", "layer3.0.shortcut.1.bias", "layer3.0.shortcut.1.running_mean", "layer3.0.shortcut.1.running_var", "layer3.1.conv1.weight", "layer3.1.bn1.weight", "layer3.1.bn1.bias", "layer3.1.bn1.running_mean", "layer3.1.bn1.running_var", "layer3.1.conv2.weight", "layer3.1.bn2.weight", "layer3.1.bn2.bias", "layer3.1.bn2.running_mean", "layer3.1.bn2.running_var", "layer3.2.conv1.weight", "layer3.2.bn1.weight", "layer3.2.bn1.bias", "layer3.2.bn1.running_mean", "layer3.2.bn1.running_var", "layer3.2.conv2.weight", "layer3.2.bn2.weight", "layer3.2.bn2.bias", "layer3.2.bn2.running_mean", "layer3.2.bn2.running_var", "layer3.3.conv1.weight", "layer3.3.bn1.weight", "layer3.3.bn1.bias", "layer3.3.bn1.running_mean", "layer3.3.bn1.running_var", "layer3.3.conv2.weight", "layer3.3.bn2.weight", "layer3.3.bn2.bias", "layer3.3.bn2.running_mean", "layer3.3.bn2.running_var", "layer3.4.conv1.weight", "layer3.4.bn1.weight", "layer3.4.bn1.bias", "layer3.4.bn1.running_mean", "layer3.4.bn1.running_var", "layer3.4.conv2.weight", "layer3.4.bn2.weight", "layer3.4.bn2.bias", "layer3.4.bn2.running_mean", "layer3.4.bn2.running_var", "layer3.5.conv1.weight", "layer3.5.bn1.weight", "layer3.5.bn1.bias", "layer3.5.bn1.running_mean", "layer3.5.bn1.running_var", "layer3.5.conv2.weight", "layer3.5.bn2.weight", "layer3.5.bn2.bias", "layer3.5.bn2.running_mean", "layer3.5.bn2.running_var", "layer4.0.conv1.weight", "layer4.0.bn1.weight", "layer4.0.bn1.bias", "layer4.0.bn1.running_mean", "layer4.0.bn1.running_var", "layer4.0.conv2.weight", "layer4.0.bn2.weight", "layer4.0.bn2.bias", "layer4.0.bn2.running_mean", "layer4.0.bn2.running_var", 
"layer4.0.shortcut.0.weight", "layer4.0.shortcut.1.weight", "layer4.0.shortcut.1.bias", "layer4.0.shortcut.1.running_mean", "layer4.0.shortcut.1.running_var", "layer4.1.conv1.weight", "layer4.1.bn1.weight", "layer4.1.bn1.bias", "layer4.1.bn1.running_mean", "layer4.1.bn1.running_var", "layer4.1.conv2.weight", "layer4.1.bn2.weight", "layer4.1.bn2.bias", "layer4.1.bn2.running_mean", "layer4.1.bn2.running_var", "layer4.2.conv1.weight", "layer4.2.bn1.weight", "layer4.2.bn1.bias", "layer4.2.bn1.running_mean", "layer4.2.bn1.running_var", "layer4.2.conv2.weight", "layer4.2.bn2.weight", "layer4.2.bn2.bias", "layer4.2.bn2.running_mean", "layer4.2.bn2.running_var", "seg_1.weight", "seg_1.bias".
Unexpected key(s) in state_dict: "front.conv1.weight", "front.bn1.weight", "front.bn1.bias", "front.bn1.running_mean", "front.bn1.running_var", "front.bn1.num_batches_tracked", "front.layer1.0.conv1.weight", "front.layer1.0.bn1.weight", "front.layer1.0.bn1.bias", "front.layer1.0.bn1.running_mean", "front.layer1.0.bn1.running_var", "front.layer1.0.bn1.num_batches_tracked", "front.layer1.0.conv2.weight", "front.layer1.0.bn2.weight", "front.layer1.0.bn2.bias", "front.layer1.0.bn2.running_mean", "front.layer1.0.bn2.running_var", "front.layer1.0.bn2.num_batches_tracked", "front.layer1.1.conv1.weight", "front.layer1.1.bn1.weight", "front.layer1.1.bn1.bias", "front.layer1.1.bn1.running_mean", "front.layer1.1.bn1.running_var", "front.layer1.1.bn1.num_batches_tracked", "front.layer1.1.conv2.weight", "front.layer1.1.bn2.weight", "front.layer1.1.bn2.bias", "front.layer1.1.bn2.running_mean", "front.layer1.1.bn2.running_var", "front.layer1.1.bn2.num_batches_tracked", "front.layer1.2.conv1.weight", "front.layer1.2.bn1.weight", "front.layer1.2.bn1.bias", "front.layer1.2.bn1.running_mean", "front.layer1.2.bn1.running_var", "front.layer1.2.bn1.num_batches_tracked", "front.layer1.2.conv2.weight", "front.layer1.2.bn2.weight", "front.layer1.2.bn2.bias", "front.layer1.2.bn2.running_mean", "front.layer1.2.bn2.running_var", "front.layer1.2.bn2.num_batches_tracked", "front.layer2.0.conv1.weight", "front.layer2.0.bn1.weight", "front.layer2.0.bn1.bias", "front.layer2.0.bn1.running_mean", "front.layer2.0.bn1.running_var", "front.layer2.0.bn1.num_batches_tracked", "front.layer2.0.conv2.weight", "front.layer2.0.bn2.weight", "front.layer2.0.bn2.bias", "front.layer2.0.bn2.running_mean", "front.layer2.0.bn2.running_var", "front.layer2.0.bn2.num_batches_tracked", "front.layer2.0.downsample.0.weight", "front.layer2.0.downsample.1.weight", "front.layer2.0.downsample.1.bias", "front.layer2.0.downsample.1.running_mean", "front.layer2.0.downsample.1.running_var", 
"front.layer2.0.downsample.1.num_batches_tracked", "front.layer2.1.conv1.weight", "front.layer2.1.bn1.weight", "front.layer2.1.bn1.bias", "front.layer2.1.bn1.running_mean", "front.layer2.1.bn1.running_var", "front.layer2.1.bn1.num_batches_tracked", "front.layer2.1.conv2.weight", "front.layer2.1.bn2.weight", "front.layer2.1.bn2.bias", "front.layer2.1.bn2.running_mean", "front.layer2.1.bn2.running_var", "front.layer2.1.bn2.num_batches_tracked", "front.layer2.2.conv1.weight", "front.layer2.2.bn1.weight", "front.layer2.2.bn1.bias", "front.layer2.2.bn1.running_mean", "front.layer2.2.bn1.running_var", "front.layer2.2.bn1.num_batches_tracked", "front.layer2.2.conv2.weight", "front.layer2.2.bn2.weight", "front.layer2.2.bn2.bias", "front.layer2.2.bn2.running_mean", "front.layer2.2.bn2.running_var", "front.layer2.2.bn2.num_batches_tracked", "front.layer2.3.conv1.weight", "front.layer2.3.bn1.weight", "front.layer2.3.bn1.bias", "front.layer2.3.bn1.running_mean", "front.layer2.3.bn1.running_var", "front.layer2.3.bn1.num_batches_tracked", "front.layer2.3.conv2.weight", "front.layer2.3.bn2.weight", "front.layer2.3.bn2.bias", "front.layer2.3.bn2.running_mean", "front.layer2.3.bn2.running_var", "front.layer2.3.bn2.num_batches_tracked", "front.layer3.0.conv1.weight", "front.layer3.0.bn1.weight", "front.layer3.0.bn1.bias", "front.layer3.0.bn1.running_mean", "front.layer3.0.bn1.running_var", "front.layer3.0.bn1.num_batches_tracked", "front.layer3.0.conv2.weight", "front.layer3.0.bn2.weight", "front.layer3.0.bn2.bias", "front.layer3.0.bn2.running_mean", "front.layer3.0.bn2.running_var", "front.layer3.0.bn2.num_batches_tracked", "front.layer3.0.downsample.0.weight", "front.layer3.0.downsample.1.weight", "front.layer3.0.downsample.1.bias", "front.layer3.0.downsample.1.running_mean", "front.layer3.0.downsample.1.running_var", "front.layer3.0.downsample.1.num_batches_tracked", "front.layer3.1.conv1.weight", "front.layer3.1.bn1.weight", "front.layer3.1.bn1.bias", 
"front.layer3.1.bn1.running_mean", "front.layer3.1.bn1.running_var", "front.layer3.1.bn1.num_batches_tracked", "front.layer3.1.conv2.weight", "front.layer3.1.bn2.weight", "front.layer3.1.bn2.bias", "front.layer3.1.bn2.running_mean", "front.layer3.1.bn2.running_var", "front.layer3.1.bn2.num_batches_tracked", "front.layer3.2.conv1.weight", "front.layer3.2.bn1.weight", "front.layer3.2.bn1.bias", "front.layer3.2.bn1.running_mean", "front.layer3.2.bn1.running_var", "front.layer3.2.bn1.num_batches_tracked", "front.layer3.2.conv2.weight", "front.layer3.2.bn2.weight", "front.layer3.2.bn2.bias", "front.layer3.2.bn2.running_mean", "front.layer3.2.bn2.running_var", "front.layer3.2.bn2.num_batches_tracked", "front.layer3.3.conv1.weight", "front.layer3.3.bn1.weight", "front.layer3.3.bn1.bias", "front.layer3.3.bn1.running_mean", "front.layer3.3.bn1.running_var", "front.layer3.3.bn1.num_batches_tracked", "front.layer3.3.conv2.weight", "front.layer3.3.bn2.weight", "front.layer3.3.bn2.bias", "front.layer3.3.bn2.running_mean", "front.layer3.3.bn2.running_var", "front.layer3.3.bn2.num_batches_tracked", "front.layer3.4.conv1.weight", "front.layer3.4.bn1.weight", "front.layer3.4.bn1.bias", "front.layer3.4.bn1.running_mean", "front.layer3.4.bn1.running_var", "front.layer3.4.bn1.num_batches_tracked", "front.layer3.4.conv2.weight", "front.layer3.4.bn2.weight", "front.layer3.4.bn2.bias", "front.layer3.4.bn2.running_mean", "front.layer3.4.bn2.running_var", "front.layer3.4.bn2.num_batches_tracked", "front.layer3.5.conv1.weight", "front.layer3.5.bn1.weight", "front.layer3.5.bn1.bias", "front.layer3.5.bn1.running_mean", "front.layer3.5.bn1.running_var", "front.layer3.5.bn1.num_batches_tracked", "front.layer3.5.conv2.weight", "front.layer3.5.bn2.weight", "front.layer3.5.bn2.bias", "front.layer3.5.bn2.running_mean", "front.layer3.5.bn2.running_var", "front.layer3.5.bn2.num_batches_tracked", "front.layer4.0.conv1.weight", "front.layer4.0.bn1.weight", "front.layer4.0.bn1.bias", 
"front.layer4.0.bn1.running_mean", "front.layer4.0.bn1.running_var", "front.layer4.0.bn1.num_batches_tracked", "front.layer4.0.conv2.weight", "front.layer4.0.bn2.weight", "front.layer4.0.bn2.bias", "front.layer4.0.bn2.running_mean", "front.layer4.0.bn2.running_var", "front.layer4.0.bn2.num_batches_tracked", "front.layer4.0.downsample.0.weight", "front.layer4.0.downsample.1.weight", "front.layer4.0.downsample.1.bias", "front.layer4.0.downsample.1.running_mean", "front.layer4.0.downsample.1.running_var", "front.layer4.0.downsample.1.num_batches_tracked", "front.layer4.1.conv1.weight", "front.layer4.1.bn1.weight", "front.layer4.1.bn1.bias", "front.layer4.1.bn1.running_mean", "front.layer4.1.bn1.running_var", "front.layer4.1.bn1.num_batches_tracked", "front.layer4.1.conv2.weight", "front.layer4.1.bn2.weight", "front.layer4.1.bn2.bias", "front.layer4.1.bn2.running_mean", "front.layer4.1.bn2.running_var", "front.layer4.1.bn2.num_batches_tracked", "front.layer4.2.conv1.weight", "front.layer4.2.bn1.weight", "front.layer4.2.bn1.bias", "front.layer4.2.bn1.running_mean", "front.layer4.2.bn1.running_var", "front.layer4.2.bn1.num_batches_tracked", "front.layer4.2.conv2.weight", "front.layer4.2.bn2.weight", "front.layer4.2.bn2.bias", "front.layer4.2.bn2.running_mean", "front.layer4.2.bn2.running_var", "front.layer4.2.bn2.num_batches_tracked", "pooling.attention.0.weight", "pooling.attention.0.bias", "pooling.attention.2.weight", "pooling.attention.2.bias", "pooling.attention.2.running_mean", "pooling.attention.2.running_var", "pooling.attention.2.num_batches_tracked", "pooling.attention.3.weight", "pooling.attention.3.bias", "bottleneck.weight", "bottleneck.bias".

Originally posted by @LiLiWangzz in #1590 (comment)

@clement-pages
Collaborator

Hey @LiLiWangzz, pyannote/audio/models/embedding/wespeaker/convert.py is not intended for that. Furthermore, SimAMResNetxx is not currently supported by pyannote, but feel free to open a pull request.
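
For anyone hitting the same error, a quick way to see how far apart the two models are is to group the missing and unexpected keys by their top-level prefix. The sketch below is only an illustration: `summarize_key_mismatch` is a hypothetical helper, not part of pyannote or WeSpeaker, and the key lists are a small sample copied from the error message above. In practice the two lists would presumably come from `model.state_dict().keys()` and from the keys of the loaded checkpoint.

```python
from collections import Counter


def summarize_key_mismatch(model_keys, checkpoint_keys, depth=1):
    """Count missing and unexpected parameter names, grouped by prefix.

    Hypothetical helper: `depth` is the number of dot-separated
    components kept when grouping (1 keeps only the top-level module).
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)

    def by_prefix(keys):
        return Counter(".".join(key.split(".")[:depth]) for key in keys)

    return {
        # Expected by the model but absent from the checkpoint.
        "missing": by_prefix(model_keys - checkpoint_keys),
        # Present in the checkpoint but unknown to the model.
        "unexpected": by_prefix(checkpoint_keys - model_keys),
    }


# A few keys sampled from the error message above.
model_keys = [
    "conv1.weight",
    "bn1.weight",
    "layer2.0.shortcut.0.weight",
    "seg_1.weight",
    "seg_1.bias",
]
checkpoint_keys = [
    "front.conv1.weight",
    "front.bn1.weight",
    "front.layer2.0.downsample.0.weight",
    "pooling.attention.0.weight",
    "bottleneck.weight",
    "bottleneck.bias",
]

summary = summarize_key_mismatch(model_keys, checkpoint_keys)
for kind, counts in summary.items():
    print(kind, dict(counts))
```

On the full error above, this kind of grouping shows that the checkpoint stores its backbone under `front.`, uses `downsample` where the pyannote ResNet expects `shortcut`, and adds `pooling.attention.*` and `bottleneck.*` parameters, while the pyannote ResNet expects `seg_1.*`. Renaming keys alone is therefore unlikely to be enough, which is consistent with the architecture not being supported yet.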
