1 vote
0 answers
9 views

How does sliding window attention work for Mistral7B model without chunking?

I have a very simple tokenizer like this: %%time tokenizer = Tokenizer(models.BPE(byte_fallback=True)) trainer = trainers.BpeTrainer(vocab_size=vocab_size, special_tokens=["<pad>", ...
Jonathan's user avatar
  • 1,936
0 votes
0 answers
9 views

Recommending a pre-trained NER model for geospatial entities

I am trying to find the best pre-trained Hugging Face Transformer model dedicated exclusively to geospatial or location entities, to extract location entities in English from a text. Does it work way ...
Amir's user avatar
  • 1
0 votes
1 answer
264 views

Cannot import name 'EncoderDecoderCache' from 'transformers'

When I ran the train-4stage.sh file in LLaVolta's repo, it reported the error Cannot import name 'EncoderDecoderCache' from 'transformers'. Since there is no solution on the Internet, anyone ...
marti Shi's user avatar
0 votes
0 answers
15 views

Importing the util library failed

I am trying to run the pip install bertopic command to install and use the BERTopic model. Here is my next code: from bertopic import BERTopic topic_model = BERTopic.load("MaartenGr/BERTopic_Wikipedia...
dato's user avatar
  • 277
0 votes
0 answers
15 views

Kernel dies when I run: dataset = Dataset.from_dict(data_dict)

I am fine-tuning the SAM model on my dataset containing train_images and train_masks. I am able to create the dict, but when calling the last command, i.e. loading the dataset from the dict, the kernel dies. It happened ...
Sanju 's user avatar
0 votes
0 answers
28 views

torch.OutOfMemoryError: CUDA out of memory - Training Donut Model with GeForce RTX 3060 GPU

I am trying to train a Hugging Face model locally with my GPU, which has 12 GB of memory. Every time I run the code: # Fine-tune the model training_args = Seq2SeqTrainingArguments( output_dir="...
Jacob Narayan's user avatar
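
A minimal sketch of the usual memory levers for the question above, assuming a Seq2SeqTrainer setup like the one quoted; per_device_train_batch_size, gradient_accumulation_steps, fp16 and gradient_checkpointing are standard Seq2SeqTrainingArguments fields, and the output_dir value is a placeholder:

    from transformers import Seq2SeqTrainingArguments

    # Trade batch size for gradient accumulation and cut activation memory
    # with mixed precision and gradient checkpointing on a 12 GB GPU.
    training_args = Seq2SeqTrainingArguments(
        output_dir="./donut-finetune",     # placeholder path
        per_device_train_batch_size=1,     # smallest per-step batch
        gradient_accumulation_steps=8,     # keeps the effective batch size at 8
        fp16=True,                         # mixed precision for activations and gradients
        gradient_checkpointing=True,       # recompute activations in the backward pass
        predict_with_generate=True,
    )
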
0 votes
0 answers
22 views

Unhashable type when calling HuggingFace topic model `topic_labels_` function

If I try to follow the topic modeling tutorial at https://huggingface.co/docs/hub/en/bertopic, the first few lines give me an error: from bertopic import BERTopic topic_model = BERTopic.load("...
coolhand's user avatar
  • 2,049
0 votes
0 answers
30 views

RuntimeError: chunk expects at least a 1-dimensional tensor while fine-tuning Llama using transformers

I'm fine-tuning a Llama-3.2-3B-Instruct model with a custom dataset. The training script works on one GPU (running out of memory, which is possible), but fails with RuntimeError: chunk expects at least a 1-...
majTheHero's user avatar
0 votes
0 answers
25 views

Why does getModelJSON on transformer.js throw an error?

I'm using transformer.js within my Angular app. Locally I don't get any errors, but on deployment I get the following error: dialog-agent.component.ts:39 ERROR SyntaxError: Unexpected token '<', ...
Ero Stefano's user avatar
-3 votes
0 answers
23 views

Why does transformer.js throw an error with Angular on Firebase? [closed]

Today I added transformer.js to my Angular web app. Locally it works fine, but the app deployed on Firebase throws an error: dialog-agent.component.ts:39 ERROR SyntaxError: Unexpected token '<', ...
Ero Stefano's user avatar
1 vote
0 answers
27 views

How to resolve the meta-3b-instruct auth error while executing a web app on Streamlit Cloud using GitHub?

I have been building an app in Streamlit Cloud which uses a GitHub repo to execute code. Now I am using a Hugging Face model in the code. I have the API key and permission granted for meta-3b-instruct ...
Yash K's user avatar
  • 11
1 vote
2 answers
51 views

BERT issue: dropout(): argument 'input' (position 1) must be Tensor, not str

I was trying to run some epochs to train my sentiment analysis model; at the very last step, training stopped with the error in the title. I attach the code here: Sentiment classifier: # Build ...
Laura Valentini's user avatar
0 votes
0 answers
18 views

Batch Inference for Llama to compute mean log-probabilities of tokens

I have a dataset of inputs; my goal is to first use them to generate some outputs and then compute the mean log-probabilities of the generated tokens. I am stuck at the first step of trying to do the ...
O Sub Kwon's user avatar
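
A minimal sketch of one way to do the first step for the question above, using generate with output_scores together with compute_transition_scores (both existing transformers APIs); the checkpoint name and prompts are placeholders:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "meta-llama/Llama-3.2-1B-Instruct"    # placeholder checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token          # Llama has no pad token by default
    tokenizer.padding_side = "left"                    # needed for batched decoder-only generation
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16, device_map="auto"
    )

    prompts = ["Placeholder input one.", "Placeholder input two."]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

    with torch.inference_mode():
        out = model.generate(
            **inputs,
            max_new_tokens=64,
            return_dict_in_generate=True,
            output_scores=True,
        )

    # Log-probability of each generated token, one column per generation step.
    scores = model.compute_transition_scores(out.sequences, out.scores, normalize_logits=True)

    # Average only over real generated tokens (ignore padding after an early EOS).
    gen_tokens = out.sequences[:, inputs["input_ids"].shape[1]:]
    mask = gen_tokens.ne(tokenizer.pad_token_id)
    mean_logprob = scores.masked_fill(~mask, 0.0).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    print(mean_logprob)
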
-1 votes
1 answer
42 views

Cannot install llama-index-embeddings-huggingface==0.1.3 because these package versions have conflicting dependencies

I am unable to install the HuggingFace embedding package. Getting the following error: ERROR: Cannot install llama-index-embeddings-huggingface==0.1.3, llama-index-embeddings-huggingface==0.1.4 and llama-index-...
Saurabh Verma's user avatar
0 votes
0 answers
17 views

Why does moving ML model initialization into a function prevent GPU OOM errors when del, gc.collect(), and torch.cuda.empty_cache() fail?

for model_name in model_list: model = LLM(model_name, trust_remote_code=True) results = evaluate_model(model, task) del model gc.collect() torch.cuda.empty_cache() Despite ...
Charlie Parker's user avatar
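
A hedged reading of the question above: anything still referencing the model or its tensors (the loop variable, results, a stored exception traceback) keeps GPU memory alive, so del plus gc.collect() and empty_cache() cannot release it; moving the per-model work into a function lets those locals go out of scope first. A sketch of that pattern, reusing LLM, evaluate_model, model_list and task from the question as given:

    import gc
    import torch

    def run_one(model_name, task):
        # model and any intermediate tensors are locals, so they become
        # unreachable as soon as this function returns.
        model = LLM(model_name, trust_remote_code=True)
        return evaluate_model(model, task)

    for model_name in model_list:
        results = run_one(model_name, task)
        gc.collect()                   # drop unreachable Python objects
        torch.cuda.empty_cache()       # hand cached CUDA blocks back to the driver
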
-1 votes
0 answers
18 views

How to find a mapping between two matrices, where one is of shape [B, n, features] and the other is of shape [B, m, features], using ML/DL models [closed]

I am working on a problem where I need to map one matrix to another. Consider X and Y as follows: X has shape [batch_size, seq_len_1, feature_dim], Y has shape [batch_size, seq_len_2, feature_dim]. Here, ...
AKSHET PATIAL's user avatar
0 votes
0 answers
35 views

CUDA out of memory while using Llama3.1-8B for inference

I have written a simple Python script that uses the HuggingFace transformers library along with torch to run Llama3.1-8B-instruct purely for inference, after feeding in some long-ish bits of text (...
Tom Wagstaff's user avatar
  • 1,668
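
A minimal sketch of the usual memory levers for inference-only use in the question above: half-precision weights, device_map offloading, and no autograd state. The checkpoint name is an assumption based on the question (the repo is gated, so access must already be granted):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-3.1-8B-Instruct"     # assumed checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,    # roughly half the weight memory of fp32
        device_map="auto",             # offload layers to CPU if the GPU is too small
    )

    text = "Long-ish input text ..."                  # placeholder
    inputs = tokenizer(text, return_tensors="pt").to(model.device)

    with torch.inference_mode():                      # no autograd buffers during generation
        out = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
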
2 votes
1 answer
23 views

Stop model.generate

I'm using TextIteratorStreamer to generate text as a stream, and I use a Thread to run model.generate: thread = Thread(target=model.generate, kwargs=generation_kwargs) thread.start() I want to introduce a ...
A.A's user avatar
  • 3,951
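
For the question above, one common pattern is a custom StoppingCriteria that watches a flag the main thread can set; StoppingCriteria and StoppingCriteriaList are existing transformers classes, while model, tokenizer and generation_kwargs are assumed to exist as in the question:

    import threading
    from threading import Thread
    from transformers import StoppingCriteria, StoppingCriteriaList

    stop_event = threading.Event()

    class StopOnEvent(StoppingCriteria):
        # generate() calls this after every new token; returning True stops it.
        def __call__(self, input_ids, scores, **kwargs):
            return stop_event.is_set()

    generation_kwargs["stopping_criteria"] = StoppingCriteriaList([StopOnEvent()])

    thread = Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()

    # Later, e.g. when the user clicks "stop":
    stop_event.set()     # generation ends after the current token
    thread.join()
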
0 votes
0 answers
34 views

How to reverse the tokenizer.apply_chat_template() method and handle streaming responses in Hugging Face?

While working with streaming, I found that it's not possible to use pipeline (at least we need HuggingFacePipeline and LangChain; if I'm wrong, let me know). I'm looking for a way to extract the assistant ...
A.A's user avatar
  • 3,951
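
For the question above, TextIteratorStreamer can already drop the prompt so that only the assistant's newly generated text is streamed (skip_prompt is an existing parameter); model and tokenizer are assumed to be loaded, and the chat content is a placeholder:

    from threading import Thread
    from transformers import TextIteratorStreamer

    messages = [{"role": "user", "content": "Hello!"}]        # placeholder chat
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # skip_prompt=True means the chat-template prefix is never yielded,
    # so the stream contains only the assistant's reply.
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    thread = Thread(
        target=model.generate,
        kwargs=dict(inputs=input_ids, streamer=streamer, max_new_tokens=256),
    )
    thread.start()

    assistant_text = ""
    for chunk in streamer:
        assistant_text += chunk          # arrives incrementally while generating
    thread.join()
    print(assistant_text)
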
0 votes
0 answers
29 views

How to reverse the tokenizer.apply_chat_template()

# Chat template example prompt = [ { "role": "user", "content": "Random prompt."}, ] # Applying chat template prompt = tokenizer.apply_chat_template(chat) ...
A.A's user avatar
  • 3,951
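
apply_chat_template is not meant to be inverted, but for the question above the assistant's reply can be recovered by slicing off the prompt tokens from the generated ids and decoding only the new part; model and tokenizer are assumed to be loaded:

    # Chat prompt as in the question (content is a placeholder).
    chat = [{"role": "user", "content": "Random prompt."}]
    input_ids = tokenizer.apply_chat_template(
        chat, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=128)

    # Everything past the original prompt length is the assistant's reply.
    new_tokens = output_ids[0, input_ids.shape[1]:]
    assistant_reply = tokenizer.decode(new_tokens, skip_special_tokens=True)
    print(assistant_reply)
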
2 votes
1 answer
38 views

Error in getting Captum text explanations for text classification

I have the following code that I am using to identify the most influential words used to correctly predict the text in the test dataset: import pandas as pd import torch from torch.utils.data import ...
Nayantara Jeyaraj's user avatar
0 votes
1 answer
38 views

Unexpected transformers dataset structure after set_transform or with_transform

I am using the feature extractor from ViT as explained here, and noticed a weird behaviour I cannot fully understand. After loading the dataset as in that Colab notebook, I see: ds['train'].features ...
hamagust's user avatar
  • 856
0 votes
1 answer
30 views

Do those `[0]` make sense when making the variable?

The guide for fine-tuning Gemma with the Hugging Face toolset is at https://huggingface.co/blog/gemma-peft. Link to the line: https://huggingface.co/blog/gemma-peft#:~:text=Quote%3A%20%7Bexample-,%5B%...
Dan D.'s user avatar
  • 8,487
0 votes
0 answers
18 views

The curious gap in time cost for QKV computation in LLM inference

I use Nsight Systems to profile the LLM inference process in the Hugging Face Transformers framework. I observe that the time for q_proj, k_proj and v_proj varies significantly. As far as I know, the Q, K ...
CarryPls's user avatar
1 vote
2 answers
120 views

Llama-3.2-1B-Instruct generates inconsistent output

I want to use the Llama-3.2-1B-Instruct model, and although I have set "temperature": 0.0, "top_p": 0.0 and "top_k": 0, it still generates inconsistent output. This is how my ...
parvaneh shayegh's user avatar
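
For the question above: with transformers' generate, temperature/top_p/top_k are sampling parameters and have no effect unless sampling is enabled, so the usual way to get reproducible output is greedy decoding with do_sample=False. A minimal sketch, assuming model, tokenizer and inputs are already set up:

    # Greedy decoding: the argmax token is picked at every step, so the
    # sampling parameters (temperature, top_p, top_k) play no role.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=False,
    )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Minor nondeterminism can still come from non-deterministic GPU kernels, but it is usually far smaller than sampling noise.
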
0 votes
0 answers
34 views

Multi-GPU fine-tuning llama issue. RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0

I am working on a Llama fine-tuning task. When I train on a single GPU, the program runs fine: import os os.environ["CUDA_VISIBLE_DEVICES"] = "0" os.environ["...
bill yao's user avatar
0 votes
1 answer
55 views

How to Log Training Loss at Step Zero in Hugging Face Trainer or SFT Trainer?

I'm using the Hugging Face Trainer (or SFTTrainer) for fine-tuning, and I want to log the training loss at step 0 (before any training steps are executed). I know there's an eval_on_start option for ...
Charlie Parker's user avatar
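
A hedged workaround for the question above: before calling trainer.train(), pull one batch from the trainer's own dataloader and run compute_loss on it manually, which gives a training-loss reading at step 0; get_train_dataloader and compute_loss are existing Trainer methods, and trainer is assumed to be an already constructed Trainer/SFTTrainer:

    import torch

    # One forward pass on a training batch before any optimizer step.
    batch = next(iter(trainer.get_train_dataloader()))
    batch = {k: (v.to(trainer.model.device) if hasattr(v, "to") else v)
             for k, v in batch.items()}

    trainer.model.eval()
    with torch.no_grad():
        step0_loss = trainer.compute_loss(trainer.model, batch)
    print(f"training loss at step 0: {step0_loss.item():.4f}")
    trainer.model.train()

    trainer.train()
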
0 votes
0 answers
20 views

Use BridgeTower/bridgetower-large-itm-mlm-itc on a local laptop

I need help. I am currently studying Multimodal RAG: Chat with Videos. In the course, bridgetower-large-itm-mlm-itc is used via predictionguard. When I want to try it on a local laptop, ...
031 130's user avatar
0 votes
0 answers
34 views

Encoder-decoder transformer model generates a repetitive token as output in text summarization

I implemented a transformer encoder-decoder (Bert2Bert) for a text summarization task. In the training phase the train loss decreases, but in the prediction phase it generates a repetitive token as output, for example [2,...
rasoul mohammadi's user avatar
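
Whatever the training-side cause in the question above, repetition at generation time is often damped with standard decoding options; a minimal sketch, assuming model and tokenizer are the trained Bert2Bert pair and the input text is a placeholder:

    inputs = tokenizer("Article text to summarize ...", return_tensors="pt").to(model.device)

    summary_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        num_beams=4,               # beam search instead of greedy decoding
        no_repeat_ngram_size=3,    # never repeat the same 3-gram
        repetition_penalty=1.2,    # down-weight tokens that were already generated
        early_stopping=True,
    )
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))

If every output still collapses to a single token, it is also worth checking that decoder_start_token_id and pad/eos ids are set on the model config, since a warm-started Bert2Bert does not pick them up automatically.
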
0 votes
0 answers
21 views

How to run inference with a large model on multiple GPUs efficiently?

I'm trying to run inference only with a large 70B-sized model in a multi-GPU environment, but I am facing some issues. The loading time is very long, about 15 minutes. I'm not sure this works properly to shard the model ...
James Jang's user avatar
2 votes
2 answers
237 views

blip2 type mismatch exception

I'm trying to create an image captioning model using the Hugging Face BLIP-2 model on Colab. My code was working fine until last week (Nov 8), but it gives me an exception now. To install packages I use the ...
Soroush Hosseinpour's user avatar
1 vote
2 answers
42 views

Does peft train newly initialized weights?

When using peft to fine-tune a pretrained model, e.g. DistilBert, you need to specify the target_modules. In the case of DistilBert, typically the attention weights are targeted. Example: lora_config = ...
Qdr's user avatar
  • 725
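
For the question above, the short answer in peft's terms: only the injected LoRA matrices are trainable by default, and any freshly initialized layer (such as a classification head) has to be listed in modules_to_save to be trained and saved in full. A minimal sketch for DistilBert sequence classification; the module names match the standard DistilBertForSequenceClassification layout:

    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2
    )

    lora_config = LoraConfig(
        r=8,
        lora_alpha=32,
        target_modules=["q_lin", "v_lin"],                  # attention projections get LoRA adapters
        modules_to_save=["pre_classifier", "classifier"],   # newly initialized head is trained in full
        lora_dropout=0.1,
    )

    peft_model = get_peft_model(model, lora_config)
    peft_model.print_trainable_parameters()   # LoRA parameters plus the saved head
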
0 votes
0 answers
29 views

Memory increases after the Hugging Face generate method

I wanted to run inference with the CodeGemma model from Hugging Face, but when I use the model.generate(**inputs) method, peak GPU memory usage increases from 39 GB to 49 GB, and with the torch profiler no ...
prostak's user avatar
  • 139
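
Part of the increase in the question above is expected: generate builds a KV cache that grows with the prompt length and max_new_tokens. What can be trimmed is autograd state; a minimal sketch, assuming model and inputs as in the question:

    import torch

    with torch.inference_mode():        # no gradient bookkeeping during generation
        output_ids = model.generate(
            **inputs,
            max_new_tokens=256,         # bounds how far the KV cache can grow
        )

    torch.cuda.empty_cache()            # return cached allocator blocks after the call
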
0 votes
0 answers
18 views

Sometimes importing transformers on Windows throws an error "Failed to import transformers.models.clip.processing_clip"

The Transformers lib behaves really strangely. From run to run of my application, the transformers import sometimes throws an exception. As you can see, it mentions that the error happens because of an inability to ...
Alex Panfilkin's user avatar
-1 votes
0 answers
21 views

Custom parameter gradients not propagating in PyTorch

I'm trying to implement model merging for T5-small where I want to learn the merging coefficients during training. I have a reference implementation that works for other models, but when adapting it ...
ZhengJay's user avatar
1 vote
0 answers
40 views

How to Optimize Preprocessing and Post-Processing in DETR-Based Object Detection?

My question: How can I reduce the time spent on preprocessing and post-processing? Background information: I'm implementing object detection on video frames using DETR. My system processes frames from ...
birdalugur's user avatar
2 votes
1 answer
16 views

Methods to reduce a Tensor embedding to x,y,z coordinates

I have a model from Hugging Face and would like to use it for performing word comparisons. At first I thought of performing a series of similarity calculations across words of interest, but quickly I ...
linkey apiacess's user avatar
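
One standard route for the question above is to reduce the embedding matrix to three components with PCA (UMAP or t-SNE are alternatives when local neighbourhood structure matters more). A minimal sketch with scikit-learn; the random matrix stands in for whatever embeddings the Hugging Face model produces:

    import numpy as np
    from sklearn.decomposition import PCA

    # One row per word, columns are the model's hidden dimensions.
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(100, 768))   # placeholder for real embeddings

    pca = PCA(n_components=3)
    coords = pca.fit_transform(embeddings)     # shape (100, 3): x, y, z per word

    print(coords[:5])
    print("explained variance:", pca.explained_variance_ratio_)
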
2 votes
0 answers
39 views

Multimodal cross-attention

I am dealing with two embeddings, text and image; both are the last_hidden_state of transformer models (BERT and ViT), so the shapes are (batch, seq, emd_dim). I want to feed text information to the image using ...
m sh's user avatar
  • 21
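
A minimal sketch of one way to do this for the question above with torch.nn.MultiheadAttention: the image tokens act as queries and the text tokens as keys and values, so text information flows into the image stream. All sizes are placeholders matching the (batch, seq, emb_dim) layout described in the question:

    import torch
    import torch.nn as nn

    batch, txt_len, img_len, dim = 2, 16, 197, 768     # placeholder sizes

    text_emb = torch.randn(batch, txt_len, dim)        # e.g. BERT last_hidden_state
    image_emb = torch.randn(batch, img_len, dim)       # e.g. ViT last_hidden_state

    cross_attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)

    # Image tokens attend over text tokens: query=image, key/value=text.
    fused, attn_weights = cross_attn(query=image_emb, key=text_emb, value=text_emb)

    # Residual connection keeps the original image features.
    image_with_text = image_emb + fused
    print(image_with_text.shape)                       # (batch, img_len, dim)
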
0 votes
0 answers
55 views

How to Compute Teacher-Forced Accuracy (TFA) for Hugging Face Models While Handling EOS Tokens?

I am trying to compute Teacher-Forced Accuracy (TFA) for Hugging Face models, ensuring the following: EOS Token Handling: The model should be rewarded for predicting the first EOS token. Ignoring ...
Charlie Parker's user avatar
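
A hedged sketch of one way to compute TFA as described above for a causal LM: run a teacher-forced forward pass, compare shifted argmax predictions against the labels, reward the first EOS position, and ignore everything after it. It assumes right-padded batches where the targets are the input_ids themselves; prompt masking and other details would still need to be added:

    import torch

    def teacher_forced_accuracy(model, input_ids, attention_mask, eos_token_id):
        # Teacher forcing: the full target sequence is fed in one forward pass.
        with torch.no_grad():
            logits = model(input_ids=input_ids, attention_mask=attention_mask).logits

        preds = logits[:, :-1, :].argmax(dim=-1)   # position t predicts token t+1
        targets = input_ids[:, 1:]

        valid = attention_mask[:, 1:].bool()       # drop padding positions

        # Keep positions up to and including the first EOS, drop the rest.
        is_eos = targets.eq(eos_token_id)
        eos_before = is_eos.cumsum(dim=1) - is_eos.long()
        valid &= eos_before.eq(0)

        correct = (preds == targets) & valid
        return correct.sum().item() / valid.sum().item()
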
0 votes
0 answers
43 views

Batch forward Hugging Face transformer error

I am trying to perform fine-tuning on a base model with around 5-8 billion parameters. I have a dataset that results from combining the Dolly-15K and alpaca-cleaned datasets. I want to perform a ...
Mario Kroll's user avatar
-1 votes
0 answers
23 views

GPT2: `register_forward_hook` and `output_hidden_state` gave different outputs of an intermediate layer

I want to output the 20th GPT2Block in a GPT2 medium model (24 GPT2Block blocks in total). I have used register_forward_hook and output_hidden_state separately, but they give different results. My ...
FeiYiZhaiMenRen's user avatar
0 votes
0 answers
29 views

Building an Open Pretrained Transformer from Scratch with NumPy

I tried to convert the OPT implementation written by Hugging Face into one written with NumPy, but the results turned out to be very strange, and I don't know what to do. My code: import numpy as np def gelu(x): ...
jia-yu Lee's user avatar
0 votes
0 answers
16 views

While trying to implement QLoRA using the Trainer class, getting a casting error

lora_config=LoraConfig( r=8, lora_alpha=32, target_modules=['q_lin','v_lin'], lora_dropout=0.1, bias='all' ) class distilbertMultiClass(nn.Module): def __init__(self,model,...
Lijin Durairaj's user avatar
0 votes
1 answer
52 views

Cannot load the safetensors Hugging Face model in DJL in Java

I tried a lot; I want to read embeddings from the Jina embeddings model. This is my Java code: public static float[] getTextEmbedding(String text) throws ModelNotFoundException, MalformedModelException, ...
Richard Burkhardt's user avatar
0 votes
0 answers
87 views

AttributeError: 'DistributedDataParallel' object has no attribute 'policy' when saving a PPOTrainer

I am attempting to run a PPO script using Transformers and TRL. However, I encounter an error during the model saving step: Traceback (most recent call last): File "/run/determined/workdir/...
AsiaLootus's user avatar
3 votes
2 answers
54 views

PySpark sentiment analysis invalid output

I am trying to perform sentiment analysis for a use case. Most of the time, it is giving correct results, but in some cases, even positive comments are being marked as negative. How can I fix my code ...
sande's user avatar
  • 654
-1 votes
0 answers
34 views

Spring AI with Pinecone using ONNX embeddings error

I am using Spring AI with Pinecone vector storage with OpenAI embeddings / ONNX embeddings; in both cases I got the same issue. I referred to these documentation pages to implement things. Referred ...
mahes waran's user avatar
0 votes
1 answer
66 views

Error ("bus error") running the simplest example on Hugging Face Transformers Pipeline (Macos M1)

I'm trying to follow the quick tour example here: https://huggingface.co/docs/transformers/quicktour and I'm getting a "bus error". My env is: macOS Sonoma 14.7, Apple M1 Max chip, Python 3....
Roy Ca's user avatar
  • 491
0 votes
0 answers
29 views

VSCode install error for the Hugging Face relik library

So I really need to use the Python library called relik in VSCode. When I used pip install relik in the terminal, it would install fine, but when I tested it using this code in a cell ...
Anjali G's user avatar
0 votes
0 answers
26 views

Emotion Analysis with bhadresh-savani/bert-base-uncased-emotion

Hope I can get some help here, please! I am trying to run an emotion analysis model from the Hugging Face repo (bhadresh-savani/bert-base-uncased-emotion) and I am struggling with the model run as it's ...
Rita Bini's user avatar
