Skip to main content
Filter by
Sorted by
Tagged with
0 votes
0 answers
23 views

How to get the Information Content (IC) of a sense in WordNet?

I want to get the Lin similarity between two senses using the formula and check the result using NLTK's WordNet APIs. According to the NLTK WordNet Documentation synset1.lin_similarity(synset2, ic): ...
Vic's user avatar
  • 1
-2 votes
1 answer
38 views

Hybridized collaborative filtering and sentence similarity-based system for doctor recommendation based on user input of symptoms and location

I'm trying to solve a problem of recommending a doctor based on a user's symptoms and location using a hybridized collaborative filtering and sentence similarity-based recommender system that follow ...
Sadura Akinrinwa's user avatar
0 votes
1 answer
94 views

Different results for the same epoch using different number of total epochs

I am training a Machine Learning model for STS task using the Sentence Transformers library. When I was testing it, I noticed that my model generated different results for the same number of epochs ...
Hígor Hahn's user avatar
0 votes
0 answers
31 views

System.AccessViolationException: Attempted to read or write protected memory at TorchSharp.PInvoke.LibTorchSharp.THSGenerator_manual_seed

I was trying to reproduce the example of ML.Net: machinelearning-samples/samples/csharp/getting-started/MLNET2/SentenceSimilarity and I got the following error : Fatal error. System....
CBal's user avatar
  • 31
0 votes
0 answers
50 views

Matching strings with high similarity using Sentence Similarity NLP

So, currently I'm have a list of vectors in my database, and I'm getting data from an API, from that API I loop over each of strings provided converting it to a vector & matching the most similar ...
Xyeut's user avatar
  • 1
2 votes
1 answer
211 views

Find exact and similar phrases in a string with millions of characters

I have a list of phrases and a corpus which is a string of text with millions of words. For each phrase in my phrase list, I want to find and record the most similar phrases found in the corpus-string....
98fly's user avatar
  • 41
0 votes
1 answer
353 views

Roberta for sentence similarity

I have a pre trained roberta model. And have a dataset having two sentence pairs with a label that indicated whether the sentence pair is similar or not. I want to use that roberta model to do that. I ...
pycoder's user avatar
0 votes
0 answers
23 views

implementation of a search for similar documents

I want to implement a method for searching similar documents that will identify semantic links but I have no ideas at all I would be glad if you could give me some advice on what to read or give me an ...
user412026's user avatar
1 vote
0 answers
105 views

What is the standard threshold value that is best for accuracy when employing Euclidean distance as a metric for gauging textual similarity?

I'm using Euclidean distance as a metric to compare two sentences for similarity while clustering them using my custom incremental KMeans algorithm. The current threshold value I'm using is 0.7 which ...
sanjay M's user avatar
1 vote
2 answers
107 views

How to detect if two sentences are simmilar, not in meaning, but in syllables/words?

Here are some examples of the types of sentences that need to be considered "similar" there was a most extraordinary noise going on shrinking rapidly she soon made out there was a most ...
BLOCKCRAFT 2.0's user avatar
3 votes
1 answer
326 views

Batched BM25 search in PySpark

I have a large dataset of documents (average length of 35 words). I want to find the top k nearest neighbors of all these documents by using BM25. Every document needs to be compared with every other ...
theodre7's user avatar
  • 108
3 votes
2 answers
6k views

Searching existing ChromaDB database using cosine similarity

I have a preexisting database with around 15 PDFs stored. I want to be able to search the database so that I'm getting the X most relevant results back given a certain threshold using cosine ...
Yash Sharma's user avatar
1 vote
1 answer
245 views

Sentence Similarity between a phrase with 2-3 words and documents with multiple sentences

What I want to achieve: I have thousands of documents (descriptions of incidents) and I would like to find the documents which match a phrase or are similar to the words in the phrase. An example, for ...
Naveen Reddy Marthala's user avatar
0 votes
0 answers
62 views

indexing does not speed up retrival of numpy array from sqlite3

I am trying to store embedding as numpy array in sqlite3. Sqlite3 handles numpy array as blob.I am handling approximately 30 million data. I am able to successfully store the data and retrieve it as ...
saurav's user avatar
  • 1
1 vote
0 answers
90 views

Filtering Documents Using Word Embeddings: Keep Job Postings, Exclude Resumes

I have a DataFrame containing a column of various documents, and I'm trying to filter out documents that resemble resumes while keeping job postings. To achieve this, I've utilized a CSV file provided ...
Banafshe khazali's user avatar
2 votes
1 answer
130 views

String Similarity for all possible combination in Optimised fashion

I am facing a problem while finding string similarity. Scenario: The string which consisits of following fields first_name, middle_name and last_name What I have do is to find string similarity ...
Akhilesh mahajan's user avatar
0 votes
0 answers
231 views

Facing accuracy issue with sentence transformers

I'm facing issue in sentence similarity while using sentence transformers with cosine metric. I'm comparing transcribed audio text with the predefined set of sentences. Even when the the whole ...
Niceboy's user avatar
  • 11
1 vote
1 answer
2k views

What is the best distance measure to use when doing semantic search on the embeddings generated by sentence transformers?

I understand there are many distance measures to calculate the distance between two vectors (embeddings). However which one is the best when comparing two vectors for semantic similarity that have ...
morpheus's user avatar
  • 20.2k
0 votes
0 answers
58 views

String Match using Fuzzy Lookup in Excel

I am trying to use Fuzzy Lookup to match two strings in two columns of a table that looks like below. Table1 Table2 | Column A | Column B | | -------- | -------- | | Flower.com |...
Chitwan 's user avatar
1 vote
1 answer
4k views

Hardware requirements for using sentence-transformers/all-MiniLM-L6-v2

Can someone please advise me upon the hardware requirements of using sentence-transformers/all-MiniLM-L6-v2 for a semantic similarity use-case. I had downloaded the model locally and am using it to ...
nsb's user avatar
  • 19
2 votes
1 answer
1k views

How to map word level timestamps to text of a given transcript?

I am currently developing a tool to visualize song lyrics. The tool computes the similarity in the phonetics of syllables and assigns a rhyme group to each syllable. Syllables belonging to the same ...
paulpelikan's user avatar
-1 votes
1 answer
230 views

Given a sentence, how can I generate similar sentences of that particular sentence using machine learning or other ways?

I am testing my AI app that ask a question and based on that do some processing. I have to write some tests that even people ask the same question in different way the system will generate same result....
Sal's user avatar
  • 7
0 votes
2 answers
74 views

Shuffle pandas column while avoiding a condition

I have a dataframe that shows 2 sentences are similar. This dataframe has a 3rd relationship column which also contains some strings. This 3rd column shows how similar the texts are. For instance: P ...
trazoM's user avatar
  • 314
0 votes
1 answer
118 views

Clustering generates length mismatch with original data

I'm trying to use this fastclustering.py to do some clustering on textual data. my data is in a dataframe called df['processed_activities']. But I'm getting this error telling me it's a mismatch ...
Moe_blg's user avatar
  • 119
0 votes
2 answers
599 views

Boosting documents with term matches in elasticsearch after cosine similarity

I am using text embeddings stored in elasticsearch to get documents similar to a query. But I noticed that in some cases, I get documents that don't have the words from the query in them with a higher ...
BBloggsbott's user avatar
1 vote
1 answer
795 views

A machine Learning model to find similarities between two words in python

I have 2 lists of words. The first list contains 5 words. The second list contains 1000s of words. I am looking for a ML model that will help me find the best match between the words in the first list ...
Sharhad's user avatar
  • 83
1 vote
0 answers
133 views

How can I fine tune sentence transfomer without any labels?

I only have product descriptions and nothing else. I need to match similar products using cosine similarity. I have achieved this by taking embeddings from the Sentence Transformer. However, I need to ...
Margam Rohith Kumar's user avatar
0 votes
2 answers
434 views

Calculate Cosine Similarity Sentences ValueError: Expected 2D array, got 1D array instead

So I'm doing a cosine similarity calculation on a list of sentences. I've got the embedding of the calculations done. Here's the embedding The shape of embedding (11, 3072) [[-0.02179624 -0.17235152 -...
intodarkmoon's user avatar
-2 votes
1 answer
395 views

Find Similarity on Excel Column - Brand, Product Name and Weight

I'll try to explain you, as far as I can, my new Python Challenge! We have two datasets in Excell for two different Retailer (supermarket) and in each of them there are some information about their ...
Take A few's user avatar
1 vote
0 answers
134 views

How to find Sentence Transformer support languages?

I want to get the sentence embedding results to find the sentence similarities in my NLP project. Since I am working with a low-resource language (Sinhala), I want to know whether any ...
Pamudu's user avatar
  • 61
0 votes
1 answer
136 views

database search page using universal sentence encoder for simantic search

im trying to build a page that uses uneversal sentence encoder modle to search through database 'abstract' attribute and this error is appearing in the browser console enter image description here i ...
dania's user avatar
  • 11
0 votes
0 answers
80 views

How do I implement a model that finds a correlation (not similarity) between query and target sentence?

When building an NLP model (I'm going to use an attention-based one), how can we implement one for finding the correlation, not similarity, between the query and target sentences? For instance, the ...
kemakino's user avatar
  • 1,122
1 vote
0 answers
182 views

Why loss is not decreasing in a Siamese BERT-Network training (Entity matching task)

I'm trying to finetune a model for an entity matching task (kind of a sentence similarity task). The idea is that if I give as input two sentences the model should output if they represent the same ...
pushz's user avatar
  • 21
0 votes
0 answers
495 views

Calculate similarity between sets of keywords in Python

For my project I want to compare to sets of keywords that are stored in lists and obtain a similarity index. An example would look like the following: db_1: list of 5 keywords db_2: list of 10 ...
codexxblack's user avatar
0 votes
1 answer
999 views

Generating embedding for long documents using pre-trained word vectors

I have a set of pre-trained word embeddings from the Wikipedia corpus. I also have 300 dimension embeddings of Wikipedia article pages. I am looking to build a similarity engine by running a simple ...
Steve Nathan's user avatar
-1 votes
1 answer
832 views

How to find similarity score between two rows in a pandas data frame

I want to find the similarity of given sentences between two rows. In my sample data frame: import pandas as pd data = [f'Sent {str(i)}' for i in range(10)] df = pd.DataFrame(data=data, columns=['...
kcats_wolf's user avatar
2 votes
1 answer
79 views

What robust algorithm implementation can I use to perform phrase similarity with two inputs?

This is the problem: I have two columns in my matadata database "field name" and "field description" I need to check if the "field description" is actually a description ...
emichester's user avatar
3 votes
4 answers
4k views

How to save a SetFit trainer locally after training

I am working on an HPC with no internet access on worker nodes and the only option to save a SetFit trainer after training, is to push it to HuggingFace hub. How do I go about saving it locally to ...
Tanish Bafna's user avatar
0 votes
1 answer
476 views

Doc2Vec How to find most similar document

I am using Gensim's Doc2Vec, and was wondering if there is a way to get the most similar document to another document that is outside the list of TaggedDocuments used to train the Doc2Vec model. Right ...
Elliot Jackson's user avatar
0 votes
0 answers
160 views

SpaCy doc.similarity limitations

I'm building an information retrieval tool that receives an user's request and returns the most similar label in the corpus. With Spacy's vanilla similarity, I have the following limitation : request =...
JulienBr's user avatar
0 votes
2 answers
1k views

How to check the similarity score between two web urls?

I'm working on a project that frequently needs to check the similarity score between two web url, initially i did this by scraping all the text from the web page and then calculated the document ...
Abhishek Jha's user avatar
2 votes
1 answer
674 views

Sentence Transformers in Python: "[E1002] Span index out of range"

As a programming noob, I am trying to find similar sentences in several hundreds of newspaper articles. I have tried my code with a smaller text sample which has worked brilliantly. Now, with a larger ...
Mathias's user avatar
  • 41
3 votes
1 answer
2k views

How to download and use the universal sentence encoder instead of loading it from url

I am using the universal sentence encoder to find sentence similarity. below is the code that i use to load the model import tensorflow_hub as hub model = hub.load("https://tfhub.dev/google/...
Jithin P James's user avatar
1 vote
0 answers
131 views

How to extract the keywords on which universal sentence encoder was trained on?

I am using Universal sentence encoder to encode some documents into a 512 dimensional embeddings. These are then used to find similar items to a search query which is also encoded using USE. USE works ...
Pratyush's user avatar
1 vote
1 answer
262 views

Why do my t-SNE plots with euclidean and cosine distances look similar

I have a question about two t-SNE plots I made. I have a set of 850 articles for which I wanted to check which articles are similar to each other. This was done by pre-processing the articles first, ...
HenkieTee's user avatar
0 votes
0 answers
199 views

Compute similarity in pyspark

I have a csv file contains some data, I want select the similar data with an input. my data is like: H1 | H2 | H3 --------+---------+---------- A | 1 | 7 B | 5 | 3 C ...
Tavakoli's user avatar
  • 1,375
0 votes
2 answers
498 views

Parameters for training a sentence-similarity model using Bert?

I have a list of sentences : sentences = ["Missing Plate", "Plate not found"] I am trying to find the most similar sentences in the list by using Transformers model with ...
Chintan Mehta's user avatar
0 votes
1 answer
149 views

How to identify the similar words using the word2vec

input: I have a set of words(N) & input sentence problem statement: the sentence is dynamic, the user can give any sentence related to one business domain. we have to map the input sentence tokens ...
Dinesh Kumar's user avatar
1 vote
1 answer
2k views

BERT with WMD distance for sentence similarity

I have tried to calculate the similarity between the two sentences using BERT and word mover distance (WMD). I am unable to find the correct formula for WMD in python. Also tried the WMD python ...
Abdul Wahab's user avatar
0 votes
1 answer
437 views

Sentence Transformers Using BOW?

I have a collection of terms that appear or are somehow related to web pages (e.g. keywords from the HTML tags). These are not sentences, they are just a collection of keywords, words in a title etc. ...
B_Miner's user avatar
  • 1,820

1
2 3 4 5