Newest 'sentence-similarity' Questions

0 votes

0 answers

23 views

How to get the Information Content (IC) of a sense in WordNet?

I want to get the Lin similarity between two senses using the formula and check the result using NLTK's WordNet APIs. According to the NLTK WordNet Documentation synset1.lin_similarity(synset2, ic): ...

Vic

1

asked Aug 4 at 11:32

-2 votes

1 answer

38 views

Hybridized collaborative filtering and sentence similarity-based system for doctor recommendation based on user input of symptoms and location

I'm trying to solve a problem of recommending a doctor based on a user's symptoms and location using a hybridized collaborative filtering and sentence similarity-based recommender system that follow ...

Sadura Akinrinwa

1

asked Jul 23 at 17:20

0 votes

1 answer

94 views

Different results for the same epoch using different number of total epochs

I am training a Machine Learning model for STS task using the Sentence Transformers library. When I was testing it, I noticed that my model generated different results for the same number of epochs ...

Hígor Hahn

1

asked Jun 24 at 22:35

0 votes

0 answers

31 views

System.AccessViolationException: Attempted to read or write protected memory at TorchSharp.PInvoke.LibTorchSharp.THSGenerator_manual_seed

I was trying to reproduce the example of ML.Net: machinelearning-samples/samples/csharp/getting-started/MLNET2/SentenceSimilarity and I got the following error : Fatal error. System....

CBal

31

asked Jun 23 at 23:08

0 votes

0 answers

50 views

Matching strings with high similarity using Sentence Similarity NLP

So, currently I'm have a list of vectors in my database, and I'm getting data from an API, from that API I loop over each of strings provided converting it to a vector & matching the most similar ...

Xyeut

1

asked May 31 at 3:46

2 votes

1 answer

211 views

Find exact and similar phrases in a string with millions of characters

I have a list of phrases and a corpus which is a string of text with millions of words. For each phrase in my phrase list, I want to find and record the most similar phrases found in the corpus-string....

98fly

41

asked May 27 at 15:55

0 votes

1 answer

353 views

Roberta for sentence similarity

I have a pre trained roberta model. And have a dataset having two sentence pairs with a label that indicated whether the sentence pair is similar or not. I want to use that roberta model to do that. I ...

pycoder

1

asked May 5 at 12:08

0 votes

0 answers

23 views

implementation of a search for similar documents

I want to implement a method for searching similar documents that will identify semantic links but I have no ideas at all I would be glad if you could give me some advice on what to read or give me an ...

user412026

1

asked May 3 at 15:04

1 vote

0 answers

105 views

What is the standard threshold value that is best for accuracy when employing Euclidean distance as a metric for gauging textual similarity?

I'm using Euclidean distance as a metric to compare two sentences for similarity while clustering them using my custom incremental KMeans algorithm. The current threshold value I'm using is 0.7 which ...

sanjay M

11

asked Apr 1 at 3:17

1 vote

2 answers

107 views

How to detect if two sentences are simmilar, not in meaning, but in syllables/words?

Here are some examples of the types of sentences that need to be considered "similar" there was a most extraordinary noise going on shrinking rapidly she soon made out there was a most ...

BLOCKCRAFT 2.0

13

asked Mar 29 at 2:08

3 votes

1 answer

326 views

Batched BM25 search in PySpark

I have a large dataset of documents (average length of 35 words). I want to find the top k nearest neighbors of all these documents by using BM25. Every document needs to be compared with every other ...

theodre7

108

asked Jan 18 at 4:40

3 votes

2 answers

6k views

Searching existing ChromaDB database using cosine similarity

I have a preexisting database with around 15 PDFs stored. I want to be able to search the database so that I'm getting the X most relevant results back given a certain threshold using cosine ...

Yash Sharma

112

asked Jan 10 at 14:12

1 vote

1 answer

245 views

Sentence Similarity between a phrase with 2-3 words and documents with multiple sentences

What I want to achieve: I have thousands of documents (descriptions of incidents) and I would like to find the documents which match a phrase or are similar to the words in the phrase. An example, for ...

Naveen Reddy Marthala

3,093

asked Dec 14, 2023 at 15:09

0 votes

0 answers

62 views

indexing does not speed up retrival of numpy array from sqlite3

I am trying to store embedding as numpy array in sqlite3. Sqlite3 handles numpy array as blob.I am handling approximately 30 million data. I am able to successfully store the data and retrieve it as ...

saurav

1

asked Dec 6, 2023 at 18:45

1 vote

0 answers

90 views

Filtering Documents Using Word Embeddings: Keep Job Postings, Exclude Resumes

I have a DataFrame containing a column of various documents, and I'm trying to filter out documents that resemble resumes while keeping job postings. To achieve this, I've utilized a CSV file provided ...

Banafshe khazali

11

asked Sep 18, 2023 at 16:03

2 votes

1 answer

130 views

String Similarity for all possible combination in Optimised fashion

I am facing a problem while finding string similarity. Scenario: The string which consisits of following fields first_name, middle_name and last_name What I have do is to find string similarity ...

Akhilesh mahajan

132

asked Jul 26, 2023 at 10:49

0 votes

0 answers

231 views

Facing accuracy issue with sentence transformers

I'm facing issue in sentence similarity while using sentence transformers with cosine metric. I'm comparing transcribed audio text with the predefined set of sentences. Even when the the whole ...

Niceboy

11

asked Jul 24, 2023 at 5:25

1 vote

1 answer

2k views

What is the best distance measure to use when doing semantic search on the embeddings generated by sentence transformers?

I understand there are many distance measures to calculate the distance between two vectors (embeddings). However which one is the best when comparing two vectors for semantic similarity that have ...

morpheus

20.2k

asked Jul 18, 2023 at 23:35

0 votes

0 answers

58 views

String Match using Fuzzy Lookup in Excel

I am trying to use Fuzzy Lookup to match two strings in two columns of a table that looks like below. Table1 Table2 | Column A | Column B | | -------- | -------- | | Flower.com |...

Chitwan

21

asked Jul 12, 2023 at 11:29

1 vote

1 answer

4k views

Hardware requirements for using sentence-transformers/all-MiniLM-L6-v2

Can someone please advise me upon the hardware requirements of using sentence-transformers/all-MiniLM-L6-v2 for a semantic similarity use-case. I had downloaded the model locally and am using it to ...

nsb

19

asked Jul 5, 2023 at 8:53

2 votes

1 answer

1k views

How to map word level timestamps to text of a given transcript?

I am currently developing a tool to visualize song lyrics. The tool computes the similarity in the phonetics of syllables and assigns a rhyme group to each syllable. Syllables belonging to the same ...

paulpelikan

21

asked Jun 28, 2023 at 15:06

-1 votes

1 answer

230 views

Given a sentence, how can I generate similar sentences of that particular sentence using machine learning or other ways?

I am testing my AI app that ask a question and based on that do some processing. I have to write some tests that even people ask the same question in different way the system will generate same result....

Sal

7

asked Jun 18, 2023 at 21:34

0 votes

2 answers

74 views

Shuffle pandas column while avoiding a condition

I have a dataframe that shows 2 sentences are similar. This dataframe has a 3rd relationship column which also contains some strings. This 3rd column shows how similar the texts are. For instance: P ...

trazoM

314

asked May 22, 2023 at 14:05

0 votes

1 answer

118 views

Clustering generates length mismatch with original data

I'm trying to use this fastclustering.py to do some clustering on textual data. my data is in a dataframe called df['processed_activities']. But I'm getting this error telling me it's a mismatch ...

Moe_blg

119

asked Apr 24, 2023 at 8:29

0 votes

2 answers

599 views

Boosting documents with term matches in elasticsearch after cosine similarity

I am using text embeddings stored in elasticsearch to get documents similar to a query. But I noticed that in some cases, I get documents that don't have the words from the query in them with a higher ...

BBloggsbott

378

asked Apr 12, 2023 at 6:26

1 vote

1 answer

795 views

A machine Learning model to find similarities between two words in python

I have 2 lists of words. The first list contains 5 words. The second list contains 1000s of words. I am looking for a ML model that will help me find the best match between the words in the first list ...

Sharhad

83

asked Apr 9, 2023 at 7:11

1 vote

0 answers

133 views

How can I fine tune sentence transfomer without any labels?

I only have product descriptions and nothing else. I need to match similar products using cosine similarity. I have achieved this by taking embeddings from the Sentence Transformer. However, I need to ...

Margam Rohith Kumar

11

asked Apr 7, 2023 at 13:24

0 votes

2 answers

434 views

Calculate Cosine Similarity Sentences ValueError: Expected 2D array, got 1D array instead

So I'm doing a cosine similarity calculation on a list of sentences. I've got the embedding of the calculations done. Here's the embedding The shape of embedding (11, 3072) [[-0.02179624 -0.17235152 -...

intodarkmoon

1

asked Mar 22, 2023 at 7:01

-2 votes

1 answer

395 views

Find Similarity on Excel Column - Brand, Product Name and Weight

I'll try to explain you, as far as I can, my new Python Challenge! We have two datasets in Excell for two different Retailer (supermarket) and in each of them there are some information about their ...

Take A few

7

asked Mar 13, 2023 at 8:40

1 vote

0 answers

134 views

How to find Sentence Transformer support languages?

I want to get the sentence embedding results to find the sentence similarities in my NLP project. Since I am working with a low-resource language (Sinhala), I want to know whether any ...

Pamudu

61

asked Feb 15, 2023 at 17:33

0 votes

1 answer

136 views

database search page using universal sentence encoder for simantic search

im trying to build a page that uses uneversal sentence encoder modle to search through database 'abstract' attribute and this error is appearing in the browser console enter image description here i ...

dania

11

asked Jan 30, 2023 at 12:22

0 votes

0 answers

80 views

How do I implement a model that finds a correlation (not similarity) between query and target sentence?

When building an NLP model (I'm going to use an attention-based one), how can we implement one for finding the correlation, not similarity, between the query and target sentences? For instance, the ...

kemakino

1,122

asked Jan 20, 2023 at 22:14

1 vote

0 answers

182 views

Why loss is not decreasing in a Siamese BERT-Network training (Entity matching task)

I'm trying to finetune a model for an entity matching task (kind of a sentence similarity task). The idea is that if I give as input two sentences the model should output if they represent the same ...

pushz

21

asked Jan 2, 2023 at 17:53

0 votes

0 answers

495 views

Calculate similarity between sets of keywords in Python

For my project I want to compare to sets of keywords that are stored in lists and obtain a similarity index. An example would look like the following: db_1: list of 5 keywords db_2: list of 10 ...

codexxblack

19

asked Dec 7, 2022 at 14:47

0 votes

1 answer

999 views

Generating embedding for long documents using pre-trained word vectors

I have a set of pre-trained word embeddings from the Wikipedia corpus. I also have 300 dimension embeddings of Wikipedia article pages. I am looking to build a similarity engine by running a simple ...

Steve Nathan

119

asked Nov 15, 2022 at 18:12

-1 votes

1 answer

832 views

How to find similarity score between two rows in a pandas data frame

I want to find the similarity of given sentences between two rows. In my sample data frame: import pandas as pd data = [f'Sent {str(i)}' for i in range(10)] df = pd.DataFrame(data=data, columns=['...

kcats_wolf

3

asked Nov 13, 2022 at 8:45

2 votes

1 answer

79 views

What robust algorithm implementation can I use to perform phrase similarity with two inputs?

This is the problem: I have two columns in my matadata database "field name" and "field description" I need to check if the "field description" is actually a description ...

emichester

189

asked Nov 8, 2022 at 14:30

3 votes

4 answers

4k views

How to save a SetFit trainer locally after training

I am working on an HPC with no internet access on worker nodes and the only option to save a SetFit trainer after training, is to push it to HuggingFace hub. How do I go about saving it locally to ...

Tanish Bafna

33

asked Oct 12, 2022 at 18:23

0 votes

1 answer

476 views

Doc2Vec How to find most similar document

I am using Gensim's Doc2Vec, and was wondering if there is a way to get the most similar document to another document that is outside the list of TaggedDocuments used to train the Doc2Vec model. Right ...

Elliot Jackson

1

asked Aug 13, 2022 at 13:09

0 votes

0 answers

160 views

SpaCy doc.similarity limitations

I'm building an information retrieval tool that receives an user's request and returns the most similar label in the corpus. With Spacy's vanilla similarity, I have the following limitation : request =...

JulienBr

70

asked Aug 10, 2022 at 9:06

0 votes

2 answers

1k views

How to check the similarity score between two web urls?

I'm working on a project that frequently needs to check the similarity score between two web url, initially i did this by scraping all the text from the web page and then calculated the document ...

Abhishek Jha

153

asked Aug 8, 2022 at 6:32

2 votes

1 answer

674 views

Sentence Transformers in Python: "[E1002] Span index out of range"

As a programming noob, I am trying to find similar sentences in several hundreds of newspaper articles. I have tried my code with a smaller text sample which has worked brilliantly. Now, with a larger ...

Mathias

41

asked Aug 5, 2022 at 16:49

3 votes

1 answer

2k views

How to download and use the universal sentence encoder instead of loading it from url

I am using the universal sentence encoder to find sentence similarity. below is the code that i use to load the model import tensorflow_hub as hub model = hub.load("https://tfhub.dev/google/...

Jithin P James

762

asked Jul 20, 2022 at 3:58

1 vote

0 answers

131 views

How to extract the keywords on which universal sentence encoder was trained on?

I am using Universal sentence encoder to encode some documents into a 512 dimensional embeddings. These are then used to find similar items to a search query which is also encoded using USE. USE works ...

Pratyush

11

asked Jul 13, 2022 at 21:06

1 vote

1 answer

262 views

Why do my t-SNE plots with euclidean and cosine distances look similar

I have a question about two t-SNE plots I made. I have a set of 850 articles for which I wanted to check which articles are similar to each other. This was done by pre-processing the articles first, ...

HenkieTee

31

asked Jul 6, 2022 at 13:20

0 votes

0 answers

199 views

Compute similarity in pyspark

I have a csv file contains some data, I want select the similar data with an input. my data is like: H1 | H2 | H3 --------+---------+---------- A | 1 | 7 B | 5 | 3 C ...

Tavakoli

1,375

asked Jul 5, 2022 at 7:22

0 votes

2 answers

498 views

Parameters for training a sentence-similarity model using Bert?

I have a list of sentences : sentences = ["Missing Plate", "Plate not found"] I am trying to find the most similar sentences in the list by using Transformers model with ...

Chintan Mehta

117

asked Jun 23, 2022 at 9:30

0 votes

1 answer

149 views

How to identify the similar words using the word2vec

input: I have a set of words(N) & input sentence problem statement: the sentence is dynamic, the user can give any sentence related to one business domain. we have to map the input sentence tokens ...

Dinesh Kumar

1

asked Jun 22, 2022 at 13:56

1 vote

1 answer

2k views

BERT with WMD distance for sentence similarity

I have tried to calculate the similarity between the two sentences using BERT and word mover distance (WMD). I am unable to find the correct formula for WMD in python. Also tried the WMD python ...

Abdul Wahab

139

asked Jun 5, 2022 at 9:03

0 votes

1 answer

437 views

Sentence Transformers Using BOW?

I have a collection of terms that appear or are somehow related to web pages (e.g. keywords from the HTML tags). These are not sentences, they are just a collection of keywords, words in a title etc. ...

B_Miner

1,820

asked May 24, 2022 at 0:05

Collectives™ on Stack Overflow

Related Tags