231 questions
0
votes
0
answers
23
views
How to get the Information Content (IC) of a sense in WordNet?
I want to get the Lin similarity between two senses using the formula and check the result using NLTK's WordNet APIs.
According to the NLTK WordNet Documentation
synset1.lin_similarity(synset2, ic): ...
-2
votes
1
answer
38
views
Hybridized collaborative filtering and sentence similarity-based system for doctor recommendation based on user input of symptoms and location
I'm trying to solve a problem of recommending a doctor based on a user's symptoms and location using a hybridized collaborative filtering and sentence similarity-based recommender system that follow ...
0
votes
1
answer
94
views
Different results for the same epoch using different number of total epochs
I am training a machine learning model for an STS task using the Sentence Transformers library.
When I was testing it, I noticed that my model generated different results for the same number of epochs ...
0
votes
0
answers
31
views
System.AccessViolationException: Attempted to read or write protected memory at TorchSharp.PInvoke.LibTorchSharp.THSGenerator_manual_seed
I was trying to reproduce the ML.NET example machinelearning-samples/samples/csharp/getting-started/MLNET2/SentenceSimilarity and I got the following error:
Fatal error. System....
0
votes
0
answers
50
views
Matching strings with high similarity using Sentence Similarity NLP
Currently I have a list of vectors in my database, and I'm getting data from an API. I loop over each of the strings the API provides, converting it to a vector and matching the most similar ...
2
votes
1
answer
211
views
Find exact and similar phrases in a string with millions of characters
I have a list of phrases and a corpus which is a string of text with millions of words. For each phrase in my phrase list, I want to find and record the most similar phrases found in the corpus-string....
0
votes
1
answer
353
views
RoBERTa for sentence similarity
I have a pre-trained RoBERTa model and a dataset of sentence pairs, each with a label that indicates whether the pair is similar or not. I want to use that RoBERTa model to do that.
I ...
0
votes
0
answers
23
views
Implementation of a search for similar documents
I want to implement a method for searching similar documents that will identify semantic links, but I have no ideas at all. I would be glad if you could give me some advice on what to read or give me an ...
1
vote
0
answers
105
views
What is the standard threshold value that is best for accuracy when employing Euclidean distance as a metric for gauging textual similarity?
I'm using Euclidean distance as a metric to compare two sentences for similarity while clustering them using my custom incremental KMeans algorithm. The current threshold value I'm using is 0.7 which ...
1
vote
2
answers
107
views
How to detect if two sentences are similar, not in meaning, but in syllables/words?
Here are some examples of the types of sentences that need to be considered "similar"
there was a most extraordinary noise going on shrinking rapidly she soon made out
there was a most ...
3
votes
1
answer
326
views
Batched BM25 search in PySpark
I have a large dataset of documents (average length of 35 words). I want to find the top k nearest neighbors of all these documents by using BM25. Every document needs to be compared with every other ...
3
votes
2
answers
6k
views
Searching existing ChromaDB database using cosine similarity
I have a preexisting database with around 15 PDFs stored. I want to be able to search the database so that I'm getting the X most relevant results back given a certain threshold using cosine ...
1
vote
1
answer
245
views
Sentence Similarity between a phrase with 2-3 words and documents with multiple sentences
What I want to achieve: I have thousands of documents (descriptions of incidents) and I would like to find the documents which match a phrase or are similar to the words in the phrase. An example, for ...
0
votes
0
answers
62
views
Indexing does not speed up retrieval of NumPy arrays from sqlite3
I am trying to store embeddings as NumPy arrays in sqlite3. sqlite3 handles a NumPy array as a blob. I am handling approximately 30 million records. I am able to successfully store the data and retrieve it as ...
1
vote
0
answers
90
views
Filtering Documents Using Word Embeddings: Keep Job Postings, Exclude Resumes
I have a DataFrame containing a column of various documents, and I'm trying to filter out documents that resemble resumes while keeping job postings. To achieve this, I've utilized a CSV file provided ...
2
votes
1
answer
130
views
String Similarity for all possible combination in Optimised fashion
I am facing a problem while finding string similarity.
Scenario: The string consists of the following fields:
first_name, middle_name and last_name
What I have to do is find string similarity ...
0
votes
0
answers
231
views
Facing accuracy issue with sentence transformers
I'm facing an issue with sentence similarity while using sentence transformers with the cosine metric. I'm comparing transcribed audio text with a predefined set of sentences. Even when the whole ...
1
vote
1
answer
2k
views
What is the best distance measure to use when doing semantic search on the embeddings generated by sentence transformers?
I understand there are many distance measures to calculate the distance between two vectors (embeddings). However, which one is the best when comparing two vectors for semantic similarity that have ...
0
votes
0
answers
58
views
String Match using Fuzzy Lookup in Excel
I am trying to use Fuzzy Lookup to match two strings in two columns of a table that looks like below.
Table1 Table2
| Column A | Column B |
| -------- | -------- |
| Flower.com |...
1
vote
1
answer
4k
views
Hardware requirements for using sentence-transformers/all-MiniLM-L6-v2
Can someone please advise me upon the hardware requirements of using sentence-transformers/all-MiniLM-L6-v2 for a semantic similarity use-case. I had downloaded the model locally and am using it to ...
2
votes
1
answer
1k
views
How to map word level timestamps to text of a given transcript?
I am currently developing a tool to visualize song lyrics. The tool computes the similarity in the phonetics of syllables and assigns a rhyme group to each syllable. Syllables belonging to the same ...
-1
votes
1
answer
230
views
Given a sentence, how can I generate similar sentences of that particular sentence using machine learning or other ways?
I am testing my AI app, which asks a question and does some processing based on it. I have to write tests to check that even when people ask the same question in different ways, the system generates the same result....
0
votes
2
answers
74
views
Shuffle pandas column while avoiding a condition
I have a dataframe that shows 2 sentences are similar. This dataframe has a 3rd relationship column which also contains some strings. This 3rd column shows how similar the texts are. For instance:
P ...
0
votes
1
answer
118
views
Clustering generates length mismatch with original data
I'm trying to use this fastclustering.py to do some clustering on textual data. My data is in a dataframe called df['processed_activities']. But I'm getting this error telling me it's a mismatch ...
0
votes
2
answers
599
views
Boosting documents with term matches in elasticsearch after cosine similarity
I am using text embeddings stored in elasticsearch to get documents similar to a query. But I noticed that in some cases, I get documents that don't have the words from the query in them with a higher ...
1
vote
1
answer
795
views
A machine learning model to find similarities between two words in Python
I have 2 lists of words. The first list contains 5 words. The second list contains 1000s of words.
I am looking for an ML model that will help me find the best match between the words in the first list ...
1
vote
0
answers
133
views
How can I fine-tune a sentence transformer without any labels?
I only have product descriptions and nothing else. I need to match similar products using cosine similarity. I have achieved this by taking embeddings from the Sentence Transformer. However, I need to ...
0
votes
2
answers
434
views
Calculate Cosine Similarity Sentences ValueError: Expected 2D array, got 1D array instead
So I'm doing a cosine similarity calculation on a list of sentences. I've got the embedding of the calculations done.
Here's the embedding
The shape of embedding (11, 3072)
[[-0.02179624 -0.17235152 -...
-2
votes
1
answer
395
views
Find Similarity on Excel Column - Brand, Product Name and Weight
I'll try to explain, as far as I can, my new Python challenge!
We have two datasets in Excel for two different retailers (supermarkets), and each of them contains some information about their ...
1
vote
0
answers
134
views
How to find the languages supported by Sentence Transformers?
I want to get the sentence embedding results to find the sentence similarities in my NLP project. Since I am working with a low-resource language (Sinhala), I want to know whether any ...
0
votes
1
answer
136
views
Database search page using Universal Sentence Encoder for semantic search
I'm trying to build a page that uses the Universal Sentence Encoder model to search through the database's 'abstract' attribute, and this error is appearing in the browser console:
I ...
0
votes
0
answers
80
views
How do I implement a model that finds a correlation (not similarity) between query and target sentence?
When building an NLP model (I'm going to use an attention-based one), how can we implement one for finding the correlation, not similarity, between the query and target sentences?
For instance, the ...
1
vote
0
answers
182
views
Why is the loss not decreasing in Siamese BERT-network training (entity matching task)?
I'm trying to fine-tune a model for an entity matching task (kind of a sentence similarity task).
The idea is that if I give two sentences as input, the model should output whether they represent the same ...
0
votes
0
answers
495
views
Calculate similarity between sets of keywords in Python
For my project I want to compare two sets of keywords that are stored in lists and obtain a similarity index.
An example would look like the following:
db_1: list of 5 keywords
db_2: list of 10 ...
0
votes
1
answer
999
views
Generating embedding for long documents using pre-trained word vectors
I have a set of pre-trained word embeddings from the Wikipedia corpus. I also have 300-dimensional embeddings of Wikipedia article pages. I am looking to build a similarity engine by running a simple ...
-1
votes
1
answer
832
views
How to find similarity score between two rows in a pandas data frame
I want to find the similarity of given sentences between two rows.
In my sample data frame:
import pandas as pd
data = [f'Sent {str(i)}' for i in range(10)]
df = pd.DataFrame(data=data, columns=['...
2
votes
1
answer
79
views
What robust algorithm implementation can I use to perform phrase similarity with two inputs?
This is the problem:
I have two columns in my metadata database: "field name" and "field description".
I need to check if the "field description" is actually a description ...
3
votes
4
answers
4k
views
How to save a SetFit trainer locally after training
I am working on an HPC with no internet access on worker nodes, and the only option to save a SetFit trainer after training is to push it to HuggingFace hub. How do I go about saving it locally to ...
0
votes
1
answer
476
views
Doc2Vec How to find most similar document
I am using Gensim's Doc2Vec, and was wondering if there is a way to get the most similar document to another document that is outside the list of TaggedDocuments used to train the Doc2Vec model.
Right ...
0
votes
0
answers
160
views
spaCy doc.similarity limitations
I'm building an information retrieval tool that receives a user's request and returns the most similar label in the corpus.
With spaCy's vanilla similarity, I have the following limitation:
request =...
0
votes
2
answers
1k
views
How to check the similarity score between two web URLs?
I'm working on a project that frequently needs to check the similarity score between two web URLs. Initially I did this by scraping all the text from the web page and then calculating the document ...
2
votes
1
answer
674
views
Sentence Transformers in Python: "[E1002] Span index out of range"
As a programming noob, I am trying to find similar sentences in several hundreds of newspaper articles. I have tried my code with a smaller text sample which has worked brilliantly. Now, with a larger ...
3
votes
1
answer
2k
views
How to download and use the Universal Sentence Encoder instead of loading it from a URL
I am using the Universal Sentence Encoder to find sentence similarity. Below is the code that I use to load the model:
import tensorflow_hub as hub
model = hub.load("https://tfhub.dev/google/...
1
vote
0
answers
131
views
How to extract the keywords on which Universal Sentence Encoder was trained?
I am using Universal Sentence Encoder to encode some documents into 512-dimensional embeddings. These are then used to find similar items to a search query which is also encoded using USE. USE works ...
1
vote
1
answer
262
views
Why do my t-SNE plots with euclidean and cosine distances look similar
I have a question about two t-SNE plots I made.
I have a set of 850 articles for which I wanted to check which articles are similar to each other.
This was done by pre-processing the articles first, ...
0
votes
0
answers
199
views
Compute similarity in pyspark
I have a CSV file that contains some data, and I want to select the data similar to an input.
My data looks like:
H1 | H2 | H3
--------+---------+----------
A | 1 | 7
B | 5 | 3
C ...
0
votes
2
answers
498
views
Parameters for training a sentence-similarity model using Bert?
I have a list of sentences :
sentences = ["Missing Plate", "Plate not found"]
I am trying to find the most similar sentences in the list by using a Transformers model with ...
0
votes
1
answer
149
views
How to identify similar words using word2vec
Input: I have a set of words (N) and an input sentence.
Problem statement:
The sentence is dynamic: the user can give any sentence related to one business domain. We have to map the input sentence tokens ...
1
vote
1
answer
2k
views
BERT with WMD distance for sentence similarity
I have tried to calculate the similarity between two sentences using BERT and Word Mover's Distance (WMD). I am unable to find the correct formula for WMD in Python. I also tried the WMD python ...
0
votes
1
answer
437
views
Sentence Transformers Using BOW?
I have a collection of terms that appear or are somehow related to web pages (e.g. keywords from the HTML tags). These are not sentences, they are just a collection of keywords, words in a title etc. ...