I have two numpy arrays x and y of the same length, and I am trying to build a square matrix A such that the (i,j) entry of the matrix contains a 1 if a certain relationship holds between x[i], x[j], y[i] and y[j], and a 0 otherwise.
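If the relationship can be written with NumPy element-wise operations, broadcasting evaluates it for every (i, j) pair without Python loops. The snippet below is a minimal sketch. The arrays and the relation (equal x values and y values within 2 of each other) are made up for illustration. The result is a dense n × n array, and intermediates such as the float difference `yi - yj` take well over a gigabyte at n = 15,000, so this form only suits moderate sizes.

```python
import numpy as np

x = np.array([3, 1, 3])
y = np.array([10.0, 11.0, 12.0])

# Column vectors of shape (n, 1) against row vectors of shape (1, n) broadcast to (n, n):
# entry (i, j) compares element i with element j.
xi, xj = x[:, None], x[None, :]
yi, yj = y[:, None], y[None, :]

# Illustrative relation (an assumption): equal x and y within 2 of each other.
A = ((xi == xj) & (np.abs(yi - yj) <= 2)).astype(np.int8)
print(A)
```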

My current approach starts with A as a len(x) by len(x) zero matrix and uses a double for loop over all i and all j ≥ i (the matrix is symmetric about the diagonal). However, this takes a very long time to run: x has around 15,000 elements, which means roughly 112 million (i, j) pairs. The vast majority of entries in A will be 0, so this seems quite an inefficient way of doing it. I thought masked arrays might help, but I haven't figured out how to use them in this situation. Any help would be greatly appreciated!
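Most entries are 0. One way to avoid both the Python double loop and an n × n dense array is to evaluate the relation one block of rows at a time, keep only the coordinates of the 1s, and build a SciPy sparse matrix from them. This is a sketch under stated assumptions. The relation must be vectorizable, and `relation`, `build_sparse` and the `block` size are hypothetical names chosen for illustration. The approach still performs all n² comparisons, but they run in compiled code, and peak memory is about `block × n` entries instead of n².

```python
import numpy as np
import scipy.sparse as sp


def relation(xi, xj, yi, yj):
    # Hypothetical element-wise relation used for illustration; substitute the real test.
    return (xi == xj) & (np.abs(yi - yj) < 2)


def build_sparse(x, y, relation, block=1000):
    """Evaluate relation for every (i, j) pair, one block of rows at a time, keeping only the 1s."""
    n = len(x)
    rows = [np.empty(0, dtype=np.intp)]
    cols = [np.empty(0, dtype=np.intp)]
    for start in range(0, n, block):
        stop = min(start + block, n)
        # Boolean array of shape (stop - start, n): rows start..stop-1 compared with every column.
        hits = relation(x[start:stop, None], x[None, :], y[start:stop, None], y[None, :])
        r, c = np.nonzero(hits)
        rows.append(r + start)  # convert block-local row numbers to global ones
        cols.append(c)
    r = np.concatenate(rows)
    c = np.concatenate(cols)
    data = np.ones(len(r), dtype=np.int8)
    return sp.coo_matrix((data, (r, c)), shape=(n, n)).tocsr()


x = np.array([1, 1, 2, 2])
y = np.array([0, 1, 0, 5])
A = build_sparse(x, y, relation, block=3)  # block=3 forces a partial last block
print(A.toarray())
```

The block size trades memory for the number of Python-level iterations. Because the whole matrix is computed here, no symmetrization step is needed.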

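Here is the code I currently have. df1 is a DataFrame in which one column contains surnames and another contains dates; A1[i, j] should be 1 when rows i and j share a surname and their dates are less than 5 minutes apart.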

import numpy as np
import pandas as pd
import scipy.sparse

date_format = '%Y-%m-%d %H:%M:%S'

df1 = pd.DataFrame([['Smith', '2024-12-16 12:00:00'], ['Smith', '2024-12-16 13:00:00'],
                    ['Doe', '2024-12-16 12:01:00'], ['Doe', '2024-12-16 12:04:00']],
                   columns=['Surname', 'Date'])
# Parse all date strings into datetime64 values in one vectorized call
df1['Date'] = pd.to_datetime(df1['Date'], format=date_format)

x1 = df1['Surname'].to_numpy()
y1 = df1['Date'].to_numpy()

n = len(x1)
# LIL format supports cheap element-by-element assignment; keep it sparse (no .todense())
A1 = scipy.sparse.lil_matrix((n, n), dtype=int)
for i in range(n):
    for j in range(i, n):  # upper triangle including the diagonal; A1 is symmetric
        # Same surname and dates less than 5 minutes apart
        if x1[i] == x1[j] and abs(y1[i] - y1[j]) < np.timedelta64(5, 'm'):
            A1[i, j] = 1
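For this particular relation (same surname, times less than 5 minutes apart), the n² comparisons can be avoided entirely. This sketch swaps in a different technique from the double loop. It sorts the rows by surname and then by time. Within each surname it then uses binary search (`np.searchsorted`) to find, for every row, the later rows that fall inside the window. The work is then roughly O(n log n + number of 1s). The function name `same_surname_within` is hypothetical, and the code assumes timezone-naive timestamps and a strict `< window` test, as in the loop. It returns the full symmetric matrix; `scipy.sparse.triu` recovers the upper triangle that the loop fills in.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def same_surname_within(names, times, window=np.timedelta64(5, 'm')):
    """Return a symmetric CSR matrix with A[i, j] = 1 when names[i] == names[j]
    and |times[i] - times[j]| < window, and 0 (not stored) otherwise."""
    t = np.asarray(times).astype('datetime64[ns]').astype(np.int64)  # ns since the epoch
    w = int(window / np.timedelta64(1, 'ns'))                         # window length in ns
    _, codes = np.unique(np.asarray(names), return_inverse=True)      # integer code per surname
    codes = codes.ravel()
    order = np.lexsort((t, codes))  # sort by surname code first, then by time
    # Positions in the sorted order where the surname code changes.
    bounds = np.flatnonzero(np.diff(codes[order])) + 1
    rows, cols = [], []
    for grp in np.split(order, bounds):  # original row indices of one surname, by time
        tg = t[grp]  # ascending times for this surname
        # hi[k] is the first position whose time is >= tg[k] + w,
        # so positions k .. hi[k]-1 lie within the window of position k.
        hi = np.searchsorted(tg, tg + w, side='left')
        counts = hi - np.arange(len(grp))
        k = np.repeat(np.arange(len(grp)), counts)
        # Add the offset 0, 1, ..., counts[k]-1 within each run to get the partner position.
        m = k + np.arange(counts.sum()) - np.repeat(np.cumsum(counts) - counts, counts)
        rows.append(grp[k])
        cols.append(grp[m])
    i = np.concatenate(rows)
    j = np.concatenate(cols)
    off = i != j  # mirror the off-diagonal pairs so the result is symmetric
    r = np.concatenate([i, j[off]])
    c = np.concatenate([j, i[off]])
    n = len(t)
    data = np.ones(len(r), dtype=np.int8)
    return sp.coo_matrix((data, (r, c)), shape=(n, n)).tocsr()


df1 = pd.DataFrame([['Smith', '2024-12-16 12:00:00'], ['Smith', '2024-12-16 13:00:00'],
                    ['Doe', '2024-12-16 12:01:00'], ['Doe', '2024-12-16 12:04:00']],
                   columns=['Surname', 'Date'])
df1['Date'] = pd.to_datetime(df1['Date'], format='%Y-%m-%d %H:%M:%S')

A = same_surname_within(df1['Surname'].to_numpy(), df1['Date'].to_numpy())
print(A.toarray())
A_upper = sp.triu(A)  # upper triangle only, like the double loop
```

The loop over surnames runs once per distinct name rather than once per pair. It stays cheap unless almost every row has a unique surname, in which case each group is tiny anyway.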
  • Please provide a minimal reproducible example
    – Homer512
    Commented yesterday
  • It will be easier to understand if you give an example instead of a description.
    – Guy
    Commented yesterday
  • Also you say x and y are numpy arrays but then tag the question with sparse-matrix. Which one is it? numpy.array or scipy sparse matrix?
    – Homer512
    Commented yesterday
  • @Homer512 x and y are numpy arrays but I have been using A as a scipy sparse matrix
    – JLB
    Commented yesterday
  • The code you posted is not a minimal reproducible example. An MRE is a snippet of code that I can copy-and-paste into my own interpreter and it will run. I don't have your dataframe df. It's also supposed to be minimal. Your dataframe has no meaning for the question and just complicates the picture. Please read the help section on how to ask a good question before asking.
    – Homer512
    Commented yesterday
