How to efficiently make a large matrix of 1s and 0s

Asked yesterday

Modified yesterday

Viewed 95 times

I have two numpy arrays x and y of the same length, and I am trying to make a square matrix A such that the (i, j) entry of the matrix contains a 1 if a certain relationship holds between x[i], x[j], y[i] and y[j], and a 0 otherwise.

My current method starts with A as a zero matrix (of dimensions len(x) by len(x)) and then uses a double for loop over all i and all j >= i (the matrix is symmetric about the diagonal, so only the upper triangle is needed). However, this takes a very long time to run: x has around 15,000 elements, which means roughly 112 million (i, j) pairs to test. The vast majority of entries in A will be 0, so this seems a very inefficient way of doing it. I thought masked arrays might help, but I haven't figured out how to use them in this situation. Any help would be greatly appreciated!
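If memory allows, the double loop can be written with NumPy broadcasting instead. This is a minimal sketch under stated assumptions: `pair_matrix_dense` is a hypothetical helper (not from the question), and it assumes the relationship used in the code below, namely the same surname and dates less than 5 minutes apart. It still builds a dense n x n result, so for n around 15,000 the boolean output alone is about 225 MB and each intermediate timedelta array is about 1.8 GB. It mainly illustrates the vectorised form of the test rather than solving the memory problem.

```python
import numpy as np


def pair_matrix_dense(x, y, window=np.timedelta64(5, 'm')):
    """Hypothetical vectorised replacement for the double loop (dense result).

    Assumes x holds labels (e.g. surnames) and y holds datetime64 values.
    Memory is O(n^2), so this only suits moderate n.
    """
    x = np.asarray(x)
    y = np.asarray(y, dtype='datetime64[ns]')
    # (n, 1) against (1, n) broadcasts to an (n, n) boolean array
    same_label = x[:, None] == x[None, :]
    # Absolute pairwise time differences, compared against the window
    close_in_time = np.abs(y[:, None] - y[None, :]) < window
    return (same_label & close_in_time).astype(np.int8)
```

With the question's data this would be called as `pair_matrix_dense(df1['Surname'].to_numpy(), df1['Date'].to_numpy())`. The result is the full symmetric matrix, so there is no separate loop over j >= i.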

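Here is the code I currently have. df1 is a DataFrame in which one column contains surnames and another contains a date; two rows are related if they have the same surname and their dates are less than 5 minutes apart.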

import numpy as np
import pandas as pd
import scipy.sparse

date_format = '%Y-%m-%d %H:%M:%S'

df1 = pd.DataFrame([['Smith', '2024-12-16 12:00:00'], ['Smith', '2024-12-16 13:00:00'], ['Doe', '2024-12-16 12:01:00'], ['Doe', '2024-12-16 12:04:00']])
df1.columns = ['Surname', 'Date']
# Parse all date strings into datetime64[ns] values in one vectorised call
df1['Date'] = pd.to_datetime(df1['Date'], format=date_format)

x1 = df1['Surname'].to_numpy()
y1 = df1['Date'].to_numpy()

n = len(x1)
# Keep A1 sparse (LIL supports cheap element assignment); calling .todense()
# here would allocate the full n x n dense matrix
A1 = scipy.sparse.lil_matrix((n, n), dtype=np.int8)
for i in range(n):
    for j in range(i, n):  # upper triangle only, including the diagonal
        same_surname = x1[i] == x1[j]
        # Compare the exact time difference rather than truncating it to whole minutes first
        within_5_min = abs(y1[i] - y1[j]) < np.timedelta64(5, 'm')
        if same_surname and within_5_min:
            A1[i, j] = 1
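A sketch of a different technique (not the asker's method) that avoids testing all pairs, assuming the same rule as the loop above. `pair_matrix_sparse` is a hypothetical name. It sorts the rows by (surname, date) with `np.lexsort` so that rows which can match sit next to each other, compares each sorted row with the row d positions later for d = 1, 2, ..., stops at the first offset with no matches, and assembles a `scipy.sparse` CSR matrix from the matching index pairs in one call instead of assigning entries one by one.

```python
import numpy as np
import scipy.sparse


def pair_matrix_sparse(x, y, window=np.timedelta64(5, 'm')):
    """Hypothetical helper: sparse symmetric 0/1 matrix with A[i, j] = 1 when
    x[i] == x[j] and |y[i] - y[j]| < window (diagonal included).
    """
    x = np.asarray(x)
    y = np.asarray(y, dtype='datetime64[ns]')
    n = len(x)

    # Integer code per distinct label (e.g. per surname)
    _, codes = np.unique(x, return_inverse=True)
    # Sort by label first, then by time (np.lexsort uses the LAST key as primary)
    order = np.lexsort((y.astype(np.int64), codes))
    c = codes[order]
    t = y[order]

    pair_i, pair_j = [], []
    d = 1
    while d < n:
        # Compare every sorted element with the element d positions later
        hit = (c[:-d] == c[d:]) & ((t[d:] - t[:-d]) < window)
        if not hit.any():
            # Within a label group the times are non-decreasing, so if no pair
            # matches at offset d, no pair can match at any larger offset
            break
        k = np.nonzero(hit)[0]
        pair_i.append(order[k])       # map back to original row indices
        pair_j.append(order[k + d])
        d += 1

    diag = np.arange(n)
    i = np.concatenate([diag] + pair_i)
    j = np.concatenate([diag] + pair_j)
    # Each off-diagonal pair was found once; mirror it to make A symmetric
    off = i != j
    rows = np.concatenate([i, j[off]])
    cols = np.concatenate([j, i[off]])
    data = np.ones(len(rows), dtype=np.int8)
    return scipy.sparse.coo_matrix((data, (rows, cols)), shape=(n, n)).tocsr()
```

The work is roughly an O(n log n) sort plus one vectorised pass per offset, and the number of offsets is bounded by the largest number of same-surname rows inside a single 5-minute span, which should be small when most entries are 0. If only the upper triangle is wanted, as in the loop, `scipy.sparse.triu(A)` drops the lower half.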

edited yesterday by Holger Just (55.6k reputation; 15 gold, 121 silver, 132 bronze badges)

asked yesterday by JLB (109 reputation; 1 bronze badge), new contributor

(score 2) Please provide a minimal reproducible example – Homer512, commented yesterday

(score 1) It will be easier to understand if you give an example instead of a description. – Guy, commented yesterday

Also you say x and y are numpy arrays but then tag the question with sparse-matrix. Which one is it? numpy.array or scipy sparse matrix? – Homer512, commented yesterday

@Homer512 x and y are numpy arrays but I have been using A as a scipy sparse matrix – JLB, commented yesterday

(score 4) The code you posted is not a minimal reproducible example. An MRE is a snippet of code that I can copy-and-paste into my own interpreter and it will run. I don't have your dataframe df. It's also supposed to be minimal. Your dataframe has no meaning for the question and just complicates the picture. Please read the help section on how to ask a good question before asking. – Homer512, commented yesterday

Tagged: python, pandas, dataframe
