All Questions tagged with dask-distributed and numpy
13 questions
0 votes · 0 answers · 23 views
Processing the same array, dask.array is too slow compared to numpy.array
import BWStest as bws
import numpy as np
from skimage.measure import label
import dask.array
from tqdm import tqdm

CalWin = [7, 25]
stack = []
threshold = 0.05
for i in range(5):
    image = np.random....
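For arrays this small the comparison is expected to favor numpy: dask adds graph-construction and per-task scheduling overhead for every chunk, which pays off only when the data is bigger than memory or each chunk carries substantial work. A minimal sketch of that effect (shapes invented for illustration):

import numpy as np
import dask.array as da

image = np.random.rand(1000, 1000)
np_mean = image.mean()                    # one vectorized call, no overhead

d_image = da.from_array(image, chunks=(500, 500))  # few, reasonably large chunks
da_mean = d_image.mean().compute()        # graph build + scheduling add cost

assert np.isclose(np_mean, da_mean)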
0 votes · 0 answers · 83 views
How to speed up interpolation in dask
I have a piece of code that performs interpolation on a large number of arrays.
This is extremely quick with numpy, but:
The data the code will work with in reality will often not fit in memory
...
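A minimal sketch of one common approach, assuming the interpolation is 1-D and can run independently per chunk; the names xp, fp, and new_x are illustrative, not from the question:

import numpy as np
import dask.array as da

xp = np.linspace(0.0, 1.0, 100)   # known sample points (illustrative)
fp = np.sin(xp)                   # known sample values (illustrative)

new_x = da.random.random((10_000_000,), chunks=1_000_000)

# each chunk is interpolated independently with plain numpy, in parallel,
# so the full array never has to fit in memory at once
interp = new_x.map_blocks(lambda block: np.interp(block, xp, fp), dtype=float)
result = interp.compute()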
1 vote · 1 answer · 52 views
Why does dask show a smaller size than the actual size of the data (numpy array)?
Dask shows a slightly smaller size than the actual size of a numpy array. Here is an example of a numpy array that is exactly 32 MB:
import dask.array as da
import numpy as np
shape = (1000,...
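One plausible explanation is units: dask's repr reports sizes in binary mebibytes (MiB), while "exactly 32 MB" counts decimal megabytes, and 32,000,000 bytes is about 30.52 MiB. A hedged reconstruction (the full shape is cut off above, so (1000, 4000) is a guess that gives exactly 32 MB):

import dask.array as da
import numpy as np

x = np.zeros((1000, 4000), dtype="float64")   # 1000 * 4000 * 8 bytes = 32,000,000 B
d = da.from_array(x, chunks=(1000, 1000))

print(x.nbytes)           # 32000000 -> "32 MB" in decimal units
print(x.nbytes / 2**20)   # ~30.52   -> what a MiB-based repr would display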
0 votes · 0 answers · 209 views
Cannot start dask client
When I try to start a dask.distributed cluster with:
from dask.distributed import Client, progress
client = Client(threads_per_worker=1, n_workers=2)
client
I get the following error:
...
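The error text is cut off, so this is only one frequent culprit: on platforms that spawn worker subprocesses (Windows, newer macOS), creating the Client at module import level re-executes the script in each worker. A minimal sketch of the usual fix:

from dask.distributed import Client

if __name__ == "__main__":   # required where workers are spawned as subprocesses
    client = Client(threads_per_worker=1, n_workers=2)
    print(client)
    client.close()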
0 votes · 1 answer · 127 views
Update parameters for parallel calculations using Python
I have some Python code that performs the operations listed below. The calc_result() function generates results based on the input parameters. At each step, those input parameters are updated to ...
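A hedged sketch of one way to structure this with dask.distributed, assuming the steps must stay sequential while the calls within a step can run in parallel; calc_result and the update rule below are stand-ins, not the question's code:

from dask.distributed import Client

def calc_result(p):               # stand-in for the question's function
    return p * 2

if __name__ == "__main__":
    client = Client()
    params = [1, 2, 3, 4]
    for step in range(10):        # steps stay sequential
        futures = client.map(calc_result, params)  # within a step, run in parallel
        results = client.gather(futures)
        params = [r + 1 for r in results]          # illustrative update rule
    client.close()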
0 votes · 1 answer · 107 views
Attempting to optimize the following loop over numpy arrays? Best method? (numba or dask)
I am trying to refactor this code in order to minimize its runtime and memory usage (if possible):
for i in range(gbl.NumStoreRows):
    cal_effects[i, :, :len(orig_cols)] = cal_effects_vals  # uses ~1 GB ...
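A minimal numba sketch of the loop, with all shapes invented since the real ones are cut off above; numba compiles the explicit nesting to machine code, so the per-iteration Python overhead disappears:

import numpy as np
from numba import njit

@njit
def fill_effects(cal_effects, cal_effects_vals, n_cols):
    # explicit loops are cheap once compiled; no Python-level work per element
    for i in range(cal_effects.shape[0]):
        for j in range(cal_effects.shape[1]):
            for k in range(n_cols):
                cal_effects[i, j, k] = cal_effects_vals[j, k]

cal_effects = np.zeros((100, 50, 20))       # illustrative shapes
cal_effects_vals = np.ones((50, 10))
fill_effects(cal_effects, cal_effects_vals, 10)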
2 votes · 1 answer · 1k views
Dask diagnostics - progress bar with map_partitions / delayed
I am using the distributed scheduler and the distributed progress bar.
Is there a way to have the progress bar work for DataFrame.map_partitions or delayed? I assume the lack of futures is what causes the ...
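One known pattern for this with the distributed scheduler: persist() turns the lazy collection into futures that progress() can track. A minimal sketch:

import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client, progress

if __name__ == "__main__":
    client = Client()
    df = dd.from_pandas(pd.DataFrame({"x": range(1000)}), npartitions=8)

    doubled = df.x.map_partitions(lambda s: s * 2, meta=("x", "i8"))
    doubled = doubled.persist()   # lazy graph -> futures on the cluster
    progress(doubled)             # the bar can now track those futures
    print(doubled.compute().head())
    client.close()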
0 votes · 2 answers · 1k views
Dask and numpy - slow conversion between numpy array and dask array
I need to save a dask array from a big numpy array. Below is a minimal working example that shows the process. Note that a is created with numpy.random only for this MWE; unfortunately I can ...
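A minimal sketch of the usual advice, assuming the slowdown comes from chunk count rather than from from_array itself (which only wraps the existing buffer); the shape and output path below are illustrative:

import numpy as np
import dask.array as da

a = np.random.random((20_000, 1_000))        # stands in for the real data

# from_array is nearly free; the cost shows up at compute/save time and
# grows with the number of chunks, so prefer few large chunks
x = da.from_array(a, chunks=(5_000, 1_000))

da.to_npy_stack("stack_dir", x)   # one .npy file per chunk along axis 0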
0 votes · 1 answer · 165 views
Problem parallelizing dask code on a single machine
Parallelizing with dask is slower than the sequential code.
I have nested for loops that I am trying to parallelize on a local cluster, but I can't find the right way.
I want to parallelize the inner loop.
...
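A hedged sketch of parallelizing the inner loop with dask.delayed, using a stand-in work function since the real loop body is cut off; note that each task should do enough work to outweigh the scheduler's roughly 1 ms per-task overhead, or the parallel version will indeed be slower:

import dask
from dask.distributed import Client

def inner_work(i, j):             # stand-in for the question's inner body
    return i * j

if __name__ == "__main__":
    client = Client()             # local cluster
    totals = []
    for i in range(10):           # outer loop stays sequential
        tasks = [dask.delayed(inner_work)(i, j) for j in range(1000)]
        totals.append(dask.delayed(sum)(tasks))   # one reduction per outer step
    results = dask.compute(*totals)               # inner loops run in parallel
    client.close()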
-2 votes · 1 answer · 243 views
How to resolve Kernel Error or Memory Error?
I have an array of strings whose length is 50000. I am trying to create a similarity matrix of dimension 50000 * 50000. To build it, I tried forming the list of tuples using the following ...
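The arithmetic explains the memory error: a dense 50000 x 50000 float64 matrix needs about 20 GB. A hedged sketch of a chunked alternative, assuming the similarity can be phrased as a blockwise numpy operation (the embedding shape is invented, and to_zarr needs the zarr package):

import numpy as np
import dask.array as da

n = 50_000
print(n * n * 8 / 1e9)    # 20.0 -> GB needed for a dense float64 matrix

vecs = da.random.random((n, 64), chunks=(5_000, 64))  # illustrative embeddings
sim = da.dot(vecs, vecs.T)   # built as 100 blocks of 5000 x 5000
sim.to_zarr("sim.zarr")      # stream to disk chunk by chunk, never all in RAM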
2 votes · 1 answer · 124 views
How does distribution work in dask?
I have a dataframe:
import numpy as np
import pandas as pd
import dask.dataframe as dd
a = {'b':['cat','bat','cat','cat','bat','No Data','bat','No Data'],
     'c':['str1','str2','str3', 'str4','...
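A minimal sketch completing the cut-off setup to show how dask splits the frame; the remaining 'c' values are illustrative fillers:

import pandas as pd
import dask.dataframe as dd

a = {'b': ['cat', 'bat', 'cat', 'cat', 'bat', 'No Data', 'bat', 'No Data'],
     'c': ['str1', 'str2', 'str3', 'str4', 'str5', 'str6', 'str7', 'str8']}
df = dd.from_pandas(pd.DataFrame(a), npartitions=2)

print(df.npartitions)                  # 2
print(df.divisions)                    # (0, 4, 7): index boundaries per partition
print(df.get_partition(0).compute())   # rows 0-3 live on the first partition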
3 votes · 0 answers · 686 views
Split BigQuery dataframe into chunks using dask
I searched for and tested different ways to split a BigQuery dataframe into chunks of 75 rows, but couldn't find a way to do so. Here is the scenario:
I got a very large BigQuery ...
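A minimal sketch, assuming the BigQuery result is first materialized as a pandas frame (pdf below is a stand-in); dd.from_pandas accepts a chunksize that caps rows per partition:

import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({"x": range(1000)})    # stand-in for the BigQuery result

ddf = dd.from_pandas(pdf, chunksize=75)   # each partition holds at most 75 rows
for i in range(ddf.npartitions):
    chunk = ddf.get_partition(i).compute()   # a <=75-row pandas DataFrame
    print(len(chunk))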
32 votes · 1 answer · 4k views
Converting a numpy solution into dask (numpy indexing doesn't work in dask)
I'm trying to convert my Monte Carlo simulation from numpy into dask, because sometimes the arrays are too large and can't fit into memory. Therefore I set up a cluster of computers in the cloud: ...
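One known stumbling block when porting: dask arrays don't support numpy's pointwise fancy indexing through plain [], but they expose it via .vindex. A minimal sketch:

import numpy as np
import dask.array as da

x = da.random.random((1000, 1000), chunks=(250, 250))
rows = np.array([1, 5, 7])
cols = np.array([2, 3, 9])

# numpy's pointwise x[rows, cols] has to be spelled with .vindex in dask
picked = x.vindex[rows, cols].compute()   # shape (3,)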