All Questions

0 votes
0 answers

Processing the same array, dask.array is too slow compared to numpy.array

import BWStest as bws import numpy as np from skimage.measure import label import dask.array from tqdm import tqdm CalWin = [7,25] stack = [] thershold = 0.05 for i in range(5): image = np.random....
user19298695
0 votes
0 answers

How to speed up interpolation in dask

I have a piece of data code that performs interpolation on a large number of arrays. This is extremely quick with numpy, but: The data the code will work with in reality will often not fit in memory ...
abinitio
  • 805
1 vote
1 answer

Why does dask show a smaller size than the actual size of the data (numpy array)?

Dask shows a slightly smaller size than the actual size of a numpy array. Here is an example of a numpy array that is exactly 32 MB: import dask as da import dask.array import numpy as np shape = (1000,...
Ress
  • 780
0 votes
0 answers

Cannot start dask client

When I try and initiate a dask distributed cluster with: from dask.distributed import Client, progress client = Client(threads_per_worker=1, n_workers=2) client I get the following error: ...
dbschwartz
0 votes
1 answer

Update parameters for parallel calculations using Python

I have some Python code that performs the operations listed below. The calc_result() function generates results based on the input parameters. At each step, those input parameters are updated to ...
wigging
  • 9,160
0 votes
1 answer

Attempting to optimize the following loop over numpy arrays. Best method? (numba or dask)

I am trying to refactor this code in order to minimize its runtime and memory usage (if possible) for i in range(gbl.NumStoreRows): cal_effects[i,:,:len(orig_cols)] = cal_effects_vals - **Use ~1gb ...
Paul Russell
2 votes
1 answer

Dask diagnostics - progress bar with map_partition / delayed

I am using the distributed scheduler and distributed progressbar. Is there a way of having the progress bar work for Dataframe.map_partition or delayed? I assume the lack of futures is what causes the ...
Giannis
  • 5,496
0 votes
2 answers

Dask and numpy - slow conversion between numpy array and dask array

I need to save a dask array from a big numpy array. Below is a minimal working example that shows the process. Note that a is created with numpy.random only for this MWE; unfortunately I can ...
Guido Muscioni
0 votes
1 answer

Problem parallelizing dask code on a single machine

Parallelizing with dask is slower than sequential code. I have nested for loops which I am trying to parallelize on a local cluster, but I can't find the right way. I want to parallelize the inner loop. ...
netfr
  • 1
-2 votes
1 answer

How to resolve Kernel Error or Memory Error?

I have an array of strings whose length is 50000. I am trying to create a similarity matrix of dimension 50000 * 500000. In order to make it, I tried forming the list of tuples using the following ...
Vas
  • 998
2 votes
1 answer

How does distribution work in dask?

I have a dataframe: import numpy as np import pandas as pd import dask.dataframe as dd a = {'b':['cat','bat','cat','cat','bat','No Data','bat','No Data'], 'c':['str1','str2','str3', 'str4','...
Vas
  • 998
3 votes
0 answers

Split bigquery dataframe into chunks using dask

I searched for and tested different ways to split a bigquery dataframe into chunks of 75 rows, but couldn't find a way to do so. Here is the scenario: I have a very large bigquery ...
MT467
  • 698
32 votes
1 answer

Converting numpy solution into dask (numpy indexing doesn't work in dask)

I'm trying to convert my Monte Carlo simulation from numpy into dask, because sometimes the arrays are too large and can't fit into memory. Therefore I set up a cluster of computers in the cloud: ...
patex1987
  • 417