All Questions tagged with dask-distributed and numpy
13 questions
0 votes · 0 answers · 23 views
Processing the same array, dask.array is too slow compared to numpy.array
import BWStest as bws
import numpy as np
from skimage.measure import label
import dask.array
from tqdm import tqdm

CalWin = [7, 25]
stack = []
threshold = 0.05
for i in range(5):
    image = np.random....
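For arrays this small the comparison is expected to favor numpy: dask adds graph-construction and per-task scheduling overhead for every chunk, which pays off only when the data is bigger than memory or each chunk carries substantial work. A minimal sketch of that effect (shapes invented for illustration):

import numpy as np
import dask.array as da

image = np.random.rand(1000, 1000)
np_mean = image.mean()                    # one vectorized call, no overhead

d_image = da.from_array(image, chunks=(500, 500))  # few, reasonably large chunks
da_mean = d_image.mean().compute()        # graph build + scheduling add cost

assert np.isclose(np_mean, da_mean)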
0 votes · 0 answers · 83 views
How to speed up interpolation in dask
I have a piece of code that performs interpolation on a large number of arrays.
This is extremely quick with numpy, but:
The data the code will work with in reality will often not fit in memory
...
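A minimal sketch of one common approach, assuming the interpolation is 1-D and can run independently per chunk; the names xp, fp, and new_x are illustrative, not from the question:

import numpy as np
import dask.array as da

xp = np.linspace(0.0, 1.0, 100)   # known sample points (illustrative)
fp = np.sin(xp)                   # known sample values (illustrative)

new_x = da.random.random((10_000_000,), chunks=1_000_000)

# each chunk is interpolated independently with plain numpy, in parallel,
# so the full array never has to fit in memory at once
interp = new_x.map_blocks(lambda block: np.interp(block, xp, fp), dtype=float)
result = interp.compute()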
1 vote · 1 answer · 52 views
Why does dask show a smaller size than the actual size of the data (numpy array)?
Dask shows a slightly smaller size than the actual size of a numpy array. Here is an example of a numpy array that is exactly 32 MB:
import dask.array as da
import numpy as np
shape = (1000,...
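One plausible explanation is units: dask's repr reports sizes in binary mebibytes (MiB), while "exactly 32 MB" counts decimal megabytes, and 32,000,000 bytes is about 30.52 MiB. A hedged reconstruction (the full shape is cut off above, so (1000, 4000) is a guess that gives exactly 32 MB):

import dask.array as da
import numpy as np

x = np.zeros((1000, 4000), dtype="float64")   # 1000 * 4000 * 8 bytes = 32,000,000 B
d = da.from_array(x, chunks=(1000, 1000))

print(x.nbytes)           # 32000000 -> "32 MB" in decimal units
print(x.nbytes / 2**20)   # ~30.52   -> what a MiB-based repr would display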
0 votes · 0 answers · 209 views
Cannot start dask client
When I try to start a dask.distributed cluster with:
from dask.distributed import Client, progress
client = Client(threads_per_worker=1, n_workers=2)
client
I get the following error:
...
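The error text is cut off, so this is only one frequent culprit: on platforms that spawn worker subprocesses (Windows, newer macOS), creating the Client at module import level re-executes the script in each worker. A minimal sketch of the usual fix:

from dask.distributed import Client

if __name__ == "__main__":   # required where workers are spawned as subprocesses
    client = Client(threads_per_worker=1, n_workers=2)
    print(client)
    client.close()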
0 votes · 1 answer · 127 views
Update parameters for parallel calculations using Python
I have some Python code that performs the operations listed below. The calc_result() function generates results based on the input parameters. At each step, those input parameters are updated to ...
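A hedged sketch of one way to structure this with dask.distributed, assuming the steps must stay sequential while the calls within a step can run in parallel; calc_result and the update rule below are stand-ins, not the question's code:

from dask.distributed import Client

def calc_result(p):               # stand-in for the question's function
    return p * 2

if __name__ == "__main__":
    client = Client()
    params = [1, 2, 3, 4]
    for step in range(10):        # steps stay sequential
        futures = client.map(calc_result, params)  # within a step, run in parallel
        results = client.gather(futures)
        params = [r + 1 for r in results]          # illustrative update rule
    client.close()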
0 votes · 1 answer · 107 views
Attempting to optimize the following loop over numpy arrays? Best method? (numba or dask)
I am trying to refactor this code in order to minimize its runtime and memory usage (if possible):
for i in range(gbl.NumStoreRows):
    cal_effects[i, :, :len(orig_cols)] = cal_effects_vals  # uses ~1 GB ...
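A minimal numba sketch of the loop, with all shapes invented since the real ones are cut off above; numba compiles the explicit nesting to machine code, so the per-iteration Python overhead disappears:

import numpy as np
from numba import njit

@njit
def fill_effects(cal_effects, cal_effects_vals, n_cols):
    # explicit loops are cheap once compiled; no Python-level work per element
    for i in range(cal_effects.shape[0]):
        for j in range(cal_effects.shape[1]):
            for k in range(n_cols):
                cal_effects[i, j, k] = cal_effects_vals[j, k]

cal_effects = np.zeros((100, 50, 20))       # illustrative shapes
cal_effects_vals = np.ones((50, 10))
fill_effects(cal_effects, cal_effects_vals, 10)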
2 votes · 1 answer · 1k views
Dask diagnostics - progress bar with map_partitions / delayed
I am using the distributed scheduler and the distributed progress bar.
Is there a way to have the progress bar work for DataFrame.map_partitions or delayed? I assume the lack of futures is what causes the ...
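One known pattern for this with the distributed scheduler: persist() turns the lazy collection into futures that progress() can track. A minimal sketch:

import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client, progress

if __name__ == "__main__":
    client = Client()
    df = dd.from_pandas(pd.DataFrame({"x": range(1000)}), npartitions=8)

    doubled = df.x.map_partitions(lambda s: s * 2, meta=("x", "i8"))
    doubled = doubled.persist()   # lazy graph -> futures on the cluster
    progress(doubled)             # the bar can now track those futures
    print(doubled.compute().head())
    client.close()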
0 votes · 2 answers · 1k views
Dask and numpy - slow conversion between numpy array and dask array
I need to save a dask array from a big numpy array. Below is a minimal working example that shows the process. Note that a is created with numpy.random only for this MWE; unfortunately I can ...
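A minimal sketch of the usual advice, assuming the slowdown comes from chunk count rather than from from_array itself (which only wraps the existing buffer); the shape and output path below are illustrative:

import numpy as np
import dask.array as da

a = np.random.random((20_000, 1_000))        # stands in for the real data

# from_array is nearly free; the cost shows up at compute/save time and
# grows with the number of chunks, so prefer few large chunks
x = da.from_array(a, chunks=(5_000, 1_000))

da.to_npy_stack("stack_dir", x)   # one .npy file per chunk along axis 0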
0 votes · 1 answer · 165 views
Problem parallelizing dask code on a single machine
Parallelizing with dask is slower than the sequential code.
I have nested for loops that I am trying to parallelize on a local cluster, but I can't find the right way.
I want to parallelize the inner loop.
...
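A hedged sketch of parallelizing the inner loop with dask.delayed, using a stand-in work function since the real loop body is cut off; note that each task should do enough work to outweigh the scheduler's roughly 1 ms per-task overhead, or the parallel version will indeed be slower:

import dask
from dask.distributed import Client

def inner_work(i, j):             # stand-in for the question's inner body
    return i * j

if __name__ == "__main__":
    client = Client()             # local cluster
    totals = []
    for i in range(10):           # outer loop stays sequential
        tasks = [dask.delayed(inner_work)(i, j) for j in range(1000)]
        totals.append(dask.delayed(sum)(tasks))   # one reduction per outer step
    results = dask.compute(*totals)               # inner loops run in parallel
    client.close()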
-2 votes · 1 answer · 243 views
How to resolve Kernel Error or Memory Error?
I have an array of strings whose length is 50000. I am trying to create a similarity matrix of dimension 50000 * 50000. To build it, I tried forming the list of tuples using the following ...
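The arithmetic explains the memory error: a dense 50000 x 50000 float64 matrix needs about 20 GB. A hedged sketch of a chunked alternative, assuming the similarity can be phrased as a blockwise numpy operation (the embedding shape is invented, and to_zarr needs the zarr package):

import numpy as np
import dask.array as da

n = 50_000
print(n * n * 8 / 1e9)    # 20.0 -> GB needed for a dense float64 matrix

vecs = da.random.random((n, 64), chunks=(5_000, 64))  # illustrative embeddings
sim = da.dot(vecs, vecs.T)   # built as 100 blocks of 5000 x 5000
sim.to_zarr("sim.zarr")      # stream to disk chunk by chunk, never all in RAM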
2 votes · 1 answer · 124 views
How does distribution work in dask?
I have a dataframe:
import numpy as np
import pandas as pd
import dask.dataframe as dd
a = {'b':['cat','bat','cat','cat','bat','No Data','bat','No Data'],
     'c':['str1','str2','str3', 'str4','...
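A minimal sketch completing the cut-off setup to show how dask splits the frame; the remaining 'c' values are illustrative fillers:

import pandas as pd
import dask.dataframe as dd

a = {'b': ['cat', 'bat', 'cat', 'cat', 'bat', 'No Data', 'bat', 'No Data'],
     'c': ['str1', 'str2', 'str3', 'str4', 'str5', 'str6', 'str7', 'str8']}
df = dd.from_pandas(pd.DataFrame(a), npartitions=2)

print(df.npartitions)                  # 2
print(df.divisions)                    # (0, 4, 7): index boundaries per partition
print(df.get_partition(0).compute())   # rows 0-3 live on the first partition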
3 votes · 0 answers · 686 views
Split BigQuery dataframe into chunks using dask
I searched for and tested different ways to split a BigQuery dataframe into chunks of 75 rows, but couldn't find a way to do so. Here is the scenario:
I got a very large BigQuery ...
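A minimal sketch, assuming the BigQuery result is first materialized as a pandas frame (pdf below is a stand-in); dd.from_pandas accepts a chunksize that caps rows per partition:

import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({"x": range(1000)})    # stand-in for the BigQuery result

ddf = dd.from_pandas(pdf, chunksize=75)   # each partition holds at most 75 rows
for i in range(ddf.npartitions):
    chunk = ddf.get_partition(i).compute()   # a <=75-row pandas DataFrame
    print(len(chunk))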
32 votes · 1 answer · 4k views
Converting a numpy solution into dask (numpy indexing doesn't work in dask)
I'm trying to convert my Monte Carlo simulation from numpy into dask, because sometimes the arrays are too large and can't fit into memory. Therefore I set up a cluster of computers in the cloud: ...
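One known stumbling block when porting: dask arrays don't support numpy's pointwise fancy indexing through plain [], but they expose it via .vindex. A minimal sketch:

import numpy as np
import dask.array as da

x = da.random.random((1000, 1000), chunks=(250, 250))
rows = np.array([1, 5, 7])
cols = np.array([2, 3, 9])

# numpy's pointwise x[rows, cols] has to be spelled with .vindex in dask
picked = x.vindex[rows, cols].compute()   # shape (3,)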