2

I have an array of elements, for example r = np.arange(15). I'm trying to split this array into chunks of consecutive elements, where each chunk (except maybe the last one) has size M and each pair of adjacent chunks shares m elements (the last m elements of one chunk are repeated at the start of the next).

For example: split_to_chunks(np.arange(15), M=5, m=1) should yield four lists: [0, 1, 2, 3, 4], [4, 5, 6, 7, 8], [8, 9, 10, 11, 12], [12, 13, 14]

Obviously this can be done iteratively, but I'm looking for a more "pythonic" (and faster) way of doing this.

  • are you handling lists, or numpy arrays?
    – 2e0byo
    Commented May 18, 2022 at 16:10
  • lists, numpy arrays, pandas Series - this should work for Iterables in general
    – Jon Nir
    Commented May 18, 2022 at 16:58
  • Iteration is pythonic!!
    – wwii
    Commented May 18, 2022 at 18:19
  • M=5; m=1; b=M-m; [a[i:i+M] for i in range(0,len(a),b)]
    – wwii
    Commented May 18, 2022 at 18:27
  • Please check my answer; it works with different overlaps.
    Commented May 20, 2022 at 14:16

2 Answers

2

Something like this with list comprehension:

[l[i*(M-m):i*(M-m)+M] for i in range(math.ceil((len(l)-m)/(M-m)))]

Example:

import math
l = list(range(15))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
m, M = 2, 5
[l[i*(M-m):i*(M-m)+M] for i in range(math.ceil((len(l)-m)/(M-m)))]
# [[0, 1, 2, 3, 4],
#  [3, 4, 5, 6, 7],
#  [6, 7, 8, 9, 10],
#  [9, 10, 11, 12, 13],
#  [12, 13, 14]]

m, M = 3, 5
[l[i*(M-m):i*(M-m)+M] for i in range(math.ceil((len(l)-m)/(M-m)))]
# [[0, 1, 2, 3, 4],
#  [2, 3, 4, 5, 6],
#  [4, 5, 6, 7, 8],
#  [6, 7, 8, 9, 10],
#  [8, 9, 10, 11, 12],
#  [10, 11, 12, 13, 14]]

l = range(5)
m, M = 2, 3
[l[i*(M-m):i*(M-m)+M] for i in range(math.ceil((len(l)-m)/(M-m)))]
# [range(0, 3), range(1, 4), range(2, 5)]

Explanation:

Chunk i starts at index i*(M-m) and ends M positions later at index i*(M-m) + M.

chunk index    starts               ends
---------------------------------------------------------
0              0                    M
1              M-m                  (M-m)+M = 2*M-m
2              2*M-m-m = 2*(M-m)    2*(M-m)+M = 3*M-2*m
...

Now the problem is to determine how many chunks there are.

At each step we increase the initial index by M-m, so every chunk after the first adds M-m new elements, while the first chunk adds M = (M-m) + m because it skips nothing. That means n full chunks cover n*(M-m) + m elements, so to count the chunks we subtract m from the length of the list and divide by M-m.

Finally, use the ceiling function to add the last incomplete chunk in case the division is not exact.
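
The counting argument above can be wrapped in a small function. This is only a sketch: the name split_to_chunks comes from the question, while the argument check (0 <= m < M) is my own assumption, added because M-m must be positive for the division to make sense. It relies on len() and slicing, so it fits lists, ranges, NumPy arrays and similar sequences, not one-shot iterators.

```python
import math

def split_to_chunks(seq, M, m):
    """Split a sliceable sequence into chunks of size M that overlap by m items."""
    # Assumption (not in the original answer): reject overlaps that would
    # make the step M - m zero or negative.
    if not 0 <= m < M:
        raise ValueError("need 0 <= m < M")
    step = M - m
    # Number of chunks: ceil((len - m) / (M - m)), as derived above.
    # range() of a non-positive count is simply empty.
    n_chunks = math.ceil((len(seq) - m) / step)
    return [seq[i * step : i * step + M] for i in range(n_chunks)]

print(split_to_chunks(list(range(15)), M=5, m=1))
# [[0, 1, 2, 3, 4], [4, 5, 6, 7, 8], [8, 9, 10, 11, 12], [12, 13, 14]]
```

One consequence of the formula worth knowing: an input with at most m elements produces an empty list, since there is nothing beyond the overlap to put in a chunk.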

1

This should do the job:

def split_to_chunks(r, M=5, m=1):
    return [r[i*(M-m): (i+1)*M-i*m] for i in range(len(r)//(M-m)+1) if i*(M-m) < len(r)]

Explanation: the list comprehension loops over the chunk indexes in the way described in the question. Each chunk starts at i*(M-m) and ends at (i+1)*M-i*m. Finally, a chunk whose start index is at or beyond the end of the array is skipped.
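
A commenter reports an incorrect result for split_to_chunks(np.arange(15), 5, 3). The sketch below reproduces that case with a plain list: for larger overlaps the function above emits trailing chunks that contain nothing new. split_to_chunks_trimmed is a hypothetical variant, not part of the answer; it assumes a chunk should be kept only if it reaches past the m elements it shares with the previous chunk.

```python
def split_to_chunks(r, M=5, m=1):
    # The comprehension from the answer, unchanged.
    return [r[i*(M-m): (i+1)*M-i*m] for i in range(len(r)//(M-m)+1) if i*(M-m) < len(r)]

def split_to_chunks_trimmed(r, M=5, m=1):
    # Hypothetical variant: require start + m < len(r), so a chunk made up
    # only of elements already seen in the previous chunk is dropped.
    return [r[i*(M-m): (i+1)*M-i*m] for i in range(len(r)//(M-m)+1) if i*(M-m) + m < len(r)]

data = list(range(15))

print(split_to_chunks(data, 5, 3)[-3:])
# [[10, 11, 12, 13, 14], [12, 13, 14], [14]]

print(split_to_chunks_trimmed(data, 5, 3)[-1])
# [10, 11, 12, 13, 14]
```

For the question's own example (M=5, m=1) both versions return the same four chunks, so the difference only shows up with larger overlaps.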

  • Please add explanation to your answer
    – S.B
    Commented May 18, 2022 at 16:47
  • thanks! I think the if statement is redundant, no?
    – Jon Nir
    Commented May 18, 2022 at 17:20
  • @JonNir in some cases you will have an empty list at the end if it is not there
    – D.Manasreh
    Commented May 18, 2022 at 19:35
  • This will give an incorrect result for, for instance, split_to_chunks(np.arange(15), 5, 3)
    Commented May 20, 2022 at 14:14
