
I have begun experimenting with training machine learning models and ran into some confusion around the concepts of epochs and steps in the training process. While researching online, I came across a formula, 𝜎 = (𝜀 × 𝜂) ÷ 𝛽, relating total steps (𝜎) to epochs (𝜀), dataset size (𝜂), and batch size (𝛽). Applying this formula to my own dataset yielded a fractional number of steps, which raised questions about how steps are typically handled in practice. I'm unsure whether fractional steps are rounded down, or how exactly this translates to the actual training process. My lack of hands-on experience with implementing training loops has made it hard to grasp intuitively how these concepts map to real-world model training.

To better understand the relationship between epochs, steps, and batch size, I tried applying the formula I found, 𝜎 = (𝜀 × 𝜂) ÷ 𝛽, to a purely theoretical example dataset:

total_samples = 10000  # Total number of samples in my dataset
batch_size = 32        # Batch size I plan to use
epochs = 10            # Number of epochs I want to train for

steps_per_epoch = total_samples / batch_size
total_steps = (epochs * total_samples) / batch_size

print(f"Steps per epoch: {steps_per_epoch}")
print(f"Total steps: {total_steps}")

This produced the following output:

Steps per epoch: 312.5
Total steps: 3125.0

The fractional result for steps per epoch (312.5) left me uncertain about how this would be implemented in a real training loop. Specifically:

  1. Are fractional steps typically rounded down in practice?
  2. If rounding occurs, does this mean some data samples might be skipped in each epoch?
  3. How do common machine learning frameworks handle this situation?

I haven't actually implemented a training loop yet, so I'm not sure how these fractional steps would be handled in code. My main difficulty is bridging the gap between the theoretical calculation and its practical application in model training.

1 Answer

If rounding occurs, does this mean some data samples might be skipped in each epoch?

A fractional step simply means that the set of examples in the very last step of an epoch is smaller than the batch size.

  • Rounding the step count down means ignoring this last, smaller set of examples. This can be acceptable when the batch size is small relative to the dataset size and the data is reshuffled every epoch, so different samples are left out each time.
  • Rounding the step count up means training on a smaller batch at the end of the epoch. This can throw infrastructure components off balance in large-scale ML systems and complicate analysis, because the batch size is no longer exactly the same for every batch. Both options are sketched in code after this list.
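
Here is a minimal sketch of the two choices in a plain Python loop, using the numbers from the question; train_on_batch is a hypothetical placeholder for whatever update step you actually run:

import math

total_samples = 10000
batch_size = 32

steps_floor = total_samples // batch_size           # 312 -> round down: the last 16 samples are ignored
steps_ceil = math.ceil(total_samples / batch_size)  # 313 -> round up: the final batch holds only 16 samples

indices = list(range(total_samples))  # stand-in for a (shuffled) list of sample indices

for step in range(steps_ceil):        # use steps_floor instead to drop the partial batch
    batch = indices[step * batch_size : (step + 1) * batch_size]
    # len(batch) == 32 for every step except the last one, where it is 16
    # train_on_batch(batch)           # placeholder for the actual parameter update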

Are fractional steps typically rounded down in practice? How do common machine learning frameworks handle this situation?

Let's see how PyTorch handles this. In PyTorch we usually load data through a DataLoader, which has a parameter called drop_last whose purpose is exactly to specify whether to round down or round up. By default it is set to False, which means the smaller final batch is kept (round up). This could be a good reference point for your own implementation as well.
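
As a quick illustration (a toy dataset of random tensors, not your actual data), the length of a DataLoader shows which way it rounds:

import torch
from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(torch.randn(10000, 8))  # 10,000 dummy samples with 8 features each

keep_last = DataLoader(dataset, batch_size=32, drop_last=False)  # default: keep the smaller final batch
skip_last = DataLoader(dataset, batch_size=32, drop_last=True)   # discard the smaller final batch

print(len(keep_last))  # 313 = ceil(10000 / 32)  -> "round up"
print(len(skip_last))  # 312 = floor(10000 / 32) -> "round down"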
