
I have begun experimenting with training machine learning models and ran into some confusion around the concepts of epochs and steps in the training process. While researching online, I came across a formula, 𝜎 = (𝜀 × 𝜂) ÷ 𝛽, relating total steps (𝜎) to epochs (𝜀), dataset size (𝜂), and batch size (𝛽). Applying this formula to my own dataset yielded a fractional number of steps, which raised questions about how steps are typically handled in practice. I'm unsure whether fractional steps are rounded down, or how exactly this translates to the actual training process. My lack of hands-on experience with implementing training loops has made it hard to grasp intuitively how these concepts map to real-world model training.

To better understand the relationship between epochs, steps, and batch size, I tried applying the formula I found, 𝜎 = (𝜀 × 𝜂) ÷ 𝛽, to a purely theoretical example dataset:

total_samples = 10000  # Total number of samples in my dataset
batch_size = 32        # Batch size I plan to use
epochs = 10            # Number of epochs I want to train for

steps_per_epoch = total_samples / batch_size
total_steps = (epochs * total_samples) / batch_size

print(f"Steps per epoch: {steps_per_epoch}")
print(f"Total steps: {total_steps}")

This produced the following output:

Steps per epoch: 312.5
Total steps: 3125.0

The fractional result for steps per epoch (312.5) left me uncertain about how this would be implemented in a real training loop. Specifically:

  1. Are fractional steps typically rounded down in practice?
  2. If rounding occurs, does this mean some data samples might be skipped in each epoch?
  3. How do common machine learning frameworks handle this situation?

I haven't actually implemented a training loop yet, so I'm not sure how these fractional steps would be handled in code. My main difficulty is bridging the gap between the theoretical calculation and its practical application in model training.

1 Answer

If rounding occurs, does this mean some data samples might be skipped in each epoch?

A fractional step simply means that the set of examples in the very last step of an epoch is smaller than the batch size.

  • Rounding the step count down means ignoring this last, smaller set of examples. This can be acceptable when the batch size is small relative to the dataset size and the data is reshuffled every epoch, so different samples are left out each time.
  • Rounding the step count up means training on a smaller batch at the end of the epoch. This can throw infrastructure components off balance in large-scale ML systems and complicate analysis, because the batch size is no longer exactly the same for every batch. Both options are sketched in code after this list.
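
Here is a minimal sketch of the two choices in a plain Python loop, using the numbers from the question; train_on_batch is a hypothetical placeholder for whatever update step you actually run:

import math

total_samples = 10000
batch_size = 32

steps_floor = total_samples // batch_size           # 312 -> round down: the last 16 samples are ignored
steps_ceil = math.ceil(total_samples / batch_size)  # 313 -> round up: the final batch holds only 16 samples

indices = list(range(total_samples))  # stand-in for a (shuffled) list of sample indices

for step in range(steps_ceil):        # use steps_floor instead to drop the partial batch
    batch = indices[step * batch_size : (step + 1) * batch_size]
    # len(batch) == 32 for every step except the last one, where it is 16
    # train_on_batch(batch)           # placeholder for the actual parameter update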

Are fractional steps typically rounded down in practice? How do common machine learning frameworks handle this situation?

Let's see how PyTorch handles this. In PyTorch we usually load data through a DataLoader, which has a parameter called drop_last whose purpose is exactly to specify whether to round down or round up. By default it is set to False, which means the smaller final batch is kept (round up). This could be a good reference point for your own implementation as well.
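
As a quick illustration (a toy dataset of random tensors, not your actual data), the length of a DataLoader shows which way it rounds:

import torch
from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(torch.randn(10000, 8))  # 10,000 dummy samples with 8 features each

keep_last = DataLoader(dataset, batch_size=32, drop_last=False)  # default: keep the smaller final batch
skip_last = DataLoader(dataset, batch_size=32, drop_last=True)   # discard the smaller final batch

print(len(keep_last))  # 313 = ceil(10000 / 32)  -> "round up"
print(len(skip_last))  # 312 = floor(10000 / 32) -> "round down"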
