I have begun experimenting with training machine learning models and ran into some confusion around the concepts of epochs and steps in the training process. While researching online, I came across a formula relating the total number of steps (𝜎) to the number of epochs (𝜀), the number of samples (𝜂), and the batch size (𝛽): 𝜎 = (𝜀 × 𝜂) ÷ 𝛽. Applying this formula to my own dataset yielded a fractional number of steps, which raised questions about how steps are typically handled in practice. I'm unsure whether fractional steps are rounded down, or how exactly this translates to the actual training process. My lack of hands-on experience with implementing training loops has made it hard to grasp intuitively how these concepts map to real-world model training.
To better understand the relationship between epochs, steps, and batch size, I applied the formula I found (𝜎 = (𝜀 × 𝜂) ÷ 𝛽) to a purely theoretical example dataset:
total_samples = 10000 # Total number of samples in my dataset
batch_size = 32 # Batch size I plan to use
epochs = 10 # Number of epochs I want to train for
steps_per_epoch = total_samples / batch_size
total_steps = (epochs * total_samples) / batch_size
print(f"Steps per epoch: {steps_per_epoch}")
print(f"Total steps: {total_steps}")
This produced the following output:
Steps per epoch: 312.5
Total steps: 3125.0
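To see where the fraction comes from, I also worked out the integer division and remainder by hand (these are still only the made-up numbers from above, and math.floor / math.ceil are just my guesses at the two possible rounding choices):

import math

total_samples = 10000  # same made-up numbers as above
batch_size = 32

full_batches = total_samples // batch_size                    # 312 complete batches of 32 samples
leftover_samples = total_samples % batch_size                 # 16 samples that don't fill a batch
steps_rounded_down = math.floor(total_samples / batch_size)   # 312 steps if the partial batch is dropped
steps_rounded_up = math.ceil(total_samples / batch_size)      # 313 steps if the partial batch still counts

print(f"Full batches: {full_batches}, leftover samples: {leftover_samples}")
print(f"Rounded down: {steps_rounded_down} steps, rounded up: {steps_rounded_up} steps")

If the 312.5 is simply rounded down, that seems to imply 16 samples per epoch (160 over 10 epochs) would go unused, which is exactly the part I'm unsure about.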
The fractional result for steps per epoch (312.5) left me uncertain about how this would be implemented in a real training loop. Specifically:
- Are fractional steps typically rounded down in practice?
- If rounding occurs, does this mean some data samples might be skipped in each epoch?
- How do common machine learning frameworks handle this situation?
I haven't actually implemented a training loop yet, so I'm not sure how these fractional steps would be handled in code. My main difficulty is bridging the gap between the theoretical calculation and its practical application in model training.
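To make the question more concrete, here is a rough sketch of the kind of loop I imagine, in plain Python with no particular framework; drop_last is just a name I made up for the choice I'm asking about (whether the incomplete final batch is used or skipped):

total_samples, batch_size, epochs = 10000, 32, 10  # same numbers as above
data = list(range(total_samples))  # stand-in for a real dataset
drop_last = False  # hypothetical switch: skip the incomplete final batch?

for epoch in range(epochs):
    steps_this_epoch = 0
    for start in range(0, total_samples, batch_size):
        batch = data[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            continue  # drop the 16 leftover samples
        # forward pass, loss, backward pass, and optimizer step would go here
        steps_this_epoch += 1
    print(f"Epoch {epoch}: {steps_this_epoch} steps")
    # prints 313 steps per epoch with drop_last=False, 312 with drop_last=True

In other words, I'm not sure whether real training loops behave like the drop_last=True or the drop_last=False version of this sketch, or do something else entirely.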