PT: add uniform likelihood bucket batching #1661
Conversation
When I mentioned that I would try to optimize this, I was more thinking about writing a dedicated script just to do that. But this here could also be interesting. I'm not sure though that I would put this into RETURNN yet, while you are still experimenting with it. You can rather put this somewhere into your …
With "limit", you mean the max num seqs of a bucket? I still don't really understand why this is something you want to optimize for. Why does it matter that the segments are evenly distributed across the buckets? Is this to get better speed, better model performance, or both? I don't see how this affects speed, and I also don't see or understand how this affects model performance. I thought that optimizing the max seq lens of each bucket, i.e. the boundaries, w.r.t. minimizing the amount of padding would be a more reasonable thing to optimize. That would improve at least the speed.
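For illustration of the padding point above, here is a hypothetical sketch (not code from this PR; `padding_fraction` and all names are made up) of how the amount of padding depends only on the chosen bucket boundaries, treating each sequence as padded up to the upper boundary of its bucket:

```python
# Hypothetical illustration, not code from this PR. Each sequence is
# assumed to be padded up to the upper boundary of the bucket it falls
# into, which is an upper bound on the real per-batch padding.
import bisect

def padding_fraction(seq_lens, boundaries):
    """Fraction of frames that are padding, given bucket boundaries.

    Assumes the last boundary is >= the longest sequence length.
    """
    boundaries = sorted(boundaries)
    total_frames = padded_frames = 0
    for n in seq_lens:
        # Smallest boundary >= n, i.e. the bucket this sequence lands in.
        b = boundaries[bisect.bisect_left(boundaries, n)]
        total_frames += b
        padded_frames += b - n
    return padded_frames / total_frames

# A second, well-placed boundary drastically cuts the padding:
print(padding_fraction([10, 12, 95, 100], boundaries=[100]))      # ~0.46
print(padding_fraction([10, 12, 95, 100], boundaries=[12, 100]))  # ~0.03
```

Minimizing this quantity over the boundaries is the optimization the comment suggests, and it directly translates into compute saved per batch.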
It seems this PR includes all the epoch_continuous changes? Why? Are those in any way related here? I don't see how. If they are not related, can you remove them from this PR?
I don't think this PR is going to be merged in its current form.
This is a first attempt at automatically optimizing the bucket limits during training. After every subepoch, the limits are adjusted so that every bucket catches a roughly equal number of segments.
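A minimal sketch of what "adjust the limits after every subepoch" could look like (an assumption for illustration, not the PR's actual implementation; `equalize_bucket_boundaries` and all names are hypothetical): recompute the boundaries as empirical quantiles of the sequence lengths observed in the last subepoch, so each bucket catches about the same number of segments.

```python
# Hypothetical sketch, not the PR's implementation: recompute bucket
# boundaries as empirical quantiles of the observed sequence lengths,
# so each bucket catches a roughly equal number of segments.
import numpy as np

def equalize_bucket_boundaries(seq_lens, num_buckets):
    """New max-seq-len boundaries splitting the observed lengths
    into ``num_buckets`` buckets of roughly equal segment count."""
    lens = np.asarray(seq_lens)
    # Upper quantiles 1/k, 2/k, ..., 1 become the new bucket limits.
    qs = np.linspace(0.0, 1.0, num_buckets + 1)[1:]
    boundaries = np.quantile(lens, qs, method="higher")
    return np.unique(boundaries).tolist()

# E.g. after a subepoch with these observed lengths:
observed = [12, 15, 20, 33, 34, 35, 90, 120, 121, 130, 400, 410]
print(equalize_bucket_boundaries(observed, num_buckets=3))
# -> [34, 121, 410]: the three buckets then hold 5, 4 and 3 of the
#    12 segments, i.e. roughly equal counts.
```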
I am debating two more things: