Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PT: add uniform likelihood bucket batching #1661

Closed
wants to merge 20 commits into from

Conversation

NeoLegends
Copy link
Member

@NeoLegends NeoLegends commented Dec 5, 2024

This is a first attempt at automatically optimizing the bucket limits during training. After every subepoch the limits are adjusted to make every bucket catch a roughly equal number of segments.

I am debating on two more things:

  1. Keeping only a fixed number of seq lens that have passed through the batching (e.g. 10k + upper + lower value) to have an upper bound on the memory consumption. Whenever the list is extended by another batch and crosses said limit, we re-sample randomly from that list to keep it at a fixed size. As long as the number of seq lens we keep is large enough this shouldn't have a big influence on the accuracy of the results.
  2. Keeping seq len statistics even across subepochs to make the statistics better.

@NeoLegends NeoLegends self-assigned this Dec 5, 2024
@NeoLegends NeoLegends force-pushed the moritz-optim-bucket-limits branch 4 times, most recently from 9b53a49 to c80d589 Compare December 5, 2024 13:10
@NeoLegends NeoLegends marked this pull request as ready for review December 5, 2024 13:13
@NeoLegends NeoLegends requested review from albertz and a team as code owners December 5, 2024 13:13
@NeoLegends NeoLegends force-pushed the moritz-optim-bucket-limits branch 2 times, most recently from a2d513a to 1392e99 Compare December 5, 2024 13:17
@NeoLegends NeoLegends changed the title PT: add optimizing bucket batching PT: add uniform likelihood bucket batching Dec 5, 2024
@NeoLegends NeoLegends force-pushed the moritz-optim-bucket-limits branch from 1392e99 to 0a561e4 Compare December 5, 2024 13:17
@albertz
Copy link
Member

albertz commented Dec 5, 2024

When I mention that I would try to optimize this, I was more thinking about writing a dedicated script just to do that.

But this here could also be interesting.

I'm not sure thought that I would put this into RETURNN yet, while you are still experimenting with it. You can rather put this somewhere into your i6_experiments.users. and then just use it from there. (I think we might need to extend the class resolution slightly, if "." in cls_name: ...; see similar code in rf.build_from_dict or get_optimizer_class or other such functions.)

@NeoLegends NeoLegends marked this pull request as draft January 10, 2025 13:38
@albertz
Copy link
Member

albertz commented Jan 11, 2025

This is a first attempt at automatically optimizing the bucket limits during training. After every subepoch the limits are adjusted to make every bucket catch a roughly equal number of segments.

With "limit", you mean the max num seqs of a bucket?

I still don't really understand why this is something you want to optimize for. Why does it matter that they are evenly distributed?

This is to get better speed or better model performance or both? I don't see how this affects speed. I also don't see/understand how this affects model performance.

I thought that optimizing the max seq lens of each bucket, i.e. the boundaries, would be a more reasonable thing to optimize w.r.t. minimizing the amount of padding. That would improve at least the speed.

@albertz
Copy link
Member

albertz commented Jan 11, 2025

It seems this PR includes all the epoch_continuous changes? Why? Are those in any way related here? I don't see how. If they are not related, can you remove them from this PR?

@NeoLegends
Copy link
Member Author

I don‘t think this PR is going to be merged in the current form.

@NeoLegends NeoLegends closed this Jan 11, 2025
@albertz albertz deleted the moritz-optim-bucket-limits branch January 11, 2025 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants