-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add test for optimize_out_slice_nd #543
base: master
Are you sure you want to change the base?
Conversation
a75c364
to
bbdbb67
Compare
bbdbb67
to
10cffc1
Compare
This comment has been minimized.
This comment has been minimized.
a45b4a5
to
e432a28
Compare
I've added the tf implementation of the new Before implementing the RETURNN layer, I wanted to clarify if we want to allow a I.e. """
:param tf.Tensor input: shape (B, T1, ..., Tn, D)
:param tf.Tensor start: shape (B,T1 .., Tn-1), int32 which automatically indicates n as the slice-axis
""" |
I don't really understand why you would want that? I would assume this
layer (like GatherNdLayer) only makes sense like this here. Do not care
about other cases, esp not if they don't really make sense.
Mikel Zhobro ***@***.***> schrieb am Fr., 25. Juni 2021,
15:17:
… ***@***.**** commented on this pull request.
------------------------------
In tests/test_TFNetworkRecLayer.py
<#543 (comment)>:
> + "window": {"class": "slice_nd2", # no_opt: [B,4,D], opt: [B,T,4,D]
+ "from": "base:data", "start": "data:source", "size": 4, "is_output_layer": True},
Is there any way how to make it dependent? For example copy it into an
intermediate layer or so?
Otherwise, I don't know how making it work for all cases is possible.
Maybe making sure that the input has one more time axis than start or
tf.tile it otherwise?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#543 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAON7BRD77M6BH3TZBMJSTTUR6WXANCNFSM46TVFJYQ>
.
|
Can you post here what the actual problem/error is when you use these test cases with the original |
4eaa89a
to
a8c19a2
Compare
The Error strace you asked. The thing is that the |
1292e80
to
263ba62
Compare
263ba62
to
4749b27
Compare
returnn/tf/layers/basic.py
Outdated
assert x.get_dim_tag(start_axis).is_equal(start.get_dim_tag(start_axis), **is_equal_opts) | ||
|
||
# Handle the case when layer is pulled out of rec loop but the input hasn't change | ||
if self.optimized_out_of_loop_and_unchanged_input(x, start): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the name of the function and the comment. This layer is not related to the rec loop (RecLayer
) in any way. No comment should mention anything about rec loop here.
I don't really understand what you are actually testing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we only expect input_data
comes from base_network
, no check is required. We can always assume that the input_data
stays the same both when being pulled out of the loop or not.
Should I follow this logic?
self.output.size_placeholder = x.size_placeholder.copy() | ||
if isinstance(size, tf.Tensor): | ||
self.output.size_placeholder[0] = tf.maximum(seq_lens - tf.reshape(start, tf.shape(seq_lens)), 0) | ||
self.output.size_placeholder[slice_axis] = size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong. size
is just a scalar. But you need a vector of shape [B] here. What's wrong with the old code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you need a vector of shape [B] here. What's wrong with the old code?
We cannot use start
directly to calculate the size as it has shape [B,T0,..]
instead of [B,]
.
So I am not sure how to set self.output.size_placeholder
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the meaning of self.output.size_placeholder
if we have more than 1 spatial axis? Should any element of self.output.size_placeholder
have size (B,)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_placeholder
is a dict, mapping each spatial axis (counted without batch dim) to such [B]
tensor.
So if you have a tensor [B,T1,T2,F]
, then for each batch entry b
you find it's length of axis T1
in size_placeholder[0][b]
, and the one for T2
in size_placeholder[1][b]
.
However with this it's not possible that lengths depend on anything else then the batch entry, i.e. you cannot model that the some batch entry should have a different length in axis T2
for different time steps t1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you cannot model that the same batch entry should have a different length in axis T2 for different time steps t1
This is what we have here, though.
What is self.output.size_placeholder
used for anyways? Does it cause any problems if set wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The size_placeholder
is mostly used for masking, e.g which entries to ignore when calculating the loss, or reducing a sequence (e.g. take the average/min/max over the sequence), or also for layers which concat in time and such.
Many layers consider this, if this is set wrong then weird things can happen and your calculations will be wrong and depend on batching.
So yes, I'd say setting this is kind of important.
I don't know a way to set it how you want it. Maybe @albertz knows more?
returnn/tf/layers/basic.py
Outdated
self.output.placeholder = slices | ||
|
||
@classmethod | ||
def optimized_out_of_loop_and_unchanged_input(cls, input_data, start): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As explained above (and before), this name must be changed to what this function actually does/checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is now called input_comes_from_base_network
.
returnn/tf/layers/basic.py
Outdated
def optimized_out_of_loop_and_unchanged_input(cls, input_data, start): | ||
""" | ||
:rtype: bool | ||
The idea is to check that the axis after the last common axis is a feature axis instead of spatial. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand this. Why do you check this? Why is this relevant?
Also, what does this mean, "the idea"? Is this what this function returns, or how does this idea relate to what this function returns? If this is what the function returns, it should be like :returns: True iff axis after last ... is ... instead of ...
or so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reformulated it. Should be more clear now.
"""
The idea is to check if the axis after the last common axis is the feature axis instead of another spatial axis.
Because the input_data should normally have one extra spatial axis compared to start.
"""
slice_idx = tf.tile(tf.expand_dims(start, -1), [1] * len_common_dims + [size]) + tf.range(size) | ||
mask = tf.logical_or(tf.greater(slice_idx, slice_dim - 1), tf.less(slice_idx, 0)) # (B,T1 .., Tn-1, size) | ||
slice_idx = tf.clip_by_value(slice_idx, 0, slice_dim - 1) # cliped slice idx | ||
res = tf.gather(x, slice_idx, axis=len_common_dims, batch_dims=len_common_dims) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think tf.gather
with axis
and batch_dims
is only supported in later TF versions (I don't remember since what version, can you/someone check?).
I'm not sure anymore which is the min TF version we want to support (I think we documented this somewhere; can someone check? @patrick-wilken ? @JackTemaki ?).
Maybe we need our own wrapper for gather
. Or you use the more generic gather_nd
(which might be slower for this case though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example case which shows that
slice_nd
layer doesn't get properly optimized out of the loop.This pull request is meant to fix that.
For clarification, we want to generalize
SliceNdLayer
to not only work onstart
of shape(batch,)
but cover even the cases with several time axis.