Performance of computing partial derivative #77

Open
smao-astro opened this issue Dec 27, 2020 · 3 comments
Assignees: shuheng-liu
Labels: enhancement (New feature or request), good first issue (Good for newcomers), question (Further information is requested)

Comments

@smao-astro

Hi,

I am pretty new to neurodiffeq; thank you very much for the excellent library.

I am interested in how partial derivatives w.r.t. the inputs are computed, and in the computational cost of doing so.

Take a forward ODE solver (1D, one unknown variable) as an example. The input is x, a batch of coordinates, and the output of the neural network is y, the approximate solution of the equation at these coordinates. If we view the neural network as a smooth function f that approximates the solution, the forward part of training evaluates y = f(x): for each input element x_i, the network gives y_i = f(x_i), with i running from 0 to N-1, where N is the batch size. When constructing the loss function, one evaluates the residual of the equation, which usually requires evaluating \frac{\partial y_i}{\partial x_i} and higher-order derivatives.
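For instance, for an equation \frac{dy}{dx} = g(x, y), the residual at each sample would be r_i = \frac{\partial y_i}{\partial x_i} - g(x_i, y_i), and the loss is typically the mean of r_i^2 over the batch.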

My question concerns the way \frac{\partial y_i}{\partial x_i} is evaluated. For example, suppose x is an (N, 1) tensor and y is also an (N, 1) tensor, where N is the batch size. If you call autograd.grad(y, t, create_graph=True, grad_outputs=ones, allow_unused=True) as in the lines below
https://github.com/odegym/neurodiffeq/blob/718f226d40cfbcb9ed50d72119bd3668b0c68733/neurodiffeq/neurodiffeq.py#L21-L22
my understanding is that it will evaluate a Jacobian matrix of size (N, N) with elements \frac{\partial y_i}{\partial x_j} (i, j from 0 to N-1), regardless of the fact that y_i depends only on x_i, so the computation (and storage) of the off-diagonal elements is useless and unnecessary. In other words, the computation could actually be done by evaluating N gradients, but the current method does N * N times the work.
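For concreteness, here is a minimal sketch of the call pattern I mean, with a toy network standing in for the solver network (the names and sizes are just for illustration, not the actual neurodiffeq code):

import torch
from torch import autograd

# Toy setup: a small network standing in for the ODE solution network.
net = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
x = torch.rand(32, 1, requires_grad=True)  # batch of N = 32 coordinates, shape (N, 1)
y = net(x)                                 # approximate solution at each coordinate, shape (N, 1)

# The call pattern in question: a single call with grad_outputs = ones
ones = torch.ones_like(y)
dy_dx, = autograd.grad(y, x, create_graph=True, grad_outputs=ones, allow_unused=True)
print(dy_dx.shape)                         # (N, 1); entry i equals dy_i/dx_i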

My questions are:

  1. Is what I state above correct, to your understanding?
  2. If so, do you think it noticeably affects the computation speed?
  3. If 1 is correct, do you know of any way to reduce the computation needed, and do you have any plan to do so?

Thanks!

@shuheng-liu
Member

shuheng-liu commented Dec 27, 2020

Thanks for your interest in neurodiffeq. Your question is very good and thought-provoking; I need to check whether it indeed computes an N x N Jacobian when there's no interdependence within the batch. My intuition is that you are right.

In the meantime, may I ask how you came to know neurodiffeq? It'd be nice to know how we can further promote it :)

@shuheng-liu shuheng-liu added enhancement New feature or request good first issue Good for newcomers question Further information is requested labels Dec 27, 2020
@shuheng-liu shuheng-liu self-assigned this Dec 27, 2020
@shuheng-liu shuheng-liu pinned this issue Dec 27, 2020
@smao-astro
Author

Hi @shuheng-liu ,

Do you have any updates you would like to share on these questions? I am not familiar with automatic differentiation, so I would like to know whether the current implementation is efficient.

In addition, I noticed that this new API in the nightly build might be helpful. What do you think?

I cannot remember the details, since I noticed it earlier last year, maybe in some paper (?).

Thanks.

@shuheng-liu
Member

> computation (and storage) on the non-diagonal elements is useless and unnecessary

I am pretty sure it won't introduce additional storage because PyTorch accumulates (sums) the gradient w.r.t. the same variable. For example

import torch

x = torch.rand(10, 1, requires_grad=True)  # shape 10 x 1
y = torch.cat([x, x*2, x*3], 1)            # shape 10 x 3
y.backward(torch.ones_like(y))             # gradients w.r.t. x are summed over the 3 columns
print(x.grad)

will give you a 10 x 1 tensor filled with 6s, instead of a 10 x 3 tensor of (1, 2, 3)s

As for the computation part, I'm thinking it will be more expensive, though, because PyTorch uses reverse-mode automatic differentiation, so when the output has a high dimension the computation will slow down. I'm not sure whether this can be fixed without switching to a framework other than PyTorch.
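To make the comparison concrete, here is a small sketch (with a made-up toy network, not neurodiffeq internals) of the single vector-Jacobian-product call versus N separate per-sample backward passes; both return the same (N, 1) result, and the open question is how their costs compare inside autograd:

import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
x = torch.rand(128, 1, requires_grad=True)   # shape (N, 1)
y = net(x)                                   # shape (N, 1)

# (a) one reverse pass: a vector-Jacobian product with a vector of ones
dy_dx, = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y),
                             create_graph=True, allow_unused=True)

# (b) N reverse passes, differentiating one scalar output y_i at a time
rows = [torch.autograd.grad(y[i, 0], x, retain_graph=True)[0][i] for i in range(x.shape[0])]
dy_dx_loop = torch.stack(rows)               # also shape (N, 1)

print(torch.allclose(dy_dx, dy_dx_loop))     # True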

torch.vmap looks interesting but I don't have much experience with it. I'm still trying to understand it.
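If it helps, here is a rough sketch of how per-sample derivatives could be expressed with the vmap-style API (this uses the torch.func / functorch interface rather than the prototype torch.vmap, and the toy network is only a placeholder, so treat it as a sketch rather than a drop-in change for neurodiffeq):

import torch
from torch.func import grad, vmap   # in older versions these live in the functorch package

# Placeholder network; in practice this would be the solver network.
net = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))

def f_scalar(x_i):
    # x_i has shape (1,); return a 0-dim scalar so grad() can be applied to it
    return net(x_i).squeeze()

x = torch.rand(32, 1)                # batch of N = 32 coordinates
dy_dx = vmap(grad(f_scalar))(x)      # shape (32, 1): dy_i/dx_i computed per sample
print(dy_dx.shape)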

@sathvikbhagavan sathvikbhagavan unpinned this issue Mar 29, 2021