
Difference Paper - Implementation #18

Closed
kampelmuehler opened this issue Jan 4, 2019 · 11 comments

@kampelmuehler

Dear authors,

Equation (1) in the paper states that you are taking the squared Euclidean norm of the weighted differences,
something like euclidean_norm(dot(w_l, (y - y_0)))².
However, in the implementation you are weighting the squared difference of the Euclidean norms, something like dot(w_l, (euclidean_norm(y) - euclidean_norm(y_0))²), which as far as I am concerned is not the same thing. Or am I missing something here?

Thanks!

@richzhang
Owner

normalize_tensor doesn't take the norm, it normalizes the tensor. I think the implementation matches the paper.
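
For reference, a minimal sketch of the distinction being made: normalizing a tensor divides each feature vector by its Euclidean norm and keeps the original shape, whereas taking the norm reduces it to a scalar. The function below only illustrates the idea and is not copied from the repo (the real normalize_tensor operates on 4-D feature maps along the channel axis):

import numpy as np

def normalize_tensor(v, eps=1e-10):
    # Divide by the Euclidean norm: the result is a unit-length vector
    # with the same shape as the input, not a scalar norm.
    return v / (np.sqrt(np.sum(v ** 2)) + eps)

y = np.array([0.1, 0.4, 0.2])
print(normalize_tensor(y))                  # vector of 3 values, same shape as y
print(np.linalg.norm(normalize_tensor(y)))  # ~1.0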

@aweinmann

Hello,
I still have trouble understanding how the implementation matches the formula in the paper.

Let's say we have the following example:

a = np.array([0.1, 0.4, 0.2])
b = np.array([0.3, 0.5, 0.6])
w = np.array([1,1,1])
eps = 1e-10 

We normalize the two inputs a and b:

a_n = a / (np.sqrt(np.sum(a**2)) + eps)
b_n = b / (np.sqrt(np.sum(b**2)) + eps)

Now, from the paper formula I would expect the following computation:

np.dot(w, (a_n - b_n)) ** 2

0.021256129990661603

In the implementation it's computed like this:

np.sum(np.subtract(a_n, b_n)**2)

0.1742581415905922

which would be equivalent to np.dot(w, (a_n - b_n) ** 2) (for w_l = 1 ∀ l).

Shouldn't it be:

np.sum(np.subtract(a_n, b_n))**2

0.021256129990661603

or did I miss something?

Thanks in advance!

@richzhang
Owner

Each element needs to be squared first (see how p-norms are calculated)
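
As a quick numeric check of that point (restating the example above so it runs on its own): the squared Euclidean norm of a vector is the sum of its squared elements, so summing the element-wise squares is the same as taking the norm of the difference vector and then squaring it.

import numpy as np
a = np.array([0.1, 0.4, 0.2]); b = np.array([0.3, 0.5, 0.6]); eps = 1e-10
a_n = a / (np.sqrt(np.sum(a ** 2)) + eps)
b_n = b / (np.sqrt(np.sum(b ** 2)) + eps)
# Squared L2 norm equals the sum of squared elements, so these two agree:
print(np.linalg.norm(a_n - b_n) ** 2)  # 0.1742581415905922
print(np.sum((a_n - b_n) ** 2))        # 0.1742581415905922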

@aweinmann

Thanks a lot for the reply!

I still feel that the formula in the paper does not match the implementation, since in the formula the norm is computed after the dot product, which already sums up the differences (with w_l = 1 ∀ l):
[Equation (1) from the paper: d(x, x0) = Σ_l 1/(H_l W_l) Σ_{h,w} ‖ w_l ⊙ (ŷ^l_hw − ŷ^l_0hw) ‖₂²]

Nevertheless thanks a lot for clarifying the implementation, it is clear to me now.
Best Regards

@richzhang
Owner

richzhang commented Mar 25, 2020

Inside the || || is a vector, not a scalar. The symbol does not denote a dot product.

@aweinmann

Sorry, I somehow missed the reply. Thanks!

@fukka

fukka commented Aug 4, 2023

Hi, thank you for your amazing work, but I also find the code a bit different from the paper:
in the paper you have L2(w * (y_1 - y_2)), but in the code you have dot(w, L2((y_1 - y_2))).

@richzhang
Owner

Yes, the w in the code is actually w^2 in the paper.
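
A small numeric check of that statement, with made-up example values (none of these numbers come from the repo): for a weight vector w and a difference vector d, ‖w ⊙ d‖₂² = Σ_i w_i² d_i², so the paper's norm of the weighted difference equals a weighted sum of squared differences with weights w².

import numpy as np
w = np.array([0.5, 1.0, 2.0])       # hypothetical per-channel weights
d = np.array([-0.14, 0.28, -0.28])  # hypothetical feature difference
print(np.linalg.norm(w * d) ** 2)   # paper form: || w ⊙ d ||_2^2 -> 0.3969
print(np.dot(w ** 2, d ** 2))       # code form: dot(w^2, d^2)    -> 0.3969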

@fukka

fukka commented Aug 4, 2023

Thank you so much for your reply. May I ask another question?
I am trying to re-train with your code, and I get most of the weights w == 0 because of the clamping.
But in the public weights, most of w (70%-90%) is < 0.1 but > 0, as you showed in your paper.
Could you let me know whether the public weights were trained with the published code, where w is not squared and is clamped?
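
For context, the clamping being discussed is a nonnegativity constraint on the learned linear weights. Below is only a hedged sketch of that kind of projection step; the layer name and channel count are made up for illustration, and the actual training code may apply it differently.

import torch

lin = torch.nn.Conv2d(64, 1, kernel_size=1, bias=False)  # hypothetical 1x1 "w" layer
# ... after each optimizer step, project the weights back to w >= 0:
with torch.no_grad():
    lin.weight.clamp_(min=0)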

@richzhang
Owner

richzhang commented Aug 4, 2023

Hmm, the weights should match the code. Many of the weights are 0 in the official model as well (see Fig. 10 in the paper: https://arxiv.org/pdf/1801.03924.pdf), depending on the variant.

@fukka

fukka commented Aug 4, 2023

Yes, thank you. I checked the official model: most weights are close to 0 (< 0.1) but not actually 0. My re-trained model has most weights equal to 0. I will debug more. Thanks!
