TensorFlow Gradients Are NaN

(from Stack Overflow)

https://stackoverflow.com/questions/41918795/minimize-a-function-of-one-variable-in-tensorflow

Many of the other solutions use clipping to avoid an undefined gradient. Depending on your problem, clipping introduces bias and may not be acceptable in all cases. As the following code demonstrates, we need only handle the point of discontinuity--not the region near it.

Specific Answer

def cross_entropy(x, y, axis=-1):
  # Replace y with 1 wherever x == 0, so tf.log never produces -inf
  # on entries that are masked out by the multiplication with x.
  safe_y = tf.where(tf.equal(x, 0.), tf.ones_like(y), y)
  return -tf.reduce_sum(x * tf.log(safe_y), axis)

def entropy(x, axis=-1):
  return cross_entropy(x, x, axis)

But did it work?

x = tf.constant([0.1, 0.2, 0., 0.7])
e = entropy(x)
# ==> 0.80181855
g = tf.gradients(e, x)[0]
# ==> array([ 1.30258512,  0.60943794,  0., -0.64332503], dtype=float32)  Yay! No NaN.
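For readers on TensorFlow 2.x, where eager execution is the default, `tf.log` has moved to `tf.math.log`, and `tf.gradients` is replaced by `tf.GradientTape`, the same check can be sketched as follows. This is my adaptation, not part of the original answer:

```python
import numpy as np
import tensorflow as tf

def cross_entropy(x, y, axis=-1):
  # Replace y with 1 wherever x == 0, so tf.math.log never sees the
  # value that would contribute a NaN gradient on masked-out entries.
  safe_y = tf.where(tf.equal(x, 0.), tf.ones_like(y), y)
  return -tf.reduce_sum(x * tf.math.log(safe_y), axis)

def entropy(x, axis=-1):
  return cross_entropy(x, x, axis)

x = tf.constant([0.1, 0.2, 0., 0.7])
with tf.GradientTape() as tape:
  tape.watch(x)  # constants are not watched automatically
  e = entropy(x)
g = tape.gradient(e, x)
print(e.numpy())            # ≈ 0.80181855, as in the TF1 version
print(np.isnan(g.numpy()))  # all False: no NaN in the gradient
```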


General Recipe

Use an inner tf.where to ensure the function has no asymptote. That is, alter the input to the inf-generating function such that no inf can be created. Then use a second tf.where to always select the valid code path. That is, implement the mathematical condition as you would "normally", i.e., the "naive" implementation.

In Python code, the recipe is:

Instead of this:

tf.where(x_ok, f(x), safe_f(x))

Do this:

safe_x = tf.where(x_ok, x, safe_value)  # safe_value: any input where f is finite, e.g. tf.ones_like(x)
tf.where(x_ok, f(safe_x), safe_f(x))

Example

Suppose you wish to compute:

f(x) = { 1/x,  x != 0
       { 0,    x == 0

A naive implementation results in NaNs in the gradient, i.e.,

def f(x):
  x_ok = tf.not_equal(x, 0.)
  f = lambda x: 1. / x
  safe_f = tf.zeros_like
  return tf.where(x_ok, f(x), safe_f(x))

Does it work?

x = tf.constant([-1., 0, 1])
tf.gradients(f(x), x)[0].eval()
# ==> array([ -1.,  nan,  -1.], dtype=float32)
#  ...bah! We have a NaN at the asymptote despite not having
# an asymptote in the non-differentiated result.
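Why does the naive version fail? tf.where backpropagates a zero upstream gradient into the branch that was not selected, but the chain rule still multiplies that zero by the branch's local gradient; for 1/x at x = 0 the local gradient is infinite, and IEEE arithmetic gives 0 × inf = NaN. A minimal NumPy sketch of that arithmetic (the description of tf.where's backward pass is my summary, not from the original answer):

```python
import numpy as np

# Chain rule at the bad point: the upstream gradient selected away by
# tf.where is 0, but the local gradient of 1/x blows up as x -> 0.
upstream = np.float32(0.)
local = np.float32(-np.inf)  # d(1/x)/dx = -1/x**2  ->  -inf at x = 0
print(upstream * local)      # nan: multiplying by zero does not rescue us
```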

The basic pattern for avoiding NaN gradients when using tf.where is to call tf.where twice. The innermost tf.where ensures that the result f(x) is always finite. The outermost tf.where ensures the correct result is chosen. For the running example, the trick plays out like this:

def safe_f(x):
  x_ok = tf.not_equal(x, 0.)
  f = lambda x: 1. / x
  safe_f = tf.zeros_like
  # Inner where: feed f a harmless 1. wherever x == 0, so 1/x stays finite.
  safe_x = tf.where(x_ok, x, tf.ones_like(x))
  # Outer where: select the correct branch, now with finite gradients everywhere.
  return tf.where(x_ok, f(safe_x), safe_f(x))

But did it work?

x = tf.constant([-1., 0, 1])
tf.gradients(safe_f(x), x)[0].eval()
# ==> array([-1.,  0., -1.], dtype=float32)
# ...yay! double-where trick worked. Notice that the gradient
# is now a constant at the asymptote (as opposed to being NaN).
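As a further illustration (my own, not from the original answer), the same double-where recipe applied to a safe square root, whose naive gradient is likewise NaN at 0, written for TF 2.x with tf.GradientTape:

```python
import numpy as np
import tensorflow as tf

def safe_sqrt(x):
  # Inner tf.where: feed tf.sqrt a harmless 1. wherever x <= 0, so no
  # infinite local gradient ever enters the graph.
  x_ok = x > 0.
  safe_x = tf.where(x_ok, x, tf.ones_like(x))
  # Outer tf.where: select the mathematically correct branch.
  return tf.where(x_ok, tf.sqrt(safe_x), tf.zeros_like(x))

x = tf.constant([4., 0., 1.])
with tf.GradientTape() as tape:
  tape.watch(x)
  y = safe_sqrt(x)
g = tape.gradient(y, x)
print(g.numpy())  # 0.25 and 0.5 at x = 4 and x = 1; finite 0. at x = 0 instead of NaN
```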
