An overview of gradient descent optimization algorithms

SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)

Stochastic gradient descent, with support for momentum,
learning rate decay, and Nesterov momentum.

# Arguments
lr: float >= 0. Learning rate.
momentum: float >= 0. Momentum factor applied to parameter updates.
decay: float >= 0. Learning rate decay applied over each update.
nesterov: boolean. Whether to apply Nesterov momentum.
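
The update rules behind these arguments can be sketched in a few lines of NumPy. This is an illustrative implementation of the classic SGD-with-momentum scheme, not the library's actual code: the learning rate is decayed per update, a velocity term accumulates momentum-weighted gradients, and Nesterov momentum takes the step from a "lookahead" position.

```python
import numpy as np

def sgd_update(w, grad, velocity, lr=0.01, momentum=0.0, decay=0.0,
               nesterov=False, iteration=0):
    """One SGD parameter update (a sketch of the standard rules).

    w, grad, velocity: arrays of the same shape.
    iteration: update count, used for time-based learning rate decay.
    """
    # Time-based decay: lr_t = lr / (1 + decay * t)
    lr_t = lr * (1.0 / (1.0 + decay * iteration))
    # Accumulate momentum-weighted gradient into the velocity
    velocity = momentum * velocity - lr_t * grad
    if nesterov:
        # Nesterov: apply the momentum term again at the lookahead point
        w = w + momentum * velocity - lr_t * grad
    else:
        w = w + velocity
    return w, velocity

# Usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w
w, v = np.array([1.0, 2.0]), np.zeros(2)
for t in range(100):
    w, v = sgd_update(w, w.copy(), v, lr=0.1, momentum=0.9, iteration=t)
```

With momentum, the iterate spirals toward the minimum rather than descending monotonically, but the oscillations shrink geometrically and `w` ends up close to zero.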

