An overview of gradient descent optimization algorithms

SGD(self, lr=0.01, momentum=0.0, decay=0.0, nesterov=False, *args, **kwargs)

Docstring:
Stochastic gradient descent, with support for momentum,
learning rate decay, and Nesterov momentum.

# Arguments
lr: float >= 0. Learning rate.
momentum: float >= 0. Momentum factor applied to parameter updates (accelerates SGD in the relevant direction and dampens oscillations).
decay: float >= 0. Learning rate decay over each update.
nesterov: boolean. Whether to apply Nesterov momentum.
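
The four arguments above map onto a simple update rule. Below is a minimal, self-contained NumPy sketch of what a single SGD step with momentum, learning rate decay, and optional Nesterov momentum could look like; the function name sgd_update and the time-based schedule lr / (1 + decay * iteration) are assumptions for illustration, not the library's actual implementation.

import numpy as np

def sgd_update(params, grads, velocities, iteration,
               lr=0.01, momentum=0.0, decay=0.0, nesterov=False):
    # One SGD step with momentum, learning rate decay, and optional
    # Nesterov momentum. Illustrative sketch only.
    lr_t = lr / (1.0 + decay * iteration)  # assumed time-based decay schedule
    new_params, new_velocities = [], []
    for p, g, v in zip(params, grads, velocities):
        v_new = momentum * v - lr_t * g  # velocity accumulates past gradients
        if nesterov:
            # Nesterov momentum: apply the gradient from the "looked-ahead"
            # position implied by the updated velocity.
            p_new = p + momentum * v_new - lr_t * g
        else:
            p_new = p + v_new
        new_params.append(p_new)
        new_velocities.append(v_new)
    return new_params, new_velocities

# Example: minimize f(w) = (w - 3)^2 with Nesterov momentum.
w, v = np.array([0.0]), np.array([0.0])
for t in range(200):
    grad = 2.0 * (w - 3.0)
    (w,), (v,) = sgd_update([w], [grad], [v], t, lr=0.1, momentum=0.9, nesterov=True)
print(w)  # converges to roughly [3.0]

Plain momentum applies the accumulated velocity directly, while Nesterov momentum evaluates the update as if the parameters had already moved along that velocity, which often reduces overshooting.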

 

Very good material on gradient descent variants and optimization algorithms:

http://sebastianruder.com/optimizing-gradient-descent/index.html#gradientdescentvariants
