
Slim implementation of the AdaBelief optimizer in PyTorch


AdaBelief Slim

This repository contains the code for the adabelief-slim Python package, which provides a PyTorch implementation of the AdaBelief optimizer.

Installation

Using Python 3.6 or higher:

pip install adabelief-slim

Usage

from adabelief import AdaBelief

model = ...
kwargs = ...

optimizer = AdaBelief(model.parameters(), **kwargs)
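
For a concrete starting point, here is a minimal (hypothetical) training step; the model and data below are placeholders, not part of the package:

import torch
import torch.nn.functional as F

from adabelief import AdaBelief

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = AdaBelief(model.parameters(), lr=1e-3, weight_decay=1e-2)

inputs, targets = torch.randn(32, 10), torch.randn(32, 1)  # dummy batch

optimizer.zero_grad()
loss = F.mse_loss(model(inputs), targets)
loss.backward()
optimizer.step()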

The following hyperparameters can be passed as keyword arguments:

  • lr: learning rate (default: 1e-3)
  • betas: 2-tuple of coefficients used for computing the running averages of the gradient and its "variance" (see the paper, and the sketch after this list) (default: (0.9, 0.999))
  • eps: term added to the denominator to improve numerical stability (default: 1e-8)
  • weight_decay: weight decay coefficient (default: 1e-2)
  • amsgrad: whether to use the AMSGrad variant of the algorithm (default: False)
  • rectify: whether to use the RAdam variant of the algorithm (default: False)
  • weight_decouple: whether to use the AdamW variant of the algorithm, i.e. decoupled weight decay (default: True)

Be aware that the AMSGrad and RAdam variants can't be used simultaneously.
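
To make the "variance" term above concrete, here is a rough sketch of the core AdaBelief update for a single parameter, following the paper's algorithm; the state names (m, s, step) are illustrative and not the package's internals:

import torch

def adabelief_step(param, grad, m, s, step, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    # Illustrative single-parameter update, not the package's actual code.
    beta1, beta2 = betas
    # First moment: running average of the gradient, as in Adam.
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    # Second moment: running average of the squared deviation (grad - m),
    # i.e. the "belief" term, instead of Adam's squared gradient.
    diff = grad - m
    s.mul_(beta2).addcmul_(diff, diff, value=1 - beta2).add_(eps)
    # Bias correction and parameter update, as in Adam.
    m_hat = m / (1 - beta1 ** step)
    s_hat = s / (1 - beta2 ** step)
    with torch.no_grad():
        param.sub_(lr * m_hat / (s_hat.sqrt() + eps))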

Motivation

As you're probably aware, one of the paper's main authors (Juntang Zhuang) released his code in this repository, which is used to maintain the adabelief_pytorch package. You may therefore be wondering why this repository exists and how it differs from his. The reason is actually pretty simple: the author made some decisions in his code that made it an unsuitable option for me. While it wasn't the only thing that bugged me, my main issue was that it pulls in unnecessary packages as dependencies.

Regarding differences, the main ones are:

  • I removed the fixed_decay option, as the author's own experiments showed it didn't perform well
  • I removed the degenerate_to_sgd option, which the author carried over from the RAdam codebase, as it seems recommended to always leave it enabled
  • I removed all logging-related features, along with the print_change_log option
  • I removed all code specific to older versions of PyTorch (I think all versions above 1.4 should work), as I don't care for them
  • I changed the flow of the code to be closer to the official implementation of AdamW
  • I removed all usage of the .data property, as it isn't recommended and can be avoided with the torch.no_grad decorator
  • I moved the code specific to AMSGrad so that it isn't executed when the RAdam variant is selected
  • I added an exception if both RAdam and AMSGrad are selected, as they can't both be used (in the official repository, RAdam is used if both are selected); see the sketch after this list
  • I removed half-precision support, as I don't care for it
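
As a rough illustration of the exception mentioned in the list above, the constructor guard could look like the following sketch; the actual signature and error message in adabelief-slim may differ:

import torch

class AdaBelief(torch.optim.Optimizer):
    # Sketch only: step() is omitted, this just shows the argument validation.
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=1e-2, amsgrad=False, rectify=False,
                 weight_decouple=True):
        # Fail fast on the invalid combination instead of silently
        # falling back to RAdam, as the official repository does.
        if amsgrad and rectify:
            raise ValueError("amsgrad and rectify cannot both be True")
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay,
                        amsgrad=amsgrad, rectify=rectify,
                        weight_decouple=weight_decouple)
        super().__init__(params, defaults)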

References

Codebases

  • juntang-zhuang/Adabelief-Optimizer: the author's official AdaBelief implementation
  • pytorch/pytorch: the official AdamW implementation
  • LiyuanLucasLiu/RAdam: the original RAdam implementation

Papers

  • AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients (Zhuang et al.)
  • On the Variance of the Adaptive Learning Rate and Beyond (Liu et al.)
  • On the Convergence of Adam and Beyond (Reddi et al.)
  • Decoupled Weight Decay Regularization (Loshchilov & Hutter)

License

MIT
