Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean


The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation...
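To make the idea of conditional computation concrete, here is a minimal sketch of a sparsely gated mixture-of-experts layer in the spirit of the abstract: a gating network scores the experts, only the top-k experts run, and their outputs are mixed by the gate values. All sizes, names, and the use of plain linear maps as "experts" are illustrative assumptions, not the paper's implementation (which uses noisy top-k gating over feed-forward experts).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, not taken from the paper.
d_model, n_experts, k = 8, 4, 2

# Each "expert" here is a single linear map for brevity;
# the paper's experts are small feed-forward networks.
expert_weights = [rng.standard_normal((d_model, d_model)) * 0.1
                  for _ in range(n_experts)]
w_gate = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_layer(x, k=k):
    """Route input x (shape (d_model,)) to its top-k experts and mix outputs."""
    logits = x @ w_gate                       # one gating score per expert
    top = np.argsort(logits)[-k:]             # indices of the k highest-scoring experts
    gates = np.zeros(n_experts)
    w = np.exp(logits[top] - logits[top].max())
    gates[top] = w / w.sum()                  # softmax over the selected experts only
    # Only the k selected experts are evaluated; the rest cost nothing,
    # which is the source of the capacity-vs-computation decoupling.
    return sum(gates[i] * (x @ expert_weights[i]) for i in top)

y = moe_layer(rng.standard_normal(d_model))
```

Because only `k` of the `n_experts` experts run per input, total parameter count can grow with `n_experts` while per-example compute stays roughly proportional to `k`.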
