Expert Networks
A collection of N parallel sub-networks (experts), each specializing in a distinct subset of the input space; in the Transformer context, experts are typically feed-forward networks (FFNs).
A set of N parallel sub-networks, each independently parameterized. In the Transformer context, experts are typically feed-forward networks (FFNs) with identical architecture but separate weight matrices. Under competitive routing, each expert tends to specialize on a different subset of the input distribution.
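
The "identical architecture, separate weights" property can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration: the helper names (make_expert, expert_forward), the two-layer ReLU form of the FFN, and the sizes N = 4, d_model = 8, d_hidden = 16. Routing is deliberately left out, so every expert is simply applied to the same input.

```python
import random

def make_expert(d_model, d_hidden, rng):
    """Build one expert: a two-layer FFN with its own weight matrices (hypothetical helper)."""
    w1 = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]
    w2 = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
    return w1, w2

def matvec(x, w):
    """Multiply a length-m vector by an m x n matrix, giving a length-n vector."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def expert_forward(expert, x):
    """Apply one expert to a single token vector: linear -> ReLU -> linear."""
    w1, w2 = expert
    hidden = [max(0.0, h) for h in matvec(x, w1)]  # ReLU is an assumed activation
    return matvec(hidden, w2)

rng = random.Random(0)
N, D_MODEL, D_HIDDEN = 4, 8, 16  # arbitrary illustrative sizes

# Same architecture for every expert, but each call draws fresh weights.
experts = [make_expert(D_MODEL, D_HIDDEN, rng) for _ in range(N)]

x = [rng.gauss(0, 1) for _ in range(D_MODEL)]        # one token representation
outputs = [expert_forward(e, x) for e in experts]    # one output vector per expert
```

In a full mixture-of-experts layer, a router would select or weight a subset of these outputs per token rather than evaluating every expert.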