The Effect of Activation Functions On Superposition in Toy Models
An in-depth exploration of how different activation functions influence superposition in neural networks.
This paper presents a method for identifying label errors in natural language processing (NLP) datasets using the T5 model.
In this paper, we apply variations of Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO) to learn the game of heads-up no-limit Texas Hold'em.
We propose a new framework, GRUNet, a novel bidirectional RNN architecture that combines GRU, RNN, and ResNet components to jointly process text and image inputs for visual question answering (VQA).