Project Suggestions

Parallel DNN training: There has been a lot of progress in the last year on parallelizing image-processing deep nets using rapid model updates at the minibatch rate. But text- or event-based networks may have feature dimensions in the millions or more, and model sizes of many gigabytes. It's impossible to directly update those models at close to minibatch rates, and this significantly hurts performance. But it should be possible to use power-law structure to improve updates...
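
As a starting point, here is a minimal sketch (an assumed design, not something stated above) of how power-law structure might help: under a power-law feature distribution, a small "hot" subset of parameters receives most of the updates, so that subset can be synchronized every minibatch while the multi-gigabyte tail is synchronized only occasionally. The class and parameter names (PowerLawSync, hot_fraction, tail_period) are hypothetical.

```python
import numpy as np

class PowerLawSync:
    def __init__(self, model_size, hot_fraction=0.01, tail_period=100):
        self.counts = np.zeros(model_size)  # per-feature frequency estimates
        self.hot_fraction = hot_fraction
        self.tail_period = tail_period
        self.hot = np.arange(0)             # indices of the frequent features
        self.step = 0

    def observe(self, feature_ids):
        # Under a power law, a tiny hot set accounts for most occurrences.
        np.add.at(self.counts, feature_ids, 1)
        k = max(1, int(self.hot_fraction * len(self.counts)))
        self.hot = np.argpartition(self.counts, -k)[-k:]

    def indices_to_sync(self):
        # Sync hot parameters every minibatch; the huge tail only rarely.
        self.step += 1
        if self.step % self.tail_period == 0:
            return np.arange(len(self.counts))  # occasional full sync
        return self.hot                         # cheap per-minibatch sync
```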


Parallel MCMC methods: SGD is a sequential method with poor scaling. MCMC methods often have more natural parallelism (i.e., independence between particles) and can be tuned to optimally explore/exploit the parameter space with a reasonable amount of communication. Most DNNs are not generative, so their parameter distribution is artificial, but simple designs (e.g., additive Gaussian noise on parameters) seem to work well. There is plenty of opportunity to explore in this space; see the readings in the Martino et al. paper.
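
One concrete, well-known instantiation of the additive-Gaussian-noise idea is SGLD (stochastic gradient Langevin dynamics), where each gradient step gets injected Gaussian noise; the particles below are independent, so they parallelize trivially. This is a hedged sketch, not a prescribed design: grad_loss is a placeholder for a minibatch gradient of the model's loss.

```python
import numpy as np

def sgld_particles(grad_loss, dim, n_particles=8, steps=1000, lr=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    particles = rng.normal(size=(n_particles, dim))
    for _ in range(steps):
        for p in particles:  # particles are independent -> embarrassingly parallel
            # SGLD step: gradient descent plus Gaussian noise of variance 2*lr.
            p += -lr * grad_loss(p) + rng.normal(size=dim) * np.sqrt(2 * lr)
    return particles  # approximate samples from the implied parameter distribution

# Toy usage: quadratic loss 0.5*||w||^2, so particles concentrate near 0.
samples = sgld_particles(lambda w: w, dim=2)
```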


Accelerating Reinforcement Learning: There are at least two major sub-problems here: models trained directly from recorded data, and those trained using a simulator. For the former, the challenges seem very similar to those of scaling up DNNs. For the latter, the challenge is likely to be in increasing throughput from the simulator, which is typically the bottleneck. Ideas include tiling the state space (this already happens in many computer games) to simulate many trajectories through that part of the space efficiently. It may even be possible to do this in a somewhat simulator-agnostic way with a smart "scheduler" that clusters trajectory segments and data objects based on access patterns.
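
To make the scheduler idea concrete, here is a hedged sketch of clustering pending trajectory segments by a coarse tile of the state space, so the simulator processes one region at a time and amortizes the cost of loading it. tile_of, simulate_segment, and the seg.state attribute are all hypothetical hooks.

```python
from collections import defaultdict

def schedule_segments(pending, tile_of, simulate_segment):
    # Cluster pending trajectory segments by a coarse state-space tile.
    buckets = defaultdict(list)
    for seg in pending:
        buckets[tile_of(seg.state)].append(seg)
    # Simulate one tile at a time so that tile's data stays hot in memory.
    results = []
    for tile, segments in buckets.items():
        results.extend(simulate_segment(seg) for seg in segments)
    return results
```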


Optimizing Hyperparameter Trajectories: State-of-the-art models include many hyperparameters such as learning rate, temperature, momentum, minibatch size, etc. Optimal trajectories for these parameters are not constant but time-varying, to perform annealing or layer-wise training. Bayesian optimization treats training as a black box, requiring a full training run per evaluation, and is probably too costly for this. Reinforcement learning, or direct training of DNNs to set hyperparameters (Gu et al. paper), can use online model accuracy estimates during training and is likely to do much better.
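
As an illustration of using online accuracy estimates, here is a deliberately simple stand-in for the RL/learned-controller approach: a controller that anneals the learning rate when a cheap running validation-loss estimate stops improving. The patience/decay scheme and all constants are assumptions, not anything proposed in the Gu et al. paper.

```python
class OnlineLRController:
    def __init__(self, lr=0.1, patience=3, decay=0.5, eps=1e-4):
        self.lr, self.patience, self.decay, self.eps = lr, patience, decay, eps
        self.best, self.stalled = float("inf"), 0

    def update(self, val_loss):
        # Anneal when the cheap online loss estimate stops improving.
        if val_loss < self.best - self.eps:
            self.best, self.stalled = val_loss, 0
        else:
            self.stalled += 1
            if self.stalled >= self.patience:
                self.lr *= self.decay  # one step along an annealing trajectory
                self.stalled = 0
        return self.lr

# Toy usage: the loss plateaus, so the controller halves the learning rate.
ctrl = OnlineLRController()
for loss in [1.0, 0.5, 0.4, 0.4, 0.4, 0.4]:
    lr = ctrl.update(loss)
```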


Filtering Training Data from Massive Streams: Emerging data streams from vehicles, drones, IoT devices, etc. are far too large to train on directly. Filtering out the most important data for training can provide most of the benefit of these data without the full cost. The solution could resemble current attention models, which focus recognition resources on parts of a dataset to improve recognition.
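
A minimal sketch of one possible filtering criterion (an assumption, loosely analogous to attention): keep an example only when the current model finds it surprising, i.e. its loss falls in the top fraction of recently seen losses. loss_fn and budget_fraction are hypothetical.

```python
from collections import deque
import numpy as np

def filter_stream(stream, loss_fn, budget_fraction=0.01, window=10_000):
    recent = deque(maxlen=window)  # bounded history of recent losses
    for x in stream:
        loss = loss_fn(x)
        recent.append(loss)
        # Emit x only if its loss sits in the top `budget_fraction` of the
        # recent window, i.e. the model currently finds it informative.
        if loss >= np.quantile(recent, 1 - budget_fraction):
            yield x
```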