As part of our recent work on memory layer architectures, I wrote up some of my thoughts on the continual learning problem more broadly. Blog post: Some of the exposition goes beyond memory layers, so I thought it'd be useful to highlight it separately: