Friday, March 11, 2016

Training Deep NN and Theory of Sleep

This is just an idea, and probably somebody has already been working on it. Still, I feel the urge to share how deep neural networks (Deep NN) have inspired my thinking about the evolutionary purpose of dreaming, and how a theory of dreaming might help us train Deep NN better. I must stress, however, that the following discourse is purely speculative.
A NN can be used as a kind of unsupervised learning, i.e., it learns the regularities of the data without a human expert labeling them. The NN can then be trained via ‘backpropagation’, during which the weights of the links, and with them the values (activities) of the nodes, are updated by optimizing an ‘objective function’. Typically, cross entropy is used as the objective function. The typical problem one confronts when training the network is that gradient descent on the landscape of the objective function may fall into a local minimum. Several techniques are widely used to help escape from such suboptimal solutions, such as annealing, adding a momentum term, or adding a stochastic factor to the training. Here I propose an idea that is inspired by the damped oscillation pattern of sleep. The damped oscillation of sleep consists of four stages, each of which lasts from 10 to 20 minutes, with slight variance depending on the individual and on various physiological factors. The oscillation is illustrated in the following graph.
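To make the local-minimum problem mentioned above concrete, here is a minimal sketch of gradient descent with and without a momentum term. The objective `f` is a toy one-dimensional function I made up for illustration (it is not cross entropy and not any real network), and the learning rate, momentum value, and starting point are arbitrary assumptions.

```python
def f(x):
    """Toy objective: a shallow minimum near x = 1.13, a deeper one near x = -1.30."""
    return x**4 - 3 * x**2 + x

def grad(x):
    """Derivative of the toy objective."""
    return 4 * x**3 - 6 * x + 1

def descend(x0, lr=0.01, momentum=0.0, steps=2000):
    """Gradient descent with an optional heavy-ball momentum term."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(x)  # velocity remembers past gradients
        x += v
    return x

x_plain = descend(2.0)               # plain descent slides into the nearest minimum
x_mom = descend(2.0, momentum=0.9)   # momentum may or may not carry it over the barrier
print("plain   :", x_plain, f(x_plain))
print("momentum:", x_mom, f(x_mom))
```

Plain descent started at x = 2 settles in the shallow minimum on the right. Comparing the two printed objective values shows whether, for these particular settings, the momentum run was carried past the barrier near x = 0.17 into the deeper minimum; the outcome is sensitive to the learning rate and momentum chosen.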

Each stage is reported to serve a different purpose. The REM -> Stage 4 section can be thought of as backward propagation, while the Stage 4 -> REM section represents forward propagation. The shallow-deep sleep cycle helps strengthen memory and also helps internalize the vast number of transient episodes into deep, invariant knowledge. I will elaborate in the next paragraph.
The picture below shows the famous LeNet proposed by Yann LeCun. The net resembles the structure of the brain not only in its topology but also in how information is processed. Each layer ‘pools’ (samples) small patches of its input and forms a feature space that is in turn sampled by the layer above it. You can see the trained middle layers as filters of information. Finally, each node in the end layer responds exclusively to one kind of object. In a broader sense, the Deep NN can condense knowledge from a huge batch of unsorted data, be it images, sounds, or movements. The success of Deep NN sheds light on the long-coveted universal theory that would explain how the brain learns, predicts, interprets, and creates things. Still, we are very far from finding such a theory, because the anatomical counterparts and physiological mechanisms that would justify the NN have not yet been established. Moreover, I am claiming neither that Deep NN can explain EVERYTHING about how the brain works nor that such a universal theory exists.
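The ‘pooling’ of small patches described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: non-overlapping 2x2 patches and a plain average, with the hypothetical helper name `average_pool`. Real networks often use max pooling or a learned subsampling instead.

```python
def average_pool(image, size=2):
    """Average each non-overlapping size x size patch of a 2D list of numbers."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the values of one small patch of the input.
            patch = [image[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(sum(patch) / len(patch))
        pooled.append(pooled_row)
    return pooled

image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(average_pool(image))  # [[3.5, 5.5], [11.5, 13.5]]
```

A 4x4 input becomes a 2x2 feature map, so every layer above sees a coarser summary of a larger region of the input. That is the sense in which the hierarchy narrows as it goes up.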

It is generally agreed that sleep is essential for long-term memory consolidation and can enhance cognitive processing. REM sleep, during which we usually dream and can easily be awakened by external disturbances, seems to play a critical role in learning. It is instructive that our mental activity during dreaming is actually quite similar to that of the waking state; the major difference is that the mental process is isolated from body movement. By analyzing the texture of dreams we find, unsurprisingly in retrospect, that function and form conform to each other. The structure of dreams reflects exactly how we store invariances in a deep, hierarchical structure of filters at different functional levels.
In the first cycle of sleep, we dive from the aware state into deep sleep (stage 1 to stage 4). This process corresponds to the backpropagation phase of LeNet (input layer to end layer), where each stage of sleep corresponds to the optimization of the weights of one layer of the NN. This does not mean that the physical counterpart of the Deep NN must have four layers, because the definition of the stages is artificial and only serves certain purposes. There are no truly distinct boundaries between them; it is more of a gradual process. During this half-cycle the optimization of the weights cannot be very good, since it can easily fall into a local minimum.
In deep sleep, the end layers, which interface with hormone signals produced by glands [1] and with other subcortical bodies [2], receive instructions and start to send signals back to the layers below. These end layers represent emotional components. They invert the previous process and start to predict possible outcomes in the lower feature spaces given such emotions. Since the inverse problem has no unique solution, we do not see a playback of our input but a physically and causally plausible theatre, in which an unpredictable, bizarre, but animated drama is on show. When we watch such a show during sleep, other parts of the brain are aroused, resulting in an awake-like brain state. The vivid experience is entirely endogenous, but its constituents can be novel to the other perceptrons, because the hierarchical brain structure always branches out as it goes downward, and the genesis of the artifacts is nonlinear owing to the convolution operations or to quasi-randomized internal driving sources.

During the first REM stage we are exposed to fabricated scenarios that appear arbitrary and illogical but are strongly emotional and animated. These scenarios are often consistent in terms of causality and physics, but their motifs and development are totally unpredictable. Take visual recognition as an example. We can generate a face by synthesizing features according to the structure of the NN, but no single line of the synthesized image will have the same shape, shade, and orientation as in the original input. In a sense, the dream provides artificial data that allows us to train the Deep NN again. The subsequent descent through the sleep stages allows optimization for a second time. This helps the NN escape from a local optimum of the objective function.
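The train, dream, retrain loop described above can be sketched as follows. This is a toy under loud assumptions: the "network" is just one mean vector per label, the "dream" is that mean plus Gaussian noise, and the names `fit_centroids`, `dream`, and `sleep_cycles` are hypothetical. It illustrates only the control flow of the idea, not a Deep NN, and it says nothing about whether local optima are actually escaped.

```python
import random

def fit_centroids(samples):
    """'Descent' phase: learn one mean vector per label from (vector, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def dream(centroids, n_per_label, noise, rng):
    """'REM' phase: run the model backwards, from a label to a noisy, plausible input."""
    return [([c + rng.gauss(0.0, noise) for c in center], y)
            for y, center in centroids.items()
            for _ in range(n_per_label)]

def sleep_cycles(real, n_cycles=3, n_per_label=5, noise=0.3, seed=0):
    """Alternate between dreaming up artificial data and retraining on it."""
    rng = random.Random(seed)
    model = fit_centroids(real)
    for _ in range(n_cycles):
        dreamed = dream(model, n_per_label, noise, rng)
        model = fit_centroids(real + dreamed)  # retrain on real plus dreamed data
    return model

real = [([0.0, 0.0], "a"), ([0.2, 0.1], "a"), ([1.0, 1.0], "b"), ([0.9, 1.2], "b")]
print(sleep_cycles(real))
```

Because each dream is generated from the label side and perturbed by noise, no dreamed sample repeats a real input exactly, which is the property the paragraph above relies on.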
This cycle of forward modeling and backpropagation is repeated a few times, but not too many, for three reasons: 1) the novelty of the data depreciates after each cycle; 2) over-fitting; 3) it is time to wake up. Owing to the pyramid-shaped hierarchical structure, deeper layers have progressively fewer perceptrons/filters and are therefore more prone to over-fitting. In terms of learning and memory, this accounts for why the depth of each sleep cycle decreases. The lower the level in the hierarchy, the more perceptrons there are to be trained, and therefore the shallower stages (REM, stage 1, stage 2) take up most of the later part of the night's sleep.
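The shrinking depth of successive cycles can be written down as a simple training schedule. This is a sketch under my own assumptions: layers are numbered from 1 (shallowest) upward, the depth reached shrinks by a fixed factor `decay` per cycle, and the helper name `sleep_schedule` is hypothetical.

```python
def sleep_schedule(n_layers, n_cycles, decay=0.75):
    """Return, for each cycle, the layer indices visited going down and back up."""
    schedule = []
    depth = float(n_layers)
    for _ in range(n_cycles):
        deepest = max(1, int(depth))               # deepest layer reached this cycle
        down = list(range(1, deepest + 1))         # descent: shallow to deep
        up = list(range(deepest - 1, 0, -1))       # ascent: back toward the surface
        schedule.append(down + up)
        depth *= decay                             # later cycles stay shallower
    return schedule

for cycle in sleep_schedule(n_layers=4, n_cycles=3):
    print(cycle)
# [1, 2, 3, 4, 3, 2, 1]
# [1, 2, 3, 2, 1]
# [1, 2, 1]
```

Counting the visits shows that layer 1 is touched six times while layer 4 is touched once, so the shallow layers with many perceptrons receive most of the updates and the small deep layers receive few.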
What is the benefit of interfacing the emotional layer with emotion-related hormone sources? I postulate that the hormones act as supervising signals that instruct our brain to learn scenarios embedding negative emotions (stress, anxiety, fear, and anger) so that we can handle social relations better. Learning to endure and even utilize our negative emotions during sleep prepares us to handle frustration and unpredictable difficulties in the real world, where most of the time we live under some pressure to survive. It equips us with an inherent sense of crisis, which helped our ancestors survive disasters that those with different mindsets did not.

[1] Such as the pineal gland and the pituitary gland.
[2] Such as the amygdala (emotional memory), the mammillary body (recollective, or episodic, memory), and the hippocampus (spatial memory, STM, LTM).
[3] Other theories of sleep: activation-synthesis theory; continual-activation theory; reverse learning; dreams as excitations of long-term memory.
