
Training the Network

The network is ``trained'' to respond differently to different stimuli (i.e., to take on different activations in the valence and semantic vectors) by exposing it to a series of stimuli. After exposure to each stimulus, the network's activations are compared to the expected activations of the semantic and affective units. The strengths of the connections within the network and the strength of each unit's output (its ``bias'') were modified at each iteration using a modified version of the back-propagation algorithm (Rumelhart, Hinton, and Williams, 1986). This algorithm successively reduces the errors the network makes in associating an ``input'' with an ``output'', so that over time the network associates the input with the output more accurately. Formally, on each iteration back-propagation changes the weights between two layers (e.g., an input and an output layer) by an amount $\eta\,\delta_{output}^T\,Input+\alpha\,\delta_{InputOutput}$, where $\eta$ is a constant (the learning rate) representing how fast the network is allowed to learn, $\delta_{output}$ is the discrepancy between what the network was supposed to respond and what it actually responded, $\alpha$ is the network's ``momentum'', i.e., how much the change made at one iteration carries over into the change made at the next, and $\delta_{InputOutput}$ is the matrix representing how much the connections between the Input and Output layers were changed on the previous iteration.
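The weight-change rule above can be sketched in plain Python as follows. This is a minimal illustration, not the thesis's implementation: it assumes row-vector activations (so $\delta_{output}^T\,Input$ is an outer product with one row per output unit), takes $\delta_{output}$ to be target minus actual output with any activation-function derivative already folded in, and uses illustrative values for $\eta$ and $\alpha$. The function names `outer` and `momentum_update` are hypothetical. A unit's bias can be handled the same way, as a weight from an extra input fixed at 1.

```python
def outer(delta, inputs):
    """delta^T * Input for row vectors: one row per output unit, one column per input."""
    return [[d * x for x in inputs] for d in delta]


def momentum_update(weights, prev_change, delta, inputs, eta=0.1, alpha=0.9):
    """Apply change = eta * delta^T * Input + alpha * previous change.

    weights, prev_change: lists of rows (one row per output unit).
    delta: target minus actual output for each output unit (assumed definition).
    eta, alpha: illustrative learning rate and momentum, not values from the thesis.
    Returns the updated weights and this iteration's change, which becomes
    prev_change on the next call.
    """
    grad = outer(delta, inputs)
    change = [[eta * g + alpha * p for g, p in zip(g_row, p_row)]
              for g_row, p_row in zip(grad, prev_change)]
    new_weights = [[w + c for w, c in zip(w_row, c_row)]
                   for w_row, c_row in zip(weights, change)]
    return new_weights, change


# Illustrative usage: one output unit, two inputs, the same error on two iterations.
W, change = momentum_update([[0.0, 0.0]], [[0.0, 0.0]], delta=[0.5], inputs=[1.0, 0.0])
W, change = momentum_update(W, change, delta=[0.5], inputs=[1.0, 0.0])
# Because of momentum, the second change (about 0.095) is larger than the first (0.05).
```

With $\alpha = 0$ the rule reduces to a plain gradient step; a nonzero $\alpha$ lets a consistent error build up larger steps across iterations.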

While many have argued that back-propagation itself is not biologically plausible, this thesis makes no claim that learning actually takes place via a back-propagation algorithm. Rather, it assumes only that humans change their associations with stimuli based on their experiences, and that a stronger association with a stimulus means that it is better learned. Such assumptions are at the heart of many schools of psychology, including behaviorism, structuralism, and most schools of cognitive science.

For the current model, the back-propagation algorithm was modified in two important ways. First, because the network generated both semantic and affective outputs, error was minimized with respect to the target values in each of these layers. The total error was defined as the sum of the mean squared error (MSE) for the valence nodes and the MSE for the semantic nodes. Because each MSE is averaged over the nodes of its own layer, and there are nine times as many semantic nodes as valence nodes, an error at a single valence node counts nine times as much as the same error at a single semantic node; the error for the valence nodes is thus corrected much more quickly than that for the semantic nodes. Differential error-weighting schemes were explored but ultimately did not prove useful, as the observed biases either were not replicated or did not differ from those found in the unweighted case. For example, when the total error was instead taken to be $(SS_{valence\,err}+SS_{semantic\,err})/(2\,(SemNodes+ValNodes))$, the network took longer to learn but its performance did not noticeably improve.
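To make the weighting concrete, the sketch below compares the two error definitions. The function names are hypothetical, and the layer sizes (one valence node, nine semantic nodes) are made up for illustration; they preserve only the 9:1 ratio stated in the text, not the thesis's actual node counts.

```python
def sum_of_mses(val_err, sem_err):
    """Total error as described: MSE over valence nodes + MSE over semantic nodes."""
    return (sum(e * e for e in val_err) / len(val_err)
            + sum(e * e for e in sem_err) / len(sem_err))


def pooled_error(val_err, sem_err):
    """Alternative explored: (SS_valence + SS_semantic) / (2 * (SemNodes + ValNodes))."""
    ss = sum(e * e for e in val_err) + sum(e * e for e in sem_err)
    return ss / (2 * (len(sem_err) + len(val_err)))


# Hypothetical sizes with the 9:1 ratio: 1 valence node, 9 semantic nodes.
# Case A: an error of 0.5 at the single valence node.
# Case B: the same error of 0.5 at one semantic node instead.
a = sum_of_mses([0.5], [0.0] * 9)
b = sum_of_mses([0.0], [0.5] + [0.0] * 8)
# Under sum_of_mses, case A contributes nine times as much total error as case B.
```

Under `pooled_error` the two cases give the same total error, since every node's squared error is divided by the same denominator; this is the sense in which it removes the valence nodes' per-node advantage.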

Second, because activation diffuses through the network, and because of the recurrent loop between the affective and semantic nodes, some technique was needed to decide when, after exposure to a stimulus, error would be adjusted. It was arbitrarily decided that activation from the stimulus would pass through the affective-semantic loop 10 times during a single training epoch. Weights were then adjusted in proportion to the error at the last of these activations, rather than in proportion to the average error over all of them, as is done in standard back-propagation. This technique was intended to represent the learning of a stimulus after exposure to it.
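The settle-then-update scheme can be sketched as below. Everything about the architecture here is a hypothetical simplification for illustration: semantic units receive the stimulus plus the previous valence activations, valence units receive the semantic activations, all units use `tanh`, momentum is omitted, and each layer's weights are updated with a local delta rule from its own target error (semantic error is not back-propagated from the valence layer). The part that mirrors the text is that activation circulates through the loop 10 times per epoch and only the error at the final cycle drives the weight change. Layer sizes, targets, and the learning rate are made up.

```python
import math
import random


def matvec(W, x):
    """Multiply a weight matrix (one row per receiving unit) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def settle(W_sem, W_val, stimulus, n_val, cycles=10):
    """Pass activation around the semantic <-> valence loop `cycles` times.

    Returns the final semantic and valence activations, plus the input the
    semantic layer received on the last cycle (stimulus + previous valence).
    """
    val = [0.0] * n_val
    for _ in range(cycles):
        sem_in = stimulus + val                      # stimulus concatenated with valence
        sem = [math.tanh(a) for a in matvec(W_sem, sem_in)]
        val = [math.tanh(a) for a in matvec(W_val, sem)]
    return sem, val, sem_in


def train_step(W_sem, W_val, stimulus, sem_target, val_target, eta=0.1, cycles=10):
    """One epoch: settle, then update weights from the LAST cycle's error only.

    Returns the total error (sum of the two layers' MSEs) before the update.
    """
    sem, val, sem_in = settle(W_sem, W_val, stimulus, len(val_target), cycles)
    # Local error signals for tanh units: (target - output) * tanh'(net).
    d_val = [(t - y) * (1 - y * y) for t, y in zip(val_target, val)]
    d_sem = [(t - y) * (1 - y * y) for t, y in zip(sem_target, sem)]
    for i, d in enumerate(d_val):
        for j, x in enumerate(sem):
            W_val[i][j] += eta * d * x
    for i, d in enumerate(d_sem):
        for j, x in enumerate(sem_in):
            W_sem[i][j] += eta * d * x
    return (sum((t - y) ** 2 for t, y in zip(val_target, val)) / len(val)
            + sum((t - y) ** 2 for t, y in zip(sem_target, sem)) / len(sem))


# Illustrative usage with made-up sizes, targets, and small random initial weights.
rng = random.Random(0)
stimulus = [1.0, 0.0]
n_sem, n_val = 3, 1
W_sem = [[rng.uniform(-0.1, 0.1) for _ in range(len(stimulus) + n_val)]
         for _ in range(n_sem)]
W_val = [[rng.uniform(-0.1, 0.1) for _ in range(n_sem)] for _ in range(n_val)]
errors = [train_step(W_sem, W_val, stimulus, [0.5, -0.5, 0.5], [0.5])
          for _ in range(500)]
```

Updating from the final cycle only treats the settled state as "the" response to the stimulus; averaging over cycles would instead also penalize the transient activations on the way there.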


Greg Siegle
1999-11-15