An objection to many neural network models is that there are so many
parameters that changes to any one of them could create the desired
behaviors, giving the chosen model low construct validity. Accordingly,
many values were tried for each of the network parameters, with the
following qualitative results.
- Network construction
Number of hidden units - In some cases, changing the number of hidden
units changed the network's ability to learn stimuli and whether the
network evidenced particular information processing biases. When very
few hidden units were used, the network did not converge during
training. When the number of hidden units was equal to or greater than
the number of inputs, information processing biases were reduced,
presumably because the network attained a very precise conception of
inputs as they occurred.
Bias terms - Because the network's information processing biases were
reflected primarily in the bias terms of nodes associated with
affective valence, the same simulations were run on a network in which
bias terms were not learned. The desired effects by and large
disappeared, suggesting that affective biases are indeed a product of
distinct affective states rather than a product of differential spread
of activation. Still, experiments using various earlier versions of
the network suggested that if the learning rate and noise are appropriately
decreased, and parameters such as
are systematically adjusted,
the expected biases may return.
- Activation parameters
tau (input rate) - Decreasing tau increased the network's
overall reaction times. It also increased the contribution of
activations from the affective-semantic loop, thus increasing
information processing biases.
beta (effect of affective nodes on semantic nodes) -
Changing beta did not affect the network's performance qualitatively.
Maximum and minimum network activations - The network's
valence accumulation mechanism depended on a representation of
neutrality as a value equal in magnitude but opposite in direction
to the network's representation of positivity or negativity. Thus, as
long as neither the minimum nor the maximum activation was set to zero,
these parameters were not important.
- Task parameters
Temporal threshold for ``no'' decisions - Changing the
temporal threshold affected the network's error rate and its
information processing biases. When the threshold for ``no'' decisions
was brought into the range of other responses, the network made more
incorrect rejections, thus decreasing its sensitivity. Most
incorrect rejections were necessarily the items to which the network
would have taken longest to respond (i.e., neutral words and
nondepressotypic negative words), and thus the network did not show the
expected information processing biases in this case. When
the threshold was too high, the network made many false alarms,
decreasing its sensitivity, though this condition did not affect
observed information processing biases.
Positive accumulation determination threshold - Decreasing the
positive threshold decreased the time it took the network to
respond. Decreases in this threshold also increased the network's
sensitivity to noise when noise was large, since small variations in
noise could affect the network's performance more.
negative accumulation determination threshold - The evidence
accumulators generally achieved an asymptote in activation significantly
before a threshold value was achieved. Thus, if the negative threshold
was too high, the network would never make ``no'' decisions. If the
negative threshold was too low, all decisions would be ``no''. If the
threshold was near the asymptote, its particular value did not appear to
influence the network's information processing biases.
- Learning parameters
and
(learning parameters) - Increasing the learning rate often
had the effect of allowing the network to learn the affective valence
of words well, while making more errors on the lexical decision task. If
the learning rate was increased sufficiently, information
processing biases on the lexical decision task were obscured.
Activations in one epoch - The network did not appear
sensitive to the number of activations in each epoch, as long as there
were enough activations to allow the network's representation of the
stimulus to be sufficiently greater than the input noise. When this
was not the case, the network had difficulty learning.
- Training set
Number of stimuli - As long as all possible stimuli were
represented in the network's lexicon for the lexical decision task,
the network's performance did not appreciably change with the number
of stimuli.
Number of negative stimuli representing depressogenic loss -
The network was sensitive to the ratio of the number of depressotypic
stimuli to the number of stimuli in the network's lexicon. When this
ratio became high, the network began to overgeneralize to
negativity. When this ratio was too low, the network was not biased
in processing other stimuli. If a large proportion of the network's lexicon is
trained on during the depression induction period, the network's
overall biases to nondepressotypic stimuli are dwarfed by its biases
towards depressotypic stimuli, as would be expected to occur for
humans.
Training exemplars - Variations in the training set considerably
disrupted the network's performance. If the semantic features were not
perfectly balanced within each valence, a given valence or semantic
determination would be unduly favored.
- Other aspects of the network
Method of evidence accumulation - When affective evidence
accumulators were calculated as the cosine of the network's response
with the desired output, the valence identification effects were not
present, potentially suggesting that determinations of the negativity
of a stimulus do not rest on the stimulus's lack of positive content,
and vice versa.
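The threshold behavior described under Task parameters above can be
illustrated with a minimal sketch. It is not the network's actual
accumulation mechanism: it assumes a simple leaky accumulator whose
activation rises toward an asymptote, and the names first_crossing,
drive, rate and max_steps are hypothetical. It shows only that a
threshold above the asymptote is never reached, while lower thresholds
are reached sooner.

```python
def first_crossing(threshold, drive=1.0, rate=0.1, max_steps=200):
    """Return the first step at which a leaky accumulator reaches
    `threshold`, or None if it never does within `max_steps`.

    Assumption: the accumulator is a leaky integrator whose activation
    approaches `drive` asymptotically. This stands in for the network's
    evidence accumulators and is not taken from the original model.
    """
    activation = 0.0
    for step in range(1, max_steps + 1):
        # Each step closes a fixed fraction of the gap to the asymptote.
        activation += rate * (drive - activation)
        if activation >= threshold:
            return step
    return None


if __name__ == "__main__":
    # Thresholds below, near, and above the asymptote (drive = 1.0).
    for threshold in (0.3, 0.5, 0.9, 1.2):
        print(threshold, first_crossing(threshold))
```

In this sketch a threshold of 1.2 lies above the asymptote and is never
crossed, which corresponds to the case in which no decision of that
type is ever made.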
The network's performance for the 80 epoch and
500 epoch conditions was qualitatively similar for the valence
identification task under a separate replication using 50 simulated
subjects. For the lexical decision task, the network was very slightly
slower at positive, negative, and neutral words, rather than just
negative words, in the depressed condition. Yet, when average simulated
reaction times more than 2 standard deviations above the mean of a
condition within each group were removed, results were qualitatively
similar to the described simulations.
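The outlier removal described above can be sketched as follows. This
is a minimal illustration, not the procedure actually used: the name
trim_slow_responses and the example reaction times are hypothetical,
and the use of the sample standard deviation is an assumption, since
the text does not say which estimator was applied.

```python
import statistics


def trim_slow_responses(reaction_times, n_sd=2.0):
    """Drop reaction times more than `n_sd` standard deviations above
    the mean of one condition within one group.

    Assumption: the sample standard deviation is used; the original
    text does not specify the estimator.
    """
    if len(reaction_times) < 2:
        # A standard deviation is undefined for fewer than two values.
        return list(reaction_times)
    mean = statistics.mean(reaction_times)
    sd = statistics.stdev(reaction_times)
    cutoff = mean + n_sd * sd
    return [rt for rt in reaction_times if rt <= cutoff]


if __name__ == "__main__":
    # Hypothetical reaction times for one condition in one group.
    rts = [480, 490, 500, 510, 520, 495, 505, 515, 485, 2000]
    print(trim_slow_responses(rts))
```

The function would be applied separately to each condition within each
group, so that a slow condition is not trimmed against the mean of a
fast one.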
Greg Siegle
1999-11-15