Activation in the network is assumed to feed forward from the input nodes through a set of ``generalization'' or hidden nodes to activate nodes representing a set of semantic features. The use of generalization nodes is a common technique for representing generalization processes assumed to occur between the orthographic and semantic representations of a stimulus (e.g., Seidenberg, 1993). Functionally, these nodes allow the network to represent key similarities between features not having to do with affect or semantic content, but relevant to the determination of these quantities. The semantic units are the neuron-like elements assumed to store the semantic qualities of words.
Tucker and Derryberry's (1992) notion of feedback between brain structures responsible for making affective and semantic determinations is captured in the network by allowing activation to spread from the semantic features to two neuron-like elements representing the affective features positivity and negativity, which in turn feed back to the semantic units. Minimal semantic recognition is allowed before affective recognition is engaged (i.e., activation passes first to the semantic nodes, and then to the affective identification nodes).
The degree to which affective determinations are allowed to affect semantic determinations may be adjusted by making the activation in the semantic units a logistic function of both their own value from the previous time period and the value suggested by the valence units. An arbitrary parameter governs how much activation comes from the semantic units and how much from the valence units in the network. In initial simulations this parameter was left out, but it was determined that the network did not converge during training when the activation of the semantic layer was entirely due to the activation from the valence layer. Most values less than unity allowed the network to converge; for the simulations described below the parameter was arbitrarily set to 0.5. The inclusion of such a parameter appears reasonable, as a stimulus presented to one's eyes would be expected to have more effect, if only slightly, on the determination of the semantic content of that stimulus than one's notion of what the stimulus is based solely on its affective valence.
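The blending of semantic and valence activation described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used for the simulations: the name `valence_weight`, the convex-combination form of the blend, and the example values are hypothetical. The text states only that the update is logistic, that a single parameter sets the balance, and that it was set to 0.5.

```python
import math


def logistic(x):
    """Standard logistic squashing function."""
    return 1.0 / (1.0 + math.exp(-x))


def update_semantic(prev_semantic, valence_feedback, valence_weight=0.5):
    """Blend each semantic unit's previous value with the valence feedback.

    valence_weight is a hypothetical name for the balance parameter:
    1.0 means the semantic layer is driven entirely by the valence units
    (the case reported not to converge); smaller values give more weight
    to the unit's own previous value.
    """
    own_weight = 1.0 - valence_weight  # assumed convex combination
    return [
        logistic(own_weight * s + valence_weight * v)
        for s, v in zip(prev_semantic, valence_feedback)
    ]


# One update step for two semantic units, using the 0.5 setting from the text.
print(update_semantic([2.0, -1.0], [0.0, 1.0]))
```

With `valence_weight=1.0` the previous semantic value drops out of the update entirely, which corresponds to the configuration the text reports as failing to converge.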
Importantly, considerations in representing affective information in the network abound and, in fact, lead to different predictions regarding the role of affect in depression. For example, representing affect with a single node, where a value of 0 for that node is positive and 1 is negative, implicitly assumes that negativity is the opposite of positivity. Various studies (e.g., Ingram, 1984) show that negative and positive mood are probably not exactly orthogonal and, moreover, that they may co-exist. Similarly, various words, e.g., ``family'', seem to have positive and negative connotations at the same time. We might thus be tempted to use two nodes to represent affect, one for positivity and one for negativity. Neutrality could then be coded as the absence of positivity or negativity. Another method for representing affect would be to have three nodes, one for positivity, one for negativity, and one for neutrality. Yet the explicit identification of a stimulus as neutral does not appear to be a function of the amygdala, and thus such a solution may not be appropriate. More complex representational schemes such as those proposed by Tryon (1994) give even greater representational power, e.g., for representing nuances of affect other than positivity and negativity. In the current network, this added representational power is exchanged for the relative simplicity of the two-node solution.
In the current network, affective valence is represented as two nodes. When a stimulus is negative, one of the nodes is positively activated. When a stimulus is positive, the other node is activated. When a stimulus is neutral, neither node is activated. Thus, combinations of positivity and negativity may be represented as some activation on both the positive and negative nodes. An important question now arises as to the proper magnitude of the positivity or negativity of stimuli. If the valence 2-vectors representing positivity and negativity are given the same magnitude, a representation is set up in which positivity and negativity are equally far from neutrality in the vector space representing the three valences, and they are farther from each other than they are from neutrality. Were this the case for humans, one might expect the confusability matrices for the valence identification task presented earlier to be symmetric. Since they were not, one solution might be to adjust the magnitudes of the valence vectors such that positivity is closer to neutrality in valence-space than negativity. Yet this technique would suggest that the observed confusability represents an innate representational system. As it is not clear that such an innate representation exists, and as the observed confusabilities were very low, such a technique was not used, and positivity and negativity were represented as having equal magnitudes. Still, not addressing this issue is a weakness of the model.
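The geometry of the equal-magnitude coding can be checked with a short calculation. The sketch below assumes unit magnitude for both valence vectors and the node order (positivity, negativity); the text specifies only that the two magnitudes are equal.

```python
import math

# Assumed coding: (positivity node, negativity node), unit magnitude.
POSITIVE = (1.0, 0.0)
NEGATIVE = (0.0, 1.0)
NEUTRAL = (0.0, 0.0)


def distance(a, b):
    """Euclidean distance between two valence vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(distance(POSITIVE, NEUTRAL))   # 1.0
print(distance(NEGATIVE, NEUTRAL))   # 1.0
print(distance(POSITIVE, NEGATIVE))  # 1.4142135623730951
```

Positivity and negativity are each one unit from neutrality and the square root of two units from each other, which is the symmetric arrangement the paragraph describes. Shrinking only the positive vector would move positivity closer to neutrality, the adjustment the text considers and declines to make.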
Representational considerations also abound in deciding what the network's measured ``outputs'' should be. If the network is to be designed to perform some task explicitly, the network's outputs can be constructed to represent the output from the task, e.g., the color of the word which was given to the network as a stimulus. This procedure has the advantage of allowing me to train the network by giving it analogs of the stimuli a human might see with outputs a human might give. Yet a popular criticism of tasks designed to test information processing models is that they allow inspection of only the results of the task at hand, and not the underlying psychological constructs. Given that the network is being designed to investigate the role of affect in the lexical decision task, it seems prudent to allow affect to be represented in the network's outputs. That way, the network's performance on the task, as well as information regarding its evaluation of the stimulus, would be apparent. For example, we could either have output nodes for ``Yes it is a word'' and ``No it isn't a word'', or outputs which represent the network's possible semantic and affective evaluations of words as well as somehow determining its decision of whether the stimulus is a word. This latter method will also allow explicit modeling of the valence identification task, by looking at the nodes representing the affective valence of the stimulus, and of a recall/recognition task, by looking at the entire output as a reminding of a word. Because groups of nodes in the network are loosely functionally modeled after brain structures, a parsimonious method for examining the network's outputs is to examine activations in the groups of nodes responsible for representing affective and semantic information. This technique will also allow explicit physiological investigation of activation in brain structures in response to information processing tasks when that technology becomes available.
These considerations give rise to a vision of the network presented in Figure 10, p. 98. It consists of a set of 18 orthographic units representing the initial perception of an incoming stimulus. The number 18 does not represent any biological or cognitive quantity, but is large enough to allow a great variety (2^18) of nonorthogonally binary-coded words to be represented. These orthographic units are connected to 12 internal generalization units which provide for generalization from the input to more abstract, internal representations. The generalization units are then connected to 18 semantic outputs which activate 2 affective units which, in turn, reactivate the semantic units. Again, the particular numbers of units devoted to the generalization and semantic layers have no clear analog to a biological system, and are intended only as representational conveniences for the current simulations. The particular stimuli which were used were randomly generated and are included in Appendix E.
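The layer sizes and connectivity just described can be laid out in a few lines. The following is a minimal sketch, not the simulation code: the weights are random and untrained, bias terms are omitted, and the weight range, seed, and helper names are all assumptions introduced for illustration.

```python
import math
import random

# Layer sizes taken from the text.
N_ORTHO, N_HIDDEN, N_SEMANTIC, N_VALENCE = 18, 12, 18, 2


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_weights(n_out, n_in, rng):
    """Untrained weights in an arbitrary small range (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def layer(weights, inputs):
    """Logistic units driven by a weighted sum of their inputs."""
    return [logistic(sum(w * x for w, x in zip(row, inputs))) for row in weights]


rng = random.Random(0)
w_hidden = random_weights(N_HIDDEN, N_ORTHO, rng)        # orthographic -> generalization
w_semantic = random_weights(N_SEMANTIC, N_HIDDEN, rng)   # generalization -> semantic
w_valence = random_weights(N_VALENCE, N_SEMANTIC, rng)   # semantic -> affective
w_feedback = random_weights(N_SEMANTIC, N_VALENCE, rng)  # affective -> semantic

stimulus = [rng.randint(0, 1) for _ in range(N_ORTHO)]   # one random binary-coded "word"
hidden = layer(w_hidden, stimulus)
semantic = layer(w_semantic, hidden)
valence = layer(w_valence, semantic)
feedback = layer(w_feedback, valence)  # value the valence units suggest for the semantic units

print(len(hidden), len(semantic), len(valence), len(feedback))  # 12 18 2 18
print(2 ** N_ORTHO)  # 262144
```

The last line confirms the size of the input space: 18 binary orthographic units admit 2^18 = 262144 distinct codes.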
A final representational decision in the construction of the network involves what is to be measured about the network's performance. Because the tasks to be modeled are reaction time tasks, an analog of attention-dependent reaction time devised by Cohen, Dunbar, and McClelland (1990) to model the Stroop task is used. Continuous processing in Cohen et al.'s (1990) model of the Stroop task is represented by making the activation of each node in the network a logistic function of its input over time. Thus, the longer a stimulus is attended to, the more the input nodes will be activated. Responses are generated in the following manner. Following Ratcliff's (1978) conjecture that retrieval from memory occurs as a diffusion process, ``evidence accumulators'' are kept for each possible output of the network (i.e., for each word in the network's lexicon, and for each valence: positive, negative, and neutral). As the network ``observes'' the input, outputs are continuously generated via spreading activation. After each such cycle, an amount proportional to the network's activation of the corresponding output node is added to each accumulator, subject to Gaussian noise. When an accumulator achieves some threshold, arbitrarily set to 1.0, the network is said to have generated that response.
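The accumulator mechanism can be sketched as a simple race. This is an illustrative simplification under stated assumptions: the output activations are held fixed across cycles (in the network they change as activation spreads), and the names `rate`, `noise_sd`, and `max_cycles` and their default values are hypothetical. Only the threshold of 1.0 comes from the text.

```python
import random


def first_response(activations, rate=0.1, noise_sd=0.05, threshold=1.0,
                   max_cycles=1000, rng=None):
    """Race evidence accumulators until one reaches the threshold.

    activations maps each candidate response to the activation of its
    output node. Returns (response, cycle), or (None, max_cycles) if no
    accumulator reaches the threshold in time.
    """
    rng = rng or random.Random()
    accumulators = {name: 0.0 for name in activations}
    for cycle in range(1, max_cycles + 1):
        for name, activation in activations.items():
            # Evidence proportional to activation, plus Gaussian noise.
            accumulators[name] += rate * activation + rng.gauss(0.0, noise_sd)
        winner = max(accumulators, key=accumulators.get)
        if accumulators[winner] >= threshold:
            return winner, cycle
    return None, max_cycles


# Noise-free example: the more active output wins after four cycles.
print(first_response({"word": 1.0, "other": 0.5}, rate=0.25, noise_sd=0.0))
# ('word', 4)
```

The cycle count at which an accumulator crosses the threshold plays the role of the simulated reaction time.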
An affective determination is made when an evidence accumulator corresponding to an affective valence reaches a threshold. Positive and negative accumulators are activated proportional to the activation of the affective nodes. A neutral accumulator is the negative average of the positive and negative activations. It may be noted that by changing the ratio of positive to negative activation contributing to the activation of neutrality, the network's confusability may be changed to model human confusability matrices on the valence identification task, though this step was not taken in the present simulation.
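The three valence evidence terms can be written out directly. In this sketch the parameter `positive_share` is a hypothetical name for the adjustable ratio mentioned above; the default of 0.5 reproduces the plain negative average used in the present simulation, and the example activations are invented for illustration.

```python
def valence_evidence(positive, negative, positive_share=0.5):
    """Per-cycle evidence for the positive, negative, and neutral accumulators.

    positive_share (hypothetical) sets how much of the neutral term comes
    from the positive node; 0.5 gives the negative average of the two
    activations.
    """
    negative_share = 1.0 - positive_share
    neutral = -(positive_share * positive + negative_share * negative)
    return {"positive": positive, "negative": negative, "neutral": neutral}


print(valence_evidence(0.75, 0.25))
# {'positive': 0.75, 'negative': 0.25, 'neutral': -0.5}
```

Moving `positive_share` away from 0.5 makes neutrality compete more strongly with one valence than the other, which is the kind of asymmetry the text suggests could be used to match human confusability.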
A semantic determination is said to occur when an evidence accumulator corresponding to a word reaches a threshold, arbitrarily set to 1. Evidence accumulators are incremented at each epoch proportional to the cosine of the semantic vector with the desired output, subject to Gaussian noise. An interesting question now arises as to how the network can know when a stimulus, e.g., presented during the lexical decision task, is not a word. As shown in Table 2, p. 63, human ``No'' responses were, on average, a full standard deviation longer than ``Yes'' responses on the lexical decision task. To reflect the apparent process of deciding that a stimulus is not a word only after determining that no word in a person's lexicon matches it, ``No'' responses are modeled via the inclusion of a temporal threshold, such that if no element of the semantic vector is activated beyond some value when the temporal threshold is reached, the network is said to decide that the stimulus is not a word. As people did not always respond to nonwords in exactly the same amount of time, noise is added to this threshold such that it varies between tasks.
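The ``Yes''/``No'' decision rule can be sketched as follows. This is a toy illustration under stated assumptions: the three-unit lexicon, the names `deadline`, `floor`, and `rate`, and their values are hypothetical, and the semantic vector is supplied as a precomputed sequence rather than produced by a network. A noisy temporal threshold can be obtained by having the caller draw `deadline` from a distribution on each run.

```python
import math
import random


def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_decision(semantic_trace, lexicon, deadline, floor=0.5,
                     rate=0.25, threshold=1.0, noise_sd=0.0, rng=None):
    """Return (answer, word, epoch) for a sequence of semantic vectors."""
    rng = rng or random.Random()
    accumulators = {word: 0.0 for word in lexicon}
    for epoch, semantic in enumerate(semantic_trace, start=1):
        for word, target in lexicon.items():
            # Evidence proportional to the match with the stored word.
            accumulators[word] += (rate * cosine(semantic, target)
                                   + rng.gauss(0.0, noise_sd))
        best = max(accumulators, key=accumulators.get)
        if accumulators[best] >= threshold:
            return "yes", best, epoch
        # Temporal threshold: answer "no" if nothing is strongly active.
        if epoch >= deadline and max(semantic) <= floor:
            return "no", None, epoch
    return None, None, len(semantic_trace)


lexicon = {"cat": [1, 0, 1], "dog": [0, 1, 1]}  # toy semantic codes
print(lexical_decision([[1.0, 0.0, 1.0]] * 10, lexicon, deadline=8))
# ('yes', 'cat', 4)
print(lexical_decision([[0.1, 0.1, 0.1]] * 10, lexicon, deadline=3))
# ('no', None, 3)
```

Because the ``No'' branch can fire only once the deadline has passed, ``No'' responses in this sketch are slower than ``Yes'' responses whenever the deadline is set later than the typical word-recognition time, in the spirit of the human data cited above.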
Variations in Stimulus Duration (SD) are simulated by activating the orthographic units for a variable number of epochs. The evidence accumulators then continue to collect evidence using only activations provided by the affective-semantic recurrent loop until a lexical or affective determination is made, simulating association in memory in the absence of a stimulus.
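The stimulus-duration manipulation amounts to a schedule of orthographic input over epochs. The sketch below assumes that the absence of a stimulus is coded as all-zero input, which the text does not specify; the function name and arguments are hypothetical.

```python
def input_schedule(stimulus, duration, total_epochs):
    """Orthographic input for each epoch: the stimulus while it is
    presented, then all zeros (assumed coding for 'no stimulus')."""
    blank = [0.0] * len(stimulus)
    return [list(stimulus) if epoch < duration else list(blank)
            for epoch in range(total_epochs)]


# A three-unit toy stimulus shown for two of four epochs.
for epoch_input in input_schedule([1, 0, 1], duration=2, total_epochs=4):
    print(epoch_input)
```

After the stimulus is withdrawn, any further evidence must come from activation already circulating between the semantic and affective units.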
The affective lexical decision task and affective valence identification task may be simulated rather easily using the preceding framework, in the following manner. A stimulus and duration are chosen. The stimulus is presented to the network for the duration. For the affective valence identification task, the network engages in the affective-semantic loop until an affective determination is made. For the affective lexical decision task, the network engages in the affective-semantic loop until a semantic determination is made.