FootNotes:

Numbered notes were present in the submitted manuscript.
Starred notes were not present in the submitted manuscript.

1. D is standardized as an effect size (d) by dividing the difference in means by the pooled standard deviation, to control for differences in variation in the studies. Average d variables are computed using the inverse of the sampling variance as weights (Hedges & Olkin, 1985).
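
The computation described in this note can be sketched as follows. This is a minimal illustration, not the analysis code used for the review: the function names and the two example studies are hypothetical, and the sampling variance uses the common large-sample approximation for a standardized mean difference.

```python
import math

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Difference in means divided by the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

def sampling_variance(d, n1, n2):
    """Large-sample approximation to the sampling variance of d (assumed form)."""
    return (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

def weighted_mean_d(ds, variances):
    """Average effect size, weighting each d by the inverse of its sampling variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * d for w, d in zip(weights, ds)) / sum(weights)

# Hypothetical studies: (mean1, mean2, sd1, sd2, n1, n2)
studies = [
    (10.0, 8.0, 2.0, 2.0, 20, 20),
    (5.0, 4.5, 1.0, 1.0, 50, 50),
]
ds = [cohens_d(*s) for s in studies]
vs = [sampling_variance(d, s[4], s[5]) for d, s in zip(ds, studies)]
mean_d = weighted_mean_d(ds, vs)
print(ds, mean_d)
```

Because the larger study has the smaller sampling variance, it receives more weight, so the average lies closer to that study's d than a simple mean would.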

2. 1500ms Stimulus Onset Asynchrony condition only

3. Specifically, valence ratings in Williams et al.'s (1998) experiment had the following means for positivity and negativity when rated on a scale of 1 (not emotional) to 5 (very emotional): positive=(3.62, 1.13), negative=(1.49, 3.45), neutral=(2.36, 1.28).

4. The network was designed in concert with collection of data for Siegle's (1996) thesis. Thus, reaction time estimates for nonpersonally relevant negative, positive, and neutral information were, to some extent, informed by observed data, and should not be thought of entirely as predictions. Other network behaviors are more strictly predictions.

5. This finding is not preserved using a Hebb learning rule. Since the Hebb rule does not seek to minimize errors in decision making, previous training does not affect the network's response to current stimuli. Rather, biases would be expected to be exaggerated in ruminative copers based on a Hebb-training model.
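
The contrast between an error-minimizing rule and a Hebb rule can be illustrated with a single linear unit. This is a generic sketch under simplifying assumptions (one weight, one input, a hypothetical learning rate), not the network used in this dissertation: it shows only that an error-driven update stops changing the weight once the output matches the target, whereas a Hebb update has no error term and keeps strengthening the connection.

```python
def delta_step(w, x, target, lr=0.5):
    """Error-driven (delta) rule: weight change is proportional to the output error."""
    y = w * x
    return w + lr * (target - y) * x

def hebb_step(w, x, lr=0.5):
    """Plain Hebb rule: weight change is proportional to co-activation; no error term."""
    y = w * x
    return w + lr * x * y

def train(step, w, n_epochs, **kwargs):
    """Apply the same update rule repeatedly to a single weight."""
    for _ in range(n_epochs):
        w = step(w, **kwargs)
    return w

# Hypothetical starting weight and stimulus
w_delta = train(delta_step, 0.2, 20, x=1.0, target=1.0)
w_hebb = train(hebb_step, 0.2, 20, x=1.0)
print(w_delta, w_hebb)
```

With continued training, the delta-rule weight settles near the value that produces the target output, while the Hebb weight grows without bound, which is one way to picture exaggerated rather than corrected biases.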

6. With no forgetting function, the same pattern emerges for late simulated dilations, but there is no decrease in simulated early dilations.

7. One of three participants tested by a research assistant rather than the primary investigator was rumored to have been tested on the same day as his words were collected, but this report was not verified.

8. The warned reaction time task and gaze task were added after the protocol had begun. The 36 participants who took these tasks were the last 36 participants run on the protocol. These tasks were performed at the end of the testing session and thus should not have confounded results obtained during the rest of the testing session.

9. This differential elimination did not generally affect interpretation of subsequent analyses a great deal. The effects of the differential elimination are explored in detail in exploratory analyses contained in Siegle (1999b).

10. D was used rather than the more traditional d' because most people made relatively few errors. When an individual makes no errors, calculating d' involves finite approximations for infinite Z scores. As d' (calculated as z(sensitivity) - z(1-specificity)) is effectively a normalized version of D, D should provide the same information that d' would.
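
The difficulty with d' at perfect performance can be sketched as follows. The sketch assumes the standard signal-detection definition, d' = z(hit rate) - z(false-alarm rate), where the hit rate is sensitivity and the false-alarm rate is 1 - specificity; the function names are hypothetical, and D itself is not computed because its formula is not given in this note.

```python
from statistics import NormalDist, StatisticsError

def d_prime(hit_rate, false_alarm_rate):
    """Standard d': difference between z-transformed hit and false-alarm rates."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

def d_prime_or_none(hit_rate, false_alarm_rate):
    """Return d', or None when a rate of 0 or 1 makes a z score infinite."""
    try:
        return d_prime(hit_rate, false_alarm_rate)
    except StatisticsError:
        return None

typical = d_prime_or_none(0.9, 0.1)   # some errors: d' is finite
perfect = d_prime_or_none(1.0, 0.0)   # no errors: z scores are undefined
print(typical, perfect)
```

When a participant makes no errors, the z transform has no finite value, so d' can only be obtained by substituting an approximate rate slightly below 1 or above 0.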

11. .74 without assuming non-zero population error rates

12. By analyzing each factor separately, the assumption is made that factors from the PCA represent qualitatively different independent processes, i.e., rumination involves different processes, and potentially different brain areas, than cognitive or motor processes. This analysis does not preclude the possibility that these different processes arise as a function of a single distributed system, such as the neural network model. Rather, just as different nodes in the neural network model were observed to become active at different but overlapping times, different brain processes are assumed to be active at different, possibly overlapping times.

13. These results are pursued in exploratory analyses included on the web site accompanying this dissertation in Siegle (1999b).

14. Follow-up analysis of this claim is contained in the associated web site.

15. Again, this conclusion is not dependent on empirical results showing differences in depressed people's late pupil dilation in response to personally relevant and nonrelevant information. Such late processing is assumed to occur after any information has been associated with personally relevant information, and represents individuals thinking about personally relevant information independent of the content of the original stimulus.

16. Due to their smaller network, Siegle and Ingram (1997a) used 70 epochs of overtraining.

* Many sources (e.g., Coyne, 1994; Coyne & Gotlib, 1983; Gotlib, 1984) have questioned whether results obtained using dysphoric college students may be generalized to clinically depressed individuals. They suggest that dysphoric college students experience many stressors that are not representative of most depressed individuals, such as constant evaluation by superiors, academic problems, changing social relationships, the adjustment to independent living, recent separation from parents, and the transition to adulthood (e.g., Kashani & Priesmeyer, 1983). More generally, Coyne (1994) cautions that analog populations often do not capture the whole picture of a disorder, and thus, arguments based on a sample do not necessarily generalize to the disorder. As an example, Coyne states that poverty often causes distress, but rarely depression. As such, arguments made regarding the relationship of income levels to distress in an analog population may not generalize to the greater depressed population. In addition, the neuroanatomy of college students (specifically development of connections between the cortex and limbic structures which are thought to be important to depression) is undergoing fundamental changes irrespective of their newfound educational status (Benes, 1989). Thus, to be sure that Siegle et al.'s (1998) results hold not only for dysphoric, but for depressed individuals it will be important to examine results of the tasks in a population of clinically depressed and nondepressed individuals who are not in college. This strategy is adopted in the experiment described in the following sections.

* Scaling was done by scaling positivity values to 3.62...

* The second session was, in all but three cases, conducted by the author. In the remaining cases, the second session was conducted by a research assistant who had received extensive training.

* That is, because the standard deviation around neutral words was higher than for other valences, depressed individuals did not display a significantly larger discrepancy in response times to negative and neutral words than did non-depressed individuals.

* These results use the person-100cut-harmmean-rescaled data set, available from the author.

* Similarly, the larger biases towards positivity and neutrality in nondepressed individuals are consistent with the network’s predictions.

* Planned contrasts assuming the factors represent a continuous process were also performed. Rather than representing qualitatively different processes, the extracted pupil dilation components may be thought of as indexing a continuous phenomenon, each component occurring at approximately the same temporal offset from the previous component. To test this hypothesis, contrasts were examined from ANOVAs with valence (positive, neutral, negative) and factors believed to represent aspects of attention and information processing after stimulus onset (Factor 1, 2, 3) on factor loadings. Tests of the linear trend in factor revealed that pupil dilations increased over time for depressed individuals, F(1,22)=4.66, p=.042, η²=.175, but decreased for nondepressed individuals, F(1,24)=6.62, p=.017, η²=.216, on the valence identification task. The difference in the linear trend between depressed and nondepressed individuals was significant, based on a contrast from the same MANOVA in which group was included as a between subjects variable, F(1,46)=11.04, p=.002, η²=.192. To examine the idea that depressed individuals would have high early and late dilations to personally relevant negative words, in contrast to their generally low early dilations, Findings were less strong, but similar for the lexical decision task data for trials in which nonmatching valence ratings were excluded. For depressed individuals, the linear trend showed a slight, but not significant increase, F(1,21)=2.09, p=.163, η²=.09. Because the first cognitive factor (Factor 2) was higher than the late ruminative factor (Factor 1), a quadratic trend was more strongly represented, F(1,21)=4.5, p=.043, η²=.178. A slight decrease in dilation over time, of relatively similar magnitude, was observed in the nondepressed group, F(1,24)=2.58, p=.121, η²=.097. As with the valence-identification task, the difference in linear trends between depressed and nondepressed individuals was significant, F(1,45)=4.5, p=.019, η²=.092.
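
The effect sizes reported with these contrasts can be related to their F ratios with a short calculation. The sketch below assumes the reported effect sizes are partial eta squared, computed as (F x df_effect) / (F x df_effect + df_error); that reading is an assumption rather than something stated in this note, and the function name is hypothetical.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)

# F ratios and degrees of freedom taken from the contrasts reported in this note
depressed_linear = partial_eta_squared(4.66, 1, 22)
nondepressed_linear = partial_eta_squared(6.62, 1, 24)
nondepressed_lexical = partial_eta_squared(2.58, 1, 24)

print(round(depressed_linear, 3))
print(round(nondepressed_linear, 3))
print(round(nondepressed_lexical, 3))
```

Under this assumption, the three values computed here round to .175, .216, and .097, matching the effect sizes reported for those contrasts.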

* Specifically, the possibilities, based on the physiological model, were that a) they could think about nonemotional aspects of negative things, b) they could think about emotional aspects of negative things, and c) thinking about emotional and nonemotional aspects of negative things could be thought of as interacting.

* Exploratory analyses reported on the associated web site suggest that depressed individuals’ attention to personally relevant negative stimuli was sustained, while nondepressed individuals did not sustain attention to these stimuli. This result occurred during both tasks. Surprisingly, nondepressed people showed greater differentiation in responses to stimuli, displaying greater cognitive activity during the early stages of attention, and paying greater late attention to positive information.

* Indeed, analysis of the rating data revealed that depressed individuals reliably suggest that negative words are negative, whereas they are not as likely to categorize positive or neutral words consistently. Similarly, analysis of response biases also suggested that depressed individuals appear biased to label all types of stimuli as negative. These findings are consistent with the general results from the neural network model in which all stimuli tend to be rated as more negative, and most often labeled as negative by the overtrained network. Moreover, depressed individuals seemed particularly prone to rate words they had generated to be positive and personally relevant as negative or neutral. This finding further suggests that depressed individuals have a difficult time seeing positivity, even when stimuli are relevant to them. Results analyzing reaction times on the valence identification task were similarly consistent with predictions derived from the neural network model. As predicted, depressed individuals are indeed slow to say that positive words are positive, and are quick to say that negative words are negative. This finding suggests that depressed individuals have an easier time processing negative than positive information.

* The network had learned to associate a particular stimulus with a negative valence more strongly than it had learned any other association. Thus, connections in the network to its representation of the negative valence, and to that stimulus, were stronger than other connections. When feedback occurred within the network, these bits of information were thus likely to become activated no matter what the original stimulus was. The network’s initial responses were thus most related to the stimulus when it was the personally relevant negative stimulus on which the network was overtrained.

* An argument against this explanation is that various tasks that do not nominally assess semantic processing, such as the Stroop task, in which individuals are asked to name the color in which words are presented, often reveal effects of interference from the semantic content of stimuli (e.g., Williams et al., 1996). As such, even if the task could be done without conscious semantic processing, it is likely that individuals are interpreting the semantic meaning of stimuli. Still, they may not do so until after their reaction time.

* For example, the analog of a healthy individual was created by training the model equally on positive, negative, and neutral exemplars based on the assumption that nondepressed individuals have relatively equivalent numbers of positive, negative, and neutral experiences, while depressed people have more negative thoughts or experiences. An alternate approach consistent with Schwartz and Garamoni’s (1989) States of Mind (SOM) model might suggest the analog of nondepressed individuals should involve overtraining on positive information (Siegle, 1996; Park, 1998) to represent their hypothesized greater numbers of positive than negative cognitions. An analog of depression would involve subsequently overtraining the model on negative exemplars enough to disrupt the ratio of positive to negative training examples. Such a network would be initially biased to respond to positive information. As it is overtrained on negative information, the network would begin to respond more evenly to positive and negative exemplars. The result would be a model in which the non-overtrained analog of "normal" functioning would respond differently to different valences, potentially paying particular attention to positive information. Depending on the level of overtraining, the overtrained network might respond more evenly to various valences on some tasks. This explanation is not wholly satisfying, however, given that depressed individuals were biased in responding to the valence identification task in the expected manner. Moreover, the initial training on non-orthogonal valences done with the current model served to generate connection weights similar to some overtraining on positive information, using an orthogonal valence representation. Extensive formal modeling would be necessary to establish whether this hypothesis could explain the obtained results.

* The following experiment with the simulated neural network shows that overtraining can be largely reversed by retraining. The associated figure shows the network’s response to a positive stimulus, along with connection weights between the semantic and valence layers before overtraining, and after overtraining. The network is then retrained on one positive, and one neutral exemplar for two and then for five epochs. The conventions for the subfigures on the left follow those described for Figure 5. As shown in the figure, with more retraining, the network’s valence activation, match accumulation, and simulated pupil dilation curves for the valence identification task look increasingly like they did before the overtraining. In the semantic nodes, it can be seen that the retrained network responds to the presented stimulus by activation of the two new stimuli on which it was overtrained. As shown in the Hinton diagrams on the right side of the figure, the retrained network still inhibits positive information more than it originally did, but activation from the new personally relevant positive and neutral patterns allows competition from valence nodes representing positivity.
      Helping depressed people to relearn positive associations is thus expected to lead people to think more positively, even when negative cognitions are not challenged. The trick will be to make positive cognitions "stick" for depressed people in the same way that negative cognitions do. The more a depressed person associates incoming information with learned negative exemplars, the less likely a positive exemplar is to be learned as such. Siegle (1996; Siegle & Ingram, 1997) has shown that the amount of feedback occurring between the affective and semantic representations of information in the brain governs how likely information is to be turned negative. Thus, it is suggested that ruminative response styles be targeted in therapy before positive retraining is engaged in. Rehearsal schedules and other traditional methods of behavioral reinforcement may also be of use in this respect.
      In terms of pharmacologic interventions, it was noted that the primary function of depressive overtraining was to increase inhibition of cognitions that are not personally relevant and negative. This analysis suggests that a pharmacologic agent that could block inhibition in the amygdala and hippocampal systems might be useful in the remediation of depression. Park (1998, unpublished) presents converging evidence suggesting that serotonergic pathways stemming from the median raphe may serve a primarily inhibitory function, and may thus be candidates for pharmacologic intervention. Additionally, because biases are hypothesized to occur as a result of inhibitory feedback between the hippocampus and amygdala systems, drugs targeting either of these structures could break the cycle. If later research shows that certain depressed individuals attend primarily to the affective or semantic aspects of information, drugs specifically targeting one or the other of these structures could be considered.