4. The Evidence for Psi: Experimental Studies

As noted in the last chapter, most parapsychologists have adopted the view that spontaneous cases cannot provide a “clean proof” of the existence of psi due to the various possible skeptical explanations of these cases, such as those invoking coincidence, delusion, unconscious inference and fraud. Therefore, they have turned to experimental approaches to establish the reality of psi effects. In the early days of parapsychology, such experiments typically involved attempts by human subjects to use their powers of extrasensory perception to discern the identity of a playing card hidden from their view or to use their psychokinetic abilities to influence the fall of mechanically thrown dice. When a subject is guessing a randomly selected card held in a separate room, the problems of sensory cues and unconscious inference are presumably removed. If contemporaneous records of the experiments are made, one need not rely on the fallible memory (or deceiving testimony) of the people involved, save of course for the experimenters themselves. Thus, the problems of memory distortion or fraudulent testimony on the part of informants in spontaneous cases are likewise eliminated when the experimental approach is adopted.

Perhaps the chief benefit of the experimental approach is its ability to deal with the objection that apparent cases of psi are simply due to coincidence. For instance, suppose an experiment is run in which a subject tries to guess in advance the outcomes of a series of tosses of an unbiased coin. If the subject guesses the outcomes of four tosses, his probability of getting them all right by chance may be easily computed. If a coin is tossed four times, one of the following events must occur:

HHHH	HHHT	HHTH	HTHH	THHH	HHTT	HTHT	THHT
HTTH	THTH	TTHH	HTTT	THTT	TTHT	TTTH	TTTT

(Here HHTH, for instance, denotes the event in which the coin comes up heads on the first, second and fourth tosses, while the third toss comes up tails.) As these events are all equally likely, the probability that the toss outcomes correspond exactly to the subject’s sequence of guesses is 1 in 16. In other words, the subject would be expected to obtain perfect success through sheer luck in about one out of every sixteen experiments. Similarly, if a subject guesses twenty-five flips of a coin and gets twenty of them right, standard statistical formulas may be applied to determine that the probability of guessing twenty or more flips correctly by chance is approximately .002. In other words, one would expect this level of success to occur in only two out of every thousand experiments by chance. As it would be unlikely that this level of success could be achieved through sheer luck (that is, in the absence of ESP), one would take the step of rejecting the “null hypothesis” (that the results can be ascribed to chance) and state that the results are significant at the .002 level, meaning that the probability of attaining such an extreme score by chance would be less than .002 in the absence of ESP. Similarly, if a person attempts to guess the order of the cards in a well-shuffled and hidden deck of ESP cards and guesses 13 of the cards correctly (as opposed to the five she would be expected to get right on the average by chance), the mathematical theory of probability can be invoked to show that this would happen in fewer than 1 in 10,000 such experiments by chance. We would conclude that it is very unlikely that we would have obtained such a result by chance unless we were to run thousands of such experiments.
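The coin-tossing figures above can be checked directly. The following is a minimal sketch, assuming a fair coin and independent tosses; the function name `tail_probability` is introduced here for illustration only.

```python
from itertools import product
from math import comb

# Enumerate every equally likely outcome of four tosses of a fair coin.
outcomes = ["".join(seq) for seq in product("HT", repeat=4)]
print(len(outcomes))        # number of possible sequences
print(1 / len(outcomes))    # chance that one fixed sequence of guesses matches


def tail_probability(n, k):
    """P(at least k correct guesses out of n) when each guess has a 1/2 chance."""
    favorable = sum(comb(n, j) for j in range(k, n + 1))
    return favorable / 2 ** n


# Probability of guessing 20 or more of 25 coin flips correctly by chance.
print(tail_probability(25, 20))
```

The enumeration yields the sixteen sequences listed above, and the tail probability for twenty or more hits in twenty-five flips comes out at roughly .002, in line with the figure quoted in the text.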

The philosopher Francis Bacon was perhaps the first on record to suggest that psi phenomena could be investigated through the statistical analysis of card-guessing and dice-throwing experiments (Bell, 1956). Charles Richet of France (1884, 1888) was the first to initiate anything approaching an actual research program in this area, using card-guessing as a technique for investigating ESP. In the early part of the twentieth century, experimental studies of ESP involving the guessing of cards were performed by Leonard Troland and George Estabrooks at Harvard University and J. E. Coover at Stanford University (Troland, 1917; Coover, 1917; Estabrooks, 1927). Estabrooks’ very successful experiment was conducted while completing his doctorate under William McDougall, a prominent psychologist who had an interest in psychical research.

In 1927, McDougall moved to Duke University to assume the chairmanship of the psychology department. He was followed soon thereafter by an enthusiastic young psychical researcher, J. B. Rhine, and his wife, Louisa. During the academic year 1929-1930, Rhine began his program of experimental research on psi phenomena. This program eventually evolved into the sustained and continuous research tradition that has become known as experimental parapsychology. For this reason, Rhine is usually regarded as the founder of the field of parapsychology (in the sense of the experimental study of psi phenomena). Rhine in fact was responsible for the adoption of the name “parapsychology” to describe his field of inquiry, although it should be noted that Max Dessoir (1889) in Germany was the first to use the term “parapsychologie” to describe the investigation of the “border” region between normal and abnormal psychological states. Rhine can, however, lay sole claim to coining the term “extrasensory perception,” or ESP, to describe the receptive form of psi (as opposed to psychokinesis, the alleged ability of mind to influence matter directly and without involvement of the motor apparatus of the body).

Rhine’s initial methods for investigating ESP relied heavily on the standard “ESP cards,” which were designed for Rhine by the Duke perceptual psychologist Karl Zener. (This deck was known for a long time as the “Zener deck,” somewhat to the consternation of Zener, who later abandoned parapsychological research for work in more mainstream and less controversial areas of psychology.) The ESP deck consists of 25 cards, with five cards representing each of the following five symbols: circle, star, cross, square and wavy lines. When a subject guesses the order of the cards in a well-shuffled ESP deck, he has a one-fifth chance of guessing any particular card correctly, and it can be shown mathematically that the average score he would expect to achieve by chance is five correct guesses.
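The chance expectation of five hits per deck can be illustrated by simulation. This is only a sketch: it assumes a subject who guesses each card at random and independently, and the names `make_deck`, `score_one_run` and `mean_chance_score` are hypothetical helpers, not part of any published procedure.

```python
import random

SYMBOLS = ["circle", "star", "cross", "square", "wavy lines"]


def make_deck():
    """Build a 25-card ESP deck: five cards for each of the five symbols."""
    return [symbol for symbol in SYMBOLS for _ in range(5)]


def score_one_run(rng):
    """Shuffle the deck, guess every card at random, and count the hits."""
    deck = make_deck()
    rng.shuffle(deck)
    guesses = [rng.choice(SYMBOLS) for _ in deck]
    return sum(guess == card for guess, card in zip(guesses, deck))


def mean_chance_score(runs=20000, seed=0):
    """Average number of hits per deck over many simulated chance runs."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return sum(score_one_run(rng) for _ in range(runs)) / runs


print(mean_chance_score())
```

Over many simulated decks the average settles close to five, which is simply 25 cards multiplied by the one-fifth chance of a hit on each card.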

In 1934, at the suggestion of a young gambler, Rhine began to investigate psychokinesis (PK), using dice as target objects. Initially, Rhine investigated the ability of human subjects to influence dice to roll in such a way that a given “target” face would come up. Later, other investigators had subjects attempt to influence the direction or speed of mechanically thrown dice so that they came to rest at specific target locations. Such tests became known as “placement tests.” Because of the controversy surrounding his ESP results, Rhine withheld publication of his PK research until 1943.

In the modern era, the targets of psychokinetic influence have expanded to include living organisms, red blood cells, thermistors, and quantum-mechanically based random event generators (REGs). REGs have also been used to generate ESP targets. In ESP research, there has been a move away from forced-choice experiments (in which the subject’s response on each trial is restricted to a finite set of specified alternatives, as in guessing a deck of ESP cards) and toward free-response experiments, in which a subject is free to describe his impression of the target in any manner he chooses.

A free-response methodology is employed in modern ganzfeld experiments, in which a subject is typically seated in a comfortable chair with ping-pong balls placed over his eyes to produce a uniform visual field. Frequently, white or “pink” noise is also played in the subject’s ears to produce a homogeneous form of auditory stimulation.

The subject may then try to describe a target picture that is being viewed by a human sender or agent, this target having been randomly selected from a target pool consisting of, say, four potential target pictures. The subject or an outside judge then ranks the pictures in the target pool against the subject’s descriptions. Obviously, given the random nature of the target selection, the probability that the subject’s description will be matched against the correct target by chance is one-fourth.

Other examples of free-response experiments include remote viewing studies, in which a subject attempts to describe the location to which a human sender has been sent, and dream studies, in which a subject’s dream reports are matched against, say, art prints viewed by a human sender attempting to influence the subject’s dream.

Forced-Choice Experiments

Perhaps the foremost forced-choice ESP experiment performed in the heyday of the card-guessing era of Rhine’s early research group at Duke University was the Pearce-Pratt series conducted on the Duke campus during the 1933-1934 academic year (Rhine & Pratt, 1954). In this experiment, the subject, a divinity student named Hubert Pearce, attempted to guess the identity of cards held in a separate building by J. G. Pratt, a graduate student in psychology. In each session, the men would synchronize their watches, and then Pearce would leave for a cubicle in the stacks of the library. Pratt then shuffled a deck of ESP cards and placed one card face down each minute on a book on a table in his building, which was either the Physics Building (100 yards distant from the library) or the Medical Building (250 yards distant). Pearce attempted to guess the identity of the card located on the book at the specified time. Two decks were guessed per session. In all, 1850 cards were guessed, and Pearce averaged 7.54 cards guessed correctly per deck, where 5 would be expected by chance. These results were significant at the p < 10⁻²² level, meaning that this level of success would occur by chance fewer than once in 10 sextillion such experiments. Clearly chance coincidence cannot account for these results, and they have been taken as strong evidence of ESP.
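A rough check on the size of this result can be made with a normal approximation to the binomial distribution. The sketch below is an illustration under stated assumptions, not the analysis used in the original report: it treats the 1850 guesses as independent trials with a one-fifth chance of success (ignoring the fixed composition of the deck), and it infers the total number of hits from the reported average of 7.54 per deck. The function name `z_and_p` is hypothetical.

```python
from math import erfc, sqrt


def z_and_p(hits, trials, p_chance):
    """Return the z-score and one-tailed p-value under a normal approximation."""
    expected = trials * p_chance
    sd = sqrt(trials * p_chance * (1 - p_chance))
    z = (hits - expected) / sd
    p_one_tailed = 0.5 * erfc(z / sqrt(2))  # upper-tail area of the normal curve
    return z, p_one_tailed


trials = 1850
hits = round(7.54 * trials / 25)  # total hits inferred from the per-deck average
z, p = z_and_p(hits, trials, 1 / 5)
print(hits, z, p)
```

Under these assumptions the score lies nearly eleven standard deviations above chance expectation, and the approximate tail probability falls well below the 10⁻²² bound quoted above.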

A more modern form of forced-choice experiment was pioneered by physicist Helmut Schmidt (1969) in his study of the precognition of radioactive decay, a quantum process that is in principle unpredictable under modern theories of physics. Schmidt’s study relied on a type of quantum-mechanically based REG that has since become known as a “Schmidt machine” and is now a widely used and basic tool in parapsychological research. With Schmidt’s original machine, the subject was confronted with an array of four differently colored light bulbs. The subject’s task was to guess which bulb on the display was going to be the next to light up. The subject signaled his or her guess by pushing a button in front of the chosen bulb. During this process, an electronic counter was constantly cycling through the values 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4… at the rate of a million steps a second. After the subject pressed a button indicating his or her guess, the counter stopped when a Geiger counter detected a decay electron emitted from a sample of strontium-90 and the corresponding lamp was lit. The subject’s task could thus be construed as one of predicting the time of future radioactive decay of a strontium-90 atom to within an accuracy of a millionth of a second. (However, more plausibly from a psychological and sensory-motor point of view, the subjects were simply foreseeing which lamp would be lit.) The subjects’ guesses and the lamps actually lit were automatically recorded on counters and punch tape, eliminating the possibility of directional errors by human recorders. Extensive randomness tests were run on the REG to ensure that its output was indeed random. In Schmidt’s first experiment, three subjects made a total of 63,066 guesses and scored 691.5 more hits than they would have been expected to by chance. This level of success could be achieved through sheer luck in only two out of every billion such experiments. In a confirmation study, Schmidt had the subjects attempt to achieve high scores in some prespecified trials and low scores in others. A total of 20,000 trials were run, and the subjects obtained 401 more hits (in the prespecified direction) than they would have been expected to by chance. Results this good would occur by chance only once in 10 billion such experiments.
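The logic of the Schmidt machine can be sketched in a few lines of simulation. This is a loose illustration only: a software pseudo-random generator stands in for the radioactive source, the mean waiting time for a decay (`MEAN_WAIT_SECONDS`) is an assumed value not given in the source, and the simulated subject guesses purely at chance. The names `next_lamp` and `simulate` are hypothetical.

```python
import random
from collections import Counter

STEPS_PER_SECOND = 1_000_000  # counter rate described in the text
MEAN_WAIT_SECONDS = 0.1       # assumed mean wait for a decay; not from the source


def next_lamp(rng):
    """Stop the cycling 1-2-3-4 counter at a randomly timed 'decay'."""
    wait = rng.expovariate(1 / MEAN_WAIT_SECONDS)  # exponential waiting time
    steps = int(wait * STEPS_PER_SECOND)           # counter steps elapsed
    return steps % 4 + 1                           # lamp 1, 2, 3 or 4


def simulate(trials=40_000, seed=1):
    """Return how often each lamp lit and the hit rate of a chance guesser."""
    rng = random.Random(seed)
    lamp_counts = Counter()
    hits = 0
    for _ in range(trials):
        guess = rng.randint(1, 4)  # a subject guessing purely at chance
        lamp = next_lamp(rng)
        lamp_counts[lamp] += 1
        hits += guess == lamp
    return lamp_counts, hits / trials


lamp_counts, hit_rate = simulate()
print(sorted(lamp_counts.items()), hit_rate)
```

Because the counter completes a great many cycles during a typical wait, each lamp lights about equally often, and a subject with no foreknowledge of the stopping time scores close to the one-in-four chance rate.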

Free-Response Experiments

In free-response experiments, the target is generally not chosen from a small pool of targets known to the subject, but instead may be drawn from a small pool of targets that is unknown to the subject at the time of the trial. Much more rarely, the target may be created uniquely for each trial. The subject in turn does not simply select a guess corresponding to one of a fixed number of alternatives, but rather describes her impressions of the target, which may be in the form of dreams, visual imagery, or a free-association monologue. The subject typically uses verbal descriptions or drawings to communicate these impressions.

Some of the earliest free-response experiments involved the telepathic transmission of drawings (e.g., Sinclair, 1930/1962; Warcollier, 1948/1963). In these studies, the sender, or agent, generally constructed a drawing and the percipient attempted to draw a picture corresponding to the sketch made by the agent. In these early studies, some very striking correspondences were obtained, even when the sender and the percipient were located on opposite sides of the Atlantic Ocean. However, as the targets were not selected randomly from a fixed set of alternatives, statistical evaluation of these correspondences proved difficult and a quantitative estimate of the probability that these similarities between the agent’s drawings and the percipient’s impressions would arise by chance could not be obtained, despite the subjectively striking nature of these correspondences.

The most commonly used techniques in modern free-response experiments are the ganzfeld and remote-viewing procedures. A highly successful series of remote-viewing trials was conducted in the late 1970s by the team of Targ and Puthoff at Stanford Research Institute (Puthoff & Targ, 1979; Targ & Puthoff, 1977; Targ, Puthoff & May, 1979). To give the reader the flavor of the remote-viewing procedure, a single trial from a five-trial long-distance series will be described. Unlike most of Targ and Puthoff’s trials, the target was not chosen randomly but rather was selected by a skeptical scientist. The scientist then took the remote-viewing team to the target site, which was a series of underground chambers in Ohio Caverns in Springfield, Ohio, which were filled with stalagmites and stalactites. The subject remained behind in New York City and was told only that the remote-viewing team was located somewhere between New York City and California. After the remote-viewing team had spent 45 minutes touring the caverns, the skeptical scientist then called the subject in New York, whereupon a transcript of the subject’s impression of the target area was read to him. The opening passage of the transcript was as follows:

1:50 PM before starting—Flat semiindustrial countryside with mountain range in background and something to do with underground caves or mines or deep shafts—half manmade, half natural—some electric humming going on—throbbing, inner throbbing. Nuclear or some very far out and possibly secret installation—corridor—mazes of them—whole underground city almost—Don’t like it at all—long for outdoors and nature. 2:00 PM—[Experimenters] R and H walking along sunny road—entering into arborlike shaft—again looks like man helped nature—vines (wisteria) growing in arch at entrance like to a wine cellar—leading into underground world. Darker earth-smelling cool moist passage with something grey and of interest on the left of them—musty—sudden change to bank of elevators—a very manmade [sic] steel wall—and shaft-like inserted silo going below earth—brightly lit (Targ, Puthoff & May, 1979, p. 88).

The above correspondence is of course quite impressive. But it is important that targets in free-response experiments be chosen randomly (as they were in most of Targ and Puthoff’s research). For instance, a depressing global event (or increasing sunspot activity, etc.) may have caused both the skeptical scientist and the percipient in this experiment to be in a gloomy mood, and that may account for both the scientist’s selection of a dark underground cave as a target area and for the percipient’s descriptions.

In the early 1970s, Montague Ullman and Stanley Krippner conducted an experimental study of “dream telepathy” at the New York Maimonides Medical Center (Ullman, Krippner & Vaughn, 1973). This research employed a fully-equipped sleep laboratory and was designed to investigate the possibility that a subject’s dreams could incorporate elements of an art print chosen as an ESP target. The subject went to sleep in the laboratory, with the usual EEG electrodes affixed to his head. He was then awakened toward the end of each rapid eye movement (REM) period, which is known to be associated with dreaming, and asked to give a dream report. Several such reports would be elicited from a given subject in a typical night. The art print to serve as the ESP target was randomly chosen from a set of possible targets. A person who served as sender or “agent” then attempted to “send” the picture to the sleeping subject, so that the latter might incorporate the target material into his or her dream. Usually, one art print served as the target for an entire night. After the subject’s sleep period was concluded, the subject’s dream reports were compared to the target as well as to a set of control art prints, which served as foils. The pictures were then ranked as to degree of correspondence with the dream reports, both by the subject and by outside judges. In several series, the foil pictures consisted of the remaining targets in a small target pool from which the actual target was chosen. Some subjects obtained highly successful results. For instance, a woman named Felicia Parise obtained 34 “direct hits” (meaning that the target picture was rated first among the pictures in an eight-target pool in terms of correspondence with the subject’s dream) out of 66 trials, as determined by her own ratings of the targets. Only 8.25 direct hits would be expected by chance, so this is a clearly significant result. Strangely enough, the independent judges gave Ms. Parise only nine direct hits (about what would be expected by chance). Another subject, Dr. Robert van de Castle, himself a dream researcher, spent eight nights in the laboratory as a subject and scored a “hit” (target print ranked in the top half of the eight-target pool) on each night by his own evaluations. The independent judges gave him only six hits, but five of these were direct hits, where only one direct hit would be expected by chance. Many other subjects were less successful.
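The chance expectations quoted for these series follow from the one-in-eight probability of a direct hit in an eight-target pool. The sketch below recomputes them and adds exact binomial tail probabilities; it assumes that the trials are independent and that each ranking is equally likely under chance, which may not hold exactly for self-ratings. The function name `binomial_tail` is introduced for illustration.

```python
from fractions import Fraction
from math import comb


def binomial_tail(n, k, p):
    """Exact P(X >= k) for n independent trials with success probability p."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


p_direct = Fraction(1, 8)  # chance of a direct hit in an eight-target pool

expected_parise = 66 * p_direct      # chance expectation over 66 trials
expected_van_de_castle = 8 * p_direct  # chance expectation over 8 nights

tail_parise = binomial_tail(66, 34, p_direct)         # 34 or more direct hits
tail_van_de_castle = binomial_tail(8, 5, p_direct)    # 5 or more direct hits

print(float(expected_parise), float(expected_van_de_castle))
print(float(tail_parise), float(tail_van_de_castle))
```

Using exact fractions avoids rounding error in the very small tail probability for the 34-hit series; the five-of-eight result is far less extreme but still lies at roughly the one-in-a-thousand level under these assumptions.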

Sometimes rather striking correspondences between the target print and the subject’s dream were obtained. For instance, on one night the art print chosen as target was Goya’s “The Duelers,” which portrays two Spaniards engaged in a duel with swords. One of the participants has succeeded in making a thrust into the other’s abdomen. The first dream report of the subject, Dr. Robyn Posin, a psychologist, was as follows:

[I was] in the office of a man who is sort of waiting for this woman to arrive. He’s actually … talking about her in the sense that the venom and anger that I experience in him is reserved for her … And he has this thing that’s like a bullwhip … and he hits the wall with the whip and makes a crack … and then thinks of a woman. There was something very impotent about this man’s rage … It wasn’t a bullwhip that he had, it was really a cat-o’-nine tails … It had its origins … in Spain … It was a very frightening experience (Ullman, Krippner & Vaughn, 1973, p. 131).

The researchers go on to report that

In her seventh dream, she was at a Black Muslim rally. “They were really raging, and all of a sudden some doors from an auditorium opened and out came Elijah Mohammed and a bunch of his followers … He had on this huge flaming torch with which to set some more stuff on fire, and I got very scared.” Her associations to this were “It was like a real chaos scene … the terrorism, that same kind of lack of control, I guess, that seemed to me to be anger and hostility and acting out in it … It’s some sort of conflagration, either symbolically or realistically … something rather violent” (Ullman, Krippner & Vaughn, 1973, p. 131).

Subconscious Psi

The above experimental procedures are aimed at detecting the conscious use of psi. In the past few years, there has been an increasing number of experimental investigations of the unconscious or subconscious detection of psi signals. For instance, McDonough, Don and Warren (2002) ran an experiment in which the subject attempted to guess which of four playing cards sequentially presented on a video monitor had been selected as the ESP target. They found a greater amplitude of slow wave brain potential 150-500 milliseconds after the target card was presented compared to that following the control cards.

Similarly, Satori, Massaccesi, Martinelli, and Tressoldi (2004) found subjects’ heart rates were accelerated when the ESP target picture was presented compared to the heart rates when the nontarget pictures were presented. This effect occurred even though the subjects’ scores on the conscious ESP guessing task did not differ significantly from chance.

Radin (1997b, 2003, 2004) has carried out a series of studies in which he found that subjects’ heart rates and electrodermal activity (a measure of stress, anxiety or excitement) increased prior to the presentation of emotional pictures when compared to the same time periods prior to the presentation of control pictures. Across all four of Radin’s studies, this effect was statistically significant at the .001 level. Radin terms this effect “presentiment.” It would appear to be a case of subconscious precognition, manifested in physiological activity rather than in the subjects’ consciousnesses. Radin (1997a) interprets Libet’s finding that a widespread buildup in brain potential precedes the conscious experience of a voluntary decision to initiate a finger movement (Libet, 1991a) as a presentiment “presponse” (as opposed to “response”) of the brain’s own decision-making.

In a similar vein, May and Spottiswoode have found an increased startle response (as measured by skin conductance) in three-second epochs (time periods) prior to a startling stimulus (a blast of white noise) relative to control trials in which no startling stimulus was presented (May & Spottiswoode, 2003; Spottiswoode & May, 2003).

Daryl Bem, a prominent social psychologist, has devoted much of the past ten years of his career to parapsychological research. Bem (2003) has recently conducted an experiment in which he presented pairs of positively-valenced (i.e., pleasant) pictures and pairs of negatively-valenced (unpleasant) pictures to human subjects. He asked the subjects which picture of the pair they preferred. Then one picture from each pair was chosen as the target and these targets were then subliminally presented to the subject. Bem found that subjects presented with a pair of positively-valenced pictures preferred the picture that would not be chosen as the target and that subjects presented with pairs of negatively-valenced pictures preferred the picture that would be chosen as the target. Bem attributes his results to “precognitive habituation,” postulating that the repeated subliminal exposure in the future diminished the subject’s affective responses to the targets in the present (i.e., both the positive and negative targets became more neutral in comparison to the nontarget pictures). Bem also found a precognitive habituation effect for targets that were supraliminally presented to the subject (i.e., the subjects could perceive the pictures consciously), but only for negatively-valenced pictures.

As with most lines of parapsychological research, these results have not been universally replicable. For instance, Broughton (2004) failed to replicate Radin’s “presentiment studies.” Broughton also reported a very poor test-retest reliability score, indicating that his subjects failed to manifest a consistent psi effect. Similarly, Sarra, Child and Smith (2004) failed to replicate Bem’s “precognitive habituation” effect using pictures of spiders as the negatively valenced targets.

PK Tests - “Micro-PK”

PK tests may be divided into roughly two types: “micro-PK” tests, in which the evidence for PK is primarily based on deviations from statistical distributions expected by chance (such as those governing the fall of dice) and “macro-PK” tests, in which the subject attempts to create a macrophysical change in the target object (such as by bending a spoon).

In the early days of parapsychology, dice typically served as the target objects in micro-PK experiments. In the modern era, quantum-mechanically based random event generators (REGs) and living systems have been the most frequently used target objects.

In one typical experiment involving animals as subjects, Schmidt (1970) enclosed his pet cat in a cold shack. In the shack was a 200-watt lamp, which served as a source of heat for the cat. Once each second, a quantum-mechanically based REG of the type described previously sent either an “on” or “off” signal to the lamp. The REG was designed in such a way that the probability of an “on” signal was 50 percent. Thus, the cat could obtain more heat by using its psychokinetic abilities to influence the REG to output more “on” signals than would be expected by chance. In fact, in 9,000 trials, 4,615 “on” signals were generated, indicating that the cat may have used its PK to increase the probability of an “on” signal from 50 percent to 51.3 percent, admittedly a very slight increase, but one which would occur by chance in only eight of a thousand such experiments. To check the randomness of the generator, Schmidt ran the REG over a period of 24 nights without the cat in the shack and found no departures from chance levels in a total of 691,200 signals generated.
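The “eight of a thousand” figure can be approximated by a Monte Carlo control run, in the spirit of operating the generator with no cat present. In this sketch a software pseudo-random generator stands in for the quantum-mechanical REG, each signal is assumed to be an independent fair on/off event, and the names `count_on_signals` and `fraction_at_least` are hypothetical.

```python
import random


def count_on_signals(rng, trials=9000):
    """Count 'on' signals in one run; each random bit stands in for one signal."""
    return bin(rng.getrandbits(trials)).count("1")


def fraction_at_least(threshold=4615, trials=9000, runs=20000, seed=2):
    """Fraction of simulated chance runs with at least `threshold` 'on' signals."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    extreme = sum(count_on_signals(rng, trials) >= threshold for _ in range(runs))
    return extreme / runs


print(4615 / 9000)          # observed proportion of 'on' signals
print(fraction_at_least())  # how often chance alone does at least as well
```

With these assumptions, a little under one percent of the simulated chance runs produce 4,615 or more “on” signals, which agrees with the order of magnitude given in the text.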

In a similar experiment Peoc’h (1995) placed a robot mother whose ambulations were determined by the output of an REG in the vicinity of a group of young chicks. The chicks were able to influence the output of the REG in such a way that the robot mother spent more time in close proximity to the chicks than would be expected by chance.

Some micro-PK experiments are designed to detect a psychokinetic influence on living organisms. For instance, Braud (1979) conducted a study in which human subjects attempted to influence the spatial orientation of a knife fish (Gymnotus carapo). The fish generates its own electrical field, which Braud monitored through two parallel copper plates placed in the fish’s tank. When the fish swam parallel to the plates, a weak signal was recorded; the signal became stronger as the fish rotated its position to become perpendicular to the plates. The human subject’s task was to increase the strength of these signals during certain time periods designated as conformance epochs. The strength of this signal was then compared with that generated in other time periods that served as controls. The signal was found to be stronger during the conformance epochs than during the control epochs, indicating that the subjects were successful in influencing the fish to adopt an orientation perpendicular to the plates.

In recent years, there has been a flurry of research reports relating to the effects of global consciousness on the output of REGs. Specifically, it is asserted that events that produce a state of widespread excitement throughout the world (or a smaller region), resulting in a coherent state of consciousness involving many individuals, are associated with the anomalous behavior of REGs. One research initiative to study this phenomenon is called the Global Consciousness Project (GCP) and involves the continuous monitoring of several REGs placed at different locations around the world.

Radin (2002) reported that the GCP REGs showed a high degree of correlation in their behavior on September 11, 2001 (the day of the terrorist attack on the World Trade Center) as well as on other days involving major news events over a 250-day period. Similarly, Nelson (2002) reports that the behavior of REGs at 40 host sites became more correlated with one another at the time of the September 11 attack, with this effect being statistically significant at the 10⁻⁷ level. He notes that some global events are accompanied by the anomalous behavior of REGs, while others are not (e.g., widespread flooding in Nicaragua resulting from the collapse of the Casaitas volcano). It should, however, be noted that Scargle (2002) has criticized both Radin and Nelson for reporting exploratory analyses as if they were preplanned and for “lying with statistics” by presenting misleading graphs.

Hirukawa and Ishikawa (2002) report evidence of anomalous deviations in the output of an REG toward the end of the Aomori-Nebuta summer festival in Japan.

Experiments on the effects of global consciousness on the behavior of REGs in fact predate the September 11, 2001 tragedy. Radin, Rebman and Mackwe (1996) report evidence for increased variance in the output of an REG during times of high group coherence in a Breathwork workshop, but no increased variance during times of low group coherence. Radin et al. also report a correlation in the outputs of two REGs separated by 12 miles during the first half of the broadcast of the 67th annual Academy Awards. This correlation declined in the second half of the broadcast, as did the television audience, and the strength of the correlation was significantly related to the decline in the size of the television audience.

Bierman (1996) found an increased variance in the output of an REG during time periods in which disturbances occurred in a poltergeist case in the Netherlands. He also reports deviations in the output of an REG during the time of a soccer match between the Dutch and Italian teams. The REG’s behavior returned to normal after the winning score by the Dutch team with two minutes left in the game.

Nelson, Bradish, Dobyns, Dunne and Jahn (1996) report significant deviations in the output of an REG during periods of high attention, intellectual cohesiveness, and shared emotions of a discussion group. In a review of 61 field REG experiments, Nelson, Jahn, Dunne, Dobyns and Bradish (1998) report highly significant deviations from chance expectation in REG output during intense emotional events during small group interactions, but no such deviations during times of less emotional events. In a recent summary of this line of research, Jahn and Dunne (2005) state that in general, they have found high deviations from the distribution of REG outputs expected by chance at times of highly cohesive events producing “resonance,” but low deviations from chance during times of more “mundane” events.

Of course, it is difficult to see why a coherent state of group consciousness should have an effect on the behavior of an REG that is otherwise not connected to the group. As Palmer (1997) points out, the success of these “field REG” experiments is more likely due to psi influence by the experimenter, who has a vested interest in the experiment’s outcome, than to psi influence by the group members, who are generally not focusing on, and in many instances are unaware of, the REG.

PK Tests - “Macro-PK”

The subject of macro-PK is much more problematic. Macro-PK, which may involve the bending of metal specimens, the ostensibly paranormal production of images on photographic film, or the movement of small objects across the surface of a table, usually involves special subjects having the status of semiprofessional psychics. Because the psychic himself to a large extent determines the nature of the phenomena he may produce and the conditions under which he feels comfortable in producing them, the investigator does not have the same control over the experimental procedure that she would have in a micro-PK experiment instigated and designed by herself. In fact, in macro-PK research, experimental procedures and conditions often must be negotiated with the psychic if he is to perform at all. Consequently, proper procedures are much less well-defined in macro-PK research than they are in micro-PK research. As most special macro-PK subjects are suspected of, and accused of, fraud by skeptical scientists and writers, the suspicion arises that these psychics will not perform unless they have succeeded in negotiating conditions and procedures that will allow them to produce the alleged macro-PK phenomena fraudulently. Thus, there is considerable debate, both within and outside the parapsychological community, over the adequacy of the methods and safeguards taken in macro-PK research. In fact, several macro-PK subjects have indeed been detected in fraud, as will be discussed in greater detail in the section on subject fraud below.

Separation of Psi Modalities

In the beginnings of experimental parapsychology, it was thought possible to separate psi abilities into several component subtypes: telepathy (the ability to read the mind of another person or being, usually assumed to involve direct contact between minds at the mental, rather than physical, level), clairvoyance (the paranormal ability to acquire information directly from objects, such as when a subject is able to identify a card which has been hidden in a container and whose identity is known to no one at the time), precognition (the ability to foretell events that are yet to happen), retrocognition (the direct paranormal knowledge of past events), and psychokinesis (the ability of mind to influence matter directly). To this list could be added retroactive psychokinesis, the rather outlandish ability to influence events that have already occurred in the past. This seemingly implausible psi power was first proposed to exist by Helmut Schmidt (1975a, 1975b, 1984), who has since gone on to amass a considerable amount of experimental data in its support (see Schmidt, 1976, 1981, 1985, 1986, 1993; Terry & Schmidt, 1978; Gruber, 1980; Schmidt, Morris & Rudolph, 1986; and Schmidt & Schlitz, 1988). In a typical retroactive-PK experiment, a subject may be asked to use his mental abilities to increase the rate at which one of two lights comes on. Unknown to the subject, the behavior of the lights is governed by the output of a random event generator (REG) of the “Schmidt machine” type that was generated two weeks previously. Thus, the subject’s covert task is to extend his PK influence backward in time to influence the behavior of the REG two weeks in the past. Schmidt has actually provided a fairly plausible account of why such retroactive PK effects might be expected to occur, based on his reading of quantum mechanics.
Schmidt, along with many other theorists, believes that the outcome of a quantum process does not take on a definite value until it is observed by a conscious being (even if a considerable period of time elapses before the observation takes place).

Separation of Types

Precognition. Early on in parapsychology, it became apparent that it was difficult to establish the existence of any of these pure forms of psi in a definitive manner. For instance, in Schmidt’s four-button precognition experiment, instead of using precognition to guess the identity of the correct lamp, the subject may rather be pushing a button and then using her psychokinesis to cause the correct lamp to light up.

Recently, Steinkamp (2003, submitted) has attempted to test PK counterexplanations of the evidence for precognition by conducting an experiment in which an ESP target was determined through a complex calculation involving future closing prices of stocks. Her experiment failed to produce statistically significant evidence of precognition, which she took as evidence that precognition does not exist; however, the results of her nonprecognitive ESP control trials also failed to be statistically significant. Thus, her experiment failed to produce any evidence of psi scoring at all. As critics of parapsychology are fond of pointing out, it is impossible to “prove” a negative hypothesis such as the nonexistence of precognition. To use William James’ favorite example, no number of black crows can falsify the hypothesis that white crows exist, and one white crow would establish the truth of that hypothesis. (Even an exhaustive search of the terrestrial avian population may not suffice for a determined believer who may respond by postulating the existence of extragalactic white crows.) Thus, Steinkamp’s negative evidence does not disprove the existence of precognition (or of ESP in general for that matter).

Steinkamp’s experiment is based on the argument that stock market data would be impervious to psychokinetic manipulation due to the large number of persons with a vested interest in the behavior of stock prices, whose own PK efforts would be expected to overwhelm those of Steinkamp’s subjects.

While this argument might have much validity regarding the rise and fall of the prices of individual stocks, it is not clear that the general public has a vested interest in the outcome of a complex mathematical manipulation of the closing prices of several stocks to select a target from a target pool. Also, parapsychologists have generally found the strength of psi effects to be unrelated to the complexity of the target systems (see Stanford, 1978; Schmidt, 1984; and Foster, 1940). What would be needed to settle this issue is an experiment to see if subjects can subtly manipulate the collective behavior of stocks through PK to produce a desired outcome in regard to target selection.

In designing her experiment, Steinkamp followed the thought of her mentor Robert Morris, who held the Koestler Chair in parapsychology at Edinburgh University up until the time of his recent death and was responsible in large part for the growth in European parapsychology over the past two decades. Morris (1982) argued that experiments that have used complex procedures to determine an entry point into a random number table to generate the targets for a precognition experiment, such as the complex procedure using 10-sided dice used by Mangan (1955) or Nash’s use of stock market data (Nash, 1960), provide a strong suggestion that “true precognition” exists. However, as noted above, in view of the task-complexity independence and goal-orientation of ostensible PK phenomena, it may be premature to assert that such systems are not susceptible to PK influence on the basis of their complexity.

Perhaps, at this stage of the game, the best evidence for pure precognition comes from cases of spontaneous psi. Even in this regard, Tanagras (1967) has argued that all cases suggestive of precognition can be explained away on the basis of psychokinetic induction of the precognized event. Thus, a woman harboring an unconscious death wish against her husband may dream of her husband’s dying in a car crash and then use her psychokinetic powers to cause the accident itself. Similarly, Eisenbud (1982a) proposed that all instances of ostensible precognition may be explained in terms of forward causal chains (in which causes precede their effects in time). Such explanations might involve unconscious inferences, psychokinetic influences and “real time” telepathic interactions between the parties involved. Mundle (1964) has also expressed a preference for explanations of precognitive experiences that employ only forward causal chains, primarily on the basis of what he perceives as insurmountable difficulties with multidimensional models of time (about which more will be said in the next chapter).

Many parapsychologists reject PK-based counterexplanations of cases of ostensible precognition on the basis that some cases on record involve mine collapses, plane crashes, tornados, and other events that make Carrie’s high school prom look like an idyllic class picnic in comparison. They argue that, aside from these precognition cases, there is little evidence that events of such a magnitude can be produced through psychokinesis. However, in making his case against precognition, Eisenbud asserted that no limitations on PK influence should be assumed. With respect to the experimental evidence for precognition, Targ and Harary (1984) point out that scoring rates are typically not as high in successful PK experiments as they are in successful precognition experiments, and they cite this as evidence against the hypothesis that the experimental evidence for precognition can be explained on the basis of psychokinesis.

Telepathy. There is a similar difficulty in separating telepathy (direct mind to mind interaction) from clairvoyance (direct knowledge of a target object). For instance, in a telepathy experiment in which a percipient attempts to guess what card a sender is looking at, it is quite possible that the percipient may use her clairvoyant powers to read the card directly rather than reading the mind of the sender. Alternatively, if the sender is merely thinking of a card, the identity of which he will announce later, the percipient might use precognitive clairvoyance to access whatever physical record of the target is later made. Also, even if a seemingly pure test of telepathy could be devised, any alleged telepathy on the part of the percipient could be interpreted as clairvoyant perception of the brain state of the agent. Such considerations led J. B. Rhine (1974) to call the existence of telepathy an “untestable hypothesis” and to recommend that the problem of proving the existence of pure telepathy be “shelved.” Also, the experimental evidence for ESP is confronted by counterexplanations in terms of PK in the same way that the experimental evidence for precognition is.

PK. It is also possible that much of the evidence for PK, especially that arising from micro-PK experiments with REGs as targets, might be explained in terms of precognition. Specifically, consider Schmidt’s experiment with his cat and its heater lamp, as described above (Schmidt, 1970). Rather than assuming that the excess of “on” signals is due to the cat’s PK abilities, it might be argued that Schmidt used his precognitive ability to initiate the experiment (e.g., by pushing a button) at the precise time that a series containing an excess of “on” signals was about to be generated. As evidence that the experimental evidence for PK can be explained on the basis of such psi-mediated “decision augmentation,” May, Utts and Spottiswoode (1995) cite the fact that the statistical significance levels in reported PK experiments with REGs as targets do not increase with the number of trials in the experiment. (Normally, one would expect that, if the subject’s PK scoring rate is constant, this would result in an increased level of statistical significance. For instance, while there is a 3% chance that 53% or more of 1000 flips of a fair coin will come up “heads,” there is only one chance in a billion that 53% or more of 10,000 coin flips will come up “heads.” This is known as “the law of large numbers” in statistics: the greater the number of trials, the less likely it is that one could achieve a given above-chance scoring rate by “sheer luck.”) Thus, May et al. construe the fact that the statistical significance of PK experiments does not increase with the number of trials as evidence against a PK influence on the part of the subjects.
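The two coin-flip probabilities quoted above can be checked directly. The following is a minimal sketch in Python, assuming a fair coin and independent flips; the function name tail_probability is introduced here purely for illustration and does not come from May et al. or any other cited source.

```python
from math import comb

def tail_probability(n_flips, min_heads):
    """Exact probability of at least min_heads heads in n_flips fair flips."""
    # Number of sequences with exactly min_heads heads.
    term = comb(n_flips, min_heads)
    favorable = 0
    for k in range(min_heads, n_flips + 1):
        favorable += term
        # C(n, k+1) = C(n, k) * (n - k) / (k + 1); the division is exact.
        term = term * (n_flips - k) // (k + 1)
    # Every one of the 2**n_flips sequences is equally likely.
    return favorable / 2 ** n_flips

# 53% or more heads in 1,000 flips versus 10,000 flips.
p_1000 = tail_probability(1000, 530)
p_10000 = tail_probability(10000, 5300)
print(f"1,000 flips:  {p_1000:.4f}")
print(f"10,000 flips: {p_10000:.2e}")
```

The first figure should come out near 3% and the second on the order of one in a billion, which is the contrast the parenthetical remark relies on: the same 53% scoring rate becomes far harder to attribute to luck as the number of trials grows.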

The lack of dependence of the statistical significance level on the number of trials is certainly evidence against the hypothesis that subjects’ PK scoring rates per trial will be the same no matter what the number of trials. However, it is not inconceivable that subjects may tire with increasing numbers of trials or may not be able to devote full attention to each trial if the number of trials per unit of time is increased (as is often the case with large sample PK experiments).

It should be noted that Dobyns and Nelson (1998) have examined the database of PK trials compiled at the Princeton Engineering Anomalies Research laboratories and have found the results to be compatible with the constant PK scoring rate hypothesis in that statistical significance does increase with increasing numbers of trials. Similarly, Ibsen (1998) found that PK scoring rates did not differ significantly between 200 binary trials and 2 million binary trials, contradicting May et al.’s “decision augmentation theory.”

However, Pallikari (2004), in a largely graphical analysis of the results of PK tests with random event generators, found that the size of the obtained hit rate declined with the square root of the number of trials, as would be predicted by May et al.’s “decision augmentation theory.” Pallikari further found that the reported odds against the effects being due to chance were generally less than 1 in 100 million, whereas if the effects were due to a constant PK hit rate, the odds would be expected to increase without bound as the number of trials increases. Pallikari proposes this as a type of “ceiling” on the level of significance that can be obtained in a single PK-REG experiment. Pallikari also contends that the existing evidence indicates that human subjects can create a “broadening” in the hit rates produced in single experiments but cannot impose a specific hit rate over multiple trials. Pallikari notes that human PK influences act to increase the numbers of runs of hits or misses within an experimental series, which will tend to balance out to chance expectation over a long series of runs.
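The difference between the two competing predictions is a matter of simple arithmetic, sketched below in Python for binary trials with a chance hit rate of 50%. This is a simplified illustration under that assumption, not a reconstruction of the analyses actually performed by May et al. or Pallikari; both function names are hypothetical.

```python
from math import sqrt

CHANCE_RATE = 0.5  # assumed chance hit rate for a binary REG trial

def z_for_constant_rate(hit_rate, n_trials):
    """Expected z-score if the per-trial hit rate stays fixed."""
    standard_error = sqrt(CHANCE_RATE * (1 - CHANCE_RATE) / n_trials)
    return (hit_rate - CHANCE_RATE) / standard_error

def rate_for_constant_z(z_score, n_trials):
    """Hit rate implied if the z-score per experiment stays fixed."""
    standard_error = sqrt(CHANCE_RATE * (1 - CHANCE_RATE) / n_trials)
    return CHANCE_RATE + z_score * standard_error

for n in (100, 10_000, 1_000_000):
    print(n,
          round(z_for_constant_rate(0.53, n), 2),   # grows like sqrt(n)
          round(rate_for_constant_z(2.0, n), 4))    # shrinks toward 0.5
```

Under a constant hit rate the z-score grows in proportion to the square root of the number of trials, whereas under a constant z-score the excess hit rate shrinks in proportion to one over that square root. The second pattern is the one Pallikari reports finding in the data.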

The debate over May et al.’s “decision augmentation theory” is by no means resolved and is an active area of investigation at the present time.

Means of Resolution. Because of the difficulty of obtaining an experimental separation of the various types of psi phenomena, the parapsychologists Robert Thouless and B. P. Wiesner suggested adopting the neutral term “psi” to designate parapsychological phenomena of unspecified type (Wiesner & Thouless, 1942; Thouless & Wiesner, 1948). They further suggested that psi might be broken into psi-gamma, the receptive type of psi seen in clairvoyance, telepathy and precognition experiments, and psi-kappa, the active type of psi generally seen in PK experiments (although as we have just seen, it may be difficult to distinguish between these two types of ability in practice). The British psychologist John Beloff (1979b) has argued for the retention of the traditional categories of psi in general and of telepathy in particular. Beloff feels that interpretations of phenomena suggestive of telepathy in terms of clairvoyance of brain states are questionable, as it is doubtful that a person could interpret the idiosyncratic “neural code” employed by another person’s brain.

Parapsychologists continue to use terms such as precognition, clairvoyance and psychokinesis in describing their own experimental procedures, but these typically refer to the experimental task as described to the subject rather than implying that a particular set of experimental results is definitely due to, say, precognition rather than PK.

Criticisms of Parapsychological Research

We will now turn to an examination of the controversies surrounding experimental work in parapsychology, the lessons that may be learned from them regarding proper methodology and the reasons for the continuing resistance of the scientific establishment to experimental psi research.

Irrational motives. Before plunging into a discussion of the various “rational” criticisms that have been made of experimental methodology, such as charges of statistical and methodological errors in psi research and allegations of fraud by subjects or experimenters, it may be instructive to consider first the possible irrational thought processes at work on both sides. Some people seem to rush to embrace a belief in psi phenomena, while others summarily dismiss the possibility that psi phenomena may exist, often before examining the evidence for such phenomena. The irrational bases for both belief and disbelief in psi may include emotional and religious motives, metaphysical prejudices and the various types of illogical reasoning that often underlie the formation of attitudes in general.

To begin with, at least some skeptics have indeed manifested a fairly closed-minded and prejudiced attitude against psi research. The reader will recall from Chapter 0 the quotations from the eminent psychologist Donald Hebb and the equally eminent physicist Hermann von Helmholtz that no amount of evidence would suffice to convince them of the existence of psi phenomena.

In part, this rejection is based on the perception that the existence of psi phenomena would be incompatible with known scientific principles. Certainly it is true that it would be hard to explain ESP and PK on the basis of currently understood physical processes. This does not, however, imply that psi phenomena are necessarily in conflict with any known laws of physics (except perhaps for certain types of psychokinetic phenomena). In fact, not only are many of the theories proposed by parapsychologists to account for psi compatible with known scientific principles, a large number of them are even based on such principles (these theories will be discussed more completely in the next chapter). It is probably true that the ultimate explanation of psi phenomena will require the postulation of new entities or processes that are not part of current scientific theories, but this does not mean that psi phenomena need violate any established law of science, as will be made clear in the succeeding chapter. In this light, the (a priori) resistance of orthodox scientists may be based more on a desire for closure or a tendency to see the work of science as being completed than on any real logical contradiction between psi phenomena and established theories.

With such a desire on the part of some scientists to see the picture of the world constructed by science as finished and final, it is not surprising that there is antagonism toward psi phenomena, with their implication that current scientific pictures of the world are incomplete. This “principle of closure” was recognized by the Gestalt psychologists as a common psychological tendency arising from the pressure to achieve a solution to a problem-solving task. One way a person may reduce such psychological pressure is to enter a state of premature closure, in which the problem is viewed as having been solved when in fact it has not been. This tendency toward closure is illustrated in comments by the biologist Sidney Fox, who argues that, if new laws are required to explain the emergence of life, then they lie “outside the realm of science” (Fox, 1988, p. 45). Fox thus equates science with established scientific theories rather than with the process of scientific discovery.

Religious motives. There is no doubt that the same psychological needs that promote belief in various religions (including desires for control over the elements, knowledge of the future, protection from natural forces and the vagaries of chance, power over disease, and a life after death) are also responsible for the widespread belief in parapsychological phenomena. Modern science has discredited naive and literal interpretations of many religions, and for many people parapsychology fills the void thus created. Parapsychology not only has all the accouterments of science itself but also promises to satisfy most of the psychological needs underlying religious belief through its alleged demonstration of such paranormal phenomena as psychokinesis, precognition, psychic healing, and the survival of death.

Religious motivations and a desire to overthrow what they regarded as the depressing mechanistic cosmology proposed by nineteenth-century science formed an explicit and openly acknowledged part of the motivations of the founders of the Society for Psychical Research (S.P.R.) in late nineteenth century Britain. Prominent among these concerns was unquestionably the fact of death with its promise of total annihilation of the human personality. This great concern of the early psychical researchers with the problem of the survival of death was undoubtedly in part attributable to the biological instinct for survival as well as a desire to be reunited with lost loved ones. As noted in Chapter 0, this fear of death arises in part from the identification of oneself with the Person (the physical body conjoined with the collection of memories, motives and emotions that comprise one’s personality) rather than a potentially recyclable field of pure consciousness.

It is well known, for example, that the public’s interest in mediums and seances, with their offer of a chance to communicate with deceased loved ones, tends to increase markedly during and after times of great tragedy, such as world wars. As psi phenomena seemed to contradict the exclusively materialistic outlook of nineteenth century physics and to point to a mental realm over and above the physical world, they were readily embraced by persons seeking scientific support for the concept of a spiritual realm. Indeed, as recently as 1982, upon the occasion of the centenary of the founding of the S.P.R., the prominent British parapsychologist John Beloff (1983) asserted in his presidential address to the Parapsychological Association that the survival of death was contingent upon the existence of psi. This is probably a questionable assertion, as it is quite conceivable that the mind could survive death even if it did not possess the powers of ESP and PK, but it does show how closely related the issues of the existence of paranormal powers and the survival of death are in the minds of many parapsychologists.

The scientific community may in turn have feared (and may still fear) a trend toward irrationalism and a possible reemergence of religious persecution with an accompanying attempt to suppress scientific doctrines. Certainly the memory of the Christian resistance to the heliocentric (sun-centered) model of the solar system has never been far from the consciousness of the scientific community. In more recent times, the scientific community has been concerned with attempts to suppress the teaching of Darwin’s theory of evolution in the public schools in America and to insert various creationist theories into the biology and physics curricula. Certainly, the credulous attitude of many elements of the public and even some self-proclaimed parapsychologists toward various purported paranormal phenomena has done little to lessen the skeptics’ fears. The skeptics, however, do a disservice to the enterprise of rational inquiry when they misleadingly classify parapsychologists who adhere to the principles of science in their investigations of purported paranormal phenomena together with wide-eyed believers in the Loch Ness monster and Bible-thumping creationists.

Psychodynamic factors. Paranoid mechanisms undoubtedly account for some portion of the belief in psychic powers, especially one’s own psychic powers. Paranoid delusions of persecution, for instance, commonly include the belief that one’s enemies are paranormally monitoring and manipulating one’s thoughts. James Alcock (1981) is among several skeptics who contend that such “magical thinking” also underlies parapsychologists’ belief in psi.

On the other hand, ardent disbelief in psi phenomena could be construed as a form of defense against paranoid thoughts (and the causal efficacy of unconscious wishes) or as fear of the uncanny and unknown. Charles Tart, a psychologist well known for his studies of altered states of consciousness, has attributed skepticism toward psi phenomena to a “primal repression” of threatening telepathic interactions between mother and child (Tart, 1982). In an attempt to document such a widespread fear of psi, Tart asked subjects to imagine that they possessed extraordinarily strong psi powers that were effective within a 100-yard radius. He found that the reactions of the subjects in this “belief experiment” were predominantly negative (Tart & Lebore, 1986). The Australian researcher Harvey Irwin (1985) found fear of psi to correlate negatively with sympathy for psi research, lending support to Tart’s contention that such fear forms a motivational basis for skepticism. Such fears of psi are sometimes explicitly stated by skeptics. For instance, the prominent skeptic Robert Baker (1990) expresses his horror at the thought that psi powers might exist, as that would imply that physical disasters might result from the slightest angry thoughts, privacy would be at an end, and the world would be populated by human monsters.

Social and attitudinal processes. Obviously, the subfield of social psychology known as “attitude theory” may have much to tell us about how attitudes toward parapsychology are formed. According to Hovland, Janis and Kelley’s (1953) reinforcement theory of attitude formation, one tends to hold beliefs that one has been rewarded for expressing and to extinguish beliefs that one has been punished for expressing. To express a belief in psi phenomena or to pursue psi research may result in a loss of tenure and funding and a general ostracism from the orthodox academic community. Thus, as the reward structures within the academic community tend to favor anti-psi beliefs, one would expect them to engender skepticism. Of course, the propensity of some members of the lay public to provide monetary support and sometimes even adulation to investigators expressing belief in psi powers might be a factor serving to increase belief in psi among such investigators.

According to Festinger’s “cognitive dissonance” theory, one method of avoiding the stress arising from an inconsistent belief system is to avoid exposure to high-quality communications and arguments that run counter to one’s own position on a given issue (Festinger, 1957). This might explain the tendency of some parapsychologists to ignore or repress legitimate and constructive criticisms of their methodology. Sometimes this can result in disaster, as illustrated by the case of Project Alpha, in which researchers at Washington University in St. Louis were deceived by stooges of critic James “the Amazing” Randi posing as metal-bending psychics, primarily because the researchers failed to employ safeguards suggested by Randi (Randi, 1983a, 1983b, 1986). Project Alpha will be discussed in more detail in the section on subject fraud below.

On the other side, critics often fail to heed (or at least discuss) the better-conducted studies in the field of parapsychology. Occasionally, critics write books and articles debunking the weakest claims in the field, such as Arthur Conan Doyle’s alleged pictures of fairies (Randi, 1980) and the psychic entertainer Kreskin’s stage performances (Marks & Kammann, 1980), while at the same time claiming to have debunked the entire field of parapsychology. This may be understandable in light of the fact that the scientific community’s main concern may be directed toward a possible trend toward irrationalism (and the concern of magicians such as Randi may be directed at possible dishonest and attention-stealing uses of conjuring techniques). The primary concern of such critics may thus not be so much with “legitimate” parapsychology, but with clearly quack science and charlatanism. To the extent that this is the case, parapsychologists should applaud their efforts (but not their claim to have debunked the entire field of parapsychology).

Another technique for avoiding a sense of cognitive dissonance is to reduce the psychological importance of an issue about which there is conflict. I recall the remarks of one prominent cognitive psychologist who told me that, although he did not know whether parapsychological phenomena existed or not, he did not see why they were of any importance (perhaps reflecting his concern as a psychological rather than a physical theorist). A related strategy for reducing the stress of cognitive inconsistency is to “stop thinking” (Abelson & Rosenberg, 1958). To some extent, this has been the traditional response of academic psychology to the claims of parapsychology, which are almost never discussed in any detail in the academic curricula of psychology departments. Because of this “heads in the sand” approach, departments of psychology have in many instances failed in what should be their responsibility to provide a responsible (even if skeptical) discussion of alleged parapsychological phenomena, a topic that is of great interest to students and to the public in general.

Lack of social support for one’s beliefs is another source of cognitive dissonance, according to Festinger. Certainly, the tendency of people to conform to group opinion, the pressure put upon people to conform to majority opinions by groups, and the tendency of people to obey authority figures have been amply demonstrated in classic psychological experiments by Asch (1958), Schachter (1951), and Milgram (1963, 1968). Within the academic community, one would expect such pressures to favor skepticism with regard to psi phenomena. One way to decrease the cognitive dissonance arising from lack of social support is, according to Festinger, to decrease the perceived attractiveness of the disagreeing parties (which might result in a skeptic classifying all parapsychologists as fairy-worshipping lunatics or a parapsychologist classifying all skeptics as unimaginative, narrow-minded bigots).

In closing, it should be noted that the denial of funds and professional opportunities is not confined to parapsychologists but is a problem faced by non-mainstream scientists in general. Eliot Marshall (1990), for instance, describes the denial of telescope time to heterodox plasma theorist Halton Arp, depriving him of the opportunity to make even basic observations. By restricting telescope time to supporters of orthodox views, Marshall notes, the process of science is thereby distorted and the potential for dialogue between opposing views closed off. (Indeed, the possibility of observationally-supported heterodox views even arising is virtually eliminated.) As a resolution to this problem, Marshall recommends that some funds be allocated to non-mainstream scientists, without requiring the process of “peer review” by advocates of orthodox positions.

Rational Bases

We will now consider more rational grounds for the rejection of psi phenomena that are based on legitimate concerns regarding proper methodology, the issue of the replicability of the effects and the possibility of fraud.

Sensory cues. When one is attempting to establish the existence of an ability to identify target material that lies outside of the normally recognized channels of the physical senses, it is obviously important to exclude the possibility that the subject’s knowledge of a target is based on “sensory cues” (that is, information acquired through the usual physical senses). The early days of experimental parapsychology were not characterized by the stringent safeguards against sensory cues that are (usually) employed today. For instance, an “agent” might sit at one end of a table, pick up an ESP card and attempt to project its identity into the mind of a percipient seated at the other end of the table. Under these circumstances, the percipient might learn the card’s identity by seeing the card reflected in the agent’s eyes or by picking up on cues unconsciously provided by the agent (such as tilting the head when viewing a “star,” etc.). The behaviorist B. F. Skinner pointed out that the cards in the Zener ESP deck used by Rhine in his early experiments could be read from the back under certain lighting conditions, invalidating any experiment in which the percipient could see the backs of the cards. Parapsychologists were quick to respond to such critiques by totally isolating the subject from the targets (such as by having them in a separate building, as was done in the Pearce-Pratt series discussed previously, for instance). Most forced-choice experiments in parapsychology today are characterized by adequate shielding of the target from the percipient. Exceptions do of course still occur, as no field is immune from methodological errors committed by its practitioners. 
For instance, Don, Warren, McDonough and Collura (1988) report an experiment in which the ESP cards used as targets were placed directly on the hand of the special subject (Olof Jonsson), allowing him access to possible sensory cues arising from the back of the cards, as well as possible glimpses of the fronts of the cards. The random number table used to generate the targets was also in the room with the subject and could have been a source of additional cues.

Rupert Sheldrake (1998b) reports an experimental investigation of the hypothesis that people know when someone is looking at them and that this knowledge is mediated by ESP. However, in Sheldrake’s experiment, the “starer” was sitting directly behind the “staree.” This allows for the possibility of sensory cues, in that the starer’s breathing and body movements may be different during staring trials from those during non-staring trials. The subject might be able to use such cues to differentiate between staring and non-staring trials.

In an attempt to placate his critics, Sheldrake (2001) repeated his experiment with the subjects blindfolded and trial-by-trial feedback eliminated (i.e., the subject was not told immediately after the trial whether the trial had been a staring or non-staring trial). However, this halfhearted attempt at sensory shielding still leaves open the possibility that the subject could be responding to differences in the starer’s breathing patterns and bodily movements between staring and non-staring trials.

Lobach and Bierman (2004a) repeated Sheldrake’s experiment with improved sensory shielding. Also, to eliminate artifacts due to response bias (e.g., a subject who calls “staring” on 80% of the trials would be expected to have a hit rate of 80% on staring trials, not the 50% rate that would be expected to obtain across all trials by chance), Lobach and Bierman did not analyze staring and non-staring trials separately, as did Sheldrake. In three attempts to replicate Sheldrake’s findings, Lobach and Bierman found no evidence that subjects could distinguish between staring and non-staring trials at a rate significantly greater than what would be expected by chance. They conclude that Sheldrake’s staring detection effect is not as easily replicated as claimed by Sheldrake.
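
The response-bias artifact described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the subject has no information at all about the condition, staring and non-staring trials are equally likely, and the function name and parameters are hypothetical rather than taken from any of the studies cited.

```python
import random

def staring_hit_rates(p_call_staring=0.8, n_trials=100_000, seed=0):
    """Simulate a subject with no psi who calls "staring" with a fixed bias.

    Returns (hit rate on staring trials, hit rate on non-staring trials,
    overall hit rate). All names and defaults here are illustrative.
    """
    rng = random.Random(seed)
    hits = {"staring": 0, "non-staring": 0}
    counts = {"staring": 0, "non-staring": 0}
    for _ in range(n_trials):
        # The condition is chosen at random, independently of the call.
        condition = "staring" if rng.random() < 0.5 else "non-staring"
        call = "staring" if rng.random() < p_call_staring else "non-staring"
        counts[condition] += 1
        hits[condition] += (call == condition)
    rate_staring = hits["staring"] / counts["staring"]
    rate_non = hits["non-staring"] / counts["non-staring"]
    overall = (hits["staring"] + hits["non-staring"]) / n_trials
    return rate_staring, rate_non, overall

rate_staring, rate_non, overall = staring_hit_rates()
print(f"staring trials: {rate_staring:.3f}")
print(f"non-staring trials: {rate_non:.3f}")
print(f"all trials: {overall:.3f}")
```

Under these assumptions the hit rate on staring trials sits near the subject's calling bias and the hit rate on non-staring trials sits near its complement, while the combined rate stays near chance. This is why analyzing the two trial types separately can manufacture an apparent effect.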

Sheldrake (2005), however, points to the success of experiments in which the “starer” watches the “staree” over a closed television circuit in a separate room as evidence against the hypothesis that the remote detection of staring is due to sensory cues. In this regard, he cites a meta-analysis of 15 such experiments by Schmidt, Schneider, Utts and Walach (2004) indicating that there was overall statistically significant evidence of psi under such conditions.

Sensory cues may result from more subtle flaws in an experiment. For instance, the noted critic Martin Gardner (1981) has pointed out that a light system that was used by the “sender” to signal the beginning of the next trial to the percipient in Charles Tart’s well-known experiments on training ESP ability (Tart, 1976) might allow the sender to provide cues as to the identity of the next target through the conscious or unconscious use of a time delay code (e.g., the sender might delay the signal longer for some targets than others). As the percipient was provided with trial-by-trial feedback, he or she might consciously or subconsciously become aware of this tendency. As a rule of thumb, in a parapsychological experiment no person with knowledge of a target’s identity should be allowed to communicate with the subject attempting to guess that target until after the subject has made his guess.

Sheldrake and Smart (2003) investigated the hypothesis that people sometimes know who is calling on the telephone before even picking up the phone. Of course, as a spontaneous phenomenon, this sort of guessing may be mediated by knowledge of who is likely to call at which time of day, ongoing crises and other daily events that may involve some friends (possible callers) more than others, and the amount of time that has elapsed since the person last called. In Sheldrake and Smart’s experiment, subjects had to guess which one of four target persons was calling on the phone during a preassigned time interval. Sheldrake’s subjects were able to identify the caller before picking up the phone with a frequency that would occur by chance less than four times in a million.

A similar experiment was run by Lobach and Bierman (2004b). In their experiment, one of four target persons was randomly assigned to call the subject during a preassigned five-minute time interval. The subjects were able to identify the caller on 29.4% of the trials compared to the 25% rate that would be expected by chance (and this difference was statistically significant at the .05 level). Lobach and Bierman note that almost all of their above-chance scoring occurred around 13:30 local sidereal time (i.e., time relative to the “fixed stars” rather than the sun).

Both the study conducted by Sheldrake and Smart and that conducted by Lobach and Bierman are susceptible to explanation in terms of a time-delay code. For instance, some callers may call early in the five-minute trial interval and others may call late. The subject may learn to use such differences in calling times to identify the caller. The possibility may also exist that different phones produce different rings on the receiving phone.

It should also be noted that Schmidt, Muller and Walach (2004) attempted to replicate Sheldrake and Smart’s phone-calling experiment, but they obtained nonsignificant results.

In the same vein, Sheldrake and Smart (2000) report an experiment in which Pam Smart’s dog Jaytee seemed to know when its owner was coming home and would go to the window or the porch to await her arrival. However, Wiseman, Smith and Milton (2000) failed to confirm Sheldrake and Smart’s results when strict quantitative criteria were used to define the event “Jaytee goes to the porch.” They attribute Sheldrake and Smart’s results to Jaytee’s becoming more anxious regarding Pam Smart’s absence as time went on, resulting in more frequent trips to the porch and window.

Trial-by-Trial Feedback. The mathematician Persi Diaconis (1978) pointed out the danger of giving trial-by-trial feedback to a subject guessing a target pool that is being sampled without replacement, such as might occur if the subject is guessing a deck of ESP cards and being shown each card after every guess. Such feedback would enable the subject to improve his chances by avoiding guesses corresponding to already sampled targets (e.g., if the subject guessing an ESP deck has already seen all five circle cards, he can improve his chances by not guessing “circle” again). This is of course correct, but Diaconis’ implication that this was a typical testing procedure in parapsychology at that time is misleading. In fact a search conducted by Charles Tart two years prior to the publication of Diaconis’ article revealed only four studies using such a procedure, three of them appearing in an unpublished master’s thesis. Tart had labeled all four studies as methodologically defective in his review of the literature pertaining to studies employing trial-by-trial feedback (Tart, 1976).
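
The size of the advantage conferred by such feedback can be worked out exactly for a standard ESP deck (25 cards, five of each of five symbols). The sketch below assumes that the subject sees every card after guessing it and always names whichever symbol is most plentiful among the cards still unseen; the function name is hypothetical, and the strategy is an illustrative assumption rather than a description of what any actual subject did.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expected_hits(remaining):
    """Expected number of hits on the rest of a closed deck with full feedback.

    `remaining` is a sorted tuple giving how many cards of each symbol are
    still unseen. The guesser always names a most plentiful remaining symbol.
    """
    total = sum(remaining)
    if total == 0:
        return 0.0
    # Chance of a hit on the next card when guessing the commonest symbol.
    value = max(remaining) / total
    # Average over which symbol actually turns up next.
    for i, count in enumerate(remaining):
        if count == 0:
            continue
        rest = list(remaining)
        rest[i] -= 1
        value += (count / total) * expected_hits(tuple(sorted(rest)))
    return value

deck = (5, 5, 5, 5, 5)
print(f"expected hits with feedback: {expected_hits(deck):.2f}")
print("expected hits without feedback: 5.00")
```

Without feedback the chance expectation is 5 hits in 25 trials, so any excess reported by the computation is attributable purely to card counting and not to ESP.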

While it is rare for forced-choice experiments to employ trial-by-trial feedback with such a “closed deck” procedure, this does occur more often in free-response experiments, in which subjects give their subjective impression of a target rather than guessing it directly. In one type of procedure, these subjective impressions are ranked or matched against all the targets used in the experiment. For instance, Puthoff, Targ and Tart (1980) conducted an experiment in which a subject attempted to use her ESP to describe ten different target objects. Because the subject was shown the target object after each trial, she could, on subsequent trials, avoid giving descriptions corresponding to previously seen targets, thus artifactually inflating the probability that her descriptions would be correctly matched to the targets by the judges. Diaconis’ criticism is clearly applicable to this sort of free-response experiment. In a similar vein, Marks and Kammann (1978, 1980) have argued that in Targ and Puthoff’s main remote viewing research (e.g., as reported in Targ & Puthoff, 1977) the subjects’ remarks contained clues as to trial order (by referring to “two previous targets,” for example) and target identity (by explicitly referring to previously seen targets, which the judges would then know not to match with the present transcript). Tart, Puthoff and Targ (1980) attempted to respond to Marks and Kammann’s critique by conducting an analysis showing that the results were still statistically significant even after these cues had been edited out of the transcripts. One can of course quibble about the efficacy of the editing process, and Marks and Scott (1986) have argued that statements left in the transcripts after the editing that referred to the subject’s location could constitute residual order cues that might be used by judges. 
For instance, in one trial Targ asked the subject if he noticed “any difference being in a shielded room rather than in the park.” This was the first trial done in a shielded room. In any event, the basic problem (namely, the avoidance by the subject of responses descriptive of previously seen targets) that arises from use of trial-by-trial feedback under conditions of sampling without replacement remains, no matter how effective one assumes the editing process was.

Cuing of Judges. A related problem in free-response experiments involves the nonverbal sensory cuing of judges. For instance, in remote viewing experiments conducted by Bisaha and Dunne (1979), judges were provided with pictures of the target location taken on the day of the remote-viewing trial. Thus, cues as to weather conditions, seasonal variations (e.g., foliage conditions), time of day, and so forth, could have been present in both the subject’s transcripts and the pictures, and the judges could then use these cues, consciously or unconsciously, to match the transcripts to the targets. Bisaha and Dunne deny that such cues exist, but in the only two picture sets they reproduce from their first experiment, the leaves are still on the trees in one, whereas the trees are bare in the other. Also, Marks (1986) has pointed out that, in Bisaha and Dunne’s experiments, the decision as to whether the subject’s drawings of his or her impressions of the target site would be presented to the judges was made on an ex post facto basis (that is, after examination of the drawings), which may have biased the information presented to the judges in favor of correct transcript-target matchings. Marks goes on to note that the choice of which photographs of the target site to present to the judges may have been a further biasing factor.

As another example of how sensory cues may be inadvertently provided to judges in free-response experiments, one can cite the transoceanic remote-viewing experiment reported by Schlitz and Gruber (1980). In the first judging of the experiment, the agent’s impressions of the target site were included among the material provided to the judges. This may have resulted in cues as to target order being given to the judges (e.g., the agent and the percipient might both refer to a news event occurring on a given day). Schlitz and Gruber themselves pointed out this problem and reported a rejudging of the experiment with the cues deleted; the results were still significant (Schlitz & Gruber, 1981). The skeptic Ray Hyman (1986) later detected a further possible source of sensory cues to judges arising from the fact that Gruber, who served as agent and hence knew the identity of the target for each trial, was responsible for translating the remarks of Schlitz, the percipient, into Italian for presentation to the judges. His translation might have been biased by his knowledge of the target site.

A very similar problem exists in two dream telepathy experiments reported by Child, Kanthamani and Sweeney (1977). In each experiment, an agent attempted to send a different target picture into the dreams of a percipient on each of eight different nights. The eight sets of dreams and impressions were then ranked by the agent against the eight targets. However, as the agent knew which target was used on which night and as the percipient’s dreams might be expected to incorporate certain “day residues” reflecting the events of the prior day, the agent could have used this knowledge to match the dreams against the targets.

Most of the feedback-related problems discussed above arise from the fact that the experiments in question used a procedure involving sampling without replacement from a finite target pool, often combined with a procedure involving judging the entire set of transcripts for a series against the actual targets used in that series. In most free-response experiments, such as the well-known research line of ganzfeld experiments, the responses are judged against a different target pool for each trial and do not suffer from the problem of sensory cuing of judges to the extent that the above-discussed studies do. That is not to say that these experiments have been immune from such problems. For instance, in the early ganzfeld experiments, the subject often judged his own transcripts against the same physical target pool used by the agent. Thus, it is possible that the subject could obtain cues as to which target was actually sent by the agent by examining the target pictures for fingerprints, crumpling effects and so on. In an initial attempt to test this “greasy fingers” hypothesis, John Palmer (1983, 1984) found no evidence that subjects in fact used such cues to identify the picture held by the agent, although the results of this study were to some extent contradicted by a later study by Palmer and Kramer (1986) indicating that subjects could indeed use such cues to identify the target picture when they were specifically instructed to do so. This problem has been eliminated in the recent ganzfeld experiments through either the use of duplicate sets of pictures, one to be used by the agent and the other by the judge, or the use of electronically stored and presented targets.

It is obvious that the problem of sensory cuing can be a subtle one. While the problem of eliminating sensory cues to subjects was resolved long ago, procedures for the elimination of sensory cues to judges are still evolving. How prevalent is the problem of sensory cuing in parapsychological research? In order to find out, Akers (1984) conducted an analysis of a sample of 54 parapsychological experiments. Among his criteria for inclusion were that each experiment should have produced significant evidence of psi and that the experiment be from a relatively repeatable line of research. Consequently, Akers’ sample included a large number of ganzfeld experiments, as this is one of the lines of parapsychological research that have come the closest to producing a repeatable parapsychological experiment. Of the 54 experiments, Akers cites 22 studies as possibly providing sensory cues regarding the target’s identity to subjects or judges. One would suspect, however, that, had Akers’ sample included a greater proportion of (less repeatable) forced-choice studies, the proportion of studies with methodological flaws involving sensory cues would be reduced.

Now we turn from the problem of sensory cues in ESP experiments to the related problems of motor artifacts in PK experiments.

Motor artifacts. In certain types of psychokinesis experiments, it is very important to ensure that the subject cannot use his or her motor skills to influence the target apparatus. For instance, in a “placement PK” experiment in which a subject is attempting to use psychokinesis to influence a series of balls rolling down a chute to go into the left or right side of a collection bin, it is important to ensure that the subject cannot influence the balls by breathing on them, rocking the table, altering the air currents by changing his or her body position, and so forth. Also, it is important to ensure that the balls be placed in the apparatus in the same way at the beginning of each trial, otherwise the experimenter may subconsciously learn how to place the balls in such a way that the desired result occurs.

As an example of such artifacts, consider an experiment reported by Egely (1986) in which a subject attempted to influence the motion of objects floating in liquid in a Petri dish. The subject was allowed to put his hand into the shielded box and next to the Petri dish. Thus, it is quite possible that the obtained motions of the target object might have been due to air currents set in motion through the moving of the subject’s hand. As another example, Taylor (1980) has contended that the movement of small objects by the Russian psychic Alla Vinogradova was due to an electrostatic effect. Specifically, he contends that a repulsive force may have been induced by electrical charges on her hands as well as on the object to be moved. Indeed, motion pictures taken of this type of “psychokinetic” motion (which was much in “vogue” in the 1970s) do suggest this possibility, as the objects are typically a short distance from the psychic’s fingertip and moving away from it, much like a peanut one centimeter in front of the nose of a contestant in a peanut race. Taylor also cites the fact that Vinogradova typically rubbed her hands together prior to her performances as further evidence that an electrostatic effect was responsible for the motion of the objects.

Experiments that involve an attempted psychokinetic influence of living targets should also include precautions against normal sensory-motor influence. For instance, in an experiment conducted by Barry (1968a, 1968b), subjects sat for fifteen minutes at a distance of one and a half meters from a set of ten Petri dishes, attempting to inhibit the growth of the fungus in five experimental dishes while “ignoring” the control dishes. Under these conditions, it might be possible for a subject to influence the growth by, for instance, breathing on the dishes. In the early 1950s, Richmond (1952) reported an experiment in which he was successful in influencing paramecia to swim to a specified target quadrant as he viewed them through a microscope. As Richmond was obviously close to the microscope, he could have influenced the paramecia through his breath or by jiggling the slide. Another problem in this experiment is that, while the target quadrant was randomly selected, Richmond did not use a random process to select the paramecium to be influenced; thus he may have selected paramecia already predisposed to move to the desired quadrant, as has been pointed out by Johnson (1982).

Pleass and Dey (1987) report an experiment in which subjects attempted to use their PK abilities to influence the motion of specimens of the marine alga Dunaliella. A problem with this experiment is that the subjects were allowed to select which time periods would be the PK influence periods rather than having these periods specified in advance. Thus, it is possible that the subjects may have been able to choose favorable periods based on sensory cues derived from observations of the algae or of environmental factors that were associated with the motion of the algae.

Fortunately, most PK experiments reported in the modern literature are relatively free from such artifacts. Indeed, the bulk of the literature uses quantum-mechanically based Schmidt REGs as target objects, and radioactive decay is hardly subject to sensory-motor influence!

Violations of blindness. It is important in parapsychology, as in other disciplines, that measurement of certain variables be done by a person who is blind as to the values of other variables. For instance, if an experimenter who is rating a person’s extraversion based on clinical observations during an interview already knows that person’s score on an ESP test, the experimenter may consciously or unconsciously tend to give higher extraversion ratings to persons with high ESP scores, thus artifactually producing another confirmation of the generally-obtained positive relation between ESP and extraversion.

It is also important that anyone interacting with a subject prior to his description of an ESP target be blind as to the identity of that target. For instance, Palmer and Lieberman (1976) report an “out-of-body experience” study in which an experimenter who knew the identity of the target was in the room with the subject. This experimenter might have been able to exert some sort of subtle influence to predispose the subject to give a description corresponding to the target, without even being aware of doing so.

Persons physically interacting with PK target materials should also be kept blind as to the target. For instance, in an experiment reported by Nash (1982), the experimenter placing fungus samples in an incubator was not blind as to which funguses were to be psychokinetically inhibited and accelerated and which were controls. Thus the experimenter might have been able to influence the outcomes in the desired direction by selective placement of the funguses in the incubator. Obviously, in such an experiment it is important that the person measuring the fungus growth also be blind as to the experimental condition.

An artifactual correlation between ESP scores and a personality trait may result if the subject has knowledge of his ESP score prior to taking the personality test, as the subject’s responses to the latter test may be biased by his knowledge of his ESP scores. That this possibility is a legitimate concern is evidenced by results reported by Palmer and Lieberman (1975). They found a positive relationship between imagery score and ESP score for subjects who took the Betts Imagery Scale after getting feedback about their ESP scores, but not for subjects who took the imagery test prior to the ESP test. On the basis of these results, they conclude that a previously reported positive relationship between imagery and ESP reported by Palmer and Vassar (1974) was probably due to the same artifact.

A “meta-analysis” of experiments exploring the relationship between ESP ability and extraversion by Honorton, Ferrari and Bem (1990) strongly suggests that much of the evidence for a positive relationship between these two variables might be due to a similar artifact. In particular, these authors found no evidence for a positive relationship between ESP and extraversion in forced-choice studies in which the extraversion measurement preceded the ESP test. The evidence for a positive relationship in forced-choice experiments thus appears to be an artifact of the subjects’ knowledge of their ESP scores when responding to the extraversion test. In 11 of 14 free-response studies, the extraversion test preceded the ESP test, and a positive relationship between the two variables was still found in these studies, which is apparently not due to the artifact in question. In an attempt to test whether this artifact is indeed a problem, Krishna and Rao (1991) gave subjects false feedback as to their ESP scores in order to see if such feedback would indeed bias their responses on a personality test. They did not find a significant difference in extraversion scores between subjects given positive and negative false feedback, in opposition to the bias hypothesis. It should, however, be noted that there was no significant relation between extraversion and (real) ESP scores in this study.

As a general principle, recording or measurement of experimental conditions and targets and of the related psi effects should be carried out under conditions of mutual blindness in both PK and ESP experiments.

Due to the defensive stance parapsychologists must take in light of criticism by a community of hostile skeptics, parapsychologists tend if anything to be more careful about blindness violations than most scientists. In his analysis of 54 parapsychological experiments, Akers (1984) cited nine for having flaws involving the nonblind measurement of personality variables. As Akers’ sample included a large number of studies relating personality variables to ESP, this is probably an overestimate of the rate at which blindness violations are committed in parapsychology. In a review of the more recent scientific literature, Sheldrake (1998a) found that blinding procedures were used more often by parapsychologists than by researchers in other fields.

Nonrandom target selection. In order to eliminate the hypothesis of chance coincidence, it is important that the targets in parapsychological experiments be selected randomly. It will not do, for instance, to have a person select a target by thinking of a number between 1 and 10, as certain numbers are more likely to pop into his mind (and the guesser’s) than are others, raising the probability of a correct guess above the chance level of 0.1. For instance, in an early experiment by Tyrrell (1936), the subject had to guess which of five target lamps would be lit on each trial. In the initial stages of the research, Tyrrell himself selected the targets, attempting to be “random” but not employing any formal randomization procedure. Thus, the subject could quickly learn Tyrrell’s favorite targets and guess these more frequently than the others. She could also increase her score by not calling lamps that had just been lit or by calling the same lamp again if it had not been lit on the previous trial (as people attempting to produce a random sequence of targets tend to avoid repetitions, thus producing sequential dependencies in the target series). Therefore, the subject could expect to do much better than the twenty percent hit rate expected by chance. This sequential dependency problem was pointed out to Tyrrell by G. W. Fisk, and the experiment was continued using random numbers to select the targets. Under these conditions, the subject was still able to achieve a highly significant score.
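
The effect of repetition avoidance can be made concrete with a simulation of a five-lamp task. This is a simplified sketch, not a reconstruction of Tyrrell’s apparatus: it assumes that the subject learns each target after the trial and simply never calls the lamp that was lit last, and the function name and the `repeat_prob` parameter (the chance that the target selector repeats the previous lamp) are hypothetical.

```python
import random

def hit_rate(repeat_prob, n_trials=200_000, n_lamps=5, seed=1):
    """Hit rate of a subject who never calls the lamp that was lit last.

    A truly random selector repeats the previous lamp with probability
    1 / n_lamps; a human trying to "be random" repeats it less often.
    """
    rng = random.Random(seed)
    previous = rng.randrange(n_lamps)
    hits = 0
    for _ in range(n_trials):
        others = [lamp for lamp in range(n_lamps) if lamp != previous]
        # The selector either repeats the last lamp or picks a different one.
        if rng.random() < repeat_prob:
            target = previous
        else:
            target = rng.choice(others)
        # The subject guesses uniformly among the lamps not lit last time.
        guess = rng.choice(others)
        hits += (guess == target)
        previous = target
    return hits / n_trials

print(f"random selector (repeat_prob=0.2): {hit_rate(0.2):.3f}")
print(f"selector who never repeats (repeat_prob=0.0): {hit_rate(0.0):.3f}")
```

In this model the subject’s hit probability is (1 - repeat_prob) / 4, so against a properly random series the strategy gains nothing, while against a selector who never repeats it yields about 25% hits without any ESP.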

Blackmore (1984c) criticized a study by Spinelli (1978) on the basis that the children who served as agents in Spinelli’s experiment were simply allowed to choose which picture they wished to “send” to the percipient via ESP. This is Tyrrell’s initial flaw in a more modern guise.

Randomization of target selection has also been a problem in some free-response experiments involving the drawing of pictures, including the classic initial experiments reported by Upton Sinclair (1930/1962) and Rene Warcollier (1948/1964). In more recent times, this problem has been exemplified in the picture-drawing experiments with the psychic Uri Geller at Stanford Research Institute conducted by Targ and Puthoff (1977). In these experiments, the target was generated by opening a dictionary “at random” and drawing the first “drawable” word on the page. (Which words are to be considered “drawable” is an arbitrary decision that may itself disqualify the target selection process as a truly random procedure.) The investigators allowed considerable latitude in the interpretations of the word in the target drawing. For instance, the one target selection that Targ and Puthoff describe in detail involved the word “farmer.” In the target drawing, the farmer is equipped not only with a pitchfork, but also with horns and an elaborate tail; in addition, the label “Devil” appears above the figure. If Geller and the target preparer had just been viewing The Exorcist, for instance, that common experience could account for both the target drawing and Geller’s religiously oriented response. This amount of latitude in interpretation destroys any claim to random selection of the target.

Nonrandom target selection has also plagued some recent remote viewing research. For instance, in the “volitional mode” technique employed in the “remote perception” studies by Jahn and Dunne (1987) and conducted at the Princeton Engineering Anomalies Research (PEAR) center, the target to be visited was merely selected by the agent rather than chosen randomly from a prepared target pool (no such pool even existed on these trials). Also, in Jahn and Dunne’s experiments, the agent was frequently allowed to wander from the assigned target area, to take photographs and to write descriptions of the target area. As these materials were provided to the judges, they essentially constitute the target. The agent was therefore free to construct a target that might match the verbal transcript likely to be provided by the percipient on that particular day. In 211 of Jahn and Dunne’s 336 formal “remote perception” trials the agent was free to choose or construct the target in a nonrandom manner.

In the most recent work on “remote perception” conducted by the PEAR team, as described in Dunne and Jahn (2003), the procedure is to have the agent and percipient check off a list of “descriptors” regarding the target location (e.g., whether the scene is “confined or expansive,” whether it is “noisy or quiet,” whether it involves the presence of water, is indoors or outdoors, etc.). The degree of match between the descriptors checked (or rated) by the percipient and agent is then compared to the statistical distribution of matches between the percipient’s descriptor list and those provided for other locations on other trials. However, the same problems exist as for the pictures taken by the agent in Dunne’s early research. The location is not the target; rather, the target is the agent’s description of the location. Common thought processes and common experiences could thus lead the agent and the percipient to provide similar descriptions. For instance, if they are in a glum mood, they may both rate the location as “confined and quiet” rather than as “expansive and noisy.”

The procedure of comparing descriptor lists for the target to the descriptor lists of the targets used on other trials also runs into the problem that the percipient may consciously or subconsciously avoid giving descriptions that correspond to previously seen target locations.

There is a very clean procedure that would avoid all of these problems. Create a pool of targets for each trial prior to the trial. Have the target descriptions prepared before the trials. Then choose a target location randomly from the pool, and compare the correspondence of the percipient’s description with the chosen target location against that with the alternative locations that were not selected. This procedure was pointed out to the PEAR team by Hansen, Utts and Markwick (1992) in their extensive critique of the PEAR remote perception research, and more recently as applied to the descriptor lists by Stokes (2004).
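
The logic of this procedure can be sketched in a few lines. The example is a hedged illustration only: the pool contents, the similarity scores, and the function names are all hypothetical, and the scores stand in for whatever rating a blind judge would assign when comparing the percipient’s description with each prepared target description.

```python
import secrets

def choose_target(pool):
    """Pick one location uniformly at random from a pool fixed before the trial."""
    return pool[secrets.randbelow(len(pool))]

def target_rank(scores, target):
    """Rank of the chosen target (1 = best match) among all pool members.

    `scores` maps every pool location to the judge's similarity rating.
    Ties are counted against the target, which is the conservative choice.
    """
    return sum(1 for s in scores.values() if s >= scores[target])

# Hypothetical pool, prepared (with descriptions) before the trial begins.
pool = ["park", "harbor", "library", "station"]
target = choose_target(pool)

# Hypothetical ratings from a judge who does not know which location was chosen.
scores = {"park": 0.2, "harbor": 0.7, "library": 0.4, "station": 0.1}
print(f"target: {target}, rank: {target_rank(scores, target)} of {len(pool)}")
```

Because the target is drawn at random from a fixed pool, each rank is equally likely under the chance hypothesis, so with four locations a first-place rank has probability 1/4 regardless of any shared moods or experiences between agent and percipient.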

In his analysis of a sample of 54 ESP experiments, Akers (1984) concluded that target randomization was informally done (e.g., by hand-shuffling of cards) or inadequately described in about half of the studies. Some of these randomization flaws may be debatable (e.g., the fact that an “untrained agent” prepared targets from a random number table). He also found that randomness tests were not conducted on the apparatus used in 10 out of a sample of 27 psychokinesis experiments. He further notes that control runs are done infrequently in PK experiments and calls for an increased use of control runs as well as for more frequent randomness checks on the equipment to be performed during the experiment and within the actual experimental environment.

Statistical Controversies

The use of improper statistical tests. In parapsychology as in any other science, researchers do occasionally apply inappropriate statistical tests to their data or commit other statistical errors. As universal perfection is not likely to be achieved in any field of study, the rate at which such errors occur is not likely to be reduced to zero. In recent years, however, parapsychologists have been fairly meticulous about ensuring that the statistical tests they perform are appropriate to their data. They have been held to higher standards than the practitioners of other disciplines due to unrelenting and vigorous attacks by critics. Thus, the statistical practices of parapsychologists tend, if anything, to be a little more rigorous than those of many other scientific disciplines. Some researchers will always be more competent than others, and some errors are bound to occur from time to time.

Attacks on probability theory itself. There have been some critics who have gone so far as to suggest that basic tenets of probability theory should be given up in order to avoid swallowing the bitter pill of the existence of psi. Among them are George Spencer Brown (1953, 1955, 1957), J. Barnard Gilmore (1989), and James Alcock (1981). This would be more than throwing out the baby with the bathwater. The whole family tree would be getting the toss, because virtually all branches of science rely on statistics and probability theory to reach and justify their conclusions.

In this context, it should be noted that there are other writers, such as Arthur Koestler and Alister Hardy, who have also argued for the existence of basic flaws in probability theory (e.g., Hardy, Harvie & Koestler, 1975; Koestler, 1978). They differ from the critics mentioned above primarily in terms of motivation. They see such flaws as supporting the theory of synchronicity or meaningful coincidences promulgated by Carl Jung. (Jung’s theory of synchronicity will be taken up again in the next chapter.)

Specifically, Koestler argued that the “law of large numbers” constitutes a paradox in probability theory. This law asserts, for instance, that if an unbiased coin is tossed a large number of times, the obtained proportion of heads will be very nearly equal to the value of 1/2 that would be expected by chance. Koestler contended that this requires an “acausal connection” or conspiracy among parts of the series such that an initial run of, say, heads will be balanced by a later run of tails. However, the law of large numbers is mathematically derived from the precise assumption that no such conspiracy exists, that is to say, that the outcome of one trial has no effect on the outcome of any other. Thus there is no need for, and indeed no room for, Koestler’s proposed mysterious acausal connections. After all, the fact that the laws of probability theory are obeyed in the law of large numbers hardly constitutes evidence against those laws!
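A small simulation may make the point concrete. The sketch below is an illustration only (the function name, seed and sample sizes are arbitrary choices, not taken from any study): each simulated toss is generated independently of all the others, so no "balancing" mechanism is present, and one can inspect both the proportion of heads and the absolute surplus of heads or tails as the series lengthens.

```python
import random

def toss_summary(n_tosses, seed=1):
    """Toss a simulated fair coin n_tosses times, each toss independent."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    proportion = heads / n_tosses
    excess = abs(heads - n_tosses / 2)  # absolute surplus of heads or tails
    return proportion, excess

for n in (100, 10_000, 200_000):
    proportion, excess = toss_summary(n)
    print(n, round(proportion, 4), excess)
```

In runs of this kind the proportion typically settles close to 1/2 even though the absolute surplus is not driven back toward zero. An early run of heads is diluted by the growing number of tosses; it is not cancelled by a compensating run of tails.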

There have been several critics who have objected to parapsychologists’ comparison of their data to theoretical distributions derived from the theory of probability rather than to empirical control groups (e.g., Calkins, 1980; Alcock, 1981; Girden, 1978; Moss & Butler, 1978; Gilmore, 1989; Hyman, 1996). In this context, the skeptic James Alcock has contended that the comparison of experimental with control groups “makes artifact only a minor problem” in “normal” science (Alcock, 1984, p. 317). However, in a crude (and methodologically unacceptable) PK experiment in which coins are tossed by hand, motor skills could be used to increase the number of heads in the “heads” condition and the number of tails in the “tails” condition. Thus, the use of a control group would hardly eliminate all artifacts in this case, as it would not in the case of a psychology experiment in which violations of blindness could lead a researcher to treat subjects in his experimental and control groups differently, perhaps in the process consciously or unconsciously using subtle means to make them behave in such a manner as to confirm his hypothesis. Use of a control group does not automatically eliminate all sources of experimental error, as Alcock seems to assume.

Some critics have even chastised parapsychologists for their use of control groups. The well-known critic C. E. M. Hansel (1980) has criticized Helmut Schmidt for using a high-aim condition (in which subjects try to guess which of four lamps will be lit) and a low-aim condition (in which subjects try to guess a lamp that will not light) in the second experiment in his investigation into the precognition of a quantum process (Schmidt, 1969), which was discussed earlier in this chapter. In Schmidt’s experiment, the subject indicated his or her response by pushing a button on the machine, and the guess and target for each trial as well as the type of condition (high- or low-aim) were recorded on tape. Schmidt’s low-aim condition thus served as an excellent control for his high-aim condition (and would guard against some types of possible machine artifacts). Hansel makes the point that the overall results were not significantly different from chance. However, when the results are scored in the intended direction (that is, high- or low-aim), they are highly significant. Hansel also recommends that different machines be used for the high- and low-aim conditions. However, as the machines could each be mechanically biased in the desired directions, this would negate the advantage of using the same machine as a control for itself.

Thus, parapsychologists are caught in a double bind, as critics have castigated them both for their employment of a control condition and for their failure to employ one. They use theoretical distributions to attack parapsychological work (as Hansel did in attacking Schmidt) but disapprove of parapsychologists’ use of the same distribution (Alcock, Hyman). Hansel also uses a theoretical distribution to attack results showing differences in psi scoring rates between groups of subjects (such as extraverts and introverts, for example) when he notes that, although the groups differ from one another (one scoring above chance and one below), the overall score does not differ significantly from chance.

It should also be noted that the traditional statistical tests of differences between experimental and control data themselves rely on theoretical distributions (such as the t distribution). In any event, Akers (1984) points out that, in his analysis of a sample of 54 ESP studies, the statistical tests used were directed at a comparison between empirically obtained means in about two-thirds of the cases, so the use of purely theoretical distributions is by no means as rampant as these critics have charged.

Multiple analyses. It is possible for a parapsychologist to conduct so many statistical tests on his data that some of them would be expected to be significant purely by chance. If you try hard enough to find patterns in random data, you will eventually be successful. These patterns will be meaningless, however, as they represent nothing but the fluctuations that would be expected to occur by chance. There are mathematical corrections that may be used to take the number of analyses performed on the data into account, and these are increasingly being used by parapsychologists.
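The arithmetic behind this problem is easy to sketch. The example below assumes, for simplicity, that the tests are independent and that each is evaluated at the 0.05 level; it uses the Bonferroni adjustment as one familiar example of such a correction, which is not necessarily the one used in any particular study. The function names are illustrative.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one spurious 'significant' result among
    n_tests independent tests when no real effect exists."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the overall rate near alpha."""
    return alpha / n_tests

for k in (1, 5, 20):
    print(k, round(familywise_error(0.05, k), 3), bonferroni_threshold(0.05, k))
```

With twenty independent tests at the 0.05 level, the chance of at least one nominally significant result arising by chance alone is roughly 64 percent, which is why an unadjusted "hit" among many analyses carries little evidential weight.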

A related problem is that of post hoc analysis. Suppose a researcher glances at her data and happens to notice that female subjects tended to get higher ESP scores in her experiment than did male subjects. If she runs a test to see if this difference is statistically significant, we once again run into the problem of multiple analyses, as we have no way of knowing how many potential patterns of this type might have “caught her eye” had they been present in the data. In this case, it is impossible to correct for the number of analyses performed, as it is unclear how many different patterns could have been noticed by the experimenter. The proper thing to do in this case would be for the researcher to label her finding as “post hoc” and as only providing a suggestion that an effect might be present. The demonstration of the effect’s reality would have to involve further experimental testing to see if the effect occurred again. It is crucial in parapsychology as in all areas of scientific investigation that the hypotheses that are to be tested in an experiment be stated before collecting and examining the data.

In his review of 54 ESP experiments, Akers (1984) cites only two studies for flaws involving multiple analyses. He disputes skeptic Ray Hyman’s contention that 39 of 42 ganzfeld studies suffer from flaws involving multiple analyses (Hyman, 1983), on the grounds that the analysis to be regarded as primary in each of these studies could be inferred from the authors’ previous practices.

Data selection. If only a portion of the data of a parapsychological experiment is singled out for analysis on an ex post facto basis (e.g., because the hitting rate was particularly high for this portion of the experiment), a spurious psi effect may be generated, as improbable subsequences will exist in any sufficiently long series of random events. Stenger (1990), for instance, claims that data selection took place in the picture-drawing experiments conducted with Uri Geller at Stanford Research Institute, insofar as many unsuccessful trials were never reported or included in the overall analysis. Data selection within a single study has become a less frequent problem in parapsychology over the years (except for the data selection inherent in many post hoc analyses, as discussed above). Akers (1984), for instance, classified only 4 of the 54 ESP studies in his sample as having results that could be attributed to data selection.

One special type of data selection involves what is known as “optional stopping.” This occurs when a researcher is monitoring the data of the experiment and waits until an opportune time to stop the study (“quitting while he is ahead,” so to speak). To avoid this problem, it is important that the length of any study be specified in advance or at least prior to any examination of the data.
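The effect of optional stopping can be illustrated with a simulation of pure chance guessing. Everything in the sketch below is an assumption made for illustration: a 50 percent hit probability, a planned length of 500 trials, a look at the running score every 10 trials, and a one-tailed criterion of z ≥ 1.645. It compares how often a chance experiment reaches significance at the preplanned end point with how often it does so at some interim look at which the experimenter could have stopped.

```python
import math
import random

def simulate(n_experiments=2000, max_trials=500, peek_every=10,
             z_crit=1.645, seed=2):
    """Return (fixed-length rate, optional-stopping rate) of false positives.

    max_trials is assumed to be a multiple of peek_every, so that the last
    look coincides with the preplanned end of the experiment.
    """
    rng = random.Random(seed)
    fixed_hits = 0     # significant at the preplanned end point
    optional_hits = 0  # significant at any look (stop while ahead)
    for _ in range(n_experiments):
        hits = 0
        ever_significant = False
        z = 0.0
        for n in range(1, max_trials + 1):
            hits += rng.random() < 0.5  # pure chance guessing
            if n % peek_every == 0:
                z = (hits - n / 2) / math.sqrt(n / 4)
                if z >= z_crit:
                    ever_significant = True
        fixed_hits += z >= z_crit       # z now holds the final look
        optional_hits += ever_significant
    return fixed_hits / n_experiments, optional_hits / n_experiments

fixed_rate, optional_rate = simulate()
print(fixed_rate, optional_rate)
```

Under these assumptions the fixed-length rate stays near the nominal 5 percent, while the rate for an experimenter free to quit at any look is considerably higher, even though no effect of any kind is present in the simulated data.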

Another form of data selection occurs when only significant results are published or selected for analysis, with insignificant studies being ignored. This form of data selection has long been a standard allegation of the critics. Alcock (1981) for instance charges that parapsychologists only tend to publish significant studies, intimating that the overall evidence for psi might be due to this type of data selection. In rebutting Alcock, Stanford (1982) notes that the Parapsychological Association has long had a policy of encouraging the publication of nonsignificant studies, and indeed it does appear that negative results have been published with greater frequency in recent years. To gauge the extent of selective reporting in one specific area of psi research, Blackmore (1980) sent a questionnaire to members of the Parapsychological Association asking about unpublished ganzfeld studies. She uncovered 20 such studies, 37 percent of which were significant, as opposed to 57 percent of published studies. (One of the twenty studies could not be evaluated as to significance.) A statistical test indicated this slight difference in success rate between published and unpublished studies was not statistically significant. Thus, selective reporting does not appear to be a major problem in this area of research. Also, the high significance levels obtained in some parapsychological studies argue against a data selection explanation for those particular results. If an experimenter obtains an ESP scoring rate in his study that would only be expected to occur in one in a billion such studies by chance, we may conclude that the result is not due to data selection, as it would be absurd to assume that a billion such studies have been conducted and have gone unreported.

Also, using the technique of meta-analysis (to be discussed below) it is possible to estimate the number of nonsignificant studies that must have gone unreported in order that a particular line of experimentation might be attributed to data selection. Often the number of unpublished studies that must be postulated is unreasonably high, as we shall see.

The Problem of Fraud. One reason for critics’ reluctance to accept the experimental findings of parapsychologists is the possibility that they may be the result of fraud, either by subjects or by the investigators themselves. We will discuss each in turn, beginning with the former.

Fraud by Subjects. In certain types of parapsychological experiments, fraud by subjects is a possibility that must be carefully guarded against. This is especially true of experiments employing “special subjects,” a term used to designate persons with a reputation for having extraordinary paranormal abilities and whose livelihood often depends on the exhibition of such abilities. Subject fraud is much less of a concern in situations in which a group of supposedly average citizens participates in an experiment initiated and designed by the parapsychological investigator, although even in this type of experiment it is prudent to take precautions against the possibility of deceit by subjects.

Several instances of subject fraud occurred in the very early investigations of “mind-reading” teams. Hansel (1966, 1980) describes how such fraud occurred in the Society for Psychical Research’s (S.P.R.’s) investigations of the mind-reading team of Douglas Blackburn and G. A. Smith, which were conducted over the time period from 1882 to 1894. Blackburn served as the telepathic agent in a series of apparently successful picture-drawing experiments in which Smith was the receiver. Blackburn confessed in 1908 that these results had been due to fraud. He as agent had transferred the target drawings onto cigarette paper and then was able to get this paper to Smith when Smith reached for his pencil. Smith himself never admitted to his involvement in the fraud.

The subjects in another one of the S.P.R.’s investigations, the Creery sisters, likewise admitted six years after the investigation that they had used auditory and visual codes to transmit the identity of playing cards to one another (Gurney, 1888-1889). Similarly, Hansel (1966, 1980) describes how two Welsh schoolboys, Glyn and Ieuan Jones, used signals involving coughs and leg movements to transmit the identity of cards from one to another during experiments run by the British parapsychologist S. G. Soal and his coworkers. The alleged telepathic effects ceased when the door joining the boys’ rooms was closed, illustrating the importance of eliminating the possible use of sensory codes by members of alleged mind-reading teams. This is of course merely a special case of the elimination of sensory cues, which is a necessary feature of any properly designed parapsychological experiment.

Hansel (1966, 1980) has also charged that the results of the Pearce-Pratt experiment with Zener cards, discussed earlier in the chapter, were due to subject fraud. Specifically, Hansel proposes an implausible scenario that involved Pearce’s sneaking out of the library and peering through the transom at the top of the door to Pratt’s room to learn the identity of some of the cards as Pratt recorded the order of the target deck. In presenting his hypothesis, Hansel distorted the architectural plan of the building in which Pratt was located, as has been pointed out by Stevenson (1967). Also, the rather obvious possibility of detection would surely have acted as a strong deterrent to such a scheme. Nevertheless, the fact that Pearce’s whereabouts were not well monitored renders this experiment less than definitive. Irwin (1994) has argued that declines in Pearce’s scoring rates over the course of each session, which were discovered after the experiment was concluded, are further evidence for the authenticity of the data, as such decline effects are commonly observed in ESP scoring. A determined skeptic could of course argue that these decline effects may be due to subjects’ tendency to cheat early in a session and then “rest their case.”

Subject fraud becomes a central concern in experiments on “macro-PK.” Macro-PK experiments involve the production of macrophysical effects, as opposed to subtle influences on random event generators that may only be detectable through statistical analysis. Such macrophysical effects include the bending of metal specimens, the levitation or anomalous movement of macrophysical (that is, nonmicroscopic) objects, the starting or stopping of watches, and the apparently paranormal production of photographs. Macro-PK experiments typically but not always involve “special subjects” (persons with prior reputations regarding their ability to manifest extraordinary physical effects of an apparently paranormal nature). It is of course quite possible that such subjects may use fraudulent means to simulate paranormal effects, and it is thus very important that every precaution be taken to eliminate the possibility of such fraud. Unfortunately, macro-PK experiments have frequently lacked the kind of rigorous conditions that would enable the fraud hypothesis to be definitively ruled out. For instance, in the research on metal-bending conducted by physicist John Hasted (1981), the subjects were allowed physical contact with the target object. One subject, Masuaki Kiyota, was even allowed to carry a spoon he was attempting to bend psychokinetically around with him in his pocket. Such lax conditions allow the subjects the opportunity to bend the metal specimens through covert muscular action. Hasted eschewed the use of a video camera, as he felt that such a device with its implied mistrust would decrease his rapport with the subject.

Martin Gardner, the noted writer of popular books on mathematics and a staunch critic of parapsychology, has criticized macro-PK researchers for their failure to employ traps such as a one-way mirror to detect cheating (Gardner, 1986). That such devices may be effective in detecting fraud is borne out by the research of Pamplin and Collins (1975), who used a one-way mirror to observe several young subjects using fraudulent means to bend metal specimens.

James Randi, who is a professional magician as well as being a prominent critic of psi research, sent two young magicians to a parapsychological laboratory in St. Louis to pose as special macro-PK subjects in an operation Randi (1983a, 1983b, 1986) dubbed “Project Alpha.” Randi found that the researchers had ignored his own advice as to what precautions should be taken against subject fraud. Objects were marked with tags that could be switched. The subjects were allowed to handle sealed envelopes containing ESP targets when they were alone and unobserved. They were able to remove the targets and return them, replacing the staples on the envelope. They were able to remove metal specimens and other target objects from containers supposedly designed to prevent their removal. They were also able to introduce a gap in the sealing of a bell jar, allowing movement of a rotor inside to be produced through air puffs.

Another area in which subject fraud has frequently been charged is that of psychic photography or “thoughtography,” in which a psychic is allegedly able to impress his mental images directly onto film, often through a sealed camera or a camera in which the lens has been removed. Several investigations of thoughtography have been reported by Jule Eisenbud and his coworkers (e.g., Eisenbud, 1967, 1977a, 1977b, 1982b; Eisenbud, Pratt & Stevenson, 1981). Eisenbud’s most prominent subject was Ted Serios, an alcoholic who was consistently intoxicated throughout the experiments and often insisted that the other people present at the experimental sessions drink with him. A party atmosphere often prevailed, with many people milling about. Needless to say, this did not make for the best observational conditions. Serios used a “gizmo,” a cylindrical device that he held up to the camera when the pictures were taken. Many skeptics have contended that this gizmo provided Serios an opportunity to engage in sleight-of-hand, such as by secreting a photographic transparency and lens in the gizmo. A gizmo-like device was also used by another of Eisenbud’s subjects, stuntman Willie Schwanholz. It is also unclear in many of the experiments how closely guarded the camera and filmpacks were during the sometimes lengthy proceedings. Another subject, the aforementioned Masuaki Kiyota, was allowed to take the camera in the room by himself and to unload the film by himself.

Thus, it is clear that investigations of special macro-PK subjects have all too often fallen short of ideal standards of rigor and sometimes lack necessary precautions against subject fraud. It is important that the control of experiments reside with the experimenter; the subject simply must not be allowed to dictate conditions to the extent that all precautions and safeguards are abrogated.

Akers (1984) cited 12 of his 54 experiments for allowing the possibility of subject fraud. Again, subject fraud is not usually so great a concern in experiments involving unselected subjects as it is in experiments with special subjects.

Experimenter fraud. There remains the possibility that the experimenters themselves might engage in fraud. Certainly, some parapsychological researchers have been caught red-handed in such activity. Experimental studies of telepathy by S. G. Soal (Soal & Bateman, 1954), long regarded as among the studies providing the most impressive evidence for ESP, have been demonstrated through statistical analyses by Scott and Haskell (1974) and a computer analysis of Soal’s target series by Markwick (1978) to be due to a crude form of fraudulent alteration of the experimental data by Soal.

The second major scandal involving investigator fraud in parapsychology involved Walter J. Levy, a young medical school graduate, who had recently been appointed as director of J. B. Rhine’s research institute and whom many people regarded as Rhine’s heir-apparent. When I joined Rhine’s research staff in 1974 shortly after completing my own doctorate, one of the primary things that lured me to the lab was Levy’s active and hugely successful program in investigating the psi powers of animals, including the precognitive abilities of jirds (a fancy name for what are essentially gerbils) and the psychokinetic powers of rats and chicken embryos. The hapless little jirds had to use their ESP to avoid getting zapped by an electrical shock by moving to the part of their cage or exercise wheel that would be spared the electricity. The chicken embryos (still of course encased in their eggs) had to use their PK powers to get a random event generator (REG) to turn on a light to warm them up in lieu of a hen. The rats had to use their PK to convince an REG to send them a jolt in the pleasure center of their brains. It was this last experiment that proved to be Levy’s undoing. His fellow researchers noticed him frequently puttering around the equipment when experiments were in progress and there would normally be no reason to be interacting with the experimental apparatus. To see if he were up to some monkey business, they secretly wired up the computer to make a duplicate record of the output of the REG. This second record showed the output of the REG to be perfectly random, while Levy’s official record showed that the rat was getting jolt after jolt to his pleasure center and obtaining truly prodigious PK scores in the process.
It transpired that Levy was disconnecting the wire that recorded the trials on which the rat was unsuccessful and shorting it out on the side of the computer for brief periods of time, thus making it seem as though the rat was achieving remarkable PK success. Confronted with the evidence of his crimes, Levy was forced to resign as the director of Rhine’s lab and returned to the practice of medicine.

Some critics, including Hansel (1980), have alleged fraud in a great many other investigations. Hansel provides lengthy analyses of experiments showing how significant results could have been produced by fraud on the part of one or more members of the investigating team. His postulation of honesty on the part of some of the participants increases the fun of his analyses, but of course any significant result could be the result of collusion on the part of everyone concerned. Occasionally, Hansel goes somewhat overboard, such as when he fishes through the data of the famous Pratt-Woodruff experiment (Pratt & Woodruff, 1939) until he finds an anomalous pattern in the data and then proceeds to use that pattern as evidence for a fraud hypothesis—a hypothesis that was itself undoubtedly constructed on the basis of the pattern in an ex post facto manner (although Hansel does not present it that way). Such flagrantly circular reasoning and unwarranted inferences from post hoc analyses are no more appropriate when they are employed by a skeptic like Hansel than when they are employed by the parapsychologists he criticizes.

It should be borne in mind that parapsychology is by no means unique in having had investigators exposed in fraudulent activity. Few areas of science have escaped the problem of experimenter fraud, as is evident to anyone reading the pages of Nature and Science over the past few decades. For a good discussion of the problem of fraud in more orthodox areas of science, see Broad and Wade (1982) and Kohn (1986). As these authors note, even such great scientists as Galileo, Newton and Mendel apparently succumbed to the temptation to fudge their data from time to time. It must stand to parapsychology’s credit that the major instances of fraud in parapsychology have been unearthed by the parapsychologists themselves. Like most areas of science, parapsychology is self-policing and most parapsychologists wish simply to get at the truth underlying ostensible psi phenomena rather than having any dogmatic pro-paranormal ax to grind. Even archskeptic Martin Gardner has stated that he believes that such cheating by experimenters is not much more of a problem in parapsychology than it is in more orthodox areas of science (Gardner, 1986). However, in view of the fact that most investigators are not able to obtain reliable and replicable experimental evidence for psi, the possibility that the most striking evidence for psi is due to experimenter fraud should not be completely discounted. If such is the case, parapsychology would stand head and shoulders above the typical run-of-the-mill case of experimenter fraud in terms of the large number of investigators and studies involved. It would be fraud on a scale that is unprecedented in the history of science. 
But then again, the phenomena that are claimed to occur by parapsychologists constitute an anomaly with implications that are also unprecedented in terms of the magnitude of the revision in existing scientific theories and indeed in the basic world view of contemporary scientists that would be required in order to accommodate them, as will become apparent in the next chapter. It may not be surprising that most orthodox scientists opt for an explanation of psi in terms of widespread experimenter fraud and incompetence, in that the hypothesis of experimenter malfeasance/incompetence constitutes the lesser of the two horns of the dilemma that faces them in terms of sheer incompatibility with their existing worldview.

One method of minimizing the possibility of experimenter fraud, which was endorsed by J. B. Rhine, is to run experiments in such a way that the integrity of the procedures and data is under the control of several observers. Experiments can be (and have been) designed in such a way that no one member of an investigating team could fraudulently generate a significant result. In such experiments, a determined skeptic would have to postulate a conspiracy among all the members of the investigating team, which is of course much less plausible than the allegation of fraud against a single person.

Psi-mediated experimenter effects. It is certainly true that some investigators seem to be able to obtain significant psi effects on a fairly regular basis, while other investigators (including the author) almost never obtain significant evidence of psi despite years of prodigious effort spent in the laboratory. Experimenter fraud is of course one possible explanation for this state of affairs. Another possibility is a difference in personality traits and social interaction style between so-called “psi-facilitatory” and “psi-inhibitory” experimenters. J. B. Rhine, for instance, contended that it takes a great deal of skill, personal warmth and enthusiasm on the part of an experimenter to elicit psi from subjects in a laboratory situation. A third possibility, suggested by Jim Kennedy (a member of the research team that exposed Levy) and Judy Taddonio among others, is that the “psi-facilitatory” experimenters are themselves the real source of psi in their experiments (Kennedy & Taddonio, 1976; Kennedy, 1994, 1995). In their view, such experimenters might subconsciously use their own psi powers to generate significant results, such as “PK-ing” extraverts to score above chance and introverts to score below chance in an ESP test in order to confirm a pet hypothesis, or by using their precognitive powers to select an appropriate entry into a random number table to ensure that the targets will coincide with the subjects’ guesses. For instance, Helmut Schmidt (1970) once ran an experiment to see if cockroaches could use their psychokinetic powers to avoid getting electrical shocks. Instead, he found that they got more shocks than they would have been expected to by chance. Does this mean that cockroaches are masochistic? Not necessarily. It seems that Schmidt may have rather enjoyed the sight of the cockroaches popping up from their grid floor like so much popcorn whenever a shock was delivered. Thus, he may have been the real source of the PK effect.

Some studies (e.g., Parker, 1977, and Sargent, 1980) have shown that psi-facilitating experimenters outperform psi-inhibiting experimenters in tests of their own psi ability. There is even evidence that the person who tabulates the data of an experiment after it has been completed may influence the outcome of the experiment (the “checker effect”), possibly through the use of retroactive psychokinesis (see Weiner & Zingrone, 1986, and White, 1976, for a review of such evidence). Some writers, including Millar (1978), have suggested that psi ability may be comparatively rare in the population and that successful experimenters may represent some of the few available “psi sources.” Of course, if these psi-facilitating experimenters are using their own psi abilities to force the data to conform with their own theories, it would be wise to be cautious about such experimental “confirmations” of hypotheses.

The repeatability problem. One of the reasons why parapsychology has not been embraced by the scientific establishment is that many or most researchers have been unable to obtain reliable evidence of psi. In the critic’s mind, this raises the suspicion that the evidence for psi may be due to undetected methodological errors or possibly even fraud on the part of the experimenters. Parapsychologists have been quick to point out that many naturally occurring phenomena, such as ball lightning and meteorite landings, are not reproducible on demand but are nonetheless real. (Interestingly, both ball lightning and meteorites were initially disputed by many scientists in much the same way as current scientists reject psi phenomena or, for that matter, cryptozoological phenomena such as Bigfoot sightings.)

Some psi proponents suggest that the mere presence of a skeptic may inhibit the manifestation of psi, in a kind of reverse psi-mediated experimenter effect. The noted physicist A. J. Leggett (1987b) has described the principle of the repeatability of observations in science as the “no ghost” hypothesis (that is, that scientific results are not dependent on the presence or absence of certain observers or on the mental state of the observer). This is a not-so-veiled jab at parapsychology. David Ray Griffin (1988a), on the other hand, contends that the repeatability problems of parapsychology are due to the fact that “purposive elements” do not behave with the same regularity as the billiard balls of Newtonian physics. Indeed, results in the field of orthodox psychology are rarely as repeatable as are well-established findings in physics.

Beloff (1984) has concluded that “strict repeatability” (“repeatability by any competent observer who adopts the prescribed procedures”) is what is needed to convince the critics of the validity of psi. Beloff contends that one form of repeatable observation would be the examination of a film of a strong macro-PK performance or of a “permanent paranormal object” (PPO), such as two interlocked rings of differing composition (e.g., different types of wood). Beloff notes that precisely the latter form of evidence was allegedly produced by the medium Margery in the 1930s, although the rings later became mysteriously unlocked. Possibly in response to Beloff’s call, Wälti (1990) reported the existence of a PPO, consisting of two interlocked frames of paper and aluminum that was presented to Wälti by the psychic Silvio Meyer. Archskeptic Martin Gardner has, however, discussed methods suggested to him by his readers whereby Meyer might have fraudulently produced the object. One of the methods involved manufacturing the paper square around the aluminum frame (Gardner, 1991).

Meta-analysis. Recently a statistical technique known as “meta-analysis” has played an extraordinarily active role both in parapsychological research and debates about the reality of parapsychological effects. A meta-analysis consists of a statistical examination of a group of experimental studies, sometimes consisting of an entire line of research, in order to determine the strength, direction and statistical significance of any overall effects as well as the possible influence of moderator variables (e.g., barometric pressure or the sex of the experimenter) on the size and direction of the effect in question.

One of the earliest uses of meta-analytic techniques was in the now classic debate over the significance and replicability of the ganzfeld line of research between the parapsychologist Charles Honorton and the critic Ray Hyman (Honorton, 1985; Hyman, 1985). Since that time, meta-analyses of a great many lines of parapsychological research have been reported. These include forced-choice precognition experiments (Honorton & Ferrari, 1989), free-response ESP experiments (Milton, 1993), bio-PK studies (Braud, 1985; Braud, Schlitz & Schmidt, 1990; Walach & Schmidt, 2005), PK experiments with REGs (Radin, May & Thomson, 1986; Steinkamp, Boller & Bösch, 2002; Pallikari, 2004; Jahn & Dunne, 2005; Ehm, 2005), PK experiments with dice (Radin & Ferrari, 1991), experiments investigating the effects of hypnosis on ESP (Schechter, 1984; Stanford & Stein, 1994), studies with subjects recruited via the mass media (Milton, 1994), studies of the relationship between ESP scores and the psychological trait of defensiveness (Watt, 1991; Haraldsson & Houtkooper, 1994), studies of the relationship between ESP and extraversion (Honorton, Ferrari & Bem, 1990), and studies of the detection of being watched over a closed-circuit television monitor (Schmidt, Schneider, Utts & Walach, 2004), to name just a few.

Among other things, meta-analysis offers a means of deciding whether a given line of research has produced overall results that differ significantly from what would be expected by chance. For instance, in a vote-counting type of meta-analysis (a type that has now gone out of fashion), Radin, May and Thomson (1986) analyzed 332 PK experiments with random event generators that had been published between 1969 and 1984, finding 71 of them to have produced statistically significant evidence of psi. They computed the probability of this happening by chance to be less than 5.4 × 10⁻⁴³.
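As a rough check on the vote-counting arithmetic, the following sketch computes how improbable 71 or more significant results out of 332 would be if each study had only a 5% chance of reaching significance. The .05 criterion, the independence of the studies, and the function names are assumptions made for illustration; the text does not state how Radin, May and Thomson arrived at their figure. Both an exact binomial tail and a normal approximation are shown, since the two can differ by many orders of magnitude this far out in the tail.

```python
import math

def binomial_tail(n, k, p):
    """Exact probability of k or more successes in n independent trials."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def normal_tail(n, k, p):
    """Normal approximation (no continuity correction) to the same tail.

    Returns the z-score and the one-tailed probability.
    """
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z = (k - mean) / sd
    return z, 0.5 * math.erfc(z / math.sqrt(2))

# Figures from the text: 332 experiments, 71 of them significant.
# Assumption (not stated in the text): a significance criterion of .05.
n_studies, n_significant, alpha = 332, 71, 0.05

expected = n_studies * alpha  # significant studies expected by chance alone
z, p_normal = normal_tail(n_studies, n_significant, alpha)
p_exact = binomial_tail(n_studies, n_significant, alpha)

print(f"expected by chance: {expected:.1f} significant studies")
print(f"z = {z:.2f}, normal-approximation p = {p_normal:.2e}")
print(f"exact binomial p = {p_exact:.2e}")
```

Whichever tail is used, the count of significant studies is far above the roughly 17 expected by chance. Note that a vote count of this kind says nothing about selective reporting or study quality, which are taken up next.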

Meta-analysis also provides a means for answering the charge of data selection. For instance, Radin, May and Thomson compute that, in order to reduce the overall REG-PK effect to nonsignificance, it would have to be assumed that 7,778 nonsignificant studies had been conducted but not reported.

Similarly, Honorton (1985) performed a meta-analysis of 28 ganzfeld studies and concluded that, in order to reduce the cumulated ESP effect to nonsignificance, it would have to be assumed that a “filedrawer” containing 423 nonsignificant and unpublished studies exists.
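One widely used form of the filedrawer computation is Rosenthal’s “fail-safe N,” which combines the studies’ z-scores by Stouffer’s method and asks how many additional studies averaging z = 0 would be needed to pull the combined result down to the one-tailed .05 criterion. The sketch below assumes that formula; the text does not specify which formula Honorton or Radin, May and Thomson used, and the z-scores shown are invented for illustration rather than taken from any actual study.

```python
import math

def stouffer_z(z_scores):
    """Combined z-score for a set of independent studies (Stouffer's method)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

def fail_safe_n(z_scores, z_crit=1.645):
    """Number of unreported studies averaging z = 0 needed to bring the
    combined z down to z_crit (1.645 corresponds to one-tailed p = .05)."""
    k = len(z_scores)
    return max(0.0, (sum(z_scores) / z_crit) ** 2 - k)

# Hypothetical z-scores for eight published studies (illustration only).
example_z = [2.1, 0.4, 1.8, -0.3, 2.5, 1.2, 0.9, 1.6]

print(f"combined z = {stouffer_z(example_z):.2f}")
print(f"fail-safe N = {fail_safe_n(example_z):.1f} unreported null studies")
```

The key assumption built into this formula is that the unreported studies average exactly zero, which is the assumption questioned in the next paragraph.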

The traditional “filedrawer” computation in meta-analysis assumes that the overall chance distribution is replicated in the unpublished studies in the filedrawer. Scargle (2000) has noted that this assumption may be invalid, in that if all the positive, statistically significant studies are published, the filedrawer will consist of studies that do not fall in the upper “tail” of the distribution. Hence, the average psi effect in the studies in the filedrawer will tend to be slightly negative rather than zero as assumed in the traditional filedrawer calculation. Stokes (2001) performed a Monte Carlo analysis and determined that if one takes 90 ganzfeld “pseudoexperiments” generated from random numbers to simulate chance performance and then selects the 28 highest-scoring experiments, the odds against chance for those 28 experiments will be more than one billion to one (as found in Honorton’s meta-analysis). Thus, one would only have to assume the existence of 62 unpublished studies, rather than 423 studies as computed by Honorton (1985). While it stretches the mind to think that there are 423 unpublished ganzfeld studies given the small size of the parapsychological research community, it may not be so unthinkable that there are 62 unpublished studies.
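The selection effect described above can be illustrated with a small simulation. This is a simplified sketch and not a reconstruction of Stokes’s actual procedure: each pseudoexperiment is represented by a single standard-normal z-score rather than by simulated ganzfeld trials, and the number of runs, the seed, and the function name are arbitrary choices made for illustration.

```python
import math
import random

def simulate_selection(n_total=90, n_published=28, n_runs=2000, seed=0):
    """Draw n_total chance-level z-scores, 'publish' the n_published highest,
    and leave the rest in the filedrawer. Returns the average combined
    (Stouffer) z of the published set and the average z-score of the
    filedrawer set across n_runs repetitions."""
    rng = random.Random(seed)
    published_z = []
    drawer_means = []
    for _ in range(n_runs):
        z = sorted((rng.gauss(0.0, 1.0) for _ in range(n_total)), reverse=True)
        top, rest = z[:n_published], z[n_published:]
        published_z.append(sum(top) / math.sqrt(n_published))
        drawer_means.append(sum(rest) / len(rest))
    return sum(published_z) / n_runs, sum(drawer_means) / n_runs

mean_published_z, mean_drawer_z = simulate_selection()
# One-tailed probability corresponding to the average published z.
p_published = 0.5 * math.erfc(mean_published_z / math.sqrt(2))

print(f"average combined z of the 28 published studies: {mean_published_z:.2f}")
print(f"corresponding one-tailed p: {p_published:.1e}")
print(f"average z in the 62-study filedrawer: {mean_drawer_z:.2f}")
```

Under these assumptions the published set looks highly significant even though no effect exists, while the filedrawer average comes out below zero, which is Scargle’s point about the traditional calculation.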

As a way around the filedrawer problem, Bösch (2004) suggested that psi experiments be preregistered before the data collection process. Kennedy (2004) proposes that parapsychology adopt the standard used in the pharmaceutical industry (in which Kennedy works) and that research protocols be developed and registered and calculations of statistical power (the ability to detect an effect of the predicted size) be performed prior to the data collection process. Kennedy questions the value of post hoc meta-analyses in that meta-analyses may be manipulated to produce a desired outcome (such as when a judgment of study quality is made after the results of the study are known).
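To give a sense of the kind of power calculation Kennedy has in mind, the sketch below computes the exact power of a one-tailed binomial test for a forced-choice or ganzfeld-style study. The 25% chance hit rate (a four-choice design), the 33% hypothesized true hit rate, the trial counts, and the function names are all hypothetical values chosen for illustration; none of them comes from Kennedy’s paper.

```python
import math

def upper_tail(n, k, p):
    """Probability of k or more hits in n trials with hit probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_hits(n, p_chance, alpha=0.05):
    """Smallest hit count whose chance probability is alpha or less."""
    for k in range(n + 2):  # k = n + 1 has tail probability 0, so the loop always returns
        if upper_tail(n, k, p_chance) <= alpha:
            return k

def power(n, p_chance, p_true, alpha=0.05):
    """Probability that a study of n trials reaches significance
    if the true hit rate is p_true."""
    return upper_tail(n, critical_hits(n, p_chance, alpha), p_true)

# Hypothetical design: 25% chance hit rate, 33% assumed true hit rate.
for n_trials in (50, 100, 200, 400):
    print(f"{n_trials} trials: power = {power(n_trials, 0.25, 0.33):.2f}")
```

A protocol registered in advance would fix the number of trials at a value giving adequate power for the predicted effect size, so that a nonsignificant outcome could not later be dismissed as the product of an underpowered study.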

Statistical calculations can go a long way toward settling the question of whether the significant results reported in a line of research could have arisen by chance or as a result of data selection. There are, however, other ways significant effects could arise and still not be due to psi. Procedural errors and fraud are two possibilities. What might satisfy the critics would be something approaching repeatability upon demand. If virtually anyone could produce psi effects under conditions he or she found acceptable, there would certainly no longer be much debate about their reality. Perhaps, if a certain minimal proportion of all investigators could obtain evidence of psi, then the critics would be satisfied. Or possibly, if certain individual critics obtained evidence for ESP, the battle for the acceptance of psi would be won. Replication may well be at heart a political process rather than an issue that can be decided by statistical analysis.

Conclusions

After the review of the evidence for psi in these past two chapters, the reader undoubtedly finds himself or herself in the position of a spectator in a 125-year-long prize fight. The skeptics have delivered a few good blows, perhaps even a few knockdowns. They are likely ahead on points. Both sides are glassy-eyed. But the parapsychologists sit in their corner, apparently more than ready for yet another round. Nothing is certain, and the skeptics’ chin still reels from the invisible blow that knocked my student’s father off his park bench and from other phenomena that cannot be so easily explained by normal processes.

The existing evidence does not compel the conclusion that psi phenomena exist. The determined skeptic who wishes to ascribe all the experimental evidence to a combination of experimenter incompetence, methodological errors and outright fraud can rest easy knowing that, given the poor replicability of psi results, his or her position will not easily be refuted. However, to attribute the existing evidence for psi to such factors is to postulate a level of experimenter malfeasance/incompetence that is unparalleled in the history of science.

Spontaneous phenomena may also be explained away in terms of memory distortions, embellishments, runaway fantasies, coincidence, psychosis, and outright falsification and collusion on the part of the witnesses involved. Again the skeptic can rest easy, knowing that such attributions cannot easily be disproved. The case for psi has not been conclusively established. Conclusive proof will likely await the development of repeatable means of eliciting psi.

Like many self-professed skeptics I have conversed with, I find the evidence from spontaneous cases perhaps the most convincing.

My graduate adviser in psychology at the University of Michigan, aware of my dismayingly growing interest in psi research in the early 1970s, stated that he didn’t dispute the reality of psi phenomena but questioned their importance. Perhaps that is because the major implications of psi phenomena lie in the area of our understanding of spacetime and the mind’s role in the physical universe, normally more the concern of physicists and philosophers than of the narrowly focused world of the experimental psychologists who constituted my social milieu in graduate school.

If psi phenomena exist, their implications for our fundamental understanding of the nature of reality are profound. These implications form the subject of the next chapter.

It should be stressed, however, that the nonexistence of psi would not alter in any important way the core conclusions put forth in these pages regarding the fundamental nature of the conscious self, its interactions with the physical body, its likely central role in the causation of physical phenomena, and its likely survival of the death of the physical body as outlined in Chapter 0 and in later portions of this book.
