SPECIAL ARTICLE |
Pregnancy and Perinatology Branch, Center for Developmental Biology and Perinatal Medicine, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
Key Words: ethics, history of medicine, history of statistics, neonatal intensive care, society and medicine
| WILLIAM SEALY GOSSET: THE "STUDENT" WHO DEVELOPED THE "t" TEST |
|---|
To brew a perfect beer, one had to add exact amounts of yeast to the continuously fermenting barley; too little led to incomplete fermentation, and too much led to a bitter taste. Ambient temperature was also an unpredictable variable. Gosset's first task was to count the yeast colonies, for which he learned to use the newly developed hemacytometer. However, he had to overcome the challenge of estimating the quantity of colonies in entire jars based on small samples taken from them. Gosset used his mathematical and statistical skills, not chemistry, to solve this practical problem.
The conceptual basis for Gosset's solution had evolved over 150 years.1–4 Mathematicians knew that observations were prone to errors. As the measurement errors became smaller with improved technology, the hitherto-unrecognized "random error" became apparent, especially in biological measurements. In 1820, Laplace proposed that random errors (deviations of the observed from the predicted) can be plotted; such plots became known as the "normal" or "Gaussian-distribution" curves.3,4
The British mathematician and philosopher Karl Pearson (1857–1936) took the concept of distributions a step further.4 He proposed that all experiments provided only pieces of information regarding a larger, immeasurable, original scatter. He showed that measurements themselves, not just the random errors, had probability distribution properties, which could be described by using 4 parameters, namely, the mean, standard deviation, symmetry, and kurtosis. Pearson proposed the term "parameter," which had a Greek root meaning of "almost measurements."4
Pearson held that if one knew the 4 parameters, one could locate the probability that an observed number will be at a certain location in the population scatter. He proposed a family of skewed distribution curves to describe such scatters.1–4 Pearson was a towering personality. He founded Biometrika, a major statistical journal, the first issue of which appeared in October 1901.
Gosset noticed that his yeast colony counts did not fit any of Pearson's skewed curves but did fit the Poisson model, named for the 19th-century French mathematician Siméon Denis Poisson. In November 1904, Gosset presented a report to the Guinness Board titled "The Application of the Law of Error to the Work of the Brewery."1
Student's First Article
Gosset first met Pearson in 1905, and the two became friends. Because Guinness did not allow its scientists to publish articles, Gosset had to negotiate with the company hierarchy for permission. Permission was granted, provided Gosset used a pseudonym and did not divulge any confidential data. Pearson, too, agreed to these conditions; he published Gosset's first article, "On the Error of Counting With a Hemacytometer," under the pseudonym "Student," in Biometrika in February 1907.5 The article explained how the scatters of colony counts were similar to the exponential limit of the binomial distribution.
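The "exponential limit" of the binomial distribution can be illustrated numerically. The sketch below is not taken from Student's article; it assumes a counting chamber divided into n tiny sub-squares, each occupied with probability lam / n (the names `lam`, `n`, and both helper functions are hypothetical), and compares the resulting binomial probabilities with the Poisson formula.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical setup: n tiny sub-squares, each holding a cell with
# probability lam / n, so the expected count per square is lam.
lam = 2.0
n = 1000

# Largest pointwise difference between the two models for counts 0..10.
max_gap = max(
    abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for k in range(11)
)
print(f"largest gap for k = 0..10: {max_gap:.6f}")
```

Increasing `n` while holding `lam` fixed should shrink the gap further, which is one way to see why counts of rare, independent events tend to follow the Poisson model.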
Developing the t Test
Pearson had strongly argued that only with large samples could one estimate population parameters. Because most researchers cannot obtain large samples, Gosset thought that formal methods ought to be developed by using small samples for estimating population means. He conducted a number of empirical experiments to develop such methods.
In 1 experiment, Gosset prepared 3000 pieces of cardboard, on each of which he wrote 2 sets of data on 3000 "criminals."6 One set of values was heights, and the other was the lengths of the left middle fingers. Gosset shuffled the cards, drew at random 750 samples of 4 cards each, and computed the mean and standard deviation of each sample. Then he obtained the difference between each sample mean and the population mean (n = 3000) and divided the difference by the sample standard deviation to obtain 750 z scores. He plotted the scores as probability functions and discovered that even without any of Pearson's 4 parameters, one could estimate the population mean and the associated error with a degree of certainty.4
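The card experiment can be imitated in a few lines. The sketch below is only an illustration under stated assumptions: the 3000 "heights" are synthetic values drawn from a normal distribution (not the original measurements), the standard deviation uses the divisor n, and the function name `gosset_z_scores` is hypothetical.

```python
import math
import random
import statistics

def gosset_z_scores(population, sample_size=4, seed=0):
    """Shuffle the 'cards', deal them into consecutive samples, return one z per sample."""
    rng = random.Random(seed)
    cards = list(population)
    rng.shuffle(cards)
    pop_mean = statistics.fmean(cards)
    scores = []
    for start in range(0, len(cards) - sample_size + 1, sample_size):
        sample = cards[start:start + sample_size]
        s = statistics.pstdev(sample)  # divisor n (assumed convention for Student's z)
        if s > 0:  # skip degenerate samples whose values are all identical
            scores.append((statistics.fmean(sample) - pop_mean) / s)
    return scores

# Synthetic stand-in for the 3000 height measurements (not Gosset's data).
data_rng = random.Random(1908)
heights = [data_rng.gauss(166.0, 6.5) for _ in range(3000)]

z = gosset_z_scores(heights)

# Naive large-sample rule: treat z * sqrt(n) as a standard normal deviate,
# so only about 5% of samples should fall beyond 1.96.
n = 4
tail_share = sum(abs(v) * math.sqrt(n) > 1.96 for v in z) / len(z)
print(f"{len(z)} samples, share beyond the normal 5% limits: {tail_share:.3f}")
```

If the normal tables were adequate for samples of 4, the printed share would be close to 0.05. Under these assumptions it should come out noticeably larger, which is the gap that small-sample tables were meant to fill.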
Gosset published these results in his second article using the pseudonym "Student" under the title "The Probable Error of a Mean" in Biometrika in March 1908.7 Despite its long, algebraic discourses and mathematical arguments, the article is a classic. It is simple, lucid, and free from jargon, as a few introductory paragraphs from it reveal7:
The usual method of determining the probability that the mean of the population lies within a given distance of the mean of the sample is to assume a normal distribution about the mean of the sample with a standard deviation equal to $s/\sqrt{n}$, where s is the standard deviation of the sample, and to use the tables of probability.
But as we decrease the number of experiments [sample sizes], the value of the standard deviation found from the sample of experiments becomes itself subject to an increasing error, until judgments used in this way become altogether misleading ...
The aim of the present paper is to determine the point at which we may use the tables of the probability integral in judging of the significance of the mean of a series of experiments, and to furnish alternative tables for use when the number of experiments [sample sizes] is too low.
The test that Gosset described in this article became the famous t test. However, the article received little attention for a number of years, until Ronald A. Fisher provided a mathematical proof and showed the practical utility of the t test.1,2,4 Gosset had not called it a t test but used "z" instead to denote the key ratio. Because the convention was to use z for population parameters and t for samples, at Fisher's suggestion Gosset published another set of tables in 1925 for testing the significance of observations from small samples. He used the t ratio:

$$t = z\sqrt{n - 1}$$

where n − 1 was the number of degrees of freedom. This table became the "table of Student's t distributions" and the test became the "Student's t test."4,8
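The link between Student's z and the t ratio can be checked with a small calculation. This is a minimal sketch, assuming that z is the deviation of the sample mean divided by the standard deviation computed with divisor n, and that the ratio is t = z√(n − 1); the sample values, `mu0`, and the function names are hypothetical.

```python
import math
import statistics

def student_z(sample, mu0):
    """Deviation of the sample mean from mu0 in units of the divisor-n standard deviation."""
    return (statistics.fmean(sample) - mu0) / statistics.pstdev(sample)

def modern_t(sample, mu0):
    """One-sample t statistic using the divisor-(n - 1) standard deviation."""
    n = len(sample)
    return (statistics.fmean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))

# Hypothetical small sample and hypothesized population mean.
sample = [5.1, 4.9, 5.6, 5.3]
mu0 = 5.0

t_from_z = student_z(sample, mu0) * math.sqrt(len(sample) - 1)
t_direct = modern_t(sample, mu0)
print(f"t from z: {t_from_z:.4f}, t computed directly: {t_direct:.4f}")
```

The two printed values should agree up to rounding, because the factor √(n − 1) exactly converts the divisor-n standard deviation into the standard error used in the t statistic.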
For 30 years, Gosset wrote a number of articles on statistics, all attempting to solve practical problems encountered in the brewery. Yet Student's identity remained a secret until his death on October 16, 1937. There were tributes and obituaries in Biometrika,9,10 and some of his friends solicited and obtained a gift from the Guinness company to publish Gosset's collected articles in 1942.8
Gosset was also an avid writer of letters, maintaining regular correspondence with a large circle of friends and scientists. The famous statistician Egon Pearson (Karl Pearson's son) wrote the history of probability statistics and a statistical biography of Gosset based on the latter's correspondence.1,11
| WILLIAM A. SILVERMAN (1917–2004): A TEACHER AND A STUDENT |
|---|
Like Gosset, Silverman was an avid writer of letters. He wrote to the editors of newspapers, magazines, and medical journals and to a wide circle of students, friends, and scientific colleagues, regardless of whether he knew them personally. Also like Gosset, Silverman often wrote anonymously. Beginning in 1977, he sent clippings and quotations from newspapers, reflections and notes from scientific articles, annotations from court documents, pieces from personal letters, and materials from obscure sources to Pediatrics. These were printed (and continue to be printed) as blurbs at the end of journal articles with the distinctive signature line "Submitted by Student."
I think that these materials may be worthy of study by students of medical history. Analyses of even a select thousand of these may provide a useful perspective on contemporary medicine, society, and ethics as seen through the eyes of a visionary.
Why did Silverman choose anonymity in an era when many people seek publicity? He explained this in the preface to his book, Where's the Evidence? Controversies in Modern Medicine,18 a monograph based on his columns, Fumes From the Spleen (also written under a pseudonym, "Malcontent").19,20 He felt, as did the famous Anglo-American poet W. H. Auden, that an unsigned work forced the reader to respond to the "reasoning, not to the reasoner."18
Perhaps such caution was not necessary. Despite disagreements with some of his views,21 Silverman was universally respected for his integrity and honesty and admired for his intellectual rigor. Like Gosset, Silverman was a quintessential guru and a perpetual student, striving to learn, as much as to teach, the human side of medicine. A picture in his book Retrolental Fibroplasia: A Modern Parable22 depicts Moses Maimonides holding a sign that reads, "Teach thy tongue to say I do not know and thou shalt progress." A more fitting epitaph for William Silverman cannot be found.
| FOOTNOTES |
|---|
Address correspondence to Tonse N. K. Raju, MD, DCH, National Institutes of Health-NICHD/PPB, 6100 Executive Blvd, Room 4B03, Bethesda, MD 20878. E-mail: rajut@mail.nih.gov
No conflict of interest declared.
PEDIATRICS (ISSN 0031-4005). Published in the public domain by the American Academy of Pediatrics.