Applied Bayesian and classical inference: the case of The Federalist papers
Previous title: | Mosteller, Frederick The Federalist |
---|---|
Main authors: | Mosteller, Frederick; Wallace, David L. |
Format: | Book |
Language: | English |
Published: | New York, NY [etc.]: Springer, 1984 |
Edition: | 2nd ed. |
Series: | Springer series in statistics |
Subjects: | |
Online access: | Table of contents |
Description: | 1st ed. published under the title: Mosteller, Frederick: Inference and disputed authorship. 1964 |
Description: | XXXVII, 303 pp. |
ISBN: | 0387909915 3540909915 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV000298011 | ||
003 | DE-604 | ||
005 | 20160512 | ||
007 | t | ||
008 | 870612s1984 |||| 00||| eng d | ||
020 | |a 0387909915 |9 0-387-90991-5 | ||
020 | |a 3540909915 |9 3-540-90991-5 | ||
035 | |a (OCoLC)10726742 | ||
035 | |a (DE-599)BVBBV000298011 | ||
040 | |a DE-604 |b ger |e rakddb | ||
041 | 0 | |a eng | |
049 | |a DE-12 |a DE-384 |a DE-739 |a DE-355 |a DE-824 |a DE-19 |a DE-706 |a DE-83 |a DE-188 | ||
050 | 0 | |a JK155 | |
082 | 0 | |a 342.73/029 |2 19 | |
084 | |a HF 184 |0 (DE-625)48788: |2 rvk | ||
084 | |a QH 233 |0 (DE-625)141548: |2 rvk | ||
084 | |a QH 234 |0 (DE-625)141549: |2 rvk | ||
084 | |a SK 830 |0 (DE-625)143259: |2 rvk | ||
084 | |a SK 840 |0 (DE-625)143261: |2 rvk | ||
100 | 1 | |a Mosteller, Frederick |d 1916-2006 |e Verfasser |0 (DE-588)118940627 |4 aut | |
245 | 1 | 0 | |a Applied Bayesian and classical inference |b the case of The Federalist papers |c Frederick Mosteller ; David L. Wallace |
250 | |a 2. ed. | ||
264 | 1 | |a New York, NY [u.a.] |b Springer |c 1984 | |
300 | |a XXXVII, 303 S. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Springer series in statistics | |
500 | |a 1. Aufl. u.d.T.: Mosteller, Frederick: Inference and disputed authorship. 1964 | ||
600 | 1 | 7 | |a Hamilton, Alexander |d 1757-1804 |0 (DE-588)118545302 |2 gnd |9 rswk-swf |
630 | 0 | 4 | |a Federalist |
630 | 0 | 7 | |a The federalist |0 (DE-588)4202673-8 |2 gnd |9 rswk-swf |
650 | 4 | |a Englisch | |
650 | 4 | |a English language |x Word frequency |v Case studies | |
650 | 4 | |a Mathematical linguistics |v Case studies | |
650 | 0 | 7 | |a Statistik |0 (DE-588)4056995-0 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Bayes-Entscheidungstheorie |0 (DE-588)4144220-9 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Englisch |0 (DE-588)4014777-0 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Worthäufigkeit |0 (DE-588)4132191-1 |2 gnd |9 rswk-swf |
655 | 7 | |0 (DE-588)4522595-3 |a Fallstudiensammlung |2 gnd-content | |
689 | 0 | 0 | |a Hamilton, Alexander |d 1757-1804 |0 (DE-588)118545302 |D p |
689 | 0 | 1 | |a The federalist |0 (DE-588)4202673-8 |D u |
689 | 0 | 2 | |a Worthäufigkeit |0 (DE-588)4132191-1 |D s |
689 | 0 | 3 | |a Statistik |0 (DE-588)4056995-0 |D s |
689 | 0 | |5 DE-604 | |
689 | 1 | 0 | |a Hamilton, Alexander |d 1757-1804 |0 (DE-588)118545302 |D p |
689 | 1 | 1 | |a The federalist |0 (DE-588)4202673-8 |D u |
689 | 1 | 2 | |a Bayes-Entscheidungstheorie |0 (DE-588)4144220-9 |D s |
689 | 1 | |5 DE-604 | |
689 | 2 | 0 | |a Englisch |0 (DE-588)4014777-0 |D s |
689 | 2 | 1 | |a Worthäufigkeit |0 (DE-588)4132191-1 |D s |
689 | 2 | |5 DE-604 | |
700 | 1 | |a Wallace, David L. |e Verfasser |4 aut | |
780 | 0 | 0 | |i 1. Auflage |a Mosteller, Frederick |t The Federalist |
856 | 4 | 2 | |m HEBIS Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=000181698&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
940 | 1 | |q TUB-nveb | |
999 | |a oai:aleph.bib-bvb.de:BVB01-000181698 |
Record in search index
_version_ | 1804114762753114112 |
---|---|
adam_text | Frederick Mosteller
David L. Wallace
Applied Bayesian and
Classical Inference
The Case of The Federalist Papers
2nd Edition of Inference and Disputed Authorship:
The Federalist
Springer-Verlag
New York Berlin Heidelberg Tokyo
Analytic Table of Contents
Chapter 1 The Federalist Papers As a Case Study
1.1 Purpose
To study how Bayesian inference works in a large-scale data analysis,
we chose to try to resolve the problem of the authorship of the disputed
Federalist papers.
1.2 The Federalist papers
The Federalist papers were written by Hamilton, Madison, and Jay.
Jay's papers are known. Of the 77 papers originally published in
newspapers, 12 are in dispute between Hamilton and Madison, and 3 may
be regarded as joint by them. Historians have varied in their attributions.
1.3 Early work
Frederick Williams and Frederick Mosteller found that sentence length
and its variability within papers did not discriminate. Tables 1.3-1,
2, 3, 4 show that they found some discriminating power in percentage of
nouns, of adjectives, of one- and two-letter words, and of the's. Together
these variables could have decided whether Hamilton or Madison wrote
all the disputed papers, if that were the problem, but the problem is to
make an effective assignment for each paper.
1.4 Recent work: pilot study
We call marker words those which one author often uses and the other
rarely uses. Douglass Adair found while (Hamilton) versus whilst
(Madison). We found enough (Hamilton) and upon (Hamilton); see
Tables 1.4-1, 2 for incidence and rates. Tables 1.4-3, 4, 5 give an
overview of marker words for Federalist and non-Federalist writings. Alone,
they would not settle the dispute compellingly.
1.5 Plots and honesty
Some say that the dispute is not a matter of honesty but a matter of
memory. Hamilton was hurried in his annotation by an impending duel,
but Madison had plenty of time. Editing may be a hazard. We want to
use many words as discriminating variables.
1.6 The plan of the book
Chapter 2 Words and Their Distributions
2.1 Why words?
Hamilton and Madison use the same words at different rates, and so
their rates offer a vehicle for discrimination. Some words like by and to
vary relatively little in their rates as context changes, others like war
vary a lot, as the empirical distributions in the four tables show.
Generally, less meaningful words offer more stability.
2.2 Variation with time
In Table 2.2-2, a separate study illustrated by Madison's rates for 11
function words over a 26-year period examines the stability of rates
through time. We desire stability because we need additional text of
known authorship to choose words and their rates for discriminating
between authors. Among function words, some pronouns and auxiliary
verbs seem unstable.
2.3 How frequency of use varies
For establishing a mathematical model, we need to find out empirically
how rates of use by an author vary from one chunk of writing to another.
2.3A Independence of words from one block of text to another
A special study of extensive empirical data tests the independence of
the occurrences of the same word (for 51 words) in four successive blocks
of approximately 200 words of Hamilton text. Table 2.3-1 compares the
observed counts with the binomial distributions for the 39 sets of four
blocks for each word. Some words give evidence of lack of independence,
especially his, one, only, and than.
2.3B Frequency of occurrence
For 51 words we show in Table 2.3-3 the frequency distribution of
occurrences in about 250 blocks of 200. The Poisson distribution does not
fit all the empirical distributions of the number of occurrences of
high-frequency words, but the negative binomial distribution comes close to
doing so. For 10 of these words Poisson and negative binomials are fitted
and displayed in Table 2.3-4 for Hamilton and for Madison. The negative
binomial distribution allows for higher tails than does the Poisson.
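The remark about higher tails can be illustrated numerically. The following is a minimal sketch, not taken from the book: the mean of 2 occurrences per block and the shape value `kappa = 3` are made-up illustrative numbers, and the negative binomial is written in the common mean/shape form with variance `mean + mean**2 / kappa`.

```python
import math


def poisson_pmf(x, mean):
    """Poisson probability of exactly x occurrences."""
    return math.exp(-mean + x * math.log(mean) - math.lgamma(x + 1))


def negbin_pmf(x, mean, kappa):
    """Negative binomial probability with the given mean and shape kappa.

    Variance is mean + mean**2 / kappa, so small kappa means a heavier tail.
    """
    p = kappa / (kappa + mean)
    log_pmf = (math.lgamma(x + kappa) - math.lgamma(kappa) - math.lgamma(x + 1)
               + kappa * math.log(p) + x * math.log(1.0 - p))
    return math.exp(log_pmf)


def upper_tail(pmf, threshold, **params):
    """P(X >= threshold), computed as 1 - P(X < threshold)."""
    return 1.0 - sum(pmf(x, **params) for x in range(threshold))


# Hypothetical values: same mean for both models, only the tail differs.
mean, kappa = 2.0, 3.0
poisson_tail = upper_tail(poisson_pmf, 7, mean=mean)
negbin_tail = upper_tail(negbin_pmf, 7, mean=mean, kappa=kappa)
print(f"P(X >= 7): Poisson {poisson_tail:.4f}, negative binomial {negbin_tail:.4f}")
```

With equal means, the negative binomial assigns several times more probability to seven or more occurrences in a block, which is the sense in which it allows for higher tails.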
2.4 Correlations between rates for different words
Theoretical study shows that the correlation between the rates of
occurrence for different words should ordinarily be small but negative.
An empirical study whose results appear in Table 2.4-1 shows that these
correlations are ordinarily negligible for our work.
2.5 Pools of words
Three pools of words produced potential discriminators.
2.5A The function words
From a list of 363 function words prepared by Miller, Newman, and
Friedman, we selected the 70 highest-frequency and a random 20
low-frequency words without regard to their ability to discriminate
authorship. They appear in Tables 2.5-2 and 2.5-3.
2.5B Initial screening study
We used some of the papers of known authorship to cut 3000 candidate
words to the 28 listed in Table 2.5-4, based on ability to discriminate.
2.5C Word index with frequencies
From 6700 different words, 103 non-contextual words were chosen from
240 that looked promising as discriminators on papers of known
authorship. Of these words, the 48 in Table 2.5-6 were new.
2.6 Word counts and their accuracies
Some word counts were carried out by hand using slips of paper, one
word per slip. Others were done by a high-speed computer which
constructed a concordance.
2.7 Concluding remarks
Although words offer only one set of discriminators, one needs a large
enough pool of potential discriminators to offer a good chance of success.
We need to avoid selection and regression effects. Ideally we want
enough data to get a grip on the distribution theory for the variables
to be used.
Chapter 3 The Main Study
In the main study, we use Bayes' theorem to determine odds of
authorship for each disputed paper by weighting the evidence from words.
Bayesian methods enter centrally in estimating the word rates and
choosing the words to use as discriminators. We use not one but an
empirically based range of prior distributions. We present the results
for the disputed papers and examine the sensitivity of the results to
various aspects of the analysis.
After a brief guide to the chapter, we describe some views of
probability as a degree of belief and we discuss the need and the difficulties
of such an interpretation.
3.1 Introduction to Bayes' theorem and its applications
We give an overview, abstracted from technical detail, of the ideas and
methods of the main study, and we describe the principal sources of
difficulties and how we go about meeting them.
3.1A An example applying Bayes' theorem with both initial odds and
parameters known
A simple probability calculation gives the probability of authorship
from evidence on one word. Casting the result, the classical Bayes'
theorem, into odds form is helpful and gives:
Final odds = initial odds × likelihood ratio
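The odds form above can be sketched for a single word. This is an illustration under stated assumptions rather than the book's own calculation: the Poisson model, the rates per thousand words, the paper length, and the function name `final_odds` are all hypothetical choices made for the example.

```python
import math


def poisson_pmf(x, mean):
    """Poisson probability of exactly x occurrences."""
    return math.exp(-mean + x * math.log(mean) - math.lgamma(x + 1))


def final_odds(initial_odds, count, words_in_paper, rate_h, rate_m):
    """Odds (Hamilton : Madison) after seeing `count` uses of one word.

    rate_h and rate_m are assumed known rates per 1000 words of text.
    """
    mean_h = rate_h * words_in_paper / 1000.0
    mean_m = rate_m * words_in_paper / 1000.0
    likelihood_ratio = poisson_pmf(count, mean_h) / poisson_pmf(count, mean_m)
    return initial_odds * likelihood_ratio


# Hypothetical numbers: a marker word used 3.0 per 1000 words by Hamilton
# and 0.25 per 1000 by Madison, absent from a 2000-word paper.
odds = final_odds(1.0, count=0, words_in_paper=2000, rate_h=3.0, rate_m=0.25)
print(f"final odds Hamilton:Madison = {odds:.5f} (about {1 / odds:.0f} to 1 for Madison)")
```

Starting from even initial odds, the absence of the marker word moves the odds by the likelihood ratio alone, which is why the analysis can concentrate on that factor.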
3.1B Selecting words and weighting their evidence
Applying Bayes' theorem to several words, and taking logarithms, gives
the final log odds as the sum of initial log odds and the log likelihood
ratios for the separate words. The difference between the expected log
likelihood ratio for the two authors is a measure of importance of the
word as a discriminator. We discard words with small importances. No
bias arises from selection when rates are known.
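A small sketch of the additive log-odds bookkeeping and of the importance measure follows. It assumes independent Poisson counts with known means, for which the difference in expected log likelihood ratio reduces to (mean_h - mean_m) * log(mean_h / mean_m). The word list and all numbers are invented for illustration.

```python
import math


def log_likelihood_ratio(count, mean_h, mean_m):
    """log of Poisson(count; mean_h) / Poisson(count; mean_m); count! cancels."""
    return count * math.log(mean_h / mean_m) - (mean_h - mean_m)


def importance(mean_h, mean_m):
    """Expected log likelihood ratio under Hamilton minus that under Madison."""
    return (mean_h - mean_m) * math.log(mean_h / mean_m)


# Hypothetical data: word -> (expected count if Hamilton, expected count if
# Madison, count observed in the paper being assessed).
words = {
    "upon": (6.0, 0.5, 0),
    "whilst": (0.1, 1.0, 2),
    "by": (15.0, 23.0, 25),
}

initial_log_odds = 0.0  # even initial odds
final_log_odds = initial_log_odds + sum(
    log_likelihood_ratio(x, h, m) for h, m, x in words.values()
)
ranked = sorted(words, key=lambda w: importance(*words[w][:2]), reverse=True)
print(f"final log odds (Hamilton vs Madison): {final_log_odds:.2f}")
print("words by importance:", ranked)
```

Words near the bottom of such a ranking contribute little on average, so dropping them costs little discriminating power.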
3.1C Initial odds
Initial odds of authorship reflect the investigator's assessment of the
historical evidence. The final odds is a product of the initial odds and the
likelihood ratio, and a large likelihood ratio can overwhelm most
variation in initial odds. We concentrate on the likelihood ratio. Our serious
use of Bayes' theorem lies elsewhere, in our handling of unknown
parameters.
3.1D Unknown parameters
Even if data distributions were Poisson, we would not know the mean
rates. From the known Hamilton and Madison texts, we can estimate the
rates, but with important uncertainties: the simple use of Bayes' theorem
is not quite right, and the selection effects in choosing the words are not
negligible. We treat the rates as random quantities and use the
continuous form of Bayes' theorem to determine the posterior distribution
to represent their uncertainty. Figure 3.1-1 shows the logical structure
of the two different uses of Bayes' theorem. The factor from initial odds
to final odds is no longer a simple likelihood ratio, but a ratio of two
averaged probabilities, averaged over the posterior distributions of the
word rates. The factor can often be approximated by a likelihood ratio
for an appropriately estimated set of rates.
3.2 Handling unknown parameters of data distributions
We begin to set out the components of our Bayesian analysis.
3.2A Choosing prior distributions
Because we expect both authors to have nearly the same rates for most
words, we shift to parameters measuring the combined rate and a differential
rate. For any word, let σ be the sum of the rates for the two authors and
let τ be the ratio of Hamilton's rate to the sum σ. Empirical evidence
on 90 unselected words illustrated in the Figure 3.2-1 plot of estimated
parameters guides the choice of families of prior distributions for σ and τ.
3.2B The interpretation of the prior distributions
We work with a parametric family of prior distributions, and call its
parameters underlying constants. By 1984, hyperparameters has become
the accepted term for them.
3.2C Effect of varying the prior
We do not determine a single choice of the underlying constants, but
study the sensitivity of the assessments of authorship to changes in the
prior distributions reflecting changes in the underlying constants.
3.2D The posterior distribution of (σ, τ)
For any choice of the underlying constants, the joint posterior density
of (σ, τ) follows directly from Bayes' theorem. The mode of the posterior
density can be located by numerical methods and gives the modal
estimates of parameters used for determining the odds of authorship.
3.2E Negative binomial
The negative binomial data distribution underlies our best analysis of
authorship. The parametrizations and the assumed families of prior
distributions are set forth. The priors are parametrized by five
underlying constants. Posterior modal estimates were obtained for all words
under each of 21 sets of underlying constants. For one typical set,
Table 3.2-3 presents the modal estimates of the negative binomial
parameters for the final 30 words used to assess the disputed papers.
3.2F Final choices of underlying constants
Analyses (to be described in Section 4.5) of a pool of 90 unselected words
provide plausible ranges for the underlying constants. Table 3.2-2 shows
six choices in that range. We interpret the effect of the five underlying
constants and describe an approximate data-equivalence for the prior
distributions they specify.
3.3 Selection of words
The prior distributions are the route for allowing and protecting against
selection effects in choice of words. We use an unselected pool of 90 words
for estimating the underlying constants of the priors, and we assume the
priors apply to the populations of words from which we developed our
pool of 165 words. We then selectively reduce that pool to the final 30
words. We describe a stratification of words into word groups and our
deletion of two groups because of contextuality.
3.4 Log odds
We compute the logarithm of the odds factor that changes initial odds
to final odds and call it simply log odds. The computations use the
posterior modal estimates as if they were exact and are made under the
various choices of underlying constants and using both negative binomial
and Poisson models.
3.4A Checking the method
Table 3.4-1 shows the total log odds over the 30 final words when each
of the 98 papers of known authorship is treated as if unknown. It shows
the results for six choices of prior for the negative binomial, four for the
Poisson. For almost all papers with known author, the log odds strongly
favor the actual author. Choice of prior makes about 10 per cent
difference in the log odds. Choice of data distribution has far larger
effects. Paper length matters, and paper-to-paper variation is huge.
3.4B The disputed papers
For each disputed paper, Table 3.4-2 shows the log odds factors, totaled
for the 30 final words, for ten choices of priors, six for the negative
binomial and four for a Poisson model. The evidence strongly favors
Madison, with paper 55 weakest with an odds factor of 240 to 1.
3.5 Log odds by words and word groups
3.5A Word groups
Table 3.5-1 breaks the log odds into contributions by the five word
groups for the disputed, joint, and some papers of known authorship.
The general consistency of evidence is examined.
3.5B Single words
Tables 3.5-2A, B, C show the contributions to the log odds from single
words: 9 high-frequency words, 11 Hamilton markers, 9 Madison
markers. The gross difference between behavior of Poisson and
negative binomial models for extreme usages of rare words is illustrated.
3.5C Contributions of marker and high-frequency words
Table 3.5-3 shows how papers with words at the Madison mean rate, at
the Hamilton mean rate, and at the average would be assessed; also how
papers with all or none of the Hamilton or of the Madison markers
would fare. The comparisons support the fairness of the final 30 words.
3.6 Late Hamilton papers
We assess the log odds for four of the late Federalist papers, written by
Hamilton after the newspaper articles appeared and not used in any of
our other analyses. The log odds all favor Hamilton, very strongly for
all but the shortest paper.
3.7 Adjustments to the log odds
Through special studies, we estimate the magnitude of effects on the log
odds of various approximations and imperfect assumptions underlying
the main computations and results presented in Section 3.4. Percentage
reductions in log odds are a good way to extrapolate from the special
studies to the main study.
3.7A Correlation
The study of correlations among words suggests that log odds based on
independence should be reduced by an amount between 6 per cent and
12 per cent.
3.7B Effects of varying the underlying constants that determine the prior
distributions
The choice of prior distribution used in most of the presented results is
in the middle of the estimated range of the underlying constants. Other
choices might raise or lower the log odds, but not likely by more than
±12 per cent.
3.7C Accuracy of the approximate log odds calculation
A study of the approximation for five of the most important words
suggests that the modal approximation tends to overstate the log odds
and that a 15 per cent reduction is indicated.
3.7D Changes in word counts
Some word changes between the original newspaper editions and the
McLean edition we used for making our word counts require
adjustment. Two changes involving upon and whilst reduce the log odds for
Madison in two disputed papers. Other errors, including counting errors,
are smaller and nearly balanced in direction.
3.7E Approximate adjusted log odds for the disputed papers
Table 3.7-2 shows the log odds for the disputed papers after making
the specific adjustments for the major word changes, and with three
levels of a composite adjustment for other effects. Even the extreme
adjustment leaves all but two papers with odds of over 2500 to 1 favoring
Madison, and the two weakest at 33:1 (paper #55 with log odds -3.5)
and 180:1 (paper #56 with log odds -5.2).
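The quoted pairs of odds and log odds can be checked with a one-line conversion. The sketch below assumes the log odds are natural logarithms, with a negative sign favoring Madison; that reading reproduces the stated figures to rounding.

```python
import math


def odds_from_log_odds(log_odds):
    """Odds in favor of whichever author the sign of the log odds favors."""
    return math.exp(abs(log_odds))


# Values quoted in the entry above for the two weakest disputed papers.
for paper, log_odds in [(55, -3.5), (56, -5.2)]:
    print(f"paper {paper}: log odds {log_odds} -> about "
          f"{odds_from_log_odds(log_odds):.0f} to 1 for Madison")
```

The results, roughly 33 and 181, agree with the 33:1 and 180:1 given in the text.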
3.7F Can the odds be believed?
The odds, even after adjustment, are often over a million to one, and on
average about 60,000 to 1. We note that all forms of statistical inference
have the equivalents of such strong evidence, but in different forms from
the Bayesian odds calculations. We discuss the believability of such odds
from the standpoint of statistical models, and then from a broader
viewpoint external to the model, allowing for what we call outrageous
events. We examine how one can ever justify strong evidence for
discrimination, and how independent evidence can be built up. We see how
the evidence from upon is reasonable and more defensible for a
pro-Madison finding than it would have been in a pro-Hamilton finding.
We note some potential failings such as computational and other
blunders, fraud and serious errors, which can never be absolutely ruled
out. We offer evidence for the implausibility of Madison's having edited
Hamilton's papers to look like his own writings in the way we assess his
style. A probability calculation shows how a small probability of an
outrageous event has little impact on weak evidence from a statistical
analysis, but does put a bound on strong evidence.
Chapter 4 Theoretical Basis of the Main Study
This chapter is a sequence of technical sections supporting the methods
and results of the main study presented in Chapter 3. We set out the
distributional assumptions, our methods of determining final odds of
authorship, and the logical basis of the inference. We explain our
methods for choosing prior distributions. We develop theory and
approximate methods to explore the adequacy of the assumptions and
to support the methods and the findings.
4.1 The negative binomial distribution
We review and develop properties of the negative binomial family of
distributions.
4.1A Standard properties
For the negative binomial, we set out the frequency function, the
cumulant generating function, the first four cumulants, the
representation as a gamma mixture of Poisson distributions, and the limiting
relationship to the Poisson family.
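For orientation, the gamma-mixture representation and the Poisson limit can be written out in one common parametrization. The symbols κ (shape), θ (scale), and μ (mean) are chosen here for illustration and are not necessarily the notation used in the book.

```latex
% Assumed setup: X | \Lambda \sim \mathrm{Poisson}(\Lambda),
% \Lambda \sim \mathrm{Gamma}(\text{shape } \kappa, \text{ scale } \theta).
\begin{align*}
P(X = x)
  &= \int_0^\infty \frac{e^{-\lambda}\lambda^{x}}{x!}\,
     \frac{\lambda^{\kappa-1} e^{-\lambda/\theta}}{\Gamma(\kappa)\,\theta^{\kappa}}
     \, d\lambda
   = \frac{\Gamma(x+\kappa)}{x!\,\Gamma(\kappa)}
     \left(\frac{1}{1+\theta}\right)^{\kappa}
     \left(\frac{\theta}{1+\theta}\right)^{x},
     \qquad x = 0, 1, 2, \dots \\
\mathrm{E}(X) &= \kappa\theta = \mu, \qquad
\mathrm{Var}(X) = \kappa\theta(1+\theta) = \mu + \frac{\mu^{2}}{\kappa}. \\
\lim_{\kappa \to \infty,\ \kappa\theta = \mu} P(X = x)
  &= \frac{e^{-\mu}\mu^{x}}{x!}.
\end{align*}
```

The variance exceeds the mean by μ²/κ, so 1/κ acts as one possible measure of departure from the Poisson.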
4.1B Distributions of word frequency
The mixture representation motivates the negative binomial as a
distribution of word frequency.
4.1C Parametrization
Several parametrizations of the negative binomial are compared by
criteria of interpretability in several modelings, asymptotic
orthogonality, and stability of value across applications to different words.
We choose the mean and a measure of deviation from the Poisson that
is not the usual choice.
4.1D Estimation
Parameter estimation by maximum likelihood has no closed forms
(except for the mean when all paper lengths are the same). The method of
moments gives initial estimates for use directly or as starting values for
iteration. Explicit method-of-moments estimates and approximate
standard errors are given.
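A minimal method-of-moments sketch is given below for the simplest case only: blocks of equal length and the mean/shape parametrization with variance `mean + mean**2 / kappa`. The book's own estimates allow for unequal paper lengths and use a different non-Poissonness parameter, so this is an assumed simplification, and the counts are invented.

```python
import statistics


def negbin_moment_estimates(counts):
    """Method-of-moments estimates (mean, kappa) for equal-length blocks.

    Matches the sample mean and sample variance to
    variance = mean + mean**2 / kappa. If the sample shows no
    overdispersion, kappa is reported as infinity (the Poisson limit).
    """
    mean = statistics.fmean(counts)
    variance = statistics.variance(counts)  # unbiased, divides by n - 1
    excess = variance - mean
    kappa = mean ** 2 / excess if excess > 0 else float("inf")
    return mean, kappa


# Hypothetical occurrence counts of one word in ten equal blocks of text.
counts = [0, 1, 0, 3, 2, 0, 5, 1, 0, 2]
mean_hat, kappa_hat = negbin_moment_estimates(counts)
print(f"mean = {mean_hat:.2f}, kappa = {kappa_hat:.2f}")
```

Such values are crude for rare words but serve as starting points for an iterative fit.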
4.2 Analysis of the papers of known authorship
We treat the choice of prior distributions, the determination of the
posterior distribution, and the computational problem in finding
posterior modes.
4.2A The data: notations and distributional assumptions
Notation and formal distributional assumptions are set out for all
words and all papers of known authorship for negative binomial and
Poisson models.
4.2B Object of the analysis
The odds factor for authorship of any unknown paper is a ratio of
posterior expectations, taken over the distribution of parameters
posterior to the data on papers of known authorship. A modal
approximation is natural and leads to the determination of posterior
modal estimates as a principal intermediate goal of the analysis.
4.2C Prior distributions: assumptions
For each word, two negative binomial parameters describe Hamilton's
usage, and two describe Madison's usage. These four are reparametrized
to a form in which a sampling model for a pool of words is in accord with
empirical support from studies of method-of-moments estimates. Table
4.2-1 presents the method-of-moments estimates for 22 function words.
A parametric family of prior distributions is assumed with five
hyperparameters that we call underlying constants. The 21 sets of underlying
constants used in sensitivity studies are listed in Table 4.2-2.
4.2D The posterior distribution
For any choice of underlying constants, the posterior distributions are
independent across words. For each word, the posterior is a
four-dimensional density known up to its normalizing constant. The posterior
mode and the Hessian matrix of second derivatives of the logarithmic
density are determined by a Newton-Raphson iterative algorithm.
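The Newton-Raphson idea can be shown in one dimension. The sketch below is a toy stand-in for the four-dimensional problem: Poisson counts with a gamma prior on a single rate, chosen because the posterior mode then has a closed form to check against. The prior constants, the data, and the function name are assumptions made for the example.

```python
def posterior_mode_newton(counts, shape, rate, start, tol=1e-12, max_iter=100):
    """Locate the posterior mode of a Poisson rate under a Gamma(shape, rate) prior.

    The log posterior density is, up to a constant,
        a * log(lam) - b * lam,  a = shape - 1 + sum(counts),  b = rate + n.
    Returns the mode and the second derivative of the log density there
    (the one-dimensional analogue of the Hessian matrix).
    """
    a = shape - 1 + sum(counts)
    b = rate + len(counts)
    lam = start
    for _ in range(max_iter):
        gradient = a / lam - b
        hessian = -a / lam ** 2
        lam_new = lam - gradient / hessian  # Newton-Raphson step
        if lam_new <= 0:                    # safeguard: stay in the valid region
            lam_new = lam / 2.0
        if abs(lam_new - lam) < tol:
            lam = lam_new
            break
        lam = lam_new
    return lam, -a / lam ** 2


# Hypothetical counts and prior; start from the sample mean.
counts = [2, 0, 1, 3, 1]
mode, hessian = posterior_mode_newton(counts, shape=2.0, rate=1.0, start=1.4)
print(f"mode = {mode:.6f}, closed form = {(2.0 - 1 + sum(counts)) / (1.0 + len(counts)):.6f}")
```

The second derivative at the mode measures how sharply peaked the posterior is, which is what makes it useful for judging a modal approximation.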
4.2E The modal estimates
The posterior modal estimates are the main output of the empirical
Bayesian analysis of the papers of known authorship and the main
input for assessing the evidence of authorship on any unknown paper.
The Hessian matrices are important for exploring the adequacy of
approximations. The modal estimates for the 30 final words and one
choice of prior were set out in Table 4.2-3.
4.2F An alternative choice of modes
Modes of asymmetric densities are not ideal for approximating posterior
expectations. Some inexpensive improvements come from using modes
of densities relative to a measure element other than Lebesgue measure.
For the gamma- and beta-like prior densities used here, these relative
modes are equivalent to a change in the underlying constants.
4.2G Choice of initial estimate
Iterative procedures require starting values; method-of-moment
estimates are natural candidates but are inadequate for low-frequency words
where the shrinking effect of the prior density is strong. An approximate
data equivalent of the prior leads to weighted initial estimates of good
quality. Combining tight-tailed priors with long-tailed data distributions
gives rise to special needs that must be faced in the absence of sufficiency
or conjugacy.
4.3 Abstract structure of the main study
We describe an abstract structure for our problem; we derive the
appropriate formulas for our application of Bayes' theorem and give a formal
basis for the method of bracketing the prior distribution. The treatment
is abstracted both from the notation of words and their distributions and
from numerical evaluations.
4.3A Notation and assumptions
Four initial assumptions model the probabilistic relations among the
observables (the data on the disputed papers and the data on the
known-author papers) and the non-observables (the parameters of the
data distributions and the authorship of the disputed papers). The
authorship is the goal of the analysis of The Federalist. The basic
application of Bayes' theorem represents the final odds of authorship as the
product of the initial odds of authorship and an odds factor that involves
the data on the known papers.
4.3B Stages of analysis
The factorization in Section 4.3A divides the analysis into three stages:
choosing data distributions and estimating their parameters, evaluating
the odds factors for the disputed papers, and combining the odds factors
with initial odds of authorship. The first two are heavily statistical.
4.3C Derivation of the odds formula
The fundamental factorization result of Section 4.3A is derived from
four assumptions.
4.3D Historical information
Historical evidence bears on authorship and can be treated as logically
prior to the analysis of the linguistic data. A fifth assumption sets out
what is needed for the statistical evidence that determines our odds
factors to be independent of and acceptable to historians, regardless of
how they assess the historical evidence. This subjective element is
isolated to the assessment of the initial odds.
4.3E Odds for single papers
Odds factors for authorship of a single paper are interesting and
important.
4.3F Prior distributions for many nuisance parameters
Our data consist of word frequencies for more than a hundred words.
Modeling each as distributed independently as a negative binomial
leads to four parameters per word. Estimating hundreds of parameters
with the available data cannot be done safely using a flat prior, or with
any non-Bayesian equivalent such as maximum likelihood. Here, we
consider the abstract notion of modeling the behavior of the
word-frequency parameters as sampled from a hyperpopulation. The
hyperpopulation is modeled as a parametric family of low dimension with
parameters we call underlying constants but for which hyperparameter
has come into common use by 1984. In lieu of an infeasible full Bayesian
analysis, we propose to carry out the main analysis conditional on
assumed known values of the hyperparameters. The hyperparameters
are estimated in a separate analysis and the sensitivity of the main
results to the assumed hyperparameters is explored. The method is
empirical, and the Bayesian logic is examined. Some similarities and
some distinctions from Robbins' empirical Bayes procedures are
noted.
4.3G Summary
4.4 Odds factors for the negative binomial model
We develop properties of the Poisson and negative binomial families of
distributions. The discussion of appropriate shapes for the likelihood
ratio function may suggest new ways to choose the form of distributions.
4.4A Odds factors for an unknown paper
The odds factor for an unknown paper is the product, over words, of
a ratio of expectations of two negative binomial probabilities, the
numerator expectation with respect to the posterior distribution of the
Hamilton parameters, the denominator with respect to the posterior
distribution of the Madison parameters for the word.
4.4B Integration difficulties in evaluation of λ
For any word, the posterior distribution for the four parameters is
determined up to a normalizing constant. To get the marginal
distributions of the two Hamilton or of the two Madison parameters would
require quadrature or other approximation. The calculation of the
exact odds factor λ for any word and unknown paper then is a ratio of
two four-dimensional integrals, a formidable calculation that we bypass
by the modal approximation.
4.4C Behavior of likelihood ratios
With known parameters and a single word, the odds factor is a simple
likelihood ratio depending on the frequency of the word in the unknown
paper. Likelihood ratios whose logarithms are monotone or even linear
are popular in statistical theory, and arise for Poisson and other
exponential family models. For representing intuitive assessment of evidence,
shapes that redescend toward zero for very high (and suspect)
frequencies are appealing. The behavior for the negative binomial is
examined. It is not linear, but is unbounded, so truncation rules were
set up to prevent any word from contributing more strongly than the
extreme observed in the 98 papers of known authorship.
4.4D Summary
Further work is needed to develop asymptotic expansions and
well-designed quadrature and Monte Carlo methods to evaluate the integrals
that arise in Bayesian analyses. Also needed is a greater range of
appropriate shapes for log likelihood ratios.
4.5 Choosing the prior distributions
We give methods for choosing sets of underlying constants to bracket
the prior distributions and we explore the effects of varying the prior
on the log odds. Choices are based in part on empirical analysis but also on
heuristic considerations of reasonableness, analogy, and tractability.
4.5A Estimation of β₁ and β₂: first analysis
The first two hyperparameters β₁ and β₂ measure the spread of the
prior distribution of the differential word rate. A variance components
analysis of the observed mean word rates in 90 function words stratified
according to total frequency of use leads to estimates of β₁ and β₂ for the
pool of function words. The stratification can be collapsed to give a better
estimate of β₁. We apply the jackknife procedure with eight random
subgroups to produce a standard error for the estimated β₁.
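A grouped (delete-one-group) jackknife with eight random subgroups can be sketched as follows. The estimator, the data, and the function name are placeholders invented for illustration; the book's estimator of the hyperparameter is a variance-components calculation that is not reproduced here.

```python
import math
import random
import statistics


def grouped_jackknife_se(values, estimator, n_groups=8, seed=0):
    """Standard error of estimator(values) by a delete-one-group jackknife.

    Values are shuffled into n_groups random subgroups; the estimator is
    recomputed with each subgroup left out in turn.
    """
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    groups = [shuffled[i::n_groups] for i in range(n_groups)]
    leave_one_out = []
    for omitted in range(n_groups):
        kept = [v for j, grp in enumerate(groups) if j != omitted for v in grp]
        leave_one_out.append(estimator(kept))
    center = statistics.fmean(leave_one_out)
    spread = sum((t - center) ** 2 for t in leave_one_out)
    return math.sqrt((n_groups - 1) / n_groups * spread)


# Hypothetical differential rates for 24 words; estimator is their variance.
demo_rng = random.Random(1)
rates = [demo_rng.gauss(0.5, 0.1) for _ in range(24)]
se = grouped_jackknife_se(rates, statistics.variance)
print(f"variance = {statistics.variance(rates):.5f}, jackknife SE = {se:.5f}")
```

When each subgroup holds a single value and the estimator is the mean, this formula reduces to the usual standard error of the mean, which gives a quick sanity check.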
4 5B Estimation of fi1 and f}2: second analysis 128
If posterior modal estimates of the word rate parameters were used as if
exact to estimate hyperparameters, those measuring variation would be
too small because of the shrinking effect of the Bayesian estimation For
an analogous binomial problem, the extent of underestimation is deter
mined and used as an informal guide to the actual situation
4.5C Estimation of β₃ 130
The hyperparameter β₃, measuring the spread of differential
non-Poissonness, is assessed informally from the frequency distribution of
method-of-moments estimates and from the posterior modal estimates
of differential non-Poissonness. These tend respectively to show too
much and too little variation and bracket β₃. A weakness from an
assumed symmetry is considered.
4.5D Estimation of β₄ and β₅ 131
These two hyperparameters, which measure the mean and variance of the
composite non-Poissonness, are assessed by informal analyses.
4.5E Effect of varying the set of underlying constants 132
The sensitivity of the final log odds factors to the choice of underlying
constants or hyperparameters is examined by selective comparisons
among the 21 sets chosen to bracket the priors. An appropriate response
measure is a proportional change to the log odds, and a 12 per cent
change up or down from the primary choice is judged an adequate
allowance. The main effect of changing each hyperparameter is
examined, as are some interactions.
4.5F Upon: a case study 135
The effect of choice of prior on the estimated rates and non-Poissonness
parameters for the highly discriminating word upon illustrates some
possible strange effects of tail behavior of a prior interacting with a
four-dimensional likelihood surface. Use of priors conjugate to the likelihood
holds few surprises, even when the priors and likelihood are quite
inconsistent, because prior and likelihood effectively represent equivalent
and exchangeable data. Our gamma- and beta-like priors have tight
tails, and in extreme situations can dominate the broader tails of the
negative binomial likelihoods and strongly shift the parameters from
the observed rates. This behavior stands in contrast to analyses with
flat priors and tight-tailed data distributions.
4.5G Summary 138
Sensitivity to choice of priors is modest relative to other sources of
variation. The study of Section 4.5F suggests a point likely important
throughout Bayesian inference: the effect of small tails of the prior is
very different from the effect of small tails of the data distribution.
4.6 Magnitudes of adjustments required by the modal approximation to the
odds factor 138
We study, by example, the effect of using the posterior mode as if it were
exact. To make the assessment we develop some general asymptotic
theory of posterior densities.
4.6A Ways of studying the approximation 138
The odds factor is a ratio of two expectations, usually with respect to a
concentrated posterior distribution. An expectation can be approximated
by the integrand evaluated at the mean or, to a higher order, by the
next delta-method adjustment using covariances. We have only the
posterior modes and the Hessian matrix of second derivatives at
the mode, and want to use that information to assess the modal
approximation.
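The two orders of approximation named above can be shown in one dimension. This is a minimal sketch, not the book's calculation: the function name is hypothetical, and the test case (a normal variable with g = exp, where the exact expectation is known in closed form) is chosen only because the answer can be checked.

```python
import math


def delta_expectation(g, d2g, mean, var):
    """First- and second-order delta-method approximations to E[g(X)].

    g is the integrand, d2g its second derivative; mean and var are the
    mean and variance of X.
    """
    first = g(mean)  # integrand evaluated at the mean
    second = first + 0.5 * d2g(mean) * var  # next term, using the variance
    return first, second


# Hypothetical check: X ~ Normal(mean, var) and g = exp, for which
# E[exp(X)] = exp(mean + var / 2) exactly.
mean, var = 0.3, 0.04
first, second = delta_expectation(math.exp, math.exp, mean, var)
exact = math.exp(mean + var / 2)
```

When only a mode and a Hessian are available, one option is to substitute the mode for the mean and the inverse of the negative Hessian of the log posterior for the variance, treating both as if exact.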
4.6B Normal theory for adjusting the negative binomial modal
approximation 140
We further transform the four parameters for each word to a form in which
a normal posterior is a plausible approximation. We use the mode and
inverse of the Hessian matrix in the new parametrization as if they were
the exact mean and variance matrix. We apply the first two terms of the
delta method approximations to the required expectations, and study the
changes in log odds for five words, including the three strongest
discriminators and two of the strongest rare words. The modal approximation
gives log odds that are too large (in magnitude), and a 15 per cent
reduction in total log odds is a rough bound for the effect.
4.6C Approximations to expectations 146
The delta method is based on means and covariances. The Laplace
integral expansion for a posterior density gives the equivalent
approximation in terms of the posterior mode, and second and third derivatives
at the mode. Using modes relative to specially chosen density elements
can reduce the role of the third derivatives. Normal, beta, and gamma
distributions motivate choices of density elements. Multivariate
extensions are set forth.
4.6D Notes on asymptotic methods 152
The asymptotic basis of the approximations developed and used in the
|
any_adam_object | 1 |
author | Mosteller, Frederick 1916-2006 Wallace, David L. |
author_GND | (DE-588)118940627 |
author_facet | Mosteller, Frederick 1916-2006 Wallace, David L. |
author_role | aut aut |
author_sort | Mosteller, Frederick 1916-2006 |
author_variant | f m fm d l w dl dlw |
building | Verbundindex |
bvnumber | BV000298011 |
callnumber-first | J - Political Science |
callnumber-label | JK155 |
callnumber-raw | JK155 |
callnumber-search | JK155 |
callnumber-sort | JK 3155 |
callnumber-subject | JK - United States |
classification_rvk | HF 184 QH 233 QH 234 SK 830 SK 840 |
ctrlnum | (OCoLC)10726742 (DE-599)BVBBV000298011 |
dewey-full | 342.73/029 |
dewey-hundreds | 300 - Social sciences |
dewey-ones | 342 - Constitutional and administrative law |
dewey-raw | 342.73/029 |
dewey-search | 342.73/029 |
dewey-sort | 3342.73 229 |
dewey-tens | 340 - Law |
discipline | Law English Studies / American Studies Mathematics Economics |
edition | 2. ed. |
format | Book |
genre | (DE-588)4522595-3 Fallstudiensammlung gnd-content |
genre_facet | Fallstudiensammlung |
id | DE-604.BV000298011 |
illustrated | Not Illustrated |
indexdate | 2024-07-09T15:11:53Z |
institution | BVB |
isbn | 0387909915 3540909915 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-000181698 |
oclc_num | 10726742 |
open_access_boolean | |
owner | DE-12 DE-384 DE-739 DE-355 DE-BY-UBR DE-824 DE-19 DE-BY-UBM DE-706 DE-83 DE-188 |
owner_facet | DE-12 DE-384 DE-739 DE-355 DE-BY-UBR DE-824 DE-19 DE-BY-UBM DE-706 DE-83 DE-188 |
physical | XXXVII, 303 S. |
psigel | TUB-nveb |
publishDate | 1984 |
publishDateSearch | 1984 |
publishDateSort | 1984 |
publisher | Springer |
record_format | marc |
series2 | Springer series in statistics |
spelling | Mosteller, Frederick 1916-2006 Verfasser (DE-588)118940627 aut Applied Bayesian and classical inference the case of The Federalist papers Frederick Mosteller ; David L. Wallace 2. ed. New York, NY [u.a.] Springer 1984 XXXVII, 303 S. txt rdacontent n rdamedia nc rdacarrier Springer series in statistics 1. Aufl. u.d.T.: Mosteller, Frederick: Inference and disputed authorship. 1964 Hamilton, Alexander 1757-1804 (DE-588)118545302 gnd rswk-swf Federalist The federalist (DE-588)4202673-8 gnd rswk-swf Englisch English language Word frequency Case studies Mathematical linguistics Case studies Statistik (DE-588)4056995-0 gnd rswk-swf Bayes-Entscheidungstheorie (DE-588)4144220-9 gnd rswk-swf Englisch (DE-588)4014777-0 gnd rswk-swf Worthäufigkeit (DE-588)4132191-1 gnd rswk-swf (DE-588)4522595-3 Fallstudiensammlung gnd-content Hamilton, Alexander 1757-1804 (DE-588)118545302 p The federalist (DE-588)4202673-8 u Worthäufigkeit (DE-588)4132191-1 s Statistik (DE-588)4056995-0 s DE-604 Bayes-Entscheidungstheorie (DE-588)4144220-9 s Englisch (DE-588)4014777-0 s Wallace, David L. Verfasser aut 1. Auflage Mosteller, Frederick The Federalist HEBIS Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=000181698&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Mosteller, Frederick 1916-2006 Wallace, David L. Applied Bayesian and classical inference the case of The Federalist papers Hamilton, Alexander 1757-1804 (DE-588)118545302 gnd Federalist The federalist (DE-588)4202673-8 gnd Englisch English language Word frequency Case studies Mathematical linguistics Case studies Statistik (DE-588)4056995-0 gnd Bayes-Entscheidungstheorie (DE-588)4144220-9 gnd Englisch (DE-588)4014777-0 gnd Worthäufigkeit (DE-588)4132191-1 gnd |
subject_GND | (DE-588)118545302 (DE-588)4202673-8 (DE-588)4056995-0 (DE-588)4144220-9 (DE-588)4014777-0 (DE-588)4132191-1 (DE-588)4522595-3 |
title | Applied Bayesian and classical inference the case of The Federalist papers |
title_auth | Applied Bayesian and classical inference the case of The Federalist papers |
title_exact_search | Applied Bayesian and classical inference the case of The Federalist papers |
title_full | Applied Bayesian and classical inference the case of The Federalist papers Frederick Mosteller ; David L. Wallace |
title_fullStr | Applied Bayesian and classical inference the case of The Federalist papers Frederick Mosteller ; David L. Wallace |
title_full_unstemmed | Applied Bayesian and classical inference the case of The Federalist papers Frederick Mosteller ; David L. Wallace |
title_old | Mosteller, Frederick The Federalist |
title_short | Applied Bayesian and classical inference |
title_sort | applied bayesian and classical inference the case of the federalist papers |
title_sub | the case of The Federalist papers |
topic | Hamilton, Alexander 1757-1804 (DE-588)118545302 gnd Federalist The federalist (DE-588)4202673-8 gnd Englisch English language Word frequency Case studies Mathematical linguistics Case studies Statistik (DE-588)4056995-0 gnd Bayes-Entscheidungstheorie (DE-588)4144220-9 gnd Englisch (DE-588)4014777-0 gnd Worthäufigkeit (DE-588)4132191-1 gnd |
topic_facet | Hamilton, Alexander 1757-1804 Federalist The federalist Englisch English language Word frequency Case studies Mathematical linguistics Case studies Statistik Bayes-Entscheidungstheorie Worthäufigkeit Fallstudiensammlung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=000181698&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT mostellerfrederick appliedbayesianandclassicalinferencethecaseofthefederalistpapers AT wallacedavidl appliedbayesianandclassicalinferencethecaseofthefederalistpapers |