

Empirical Bayes estimation via data fission

Nikolaos Ignatiadis (ignat@uchicago.edu)
Dennis L. Sun (dlsun@stanford.edu)

October 2024
arXiv:2410.12117v1 [stat.ME] 15 Oct 2024

Abstract: We demonstrate how data fission, a method for creating synthetic replicates from single observations, can be applied to empirical Bayes estimation. This extends recent work on empirical Bayes with multiple replicates to the classical single-replicate setting. The key insight is that after data fission, empirical Bayes estimation can be cast as a general regression problem.

This note was prepared as a comment on “Data Fission: Splitting a Single Data Point,” by James Leiner,
Boyan Duan, Larry Wasserman, and Aaditya Ramdas, a discussion paper in the Journal of the American
Statistical Association.

We congratulate Leiner, Duan, Wasserman, and Ramdas on a stimulating article that joins an
elegant method to a compelling application. Their article focuses primarily on applications to selective
inference. In this comment, we demonstrate how data fission can be applied to a very different problem:
empirical Bayes (EB) estimation [Robbins, 1956, Efron, 2019].
In the EB framework, we observe $X_1, \ldots, X_n$ generated by
$$\theta_i \overset{\text{iid}}{\sim} H, \qquad X_i \mid \theta_i \overset{\text{ind}}{\sim} p(\cdot \mid \theta_i). \tag{1}$$

If the prior $H$ were known, then the Bayes estimator $\hat{\theta}_i^{B} = \mathbb{E}_H[\theta_i \mid X_i]$ would be optimal, achieving the Bayes risk. In EB, the prior $H$ is not known, so the goal is to construct an estimator $\hat{\theta}_i^{EB}$ that approximates $\hat{\theta}_i^{B}$ using all of $X_1, \ldots, X_n$. Ignatiadis et al. [2023] demonstrated a general way to construct EB estimators when i.i.d. replicates $X_{i1}, \ldots, X_{iK}$ are available for each $\theta_i$. This method, called Aurora, regresses one replicate on the rest.
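As a rough illustration of the replicate-based idea in the mean-estimation setting of this note, consider the following minimal sketch. It is our own simplification, not the implementation of Ignatiadis et al. [2023]: we regress one replicate on the order statistics of the rest using a generic off-the-shelf regressor, and we do not average over the choice of held-out replicate; the function name aurora_replicates is ours.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def aurora_replicates(x_mat):
    """x_mat: (n, K) array, row i holding the K >= 2 replicates for theta_i."""
    held_out = x_mat[:, 0]                      # replicate to be regressed
    covariates = np.sort(x_mat[:, 1:], axis=1)  # order statistics of the rest
    model = RandomForestRegressor(random_state=0).fit(covariates, held_out)
    return model.predict(covariates)            # estimates of theta_1, ..., theta_n

The key property this exploits is that, conditionally on $\theta_i$, the held-out replicate is unbiased for $\theta_i$ and independent of the other replicates, so the regression function being estimated is the posterior mean.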
But what if there is only one $X_i$ per $\theta_i$? This is where data fission comes in. We can use data fission to generate synthetic replicates and apply Aurora (a code sketch of the full procedure follows the steps below):

1. As in Leiner et al. [2023], we construct functions $f_\tau$ and $g_\tau$ appropriate to the likelihood $p(\cdot \mid \theta_i)$, with the EB-specific requirement that $\mathbb{E}_H[g_\tau(X_i) \mid f_\tau(X_i)] = \mathbb{E}_H[\theta_i \mid f_\tau(X_i)]$.
   • For a normal likelihood (with variance $\sigma^2$), the construction in Leiner et al. [2023], $f_\tau(X_i) = X_i + \tau Z_i$ and $g_\tau(X_i) = X_i - \tau^{-1} Z_i$ with independent $Z_i \sim \mathrm{Normal}(0, \sigma^2)$, works.
   • For a Poisson likelihood, we rescale the construction in Leiner et al. [2023] (also used in the EB context by Brown et al. [2013]). Let $Z_i \overset{\text{ind}}{\sim} \mathrm{Bin}(X_i, 1 - \tau)$ for $\tau \in (0, 1)$, and set $f_\tau(X_i) = Z_i/(1 - \tau)$ and $g_\tau(X_i) = (X_i - Z_i)/\tau$.
   Both constructions satisfy the requirement because, conditionally on $\theta_i$, $g_\tau(X_i)$ is unbiased for $\theta_i$ and independent of $f_\tau(X_i)$.
2. Regress $g_\tau(X_i)$ on $f_\tau(X_i)$, $i = 1, \ldots, n$, using any regression method. Denote the estimated mean function by $\hat{m}(\cdot)$.

3. Estimate $\theta_i$ by $\hat{\theta}_i = \hat{m}(f_\tau(X_i))$.


4. Repeat data fission multiple times and average the resulting estimates.
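A minimal sketch of steps 1–4 in Python follows. This is our own illustration, not the authors' implementation (their code is linked under Reproducibility below); it assumes numpy and scikit-learn, and the function names are ours.

import numpy as np
from sklearn.isotonic import IsotonicRegression

def fission_normal(x, tau, sigma=1.0, rng=None):
    """Normal fission: f = X + tau*Z, g = X - Z/tau, with Z ~ Normal(0, sigma^2)."""
    rng = np.random.default_rng(rng)
    z = rng.normal(0.0, sigma, size=x.shape)
    return x + tau * z, x - z / tau

def fission_poisson(x, tau, rng=None):
    """Rescaled Poisson fission: Z ~ Bin(X, 1 - tau), f = Z/(1-tau), g = (X-Z)/tau."""
    rng = np.random.default_rng(rng)
    z = rng.binomial(x.astype(int), 1.0 - tau)
    return z / (1.0 - tau), (x - z) / tau

def aurora_fission(x, tau, fission, n_reps=100, seed=0):
    """Steps 1-4: fission X, regress g on f, predict at f, average over repetitions."""
    rng = np.random.default_rng(seed)
    estimates = np.zeros_like(x, dtype=float)
    for _ in range(n_reps):
        f, g = fission(x, tau, rng=rng)                   # step 1: synthetic replicates
        m_hat = IsotonicRegression(out_of_bounds="clip")  # step 2: regress g on f
        m_hat.fit(f, g)
        estimates += m_hat.predict(f)                     # step 3: evaluate m_hat at f
    return estimates / n_reps                             # step 4: average over fissions

Isotonic regression is a natural default for step 2 because, for both likelihoods, the oracle $\mathbb{E}_H[\theta_i \mid f_\tau(X_i)]$ is monotone in $f_\tau(X_i)$.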

(a) $X_i \mid \theta_i \sim \mathrm{Normal}(\theta_i, 1)$   (b) $X_i \mid \theta_i \sim \mathrm{Poisson}(\theta_i)$

Figure 1: We simulated $n = 1000$ observations, with $\theta_i$ from the three-point prior $H = \mathrm{Unif}\{1, 4, 7\}$, and $X_i \mid \theta_i$ from either a normal or a Poisson distribution. Each $X_i$ was split into two replicates, $f_\tau(X_i)$ and $g_\tau(X_i)$, for two values of $\tau$. Each panel shows a scatterplot of the replicates for a different value of $\tau$ and a different likelihood, along with the true and estimated mean functions (estimated by isotonic regression). For smaller $\tau$, $f_\tau(X_i)$ contains more information about $\theta_i$, so the oracle estimator $\mathbb{E}[\theta_i \mid f_\tau(X_i)]$ is more similar to the Bayes estimator $\mathbb{E}[\theta_i \mid X_i]$. However, $g_\tau(X_i)$ is noisier, which makes the regression task more difficult. Following Leiner et al. [2023], we may interpret the values of $\tau$ for the normal and Poisson simulations as the split of the Fisher information between $g_\tau(X_i)$ and $f_\tau(X_i)$: in the left panels (small $\tau$), we have a split of 4:96, and in the right panels (medium $\tau$), a split of 14:86.

Method                                                                              Gaussian MSE   Poisson MSE
MLE ($\hat{\theta}_i = X_i$)                                                            1.00           4.01
NPMLE (via REBayes; Koenker and Gu, 2017)                                               0.60           2.02
Aurora with isotonic regression (small $\tau$)                                          0.69           2.17
Aurora with isotonic regression (medium $\tau$)                                         0.64           2.05
Bayes estimator (oracle, $\hat{\theta}_i^{B} = \mathbb{E}_H[\theta_i \mid X_i]$)        0.59           1.97

Table 1: Comparison of mean squared error (MSE) for different methods in the normal and Poisson
simulations. The settings correspond to the panels of Figure 1. We compute the MSE by averaging
over 100 Monte Carlo replicates of each simulation. We apply Aurora by averaging over 100 repetitions
of data fission.
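For concreteness, a hypothetical driver along the following lines (reusing aurora_fission and fission_normal from the sketch above, with $\tau = 0.2$ as our illustrative stand-in for "small $\tau$") runs the normal simulation once; its output will not match Table 1 exactly, since Table 1 averages over 100 Monte Carlo replicates and also includes the NPMLE.

import numpy as np

rng = np.random.default_rng(1)
n = 1000
theta = rng.choice([1.0, 4.0, 7.0], size=n)   # theta_i ~ H = Unif{1, 4, 7}
x = rng.normal(theta, 1.0)                    # X_i | theta_i ~ Normal(theta_i, 1)

theta_hat = aurora_fission(x, tau=0.2, fission=fission_normal, n_reps=100)
print("MLE MSE:   ", np.mean((x - theta) ** 2))          # should be near 1.00
print("Aurora MSE:", np.mean((theta_hat - theta) ** 2))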

If the mean function $\hat{m}(\cdot)$ is learned well, then the risk of Aurora should approximately match the risk of the oracle estimator $\mathbb{E}_H[\theta_i \mid f_\tau(X_i)]$. The latter can be made arbitrarily close to the Bayes estimator by choosing $\tau$ small, so that $f_\tau(X_i) \approx X_i$. However, the variance of $g_\tau(X_i)$ also increases for small $\tau$, making the mean function harder to learn.
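For the normal construction, a short calculation, which we include for intuition, makes this trade-off explicit. Directly from the definitions of $f_\tau$ and $g_\tau$, conditionally on $\theta_i$,
$$f_\tau(X_i) \mid \theta_i \sim \mathrm{Normal}\!\left(\theta_i,\, (1 + \tau^2)\,\sigma^2\right), \qquad g_\tau(X_i) \mid \theta_i \sim \mathrm{Normal}\!\left(\theta_i,\, (1 + \tau^{-2})\,\sigma^2\right),$$
so the Fisher information about $\theta_i$ splits between $g_\tau(X_i)$ and $f_\tau(X_i)$ in the ratio $\tau^2 : 1$. As $\tau \to 0$, $f_\tau(X_i)$ retains almost all of the information in $X_i$, while the variance of $g_\tau(X_i)$ diverges.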
Figure 1 illustrates Aurora on simulated data. The scatterplots show replicates generated by data fission for two values of $\tau$ and for normal and Poisson likelihoods. The true mean functions $\mathbb{E}[g_\tau(X_i) \mid f_\tau(X_i)]$ are graphed as dotted black lines. The mean functions are estimated by isotonic regression, and the estimates $\hat{m}(\cdot)$ are graphed as solid teal lines.
Table 1 compares the MSE of Aurora with the MLE ($\hat{\theta}_i = X_i$), the nonparametric maximum likelihood estimator (NPMLE), and the oracle Bayes estimator. The NPMLE performs best on these well-specified, low-dimensional EB problems, as is known in the literature [Jiang and Zhang, 2009, Koenker and Mizera, 2014, Polyanskiy and Wu, 2021]. Yet Aurora remains competitive, developing a classical connection between EB and regression [Stigler, 1990] into a general methodology (see Jana et al. [2023], Barbehenn and Zhao [2023] for related ideas). Because Aurora is built on top of regression, it generalizes naturally to situations with side information [Ignatiadis and Wager, 2019] and high-dimensional likelihoods [Daras et al., 2023]. And unlike the NPMLE, Aurora works even when $p(\cdot \mid \theta_i)$ is not fully specified, as long as data fission or data splitting is possible.

Reproducibility. We provide code to reproduce our numerical results on GitHub:

https://github.com/nignatiadis/empirical-bayes-data-fission-comment

References
Alton Barbehenn and Sihai Dave Zhao. A nonparametric regression alternative to empirical Bayes
approaches to simultaneous estimation. arXiv preprint, arXiv:2205.00336, 2023.
Lawrence D Brown, Eitan Greenshtein, and Ya’acov Ritov. The Poisson compound decision problem
revisited. Journal of the American Statistical Association, 108(502):741–749, 2013.
Giannis Daras, Kulin Shah, Yuval Dagan, Aravind Gollakota, Alex Dimakis, and Adam Klivans. Ambient diffusion: Learning clean distributions from corrupted data. In Advances in Neural Information Processing Systems, volume 36, 2023.
Bradley Efron. Bayes, oracle Bayes and empirical Bayes. Statistical Science, 34(2):177–201, 2019.

Nikolaos Ignatiadis and Stefan Wager. Covariate-powered empirical Bayes estimation. In Advances
in Neural Information Processing Systems, volume 32, 2019.

Nikolaos Ignatiadis, Sujayam Saha, Dennis L. Sun, and Omkar Muralidharan. Empirical Bayes mean
estimation with nonparametric errors via order statistic regression on replicated data. Journal of
the American Statistical Association, 118(542):987–999, 2023.
Soham Jana, Yury Polyanskiy, Anzo Z. Teh, and Yihong Wu. Empirical Bayes via ERM and Rademacher complexities: The Poisson model. In Proceedings of Thirty Sixth Conference on Learning Theory, volume 195 of Proceedings of Machine Learning Research, pages 5199–5235, 2023.

Wenhua Jiang and Cun-Hui Zhang. General maximum likelihood empirical Bayes estimation of normal
means. The Annals of Statistics, 37(4):1647–1684, 2009.
Roger Koenker and Jiaying Gu. REBayes: An R package for empirical Bayes mixture methods.
Journal of Statistical Software, 82(8), 2017.

Roger Koenker and Ivan Mizera. Convex optimization, shape constraints, compound decisions, and
empirical Bayes rules. Journal of the American Statistical Association, 109(506):674–685, 2014.
James Leiner, Boyan Duan, Larry Wasserman, and Aaditya Ramdas. Data Fission: Splitting a Single
Data Point. Journal of the American Statistical Association, pages 1–12, 2023.
Yury Polyanskiy and Yihong Wu. Sharp regret bounds for empirical Bayes and compound decision
problems. arXiv preprint, arXiv:2109.03943, 2021.
Herbert Robbins. An empirical Bayes approach to statistics. In Proceedings of the Third Berkeley
Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of
Statistics, pages 157–163. The Regents of the University of California, 1956.

Stephen M. Stigler. The 1988 Neyman Memorial Lecture: A Galtonian perspective on shrinkage
estimators. Statistical Science, 5(1), 1990.
