Estimating a bivariate density when there are extra data on one or both components

Assume we have a dataset, Z say, from the joint distribution of random variables X and Y, and two further, independent datasets, also denoted X and Y, drawn from the marginal distributions of the random variables X and Y, respectively. We wish to combine the three datasets X, Y and Z to construct an estimator of the joint density. This problem is readily solved in some parametric circumstances. For example, if the joint distribution were normal then we would combine data from the datasets X and Z to estimate the mean and variance of the random variable X; proceed analogously to estimate the mean and variance of Y; but use data from Z alone to estimate E(XY). However, the problem is more difficult in a nonparametric setting. There we suggest a copula-based solution, which has potential benefits even when the marginal datasets X and Y are empty. For example, if the copula density is sufficiently smooth in the region where we wish to estimate it, then the effective dimension of the structure that links the marginal distributions is relatively low, and the joint density of X and Y can be estimated with a high degree of accuracy. Similar improvements in performance are available if the marginals are close to being independent. We suggest using wavelet estimators to approximate the copula density, which in cases of statistical interest can be unbounded along boundaries. Our techniques are also useful for solving recently considered related problems, for example where the marginal distributions are determined by parametric models. The methodology therefore has application beyond the context that motivated it, and it is readily extended to more general multivariate settings.
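The parametric (normal) case described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `pooled_normal_estimate` and its argument names are hypothetical, and the choice of the divide-by-n variance is an assumption made only for simplicity. The sketch pools the paired sample with each marginal sample to estimate means and variances, and uses the paired sample alone for E(XY).

```python
import numpy as np

def pooled_normal_estimate(z, x_extra, y_extra):
    """Hypothetical helper: moment estimates for a bivariate normal model.

    z        -- array of shape (n, 2), paired observations of (X, Y)
    x_extra  -- 1-D array of additional observations of X only (may be empty)
    y_extra  -- 1-D array of additional observations of Y only (may be empty)
    """
    z = np.asarray(z, dtype=float)
    # Pool the paired data with the extra marginal data for each component.
    x_all = np.concatenate([z[:, 0], np.asarray(x_extra, dtype=float)])
    y_all = np.concatenate([z[:, 1], np.asarray(y_extra, dtype=float)])

    # Means and variances use all available observations of each variable
    # (divide-by-n variance, an assumption made for simplicity).
    mean = np.array([x_all.mean(), y_all.mean()])
    var_x = x_all.var()
    var_y = y_all.var()

    # E(XY) can only be estimated from the paired sample.
    exy = np.mean(z[:, 0] * z[:, 1])
    cov_xy = exy - mean[0] * mean[1]

    cov = np.array([[var_x, cov_xy], [cov_xy, var_y]])
    return mean, cov
```

Because the entries of the covariance matrix are estimated from different samples, the resulting matrix is not guaranteed to be positive definite in small samples; a practical implementation would need to check this before plugging the estimates into a normal density.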

Peter Hall, Natalie Neumeyer

eng

text/html

http://hdl.handle.net/10419/22618

Adapt according to the presented license agreement and reference the original author.