Correcting sample selection bias in maximum entropy density estimation

Part of Advances in Neural Information Processing Systems 18 (NIPS 2005)



Miroslav Dudík, Steven Phillips, Robert E. Schapire


We study the problem of maximum entropy density estimation in the presence of known sample selection bias. We propose three bias correction approaches. The first one takes advantage of unbiased sufficient statistics which can be obtained from biased samples. The second one estimates the biased distribution and then factors the bias out. The third one approximates the second by only using samples from the sampling distribution. We provide guarantees for the first two approaches and evaluate the performance of all three approaches in synthetic experiments and on real data from species habitat modeling, where maxent has been successfully applied and where sample selection bias is a significant problem.
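To make the "factors the bias out" step of the second approach concrete, here is a minimal sketch on a finite domain. It rests on assumptions not stated in the abstract: the domain is a small finite set, the biased distribution is proportional to the true distribution times a known, strictly positive sampling distribution, and the biased distribution is given exactly rather than estimated by maxent. The function names `apply_bias` and `factor_out_bias` are hypothetical and are not taken from the paper.

```python
def _normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def apply_bias(true_probs, sampling_probs):
    """Assumed bias model: biased(x) is proportional to true(x) * sampling(x)."""
    if len(true_probs) != len(sampling_probs):
        raise ValueError("distributions must be defined on the same domain")
    return _normalize([p * s for p, s in zip(true_probs, sampling_probs)])


def factor_out_bias(biased_probs, sampling_probs):
    """Divide the biased distribution by the sampling distribution, then renormalize."""
    if len(biased_probs) != len(sampling_probs):
        raise ValueError("distributions must be defined on the same domain")
    if any(s <= 0 for s in sampling_probs):
        raise ValueError("sampling distribution must be strictly positive")
    return _normalize([q / s for q, s in zip(biased_probs, sampling_probs)])


# Illustrative numbers only, not data from the paper.
true_probs = [0.5, 0.3, 0.2]
sampling_probs = [0.1, 0.3, 0.6]  # known selection bias over the three points

biased_probs = apply_bias(true_probs, sampling_probs)
recovered_probs = factor_out_bias(biased_probs, sampling_probs)
```

When the biased distribution is known exactly, as in this sketch, dividing by the sampling distribution and renormalizing recovers the true distribution. In practice the biased distribution is itself an estimate, so the recovered distribution inherits its estimation error.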