Probabilistic models of image statistics underlie many approaches in image analysis and processing. An important class of such models has variables whose dependency graph is a tree. If the hidden variables take values in a finite set, most computations with the model can be performed exactly, including likelihood calculation and training with the EM algorithm. Crouse et al. developed one such model, the hidden Markov tree (HMT), and took particular care to limit its complexity. We argue that it is beneficial to allow more complex tree-structured models, describe the use of information-theoretic penalties to choose the model complexity, and present experimental results to support these proposals. For these experiments, we use what we call the hierarchical image probability (HIP) model. The differences between the HIP and HMT models include the use of multivariate Gaussians to model the distributions of local vectors of wavelet coefficients and the use of different numbers of hidden states at each resolution. We demonstrate the broad utility of image distributions by applying the HIP model to classification, synthesis, and compression across a variety of image types, namely electro-optical images, synthetic aperture radar images, and mammograms (digitized X-rays). In all cases, we compare with the HMT.