Mammographic computer-aided diagnosis (CAD) systems offer a low-cost approach to double reading. Although results to date have been promising, current systems often suffer from unacceptably high false-positive rates. Improved methods are needed for setting system parameters optimally, particularly for the statistical models that are common elements of most CAD systems. In this research project, we developed a framework for building hierarchical pattern recognizers for CAD based on information-theoretic criteria such as the minimum description length (MDL). As part of this framework, we developed a hierarchical image probability (HIP) model. HIP models are well suited to information-theoretic methods because they are generative: they assign a probability to each image, from which a description length can be computed. We developed architecture search algorithms based on information theory and applied them to mammographic CAD. The resulting mass detection algorithm, for example, reduced the false-positive rate of a CAD system by 30% with no loss of sensitivity. We also showed that the criteria reliably correlate with performance on new data. The framework enables many other applications not possible with most pattern recognition algorithms, including rejection of novel examples that cannot be reliably classified, synthesis of artificial images to investigate the structure learned by the model, and image compression with performance comparable to JPEG.
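The abstract does not give the exact selection criterion used in the architecture search. As a hedged illustration of the general two-part MDL principle only (not the HIP model or the authors' search algorithm), the sketch below chooses the number of bins of a 1-D histogram density: each candidate's description length is the data cost, -log2 of the maximum-likelihood probability, plus a (k/2) log2 n penalty for its k free parameters, and the candidate with the shortest total code wins. The histogram stands in for an "architecture" choice; the function names `histogram_nll_bits`, `description_length_bits`, and `select_bins_by_mdl` are hypothetical.

```python
import math
from collections import Counter


def histogram_nll_bits(data, m):
    """Negative log2-likelihood of data in [0, 1) under the ML m-bin histogram density.

    Hypothetical helper for illustration; bin width is 1/m, so the ML density
    in bin j is count_j / (n * (1/m)) = count_j * m / n.
    """
    n = len(data)
    counts = Counter(min(int(x * m), m - 1) for x in data)
    return -sum(c * math.log2(c * m / n) for c in counts.values())


def description_length_bits(data, m):
    """Two-part MDL code length: data cost + (k/2) log2 n for k = m - 1 free parameters."""
    n = len(data)
    return histogram_nll_bits(data, m) + 0.5 * (m - 1) * math.log2(n)


def select_bins_by_mdl(data, max_bins=50):
    """Return the bin count (1..max_bins) with the shortest total description length."""
    return min(range(1, max_bins + 1), key=lambda m: description_length_bits(data, m))


if __name__ == "__main__":
    import random

    random.seed(0)
    skewed = [random.random() ** 3 for _ in range(2000)]  # mass concentrated near 0
    print("MDL-selected bins:", select_bins_by_mdl(skewed))
```

The penalty term is what keeps the criterion from always preferring the most complex model: extra bins lower the data cost, but each added parameter costs about (1/2) log2 n bits, so more structure is accepted only when the data justify it.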