In this paper we explore the use of feature selection techniques to improve the generalization performance of pattern recognizers for computer-aided diagnosis. We apply a modified version of the sequential forward floating selection (SFFS) algorithm of Pudil et al. to the problem of selecting an optimal feature subset for mass detection in digitized mammograms. The complete feature set (35 features) consists of multi-scale tangential and radial gradients in the mammogram region of interest. We train a simple multi-layer perceptron (MLP) on the feature subset chosen by SFFS and compare its performance, using a jackknife procedure, to that of an MLP trained on the complete feature set. Results indicate that the number of features chosen varies across the jackknife sets (12 ± 4) and that the test performance, measured by the area under the ROC curve, Az, is no better with the chosen feature subset than with the entire feature set. These results may be attributed to the noisy feature set and to the small data set used for training and testing. We next modify the feature selection technique by using the jackknife results to compute the frequency with which each feature is selected. We then construct a classifier from the N most frequently selected features, with N chosen to maximize performance on the training data. We find that adding this "hand-tuning" component to the feature selection process reduces the feature set from 35 to 8 features while yielding a statistically significant increase in generalization performance (p < 0.015).
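
To illustrate the frequency-based selection step described above, the following minimal Python sketch ranks features by how often they were selected across the jackknife sets and then keeps the top-N prefix that maximizes a training score. It is a sketch under stated assumptions, not the implementation used in the study: the function names, the toy subsets, and the `toy_score` callback are hypothetical, and in practice the score would be the training-set performance of the MLP (for example Az), which is not reproduced here.

```python
from collections import Counter
from typing import Callable, List, Sequence


def rank_by_selection_frequency(subsets: Sequence[Sequence[int]]) -> List[int]:
    """Rank feature indices by how many jackknife sets selected them."""
    # Count each feature at most once per jackknife set.
    counts = Counter(f for subset in subsets for f in set(subset))
    # Most frequent first; ties broken by feature index for determinism.
    return sorted(counts, key=lambda f: (-counts[f], f))


def choose_top_n(ranked: Sequence[int],
                 score: Callable[[Sequence[int]], float]) -> List[int]:
    """Return the top-N prefix of `ranked` with the highest score.

    `score` is an assumed callback standing in for training-set
    performance; on ties the smaller N is kept.
    """
    best_n, best_score = 1, score(ranked[:1])
    for n in range(2, len(ranked) + 1):
        s = score(ranked[:n])
        if s > best_score:
            best_n, best_score = n, s
    return list(ranked[:best_n])


# Toy data (hypothetical): feature subsets chosen in four jackknife sets.
toy_subsets = [[0, 3, 7], [3, 7, 9], [3, 5, 7], [0, 3]]


def toy_score(features: Sequence[int]) -> float:
    # Hypothetical stand-in for training performance: rewards the
    # "useful" features {0, 3, 7} and penalizes subset size.
    return len(set(features) & {0, 3, 7}) - 0.1 * len(features)


ranked = rank_by_selection_frequency(toy_subsets)
chosen = choose_top_n(ranked, toy_score)
```

Restricting the search to prefixes of the frequency ranking means only as many candidate subsets as there are ranked features need to be scored, rather than every possible subset.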