Purpose: The segmentation and quantitation of drusen is important for the diagnosis and monitoring of retinal disease, particularly age-related macular degeneration (AMD). Many approaches to drusen segmentation have been based on heuristics, such as morphological analysis and image thresholding, and require significant parameter tuning. Here we describe an unsupervised method for segmenting drusen in fundus images. Methods: Color fundus images were acquired using the Topcon 50EX fundus camera and digitized on the Nikon 2000 Coolscan. Two sets of images were considered: a) leveled, using a published quadratic and spline model of green (G) channel background (R. T. Smith, Arch. Ophthalmol., 2005; 123:200-6), and b) unleveled. A "gold standard" was constructed by manual segmentation of the drusen by a retinal specialist. Data from the three channels (R, G, B) were processed using non-negative matrix factorization (NMF; D. Lee, NIPS, 2001; 13:556-62). NMF decomposes a multivariate data set (X) into two matrices: a matrix of spectral signatures (S) and their corresponding spatial distributions (A). The spatial distribution matrix (A) was analyzed using K-means clustering to label every pixel in the image as one of three classes. The entire method is unsupervised and does not require manual intervention. This method has previously been demonstrated for NMR-based metabolomics (S. Du, Proc. EMBS, Sept. 2005). Results: On visual inspection, the labelings produced by the algorithm tended to correspond to drusen, blood vessels, and normal retinal tissue. Comparison of the segmentation with the manually defined gold standard showed a range of sensitivity for detection of drusen (leveled data 51%-88%, unleveled 70%-75%) and a specificity of 78%-97% (leveled) and 71%-97% (unleveled) across four cases.
Interestingly, in many cases false negatives produced by the algorithm fell along the borders of the gold-standard drusen, indicating that individual drusen were detected by the algorithm, though their size was underestimated. Conclusions: The NMF method is able to recover spectral signatures of drusen, as well as of other anatomical structures in the retina, using only the three bands of color fundus images. Related work in our group is exploring the use of hyperspectral imaging, which provides richer spectral signatures and is even better suited to an NMF-based decomposition/segmentation.
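The NMF-plus-clustering pipeline described above can be sketched in a few lines of Python. Everything here (the random stand-in for fundus data, the component count, the use of scikit-learn's NMF and KMeans) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for a fundus image: 3 channels (R, G, B) x N pixels.
# In the abstract's notation, X ~ S @ A, where S holds spectral signatures
# and A their corresponding spatial distributions.
n_pixels = 1000
X = rng.random((3, n_pixels))

nmf = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
S = nmf.fit_transform(X)          # 3 x 3 spectral signatures
A = nmf.components_               # 3 x n_pixels spatial distributions

# Cluster pixels by their columns of A into three classes
# (drusen, blood vessels, normal tissue in the abstract).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(A.T)
print(labels.shape)  # one class label per pixel
```

Because both steps are unsupervised, the only free choices are the number of NMF components and the number of clusters, matching the abstract's claim that no manual intervention is needed.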
Purpose: These studies were carried out to determine the spectral signatures of retinal structures that can be used for analysis and automated diagnosis of retinal disease. Since conventional pathology cannot determine the nature of retinal lesions in situ, non-invasive methods must be used to quantify retinal pathology. One such method may be the application of multispectral and hyperspectral imaging with automated data analysis. Methods: Unstained cross-sections of rabbit retinas mounted on glass slides were placed under a microscope and illuminated with white light. A monochromatic CCD camera combined with a liquid crystal tunable filter operating in the visible range was used to record images at 10 nm intervals between 440 nm (blue) and 720 nm (red). Two methods were used to characterize the spectral signatures of the constituent tissues. The first required manual segmentation and consisted of determining the gray-scale values as a function of the frequency of the reflected light for the neural retina, the RPE, the choroid, and the sclera. The second used an unsupervised decomposition, non-negative matrix factorization (NMF), for the same four layers. NMF decomposes a multivariate data set into two matrices: a matrix of spectral signatures and their corresponding spatial distributions. Results: The reflectance spectrum of each of the tissue layers obtained by the manual method formed a characteristic curve (signature), distinct in the frequency range studied and different for each layer. The signatures recovered using NMF have spatial distributions consistent with those obtained by manual segmentation. Both methods were consistent in recovering four distinct signatures. Conclusions: The spectral signature of each of the retinal layers investigated appears to be unique by both methods. As such, these signatures lend themselves to being a tool for diagnosing retinal lesions that may have a distinct neural retinal, RPE, choroidal, or scleral component.
This paper discusses the creation of a system for computer-aided communication through automated analysis and processing of electrooculogram signals. In situations of disease or trauma, a person may be unable to communicate with others through standard means such as speech or typing. Eye movement tends to be one of the last remaining active muscle capabilities for people with neurodegenerative disorders such as amyotrophic lateral sclerosis (ALS), also known as Lou Gehrig's disease. Thus, there is a need for eye-movement-based systems to enable communication. To meet this need, the Telepathix system was designed to accept eye movement commands, denoted by looking to the left, looking to the right, and looking straight ahead, to navigate a virtual keyboard. Using a ternary virtual keyboard layout and a multiple-feature classification model, a typing speed of 6 letters per minute was achieved.
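As a back-of-the-envelope check on the ternary design: each eye command selects one of three branches, so a balanced ternary tree over a 26-letter alphabet needs three selections per letter. The sketch below makes this concrete; the tree-balancing details are assumptions, and only the ternary branching and the 6 letters/minute figure come from the abstract:

```python
import math

def selections_per_letter(alphabet_size: int, branching: int = 3) -> int:
    """Selections needed to isolate one symbol with a balanced b-ary menu tree.

    Each command (left / right / straight ahead) narrows the remaining
    alphabet by a factor of `branching`, so ceil(log_b(N)) steps suffice.
    """
    return math.ceil(math.log(alphabet_size, branching))

steps = selections_per_letter(26)
print(steps)  # 3

# At the reported 6 letters per minute, this would imply at most
# steps * 6 = 18 eye-movement commands per minute on average.
```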
Magnetic resonance spectroscopic imaging (MRSI) is utilized clinically in conjunction with anatomical MRI to assess the presence and extent of brain tumors and to evaluate treatment response. Unfortunately, the clinical utility of MRSI is limited by significant variability of in vivo spectra. Spectral profiles show increased variability due to partial coverage of large voxel volumes, infiltration of normal brain tissue by tumors, innate tumor heterogeneity, and measurement noise. This study investigates spectral separation as a novel quantification tool that addresses these problems directly by quantifying the abundance (i.e., volume fraction) of each tissue type within a voxel, instead of the conventional estimation of metabolite concentrations from spectral resonance peaks. Results on 20 clinical cases of brain tumors show reduced cross-subject variability. This reduced variability leads to improved discrimination between high- and low-grade gliomas, confirming the physiological relevance of the extracted spectra. Further validation on phantom data demonstrates the accuracy of the estimated abundances. These results show that the proposed spectral analysis method can improve the effectiveness of MRSI as a diagnostic tool.
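Although the abstract does not spell out its estimation procedure, the core idea of recovering per-voxel tissue abundances from a measured spectrum can be illustrated with non-negative least-squares unmixing against a toy spectral library (all values below are synthetic, and the two "tissue types" are placeholders):

```python
import numpy as np
from scipy.optimize import nnls

# Toy spectral library: columns are reference spectra for two tissue types
# (e.g., normal tissue and tumor), sampled at 5 spectral points.
S = np.array([[1.0, 0.2],
              [0.8, 0.4],
              [0.5, 0.9],
              [0.2, 1.0],
              [0.1, 0.7]])

# A voxel spectrum that mixes the two sources 70% / 30% by volume fraction.
true_abundance = np.array([0.7, 0.3])
x = S @ true_abundance

# Non-negative least squares recovers the volume fractions directly,
# rather than fitting individual metabolite resonance peaks.
abundance, residual = nnls(S, x)
print(np.round(abundance, 2))  # [0.7 0.3]
```

The non-negativity constraint is what makes the recovered coefficients interpretable as physical volume fractions.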
Purpose: Current efforts for assessing macular disease have focused on the retina, for instance the quantitation of drusen distributions. Retinal imaging, however, does not provide a complete picture of the nature of the expected vision loss. It is also important to consider how the visual cortex responds to the resulting scotomata and distortion of the retinal input. Methods: In this study we used an anatomically and physiologically detailed spiking neuron model of V1 (Wielaard and Sajda, Cerebral Cortex. 2006 16(11) 1531-1545) to investigate the effect of macular disease on cortical activity, tuning, and selectivity. We segmented fundus images and used them as "masks" for input to our cortical simulations. The model was probed using simulated drifting sinusoidal grating stimuli. All simulations were done using monocular input. We analyzed the firing rates and orientation selectivity of cells in parvocellular (4Cβ) and magnocellular (4Cα) versions of the cortical model as a function of normal and abnormal retinal input. To analyze orientation selectivity we computed the circular variance (CV) across the population of cells. Results: For the magnocellular model we found an overall reduction of the firing rates of all cortical neurons. However, there were no obvious "holes" of activity indicative of clusters of inactive neurons whose spatial position could be correlated with the spatial distribution of drusen. Analysis of orientation selectivity showed a dramatic reduction in selectivity between the normal and abnormal cases. For the abnormal cases there was a shift of the CV distribution toward 1.0, indicating poorer orientation selectivity of the cells in 4Cα. For 4Cβ the results were somewhat different. Unlike the magnocellular model, the parvocellular model showed clusters of inactivity that correlated with the spatial distribution of drusen.
However, the orientation selectivity was not significantly affected, with the distributions for normal and abnormal cases being indistinguishable. Conclusions: The magno system appears to fill in spatial information, though at the cost of a loss of orientation selectivity, whereas the parvo system maintains orientation selectivity but with scotomata present in the cortical activity. This analysis is only "first order" in that drusen are treated purely as masking the visual input, when in fact their effect on retinal ganglion cell activity can be more complex. Nonetheless, the simulations offer some insight into how the responses of cortical neurons are affected by retinal disease.
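The circular variance used above has a standard definition for orientation tuning; a minimal sketch, with synthetic tuning curves rather than model output, is:

```python
import numpy as np

def circular_variance(rates, orientations_deg):
    """Circular variance of an orientation tuning curve.

    CV = 1 - |sum_k r_k * exp(i*2*theta_k)| / sum_k r_k,
    where the factor 2 accounts for the 180-degree periodicity of
    orientation. CV near 0 means sharp tuning; CV near 1 means no
    orientation selectivity, as in the abnormal 4Calpha cases above.
    """
    theta = np.deg2rad(np.asarray(orientations_deg, dtype=float))
    r = np.asarray(rates, dtype=float)
    return 1.0 - np.abs(np.sum(r * np.exp(2j * theta))) / np.sum(r)

orients = np.arange(0, 180, 22.5)
flat = np.ones_like(orients)                      # untuned cell
tuned = np.exp(-((orients - 90) / 20.0) ** 2)     # sharply tuned cell

cv_flat = circular_variance(flat, orients)
cv_tuned = circular_variance(tuned, orients)
print(round(cv_flat, 3))   # 1.0
print(round(cv_tuned, 3))  # well below 1: selective cell
```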
Proton magnetic resonance spectroscopic imaging (1H MRSI) is a noninvasive metabolic imaging technique that has emerged as a potentially powerful tool for complementing structural magnetic resonance imaging (MRI) in the clinical evaluation of neurological disorders and diagnostic decision making. However, the relative complexity of methods that are currently available for analyzing the derived multi-dimensional metabolic imaging data has slowed incorporation of the technique into routine clinical practice. This paper discusses this impediment to widespread clinical use of 1H MRSI and then describes an automated data analysis approach that promises to facilitate use of the technique in the evaluation of intracranial lesions, with the potential to enhance the specificity of MRI and improve clinical decision-making.
We investigate using a previously developed spiking neuron model of layer 4 of primary visual cortex (V1) as a recurrent network whose activity is subsequently linearly decoded, given a set of complex visual stimuli. Our motivation is based on the following: 1) linear decoders have proven useful in analyzing a variety of neural signals, including spikes, firing rates, local field potentials, voltage-sensitive dye imaging, and scalp EEG; 2) linear decoding of activity generated from highly recurrent, nonlinear networks with fixed connections has been shown to provide universal computational capabilities, with such methods termed liquid state machines (LSMs) and echo state networks (ESNs); 3) in LSMs and ESNs, often little is assumed about the recurrent network architecture. However, it is likely that for a given type of stimulus/input, the architecture of a biologically constrained recurrent network is important, since it shapes the spatio-temporal correlations across the neuronal population, which can potentially be exploited efficiently by an appropriate decoder. We conduct experiments using a two-alternative forced choice paradigm of face and car discrimination, where a set of 12 face (Max Planck Institute face database) and 12 car grey-scale images are used. All the images (512 x 512 pixels, 8 bits/pixel) have identical Fourier magnitude spectra. The phase spectra of the images are manipulated using the weighted mean phase method to introduce noise, resulting in a set of images graded by phase coherence. The sequence of images is presented to the V1 model in a block design, where a face or car image is flashed for 50 ms, followed by an interval of 200 ms in which a mean-luminance background is shown. We use a linear decoder to map the spatio-temporal activity in the recurrent V1 model to a decision on whether the input stimulus is a face or a car.
We employ a sparsity constraint on the decoder in order to control the dimension of the effective feature space. Sparse decoding is also consistent with previous research efforts on decoding multi-unit recordings and optical imaging data. We evaluate the accuracy of the linear decoding of activity in the V1 model and compare it to a set of psychophysical data obtained using the same stimuli. We construct a neurometric function for the decoder, with the variable of interest being the stimulus phase coherence. We find that linear decoding of neural activity in a recurrent V1 model can yield discrimination accuracy that is at least as good as, if not better than, human psychophysical performance for relatively complex visual stimuli. Thus substantial information for super-accurate decoding remains at the level of V1, and the loss of information needed to better match behavioral performance is predicted to occur downstream in the decision-making process. We also find a small improvement in discrimination accuracy when a spatio-temporal word is used relative to a spatial-only word, providing insight into the utility of a temporal vs. a rate code for behaviorally relevant decoding.
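A sparsity-constrained linear decoder of the kind described can be sketched with L1-regularized logistic regression on flattened spatio-temporal words. The spike statistics, dimensions, and injected signal below are synthetic placeholders, not output of the V1 model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "spatio-temporal words": spike counts for 200 neurons x 10 time
# bins, flattened into one feature vector per trial. Only a small subset of
# neuron/time features actually carries the face-vs-car signal.
n_trials, n_neurons, n_bins = 200, 200, 10
y = rng.integers(0, 2, n_trials)                         # 1 = face, 0 = car
X = rng.poisson(2.0, (n_trials, n_neurons * n_bins)).astype(float)
informative = rng.choice(n_neurons * n_bins, 20, replace=False)
X[:, informative] += 3.0 * y[:, None]

# The L1 penalty imposes the sparsity constraint on the decoder weights,
# so only a fraction of the neuron/time features is used.
decoder = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
decoder.fit(X, y)
n_used = np.count_nonzero(decoder.coef_)
print(n_used, "of", n_neurons * n_bins, "features used")
```

The number of nonzero weights, controlled by `C`, plays the role of the "relatively small fraction of neurons" needed for accurate decoding.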
Purpose: How visual stimuli map to neural activity and ultimately perception is important not only for understanding normal visual function but also for assessing how abnormalities and pathologies, for instance those arising in the retina, may ultimately affect perception. In this study we use a model of primary visual cortex (V1) as a substrate for mapping visual stimuli to a large population of neural activity and subsequently compare the accuracy of decoding this activity to the accuracy of human subjects for the same visual discrimination task. Methods: We use a previously developed spiking neuron model of V1 as a recurrent network whose activity is subsequently linearly decoded, providing a link to perception in the context of a visual discrimination task. We introduce a sparsity constraint in the decoder, given the hypothesis that information is sparsely distributed in a highly recurrent network such as V1. A spatio-temporal word is constructed from the population spike trains, as input to the sparse decoder, to exploit the full dynamics of the model. We evaluate the decoding accuracy using a two-alternative forced choice paradigm (face versus car discrimination) in which we control the difficulty of the task by modulating the phase coherence of the images. We compare neurometric functions, constructed via sparse decoding of the neural activity in the model, to psychometric functions obtained from 10 human subjects. Results: In general, we find that relatively small fractions of the neurons are required for highly accurate decoding of the visual stimuli. We find that linear decoding of neural activity in a recurrent V1 model can yield discrimination accuracy that is at least as good as, if not better than, human psychophysical performance for relatively complex visual stimuli.
Thus substantial information for super-accurate decoding remains at the level of V1, and the loss of information needed to better match behavioral performance is predicted to occur downstream in the decision-making process. We also find marginally better decoding accuracy when fully utilizing the spatio-temporal dynamics compared with a static decoding strategy. Conclusions: We have demonstrated how we can link the visual stimulus to perception via a mapping through a spiking neuron model of the early visual system. Future work will consider this as a framework for analyzing the perceptual effect of retinal vision loss in patients with mild yet progressive macular disease, comparing predictions to those obtained strictly from the analysis of the spatial distribution of retinal abnormalities such as drusen.
Purpose: Clinical assessment of macular disease typically relies on direct analysis of retinal imaging, which does not necessarily provide a complete picture of expected vision loss. A potential advancement is a framework for predicting how retinal disease affects cortical activity and ultimately perceptual performance. Methods: Fundus images from low-vision patients with macular disease were segmented to create masks, which were used to simulate disease-specific distortion at the level of the retina. A 2-AFC perceptual task was designed with the goal of discriminating face and car images in the presence of noise. 10 subjects with normal vision performed the task and their results were assessed via psychometric curves. We simulated the cortical activity given the stimuli and used linear decoding of spike trains to generate neurometric curves for the model. The sparse linear decoder was optimized to maximize discrimination, not to match subjects' psychometric curves. We simulated the cortical activity of low-vision subjects using the mask-distorted stimuli and carried out the decoding analysis in the same manner as for normal subjects. Results: Shown are the mean psychometric curve for normal subjects (red), individual subjects (light red), the mean neurometric curve for simulated "normal" subjects (black), and a simulated "low-vision" subject (gray). The mean simulated "normal" subject has a neurometric curve that is a reasonable match to normal subjects, for the most part falling within the inter-subject variation. For the simulated "low-vision" case, the neurometric curve is shifted to the right, indicating degradation in perceptual performance. Conclusions: Our results are promising in that they predict healthy subjects' perceptual performance and also show systematic shifts in performance for simulated "low-vision" cases. Future work will quantify the predictive value of the model for a population of low-vision patients.
We investigated neural correlates of target detection in the electroencephalogram (EEG) during a free-viewing search task and analyzed signals locked to saccadic events. We adopted stimuli similar to ones we used previously to study target detection in serial presentations of briefly flashed images. Subjects performed the search task for multiple random scenes while we simultaneously recorded 64 channels of EEG and tracked subjects' eye position. For each subject we identified target saccades (TS) and distractor saccades (DS). TS were always saccades directly to the target that were followed by a correct behavioral response (button press); for DS, we used saccades in correctly responded trials having no target (28% of the trials). We sampled the sets of TS and DS such that they were equalized/matched for saccade direction and duration, ensuring that no information in the saccade properties themselves discriminated their type. We aligned the EEG to the saccade and used logistic regression (LR), in the space of the 64 electrodes, to identify components discriminating a TS from a DS on a single-trial basis. Specifically, LR was applied to narrow time windows (50 ms), and discrimination was performed for windows at varying latencies relative to the saccade. We found significant discriminating activity in the EEG both before and after the saccade: average discriminability across 7 subjects was AUC = 0.64 at 80 ms before the saccade and AUC = 0.68 at 60 ms after the saccade (p < 0.01, established using bootstrap resampling). Between these time periods we saw a substantial reduction in discriminating activity (for 7 subjects, mean AUC = 0.59). We conclude that we can identify neural signatures of detection both before and after the saccade, indicating that the subject anticipates where the target is before he/she makes the last saccade to foveate and respond.
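The windowed discrimination analysis can be sketched as follows. The EEG here is synthetic noise with an injected post-"saccade" effect, and the in-sample AUC scoring is a simplification of the single-trial, bootstrap-validated analysis described above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# Synthetic saccade-locked EEG: trials x channels x time samples
# (64 channels, 400 samples spanning the window around the saccade).
n_trials, n_chan, n_samp = 300, 64, 400
y = rng.integers(0, 2, n_trials)           # 1 = target saccade, 0 = distractor
eeg = rng.normal(0, 1, (n_trials, n_chan, n_samp))
eeg[y == 1, :8, 250:300] += 0.5            # injected post-"saccade" activity

# Logistic regression applied to successive 50-sample windows at varying
# latencies, each scored by the area under the ROC curve (AUC).
aucs = []
for start in range(0, n_samp - 50, 50):
    X = eeg[:, :, start:start + 50].mean(axis=2)   # per-channel mean in window
    p = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    aucs.append(roc_auc_score(y, p))
print(np.argmax(aucs))  # window index with the strongest discrimination
```

Plotting `aucs` against window latency yields the kind of discriminability time course summarized in the abstract.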
Purpose: Retinal imaging does not necessarily provide a complete picture of expected vision loss in macular disease. We use a psychophysics test coupled with computational modeling to relate pathologies, found via fundus imaging, to expected perceptual function for a group of AMD patients. Methods: We recruited 10 low-vision patients with mild yet progressive AMD, as well as 10 age-matched healthy controls, at the Edward Harkness Eye Institute, Columbia Presbyterian Medical Center. Both patients and controls, whose ages ranged from 65 to 84, were corrected to between 20/20 and 20/50 visual acuity. All subjects participated in a 2-AFC perceptual task, in monocular mode, in which they were required to discriminate face and car images in the presence of variable noise. Color fundus photographs were collected using a Zeiss FF 450 Plus camera. Fundus images were segmented using a robust and automated algorithm to quantify disease-specific pathologies on the retina. We mapped each patient's retinal pathology to cortical activity and neurometric curves using a computational model of V1 and a decoding framework. We compared the psychometric curves between controls and patients, and investigated the quality of the neurometric predictions. We further analyzed the correlation between the neurometric curves and statistics of the drusen in the masks. Results: AMD patients had substantially lower discrimination accuracies compared to controls. Moreover, the degradation in the discrimination accuracy of AMD patients was much more pronounced at higher signal-to-noise ratio (SNR) levels of the stimulus. We observed a positive correlation (r = 0.67) between the fraction of drusen-free area on the mask and the predicted perceptual discrimination at the highest SNR level of the stimulus.
Conclusions: The psychophysics and modeling framework we developed provides a quantitative assessment of the perceptual consequences of AMD and can potentially serve as a method for relating clinical findings in retinal imaging to perceptual function.
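The reported correlation analysis amounts to a Pearson correlation between a retinal statistic and a model prediction. A sketch with made-up numbers follows; the abstract's r = 0.67 comes from real patient data, which these invented values do not reproduce:

```python
import numpy as np

# Hypothetical per-patient values: fraction of drusen-free retinal area
# (from the segmented fundus mask) vs. predicted discrimination accuracy
# at the highest stimulus SNR. Both columns are illustrative only.
drusen_free_frac = np.array([0.95, 0.88, 0.81, 0.76, 0.70,
                             0.62, 0.55, 0.50, 0.44, 0.39])
pred_accuracy = np.array([0.97, 0.90, 0.92, 0.85, 0.80,
                          0.83, 0.74, 0.72, 0.70, 0.66])

# Pearson correlation coefficient between the two measures.
r = np.corrcoef(drusen_free_frac, pred_accuracy)[0, 1]
print(round(r, 2))
```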
Purpose: Drusen, the hallmark lesions of age-related macular degeneration (AMD), are biochemically heterogeneous, and the identification of their biochemical distribution is key to understanding AMD. Yet the challenges are to develop imaging technology and analysis tools that respect the physical generation of the hyperspectral signal in the presence of noise and multiple mixed sources while maximally exploiting the full data dimensionality to uncover clinically relevant spectral signatures. Methods: 7 patient eyes with drusen were imaged with the snapshot hyperspectral camera previously described (doi:10.1117/1.2434950). Regions of interest (ROIs) containing drusen were identified in each image. Multiple images were acquired of one eye. We performed statistical intra-subject analysis to investigate the reproducibility of non-negative matrix factorization (NMF) in AMD patients with different types of drusen. Given a data matrix D and a positive integer r, the NMF problem is to compute a decomposition D ≈ WH, with r being the low-rank factor, W the basis vectors, and H the linear encoding representing the mixing coefficients. Results: Figure 1 shows central slices of 5 different ROIs for patient P=c. In each ROI a drusen sensitivity spectrum was recovered with a response peak between 550 and 600 nm. This spectrum had low variability across different ROIs within a patient (mean standard error σ = 0.01) and between patients (σ = 0.041). Conclusions: Snapshot hyperspectral images analyzed with NMF, which imposes physically realistic positivity constraints on the mixing process, recovered spectral profiles that reliably identified drusen. The recovered spectra were consistently similar for drusen in different areas of the macula from the same eye and also in different eyes. Our results suggest that hyperspectral imaging can detect biochemically meaningful components of drusen.
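The factorization D ≈ WH and the peak check can be sketched with scikit-learn on a synthetic two-source ROI; the "drusen" spectrum below is an assumed Gaussian peaking near 575 nm, not measured data:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(2)

# Synthetic hyperspectral ROI: wavelengths 450-700 nm in 10 nm steps, mixed
# from a "drusen" source peaking near 575 nm and a smooth background.
wavelengths = np.arange(450, 710, 10)
drusen = np.exp(-((wavelengths - 575) / 30.0) ** 2)
background = np.linspace(1.0, 0.6, wavelengths.size)

n_pixels = 500
mix = rng.random((2, n_pixels))                   # per-pixel abundances
D = np.column_stack([drusen, background]) @ mix   # wavelengths x pixels

# D ~ W @ H with low-rank factor r = 2: W holds the basis spectra,
# H the linear encoding (mixing coefficients).
model = NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0)
W = model.fit_transform(D)
H = model.components_

# Locate each recovered component's response peak; one should fall in the
# 550-600 nm band, as reported for the drusen sensitivity spectrum.
peaks = wavelengths[W.argmax(axis=0)]
print(peaks)
```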
Purpose Excitation of RPE autofluorescence with different wavelengths produces different but closely related spectral data. We hypothesized that simultaneous decomposition of multiple hyperspectral datasets into major spectral signatures and their spatial distributions with non-negative matrix factorization (NMF) could exploit these relationships to recover results superior to factoring any single hypercube. Methods Pure RPE/BrM flat mounts were separately excited at 436-460 nm and 480-510 nm, and hyperspectral emission data were captured by methods described in detail in the Johri and Agarwal abstracts. Standard NMF factors a hypercube A into the product of matrices W and H (Fig 1a), where W holds the spectra of the recovered sources and H carries their spatial localizations (abundance images). In our formulation, we always retrieve 4 spectral signatures for RPE and one for BrM. We paired each signal found at 436 nm excitation to its corresponding signal at 480 nm, and linked the two datasets by requiring that the spatial localizations of the paired signals be exactly the same, because they come from the same compound (Fig 1b). Results Fig. 2 (a, b) shows the 5 spectra recovered from the fovea of a 34 y/o female donor at 436 nm and 480 nm with standard NMF. The spectra are clearly paired according to their emission maxima. Fig 2c shows the results when the data are decomposed simultaneously: the 10 spectra are clearly paired in shape and location, suggesting single species. Each pair corresponds to one clearly defined abundance image. Conclusions Simultaneous decomposition of multiple RPE hyperspectral datasets is superior to standard NMF at breaking down a complex spectrum representing a mixture of fluorophores into its individual spectral signals, hence providing better candidates for biochemical identification.
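One simple way to realize the shared-localization constraint is to stack the two excitation datasets along the spectral axis, so [A1; A2] ≈ [W1; W2]H with a single abundance matrix H. The sketch below uses Lee-Seung multiplicative updates on synthetic data and is not necessarily the authors' exact formulation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two hyperspectral datasets from the same tissue (e.g., 436 nm and 480 nm
# excitation) share spatial abundances H but have different spectra W1, W2.
# Stacking along the spectral axis enforces identical localization for
# paired signals: [A1; A2] = [W1; W2] @ H.
n_bands, n_pixels, r = 30, 400, 3
H_true = rng.random((r, n_pixels))
W1_true = rng.random((n_bands, r))
W2_true = rng.random((n_bands, r))
A = np.vstack([W1_true, W2_true]) @ H_true

# Standard multiplicative-update NMF (Lee & Seung) on the stacked matrix.
W = rng.random((2 * n_bands, r)) + 0.1
H = rng.random((r, n_pixels)) + 0.1
for _ in range(300):
    H *= (W.T @ A) / (W.T @ W @ H + 1e-12)
    W *= (A @ H.T) / (W @ H @ H.T + 1e-12)

W1, W2 = W[:n_bands], W[n_bands:]   # paired spectra sharing abundances H
err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(round(err, 3))
```

Splitting the recovered `W` back into `W1` and `W2` yields one spectrum per excitation for each source, with a single abundance image per pair, as in Fig 2c.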
Purpose Isolate and compare individual candidate autofluorescence (AF) signals from human RPE/Bruch's membrane (BrM) flat mounts with hyperspectral AF imaging and mathematical modeling across age, retinal location and two excitation wavelengths. Methods RPE/BrM-only flat mounts from 11 belts of normal chorioretinal human tissue (5 donors < 50 yrs, 6 > 80 yrs; 8 females, 3 males) were prepared by removing the retina and choroid under photographic control to maintain foveal position. Spectral microscopy and hyperspectral AF imaging were performed at 2 excitation bands, 436-460 nm and 480-510 nm, with emissions captured using the Nuance FX camera (Caliper Life Sciences, US) between 420 and 720 nm in 10 nm intervals at 3 locations: fovea, parafovea (2-4 mm superior to the fovea, at the rod peak) and periphery (8-10 mm superior, at the highest rod:cone ratio), giving 66 hyperspectral data sets consisting of photon counts per second recorded at each spatial pixel and wavelength in the 40X field. Results Gaussian mixture modeling and mathematical factorization of the hypercubes were applied to extract four RPE candidate spectra for lipofuscin at each location for each donor (see the abstract by Johri et al for details). The four peaks were seen at average wavelengths of 566±6 nm, 604±27 nm, 645±8 nm, and 701±8 nm at the 436-460 nm excitation (Fig. 1) and 558±8 nm, 606±5 nm, 646±8 nm and 694±13 nm at the 480-510 nm excitation across all donors. The peak near 600 nm (A2E-like) was generally the smallest of the four. The emission maxima varied across donors by age and location, but all spectra were present in all but 6/108 data sets. There were no consistent regional or age trends in peak intensities. Conclusions Hyperspectral AF imaging analysis of the RPE ex vivo consistently reports the presence of at least 4 abundant fluorophores with well-defined emission maxima across all studied ages, retinal locations, and excitation wavelengths.
Determining the actual source molecules that produce these abundant signals will be important in understanding RPE physiology.
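The Gaussian-mixture peak extraction can be sketched by fitting a sum of four Gaussians to an emission spectrum; the spectrum below is synthesized from the reported average peak positions rather than taken from the data:

```python
import numpy as np
from scipy.optimize import curve_fit

# Emission wavelengths 420-720 nm in 10 nm steps, with a synthetic spectrum
# built from four Gaussian peaks near the reported maxima
# (566, 604, 645, 701 nm at 436-460 nm excitation).
wl = np.arange(420, 730, 10)

def four_gaussians(x, *p):
    # p = (a1, mu1, s1, a2, mu2, s2, a3, mu3, s3, a4, mu4, s4)
    return sum(p[i] * np.exp(-((x - p[i + 1]) / p[i + 2]) ** 2)
               for i in range(0, 12, 3))

true_params = [1.0, 566, 15, 0.4, 604, 12, 0.7, 645, 14, 0.5, 701, 13]
spectrum = four_gaussians(wl, *true_params)

# Initial guesses placed near the expected peak positions.
p0 = [1, 560, 10, 0.5, 600, 10, 0.5, 650, 10, 0.5, 700, 10]
fit, _ = curve_fit(four_gaussians, wl, spectrum, p0=p0, maxfev=20000)
centers = sorted(fit[1::3])      # recovered emission maxima (nm)
print(np.round(centers).astype(int))
```

On real, noisy spectra the fitted centers would carry the kind of across-donor variability (±6 to ±27 nm) reported above.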
Purpose Identify and quantify candidate AF signals from BrM in RPE/BrM flat-mounts of human donor eyes using ex vivo hyperspectral AF imaging and mathematical modeling. Methods Flat-mounts from 11 human eyes lacking chorioretinal pathology (5 donors < 50 yrs, 6 > 80 yrs) were prepared by removing the retina and choroid and studied at 3 locations with distinct photoreceptor content in the overlying retina: fovea, perifovea (2-4 mm superior to the fovea), and periphery (10-12 mm superior to the fovea). RPE was further removed from a region at each location to provide 33 samples of isolated BrM for spectral microscopy (Zeiss Axio Imager A2 microscope (Carl Zeiss, Jena, Germany) with Plan-Apochromat objective optics; excitation: 430 nm; emission: long-pass fluorescence filter). Hyperspectral AF images were captured at emissions between 420 and 720 nm in 10 nm intervals using the Nuance FX camera (Caliper Life Sciences, US) and saved as data hypercubes with two spatial dimensions and one wavelength dimension. Results Gaussian mixture modeling and mathematical factorization of the hypercubes were applied to extract 4 dominant BrM candidate spectra from each sample. Comparison with lipofuscin spectra independently obtained from these locations showed two shorter-wavelength peaks that were unique to BrM: one always present near 533 nm, and another near 488 nm that was present at a statistically significantly higher rate in the older donor population (Fisher exact test, p = 0.0272) (Table 1). There was also a trend for the 488 nm peak to be present more peripherally (Table 2). Two other peaks were found near 600 nm and 690 nm. The mean values (nm) of the peak centers were: fovea: 698 ± 22.4, 605 ± 15.7, 534 ± 4.0, 489 ± 2.9; parafovea: 688 ± 28.2, 603 ± 19.4, 536 ± 7.2, 492 ± 4.4; periphery: 689 ± 12.6, 602 ± 11.4, 534 ± 3.7, 489 ± 3.3. Conclusions Candidate individual emission spectra for BrM suggest a population of fluorophores. A well-defined source with emission at 488 nm appears to increase with age.
Peaks at 600-690nm resemble those independently determined for RPE lipofuscin at the same locations. Whether these represent bis-retinoids requires further elucidation in tissues subject to lipid extraction. Biochemical identification of these species will be important in understanding BrM physiology in health and disease and for interpreting clinical hyperspectral imaging.
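The age comparison for the 488 nm peak is a standard Fisher exact test on a 2x2 presence/absence table; the counts below are invented for illustration, and the abstract's p = 0.0272 comes from its own data:

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 contingency table: presence of the 488 nm peak in
# samples from older vs. younger donors (counts are made up).
#                 peak present   peak absent
table = [[14, 1],   # older donors
         [7, 8]]    # younger donors

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(round(p_value, 4))
```

The exact test is appropriate here because the per-group sample counts (tens of samples) are too small for a chi-squared approximation.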
Purpose To devise a mathematical algorithm that can extract individual spectral fluorophore components and their spatial localizations from hyperspectral autofluorescence (AF) emission data taken from RPE and Bruch's membrane (BrM) human donor flat mounts (ex vivo). Methods Step 1: Hyperspectral cube acquisition: The AF of eleven pure human RPE/BrM flat mounts was studied at 3 locations (fovea, parafovea and periphery) via excitation at wavelengths of 436-460 nm and 480-510 nm at 40X magnification. The corresponding hyperspectral emission data (hypercubes with two spatial and one spectral dimension) were captured using the Nuance FX camera (Caliper Life Sciences, US). (Further details in the K. Agarwal abstract.) Step 2: Gaussian modeling: We fit the original RPE spectra with mixtures of four Gaussian curves (Fig. 1), which provided single-peak, smooth candidates for individual fluorophore components. Step 3: NMF modeling: We used these candidate spectra to initialize an NMF technique that factors the entire hypercube to recover the constituent source spectra and their spatial localizations while minimizing reconstruction error. We also initialized the NMF with the emission signal from a patch of bare BrM, because BrM, underlying the RPE, contributes its signal throughout. Results NMF models with Gaussian/BrM initialization consistently decomposed RPE AF hypercubes into smooth individual candidate spectra with histologically plausible localizations within the flat-mount images (Fig. 2). For example, the shorter-wavelength spectral component C3 localized to BrM (Fig. 2, Spatial Abundance C3), while the other four, emitting from 575 nm to 700 nm, localized to the lipofuscin compartment. Conclusions The Gaussian/NMF mixture model enabled consistent recovery of candidate spectra for individual RPE fluorophore emission signals with histologically plausible localizations. These spectra should now be matched to their corresponding biochemical components with techniques such as imaging mass spectrometry.
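The Gaussian/BrM-initialized factorization can be sketched with scikit-learn's NMF using `init="custom"`, seeding W with the candidate spectra; all spectra and mixtures below are synthetic stand-ins, not the donor data:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(4)

# Wavelength axis and five candidate source spectra: four Gaussian
# lipofuscin candidates (as from Step 2) plus a "BrM"-like spectrum.
wl = np.arange(420, 730, 10).astype(float)

def gauss(mu, sigma):
    return np.exp(-((wl - mu) / sigma) ** 2)

candidates = np.column_stack([gauss(566, 15), gauss(604, 12),
                              gauss(645, 14), gauss(701, 13),
                              gauss(520, 40)])   # last column: "BrM" signal

# Synthetic hypercube flattened to wavelengths x pixels.
H_true = rng.random((5, 600))
D = candidates @ H_true

# NMF with custom initialization: W0 = candidate spectra, H0 random.
W0 = candidates.copy()
H0 = rng.random((5, 600))
model = NMF(n_components=5, init="custom", max_iter=500, tol=1e-6)
W = model.fit_transform(D, W=W0, H=H0)
H = model.components_

err = np.linalg.norm(D - W @ H) / np.linalg.norm(D)
print(round(err, 4))  # relative reconstruction error
```

Seeding W this way biases the factorization toward smooth, single-peak components, which is the point of the Gaussian/BrM initialization in Step 3.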
Purpose Quantify the hyperspectral AF signal from RPE/Bruch's membrane (BrM) flat mounts. Methods Hyperspectral AF images (hypercubes) were captured from 66 40X fields in 11 RPE/BrM flat mounts from human donor eyes using techniques described in detail in the abstract submitted by K. Agarwal. Briefly, for each 40X field the hypercube has the two spatial dimensions of the field and, at each spatial point, the photon counts recorded at each wavelength, hence the third, or spectral, dimension. For reproducible quantification of these data, exposure times were calibrated so that photon counts per spectral channel fell within the 12-bit linear range of the detector and were then offset by the dark current. Scaled counts-per-second were determined by exposure time (Eqn. 1) and calibrated to a standard fluorescent reference (courtesy of F. Delori) to correct for any variation in the power of the excitation light, yielding quantified hypercubes with units of photon counts per second at each point and wavelength. Results The root mean square (RMS) difference of quantified hypercubes from repeat imaging of the same location was within the noise level (dark current) of the Nuance detector, establishing reproducibility. Separation of the RPE signal from BrM (Fig. 2) and further mathematical analyses of the hypercubes (see abstract of A. Johri) therefore extracted reliable quantitative RPE lipofuscin spectra for individual constituents and their corresponding spatial co-localizations. Conclusions Hyperspectral AF images of human RPE flat mounts may be reliably quantified for use as surrogates for measurement of the abundance of lipofuscin components. Such quantitative information can help guide the analysis of RPE physiology and biochemistry.
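The quantification step can be sketched as follows. Since Eqn. 1 itself is not reproduced in the abstract, the function below simply follows the described recipe (dark-current offset, division by exposure time, reference scaling), with made-up numbers:

```python
import numpy as np

def quantify_hypercube(raw_counts, dark_current, exposure_s, ref_scale=1.0):
    """Convert raw detector counts to calibrated photon counts per second.

    raw_counts: (ny, nx, n_wavelengths) photon counts in the detector's
    12-bit linear range; dark_current is subtracted, the result divided by
    exposure time, and scaled to a standard fluorescent reference.
    """
    counts = np.clip(raw_counts.astype(float) - dark_current, 0, None)
    return ref_scale * counts / exposure_s

# Tiny 1 x 2 x 2 example hypercube (2 spatial points, 2 wavelengths).
raw = np.array([[[1000, 2000], [1500, 2500]]])
cps = quantify_hypercube(raw, dark_current=100, exposure_s=0.5)
print(cps[0, 0])  # [1800. 3800.]
```

With this normalization, repeat acquisitions of the same field should agree to within the detector's dark-current noise, which is the reproducibility criterion used in the Results.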
Detection of events of interest in video involves evidence accumulation across space and time; the observer is required to integrate features from both motion and form to decide whether a behavior constitutes a target event. Do such events that extend in time elicit evoked responses as strong as those associated with instantaneous events, such as the presentation of a static target image? Using a set of simulated scenarios, with avatars/actors having different behaviors, we identified evoked neural activity discriminative of target vs. distractor events (behaviors) at discrimination levels comparable to those for static imagery. EEG discriminative activity was largely in the time-locked evoked response and not in oscillatory activity, with the exception of very low EEG frequency bands such as delta and theta, which simply represent the bands dominating the event-related potential (ERP). The discriminative evoked response activity we see is observed in all target/distractor conditions and is robust across different recordings from the same subjects. The results suggest that we have identified a robust neural correlate of target detection in video, at least for the stimulus set we used, i.e., dynamic behavior of an individual in a low-clutter environment. We discuss implications for using such a neural correlate to build a brain-computer interface (BCI) to search and annotate video. This work was done with Lucas Parra of the City College of New York (CCNY) and Dan Rosenthal and Paul DeGuzman of Neuromatters, LLC.
We believe that this special issue will serve to increase public awareness of and foster discussion on the multiple worldwide BRAIN initiatives, both within and outside the IEEE, providing an impetus for the development of long-term, cost-effective healthcare solutions. We also believe that the topics presented in this special issue will serve as scientific evidence for health and policy advocates of the value of neurotechnologies for improving the neurological and mental health and wellbeing of the general population. Below we briefly highlight the papers and technologies in this special issue.
We hypothesize that certain speaker gestures can convey significant information that is correlated with audience engagement. We propose gesture attributes, derived from speakers’ tracked hand motions, to automatically quantify these gestures from video. We then demonstrate a correlation between gesture attributes and an objective measure of audience engagement, electroencephalography (EEG), in the domain of political debates. We collect 47 minutes of EEG recordings from each of 20 subjects watching clips of the 2012 U.S. Presidential debates. The subjects are examined in aggregate and in subgroups according to gender and political affiliation. We find statistically significant correlations between gesture attributes (particularly extremal pose) and our feature of engagement derived from EEG, both with and without audio. For some stratifications, the Spearman rank correlation reaches as high as ρ = 0.283 with p < 0.05, Bonferroni corrected. From these results, we identify those gestures that can be used to measure engagement, principally those that break habitual gestural patterns.
The current bandwidth for understanding the cognitive and emotional context of a person is much more limited between robots and humans than among humans. Advances in human sensing technologies over the past two decades hold promise for providing online and unique information sources that can lead to deeper insights into human cognitive and emotional state than are currently attainable. However, blind application of the human sensing technologies alone is not a solution. Here, we focus on the integration of neuroscience with robotic technologies for improving social interactions. We discuss the issue of uncertainty in human state detection and the need to develop approaches to estimate and integrate knowledge of that uncertainty. We illustrate this by discussing two application areas and the potential neuro-robotic technologies that could be developed within them.
Multivariate pattern analysis (MVPA) has typically been used in neuroimaging to draw inferences from a single modality, e.g., functional magnetic resonance imaging (fMRI) or electroencephalography (EEG). As simultaneous acquisition of different neuroimaging modalities becomes more common, one consideration is how to apply MVPA methods to analyze the resulting multimodal dataspaces. We present a multi-modal fusion technique that seeks to simultaneously train a multivariate classifier and identify correlated components across the two modalities. We validate our approach on a real simultaneous EEG-fMRI dataset.
As we navigate our environment, we are constantly assessing the objects we encounter and deciding on their subjective interest to us. In this study, we investigate the neural and ocular correlates of this assessment as a step towards their potential use in a mobile human-computer interface (HCI). Past research has shown that multiple physiological signals are evoked by objects of interest during visual search in the laboratory, including gaze, pupil dilation, and neural activity; these have been exploited for use in various HCIs. We use a virtual environment to explore which of these signals are also evoked during exploration of a dynamic, free-viewing 3D environment. Using a hierarchical classifier and sequential forward floating selection (SFFS), we identify a small, robust set of features across multiple modalities that can be used to distinguish targets from distractors in the virtual environment. The identification of these features may serve as an important factor in the design of mobile HCIs.
In this paper we use state-of-the-art multimodal neuroimaging to tease apart the spatio-temporal sequence of neural activity that “goes through a hitter’s mind” when they recognize a baseball pitch. Specifically we utilize electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) to investigate the neural networks activated for correct and incorrect pitch classifications. Our previous analysis has shown where in the trajectory of a pitch the hitter’s neural activity correctly discriminates a pitch type (e.g. fastball, curveball or slider). Here, we show that correct classifications correlate with a neural network including both visual and sub-cortical motor areas, likely demonstrating a link between visual identification and the required rapid motor response. Conversely, we find that not only is this activity lacking in incorrect classifications, but that it is instead replaced by prefrontal cortex activity, which has been shown to be responsible for more deliberative conflict resolution. Synthesizing these and other results, we hypothesize the potential uses of this technology in the form of a brain computer interface (BCI) to measure and enhance baseball player performance.
Regularized logistic regression is a standard classification method used in statistics and machine learning. Unlike regularized least-squares problems such as ridge regression, the parameter estimates cannot be computed in closed form and instead must be obtained using an iterative technique. This paper addresses a computational problem commonly encountered in model selection and classifier statistical significance testing, in which a large number of related logistic regression problems must be solved. Our proposed approach solves the problems simultaneously through an iterative technique, which garners computational efficiencies by leveraging the redundancies across the related problems. We demonstrate analytically that our method provides a substantial complexity reduction, which is further validated by our results on real-world datasets.
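For a single L2-regularized problem, the iterative technique is typically Newton's method, i.e., iteratively reweighted least squares (IRLS); a minimal sketch on toy data follows. This is the baseline per-problem solver, not the paper's simultaneous method.

```python
import numpy as np

def ridge_logreg_irls(X, y, lam=1.0, n_iter=25):
    """Newton/IRLS for L2-regularized logistic regression.
    X: (n, d) design matrix; y in {0, 1}; returns the weight vector."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
        r = p * (1 - p)                           # IRLS sample weights
        grad = X.T @ (p - y) + lam * w
        H = X.T @ (X * r[:, None]) + lam * np.eye(d)
        w -= np.linalg.solve(H, grad)             # Newton step
    return w

# Toy data drawn from a known logistic model.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = (1 / (1 + np.exp(-X @ w_true)) > rng.random(200)).astype(float)
w = ridge_logreg_irls(X, y, lam=1.0)
acc = ((X @ w > 0) == (y == 1)).mean()
```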
We present an efficient algorithm for simultaneously training elastic-net-regularized generalized linear models across many related problems, which may arise from bootstrapping, cross-validation, and nonparametric permutation testing. Our approach leverages the redundancies across problems to obtain ≈ 10x computational improvements relative to solving the problems sequentially by the standard glmnet algorithm of Friedman et al. (2010). We demonstrate our fast simultaneous training of generalized linear models (FaSTGLZ) algorithm on a multivariate analysis of fMRI, running the otherwise computationally intensive bootstrapping and permutation-test analyses that are typically necessary for obtaining statistically rigorous classification results and meaningful interpretation.
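The source of the speedup, shared structure across related problems, can be illustrated with a simpler ridge-regression analogue (not the FaSTGLZ algorithm itself, which handles elastic-net GLMs iteratively): every permutation-test problem shares the same Gram matrix, so one Cholesky factorization serves all of them.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 20))
y = rng.standard_normal(100)
lam = 0.5

# The Gram matrix depends only on X, not on the (permuted) labels,
# so it is factorized once and reused for every related problem.
G = X.T @ X + lam * np.eye(20)
L = np.linalg.cholesky(G)

def solve_ridge(rhs):
    """Two triangular solves reusing the shared Cholesky factor."""
    z = np.linalg.solve(L, rhs)
    return np.linalg.solve(L.T, z)

# 1000 permutation-test problems: only the right-hand side changes.
perms = [rng.permutation(y) for _ in range(1000)]
W = np.stack([solve_ridge(X.T @ yp) for yp in perms])
```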
Logistic regression has been used as a supervised method for extracting EEG components predictive of binary perceptual decisions. However, perceptual decisions often require a choice between more than just two alternatives. In this paper we present results using multinomial logistic regression (MLR) for learning EEG components in a 3-way visual discrimination task. Subjects were required to decide between three object classes (faces, houses, and cars) for images which were embedded with varying amounts of noise. We recorded the subjects’ EEG while they were performing the task and then used MLR to predict the stimulus category, on a single-trial basis, for correct behavioral responses. We found an early component (at 170 ms) that was consistent across all subjects and with previous binary discrimination paradigms. However, a later component (at 300-400 ms), previously reported in the binary discrimination paradigms, was more variable across subjects in this three-way discrimination task. We also computed forward models for the EEG components, with these showing a difference in the spatial distribution of component activity for the different categorical decisions. In summary, we find that logistic regression, generalized to the arbitrary N-class case, can be a useful approach for learning and analyzing EEG components underlying multi-class perceptual decisions.
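A minimal sketch of MLR with a softmax link, trained by plain gradient descent on toy two-dimensional features standing in for EEG components (all data are synthetic, not the actual EEG pipeline):

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)      # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def mlr_fit(X, y, n_classes=3, lr=0.1, n_iter=500):
    """Gradient descent on the multinomial logistic (softmax) loss."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]                  # one-hot targets
    for _ in range(n_iter):
        P = softmax(X @ W)
        W -= lr * (X.T @ (P - Y)) / n
    return W

# Three well-separated clusters standing in for face/house/car components.
rng = np.random.default_rng(3)
means = np.array([[2, 0], [-2, 0], [0, 2]])
X = np.vstack([m + 0.5 * rng.standard_normal((50, 2)) for m in means])
y = np.repeat([0, 1, 2], 50)
W = mlr_fit(X, y)
acc = (softmax(X @ W).argmax(axis=1) == y).mean()
```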
Visual target detection is one of the most studied paradigms in human electrophysiology. Electroencephalographic (EEG) correlates of target detection include the well-characterized N1, P2, and P300. In almost all cases the experimental paradigms used for studying visual target detection are extremely well controlled: very simple stimuli are presented so as to minimize eye movements, and scenarios involve minimal active participation by the subject. However, to characterize these EEG correlates for real-world scenarios, where the target or the subject may be moving and the two may interact, a more flexible paradigm is required. The environment must be immersive and interactive, and the system must enable synchronization between events in the world, the behavior of the subject, and simultaneously recorded EEG signals. We have developed a hardware/software system that enables us to precisely control the appearance of objects in a 3D virtual environment, which subjects can navigate while the system tracks their eyes and records their EEG activity. We are using this environment to investigate a set of questions which focus on the relationship between the visibility, salience, and affect of the target; the agency and eye movements of the subject; and the resulting EEG signatures of detection. In this paper, we describe the design of our system and present some preliminary results regarding the EEG signatures of target detection.
This work extends Bilinear Discriminant Component Analysis to the case of oscillatory activity with allowed phase variability across trials. The proposed method learns a spatial profile together with a multitaper basis which can integrate oscillatory power in a band-limited fashion. We demonstrate the method for predicting the handedness of a subject’s button press given multivariate EEG data. We show that our method learns multitapers sensitive to oscillatory activity in the 8-12 Hz range with spatial filters selective for lateralized motor cortex. This finding is consistent with the well-known mu rhythm, whose power is known to modulate as a function of which hand a subject plans to move, and thus is expected to be discriminative (predictive) of the subject’s response.
Drusen, the hallmark lesions of age-related macular degeneration (AMD), are biochemically heterogeneous, and the identification of their biochemical distribution is key to the understanding of AMD. Yet the challenge is to develop imaging technology and analytics that respect the physical generation of the hyperspectral signal in the presence of noise, artifacts, and multiple mixed sources, while maximally exploiting the full data dimensionality to uncover clinically relevant spectral signatures. This paper reports on the statistical analysis of hyperspectral signatures of drusen and anatomical regions of interest using snapshot hyperspectral imaging and non-negative matrix factorization (NMF). We propose physically meaningful priors as initialization schemes to NMF for finding low-rank decompositions that capture the underlying physiology of drusen and the macular pigment. Preliminary results show that snapshot hyperspectral imaging in combination with NMF is able to detect biochemically meaningful components of drusen and the macular pigment. To our knowledge, this is the first reported demonstration in vivo of the separate absorbance peaks for lutein and zeaxanthin in macular pigment.
In this talk I will describe our work investigating sparse decoding of neural activity, given a realistic mapping of the visual scene to neuronal spike trains generated by a model of primary visual cortex (V1). We use a linear decoder which imposes sparsity via an L1 norm. The decoder can be viewed as a decoding neuron (linear summation followed by a sigmoidal nonlinearity) in which there are relatively few non-zero synaptic weights. We find: (1) the best decoding performance is for a representation that is sparse in both space and time, (2) decoding of a temporal code results in better performance than a rate code and is also a better fit to the psychophysical data, (3) the number of neurons required for decoding increases monotonically as signal-to-noise in the stimulus decreases, with as few as 1% of the neurons required for decoding at the highest signal-to-noise levels, and (4) sparse decoding results in a more accurate decoding of the stimulus and is a better fit to psychophysical performance than a distributed decoding, for example one imposed by an L2 norm. We conclude that sparse coding is well justified from a decoding perspective in that it results in a minimum number of neurons and maximum accuracy when sparse representations can be decoded from the neural dynamics.
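A decoder of this kind can be sketched with iterative soft-thresholding (ISTA) for the L1-penalized least-squares problem; the simulated "population response" below is illustrative, not the V1 model's output.

```python
import numpy as np

def ista(R, s, lam=0.1, step=None, n_iter=500):
    """ISTA for min_w 0.5 * ||R w - s||^2 + lam * ||w||_1.
    R: (n_samples, n_neurons) response matrix; s: stimulus variable."""
    n, p = R.shape
    if step is None:
        step = 1.0 / np.linalg.norm(R, 2) ** 2    # 1 / Lipschitz constant
    w = np.zeros(p)
    for _ in range(n_iter):
        w = w - step * (R.T @ (R @ w - s))        # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w

# Toy "population": 100 neurons, only 3 of which carry the stimulus.
rng = np.random.default_rng(4)
R = rng.standard_normal((200, 100))
w_true = np.zeros(100)
w_true[[3, 40, 77]] = [1.5, -2.0, 1.0]
s = R @ w_true + 0.1 * rng.standard_normal(200)
w = ista(R, s, lam=5.0)
sparsity = (np.abs(w) > 1e-6).mean()   # fraction of neurons used for decoding
```

The L1 penalty drives most synaptic weights exactly to zero, so only a small fraction of the population is used for decoding, mirroring finding (3) above.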
Our group has been investigating the development of BCI systems for improving information delivery to a user, specifically systems for triaging image content based on what captures a user’s attention. One of the systems we have developed uses single-trial EEG scores as noisy labels for a computer vision image retrieval system. In this paper we investigate how the noisy nature of the EEG-derived labels affects the resulting accuracy of the computer vision system. Specifically, we consider how the precision of the EEG scores affects the resulting precision of images retrieved by a graph-based transductive learning model designed to propagate image class labels based on image feature similarity and sparse labels.
A major challenge in single-trial electroencephalography (EEG) analysis and brain-computer interfacing (BCI) is the so-called inter-subject/inter-session variability, i.e., the large variability in measurements obtained during different recording sessions. This variability restricts the number of samples available for single-trial analysis to the limited number that can be obtained during a single session. Here we propose a novel method that distinguishes between subject-invariant features and subject-specific features, based on a bilinear formulation. The method allows one to combine multiple EEG recordings to estimate the subject-invariant parameters, hence addressing the issue of inter-subject variability, while reducing the complexity of estimating the subject-specific parameters. The method is demonstrated on 34 datasets from two different experimental paradigms: a perceptual categorization task and a rapid serial visual presentation (RSVP) task. We show significant improvements in classification performance over state-of-the-art methods. Further, our method extracts neurological components never before reported for the RSVP task, demonstrating its ability to extract novel neural signatures from the data.
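The bilinear formulation can be sketched generically as a rank-1 discriminant f(X) = uᵀXv fit by alternating least squares; fixing the temporal weights v makes the problem linear in the spatial weights u, and vice versa. This illustrates bilinearity only; the paper's subject-invariant/subject-specific decomposition is more elaborate.

```python
import numpy as np

def bilinear_fit(trials, y, n_iter=20):
    """Alternating least squares for f(X) = u^T X v.
    trials: (n, channels, time); y: labels in {-1, +1}."""
    n, C, T = trials.shape
    u = np.ones(C) / C
    v = np.ones(T) / T
    for _ in range(n_iter):
        Zu = trials @ v                               # fix v: (n, C)
        u = np.linalg.lstsq(Zu, y, rcond=None)[0]
        Zv = np.einsum('c,nct->nt', u, trials)        # fix u: (n, T)
        v = np.linalg.lstsq(Zv, y, rcond=None)[0]
    return u, v

# Toy trials: a spatial pattern active in a temporal window for class +1.
rng = np.random.default_rng(5)
C, T, n = 8, 50, 120
u_true = rng.standard_normal(C)
v_true = np.zeros(T)
v_true[20:30] = 1.0
y = np.sign(rng.standard_normal(n))
trials = 0.5 * rng.standard_normal((n, C, T))
trials += y[:, None, None] * np.einsum('c,t->ct', u_true, v_true)
u, v = bilinear_fit(trials, y)
acc = (np.sign(np.einsum('c,nct,t->n', u, trials, v)) == y).mean()
```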
Human visual perception is able to recognize a wide range of targets under challenging conditions, but has limited throughput. Machine vision and automatic content analytics can process images at high speed, but suffer from inadequate recognition accuracy for general target classes. In this paper, we propose a new paradigm to explore and combine the strengths of both systems. A single-trial EEG-based brain-computer interface (BCI) subsystem is used to detect objects of interest of arbitrary classes from an initial subset of images. The EEG detection outcomes are used as input to a graph-based pattern mining subsystem to identify, refine, and propagate the labels to retrieve relevant images from a much larger pool. The combined strategy is unique in its generality, robustness, and high throughput. It has great potential for advancing the state of the art in media retrieval applications. We have evaluated and demonstrated significant performance gains of the proposed system with multiple and diverse image classes over several data sets, including those from the Internet (Caltech 101) and remote sensing images. In this paper, we also present insights learned from the experiments and discuss future research directions.
A relatively new neuroimaging modality is simultaneous EEG and fMRI. Though such a multi-modal acquisition is attractive given that it can exploit the temporal resolution of EEG and the spatial resolution of fMRI, it comes with unique signal processing and pattern classification challenges. In this paper I will review some of our work on developing signal processing and pattern recognition methods for the analysis of simultaneous EEG and fMRI, with a focus on those algorithms enabling a single-trial analysis of the neural signal. In general, these algorithms exploit the multivariate nature of the EEG, removing MR-induced artifacts and classifying event-related signals that can then be correlated with the BOLD signal to yield specific fMRI activations.
Recent empirical evidence supports the hypothesis that invariant visual object recognition might result from non-linear encoding of the visual input followed by linear decoding. This hypothesis has received theoretical support through the development of neural network architectures based on a non-linear encoding of the input via recurrent network dynamics followed by a linear decoder. In this paper we consider such an architecture in which the visual input is non-linearly encoded by a biologically realistic spiking model of V1, and mapped to a perceptual decision via a sparse linear decoder. Novel aspects are that we 1) utilize a large-scale conductance-based spiking neuron model of V1 which has been well characterized in terms of classical and extra-classical response properties, and 2) use the model to investigate decoding over a large population of neurons. We compare decoding performance of the model system to human performance by comparing neurometric and psychometric curves.
We investigated neural correlates of target detection in the electroencephalogram (EEG) during a free-viewing search task and analyzed signals locked to saccadic events. Subjects performed a search task across multiple random scenes while we simultaneously recorded 64 channels of EEG and tracked subjects’ eye positions. For each subject we identified target saccades (TS) and distractor saccades (DS). We sampled the sets of TS and DS saccades such that they were equalized/matched for saccade direction and duration, ensuring that no information in the saccade properties themselves was discriminating for their type. We aligned the EEG to saccade onset and used logistic regression (LR), in the space of the 64 electrodes, to identify activity discriminating a TS from a DS on a single-trial basis. We found significant discriminating activity in the EEG both before and after the saccade. We also saw a substantial reduction in discriminating activity while the saccade was executed. We conclude that we can identify neural signatures of detection both before and after the saccade, indicating that subjects anticipate the target before the last saccade, which serves to foveate and confirm the target identity.
Traditional analysis methods for single-trial classification of electroencephalography (EEG) focus on two types of paradigms: phase-locked methods, in which the amplitude of the signal is used as the feature for classification, i.e., event-related potentials; and second-order methods, in which the feature of interest is the power of the signal, i.e., event-related (de)synchronization. The process of deciding which paradigm to use is ad hoc and is driven by knowledge of neurological findings. Here we propose a unified method in which the algorithm learns the best first- and second-order spatial and temporal features for classification of EEG based on a bilinear model. The efficacy of the method is demonstrated on simulated and real EEG from a benchmark data set for brain-computer interfaces.
Proton magnetic resonance spectroscopic imaging (1H MRSI) is a noninvasive metabolic imaging technique that has emerged as a potentially powerful tool for complementing structural magnetic resonance imaging (MRI) in the clinical evaluation of neurological disorders and diagnostic decision-making. However, the relative complexity of methods that are currently available for analyzing the derived multi-dimensional metabolic imaging data has slowed incorporation of the technique into routine clinical practice. This paper discusses this impediment to widespread clinical use of 1H MRSI and then describes an automated data analysis approach that promises to facilitate use of the technique in the evaluation of intracranial lesions, with the potential to enhance the specificity of MRI and improve clinical decision-making.
In this paper we describe a system for simultaneously acquiring EEG and fMRI and evaluate it in terms of discriminating single-trial, task-related neural components in the EEG. Using an auditory oddball stimulus paradigm, we acquire EEG data both inside and outside a 1.5T MR scanner and compare both power spectra and single-trial discrimination performance for the two conditions. We find that EEG activity acquired inside the MR scanner during echo planar image acquisition is of high enough quality to enable single-trial discrimination performance that is 95% of that acquired outside the scanner. We conclude that EEG acquired simultaneously with fMRI is of high enough fidelity to permit single-trial analysis.
Event-related potentials (ERPs) recorded at the scalp are indicators of brain activity associated with event-related information processing; hence they may be suitable for the assessment of changes in cognitive processing load. While measuring ERPs in a laboratory setting and classifying them is trivial, such a task presents major challenges in a “real world” setting where the EEG signals are recorded while subjects freely move their eyes and the sensory inputs are presented continuously, as opposed to discretely. Here we demonstrate that with the aid of second-order blind identification (SOBI), a blind source separation (BSS) algorithm: (1) we can extract ERPs from such challenging data sets; (2) we were able to obtain meaningful single-trial ERPs in addition to averaged ERPs; and (3) we were able to estimate the spatial origins of these ERPs. Finally, using back-propagation neural networks as classifiers, we show that these single-trial ERPs from specific brain regions can be used to determine moment-to-moment changes in cognitive processing load during a complex “real world” task.
In this paper we analyze a popular divisive normalization model of V1 with respect to the relationship between its underlying coding strategy and the extraclassical physiological responses of its constituent modeled neurons. Specifically, we are interested in whether the optimization goal of redundancy reduction naturally leads to reasonable neural responses, including reasonable distributions of responses. The model is trained on an ensemble of natural images and tested using sinusoidal drifting gratings, with metrics such as suppression index and contrast-dependent receptive field growth compared to the objective function values for a sample of neurons. We find that even though the divisive normalization model can produce “typical” neurons that agree with some neurophysiology data, distributions across samples do not agree with experimental data. Our results suggest that redundancy reduction itself is not necessarily the cause of the observed extraclassical receptive field phenomena, and that additional optimization dimensions and/or biological constraints must be considered.
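The two metrics named above can be computed directly from size-tuning curves; a sketch on synthetic curves follows (the curve shapes and all parameters are illustrative, not the model's output).

```python
import numpy as np

def suppression_index(rates):
    """SI = (peak - asymptotic) / peak from a size-tuning curve."""
    peak = rates.max()
    return (peak - rates[-1]) / peak

def rf_size(sizes, rates):
    """Measured RF: smallest stimulus size reaching 95% of the peak rate."""
    return sizes[np.argmax(rates >= 0.95 * rates.max())]

# Toy size-tuning curves at high and low contrast (ratio-of-Gaussians-like:
# an excitatory center term minus a broader suppressive surround term).
sizes = np.linspace(0.1, 4.0, 40)
high = sizes / (0.3 + sizes) - 0.5 * sizes / (1.5 + sizes)
low = 0.5 * (sizes / (1.0 + sizes) - 0.2 * sizes / (3.0 + sizes))
# Expect surround suppression at high contrast and a larger measured RF
# at low contrast (contrast-dependent receptive field growth).
```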
The timing of a behavioral response, such as a button press in reaction to a visual stimulus, is highly variable across trials. In this paper we describe a methodology for single-trial analysis of electroencephalography (EEG) which can be used to reduce the error in the estimation of the timing of the behavioral response and thus reduce the error in estimating the onset time of the stimulus. We consider a rapid serial visual presentation (RSVP) paradigm consisting of concatenated video clips in which subjects are instructed to respond when they see a predefined target. We show that a linear discriminator, with inputs distributed across sensors and time and chosen via an information-theoretic feature selection criterion, can be used in conjunction with the response to yield a lower-error estimate of the onset time of the target stimulus than the response time itself. We compare our results to response time and to previous EEG approaches using fixed windows in time, showing that our method has the lowest estimation error. We discuss potential applications, specifically with respect to cortically-coupled computer vision based triage of large image databases.
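The core idea, using a stimulus-locked discriminative component to beat the reaction-time estimate of stimulus onset, can be sketched with a matched filter standing in for the learned discriminator (the sampling rate, component latency and shape, and reaction-time statistics are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
fs = 250                              # Hz, assumed sampling rate
latency = 0.3                         # s, assumed stimulus-to-component delay
template = np.hanning(25)             # assumed discriminative component shape

def estimate_onset(trial, template, latency, fs):
    """Slide the discriminator over the trial; the score peak marks the
    component, and subtracting the latency gives the onset estimate."""
    scores = np.correlate(trial, template, mode='valid')
    return np.argmax(scores) / fs - latency

# Simulate trials: component at onset+latency, noisy button press later.
onsets, est_eeg, est_rt = [], [], []
for _ in range(200):
    onset = rng.uniform(0.2, 0.5)
    trial = 0.3 * rng.standard_normal(fs)          # 1 s of "EEG"
    i = int((onset + latency) * fs)
    trial[i:i + 25] += template                    # embed the component
    rt = onset + rng.normal(0.45, 0.1)             # variable reaction time
    onsets.append(onset)
    est_eeg.append(estimate_onset(trial, template, latency, fs))
    est_rt.append(rt - 0.45)                       # RT-based onset estimate
err_eeg = np.abs(np.array(est_eeg) - np.array(onsets)).mean()
err_rt = np.abs(np.array(est_rt) - np.array(onsets)).mean()
# The EEG-locked estimate has lower error than the reaction-time estimate,
# because the component jitters far less than the button press.
```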
Based on a large-scale spiking neuron model of the input layers 4Cα and 4Cβ of macaque V1, we identify neural mechanisms for the observed contrast-dependent receptive field size of V1 cells. We observe a rich variety of mechanisms for the phenomenon and analyze them based on the relative gain of excitatory and inhibitory synaptic inputs. We observe an average growth in the spatial extent of excitation and inhibition at low contrast, as predicted from phenomenological models. However, contrary to phenomenological models, our simulation results suggest this is neither sufficient nor necessary to explain the phenomenon.
We present a spatio-temporal linear discrimination method for single-trial classification of multi-channel electroencephalography (EEG). No prior information about the characteristics of the neural activity is required, i.e., the algorithm requires no knowledge of the timing and/or spatial distribution of the evoked responses. The algorithm finds a temporal delay/window onset time for each EEG channel and then spatially integrates the channels for each channel-specific onset time. The algorithm can be seen as learning discrimination trajectories defined within the space of EEG channels. We demonstrate the method for detecting auditory evoked neural activity and discriminating task difficulty in a complex visual-auditory environment.
Using a rectification model and an experimentally measured distribution of the extracellular modulation ratio (F1/F0), we investigate the consistency between extracellular and intracellular modulation metrics for classifying cells in primary visual cortex (V1). We first demonstrate that the shape of the distribution of the intracellular metric χ is sensitive to the specific form of the bimodality observed in F1/F0. When the proper mapping between F1/F0 and χ is applied to the experimentally measured F1/F0 data, χ is weakly bimodal. We then use a two-class mixture model to estimate physiological response parameters given the F1/F0 distribution. We show, once again, that a weak bimodality is present in χ. Finally, using the estimated parameters for the two cell classes, we show that simple and complex cell class assignment in F1/F0 is more-or-less preserved in a heavy-tailed f1/f0 distribution, with complex cells being in the core of the f1/f0 distribution and simple cells in the tail (misclassification error in f1/f0 = 19%). Class assignment in f1/f0 is likewise consistent (misclassification error in F1/F0 = 15%). Our results provide computational support for the conclusion that extracellular and intracellular metrics are relatively consistent measures for classifying cells in V1 as either simple or complex.
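The extracellular modulation ratio itself is easy to sketch: F1 is the response amplitude at the grating's drift frequency and F0 is the mean rate. A halfwave-rectified sinusoid (an idealized simple cell) gives F1/F0 = π/2 > 1, while an unmodulated response (an idealized complex cell) gives 0.

```python
import numpy as np

def f1_f0(rate, n_cycles):
    """Modulation ratio from a cycle-averaged response sampled over
    n_cycles full cycles of the drifting grating."""
    spectrum = np.fft.rfft(rate) / len(rate)
    f0 = spectrum[0].real                 # mean rate
    f1 = 2 * np.abs(spectrum[n_cycles])   # amplitude at the drift frequency
    return f1 / f0

t = np.linspace(0, 1, 1000, endpoint=False)
# Idealized simple cell: halfwave-rectified sinusoid (4 cycles).
simple = np.maximum(np.sin(2 * np.pi * 4 * t), 0.0)
# Idealized complex cell: unmodulated elevated rate.
complex_cell = np.ones_like(t)
# f1_f0(simple, 4) -> pi/2 (> 1, "simple"); f1_f0(complex_cell, 4) -> 0
```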
1H magnetic resonance spectra (MRS) of biofluids contain rich biochemical information about the metabolic status of an organism. Through the application of pattern recognition and classification algorithms, such data have been shown to provide information for disease diagnosis as well as for assessing the effects of potential therapeutics. In this paper we describe a novel approach, using non-negative matrix factorization (NMF), for rapidly identifying metabolically meaningful spectral patterns in 1H MRS. We show that the intensities of these identified spectral patterns can be related to the onset of, and recovery from, toxicity in both a time-related and dose-related fashion. These patterns can be seen as a new type of biomarker for the biological effect under study. We demonstrate, using k-means clustering, that the recovered patterns can be used to characterize the metabolic status of the animal during the experiment.
In this paper we compare three linear methods, independent component analysis (ICA), common spatial patterns (CSP), and linear discrimination (LD) for recovering task relevant neural activity from high spatial density electroencephalography (EEG). Each linear method uses a different objective function to recover underlying source components by exploiting statistical structure across a large number of sensors. We test these methods using a dual-task event-related paradigm. While engaged in a primary task, subjects must detect infrequent changes in the visual display, which would be expected to evoke several well-known event-related potentials (ERPs), including the N2 and P3. We find that though each method utilizes a different objective function, they in fact yield similar components. We note that one advantage of the LD approach is that the recovered component is easily interpretable, namely it represents the component within a given time window which is most discriminating for the task, given a spatial integration of the sensors. Both ICA and CSP return multiple components, of which the most discriminating component may not be the first. Thus, for these methods, visual inspection or additional processing is required to determine the significance of these components for the task.
We present a multi-resolution hierarchical application of the constrained non-negative matrix factorization (cNMF) algorithm for blindly recovering constituent source spectra in magnetic resonance spectroscopic imaging (MRSI). cNMF is an extension of non-negative matrix factorization (NMF) that includes a positivity constraint on the amplitudes of recovered spectra. We apply cNMF hierarchically, with spectral recovery and subspace reduction constraining which observations are used in the next level of processing. The decomposition model recovers physically meaningful spectra which are highly tissue-specific, for example spectra indicative of tumor proliferation, given a processing hierarchy that proceeds coarse-to-fine. We demonstrate the decomposition procedure on 1H long-TE brain MRS data. The results show recovery of markers for normal brain tissue, low proliferative tissue and highly proliferative tissue. The coarse-to-fine hierarchy also makes the algorithm computationally efficient, thus it is potentially well-suited for use in diagnostic work-up.
We present an algorithm for blindly recovering constituent source spectra from magnetic resonance spectroscopic imaging (MRSI) of human brain. The algorithm is based on the non-negative matrix factorization (NMF) algorithm, extending it to include a constraint on the positivity of the amplitudes of the recovered spectra and mixing matrices. This positivity constraint enables recovery of physically meaningful spectra even in the presence of noise that causes a significant number of the observation amplitudes to be negative. The algorithm, which we call constrained non-negative matrix factorization (cNMF), does not enforce independence or sparsity, though it recovers sparse sources quite well. It can be viewed as a maximum likelihood approach for finding basis vectors in a bounded subspace. In this case the optimal basis vectors are the ones that envelope the observed data with a minimum deviation from the boundaries. We incorporate the cNMF algorithm into a hierarchical decomposition framework, showing that it can be used to recover tissue-specific spectra, e.g., spectra indicative of malignant tumor. We demonstrate the hierarchical procedure on 1H long echo time (TE) brain absorption spectra and conclude that the computational efficiency of the cNMF algorithm makes it well-suited for use in diagnostic work-up.
Several theories of early visual perception hypothesize neural circuits that are responsible for assigning ownership of an object’s occluding contour to a region which represents the “figure”. Previously, we presented a Bayesian network model which integrates multiple cues and uses belief propagation to infer direction of figure (DOF) along an object’s occluding contour. In this paper, we use a linear integrate-and-fire model to demonstrate how such inference mechanisms could be carried out in a biologically realistic neural circuit. The circuit, modeled after the network proposed by Rao, maps the membrane potentials of individual neurons to log probabilities and uses recurrent connections to represent transition probabilities. The network’s “perception” of DOF is demonstrated for several examples, including perceptually ambiguous figures, with results qualitatively consistent with human perception.
Psychophysical data have demonstrated that our visual system must integrate multiple, spatially local and non-local cues to construct the visual scene. In this paper we describe a probabilistic network model which integrates visual cues to infer intermediate-level visual representations. We demonstrate the network model for two example problems: inferring “direction of figure” (DOF) and estimating perceived velocity. One can consider the assignment of DOF as essentially a problem in probabilistic inference, with DOF being a hidden variable, assigning “ownership” of an object’s occluding boundary to a region which represents the “figure”. The DOF is not directly observed but can potentially be inferred from local observations and “message passing”. For example, our model combines contour convexity and similarity/proximity cues to form observations, with belief propagation (BP) used to integrate these observations with state probabilities to infer the DOF. We extend the network model, integrating form and motion streams, to explain the coherence-based motion effects first demonstrated by McDermott et al. The extended model consists of two interacting network chains (streams), one for inferring DOF and the other for inferring scene motion. The local figure-ground relationships estimated in the DOF stream are subsequently used by the motion stream as evidence for surface occlusion, modulating the covariance of a Gaussian distribution used to model the velocity at apertures located at junction points. The distribution of scene motion ultimately is represented in velocity space as a mixture of these form-modulated Gaussians. Simulation results show that the network’s integration of cues can account for several examples of perceptual ambiguity in DOF, consistent with human perception. Also, the integration of form and motion representations qualitatively accounts for psychophysical results showing surface-dependent motion coherence of oscillating edges.
We also show that the model naturally integrates top-down cues, leading to perceptual bias in interpreting ambiguous figures, such as Rubin’s vase, as well as bias in the perceived coherence of object motion.
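As a toy illustration of the message passing used to infer DOF, here is a sum-product belief-propagation sketch on a chain of binary figure-direction variables. The evidence values and transition matrix below are hypothetical; the model described above forms its observations from contour convexity and similarity/proximity cues:

```python
import numpy as np

def chain_bp(evidence, T):
    """Sum-product belief propagation on a chain of binary hidden variables.
    evidence: (n, 2) local observation likelihoods; T: (2, 2) transition matrix."""
    n = len(evidence)
    fwd = np.ones((n, 2))
    bwd = np.ones((n, 2))
    for i in range(1, n):                       # forward messages, left to right
        m = T.T @ (fwd[i - 1] * evidence[i - 1])
        fwd[i] = m / m.sum()
    for i in range(n - 2, -1, -1):              # backward messages, right to left
        m = T @ (bwd[i + 1] * evidence[i + 1])
        bwd[i] = m / m.sum()
    belief = fwd * bwd * evidence               # combine messages with local evidence
    return belief / belief.sum(axis=1, keepdims=True)
```

With strong evidence at one end of the chain and uninformative evidence elsewhere, the inferred marginals decay toward the uniform belief with distance, as expected for a chain with noisy transitions.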
In this paper we describe a non-negative matrix factorization (NMF) for recovering constituent spectra in 3D chemical shift imaging (CSI). The method is based on the NMF algorithm of Lee and Seung (1), extending it to include a constraint on the minimum amplitude of the recovered spectra. This constrained NMF (cNMF) algorithm can be viewed as a maximum likelihood approach for finding basis vectors in a bounded subspace. In this case the optimal basis vectors are the ones that envelope the observed data with a minimum deviation from the boundaries. Results for 31P human brain data are compared to Bayesian Spectral Decomposition (BSD) (2), which considers a full Bayesian treatment of the source recovery problem and requires computationally expensive Monte Carlo methods. The cNMF algorithm is shown to recover the same constituent spectra as BSD, however in substantially less computational time.
Blind source separation (BSS) has been proposed as a method to analyze multi-channel electroencephalography (EEG) data. A basic issue in applying BSS algorithms is the validity of the independence assumption. We investigate whether EEG can be considered to be a linear combination of independent sources. Linear BSS can be obtained under the assumptions of non-Gaussian, non-stationary, or non-white independent sources. If the linear independence hypothesis is violated, these three different conditions will not necessarily lead to the same result. We show, using 64 channel EEG data, that different algorithms which incorporate the three different assumptions lead to the same results, thus supporting the linear independence hypothesis.
We describe a filter-based model of orientation processing in primary visual cortex (V1) and demonstrate that novelty in cortical “pinwheel” space can be used as a measure of perceptual salience. In the model, novelty is computed as the negative log likelihood of a pinwheel’s activity relative to the population response. The population response is modeled using a mixture of Gaussians, enabling the representation of complex, multi-modal distributions. Hidden variables that are inferred in the mixture model can be viewed as grouping or “binding” pinwheels which have similar responses within the distribution space. Results are shown for several stimuli that illustrate well-known contextual effects related to perceptual salience, as well as results for a natural image.
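The novelty computation described above can be sketched in numpy: fit a mixture of Gaussians to the population response and score each observation by its negative log likelihood under the mixture. The sketch below uses a 1-D two-component mixture fit by EM purely for illustration; the model's actual response space, component count, and fitting details are not specified here:

```python
import numpy as np

def fit_gmm_1d(x, k=2, iters=100, seed=0):
    """EM for a 1-D Gaussian mixture (toy stand-in for the population-response model)."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, k, replace=False)       # initialize means from the data
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        d = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = pi * d
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixture weights, means, and variances
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / len(x)
    return pi, mu, var

def novelty(x, pi, mu, var):
    """Novelty = negative log likelihood of an observation under the mixture."""
    d = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return -np.log((pi * d).sum(axis=1) + 1e-300)
```

An observation far from the bulk of the population response receives a much higher novelty score than a typical one, which is the sense in which novelty serves as a salience measure.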
We describe a method, using linear discrimination, for detecting single-trial EEG signatures of object recognition events in a rapid serial visual presentation (RSVP) task. We record EEG using a high spatial density array (87 electrodes) during the rapid presentation (50-200 msec per image) of natural images. Subjects were instructed to release a button when they recognized a target image (an image with a person/people). Trials consisted of 100 images each, with a 50% chance of a single target being in a trial. Subject EEG was analyzed on a single-trial basis with an optimal spatial linear discriminator learned at multiple time windows after the presentation of an image. Linear discrimination enables the estimation of a forward model and thus allows for an approximate localization of the discriminating activity. Results show multiple loci for discriminating activity (e.g. motor and visual). Using these detected EEG signatures, we show that in many cases we can detect targets more accurately than the overt response (button release) and that such signatures can be used to prioritize images for high-throughput search.
Optical imaging studies have played an important role in mapping the orientation selectivity and ocular dominance of neurons across an extended area of primary visual cortex (V1). Such studies have produced images with a more or less smooth and regular spatial distribution of relevant neuronal response properties. This is in spite of the fact that results from electrophysiological recordings, though limited in their number and spatial distribution, show significant scatter/variability in the relevant response properties of nearby neurons. In this paper we present a simulation of the optical imaging experiments of ocular dominance and orientation selectivity using a computational model of the primary visual cortex. The simulations assume that the optical imaging signal is proportional to the averaged response of neighboring neurons. The model faithfully reproduces ocular dominance columns and orientation pinwheels in the presence of realistic scatter of single cell preferred responses. In addition, we find the simulated optical imaging of orientation pinwheels to be remarkably robust, with the pinwheel structure maintained even after the addition of a large amount of random scatter in the orientation preference of single cells. Our results suggest that an optical imaging result does not necessarily, by itself, provide any obvious upper bound for the scatter of the underlying neuronal response properties on local scales.
In this paper a constrained non-negative matrix factorization (cNMF) algorithm for recovering constituent spectra is described together with experiments demonstrating the broad utility of the approach. The algorithm is based on the NMF algorithm of Lee and Seung, extending it to include a constraint on the minimum amplitude of the recovered spectra. This constraint enables the algorithm to deal with observations having negative values by assuming they arise from the noise distribution. The cNMF algorithm does not explicitly enforce independence or sparsity, instead only requiring the source and mixing matrices to be non-negative. The algorithm is very fast compared to other “blind” methods for recovering spectra. cNMF can be viewed as a maximum likelihood approach for finding basis vectors in a bounded subspace. In this case the optimal basis vectors are the ones that envelope the observed data with a minimum deviation from the boundaries. Results for Raman spectral data, hyperspectral images, and 31P human brain data are provided to illustrate the algorithm’s performance.
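The core of the approach is the multiplicative-update NMF of Lee and Seung. The following numpy sketch handles negative observation values by clipping them to zero before factorization, which is a simplification of the paper's noise-model treatment of negative amplitudes; the rank, iteration count, and initialization are illustrative:

```python
import numpy as np

def cnmf(X, k, iters=500, seed=0):
    """Simplified constrained NMF: Lee-Seung multiplicative updates on
    non-negatively clipped data. X is factored as A (mixing) times S (spectra)."""
    rng = np.random.default_rng(seed)
    Xp = np.clip(X, 0, None)                  # negatives attributed to noise (simplified)
    A = rng.random((X.shape[0], k)) + 0.1     # mixing / abundance matrix, kept positive
    S = rng.random((k, X.shape[1])) + 0.1     # constituent source spectra, kept positive
    eps = 1e-9                                # guard against division by zero
    for _ in range(iters):
        S *= (A.T @ Xp) / (A.T @ A @ S + eps)
        A *= (Xp @ S.T) / (A @ (S @ S.T) + eps)
    return A, S
```

Because the updates are multiplicative, positivity of A and S is preserved automatically, and no explicit independence or sparsity constraint is imposed, matching the description above.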
In this paper we use linear discrimination for learning EEG signatures of object recognition events in a rapid serial visual presentation (RSVP) task. We record EEG using a high spatial density array (63 electrodes) during the rapid presentation (50-200 msec per image) of natural images. Each trial consists of 100 images, with a 50% chance of a single target being in a trial. Subjects are instructed to press a left mouse button at the end of the trial if they detected a target image, otherwise they are instructed to press the right button. Subject EEG was analyzed on a single-trial basis with an optimal spatial linear discriminator learned at multiple time windows after the presentation of an image. Analysis of discrimination results indicated a periodic fluctuation (time-localized oscillation) in Az performance. Analysis of the EEG using the discrimination components learned at the peaks of the Az fluctuations indicate 1) the presence of a positive evoked response, followed in time by a negative evoked response in strongly overlapping areas and 2) a component which is not correlated with the discriminator learned during the time-localized fluctuation. Results suggest that multiple signatures, varying over time, may exist for discriminating between target and distractor trials.
In this paper we summarize our results for two classes of hierarchical multi-scale models that exploit contextual information for detection of structure in mammographic imagery. The first model, the hierarchical pyramid neural network (HPNN), is a discriminative model which is capable of integrating information either coarse-to-fine or fine-to-coarse for microcalcification and mass detection. The second model, the hierarchical image probability (HIP) model, captures short-range and contextual dependencies through a combination of coarse-to-fine factoring and a set of hidden variables. The HIP model, being a generative model, has broad utility, and we present results for classification, synthesis and compression of mammographic mass images. The two models demonstrate the utility of the hierarchical multi-scale framework for computer assisted detection and diagnosis.
We present evidence that several higher-order statistical properties of natural images and signals can be explained by a stochastic model which simply varies scale of an otherwise stationary Gaussian process. We discuss two interesting consequences. The first is that a variety of natural signals can be related through a common model of spherically invariant random processes, which have the attractive property that the joint densities can be constructed from the one dimensional marginal. The second is that in some cases the non-stationarity assumption and only second order methods can be explicitly exploited to find a linear basis that is equivalent to independent components obtained with higher-order methods. This is demonstrated on spectro-temporal components of speech.
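The central claim — that varying the scale of an otherwise stationary Gaussian process reproduces higher-order statistics of natural signals — can be demonstrated in a few lines. The lognormal scale process below is an illustrative assumption, not the distribution used in the paper; the point is only that the scale mixture is leptokurtic (kurtosis above the Gaussian value of 3):

```python
import numpy as np

def sample_scale_mixture(n, sigma=0.5, seed=0):
    """Gaussian samples whose scale itself varies randomly (a SIRP-style
    construction; the lognormal scale distribution is a hypothetical choice)."""
    rng = np.random.default_rng(seed)
    scales = rng.lognormal(0.0, sigma, n)   # slowly varying scale process
    return scales * rng.standard_normal(n)  # stationary Gaussian times scale

def kurtosis(v):
    """Pearson kurtosis: E[(v - mean)^4] / Var(v)^2; equals 3 for a Gaussian."""
    v = v - v.mean()
    return (v ** 4).mean() / (v ** 2).mean() ** 2
```

Although each conditional slice of the process is Gaussian, the marginal of the mixture has heavy tails, which is the kind of higher-order structure observed in natural images and speech.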
We develop a probability model over image spaces and demonstrate its broad utility in mammographic image analysis. The model employs a pyramid representation to factor images across scale and a tree-structured set of hidden variables to capture long-range spatial dependencies. This factoring makes the computation of the density functions local and tractable. The result is a hierarchical mixture of conditional probabilities, similar to a hidden Markov model on a tree. The model parameters are found with maximum likelihood estimation using the EM algorithm. The utility of the model is demonstrated for three applications: 1) detection of mammographic masses in computer-aided diagnosis; 2) qualitative assessment of model structure through mammographic synthesis; and 3) compression of mammographic regions of interest.
In hyperspectral imagery one pixel typically consists of a mixture of the reflectance spectra of several materials, where the mixture coefficients correspond to the abundances of the constituting materials. We assume linear combinations of reflectance spectra with some additive normal sensor noise and derive a probabilistic MAP framework for analyzing hyperspectral data. As the material reflectance characteristics are not known a priori, we face the problem of unsupervised linear unmixing. The incorporation of different prior information (e.g. positivity and normalization of the abundances) naturally leads to a family of interesting algorithms, in one case yielding an algorithm that can be understood as constrained independent component analysis (ICA). Simulations underline the usefulness of our theory.
We formulate a model for probability distributions on image spaces. We show that any distribution of images can be factored exactly into conditional distributions of feature vectors at one resolution (pyramid level) conditioned on the image information at lower resolutions. We would like to factor this over positions in the pyramid levels to make it tractable, but such factoring may miss long-range dependencies. To fix this, we introduce hidden class labels at each pixel in the pyramid. The result is a hierarchical mixture of conditional probabilities, similar to a hidden Markov model on a tree. The model parameters can be found with maximum likelihood estimation using the EM algorithm. We have obtained encouraging preliminary results on the problems of detecting masses in mammograms.
A fundamental problem in image analysis is the integration of information across scale to detect and classify objects. We have developed, within a machine learning framework, two classes of multiresolution models for integrating scale information for object detection and classification-a discriminative model called the hierarchical pyramid neural network and a generative model called a hierarchical image probability model. Using receiver operating characteristic analysis, we show that these models can significantly reduce the false positive rates for a well-established computer-aided diagnosis system.
We formulate a model for probability distributions on image spaces. We show that any distribution of images can be factored exactly into conditional distributions of feature vectors at one resolution (pyramid level) conditioned on the image information at lower resolutions. We would like to factor this over positions in the pyramid levels to make it tractable, but such factoring may miss long-range dependencies. To capture long-range dependencies, we introduce hidden class labels at each pixel in the pyramid. The result is a hierarchical mixture of conditional probabilities, similar to a hidden Markov model on a tree. The model parameters can be found with maximum likelihood estimation using the EM algorithm. We have obtained encouraging preliminary results on the problems of detecting various objects in SAR images and target recognition in optical aerial images.
In this paper we explore the use of feature selection techniques to improve the generalization performance of pattern recognizers for computer-aided diagnosis. We apply a modified version of the sequential forward floating selection (SFFS) of Pudil et al. to the problem of selecting an optimal feature subset for mass detection in digitized mammograms. The complete feature set consists of multi-scale tangential and radial gradients in the mammogram region of interest. We train a simple multi-layer perceptron (MLP) using the SFFS algorithm and compare its performance, using a jackknife procedure, to an MLP trained on the complete feature set (35 features). Results indicate that a variable number of features is chosen in each of the jackknife sets (12 +/- 4) and the test performance, Az, using the chosen feature subset is no better than the performance using the entire feature set. These results may be attributed to the fact that the feature set is noisy and the data set used for training/testing is small. We next modify the feature selection technique by using the results of the jackknife to compute the frequency at which different features are selected. We construct a classifier by choosing the top N features, selected most frequently, which maximize performance on the training data. We find that by adding this “hand-tuning” component to the feature selection process, we can reduce the feature set from 35 to 8 features and at the same time have a statistically significant increase in generalization performance (p < 0.015).
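The "hand-tuning" step — ranking features by how often the jackknife runs select them, then keeping the N most frequent — reduces to a simple frequency count. A sketch (the feature indices and selections below are arbitrary illustrations, not the mammography features of the paper):

```python
def top_n_by_frequency(jackknife_selections, n):
    """Rank features by how often they appear across jackknife feature subsets,
    then keep the n most frequently selected (ties broken by feature index)."""
    counts = {}
    for selection in jackknife_selections:
        for feature in selection:
            counts[feature] = counts.get(feature, 0) + 1
    ranked = sorted(counts, key=lambda f: (-counts[f], f))
    return ranked[:n]
```

In the paper's setting, N would then be chosen to maximize performance on the training data, shrinking the 35-feature set to the 8 most consistently useful features.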
We have previously presented a hierarchical pyramid/neural network (HPNN) architecture which combines multi-scale image processing techniques with neural networks. This coarse-to-fine HPNN was designed to learn large-scale context information for detecting small objects. We have developed a similar architecture to detect mammographic masses (malignant tumors). Since masses are large, extended objects, the coarse-to-fine HPNN architecture is not suitable for the problem. Instead we constructed a fine-to-coarse HPNN architecture which is designed to learn small-scale detail structure associated with the extended objects. Our initial results applying the fine-to-coarse HPNN to mass detection are encouraging, with detection performance improvements of about 30%. We conclude that the ability of the HPNN architecture to integrate information across scales, from fine to coarse in the case of masses, makes it well suited for detecting objects which may have detail structure occurring at scales other than the natural scale of the object.
We have previously presented a coarse-to-fine hierarchical pyramid/neural network (HPNN) architecture which combines multi-scale image processing techniques with neural networks. In this paper we present applications of this general architecture to two problems in mammographic Computer-Aided Diagnosis (CAD). The first application is the detection of microcalcifications. The coarse-to-fine HPNN was designed to learn large-scale context information for detecting small objects like microcalcifications. Receiver operating characteristic (ROC) analysis suggests that the hierarchical architecture improves detection performance of a well established CAD system by roughly 50%. The second application is to detect mammographic masses directly. Since masses are large, extended objects, the coarse-to-fine HPNN architecture is not suitable for this problem. Instead we construct a fine-to-coarse HPNN architecture which is designed to learn small-scale detail structure associated with the extended objects. Our initial results applying the fine-to-coarse HPNN to mass detection are encouraging, with detection performance improvements of about 36%. We conclude that the ability of the HPNN architecture to integrate information across scales, both coarse-to-fine and fine-to-coarse, makes it well suited for detecting objects which may have contextual clues or detail structure occurring at scales other than the natural scale of the object.
Microcalcifications are important cues used by radiologists for early detection in breast cancer. Individually, microcalcifications are difficult to detect, and often contextual information (e.g. clustering, location relative to ducts) can be exploited to aid in their detection. We have developed an algorithm for constructing a hierarchical pyramid/neural network (HPNN) architecture to automatically learn context information for detection. To test the HPNN we first examined if the hierarchical architecture improves detection of individual microcalcifications and if context is in fact extracted by the network hierarchy. We compared the performance of our hierarchical architecture versus a single neural network receiving input from all resolutions of a feature pyramid. Receiver operator characteristic (ROC) analysis shows that the hierarchical architecture reduces false positives by a factor of two. We examined hidden units at various levels of the processing hierarchy and found what appears to be representations of ductal location. We next investigated the utility of the HPNN if integrated as part of a complete computer-aided diagnosis (CAD) system for microcalcification detection, such as that being developed at the University of Chicago. Using ROC analysis, we tested the HPNN’s ability to eliminate false positive regions of interest generated by the computer, comparing its performance to the neural network currently used in the Chicago system. The HPNN achieves an area under the ROC curve of Az equal to .94 and a false positive fraction of FPF equal to .21 at TPF equals 1.0. This is in comparison to the results reported for the Chicago network; Az equal to .91, FPF equal to .43 at TPF equal to 1.0. These differences are statistically significant. We conclude that the HPNN algorithm is able to utilize contextual information for improving microcalcifications detection and potentially reduce the false positive rates in CAD systems.
Despite a long history of neurological, psychological, and computational efforts, no satisfactory explanation has been offered for the extraordinary ability of humans to recognize other human faces. However, a number of different network-based approaches (Turk and Pentland, 1991; Brunelli and Poggio, 1993; Buhmann et al., 1989) have achieved surprisingly good ability to recognize faces, at least under certain restricted conditions. We decided to compare the solutions developed by different network architectures including PDP and radial basis function (RBF) networks to the problem of gender classification. Given a picture of a face, including external features such as hair, beard, jewelry, etc., the network must learn to distinguish male from female. This is a simpler problem than general face recognition, and there is some evidence that it is carried out by a separate population of cells in the inferior temporal cortex (Damasio et al., 1990). Several investigators have previously applied PDP networks to the problem of gender classification (Golomb et al., 1989; Cottrell and Metcalfe, 1989). However, the hidden unit representations developed in those models were not analyzed in detail. Moreover, we wanted to directly compare the representations developed by different types of networks (PDP, RBF) when confronted with the exact same training and test set.
An important problem in image analysis is finding small objects in large images. The problem is challenging because (1) searching a large image is computationally expensive, and (2) small targets (on the order of a few pixels in size) have relatively few distinctive features which enable them to be distinguished from non-targets. To overcome these challenges we have developed a hierarchical neural network (HNN) architecture which combines multi-resolution pyramid processing with neural networks. The advantages of the architecture are: (1) both neural network training and testing can be done efficiently through coarse-to-fine techniques, and (2) such a system is capable of learning low-resolution contextual information to facilitate the detection of small target objects. We have applied this neural network architecture to two problems in which contextual information appears to be important for detecting small targets. The first problem is one of automatic target recognition (ATR), specifically the problem of detecting buildings in aerial photographs. The second problem focuses on a medical application, namely searching mammograms for microcalcifications, which are cues for breast cancer. Receiver operating characteristic (ROC) analysis suggests that the hierarchical architecture improves the detection accuracy for both the ATR and microcalcification detection problems, reducing false positive rates by a significant factor. In addition, we have examined the hidden units at various levels of the processing hierarchy and found what appears to be representations of road location (for the ATR example) and ductal/vasculature location (for mammography), both of which are in agreement with the contextual information used by humans to find these classes of targets. We conclude that this hierarchical neural network architecture is able to automatically extract contextual information in imagery and utilize it for target detection.
An important problem in image analysis is finding small objects in large images. The problem is challenging because: 1) searching a large image is computationally expensive; and 2) small targets (on the order of a few pixels in size) have relatively few distinctive features which enable them to be distinguished from non-targets. To overcome these challenges the authors have developed a hierarchical neural network architecture which combines multiresolution pyramid processing with neural networks. Here the authors discuss the application of their hierarchical neural network architecture to the problem of detecting microcalcifications in digital mammograms. Microcalcifications are cues for breast tumors. 30% to 50% of breast carcinomas have microcalcifications visible in mammograms while 60% to 80% of all breast tumors eventually show microcalcifications via histology. Similar to the building/ATR problem, microcalcifications are generally very small, point-like objects.
A model is proposed which directly links the perception of illusory contours to intermediate-level cortical processes for visual surface discrimination. An important assertion of the model is that illusory contours are reentered, via feedback, into surface discrimination processes with the result being the construction of illusory surfaces. The model is tested in a number of simulations which demonstrate surface completion, generation of illusory contours, and interactions with depth cues from stereopsis.
Physiology has shown that the neural machinery of “early vision” is well suited for extracting edges and determining orientation of contours in the visual field. However, when looking at objects in a scene our perception is not dominated by edges and contours but rather by surfaces. Previous models have attributed surface segmentation to filling-in processes, typically based on diffusion. Though diffusion-related mechanisms may be important for perceptual filling-in, it is unclear how such mechanisms would discriminate multiple, overlapping surfaces, as might result from occlusion or transparency. For the case of occlusion, surfaces exist on either side of a boundary and the problem is not to fill-in the surfaces but to determine which surface “owns” the boundary. This problem of boundary “ownership” can also be considered a special case of the binding problem, with a surface being “bound” to a contour.
We propose that the binding and segmentation of visual features is mediated by two complementary mechanisms; a low resolution, spatial-based, resource-free process and a high resolution, temporal-based, resource-limited process. In the visual cortex, the former depends upon the orderly topographic organization in striate and extrastriate areas while the latter may be related to observed temporal relationships between neuronal activities. Computer simulations illustrate the role the two mechanisms play in figure/ground discrimination, depth-from-occlusion, and the vividness of perceptual completion.
The model presented shows how textured regions can be discriminated and textured surfaces created by the visual cortex. The model addresses two major processes: texture segmentation and texture binding. Textures are detected by using a version of the energy model of J. R. Bergen and E. H. Adelson (1988) and J. R. Bergen and M. S. Landy (1991), which was modified to include ON and OFF center cells, and units selective for line endings. A novel neural mechanism is described for binding a texture pattern together. Simulation results demonstrated the ability of the networks to segment and bind a well-known texture pattern.
The authors present neural network simulations of how the visual cortex may segment objects and bind attributes based on depth-from-occlusion. They briefly discuss one particular subprocess in the occlusion-based model most relevant to segmentation and binding: determination of the direction of figure. They propose that the model allows addressing a central issue in object recognition: how the visual system defines an object. In addition, the model was tested on illusory stimuli, with the network’s response indicating the existence of robust psychophysical properties in the system.
The problems of object segmentation and binding are addressed within a biologically based network model capable of determining depth from occlusion. In particular, the authors discuss two subprocesses most relevant to segmentation and binding: contour binding and figure direction. They propose that these two subprocesses have intrinsic constraints that allow several underdetermined problems in occlusion processing and object segmentation to be uniquely solved. Simulations that demonstrate the role these subprocesses play in discriminating objects and stratifying them in depth are reported. The network is tested on illusory stimuli, with the network’s response indicating the existence of robust psychophysical properties in the system.
We hypothesize that certain speaker gestures can convey significant information that is correlated with audience engagement…