A probabilistic network model for integrating visual cues and inferring intermediate-level representations

Psychophysical data have demonstrated that our visual system must integrate multiple cues, both spatially local and non-local, to construct the visual scene. In this paper we describe a probabilistic network model which integrates visual cues to infer intermediate-level visual representations. We demonstrate the network model on two example problems: inferring “direction of figure” (DOF) [15] and estimating perceived velocity. The assignment of DOF can be treated as a problem in probabilistic inference, with DOF a hidden variable that assigns “ownership” of an object’s occluding boundary to the region representing the “figure”. The DOF is not directly observed but can potentially be inferred from local observations and “message passing”. For example, our model combines contour convexity and similarity/proximity cues to form observations, and belief propagation (BP) integrates these observations with state probabilities to infer the DOF.
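
As a rough illustration of the message-passing step, the following is a minimal sketch of sum-product belief propagation on a chain of binary hidden nodes. It is not the network described in this paper: the chain topology, the two-state DOF variable (figure on one side of the contour or the other), the function name `chain_bp`, and the numerical values of the evidence and coupling terms are all assumptions chosen for illustration.

```python
def chain_bp(evidence, coupling):
    """Sum-product belief propagation on a chain of binary nodes.

    evidence: list of (e0, e1) local likelihoods, one pair per node
              (hypothetical stand-in for convexity/similarity cues).
    coupling: 2x2 nested list; coupling[a][b] is the compatibility of
              neighboring states a and b (assumed symmetric here).
    Returns a list of normalized [b0, b1] beliefs, one per node.
    """
    n = len(evidence)

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    # fwd[i]: message arriving at node i from its left neighbor.
    fwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(1, n):
        prev = [evidence[i - 1][a] * fwd[i - 1][a] for a in range(2)]
        fwd[i] = normalize(
            [sum(prev[a] * coupling[a][b] for a in range(2)) for b in range(2)]
        )

    # bwd[i]: message arriving at node i from its right neighbor.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        nxt = [evidence[i + 1][b] * bwd[i + 1][b] for b in range(2)]
        bwd[i] = normalize(
            [sum(coupling[a][b] * nxt[b] for b in range(2)) for a in range(2)]
        )

    # Belief at each node: local evidence times both incoming messages.
    return [
        normalize([evidence[i][s] * fwd[i][s] * bwd[i][s] for s in range(2)])
        for i in range(n)
    ]


# Illustrative values only: one informative node, two uninformative ones.
beliefs = chain_bp(
    evidence=[(0.8, 0.2), (0.5, 0.5), (0.5, 0.5)],
    coupling=[[0.9, 0.1], [0.1, 0.9]],
)
```

In this toy setting the evidence at the first node propagates along the chain, so nodes with uninformative local cues still acquire a preference for the same state, weakening with distance.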

We extend the network model, integrating form and motion streams, to explain the coherence-based motion effects first demonstrated by McDermott et al. [11]. The extended model consists of two interacting network chains (streams), one for inferring DOF and the other for inferring scene motion. The local figure-ground relationships estimated in the DOF stream are subsequently used by the motion stream as evidence for surface occlusion, modulating the covariance of a Gaussian distribution used to model the velocity at apertures located at junction points. The distribution of scene motion is ultimately represented in velocity space as a mixture of these form-modulated Gaussians.
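
To make the idea of a form-modulated mixture concrete, the sketch below evaluates an equally weighted mixture of isotropic 2D Gaussians in velocity space, where each component's variance depends on an occlusion probability supplied by the form stream. The abstract does not specify the modulation rule, so the linear rule in `modulated_variance`, the isotropic covariance, the equal mixture weights, and the parameters `base_var` and `gain` are all assumptions made for illustration.

```python
import math


def gaussian2d(v, mean, var):
    """Isotropic 2D Gaussian density at velocity v = (vx, vy)."""
    dx, dy = v[0] - mean[0], v[1] - mean[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * var)) / (2.0 * math.pi * var)


def modulated_variance(p_occlusion, base_var=0.05, gain=2.0):
    """Assumed rule: stronger occlusion evidence broadens the Gaussian,
    so that aperture's local velocity constrains the scene motion less."""
    return base_var * (1.0 + gain * p_occlusion)


def mixture_density(v, apertures):
    """Equal-weight mixture over apertures.

    apertures: list of (mean_velocity, p_occlusion) pairs, where
    mean_velocity is a (vx, vy) tuple measured at a junction aperture.
    """
    total = 0.0
    for mean, p_occ in apertures:
        total += gaussian2d(v, mean, modulated_variance(p_occ))
    return total / len(apertures)


# Illustrative values only: two junction apertures with different
# local velocities and different occlusion evidence.
apertures = [((1.0, 0.0), 0.1), ((0.0, 1.0), 0.9)]
density_at_first = mixture_density((1.0, 0.0), apertures)
density_at_second = mixture_density((0.0, 1.0), apertures)
```

Under this assumed rule, the aperture with weak occlusion evidence contributes a sharper peak than the one with strong occlusion evidence, so the mixture favors its velocity.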

Simulation results show that the network’s integration of cues can account for several examples of perceptual ambiguity in DOF, consistent with human perception. The integration of form and motion representations also qualitatively accounts for psychophysical results showing surface-dependent motion coherence of oscillating edges [11]. Finally, we show that the model naturally integrates top-down cues, leading to perceptual bias in interpreting ambiguous figures, such as Rubin’s vase, as well as bias in the perceived coherence of object motion.

Accepted 12 October 2003