Reverse engineering of intermediate-level vision: surface segmentation and depth-from-occlusion

Visual processing can be divided into three stages: early, intermediate, and high-level vision, which roughly correspond to the sensation, perception, and cognition of the visual world. In this thesis, we develop a network-based model of intermediate-level vision that focuses on how surfaces might be represented in visual cortex. The model is constructed through reverse engineering, whereby data from neuroanatomy, neurophysiology, visual psychophysics, and computational theory are used together to constrain the architecture of the model.

We begin by addressing the neural binding problem: how neuronal activities representing responses to local points are bound together to form contours, how contours are bound to regions, and how regions are bound to depth. We argue that the cortex uses two general methods for binding, one operating as a spatially based system and the other as a temporally based system. We find that these two classes of binding complement one another and can cooperate to overcome several problems inherent in the construction of a surface representation.

Using these two classes of binding, we construct a neural computational model. Central to the model is the representation of surfaces through the establishment of "ownership," a selective binding of contours and regions. We identify pictorial cues to ownership and consider neural circuits that operate on these cues. In the model, ownership is represented as a vector along the occluding contour. This representation, which we call direction of figure, is attractive because it is consistent with known neural encoding schemes and provides a local representation of relative depth.
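As a minimal illustration only, and not the thesis's own neural circuitry, the Python sketch below assumes that direction of figure can be approximated as a unit vector normal to each segment of a closed occluding contour, pointing toward the region that owns it. For simplicity it assumes ownership goes to the enclosed region (a closure cue); the functions `direction_of_figure` and `nearer_side` are hypothetical names introduced here.

```python
import math


def direction_of_figure(contour):
    """Hypothetical sketch: for a closed polygon given as (x, y) vertices,
    return (midpoint, unit normal) for each edge, with the normal pointing
    into the enclosed region. Assumes the enclosed region owns the contour."""
    n = len(contour)
    # Twice the signed area (shoelace formula); positive means counter-clockwise.
    area2 = sum(
        contour[i][0] * contour[(i + 1) % n][1]
        - contour[(i + 1) % n][0] * contour[i][1]
        for i in range(n)
    )
    sign = 1.0 if area2 > 0 else -1.0
    dofs = []
    for i in range(n):
        (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        # The left normal (-dy, dx) points inward for a counter-clockwise
        # contour; flip it for a clockwise one.
        nx, ny = sign * -dy / length, sign * dx / length
        mid = ((x0 + x1) / 2, (y0 + y1) / 2)
        dofs.append((mid, (nx, ny)))
    return dofs


def nearer_side(point, dof):
    """Hypothetical local depth readout: the side of the contour that the
    direction-of-figure vector points toward is taken as the nearer surface."""
    (mx, my), (nx, ny) = dof
    return "near" if (point[0] - mx) * nx + (point[1] - my) * ny > 0 else "far"


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
bottom_edge = direction_of_figure(square)[0]
print(nearer_side((0.5, 0.5), bottom_edge))   # inside the square
print(nearer_side((0.5, -0.5), bottom_edge))  # outside the square
```

The point of the sketch is that relative depth is read out locally, from a single contour point and its vector, without consulting the whole image, which is the property the paragraph above attributes to direction of figure.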

Using computer simulations, we test our model's perceptual performance. We show that by using ownership, our model can parcel real images into surfaces, determine depth from phenomenal transparency, handle classic figure/ground problems, organize an image consistently with the Gestalt laws, and account for several aspects of illusory contour and surface perception, including perceived vividness. We conclude that ownership can serve as a central locus for visual integration and that, through ownership, processes such as depth, transparency, surface completion, and even motion can interact with one another to organize an image into a perceptual scene.

Accepted 15 December 1994
