11. Define a simple deformable model to detect a half-circular shape (may be rotated). What will be the energy function?
Solution
Shape is a recurring theme in computer vision. For example, shape is one of the main sources of information that can be used for object recognition. In medical image analysis, geometrical models of anatomical structures play an important role in automatic tissue segmentation. The shape of an organ can also be used to diagnose diseases. In a completely different setting, shape plays an important role in the perception of optical illusions (we tend to see particular shapes), and this can be used to explain how our visual system interprets the ambiguous and incomplete information available in an image.
Our main goal is to develop techniques that can be used to represent and detect relatively generic objects in images. The techniques we present here revolve around a particular shape representation, based on the description of objects using triangulated polygons. Triangulated polygons allow us to describe complex shapes using simple building blocks. As we show in the next section, the triangles that decompose a polygon without holes are connected together in a tree structure, and this has important algorithmic consequences. By picking a particular triangulation for the polygons we obtain decompositions of objects into meaningful parts. This yields a discrete representation closely related to Blum’s medial axis transform [6].
In this paper we concentrate on the task of finding the location of a deformable shape in an image. This problem is important for the recognition of non-rigid objects. Moreover, objects in many generic classes can be described as deformed versions of an ideal template. In this setting, the location of an object is given by a continuous map from a template to an image. Figure 1 illustrates how we use a deformable template to detect a particular anatomical structure in an MR image. We will show how triangulated polygons provide rich models for deformable shapes.
These models can capture both boundary and interior information of an object and can be deformed in an intuitive way. Equally important, we present an efficient algorithm for finding the optimal location of a deformable shape in an image. In contrast, previous methods that take into account the interior of deformable objects lack a comparably efficient detection algorithm.
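For the half-circular shape in the question, the triangulated-polygon idea can be sketched as follows. This is a minimal illustration, not the source's construction: the function names (`half_disk_polygon`, `fan_triangulation`, `dual_graph_edges`, `is_tree`), the number of arc samples, and the choice of a fan triangulation are all assumptions made for simplicity. The sketch approximates a half disk by a convex polygon, triangulates it, and checks that the triangles are connected in a tree structure, as stated above for polygons without holes.

```python
import math

def half_disk_polygon(n_arc=8, radius=1.0):
    """Counterclockwise vertices of a polygon approximating a half disk.

    Samples n_arc + 1 points on the arc from angle 0 to pi; the diameter
    (last vertex back to the first) closes the polygon.
    """
    return [(radius * math.cos(math.pi * k / n_arc),
             radius * math.sin(math.pi * k / n_arc))
            for k in range(n_arc + 1)]

def fan_triangulation(n_vertices):
    """Vertex-index triples fanning out from vertex 0 (valid for convex polygons)."""
    return [(0, i, i + 1) for i in range(1, n_vertices - 1)]

def dual_graph_edges(triangles):
    """Pairs of triangle indices that share an edge (a diagonal of the polygon)."""
    owner = {}   # undirected edge -> index of the first triangle that uses it
    edges = []
    for t, (a, b, c) in enumerate(triangles):
        for u, v in ((a, b), (b, c), (c, a)):
            key = frozenset((u, v))
            if key in owner:
                edges.append((owner[key], t))
            else:
                owner[key] = t
    return edges

def is_tree(n_nodes, edges):
    """A graph is a tree iff it has n_nodes - 1 edges and is connected."""
    if len(edges) != n_nodes - 1:
        return False
    adjacency = {i: [] for i in range(n_nodes)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == n_nodes

vertices = half_disk_polygon(n_arc=8)
triangles = fan_triangulation(len(vertices))
print(is_tree(len(triangles), dual_graph_edges(triangles)))
```

A fan triangulation is only one possible choice; a different triangulation of the same polygon would give a different decomposition into parts, but its dual graph would still be a tree.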
The geometric properties of rigid objects are well understood. We know how three-dimensional features such as corners or edges project into images, and there are a number of methods that can be used to represent rigid shapes and locate their projections. Some techniques, such as the alignment method [23], use explicit three-dimensional representations. Other techniques, such as linear combination of views [36], capture the appearance of three-dimensional shapes using a small number of two-dimensional pictures. These and similar techniques assume that all shape variation comes from the viewpoint dependency of two-dimensional images.
A number of representations describe objects in terms of a small set of generic parts. This includes representations based on generalized cylinders [27] and geons [5]. These approaches are appealing because they provide symbolic descriptions of objects. A shortcoming is that some objects do not have a clear decomposition into generic parts. For example, what are the parts that make up a shoe? Another problem is that we do not know how to extract generic parts from images in a robust way. On the other hand, models based on pictorial structures (e.g. [14, 13]) have been successfully used to characterize and detect objects that are described by a small number of simple parts connected in a deformable configuration. In this approach, generic parts are not extracted from images on their own; the whole object model is used at once. Our representation is similar in that objects are represented by a number of parts connected together. When matching a triangulated polygon to an image we also consider the whole model at once instead of trying to detect the generic parts individually.
We can represent objects in terms of their boundaries, which for two-dimensional objects are curves and for three-dimensional objects are surfaces. Such representations are commonly used in the context of both image segmentation and object recognition. One example is a popular technique for image segmentation known as active contour models or snakes [25, 15]. Boundary models are also used for generic object recognition. Grenander et al. [20] pioneered the use of Markov models to represent the boundaries of non-rigid objects. They demonstrated how these models can be used to detect objects in noisy images. The work in [2] describes how we can measure similarity between objects in terms of the amount of stretching and bending necessary to turn the boundary of one object into the boundary of another one. One problem with deformable boundary models is that they do not capture well how the interior of objects deforms.
Blum introduced a representation known as the medial axis transform [6] that is now widely used. The medial axis of an object is defined as the set of centers of all maximally inscribed disks (disks that are contained inside the object but not contained in any other such disk). The medial axis transform is the medial axis together with the radius of each maximal disk. For two-dimensional objects the medial axis is one-dimensional, and if the shape has no holes the medial axis has no loops. This tree structure is appealing from a computational point of view. The medial axis captures local symmetries of an object and provides a natural decomposition of the object into parts (corresponding to branches in the one-dimensional structure).
Energy Function
In our framework, each triangle in a template is mapped to the image plane by an affine transformation. In matrix form, we can write the affine transformation as h(x) = Ax + a. We restrict our attention to transformations which preserve orientation (det(A) > 0). This ensures that the global embedding f is locally one-to-one. Let α and β be the singular values of A, with α ≥ β. The transformation h takes a unit circle to an ellipse with major and minor axes of length α and β. The value log(α/β) is called the log-anisotropy of h and is a measure of how far h is from a similarity transform. This quantity has also been used by Bookstein [7] to measure distance between triangle shapes. We use the log-anisotropy measure to assign a deformation cost for each affine map (and let the cost be infinity if the map is not orientation preserving). The deformation costs are combined with a data cost that attracts the template boundary to locations in the image that have high gradient magnitude.
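The deformation term can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the source's implementation: the helper names (`linear_part`, `log_anisotropy`, `deformation_energy`) are hypothetical, template triangles are assumed to be non-degenerate, and summing the per-triangle costs is an assumed way of combining them. The data cost based on image gradient magnitude is not included.

```python
import numpy as np

def linear_part(src_tri, dst_tri):
    """Matrix A of the affine map h(x) = Ax + a that takes src_tri onto dst_tri.

    Each triangle is a sequence of three 2D points; src_tri must be non-degenerate.
    """
    src = np.asarray(src_tri, dtype=float)
    dst = np.asarray(dst_tri, dtype=float)
    S = np.column_stack([src[1] - src[0], src[2] - src[0]])  # template edge vectors
    D = np.column_stack([dst[1] - dst[0], dst[2] - dst[0]])  # image edge vectors
    return D @ np.linalg.inv(S)                              # A maps S onto D

def log_anisotropy(A):
    """log(alpha / beta) for singular values alpha >= beta of A.

    Returns infinity when A does not preserve orientation (det(A) <= 0).
    """
    if np.linalg.det(A) <= 0:
        return float("inf")
    alpha, beta = np.linalg.svd(A, compute_uv=False)  # sorted in descending order
    return float(np.log(alpha / beta))

def deformation_energy(template_pts, image_pts, triangles):
    """Sum of per-triangle log-anisotropy costs (assumed combination rule)."""
    T = np.asarray(template_pts, dtype=float)
    I = np.asarray(image_pts, dtype=float)
    return sum(log_anisotropy(linear_part(T[list(t)], I[list(t)]))
               for t in triangles)

# Stretching a triangle by a factor of 2 along one axis costs log(2).
unit = [(0, 0), (1, 0), (0, 1)]
stretched = [(0, 0), (2, 0), (0, 1)]
print(log_anisotropy(linear_part(unit, stretched)))
```

Because the log-anisotropy is zero for any similarity transform, a template that is only rotated, uniformly scaled, and translated incurs no deformation cost under this sketch, which is the property needed for a half-circular shape that may be rotated.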

