Although machine learning is great for shape classification, for shape recognition, we must still use the old methods. Methods such as Hough Transform, and RANSAC.

In this post, we’ll look into using Hough Transform for recognizing straight lines. The following is taken from E. R. Davies’ book, *Computer Vision: Principals, Algorithms, Application, Learning *and *Image Digital Image Processing *by Gonzalez and Woods.

Straight edges are amongst the most common features of the modern world, arising in perhaps the majority of manufactured objects and components – not least in the very buildings in which we live. Yet, it is arguable whether true straight lines ever arise in the natural state: possibly the only example of their appearance in virgin outdoor scenes is in the horizon – although even this is clearly seen from space as a circular boundary! The surface of water is essentially planar, although it is important to realize that this is a deduction: the fact remains that straight lines seldom appear in completely natural scenes. Be all this as it may, it is clearly vital both in city pictures and in the factory to have effective means of detecting straight edges. This chapter studies available methods for locating these important features.

Historically, HT has been the main means of detecting straight edges, and since the method was originally invented by Hough in 1962, it has been developed and refined for this purpose. We’re going to concentrate on it on this blog post, and this also prepares you to use HT to detect circles, ellipses, corners, etc, which we’ll talk about in the not-too-distant future. We start by examining the original Hough scheme, even thoupgh it is now seen to be wasteful in computation since it has evolved.

First, let us introduce Hough Transform. Often, we have to work in unstructured environments in which all we have is an edge map and no knowledge about where objects of interest might be. In such situations, all pixels are candidates for linking, and thus have to be accepted or eliminated based on predefined global properties. In this section, we develop an approach based on whether set of pixels lie on curves of a specified shape. Once detected, these curves form the edge or region boundaries of interest.

Given $n$ points in the image, suppose that we want to find subsets of these points that lie on straight lines. One possible soltion is to fine all lines determined by every pair of points, then find all subsets of points that are close to particular lines. This approach involves finding $n(n-1)/2 \sim n^2$ lines, then performing $(n)(n(n-1))/2 \sim n^3$ comparisons for every points to all lines. As you might have guessed, this is extremely computationally expensive task. Imagine this, we check every pixel for neighboring pixels and compare their distance to see if they form a straight line. Impossible!

Hough, as we said, in 1962, proposed an alternative approach to this scanline method. Commonly referred to as the *Hough transform. *Let $(x_i, y_i)$ denote p point in the xy-plane and consider the general equation of a straight line in slope-intercept form: $y_i = ax_i + b$. Infinitely many lines pass through $(x_i, y_i)$ but they all satisfy the equation we saw, for varying values of $a$ and $b$. However, writing this equation as $b = -x_i a+y_i$ and considering the ab-plane – also called *parameter space*, yields the equation of a single line in parameter space associated with it, which intersects the line associated with $(x_i, y_i)$ at some point $(a\prime, b\prime)$ in parameter space, where $a\prime$ is the slope and $b\prime$ is the intercept of the line containing the both $(x_i, y_i)$s in the xy-plane, and of course, we are assuming that lines are not parallel, in fact, all points on this line have lines in parameter space that intersect at $(a\prime, b\prime)$. Here, this figure illustrates s the concepts:

In principle, the parameter space lines corresponding to all points $(x_k, y_k)$ in the xy-plane could be plotted, and the principal (goddammit, principle, principal, fuck this language!) lines in that plane could be found by identifying points in parameter space where large numbers of parameter-space lines intersect. However, a difficulty with this approach is that $a$, approaches infinity as the lines approaches vertical direction. One way around this difficulty is to use the *normal representation of a line:*

\[ x \cos(\theta) + y sin(\theta) = \rho \]

Figure on the right below demonstrates the geometrical interpretation of parameters $\rho$ and $\theta$. A horizontal line has $\theta = 0^\circ$, with $\rho$ being equal to the positive x-intercept. Similarly, a vertical line has $\theta = 90^\circ$, with $\rho$ being equal to positive y-intercept. Each sinusoidal curve in the middle of the figure below represents the family of lines that pass through a particular point $(x_k. y_k)$ in xy-plane.

Let’s talk about the properties of Hough transform. Figure below illustrates the Hough transform based on the equation above.

On the top, you see an image of size $M\times M \bigvee M=101$ with five labeled white points, and below it shows each of these points mapped into the parameter space, $\rho\theta$-plane using subdivisions of one unit for the $\rho$ and $\theta$ axes. The range of $\theta$ values is $\pm 90^\circ$ and the range of $\rho$ values is $\pm \sqrt{2} M$ As the bottom image shows, each curve has a different sinusoidal shape. The horizontal line resulting from the mapping of point 1 is a sinusoid of zero amplitude.

The points labeled A and B in the image on the bottom illustrate the colinearity detection property of the Hough transform. For exampele, point B marks the intersection of the curves corresponding to points 2, 3, and 4 in the xy image plane. The location of point A indicates that these thre points line on a straight line passing through the origin $(\rho = 1)$ and oriented at $-45^\circ$. Similarly, the curves intersecting at point B in parameter space indicate that 2, 3, and 4 line on a straight like oriented at $45^\circ$, and whose distance from origin is $\rho = 71$. Finally, Q, R, and S illustrate the fact that Hough transform exhibits a reflective adjacency relationship at the right and left edges of the parameter space.

Now that we know the basics of HT and line detection using HT, let’s take a look at Longitudinal Line Localization.

The previous method is insensitive to where along the infinite idealized line an observed segment appear. He reason for this is that we only have two parameters, $\rho$ and $\theta$.There is some advantage to be gained in this, in parital occlusion of line does not prevent its detection: indeed, if several segments of a line are visible, they can all contribute to the peak in parameter space, hence improving senitivity. On the other hand, for full image interpretation, it is useful to have information about the longitudinal placement of line segments.

Ths is achieved by a further stage of processing. The additional stage involves finding which points contributed to each peak in the main parameter space, and carrying out connectivity analysis in each case. Some call this process xy-gruping. It is not vital that the line segments should be 4-connected (meaning, a neighborhood with only the vertical and horizontal neighbors) or 8-connected (with diagonal neighbors) – just that there should be sufficient points on them so that adjacent points are withing a threshold distance apart, i.e. groups of points arem erged if they are withing prespecified distance. Finally, segments shorter than a certain minimum length can be ignored too insignificant to help with image interpretation.

The alternative method for saving computation time is the Foot-of-Normal method. Created by the author of book I’m quoting from, it eliminates the use of trigonometric functions such as arctan by employing a different parametrization scheme. Both the methods we’ve described employ abstract parameter spaces in which poitns bear no immediately obvious relation to image space. N the alternative scheme, the parameter spaces in which points bear no immediately obvious visual relation to image space. In this alternative scheme, the parameter space is a second image space, whifcfh is congruent to image space.

This type of parameter space is obtained in the following way. First, each edge fragment in the image is produced much as required previously so that $\rho%” can be measured, but this time the foot of the normal from the origin is taken as a voting position in the parameter space. Taking %(x_0, y_0) as the foot of the normal from the origin to the relevant line, it is found that:

\[b/a = y_0/x_0 \]

\[(x-x_0)x_0 + (y-y_o)y_0 \]

Thes etwo equations are sufficient to compute the two coordinates, $(x_0, y_0)$. Solving for $x_0$ and $y_0$ gives:

\[ x_0 = va \]

\[y_0 = vb \]

Where:

\[ \frac{ax + by}{a^2 + b^2} \]

Well, we’re done for now! It’s time to take a shower, then study regression, as I’m done with classification. I’m going to write a post about regression, stay tuned!