
Observation can impact data collection, especially true when monitoring animals

Identifying animals is usually done by capturing the animal, attaching a tag or transceiver, and releasing the animal to continuously gather data on its movements

But these interactions inherently interrupt the animals’ behavior

What if there was a way to monitor specific animals without resorting to such invasive methods?
Identifying Whale Sharks by Pattern
Researchers use patternmatching algorithms to match an animal’s markings to previous data, most commonly using an algorithm originally designed to match stars and identify constellations for astronomers: Groth’s Algorithm. We examine existing applications of Groth’s algorithm that aim to answer questions on animal tracking and implement proofofconcept algorithms to approximate how Groth’s Algorithm goes about matching triangles of points as identifiable patterns.
â€‹
In order to utilize this pattern algorithm, the ecologists would need a picture of the designated measurement region, which is located behind the gill slits of each shark. The whale shark photographs from the database will put the photograph through a spot extraction, rotation correction, and contrast enhancement process before the pattern matching begins. The spot extraction method process is when the ecologists implement computerdriven matching software that can discern the areas of interest in an image. This enhances the contrast of white spots on dark skin. It maps the dimensions and location of the spot into a pixel group of a single color family. From there, the image is put through a rotation correction process where it will rotate the image until the curved segment of the shark becomes a vertebral column. The source image will then crop the boundaries of the region. The last method of the processing with contrast enhancement method. This preprocessing helps to reduce the white pixel noise of the image from the spot extraction process. Once the ecologist completes the three tasks, they will put the cropped photograph through Groth’s algorithm to match the whale shark pattern against the
library images.
Algorithms
Groth’s algorithm is an effective, and intricate, approach to twodimensional pattern matching. So intricate, in fact, that we lacked the background to properly implement many of the filtering and voting processes. At Professor Hoschino’s suggestion, we have instead devised two demonstration algorithms that show off different features of pattern matching, in a much more controlled environment than Groth’s algorithm is meant for.
Overlay Algorithms
Our overlay algorithm was inspired by the implementation of Groth's starmapping algorithm to track whale sharks by keeping records of their distinct patterns of white spots. Our overlay program accomplishes something similar but at a much smaller scale –– it pattern matches an image to an existing library of images by comparing the pixels of one image to those of another as points in a coordinate plane. We took a divide and conquer approach in designing our algorithm in that we first tackled the subproblem of pattern matching an image to another (where oldImage is the recorded unique whale pattern of spots as reference) and returning its overlay percentage. These two functions are described in more detail below:
â€‹

Overlay the images: Given in two images of patterns, iterate over corresponding pixel coordinates according to the image dimensions. For each coordinate pair in the images, compare the two pixels and see if they have the same hue. If so, increase the match counter by one. Finally, divide the number of matches by the total number of pixels to determine the match percentage of the two images. Note: one requirement of our implementation is that the two images being compared have to be identical in dimension.
â€‹

Compare the unidentified image against images in the library: Iterate over each known image in the library and compare it with the new image. Preserve the match percentage for each pairing. When all images have been checked, the highest percentage is the closest match.
Part I: Overlay Algorithm Code
Triangle Area Algortihms
Our second pattern matching approximation algorithm is created in spirit of the Groth’s Algorithm, which matches lists of twodimensional coordinates by comparing the areas of triangles formed by sets of three coordinates.
â€‹

Collect coordinates: The algorithm starts with lists of coordinates, representing a whale shark’s spots. Instead of using tuples as a quick solution, our program creates our ownuser defined class: the Point. A Point is represented with two numeric values x and y, and x notes the units to the right and y notes the units up from the origin. Once each
point is recorded, it’s added to the list as the input data. 
Generate triangles: Form triangles between every three points in the input list. This will then give us a list of triangles of length n(n  1)(n  2)/6, where n is the number of points in the list. In our samples, each picture has n=10 points, which means each picture in the whale database can be converted into a 120element array, since (10 choose 3) = 120. Unlike Groth’s algorithm, which uses properties like the ratio of the largest side to the smallest side, we calculate and record the area of each triangle formed for later comparison. After adding 120 triangle areas to a list, we sort them from smallest to largest.

Calculate triangle areas: A mathematical algorithm called the shoelace formula is used to determine the area of a triangle with points on the Cartesian 2dimensional xy plane. By the shoelace formula, where R is the area of an enclosing rectangle, the area of the formed triangle A can be calculated using the formula below:
Part II: Triangle Area Algorithm Code
Drawbacks

Our algorithms solve optimization problems, not decision problems.

The overlay algorithm is dependent on the images having the same pixel dimensions and cannot account for magnification, rotation, or reflection.

The triangle area algorithm views all triangles of the same area as identical, regardless of vertex positioning and is easily misled by falselymatched collections of points.