Hi all, it’s come to my attention that some students are using this as a tool to learn from when creating their own implementations for this year’s assignments, which is great! People are welcome to ask me anything about my submission, I’m glad to help.
Although, there is a problem in my implementation which I hope somebody might be able to fix. There’s a soft-break in the program. It’s been a while since I last ran this and/or attempted to fix it, but it seems like the program will crash unless there’s a breakpoint placed before the code exits. I’ve been told that perhaps the way the code is run when looking for breakpoints uses memory in a slightly different way, which may cause the difference but no solution has yet been found.
If you can fix it, let me know!
The coursework for the module on Algorithms in the second year of my degree consisted of implementing Dijkstra’s algorithm in C, to read in a list of connections between cities, read in a list of city pairs and for each of those pairs, find the shortest route between these two cities. The program was accompanied by the following report, which contains details of the implementation process, with justifications, and testing of the program.
The report is below and you can find the code for the implementation on my Github here:
In the following report, I will describe the writing and function of a program in C which implements Disktra’s Algorithm using a priority queue. I will attempt to justify my chosen data structures and algorithms, writing of their time and space complexity. As well as this, I will discuss the limitations of my program.
2.1. Declaring Structures
The program begins with the declaration of two ‘intertown_distance’ structures. This is the structure described by Prof. John Robinson in his lecture ‘Algorithms: Week 9, Heaps and the Assessment’. Simple arrays of this structure are used to store each edge and test-pair. The structure is useful as it contains all relevant information for each edge, edge information can be accessed with time-complexity O(1) and is stored with a space-complexity of O(n) at the worst. In the program, the array is never extended or shortened but it is searched, which, unfortunately, has a time-complexity of O(n). One change made to this structure is the addition of the ‘arrayLength’ property.
The length of the array is controlled by the function ‘CountLines()’, which reads each character until the end of the file, counts the number of new-line characters and returns this value as the number of lines present in the file. This way, an array of ‘intertown_distance’ structures will always be exactly the right length. The number returned by the function ‘CountLines()’ is also stored as the ‘arraylength’ property, to avoid repeated calling of the function and the need to pass the name of the file used to initialise the structure into various functions in which the structure is used.
The second structure present, the ‘cities’ structure, is then declared. Used as an array, the structure is used to contain an unordered list of unique cities and associate an indexing number with them. The city names are written to the array in the order in which they appear in the file containing the list of edges. Due to the nature of the program, no cities will be appended or deleted from the array after its initialisation, so no considerations need to be made as to the efficiency of these. The array will, however, be accessed and searched frequently. Unfortunately, for an unordered array, searching the array will take O(n), at it’s worst and on average. Attempts were made to order the cities and decrease search time, and this will be detailed later. The upside to this structure is that it has an average and worst access time of O(1), though only the ‘printCities()’ function makes use of this fact. The length of this array is controlled much like the previous structures, with the function ‘countCities()’ returning the value to be used. The function compares each start and end city in a given ‘intertown_distance’ structure and compares it to every other start and end, removes all duplicates and then returns the number of starts and ends minus the number of empty spaces as the number of unique cities.
2.2. Reading Files
There are two files to be red into the program, the file containing all the edge information, which will be read into the first ‘intertown_distance’ structure ‘edges’, and the file containing pairs of cities used to test the program, which will be read into the second ‘intertown_distance’ structure ‘testpairs’.
The function ‘readFilesTointertown_distance()’ handles the reading of files. Each line of the file is read into a buffer, which is then separated, via the tab-delimitation, into a start-city string, an end-city string and an integer for the distance and these are all stored in consecutive array locations of the passed-in ‘intertown_distance’ structure. If the file cannot be read, the function prints a line to alert the user.
2.3. Converting City Names to Indexes
The heap and function implementing the Dijkstra algorithm use integers rather than strings to denote cities. Because of this and the fact that city names are provided and not just numbers, there needs to be a conversion.
The conversion begins with a function to create an array of all and only the unique cities from the edges structure. The function ‘populateCityNames()’ does this by copying all start and end cities to a one-dimensional array before rendering all locations containing duplicate cities blank, much like the function ‘countCities()’. Then, a loop checks every location in the array of unique cities and blank locations and copies all of the unique cities to the ‘Cities’ structure. Unfortunately, this method does not apply any order to the list of cities. I did attempt to order the list with a variation on the following excerpt of code from geeksforgeeks.org, but it failed and I could not fix it.
Once a list of unique cities is obtained and each of these cities is assigned an index, the function ‘cityNamesToNumbers()’ takes the array of edges and converts each city to its index number. This is done by looping through ‘edges’ and passing each start and end city into the function ‘cityNumberFromName()’, which performs a linear search on the cities structure and returning the index value associated with the input city. Linear search is inefficient, with it having an average time-complexity of O(n). However, since the list is unsorted, it is the best available option.
2.4. The Graph
The next line of code in the main function constructs the graph. The graph is implemented using an array of adjacency lists, with the adjacency lists being singly-linked lists. Linked lists are ideal for this purpose as adding edges is perhaps the most used structure operation in the program and the Linked List data-type can do this in a time-complexity of O(1).
Following the construction of the graph, the edges are added. The start and end city indexes, as well as the distance between the two, are each added to the graph via the function ‘addEdge()’. A For-Loop is used to add each edge from the ‘edges’ structure individually, with the ‘atoi()’ function being used to convert the index numbers from strings representing numbers to true integers. The ‘addEdge()’ function has a time-complexity of O(1) per edge added and, since the graph is undirected, adds an edge in each direction.
2.5. Dijkstra’s Algorithm
For our purposes, Dijkstra’s algorithm must be run several times, once for each pair of cities in the appropriate input file. To achieve this, I’ve used another For-Loop which iterates as many times as there are lines in the file of city-pairs, which is again achieved using the ‘countLines()’ function. The code iterated in this loop calls the function ‘dijkstra()’, my implementation of Dijkstra’s algorithm. As inputs, the function takes the graph and index numbers for the origin city and destination city, which are obtained by calling the function ‘cityNumberFromName()’.
An integer array called ‘pathArr’ is also passed into the function. The array will hold the index number of each city in the final calculated shortest path for later printing. The length of this array is the number of unique cities and the reasoning for this will be discussed in the section on limitations.
2.5.1 The Heap
Within the ‘dijkstra()’ function, an array of integers, ‘distance()’, is declared to hold distance values, as well as a binary heap. The ‘Heap’ and ‘HeapNode’ structures form a minimum-distance-priority queue. Each ‘HeapNode’ contains only the index number of a city and an edge length while the ‘Heap’ contains integers denoting the capacity and current size of the heap, a pointer to an integer denoting a position within the heap and a pointer to a pointer to a ‘HeapNode’. The priority queue structure was chosen for the speed of its operations, with search, insert and remove operations taking O(log n) on average and O(n) at their worst.
Each location in the ‘distance’ array is then initialised, taking the value of the constant ‘INFINITY’, and a node is constructed for each city, with its distance being initialised from the ‘distance’ array. The ‘HeapNode’ for the origin city is then constructed, with its distance being form itself being 0 so it gets popped first. Also, the heap size is set to the size of the graph, since every city is currently in the heap.
A loop runs, while the heap is not empty, as decided by the function ‘isEmpty()’, which simply returns whether or not the heap is of size ‘0’. The iterated code within the loop begins by popping the minimum distance node from the heap, as chosen by the function ‘popSmallest()’. The queue should already be sorted by minimum distance so the root node will be popped. The root is replaced by the last node in the heap, the size of the heap is decremented and the heap is reordered with the function ‘generateHeap()’.
The function ‘generateHeap()’ is my implementation of the heap sort algorithm. The algorithm has a worst and average time complexity of O(n log n) so it is not a particularly efficient algorithm but it is the only way to sort a binary heap. The function recursive, with an integer being passed in to denote the index number from which the heap is sorted. From the ‘dijkstra()’ algorithm, ‘0’ is passed in to begin the sorting at the root of the heap. The node within the heap currently thought to be the smallest is initially assigned this index number. The indexes for the right and left child nodes are then calculated from the integer passed into the function with the following formulae:
The edge weights of the left and right values in the heap are then compared to the current smallest value and, if they are smaller, they will display the current smallest index. If the index with the smallest edge weight value is not the original integer passed in, the index and smallest node swap places with the function ‘swapHeapNode()’. The heap sort function then calls itself, with the new input index being the index of the current smallest node.
2.5. (resumed) Dijkstra’s Algorithm
After the heap is sorted, the ‘popSmallest()’ function is finished and the program returns to the ‘dijkstra()’ function, the head of the graph’s array at the index is copied to a ListNode pointer and another while-loop begins iterating while the ListNode is not NULL. Inside this loop, a check such that if the shortest distance to the destination index in the ListNode is still not finalised and the distance to the index through the last popped index is less than the previously calculated distance, the distance value of the given index is decreased.
As well as this, the current index is added to the path array, as the success of the previous check would mean that the current index forms part of the minimum path. A second check is performed to see if the current index is the destination and, if it is, the function returns the current total distance. If not, however, the ListNode pointer points to the next node in the graph and the loop continues.
2.6. Program Output
With the shortest path from the first pair of cities calculated, the value is saved as the distance member of the ‘intertown_distance’ structure for that pair of cities. The path itself is then printed with the function ‘printArr()’. At this point, the array containing the shortest path actually contains every edge tested in the algorithm. Because of this, the function begins by filtering out the edges not used in the final path. This is done with a loop which iterates while integer ‘x’, which is initialised to be the destination, does not equal the origin of the path.
First, the city name corresponding to index ‘x’ is copied to a temporary array of strings. The, ‘x’ is assigned the value at ‘pathArr[x]’, which is the edge connecting the previous city to the remainder of the path. Repeating this will save the whole path to an array of strings, in reverse order. A For-Loop, with counter ‘i’ being decremented, then prints this array in reverse order, causing the path to be displayed forwards (from origin to destination). After printing the array, the ‘intertown_distance’ structure containing all pairs for cities and the shortest distance between them, is printed and written to a file, completing the functionality of the program. Overall, the algorithm runs with a time complexity of O(H log N) with a binary heap implementation in the worst case, where N is the number of cities in the graph and H is the number of edges in the heap.
Unfortunately, sometime during the second call of the ‘dijkstra()’ function, the program is prone to crashing. Running the program in debug mode, however, gives no issue. Placing a breakpoint before the return of the program main allows for the debug window to stay open and even if this is not done, the output file is successfully and correctly written.
The program is limited in several regards, the main of which being to do with constants defined for string and array lengths. As Strings are arrays, they must have their maximum length defined in the code. For ease of alteration, I have defined this as a constant. The size of which, was decided to be 85, due to the fact that this is the number of characters in the longest single-word city name in the world.
The constant ‘ARRAYLEN’ is used as the maximum number of cities stored in the shortest path. Unfortunately, I’m not aware of any formula for the maximum number of cities present in the shortest path between two points. Because of this, I’ve set the constant to 20, one less than the number of unique cities present in the given file of edges. The concept of infinity is not defined in C by default, believe it or not. Because of this, a suitable replacement must be defined. One possibility is the maximum size of an integer, which is defined in ‘limitations.h’. I, however, thought this was overkill and and used 20000 as a suitable replacement. Since infinity is only used as a value which is higher than any possible distance between two cities, I chose 20000 as it’s approximately half the length of the equator (in km), the longest direct line which could be drawn between two cities on Earth.
Finally, formatting is a limitation. The program can only successfully read tab-delimited files of edges. This is due to the way files are read to members of the ‘intertown_distance’ structure. Luckily, the process to take one line of the file and split it into three variables uses only one line of code, so this can quickly and easily be changed.
4.1. The Test file
The provided file contained three pairs of cities for which it was required to find the shortest path and, hence, the shortest distance between them. My program, running in debug mode, returned the following.
Displayed are the three paths returned by my program, followed by the ‘intertown_distance’structure generated from the file of test pairs, with the shortest distance between them in km generated by my program added beside each pair. These results were corroborated by manually summing the distances between the links as given by the original file of edges.
4.2. Incomplete file lines
Several incomplete lines were appended to the file of edges to test how the program would handle erroneous input. The lines were the following and the consequence of this change is also shown.
\n Leeds \t Sheffield \t 53 \n Leeds \t Oxford \n
The last three warnings that a distance of ‘0’ has been read are from reading in the file of test-pairs, and this is expected. The preceding four lines are due to the erroneous file input. It seems that York to Oxford was read as ‘0’ because of the line which featured only York. The second city in the preceding line was Oxford and was not overwritten. The distance of the preceding line was not copied as it was reset to ‘0’ in the function, as was also true for the line with two cities and a missing distance. Fully repeating an edge produced no strange behaviour.
The program was run 1000 times, with each time running the whole main function described in this report. After one thousand samples, the following output was achieved. The total time to run the main function 1000 times was 70 seconds, with the average time taken for the main to run once being 0.07 seconds.
5. Reference Material
”Dijsktra’s algorithm”, GeeksforGeeks, 2018. [Online]. Available: https://www.geeksforgeeks.org/greedy-algorithms-set-6-dijkstras-shortest-path-algorithm/. [Accessed: 19- Feb- 2018].
”Dijkstra’s algorithm – Rosetta Code”, Rosettacode.org, 2018. [Online]. Available: https://rosettacode.org/wiki/Dijkstra%27s_algorithm. [Accessed: 19- Feb- 2018].
J. Robinson, “Algorithms: Week 9, Heaps and the Assessment”, University of York B/B/006, 2018.
”Big-O Algorithm Complexity Cheat Sheet (Know Thy Complexities!) @ericdrowell”, Bigocheatsheet.com, 2018. [Online]. Available: http://bigocheatsheet.com/. [Accessed: 19- Feb- 2018].