This is a Question of the Design and Analysis of Algorithms

This is a Question of the Design and Analysis of Algorithms class, please help

Here\'s a beautiful question I found in an algorithms textbook. It has a very simple solution, but you have to be a little bit creative: You\'re helping a group of ethnographers analyze some oral history data they?ve collected by interviewing members of a village to learn about the lives of people who?ve lived there over the past two hundred years. From these interviews, they?ve learned about a set of n people (all of them now deceased), whom we?ll denote P1, P2,. . . , P. They?ve also collected facts about when these people lived relative to one another. Each fact has one of the following two forms: . For some i and j, person P died before person P was born; or . For some i and j, the life spans of P and P overlapped at least partially. Naturally, they?re not sure that all these facts are correct; memories are not so good, and a lot of this was passed down by word of mouth. So what they?d like you to determine is whether the data they?ve collected is at least internally consistent, in the sense that there could have existed a set of people for which all the facts they?ve learned hold simultaneously. Give an efficient algorithm to do this: either it should produce proposed dates of birth and death for each of the n people so that all the facts hold true, or it should report (correctly) that no such dates can exist?that is, the facts collected by the ethnographers are not internally consistent. To make the question a bit more precise, you are asked to develop an algorithm that runs in O(n + m log n) time, where n denotes the number of people and m denotes the number of facts collected about these people. Argue that your algorithm is correct and that its running time is indeed O(n + m logn). The method to use here is to construct a bipartite directed graph, where the nodes represent events births and deaths of the various n people (so there are 2n nodes). Draw edges from one node to another when we have a fact that states that the former precedes the latter. In particular P_i exists, and was born before they died, so (birth(P_i), death(P_i)) is an edge. if person P_i died before P_j was born, (death(P_i), birth(P_j)) is an edge. if the life spans of P_i and P_j overlapped at least partially, then both (birth(P_i), death(P_j)) and (birth(P_j), death(P_i)) are edges (if either\'s death preceded the other\'s birth, then they couldn\'t have coexisted) For any path a -> b -> c in the graph, we have date(a) < date(b) < date(c) => date(a) < date(c), so the edges are transitive. In particular this means that if the graph contains a cycle, a_1 -> a_2 -> a_3 -> ... -> a_m -> a_1, then we have date(a_1) < date(a_2) < date(a_3) < ... < date(a_m) < date(a_1) => date(a_1) < date(a_1). This is a contradiction in the data, and demonstrates that it is not internally consistent. If the graph does not contain a cycle, then it is a DAG, and therefore must have a topological ordering. Given this topological ordering, we can then assign a list of proposed births and deaths that is consistent with the data, by just scaling the ordering to dates within the 200 year interval. date[event_a] <- START_DATE + (200 yrs / 2*n)(topological-order[event_a]). So the problem reduces to simply attempting to find a topological ordering on the graph of events. If one is found, then we have a possible ordering of events, consistent with the data. If one is not found, it must be because the graph has a cycle (by Kleinberg and Tardos (3.20), p. 102), and therefore the data is inconsistent. ALGORITHM: proc create-node( event ) out-adj-list[event] <- empty-list in-degree[event] <- 0 endproc proc create-edge( start, end ) push( out-adj-list[ start ], end ) in-degree[ end ] <- in-degree[ end ] + 1 endproc for p \\in { P_i } create-node( birth[p] ) create-node( death[p] ) create-edge( birth[p], death[p] ) endfor for (p, q) \\in DiedBefore create-edge( death[p], birth[q] ) endfor for (p, q) \\in Coexisted create-edge( birth[p], death[q] ) create-edge( birth[q], death[p] ) endfor eligible-node-list <- empty-list for p \\in { P_i } if in-degree[ birth[p] ] = 0 push(eligible-node-list, birth[p]) endif endfor event-date-separation = 200yrs / (2*|{P_i}|) order <- 0 while eligible-node-list is non-empty event = pop(eligible-node-list) order <- order + 1 dates[ event ] = START_DATE + order * event-date-separation for following-event \\in out-adj-list[event] in-degree[ following-event ] <- in-degree[ following-event ] - 1 if in-degree[ following-event ] = 0 push( eligible-node-list, following-event ) endif endfor endwhile if order < 2*|{P_i}| return INCONSISTENT_DATA else return CONSISTENT_DATA, dates endif RUNNING TIME: Creating the graph, and initializing the eligible node list takes O( |{ P_i}| + |DiedBefore| + |Coexisted| ) time, linear in the size of our data. Generating the dates and checking for cycles takes at most 2*|{P_i}| iterations of the while loop (if the data is consistent), each of which takes O(1 + |out-adj-list[event]) time. Each edge appears in one out adjacency list, so that takes at most O( 2*|{P_i}| + |{P_i}| + |DiedBefore| + 2*|Coexisted|) time, for a total running time of O( |{ P_i}| + |DiedBefore| + |Coexisted| )

This is a Question of the Design and Analysis of Algorithms

Solution

Get Help Now

Submit a Take Down Notice