Suffix Sorting & Related Algorithmics
Martin Farach-Colton
Rutgers University, USA


What is Suffix Sorting?
Given a string, produce the sorted order of its suffixes.

Why would we want that?
More on this later.


Example: Mississippi$

How fast can we sort?
Sorting the suffixes of a string can be no faster than sorting the characters of the string.
Can we match the sorting lower bound?

Suffix Sorting
We are sorting strings, so Radix Sort is natural.
This yields an algorithm with running time between O(n^2) and O(n^2 log n), depending on assumptions about the sortability of characters:
O(n^2 log n) in the comparison model.
O(n^2) for small integers.
In between in the general word model.
We can do better by combining Merge Sort with Radix Sort.
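As a baseline, the problem statement itself can be sketched with a plain comparison sort over the suffixes. This is not the radix-based method the slides build toward; it is only a reference for what "sorted order of suffixes" means. The function name naive_suffix_order, the 1-based positions, and the use of Python's default character order (where '$' sorts before letters) are assumptions made for this illustration.

```python
def naive_suffix_order(text):
    """Return the 1-based start positions of the suffixes of `text`, sorted.

    Each comparison may scan O(n) characters, so this is the slow
    comparison-model baseline, not a linear-time algorithm.
    """
    positions = range(1, len(text) + 1)
    # Sort positions by the suffix that starts there.
    return sorted(positions, key=lambda p: text[p - 1:])


if __name__ == "__main__":
    word = "mississippi$"
    for p in naive_suffix_order(word):
        print(p, word[p - 1:])
```

Any faster algorithm can be checked against this baseline on small inputs.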

Building Blocks
Range Reduction
Radix Step
Chunking

Range Reduction
Observation: If we apply a strictly monotone function to the characters, the sorted order doesn’t change.
Example: Mississippi$
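One monotone function is the map that sends each character to its rank among the distinct characters. The sketch below applies it to the example string and checks that the suffix order is unchanged. The names range_reduce, suffix_order, and dollar_last are hypothetical, and the choice to order '$' after every letter is an assumption made so that the output matches the digit string used in the later examples.

```python
def range_reduce(s, key=None):
    """Replace every character by its 1-based rank among the distinct characters."""
    alphabet = sorted(set(s), key=key)
    rank = {c: r for r, c in enumerate(alphabet, start=1)}
    return [rank[c] for c in s]


def suffix_order(seq):
    """Naive reference: 1-based start positions of the suffixes, sorted."""
    return sorted(range(1, len(seq) + 1), key=lambda p: list(seq[p - 1:]))


def dollar_last(c):
    """Assumed character order: letters alphabetically, '$' after all of them."""
    return (c == "$", c)


if __name__ == "__main__":
    text = "mississippi$"
    reduced = range_reduce(text, key=dollar_last)
    print(reduced)                # ranks in place of characters
    print(suffix_order(reduced))  # same order as the suffixes of `text`
```

With this ordering, "mississippi$" becomes 2 1 4 4 1 4 4 1 3 3 1 5, that is, 214414413315.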

Range Reduction
Our only range reduction operation will be: replace every character by its rank in the sorted order of characters.
After RR, a length-n string will be in [n]^n.

RR helps running time
Radix Sort on raw input might take O(n^2 log n) time.
RR takes at most O(n log n) time.
Radix Sort on small-integer inputs takes O(n^2) time.
Total time is O(n^2).

Radix Step
Recall that Radix Sort proceeds in steps:
Lexicographically sort the last i characters of each string.
Stably sort by preceding character.
Now strings are lexicographically sorted by last i+1 characters.
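The Radix Step recalled above can be sketched for a list of equal-length strings. The names radix_step and radix_sort are hypothetical, and the sketch assumes all strings have the same length. It uses a dictionary of buckets and sorts the distinct characters; on range-reduced input an array of buckets indexed by rank would avoid that extra sort.

```python
def radix_step(strings, order, i):
    """One Radix Step.

    `order` lists indices into `strings`, sorted by the last i characters.
    Returns the indices sorted by the last i+1 characters.
    """
    col = len(strings[0]) - 1 - i  # column of the character preceding the last i
    buckets = {}
    for idx in order:              # scanning in the current order keeps the sort stable
        buckets.setdefault(strings[idx][col], []).append(idx)
    return [idx for c in sorted(buckets) for idx in buckets[c]]


def radix_sort(strings):
    """Sort equal-length strings by repeating the Radix Step once per column."""
    if not strings:
        return []
    order = list(range(len(strings)))
    for i in range(len(strings[0])):
        order = radix_step(strings, order, i)
    return [strings[k] for k in order]


if __name__ == "__main__":
    print(radix_sort(["cab", "abc", "bca", "aab", "bab"]))
```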

Using Radix Step
Suppose you have some set of suffixes sorted, say the suffixes in odd positions.
One Radix Step backs up each suffix by one character.
This gives us the sorted order of the even suffixes.
Example: 214414413315, odd suffixes
Example: 214414413315, even suffixes
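The step from sorted odd suffixes to sorted even suffixes can be sketched as follows. The function name even_from_odd is hypothetical. The sketch assumes a non-empty, range-reduced input (a list of small non-negative integers) and 1-based positions, and it treats the empty suffix after the last character as the smallest tail.

```python
def even_from_odd(s, odd_sorted):
    """Derive the sorted order of the even suffixes from the sorted odd suffixes.

    The suffix at even position p is the character s_p followed by the odd
    suffix at p+1, so the even suffixes are already sorted by everything after
    their first character; one stable pass on that character finishes the job.
    """
    n = len(s)
    # Even positions ordered by their tail. If n is even, position n has an
    # empty tail, which sorts before every non-empty one.
    by_tail = [n] if n % 2 == 0 else []
    by_tail += [p - 1 for p in odd_sorted if p > 1]
    # Stable bucket pass on the first character (linear for small integers).
    buckets = [[] for _ in range(max(s) + 1)]
    for p in by_tail:
        buckets[s[p - 1]].append(p)
    return [p for bucket in buckets for p in bucket]


if __name__ == "__main__":
    s = [2, 1, 4, 4, 1, 4, 4, 1, 3, 3, 1, 5]
    odd_sorted = [5, 11, 1, 9, 7, 3]  # odd suffixes of s in sorted order
    print(even_from_odd(s, odd_sorted))
```

On the example 214414413315 the odd suffixes sort as 5, 11, 1, 9, 7, 3, and the pass above yields the even suffixes in the order 8, 2, 10, 4, 6, 12.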

Radix Step
Normal Radix Step: every string gets a little more sorted.
Our use of Radix Step: the order of one set of suffixes is determined from the order of another set of suffixes.

Where are we now?
Step 1: Recursively sort odd suffixes. How? And how is it recursive? A recursive step must sort every suffix! We’ll get to that.
Step 2: Sort even suffixes in linear time, by Radix Step.
Step 3: Merge!

Merging is tricky
F ‘97 gave the first linear-time solution. This yields an optimal suffix sorting routine.
It’s a fun algorithm but highly unintuitive. Or so I’ve been told!

Chunking
Let’s solve the recursion problem first.
Given two integers i and j, let ⟨i,j⟩ be their bit concatenation.
If i, j ∈ [n], then ⟨i,j⟩ ∈ [n^2].
Given a string S = (s_1, s_2, ..., s_n), let S’ = (⟨s_1,s_2⟩, ⟨s_3,s_4⟩, ..., ⟨s_{n-1},s_n⟩).

Chunking + Recursion: I
Observation: The order of the odd suffixes of S = (s_1, s_2, ..., s_n) is the same as the order of all suffixes of S’ = (⟨s_1,s_2⟩, ⟨s_3,s_4⟩, ..., ⟨s_{n-1},s_n⟩), since bit concatenation preserves lexicographic ordering.
Example: 214414413315 (figure labels: base 8, 2x-1)

Chunking + Recursion: II
Chunking + Range Reduction = Recursion
Input is in [n]^n.
Chunked input is in [n^2]^(n/2).
Range-reduced chunked input is in [n/2]^(n/2).
So now the problem instance is half the size and we can recurse.
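Chunking followed by range reduction can be sketched on the running example. The names chunk and range_reduce are hypothetical. The sketch assumes an even-length input and represents each chunk as a Python tuple rather than a bit concatenation; tuples of fixed-width integers compare the same way, so the ordering argument is unaffected.

```python
def chunk(s):
    """Pair up consecutive characters: (s_1,s_2), (s_3,s_4), ... (even length assumed)."""
    return [(s[i], s[i + 1]) for i in range(0, len(s), 2)]


def range_reduce(seq):
    """Replace every symbol by its 1-based rank among the distinct symbols."""
    rank = {c: r for r, c in enumerate(sorted(set(seq)), start=1)}
    return [rank[c] for c in seq]


def suffix_order(seq):
    """Naive stand-in for the recursive call: 1-based suffix positions, sorted."""
    return sorted(range(1, len(seq) + 1), key=lambda p: seq[p - 1:])


if __name__ == "__main__":
    s = [2, 1, 4, 4, 1, 4, 4, 1, 3, 3, 1, 5]
    s_prime = range_reduce(chunk(s))
    print(s_prime)
    # Suffix x of S' corresponds to the odd suffix 2x-1 of S.
    print([2 * x - 1 for x in suffix_order(s_prime)])
```

For 214414413315 the chunked and range-reduced string is 3 6 1 5 4 2, half the length of the input, and mapping its suffix order back through 2x-1 gives the odd suffixes of S in the order 5, 11, 1, 9, 7, 3.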


Example: 214414413315

Recall Basic Operations
Range Reduction
Radix Step
Chunking
Now we are ready for the whole algorithm.

Suffix Sorting
Step 1: Chunk + Range Reduction. Recurse on the new string. Get the sorted order of the odd suffixes.
Step 2: Radix Step. (Not a 2nd recursion!) Get the sorted order of the even suffixes.
Step 3: Merge! We still don’t know how to do this.

The Trouble with Merging
We know how the odd suffixes compare.
We know how the even suffixes compare.
No idea how odd and even compare!

The difference between 3 and 2
It’s possible to merge the lists, by the “unintuitive” algorithm.
But Kärkkäinen & Sanders showed the elegant way to merge. From Plato/Erdős’ Book of Proofs.
They modified the recursion to make the merge easy.

Mod 3 Recursion
Given a string S = (s_1, s_2, ..., s_n):
Let S_1 = (⟨s_1,s_2,s_3⟩, ⟨s_4,s_5,s_6⟩, ..., ⟨s_{n-2},s_{n-1},s_n⟩).
Let S_2 = (⟨s_2,s_3,s_4⟩, ⟨s_5,s_6,s_7⟩, ..., ⟨s_{n-1},s_n,$⟩).
Let O_12 be the order of the suffixes at positions congruent to 1 or 2 mod 3.
You get this recursively from sorting the suffixes of S_1 S_2.
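The construction of S_1 S_2 and the recovery of O_12 can be sketched as below. This covers only the recursion step described above, not the merge. The names triples, range_reduce, and order_12 are hypothetical. The sketch assumes that the last character of S is unique (a terminator), pads incomplete triples with 0 in place of $, and uses a naive sort as a stand-in for the recursive call.

```python
def triples(s, start):
    """Chunk s[start:] into triples, padding the last one with 0 (assumed pad symbol)."""
    t = list(s[start:])
    t += [0] * (-len(t) % 3)
    return [tuple(t[i:i + 3]) for i in range(0, len(t), 3)]


def range_reduce(seq):
    """Replace every symbol by its 1-based rank among the distinct symbols."""
    rank = {c: r for r, c in enumerate(sorted(set(seq)), start=1)}
    return [rank[c] for c in seq]


def order_12(s):
    """Sorted order of the suffixes at 1-based positions congruent to 1 or 2 mod 3.

    Assumes the last character of s occurs nowhere else, so no comparison
    ever needs to look past the end of a true suffix.
    """
    s1 = triples(s, 0)  # triples starting at positions 1, 4, 7, ...
    s2 = triples(s, 1)  # triples starting at positions 2, 5, 8, ...
    reduced = range_reduce(s1 + s2)
    # Naive sort here; the real algorithm would recurse on `reduced`.
    idx = sorted(range(len(reduced)), key=lambda k: reduced[k:])
    m = len(s1)
    return [3 * k + 1 if k < m else 3 * (k - m) + 2 for k in idx]


if __name__ == "__main__":
    print(order_12([2, 1, 4, 4, 1, 4, 4, 1, 3, 3, 1, 5]))
```

On 214414413315 the reduced string S_1 S_2 is 4 7 6 5 2 2 1 3, and the resulting O_12 is 8, 5, 2, 11, 1, 10, 7, 4.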

Suffix Tree Optimality
For small integers, construction is already O(n), so this is optimal, even for Scrambled Suffix Trees.
In the comparison model, suffix trees have a lower bound from element uniqueness (depends on degree of root), so we have an optimal algorithm.
For large integers (word model of computation), the lower bound is linear and the upper bound is super-linear.

Open Problem
Close the gap in the time for building a large-alphabet suffix tree when child order is irrelevant.

Related to Deterministic Hashing
Open Problem: Given n large integers, can you map them to small integers (poly n) in linear time in the word model?
