# Suffix Tree using Ukkonen's algorithm

Suffix Tree using Ukkonen's algorithm

Construct suffix tree using Ukkonen’s algorithm which is linear time algorithm

https://www.facebook.com/tusharroy25

https://github.com/mission-peace/interview/blob/master/src/com/interview/suffixprefix/SuffixTree.java

https://github.com/mission-peace/interview/wiki

im dead serious, there is a hole with this method.

try abccbabbabbc$

the hole is at finger 11. there is one skipped edge.. it happens because u change pivot and length every jump to new node

im dead serious. i tried to implement your method to programming for like 10 days. then i found your hole..

but i have another method which is 100% correct to 40+ cases

Лайк, если прорал все лекции по Дискрану, но не хочешь отчисляться

implementing this algorithm using active edges to decide your branch direction when the suffix link is to the root is not fool proof. its better to rely on naive traversing and skip/counting to get to your position.

Can you please do a similar video with McCreight’s and Weiner’s Algorithms?

Tushar Roy, saving my sorry ass once more! This guy is the MVP of IT on youtube!

why at 1.03.50 start: 1 and end: 0 for root?

Hi, Tushar !!

You have explained really well !!

Your code explains suffix tree for a single string, can you please share a short lesson on a generalized suffix tree involving multiple strings. Thanks.

Hi Tushar, Your video is very very good. I was following it really well until you introduced the concept of "active nodes", "active edges". It messed up a lot of things, and it was very sudden jump afterwards. The "active nodes" etc. are actually implementation details. They should not be part of whiteboard algorithm.. Another thing, you jumped from O(M^3) to O(M) directly.. It would be better if you could make it O(M^2) first and then slowly optimise it up to O(M).

I propose following changes:

The explanation before introduction of "active nodes" is very very good. Now after that, consider a O(M^2) optimisation:

Let Tree T is suffix tree for string S.

Now let’s consider the changes in T after adding char c. (i.e. suffix tree of string S + c)

1. When constructing suffix tree T of string S, let’s also maintain the list of internal exact-node-ptr where last char was either added or was pointed to.. (i.e. either via Rule2 or Rule3)

Note that "exact-node-ptr" points to the actual char on the edge label. which is can be tracked by (node + offset).

Note that we are only considering internal exact-node-ptr because the addition of new-char on leaves is automatically covered because of ‘End’ marker used in their range.

2. In the next iteration, i.e. when adding a new char c, we know the list of exact-node-ptr where this new char needs to be added (Rule2) or ensured if it’s already present (Rule3)… While doing so, let’s also track the list of exact-node-ptr for next iterations…

Note that if we added a new node for the new char (Rule2), then we don’t need to track it since it will be leaf node. We are only tracking the internal exact-node-ptr..

Also note that root-node is always part of this list…

This makes the algorithm O(M^2).. because on each phase we are doing at max O(M) work.

Now let’s improve it for O(M)..

Observe that at each phase, we are maintaining a list of size M. To store this list in O(1) space, let’s consider a new concept called "Suffix Link"… Suffix link enables us to maintain this list as linked list and we only need to store the head-ptr of this linked list. We need to store the suffix link for each node. By storing the suffix-link for each intermediate node, we can calculate it for each exact-node-ptr. Because suffix-link of a exact-node-ptr = suffix-link of containing node + offset.

Also note that we don’t need to iterate on the entire linked list on each phase if the addition of new char turns out to be Rule3.. because we can guarantee that addition of this new char will be Rule3 on remaining exact-node-ptr as well…. and to prepare the list of exact-node-ptr for the next stage , we don’t need to do anything apart from adding 1 in the offset of linked list head… Which would mean offset of all the remaining items of current linked list is also added by 1 automatically… Check how we defined "next" function of the linked list..

LinkedList_Next_Func(exact-node-ptr) = SuffixLink(exact-node-ptr) = SuffixLink(node + offset) = SuffixLink(node) + offset

That would mean, we are only iterating the LinkedList when we need to actually add new child on all the exact-node-ptr from previous linked list…

Note that total ‘child-addition’ steps are bounded by O(M) and total phases are bounded by M. Hence total time complexity of the algorithm will be bounded by O(M)…

15:13 LENGHT IS WRONG !!!

https://stackoverflow.com/questions/9452701/ukkonens-suffix-tree-algorithm-in-plain-english#

explanation is epic

Chea Uce this is a great video. I love the time you put into making this 🙂 🙂 much appreciated!

At 20:23 the extension of path b should start from node C. And travelling on suffix link we then land up on node D.

Excellent and a very detailed explanation. Programmers, I would suggest to read this article with this video: https://www.geeksforgeeks.org/ukkonens-suffix-tree-construction-part-6/ (all 6 sets).

Article & Video compliment each other well.

@Tushar – Worth chopping this video in 4 videos – (1) Basic idea – 22:15 (2) Example 1 – 47:51 (3) Example 2 (4) Code explanation

thanks for the detailed example. I have yet only one doubt about the time complexity though: wouldn’t building "new" suffix links (walking down from the root) take O(m) each and makes it O(m^2) in total?

Based Indian algorithm guy

I haven’t seen him replying any comment good, bad or query. Anyway, Good work 🙂

great video. but i really have something to ask. would you please answer.

at 59:12

the active node is actually L isnt it?

then the edge is 4 and the length is 1

then

folow L sufix link to root,

find from root where the value of edge 4 and the length is 1 which is "i" character

please answer this one please

Before watching this lecture. It’s recommended to watch this:

https://www.youtube.com/watch?v=hLsrPsFHPcQ&t=653s

to see how we came to the idea of using suffix tree from suffix trie

and what r the applications for both data structures

14:18 someone please clarify. How is node E a sublink of node D ?!

How ca we modify the code you have provided us with to make that tree a generalized suffix tree ? I mean I would likte to store "xabxa#babxba" string in my tree and perform actions on it like longest palindrome search and longest common substring. Thanks.

You recommend to read about before starting this, can you please share some recommendations for such readings. Thanks!

This is beautiful. Thank you very much for the best explanation of this algorithm on the internet. 🙂

thanks for the elegant and clear explanation

The tricky part is the usage of index, reference and suffix link (transition), which help make it O(n), 2*n actually. Thank you very much.

Amazing! Thanks for all your efforts!

Thanks for the great explanation, but I have a question.

On 21:00 , shouldn’t you traverse the link on D to E, rather than B to root, considering D happens to already has a suffix link? It’ll make the amount of edges you have to traverse less.

Thanks Tushar! Great explanation making the algorithm understanding easy

Great job, Tushar Roy! I spent many tough hours not understanding the algorithm and then, I finally found this brilliant video! Two examples made it really really clear! POY man, keep up the great work!

Amazing, thank you!

Hey Tushar will you make a video on panda and xor on hackerearth (https://www.hackerearth.com/problem/algorithm/panda-and-xor/).

Really cool video, helped me recap and seeing the algorithm through its various iterations.

I don’t understand Java Code . Do you have C++ code ?

Great explanation ….

One confusion… At 31:20 .. you told that after creating internal node .. we increase active edge by 1 and decrease active len by 1 and then traversed the tree… While at 52:10 .. you just decreased active len by 1 but did not increase active edge by 1 before traversing … Why ?

https://github.com/Sidawhiz/Suffix-Tree Implementation in java

I have a question. When you search for the existance of a suffix starting with the current character, how do you do it in O(1) ? Because, the way I see it, you can have an almost arbitrary number of paths that you can take.

I thought something wrong in the demo code

Finally found a tutorial which can help.

Thank you very very much for your videos. You are such a Gem 🙂

your voice is annoying as heck, but thanks for the info

Thank you soooo much! You are an amazing human being for spending all this time to make this lengthy video to help others! I have no words! Thank you thank you thank you and I wish the best to you and your family! You deserve everything you wish for! Thank you!

At 40:35 while creating node Q you told when leaf node is created then remaining is reduced by one then why that has not been done

Thanks.

Thank you Brother .Video is so much helpful to me.

Great video. Question, you label the edge with X with the compression [0,0] and the edge with $ [4,4]. If I were to then search the tree for X$, how would I know that it is actually located at [3..4] when the only information is that X is at 0 and $ is at 4?

The only source that explained me suffix tree in simple English and good examples. Thanks

Great explanation. Thanks for the video.

FIT3155 gang wya

would u plz explain how to make suffix array from this suffix tree u made

Hi thank you for your video. I have a question regarding the rule of active points.You mentioned that "whenever active length 0, you check from the root to see if there’s an edge pointing to the next character’. However, can there be circumstances where active node is not root while active length is also 0? for example, at 16:00 if you had "yz" instead of "xyz" after the "$", the active node becomes B, and the active length becomes 0 (after checking y and jumping node), and then we just check whether there is an edge leading to z? Please correct me if I am wrong but I am quite confused at this point. Thanks a lot for your video again.

thank you tushar sir, finally i got this algorith..