# Suffix Tree using Ukkonen's algorithm

Construct suffix tree using Ukkonen’s algorithm which is linear time algorithm
https://github.com/mission-peace/interview/blob/master/src/com/interview/suffixprefix/SuffixTree.java
https://github.com/mission-peace/interview/wiki

1. David Valentino on June 2, 2021 at 8:08 am

im dead serious, there is a hole with this method.
try abccbabbabbc\$

the hole is at finger 11. there is one skipped edge.. it happens because u change pivot and length every jump to new node

im dead serious. i tried to implement your method to programming for like 10 days. then i found your hole..

but i have another method which is 100% correct to 40+ cases

2. Ярослав Пос on June 2, 2021 at 8:15 am

Лайк, если прорал все лекции по Дискрану, но не хочешь отчисляться

3. Mango23 on June 2, 2021 at 8:16 am

implementing this algorithm using active edges to decide your branch direction when the suffix link is to the root is not fool proof. its better to rely on naive traversing and skip/counting to get to your position.

4. Essence on June 2, 2021 at 8:18 am

Can you please do a similar video with McCreight’s and Weiner’s Algorithms?

5. Joel Andersson on June 2, 2021 at 8:18 am

Tushar Roy, saving my sorry ass once more! This guy is the MVP of IT on youtube!

6. Рамазан Часыгов on June 2, 2021 at 8:18 am

why at 1.03.50 start: 1 and end: 0 for root?

7. Akshat Jain on June 2, 2021 at 8:19 am

Hi, Tushar !!
You have explained really well !!
Your code explains suffix tree for a single string, can you please share a short lesson on a generalized suffix tree involving multiple strings. Thanks.

8. Mohit Saini on June 2, 2021 at 8:19 am

Hi Tushar, Your video is very very good. I was following it really well until you introduced the concept of "active nodes", "active edges". It messed up a lot of things, and it was very sudden jump afterwards. The "active nodes" etc. are actually implementation details. They should not be part of whiteboard algorithm.. Another thing, you jumped from O(M^3) to O(M) directly.. It would be better if you could make it O(M^2) first and then slowly optimise it up to O(M).

I propose following changes:

The explanation before introduction of "active nodes" is very very good. Now after that, consider a O(M^2) optimisation:

Let Tree T is suffix tree for string S.
Now let’s consider the changes in T after adding char c. (i.e. suffix tree of string S + c)

1. When constructing suffix tree T of string S, let’s also maintain the list of internal exact-node-ptr where last char was either added or was pointed to.. (i.e. either via Rule2 or Rule3)
Note that "exact-node-ptr" points to the actual char on the edge label. which is can be tracked by (node + offset).
Note that we are only considering internal exact-node-ptr because the addition of new-char on leaves is automatically covered because of ‘End’ marker used in their range.

2. In the next iteration, i.e. when adding a new char c, we know the list of exact-node-ptr where this new char needs to be added (Rule2) or ensured if it’s already present (Rule3)… While doing so, let’s also track the list of exact-node-ptr for next iterations…
Note that if we added a new node for the new char (Rule2), then we don’t need to track it since it will be leaf node. We are only tracking the internal exact-node-ptr..
Also note that root-node is always part of this list…

This makes the algorithm O(M^2).. because on each phase we are doing at max O(M) work.

Now let’s improve it for O(M)..

Observe that at each phase, we are maintaining a list of size M. To store this list in O(1) space, let’s consider a new concept called "Suffix Link"… Suffix link enables us to maintain this list as linked list and we only need to store the head-ptr of this linked list. We need to store the suffix link for each node. By storing the suffix-link for each intermediate node, we can calculate it for each exact-node-ptr. Because suffix-link of a exact-node-ptr = suffix-link of containing node + offset.

Also note that we don’t need to iterate on the entire linked list on each phase if the addition of new char turns out to be Rule3.. because we can guarantee that addition of this new char will be Rule3 on remaining exact-node-ptr as well…. and to prepare the list of exact-node-ptr for the next stage , we don’t need to do anything apart from adding 1 in the offset of linked list head… Which would mean offset of all the remaining items of current linked list is also added by 1 automatically… Check how we defined "next" function of the linked list..

That would mean, we are only iterating the LinkedList when we need to actually add new child on all the exact-node-ptr from previous linked list…

Note that total ‘child-addition’ steps are bounded by O(M) and total phases are bounded by M. Hence total time complexity of the algorithm will be bounded by O(M)…

9. Utkarsh Devgan on June 2, 2021 at 8:20 am

15:13 LENGHT IS WRONG !!!

10. Ranjith on June 2, 2021 at 8:21 am

11. Fergus Hewson on June 2, 2021 at 8:21 am

Chea Uce this is a great video. I love the time you put into making this 🙂 🙂 much appreciated!

12. Shrey Jain on June 2, 2021 at 8:23 am

At 20:23 the extension of path b should start from node C. And travelling on suffix link we then land up on node D.

13. prabhat kumar on June 2, 2021 at 8:23 am

Excellent and a very detailed explanation. Programmers, I would suggest to read this article with this video: https://www.geeksforgeeks.org/ukkonens-suffix-tree-construction-part-6/ (all 6 sets).
Article & Video compliment each other well.
@Tushar – Worth chopping this video in 4 videos – (1) Basic idea – 22:15 (2) Example 1 – 47:51 (3) Example 2 (4) Code explanation

14. haleyk10198 on June 2, 2021 at 8:24 am

thanks for the detailed example. I have yet only one doubt about the time complexity though: wouldn’t building "new" suffix links (walking down from the root) take O(m) each and makes it O(m^2) in total?

15. shom@ on June 2, 2021 at 8:26 am

Based Indian algorithm guy

16. prianka bhattacharjee on June 2, 2021 at 8:27 am

I haven’t seen him replying any comment good, bad or query. Anyway, Good work 🙂

17. David Valentino on June 2, 2021 at 8:28 am

at 59:12
the active node is actually L isnt it?
then the edge is 4 and the length is 1

then

folow L sufix link to root,
find from root where the value of edge 4 and the length is 1 which is "i" character

18. Megan Lee on June 2, 2021 at 8:29 am

Before watching this lecture. It’s recommended to watch this:
to see how we came to the idea of using suffix tree from suffix trie
and what r the applications for both data structures

19. Sai Nivedh Vallala on June 2, 2021 at 8:30 am

14:18 someone please clarify. How is node E a sublink of node D ?!

20. Meryem Arslan on June 2, 2021 at 8:32 am

How ca we modify the code you have provided us with to make that tree a generalized suffix tree ? I mean I would likte to store "xabxa#babxba" string in my tree and perform actions on it like longest palindrome search and longest common substring. Thanks.

21. Shubham Rajput on June 2, 2021 at 8:33 am

22. Andy on June 2, 2021 at 8:35 am

This is beautiful. Thank you very much for the best explanation of this algorithm on the internet. 🙂

23. Dương Đào Nguyên on June 2, 2021 at 8:36 am

thanks for the elegant and clear explanation

24. Hao H on June 2, 2021 at 8:37 am

The tricky part is the usage of index, reference and suffix link (transition), which help make it O(n), 2*n actually. Thank you very much.

25. balaji baskaran on June 2, 2021 at 8:39 am

Amazing! Thanks for all your efforts!

26. Verdusk on June 2, 2021 at 8:39 am

Thanks for the great explanation, but I have a question.

On 21:00 , shouldn’t you traverse the link on D to E, rather than B to root, considering D happens to already has a suffix link? It’ll make the amount of edges you have to traverse less.

27. Venky Rao on June 2, 2021 at 8:40 am

Thanks Tushar! Great explanation making the algorithm understanding easy

28. Dheer Banker on June 2, 2021 at 8:41 am

Great job, Tushar Roy! I spent many tough hours not understanding the algorithm and then, I finally found this brilliant video! Two examples made it really really clear! POY man, keep up the great work!

29. GIOIA on June 2, 2021 at 8:43 am

Amazing, thank you!

30. RAGHAV ATREYA on June 2, 2021 at 8:45 am

Hey Tushar will you make a video on panda and xor on hackerearth (https://www.hackerearth.com/problem/algorithm/panda-and-xor/).

31. Hadrhune0 on June 2, 2021 at 8:45 am

Really cool video, helped me recap and seeing the algorithm through its various iterations.

32. ENTERTAINMENT on June 2, 2021 at 8:51 am

I don’t understand Java Code . Do you have C++ code ?

33. Abhishek Tiwari on June 2, 2021 at 8:52 am

Great explanation ….
One confusion… At 31:20 .. you told that after creating internal node .. we increase active edge by 1 and decrease active len by 1 and then traversed the tree… While at 52:10 .. you just decreased active len by 1 but did not increase active edge by 1 before traversing … Why ?

34. Marvel Sidawhiz on June 2, 2021 at 8:55 am

35. Grigor Penev on June 2, 2021 at 8:55 am

I have a question. When you search for the existance of a suffix starting with the current character, how do you do it in O(1) ? Because, the way I see it, you can have an almost arbitrary number of paths that you can take.

36. 666 666 on June 2, 2021 at 8:55 am

I thought something wrong in the demo code

37. Hyoseung Kang on June 2, 2021 at 8:56 am

Finally found a tutorial which can help.

38. Anurag Srivastava on June 2, 2021 at 8:56 am

Thank you very very much for your videos. You are such a Gem 🙂

39. bc10000 on June 2, 2021 at 8:57 am

your voice is annoying as heck, but thanks for the info

40. Essence on June 2, 2021 at 8:59 am

Thank you soooo much! You are an amazing human being for spending all this time to make this lengthy video to help others! I have no words! Thank you thank you thank you and I wish the best to you and your family! You deserve everything you wish for! Thank you!

41. Anand Kumar on June 2, 2021 at 8:59 am

At 40:35 while creating node Q you told when leaf node is created then remaining is reduced by one then why that has not been done

42. arman nejahi on June 2, 2021 at 9:02 am

Thanks.

43. Samson Reddy on June 2, 2021 at 9:02 am

Thank you Brother .Video is so much helpful to me.

44. David Cox on June 2, 2021 at 9:02 am

Great video. Question, you label the edge with X with the compression [0,0] and the edge with \$ [4,4]. If I were to then search the tree for X\$, how would I know that it is actually located at [3..4] when the only information is that X is at 0 and \$ is at 4?

45. Ketan Sahu on June 2, 2021 at 9:02 am

The only source that explained me suffix tree in simple English and good examples. Thanks

46. Darshit Thesiya on June 2, 2021 at 9:03 am

Great explanation. Thanks for the video.

47. Sub Ta on June 2, 2021 at 9:04 am

FIT3155 gang wya

48. Muntasir billah on June 2, 2021 at 9:06 am

would u plz explain how to make suffix array from this suffix tree u made

49. Leiling Tao on June 2, 2021 at 9:06 am

Hi thank you for your video. I have a question regarding the rule of active points.You mentioned that "whenever active length 0, you check from the root to see if there’s an edge pointing to the next character’. However, can there be circumstances where active node is not root while active length is also 0? for example, at 16:00 if you had "yz" instead of "xyz" after the "\$", the active node becomes B, and the active length becomes 0 (after checking y and jumping node), and then we just check whether there is an edge leading to z? Please correct me if I am wrong but I am quite confused at this point. Thanks a lot for your video again.

50. Dharmendra Parmar on June 2, 2021 at 9:07 am

thank you tushar sir, finally i got this algorith..