Thought on longest Substring Without Repeating Characters problem
Problem statement:
Given a string, find the longest substring without any repeating characters.
For example, the input string is "abcabaa", then the longest substring is either "abc" or "bca" or "cab".
Analysis:
If we consider the string as an array of characters, one of the constraint of this problem is the substring -> it means the continuous sub-array of the array.
Writing this into sub-problem, we are looking for pair of indices i,j in [0,n) where n is the length of the string, so that two conditions are satisfied:
- there is no repeating character within indices i,j
- (j - i + 1) is maximum (i.e. this is the length of the sub-string, assuming i <= j)
Solution:
Brute force:
With these two conditions above, what we can do is:
- as it is the sub-string, we can get all possible substring of the string
- for each of the sub-string, find if it has repeating characters. In case it does not, get its length and compare with the current maximum value. If it is bigger than the current maximum value, set the current maximum value to its length.
Note: code is written in python.
This brute force solution works fine, however, its Big(O) is too big. Let examine its complexity.
We can easily see that the outer function has complexity of n ^ 2 (where n is the length of the input string), and the inner function has complexity of n. So overall complexity is n^3, which is too high.
Let's analyze to see if we can optimize the solution further.
Taking closer look into this problem, how are we going to find out the maximum substring without duplication? We can see that there are two boundaries: the first character of the input string, and the last character of the input string. Assuming reading the character from left to right, the longest possible substring will always start at the first character.
What we can do is to read the input string from left to right, at any particular character, there are two cases:
- this character is not present in current longest substring, that's good. We will append this character into the current longest substring and continue to next character.
- this character is present in the current longest substring, so what we have to do is to remove the prefix of the current longest substring up to the previous occurrence of this character. Then we append this character to the updated longest substring and continue to the next character.
With this approach, it is guaranteed to have the maximum substring that satisfies the non-duplicate condition.
However, one question is pop-up in the algorithm: how to know if the character is present in the current longest substring and if so, where is it?
One simplest way to do is to track the set of characters of the substring using a separate hash map. This set data structure allows us to insert element in O(1) time, as well as to check its existence in O(1) time.
Optimized solution:
In this algorithm, I am using the map_char_to_index hash map to track the character to its index, and it allows us to:
- quickly check if the character is present
- in case it is present, what is its index so that allow us to rebuild the map_char_to_index object
With this optimized solution, we can see that for any character inside of the input string, the algorithm will examine it twice, so the algorithm complexity is O(n).
Comments
Post a Comment