3. Longest Substring Without Repeating Characters

Description of the Problem

Given a string s, find the length of the longest substring without repeating characters.

Example 1:

Input: s = "abcabcbb"
Output: 3
Explanation: The answer is "abc", with the length of 3.

Example 2:

Input: s = "bbbbb"
Output: 1
Explanation: The answer is "b", with the length of 1.

Example 3:

Input: s = "pwwkew"
Output: 3
Explanation: The answer is "wke", with the length of 3.
Notice that the answer must be a substring, "pwke" is a subsequence and not a substring.

Constraints:

  • \(0 <= s.length <= 5 * 10^4\)
  • s consists of English letters, digits, symbols and spaces.

Solution - Sliding Window

Tags: Sliding Window

Explanation

Keep expanding our sliding window to find all substring without repeating character.

If new appended character is repeated, trim the current substring and append the new character.

Suppose before an iteration we have a substring without repeating character AcB where Capital letters represent may-be empty substrings and non-capital letter represents 1 character.

If we meet new character d which is different from c, it is fine. Just append to the substring to form a new

Otherwise, if c is the new character, then trim Ac and append c. New Substring becomes Bc which is a valid substring.

The above reasoning show that the algorithm must scan all longest possible valid substrings. The remaining work is to record the longest one.

Code (Rust)

impl Solution {
    pub fn length_of_longest_substring(s: String) -> i32 {
        // Sliding window for longest substring, j is exclusive
        let (mut longest_i, mut longest_j) : (usize, usize) = (0,0);

        // Sliding window for current substring, j is exclusive
        let (mut i, mut j) : (usize, usize)  = (0,0);

        for (k, c) in s.chars().enumerate() {

            // Find 1st index of char c in current substring (&s[i..j])
            // IF new appended character c appear in current substring,
            // i.e. substring has the form of AcBc where current substring consists of substring A, B and character c, 
            // THEN trim is, such that it becomes Bc
            if let Some(index) = (&s[i..j]).find(c){
                i += index + 1; 
            }

            j = k + 1; // the end of the window move right by 1

            // IF the current window size is larger, THEN update.
            if j - i > longest_j - longest_i {
                longest_i = i;
                longest_j = j;
            }
            
        }

        return (longest_j - longest_i) as i32;
    }
} 

Code (Java)

class Solution {
    public int lengthOfLongestSubstring(String s) {
        if ( s== null || s.isEmpty() ) return 0;

        int start = 0; int end =1;
        int max = 1;

        while( end < s.length() ){
            String subString = s.substring(start, end);
            int indexOfChar = (int) (subString.indexOf(s.charAt(end)));
            if (indexOfChar >= 0){
                start = start + indexOfChar + 1;
            }
            end++;
            max = max < end - start ? end - start : max;
        }

        return max;
    }
}

Complexity (\(n\) is length of the string)

Time complexity:**

  • Worst case: \( T(n) = \mathcal{\Theta}(n^2) \)
    • String contains no repeating characters
  • Average case: \( T(n) = O(n) \)
  • Best case: \( T(n) = \mathcal{\Theta}(n) \)
    • String repeats same character

Auxiliary Space:**

  • \( S(n) = O(1) \)
  • Depends on implementation, Rust can achieve \( O(1) \)

Further Discussion

Average Time Complexity for the algorithm

Lemma: The algorithm run in \( O(n+X) \) where n is length of the string and X is number of comparison.

Proof: We have n iterations and each iteration performs certain number of characters comparison to find first index of the same character.

Theorem: The expected number of comparisons is \( O(n) \)

Proof:

Assume \( M \) is the number of characters of the program choose to form a string and each character is equally likely to be chosen independently.

Let \( X_{ij} = \unicode{x1D7D9} \text{ { i-th character is compared to j-th character } } \)

Consider a sequence of characters \( c_i c_{i+1} c_{i+2} ... c_{j-1} \) and a character \( c_j \) in a loop, \( c_i \) and \(c_j\) compares each other if-and-only-if the sequence of characters does not contains repeating characters. Otherwise the substring will be trimmed by our algorithm before the current loop.

For instance, consider a substring abaabcd, 1st a never compare with 2nd b, c, d since there are repeating characters between these pairs of characters (1st a and 2nd b, 1st a and c, 1st a and d).

The probability of \( c_i c_{i+1} c_{i+2} ... c_{j-1} \) does not contain repeating characters, according to formula of birthday paradox, is \(\frac{M!}{M^{(j-i)} (M - (j - i))! }\)

The total number of comparison \(\mathbb{E}[X] = \mathbb{E} \left [ \sum_{i = 0}^{n - 2} \sum_{j = i+1}^{n - 1} X_{ij} \right ] = \sum_{i = 0}^{n - 2} \sum_{j = i+1}^{n - 1} \mathbb{E}[ X_{ij} ] \)

\(= \sum_{i = 0}^{n - 2} \sum_{j = i+1}^{n - 1} \frac{M!}{M^{(j-i)} (M - (j - i))! } \)

\(= \sum_{i = 0}^{n - 2} \sum_{ k = 1}^{n - 1 - i} \frac{M!}{M^{k} (M - k)! } \)

\( \le \sum_{i = 0}^{n - 2} \sum_{k = 1}^{M} \frac{M!}{M^{k} (M - k)! } \)

\(\because\) When \(n \ge M \), the chance of substring without repeating characters becomes zero

\( \le \sum_{i = 0}^{n - 2} \int_{0}^{M} e^{ - x^2 / 2M } dx \) \( \ \because \) Approximation of the "birthday formula"

\( = \sum_{i = 0}^{n - 2} \frac{\sqrt{M\pi}}{\sqrt{2}} \ erf(\sqrt{\frac{M}{2}}) \)

\( \approx (n-1) 12.21 \) \( \ \because \) In ASCII table, dec32 to dec126 are charcters can be type in keyboard

\( = O(n) \)

Interpretation: Seldom do two characters compare each other since when the distance between two characters becomes larger, the chance of the characters between them contains no repeating characters becomes smaller.

Corollary: The average running time of the algorithm is \(O(n)\)

Proof: Followed by the lemma and the theorem.