3 longest substring without repeating characters


title: Longest Substring Without Repeating Characters
tags:
- longest-substring-without-repeating-characters
- No.3
- medium
- string
- divide-conquer
- dynamic-programming
- queue


Problem

Given a string, find the length of the longest substring without repeating characters.

Examples:

Given "abcabcbb", the answer is "abc", which the length is 3.

Given "bbbbb", the answer is "b", with the length of 1.

Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring, "pwke" is a subsequence and not a substring.

Corner Cases

  • ""
  • "a"
  • "ab"
  • "xxxcabcxy"

Solution

Divide and Conquer

Divide

For string s[i : j], it can be divided into 2 independent substrings: s[i : k], s[k+1 : j]. If the longest substring without repeating characters for s[i : j] is LSWRC(s[i : j]), then we have:

LSWRC(s[i : j]) = max{ LSWRC(s[i : k]), LSWRC(s[k+1 : j]), s[x : y] }

where s[x : y] is the longest substring without repeating characters accrossing over k, k+1.

In this case, we can divide s[i : j] into 2 halves, which means k = (i + j) / 2

Conquer

Let say LSWRC(String s) gives the longest substring without repeating characters for input s. Then we directly get the results of the following for input s[i : j]:

  • LSWRC(s[i : k])
  • LSWRC(s[k+1 : j])

The main problem is s[x : y].

Starting from k and k+1, we move x and y one by one at one time untill:

  • x decreases to i and y increases to j
  • s[x - 1] or s[y + 1] is already in s[x : y]

Then we get the required s[x : y].

Finally, compare LSWRC(s[i : k]), LSWRC(s[k+1 : j]) and s[x : y].

An instance follows:

  LSWRC("abcabcbb") 
= max{ LSWRC("abca"), LSWRC("bcbb"), "cab"          }
= max{ max{ LSWRC("ab"), LSWRC("ca"), "abc" },
       max{ LSWRC("bc"), LSWRC("bb"), "cb"  },
       "cab"                                        }
= max{ max{ max{ LSWRC("a"), LSWRC("b"), "ab" },
            max{ LSWRC("c"), LSWRC("a"), "ca" },
            "abc"                                },
       max{ max{ LSWRC("b"), LSWRC("c"), "bc" },
            max{ LSWRC("b"), LSWRC("b"), "b"  },
            "cb"                                 },
       "cab"                                        }
= max{ max{ max{ "a", "b", "ab" },
            max{ "c", "a", "ca" },
            "abc"                  },
       max{ max{ "b", "c", "bc" },
            max{ "b", "b", "b"  },
            "cb"                   },
       "cab"                          }
= max{ max{ "ab", "ca", "abc" },
       max{ "bc", "b" , "cb"  },
       "cab"                     }
= max{ "abc", "cb", "cab" }
= "abc"

T(n) = 2 \times T(\frac{n}{2}) + O(n), thus running time is O(n lg(n)).

However, there is a severe bug in this algorithm in the part of computing s[x : y]:

When x and y are expanded, they cannot be updated in the same time. For an example: LSWRC("xxxcabcxy") = max{ LSWRC("xxxca"), LSWRC("bcxy"), "abcxy" }.

However, in the process of getting "abcxy", if we add s[x] first:

0. "ab"
1. "cab"
2. "cabc" // stop increasing y
3. "xcab"
4. "xxcab"
5. return "xcab"

s[x : y] is locked by the right c. If we add s[y] first:

0. "ab"
1. "abc"
2. "cabc" // stop decreasing x
3. "abcx"
4. "abcxy"
5. return "abcxy"

It's rather difficult to fix this bug since index x and y are symmetrical in theory while non-symmetrical in code:

class Solution {
    private String[] s;

    public int lengthOfLongestSubstring(String s) {
        this.s = s.split("");
        return LSWRC(0, s.length()-1);
    }

    public int[] LSWRC(int i, int j) {
        // if a character
        if (i == j) {return new int[] {i, j};}

        int   k       = (i + j) / 2;
        int[] lswrc   = {0, 0};

        // divide
        int[] l_lswrc = LSWRC(i  , k);
        int[] r_lswrc = LSWRC(k+1, j);

        int   l1      = l_lswrc[1] - l_lswrc[0] + 1;
        int   l2      = r_lswrc[1] - r_lswrc[0] + 1;
        lswrc = (l1 > l2) ? l_lswrc : r_lswrc;

        // conquer
        int i = k;
        int j = k + 1;

        int x = k;
        int y = k + 1;

        if (this.s[i].equals(this.s[j])) {return lswrc;}

        HashSet<String> hs = new HashSet<String>();
        boolean      istop = false;
        boolean      jstop = false;

        for ( ; !(istop && jstop); ) {
            boolean p1 = hs.contains(this.s[i]);
            boolean p2 = (i == i);
            istop = p1 || p2;

            if (!p1) {
                hs.add(this.s[i]);
                x = i;
                i = p2 ? i : i - 1;
            }

            boolean q1 = hs.contains(this.s[j]);
            boolean q2 = (j == j);
            jstop = q1 || q2;
            if (!q1) {
                hs.add(this.s[j]);
                y = j;
                j = q2 ? j : j + 1;
            }
        }

        int   l3 = lswrc[1] - lswrc[0] + 1;
        int   l4 = y - x + 1;
        lswrc = (l3 > l4) ? lswrc : new int[] {x, y};

        return lswrc;
    }
}

Dynamic Programming

For all substring s[i : j], we have i <= j.

If s[i : j] has repeating characters, then all s[a : b] with a <= i && j <= b have repeating characters too. We can describe this relationship by table A according to Bottom-Up Dynamic Programming:

8|111100000
7|11110000
6|1111000
5|110000
4|11000
3|1100
2|110
1|10
0|0
 +---------
  012345678
  xxxcabcxy

Here, A[i][j] equals

  • 0 : s[i : j] does not have repeating characters
  • 1 : s[i : j] has repeating characters

Then the relationship can be visualized by a rectangle area:

8|1111
7|1111
6|1111
5|
4|
3|
2|
1|
0|
 +---------
  012345678
  xxxcabcxy

Here all substrings contains[3 : 6] = "cabc" have repeating characters.

The longest substring without repeating characters are bounded by a line with all 1 parallel to j = i (single character line):

8|   1    0
7|  1    0
6| 1    0
5|1    0
4|    0
3|   0
2|  0
1| 0
0|0
 +---------
  012345678
  xxxcabcxy

How to check if s[i : j] has repeating characters or not? Suppose the set of characters of s[i : j] is B[i][j]. Since there exists:

s[i : j] = s[i : k] + s[k+1 : j]

we have:

B[i][j] = B[i][k] union B[k+1][j]

if:

B[i][k] intersect B[k+1][j] == Empty

Then s[i : j] does not have repeating characters.

What's more, if we compute the triangle from j axle:

8|11    000
7|11   000
6|11  000
5|11?000
4|11000
3|1100
2|110
1|10
0|0
 +---------
  012345678
  xxxcabcxy

s[i : j] = s[i : j-1] + s[j : j]. In this direction, B[i][j-1] of s[i : j-1] is computed. Then we only check if s[j] is in B[i][j - 1] or not.

For instance, s[2 : 5] = s[2 : 4] + s[5 : 5]. B[2][4] = {"x", "a", "c"}, s[5] = "b". Then A[2][5] = 0.

Complexity

The space complexity is huge. For A, O(n^2) is needed. For B, since the elements of the table are all sets, the worst case is O(n^3). The algorithm visits all A[i][j], thus at least O(n^2) time complexity. For HashSet,

  • B[i][j-1].contians(s[j])
  • B[i][j-1].add(s[j])

are all O(1). However, if we do not reduce the visiting times of 1 points like A[0][n-1], the total time complexity would be much more than O(n^2).

class Solution {
    public int lengthOfLongestSubstring(String s) {
        if (s.equals("")) { return 0; }

        String[] x = s.split("");
        int      n = s.length();

        int[]             A = new int[f(n-1, n-1) + 1];
        HashSet<String>[] B = new HashSet[f(n-1, n-1) + 1];

        for (int i=0; i<n; i++) {
            A[f(i, i)] = 0;
            B[f(i, i)] = new HashSet<String>();
            B[f(i, i)].add(x[i]);
        }

        int     c    = 1;
        boolean flag = true;
        for( ; c<n; c++) {
            flag = true;
            for (int i=0; i<n-c; i++) {
                int j = i + c;
                if (A[f(i, j)] != 1) {
                    if (B[f(i, j-1)].contains(x[j])) {
                        // to be optimized
                        for (int ii=0; ii<=i; ii++) {
                            for (int jj=n-1; j<=jj; jj--) {
                                A[f(ii, jj)] = 1;
                            }
                        }
                    }
                    else {
                        A[f(i, j)] = 0;
                        B[f(i, j)] = B[f(i, j-1)];
                        B[f(i, j)].add(x[j]);
                    }
                }
                flag = flag && (A[f(i, j)] == 1);
            }
            if (flag) { return c; }
        }
        return c;
    }

    private int f(int i, int j) { return i + (j * (j + 1)) / 2; }
}

Improvement

Like LCS problem in CLRS, we solve the problem in O(n) space complexity. Different from LCS, we use a queue instead of an array.

Here we give a looser condition: if A[i][j]==1, then for all j<=k, A[i][k]=1. In other words, we just check one column of the table instead of the rectangle area, because of the sequence the improved algorithm visiting table A.

For table A, we make it a queue. It is updated as following for input xxxcabcxy:

44,43,41,38,34,29,23,16,08
42,40,37,33,28,22,15,07
39,36,32,27,21,14,06
35,31,26,20,13,05
30,25,19,12,04
24,18,11,03
17,10,02
09,01
00

A         i+(j*(j+1))/2
-------------------------------------
0         [00]
00        [00,01]
000       [00,01,02]
0000      [00,01,02,03]
00000     [00,01,02,03,04]
000000    [00,01,02,03,04,05]
0000000   [00,01,02,03,04,05,06]
00000000  [00,01,02,03,04,05,06,07]
000000000 [00,01,02,03,04,05,06,07,08]
000000001 [01,02,03,04,05,06,07,08,09]
000000011 [02,03,04,05,06,07,08,09,10]
000000110 [03,04,05,06,07,08,09,10,11]
000001100 [04,05,06,07,08,09,10,11,12]
000011000 [05,06,07,08,09,10,11,12,13]
000110000 [06,07,08,09,10,11,12,13,14]
001100000 [07,08,09,10,11,12,13,14,15]
011000000 [08,09,10,11,12,13,14,15,16]
11000000  [09,10,11,12,13,14,15,16]
10000001  [10,11,12,13,14,15,16,17]
00000011  [11,12,13,14,15,16,17,18]
00000110  [12,13,14,15,16,17,18,19]
00001100  [13,14,15,16,17,18,19,20]
00011000  [14,15,16,17,18,19,20,21]
00110000  [15,16,17,18,19,20,21,22]
01100000  [16,17,18,19,20,21,22,23]
1100000   [17,18,19,20,21,22,23]
1000001   [18,19,20,21,22,23,24]
0000011   [19,20,21,22,23,24,25]
0000110   [20,21,22,23,24,25,26]
0001101   [21,22,23,24,25,26,27]
0011010   [22,23,24,25,26,27,28]
0110100   [23,24,25,26,27,28,29]
110100    [24,25,26,27,28,29]
101001    [25,26,27,28,29,30]
010011    [26,27,28,29,30,31]
100111    [27,28,29,30,31,32]
001111    [28,29,30,31,32,33]
011110    [29,30,31,32,33,34]
11110     [30,31,32,33,34]
11101     [31,32,33,34,35]
11011     [32,33,34,35,36]
10111     [33,34,35,36,37]
01111     [34,35,36,37,38]
1111      [35,36,37,38]
RETURN 5

B updates synchronously with A. However, the running time is still O(n^2).

import java.util.LinkedList;
import java.util.Queue;
import java.util.HashSet;

class Solution {    
    public int lengthOfLongestSubstring(String s) {
        if (s.equals("")) { return 0; }

        String[] x = s.split("");
        int      n = s.length();

        Queue<Integer>         A = new LinkedList<Integer>();
        Queue<HashSet<String>> B = new LinkedList<HashSet<String>>();

        // O(n)
        for (int i=0; i<n; i++) {
            HashSet<String> b = new HashSet<String>();
            b.add(x[i]);
            A.offer(0);
            B.offer(bi);
        }

        int     c    = 1;
        boolean flag = true;
        for( ; c<n; c++) {
            flag = true;
            for (int i=0; i<n-c; i++) {
                int             j = i + c;
                int             a = A.poll();
                HashSet<String> b = B.poll();
                // [i][j-1] has been removed
                if (a != 1) {
                    if (b.contains(x[j])) {
                        A.offer(1);
                        B.offer(b);
                        flag = flag && true;
                    }
                    else {
                        b.add(x[j]);
                        A.offer(0);
                        B.offer(b);
                        flag = flag && false;
                    }
                }
                else {
                    A.offer(1);
                    B.offer(b);
                    flag = flag && true;
                }
            }
            A.poll();
            B.poll();
            if (flag) {return c;}
        }
        return c;
    }
}
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成,浏览时请结合常识与多方信息审慎甄别。
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

相关阅读更多精彩内容

友情链接更多精彩内容