title: Longest Substring Without Repeating Characters
tags:
- longest-substring-without-repeating-characters
- No.3
- medium
- string
- divide-conquer
- dynamic-programming
- queue
Problem
Given a string, find the length of the longest substring without repeating characters.
Examples:
Given "abcabcbb", the answer is "abc", which the length is 3.
Given "bbbbb", the answer is "b", with the length of 1.
Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring, "pwke" is a subsequence and not a substring.
Corner Cases
"""a""ab""xxxcabcxy"
Solution
Divide and Conquer
Divide
For string s[i : j], it can be divided into 2 independent substrings: s[i : k], s[k+1 : j]. If the longest substring without repeating characters for s[i : j] is LSWRC(s[i : j]), then we have:
LSWRC(s[i : j]) = max{ LSWRC(s[i : k]), LSWRC(s[k+1 : j]), s[x : y] }
where s[x : y] is the longest substring without repeating characters accrossing over k, k+1.
In this case, we can divide s[i : j] into 2 halves, which means k = (i + j) / 2
Conquer
Let say LSWRC(String s) gives the longest substring without repeating characters for input s. Then we directly get the results of the following for input s[i : j]:
LSWRC(s[i : k])LSWRC(s[k+1 : j])
The main problem is s[x : y].
Starting from k and k+1, we move x and y one by one at one time untill:
-
xdecreases toiandyincreases toj -
s[x - 1]ors[y + 1]is already ins[x : y]
Then we get the required s[x : y].
Finally, compare LSWRC(s[i : k]), LSWRC(s[k+1 : j]) and s[x : y].
An instance follows:
LSWRC("abcabcbb")
= max{ LSWRC("abca"), LSWRC("bcbb"), "cab" }
= max{ max{ LSWRC("ab"), LSWRC("ca"), "abc" },
max{ LSWRC("bc"), LSWRC("bb"), "cb" },
"cab" }
= max{ max{ max{ LSWRC("a"), LSWRC("b"), "ab" },
max{ LSWRC("c"), LSWRC("a"), "ca" },
"abc" },
max{ max{ LSWRC("b"), LSWRC("c"), "bc" },
max{ LSWRC("b"), LSWRC("b"), "b" },
"cb" },
"cab" }
= max{ max{ max{ "a", "b", "ab" },
max{ "c", "a", "ca" },
"abc" },
max{ max{ "b", "c", "bc" },
max{ "b", "b", "b" },
"cb" },
"cab" }
= max{ max{ "ab", "ca", "abc" },
max{ "bc", "b" , "cb" },
"cab" }
= max{ "abc", "cb", "cab" }
= "abc"
T(n) = 2 \times T(\frac{n}{2}) + O(n), thus running time is O(n lg(n)).
However, there is a severe bug in this algorithm in the part of computing s[x : y]:
When x and y are expanded, they cannot be updated in the same time. For an example: LSWRC("xxxcabcxy") = max{ LSWRC("xxxca"), LSWRC("bcxy"), "abcxy" }.
However, in the process of getting "abcxy", if we add s[x] first:
0. "ab"
1. "cab"
2. "cabc" // stop increasing y
3. "xcab"
4. "xxcab"
5. return "xcab"
s[x : y] is locked by the right c. If we add s[y] first:
0. "ab"
1. "abc"
2. "cabc" // stop decreasing x
3. "abcx"
4. "abcxy"
5. return "abcxy"
It's rather difficult to fix this bug since index x and y are symmetrical in theory while non-symmetrical in code:
class Solution {
private String[] s;
public int lengthOfLongestSubstring(String s) {
this.s = s.split("");
return LSWRC(0, s.length()-1);
}
public int[] LSWRC(int i, int j) {
// if a character
if (i == j) {return new int[] {i, j};}
int k = (i + j) / 2;
int[] lswrc = {0, 0};
// divide
int[] l_lswrc = LSWRC(i , k);
int[] r_lswrc = LSWRC(k+1, j);
int l1 = l_lswrc[1] - l_lswrc[0] + 1;
int l2 = r_lswrc[1] - r_lswrc[0] + 1;
lswrc = (l1 > l2) ? l_lswrc : r_lswrc;
// conquer
int i = k;
int j = k + 1;
int x = k;
int y = k + 1;
if (this.s[i].equals(this.s[j])) {return lswrc;}
HashSet<String> hs = new HashSet<String>();
boolean istop = false;
boolean jstop = false;
for ( ; !(istop && jstop); ) {
boolean p1 = hs.contains(this.s[i]);
boolean p2 = (i == i);
istop = p1 || p2;
if (!p1) {
hs.add(this.s[i]);
x = i;
i = p2 ? i : i - 1;
}
boolean q1 = hs.contains(this.s[j]);
boolean q2 = (j == j);
jstop = q1 || q2;
if (!q1) {
hs.add(this.s[j]);
y = j;
j = q2 ? j : j + 1;
}
}
int l3 = lswrc[1] - lswrc[0] + 1;
int l4 = y - x + 1;
lswrc = (l3 > l4) ? lswrc : new int[] {x, y};
return lswrc;
}
}
Dynamic Programming
For all substring s[i : j], we have i <= j.
If s[i : j] has repeating characters, then all s[a : b] with a <= i && j <= b have repeating characters too. We can describe this relationship by table A according to Bottom-Up Dynamic Programming:
8|111100000
7|11110000
6|1111000
5|110000
4|11000
3|1100
2|110
1|10
0|0
+---------
012345678
xxxcabcxy
Here, A[i][j] equals
-
0:s[i : j]does not have repeating characters -
1:s[i : j]has repeating characters
Then the relationship can be visualized by a rectangle area:
8|1111
7|1111
6|1111
5|
4|
3|
2|
1|
0|
+---------
012345678
xxxcabcxy
Here all substrings contains[3 : 6] = "cabc" have repeating characters.
The longest substring without repeating characters are bounded by a line with all 1 parallel to j = i (single character line):
8| 1 0
7| 1 0
6| 1 0
5|1 0
4| 0
3| 0
2| 0
1| 0
0|0
+---------
012345678
xxxcabcxy
How to check if s[i : j] has repeating characters or not? Suppose the set of characters of s[i : j] is B[i][j]. Since there exists:
s[i : j] = s[i : k] + s[k+1 : j]
we have:
B[i][j] = B[i][k] union B[k+1][j]
if:
B[i][k] intersect B[k+1][j] == Empty
Then s[i : j] does not have repeating characters.
What's more, if we compute the triangle from j axle:
8|11 000
7|11 000
6|11 000
5|11?000
4|11000
3|1100
2|110
1|10
0|0
+---------
012345678
xxxcabcxy
s[i : j] = s[i : j-1] + s[j : j]. In this direction, B[i][j-1] of s[i : j-1] is computed. Then we only check if s[j] is in B[i][j - 1] or not.
For instance, s[2 : 5] = s[2 : 4] + s[5 : 5]. B[2][4] = {"x", "a", "c"}, s[5] = "b". Then A[2][5] = 0.
Complexity
The space complexity is huge. For A, O(n^2) is needed. For B, since the elements of the table are all sets, the worst case is O(n^3). The algorithm visits all A[i][j], thus at least O(n^2) time complexity. For HashSet,
B[i][j-1].contians(s[j])B[i][j-1].add(s[j])
are all O(1). However, if we do not reduce the visiting times of 1 points like A[0][n-1], the total time complexity would be much more than O(n^2).
class Solution {
public int lengthOfLongestSubstring(String s) {
if (s.equals("")) { return 0; }
String[] x = s.split("");
int n = s.length();
int[] A = new int[f(n-1, n-1) + 1];
HashSet<String>[] B = new HashSet[f(n-1, n-1) + 1];
for (int i=0; i<n; i++) {
A[f(i, i)] = 0;
B[f(i, i)] = new HashSet<String>();
B[f(i, i)].add(x[i]);
}
int c = 1;
boolean flag = true;
for( ; c<n; c++) {
flag = true;
for (int i=0; i<n-c; i++) {
int j = i + c;
if (A[f(i, j)] != 1) {
if (B[f(i, j-1)].contains(x[j])) {
// to be optimized
for (int ii=0; ii<=i; ii++) {
for (int jj=n-1; j<=jj; jj--) {
A[f(ii, jj)] = 1;
}
}
}
else {
A[f(i, j)] = 0;
B[f(i, j)] = B[f(i, j-1)];
B[f(i, j)].add(x[j]);
}
}
flag = flag && (A[f(i, j)] == 1);
}
if (flag) { return c; }
}
return c;
}
private int f(int i, int j) { return i + (j * (j + 1)) / 2; }
}
Improvement
Like LCS problem in CLRS, we solve the problem in O(n) space complexity. Different from LCS, we use a queue instead of an array.
Here we give a looser condition: if A[i][j]==1, then for all j<=k, A[i][k]=1. In other words, we just check one column of the table instead of the rectangle area, because of the sequence the improved algorithm visiting table A.
For table A, we make it a queue. It is updated as following for input xxxcabcxy:
44,43,41,38,34,29,23,16,08
42,40,37,33,28,22,15,07
39,36,32,27,21,14,06
35,31,26,20,13,05
30,25,19,12,04
24,18,11,03
17,10,02
09,01
00
A i+(j*(j+1))/2
-------------------------------------
0 [00]
00 [00,01]
000 [00,01,02]
0000 [00,01,02,03]
00000 [00,01,02,03,04]
000000 [00,01,02,03,04,05]
0000000 [00,01,02,03,04,05,06]
00000000 [00,01,02,03,04,05,06,07]
000000000 [00,01,02,03,04,05,06,07,08]
000000001 [01,02,03,04,05,06,07,08,09]
000000011 [02,03,04,05,06,07,08,09,10]
000000110 [03,04,05,06,07,08,09,10,11]
000001100 [04,05,06,07,08,09,10,11,12]
000011000 [05,06,07,08,09,10,11,12,13]
000110000 [06,07,08,09,10,11,12,13,14]
001100000 [07,08,09,10,11,12,13,14,15]
011000000 [08,09,10,11,12,13,14,15,16]
11000000 [09,10,11,12,13,14,15,16]
10000001 [10,11,12,13,14,15,16,17]
00000011 [11,12,13,14,15,16,17,18]
00000110 [12,13,14,15,16,17,18,19]
00001100 [13,14,15,16,17,18,19,20]
00011000 [14,15,16,17,18,19,20,21]
00110000 [15,16,17,18,19,20,21,22]
01100000 [16,17,18,19,20,21,22,23]
1100000 [17,18,19,20,21,22,23]
1000001 [18,19,20,21,22,23,24]
0000011 [19,20,21,22,23,24,25]
0000110 [20,21,22,23,24,25,26]
0001101 [21,22,23,24,25,26,27]
0011010 [22,23,24,25,26,27,28]
0110100 [23,24,25,26,27,28,29]
110100 [24,25,26,27,28,29]
101001 [25,26,27,28,29,30]
010011 [26,27,28,29,30,31]
100111 [27,28,29,30,31,32]
001111 [28,29,30,31,32,33]
011110 [29,30,31,32,33,34]
11110 [30,31,32,33,34]
11101 [31,32,33,34,35]
11011 [32,33,34,35,36]
10111 [33,34,35,36,37]
01111 [34,35,36,37,38]
1111 [35,36,37,38]
RETURN 5
B updates synchronously with A. However, the running time is still O(n^2).
import java.util.LinkedList;
import java.util.Queue;
import java.util.HashSet;
class Solution {
public int lengthOfLongestSubstring(String s) {
if (s.equals("")) { return 0; }
String[] x = s.split("");
int n = s.length();
Queue<Integer> A = new LinkedList<Integer>();
Queue<HashSet<String>> B = new LinkedList<HashSet<String>>();
// O(n)
for (int i=0; i<n; i++) {
HashSet<String> b = new HashSet<String>();
b.add(x[i]);
A.offer(0);
B.offer(bi);
}
int c = 1;
boolean flag = true;
for( ; c<n; c++) {
flag = true;
for (int i=0; i<n-c; i++) {
int j = i + c;
int a = A.poll();
HashSet<String> b = B.poll();
// [i][j-1] has been removed
if (a != 1) {
if (b.contains(x[j])) {
A.offer(1);
B.offer(b);
flag = flag && true;
}
else {
b.add(x[j]);
A.offer(0);
B.offer(b);
flag = flag && false;
}
}
else {
A.offer(1);
B.offer(b);
flag = flag && true;
}
}
A.poll();
B.poll();
if (flag) {return c;}
}
return c;
}
}