Longest Substring Without Repeating Characters

The solution to the Longest Substring Without Repeating Characters in Java

Jan 2, 20237 min read

Overview:

Longest Substring Without Repeating Characters is a medium level problem taken directly from LeetCode. It is a string problem that tests basic data structures and algorithms knowledge. There are multiple ways to solve this problem and we will explore a couple of them in this article.

Question:

Given a string s, find the length of the longest substring without repeating characters.

For example, for a given string “pwwkew”, the output will be 3 because the longest substring “wke” and has a length of 3. Keep in mind that this longest substring does not have any repeating characters.

Let’s assume that string s = “pwwkew” for the entirety of this article.

Brute Force:

First, let’s list each substring we can generate from the string without repeating a character to have a clear view of what we need to generate with our code.

s = "pwwkew"
list = ["pw", "w", "wke", "kew", "ew", "w"]

We can create this list of substrings by iterating through each character in the string. For each iteration, we need to create a substring subs starting with the current character. Then, we will add characters to subs by looping over the remaining characters in s until we find a character that already exists in subs . This process is visualized below:

In this visualization, we are storing a list of all the possible substrings and then figuring out what the longest substring is. However, we can actually reduce the amount of memory the list takes by storing just the length of the current substring. For this, we will use a max variable to store the maximum length of the current subs and update max if the length of subs is greater than the current max in each iteration.

class Solution {
  public int lengthOfLongestSubstring(String s) {
    if (s.length() == 0 || s.length() == 1) {
      return s.length();
    }
    int max = 0;
    for (int i = 0; i < s.length(); i++) {
      String subs = Character.toString(s.charAt(i));
      for (int j = i + 1; j < s.length(); j++) {
        if (!subs.contains(Character.toString(s.charAt(j)))) {
          subs += Character.toString(s.charAt(j));
        } else {
          if (subs.length() > max) {
            max = subs.length();
          }
          break;
        }
      }
      if (subs.length() > max) {
        max = subs.length();
      }
    }
    return max;
  }
}

The runtime complexity of this solution is not the most optimal.

In this solution, we are iterating over each character in the string which gives us a runtime complexity of O(n), where n is the total number of characters in the string s.

Then for each character, we are iterating over the rest of the characters in the string which gives us an additional runtime complexity of O(n), where n is the total number of characters remaining in the string s.

On top of that, we are using the string contains method to figure out if a character is repeated in subs which adds to the runtime complexity by a factor of O(n), where n is the total number of characters in subs.

This gives us a total runtime complexity of O(n³).

Surprisingly, LeetCode accepts this solution and beats more than 60% of other solutions. The runtime for this solution is over 450 ms. We will now look at optimization strategies to improve the runtime of the solution.

HashSet:

One way to optimize our current code is to replace the string contains method with something more optimal. The string contains method has a worst-case runtime complexity of O(n) because it looks for a match by iterating through each character in the string until it does or does not find the character. So, instead of using a string to store our substring, we can use a HashSet data structure to store the characters.

Why HashSet? What is a HashSet?

HashSet is a collection available in the java.util package. HashSet is a class that implements the Set interface and is backed by a hash table. HashSet is actually a HashMap instance.

The main reason we are opting for a HashSet is that every item in the HashSet is unique. This means that if we add two identical items in a HashSet it will store only one.

import java.util.HashSet;

class Main {
  public static void main(String[] args) {
    HashSet<Character> c = new HashSet<Character>();
    c.add('a');
    System.out.println(c); // [a]
    c.add('a');
    System.out.println(c); // [a]
  }
}

Furthermore, HashSet offers constant time performance for basic operations like add, remove, contains, and size. This is perfect in our case because we can use the contains method to decrease our total runtime of the solution by a factor of O(n).

class Solution {
  public int lengthOfLongestSubstring(String s) {
    if (s.length() == 0 || s.length() == 1) {
      return s.length();
    }

    int max = 0;
    for (int i = 0; i < s.length(); i++) {
      HashSet<Character> hash = new HashSet<>();
      hash.add(s.charAt(i));

      for (int j = i + 1; j < s.length(); j++) {
        if (!hash.contains(s.charAt(j))) {
          hash.add(s.charAt(j));
        } else {
          if (hash.size() > max) {
            max = hash.size();
          }
          break;
        }
      }

      if (hash.size() > max) {
        max = hash.size();
      }
    }

    return max;
  }
}

Since the runtime complexity of the HashSet contains method is constant, our total runtime complexity of the solution is now O(n²). After this improvement, the runtime of this solution in LeetCode is around 115 ms.

Next, let's look at an algorithm that will optimize our code even further.

Sliding Window:

One thing you can notice in our brute force solution is that for every outer loop, we stop the inner loop when we find a character that already exists in our HashSet. We can do the same thing with only one iteration through s by using two pointers instead of an inner and outer loop as shown in the visualization below:

In the visualization above, the left pointer is the start of a substring and the right pointer is pointing towards the next possible character in the substring. The two pointers create a dynamic sliding window that will represent the current substring without a repeated character. We will store the characters of the substring in a HashMap hash.

Here, the right pointer keeps moving forward and continues to build our current substring until it reaches a character that is already present in hash. We can achieve this by adding the character pointed by right to hash and incrementing right to find the next possible character.

When the right pointer visits a character that is already available in hash, we will increment the left pointer and delete the character in hash pointed by the left pointer.

Why increment the left pointer?

If we compare this approach to our brute-force solution, we can see that the right pointer corresponds to the inner loop and the left pointer to the outer loop. In other words, the left pointer is the start of a substring and the right pointer is pointing toward the next possible character in the substring.

In the brute force solution, when we find a repeating character, we increment the index of the outer loop thus forcing the loop to iterate to the next character. We are basically doing the same thing here by incrementing the left pointer because the left pointer should be at the start of a substring.

Why delete the character from the HashSet?

Once again, if we compare this approach to our brute force solution, every time we start a new substring, we start with the character pointed by the outer loop. Since we replaced the two loops with two pointers, the new substring should start with the new character left is pointing to.

The two pointers in our algorithm create a window of characters that represent our substring. In the visualization above, we can see that when the left pointer is incremented, this new character now needs to be at the start of the new substring. We can also see that hash already has this new character and we don’t need to add it again to the HashMap. Since the window already represents the new substring, we just need to delete the character left was pointing to because it is not part of the new substring.

class Solution {
  public int lengthOfLongestSubstring(String s) {
    if (s.length() == 0 || s.length() == 1) {
      return s.length();
    }

    int left = 0;
    int right = 0;
    int max = 0;

    HashSet<Character> hash = new HashSet<>();

    while (right < s.length()) {
      if (!hash.contains(s.charAt(right))) {
        hash.add(s.charAt(right));
        right++;
        if (hash.size() > max) {
          max = hash.size();
        }
      } else {
        hash.remove(s.charAt(left));
        left++;
      }
    }

    return max;
  }
}

This code lets us iterate through s only once and the two pointers help us determine the start and the end of a substring. We are storing each character of the current substring in a HashMap hash. The HashMap operations that we are using in our code is add, size, and contains. All these operations have constant runtime complexity.

Since all the operations have constant runtime complexity and the code is iterating through the string only once, our runtime complexity of this solution is O(n). This solution beats about 92% of other solutions and has a runtime of about 7 ms.

In a nutshell:

Longest Substring Without Repeating Characters is one of the first medium level LeetCode problems you will come across. It tests your basic data structure and algorithms knowledge. The brute force solution we used simply follows your intuition and returns the correct output. However, the brute force solution has a runtime complexity of O(n³). However, we can use a HashMap data structure and Sliding Window algorithm to optimize our code to a runtime complexity of O(n). There are definitely more solutions you can find for this problem but the ones discussed here are one of the most common solutions.

Happy Programming!