You are given a 0-indexed string s and a dictionary of words dictionary. You have to break s into one or more non-overlapping substrings such that each substring is present in dictionary. There may be some extra characters in s which are not present in any of the substrings.
Return the minimum number of extra characters left over if you break up s optimally.
Example 1:
Input: s = "leetscode", dictionary = ["leet","code","leetcode"]
Output: 1
Explanation: We can break s into two substrings: "leet" from index 0 to 3 and "code" from index 5 to 8. There is only 1 unused character (at index 4), so we return 1.
Example 2:
Input: s = "sayhelloworld", dictionary = ["hello","world"]
Output: 3
Explanation: We can break s into two substrings: "hello" from index 3 to 7 and "world" from index 8 to 12. The characters at indices 0, 1, and 2 are not used in any substring and are therefore extra characters. Hence, we return 3.
Constraints:
1 <= s.length <= 50
1 <= dictionary.length <= 50
1 <= dictionary[i].length <= 50
dictionary[i] and s consist of only lowercase English letters
dictionary contains distinct words
Solutions
Solution 1: Hash Table + Dynamic Programming
We can use a hash table \(ss\) to record all words in the dictionary, which allows us to quickly determine whether a string is in the dictionary.
Next, we define \(f[i]\) to represent the minimum number of extra characters in the first \(i\) characters of string \(s\), initially \(f[0] = 0\).
When \(i \ge 1\), the \(i\)-th character \(s[i - 1]\) can be an extra character, in which case \(f[i] = f[i - 1] + 1\). Alternatively, if there exists an index \(j \in [0, i - 1]\) such that \(s[j..i)\) is in the hash table \(ss\), we can take \(s[j..i)\) as a word, in which case \(f[i] = f[j]\). We keep the minimum over all of these options.
In summary, the state transition equation is:
\[
f[i] = \min\left(f[i - 1] + 1,\ \min_{j} f[j]\right)
\]
where \(i \ge 1\), and \(j\) ranges over the indices in \([0, i - 1]\) for which \(s[j..i)\) is in the hash table \(ss\).
The final answer is \(f[n]\).
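As a worked check, here is the table for Example 1 (\(s = \) "leetscode", \(n = 9\)), computed by hand from \(f[i] = \min(f[i - 1] + 1, \min_j f[j])\). The only dictionary hits are \(s[0..4) = \) "leet", which gives \(f[4] = f[0] = 0\), and \(s[5..9) = \) "code", which gives \(f[9] = \min(f[8] + 1, f[5]) = \min(5, 1) = 1\); every other entry comes from the "extra character" branch.
\[
\begin{array}{c|cccccccccc}
i & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ \hline
f[i] & 0 & 1 & 2 & 3 & 0 & 1 & 2 & 3 & 4 & 1
\end{array}
\]
So \(f[9] = 1\), which matches the expected output.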
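The time complexity is \(O(n^3 + L)\), because there are \(O(n^2)\) pairs \((j, i)\) and building and hashing each substring \(s[j..i)\) costs \(O(n)\). The space complexity is \(O(n + L)\). Here, \(n\) is the length of string \(s\), and \(L\) is the sum of the lengths of all words in the dictionary.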
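A minimal Python sketch of Solution 1, following the transition equation directly. The function name `min_extra_char` is a hypothetical choice for illustration, not taken from the original.

```python
from typing import List


def min_extra_char(s: str, dictionary: List[str]) -> int:
    # Hash set of dictionary words for O(1) average-time membership tests
    ss = set(dictionary)
    n = len(s)
    # f[i] = minimum number of extra characters in the prefix s[:i]; f[0] = 0
    f = [0] * (n + 1)
    for i in range(1, n + 1):
        # Option 1: s[i - 1] is an extra character
        f[i] = f[i - 1] + 1
        # Option 2: some dictionary word s[j:i] ends exactly at position i
        for j in range(i):
            if s[j:i] in ss and f[j] < f[i]:
                f[i] = f[j]
    return f[n]


print(min_extra_char("leetscode", ["leet", "code", "leetcode"]))  # 1
print(min_extra_char("sayhelloworld", ["hello", "world"]))  # 3
```

Python's slice `s[j:i]` is the half-open substring \(s[j..i)\), so the indices match the text one-to-one.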
Solution 2: Trie + Dynamic Programming
We can use a trie to optimize the time complexity of Solution 1.
Specifically, we first insert every dictionary word, reversed, into the trie \(root\). As before, we define \(f[i]\) as the minimum number of extra characters in the first \(i\) characters of string \(s\), with \(f[0] = 0\).
When \(i \ge 1\), the \(i\)-th character \(s[i - 1]\) can be an extra character, in which case \(f[i] = f[i - 1] + 1\). We also enumerate \(j\) from \(i - 1\) down to \(0\), moving one trie node along \(s[j]\) at each step. Because the words were inserted reversed, the node reached after reading \(s[i - 1], s[i - 2], \ldots, s[j]\) corresponds to the substring \(s[j..i)\). If that node marks the end of a word, \(s[j..i)\) is in the dictionary and we update \(f[i] = \min(f[i], f[j])\). If the trie has no child for \(s[j]\), no longer substring ending at position \(i\) can be a word, so we stop early.
The time complexity is \(O(n^2 + L)\), and the space complexity is \(O(n + L \times |\Sigma|)\). Here, \(n\) is the length of string \(s\), and \(L\) is the sum of the lengths of all words in the dictionary. Additionally, \(|\Sigma|\) is the size of the character set. In this problem, the character set is lowercase English letters, so \(|\Sigma| = 26\).
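A minimal Python sketch of Solution 2, assuming lowercase input as the constraints state. The names `TrieNode`, `insert`, and `min_extra_char_trie` are hypothetical choices for illustration, not taken from the original.

```python
from typing import List, Optional


class TrieNode:
    __slots__ = ("children", "is_end")

    def __init__(self) -> None:
        # One child slot per lowercase letter 'a'..'z' (|Sigma| = 26)
        self.children: List[Optional["TrieNode"]] = [None] * 26
        self.is_end = False

    def insert(self, word: str) -> None:
        node = self
        for c in word:
            idx = ord(c) - ord("a")
            if node.children[idx] is None:
                node.children[idx] = TrieNode()
            node = node.children[idx]
        node.is_end = True


def min_extra_char_trie(s: str, dictionary: List[str]) -> int:
    root = TrieNode()
    for w in dictionary:
        # Insert each word reversed so that walking s backwards from i matches it
        root.insert(w[::-1])
    n = len(s)
    f = [0] * (n + 1)
    for i in range(1, n + 1):
        # Option 1: s[i - 1] is an extra character
        f[i] = f[i - 1] + 1
        node = root
        # Read s[i-1], s[i-2], ..., s[j]; the path spells s[j..i) reversed
        for j in range(i - 1, -1, -1):
            node = node.children[ord(s[j]) - ord("a")]
            if node is None:
                break  # no reversed word continues with s[j]; stop early
            if node.is_end and f[j] < f[i]:
                f[i] = f[j]
    return f[n]


print(min_extra_char_trie("leetscode", ["leet", "code", "leetcode"]))  # 1
print(min_extra_char_trie("sayhelloworld", ["hello", "world"]))  # 3
```

The early `break` is what removes the extra factor of \(n\): each inner step does constant work instead of hashing a whole substring.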