433. Minimum Genetic Mutation
Description
A gene string can be represented by an 8-character long string, with choices from 'A', 'C', 'G', and 'T'.
Suppose we need to investigate a mutation from a gene string startGene to a gene string endGene, where one mutation is defined as one single character changed in the gene string.
- For example, "AACCGGTT" --> "AACCGGTA" is one mutation.
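The single-mutation rule above can be checked by counting the positions at which two gene strings differ. A minimal sketch in Python follows; the helper name is_one_mutation is hypothetical and does not appear in the problem statement.

```python
def is_one_mutation(a: str, b: str) -> bool:
    """Return True if gene strings a and b differ in exactly one position."""
    # is_one_mutation is an illustrative helper name, not part of the problem.
    if len(a) != len(b):
        return False
    # Compare the strings position by position and count the mismatches.
    return sum(x != y for x, y in zip(a, b)) == 1
```

Under this sketch, "AACCGGTT" and "AACCGGTA" count as one mutation apart, while "AACCGGTT" and "AAACGGTA" do not, because they differ in two positions.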
There is also a gene bank bank that records all the valid gene mutations. A gene must be in bank to make it a valid gene string.
Given the two gene strings startGene and endGene and the gene bank bank, return the minimum number of mutations needed to mutate from startGene to endGene. If there is no such mutation, return -1.
Note that the starting point is assumed to be valid, so it might not be included in the bank.
Example 1:
Input: startGene = "AACCGGTT", endGene = "AACCGGTA", bank = ["AACCGGTA"]
Output: 1
Example 2:
Input: startGene = "AACCGGTT", endGene = "AAACGGTA", bank = ["AACCGGTA","AACCGCTA","AAACGGTA"]
Output: 2
Constraints:
0 <= bank.length <= 10
startGene.length == endGene.length == bank[i].length == 8
startGene, endGene, and bank[i] consist of only the characters ['A', 'C', 'G', 'T'].
Solutions
Solution 1: BFS
We define a queue q to store each gene sequence together with the number of changes needed to reach it, and a set vis to store the visited gene sequences. Initially, we add the starting gene sequence startGene to the queue q and the set vis.
Then, we repeatedly take a gene sequence out of the queue q. If this gene sequence equals the target gene sequence, we return the current number of changes. Otherwise, we iterate through the gene bank bank and, for each gene sequence in it, count the positions at which it differs from the current gene sequence. If the count is $1$ and the gene sequence in the gene bank has not been visited, we add it to the queue q and the set vis.
If the queue q becomes empty before the target gene sequence is reached, the gene change cannot be completed, so we return $-1$.
The time complexity is $O(C \times n \times m)$, and the space complexity is $O(n \times m)$, where $n$ and $m$ are the lengths of the gene sequence and the gene bank respectively, and $C$ is the size of the character set of the gene sequence. In this problem, $C = 4$.
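The original code listings are not preserved here, so the following is only a minimal Python sketch of the BFS described above, not the author's implementation. It scans the bank for each dequeued sequence, as the text describes. The function name min_mutation and the snake_case parameter names are assumptions chosen for illustration.

```python
from collections import deque
from typing import List


def min_mutation(start_gene: str, end_gene: str, bank: List[str]) -> int:
    """Sketch of the BFS above; names are illustrative, not from the source."""
    # Each queue entry is (gene sequence, number of changes used to reach it).
    q = deque([(start_gene, 0)])
    vis = {start_gene}
    while q:
        gene, depth = q.popleft()
        if gene == end_gene:
            return depth
        for nxt in bank:
            # Count the positions where the current gene and the bank entry differ.
            diff = sum(a != b for a, b in zip(gene, nxt))
            if diff == 1 and nxt not in vis:
                vis.add(nxt)
                q.append((nxt, depth + 1))
    # The queue emptied without reaching end_gene.
    return -1
```

Called with the inputs from Example 1 and Example 2, this sketch returns 1 and 2 respectively. With an empty bank and different start and end sequences, it returns -1.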