Edit Distance DP Problem
Edit distance problems is: if we are given two string word1 and word2 what is the minimum number of operation needed to transform a into b. The operations are: substitution, insertion and deletion. You can find this problem in interviewbit and leetcode.
In order to solve this problem, we have to be familiar with sub-string operation. Substations are contiguous sequence of characters within a string( from Wikipedia).Let’s say we have a string “horse” which goes from index 0 to 4. So, the sub-strings are: index ( 0, 0) -> ‘h’, (0,1) -> ‘ho’, (0 , 2)-> hor, (0, 3)-> ‘hors’, (0, 4) -> ‘horse’.
Now, lets see the scenarios we will face while transforming one string to another. There are four scenarios.
- characters are same, so we do nothing
- Characters are different, so we do substitution
- Characters are different, so we do insertion
- Characters are different, so we do deletion
1.Characters are same, so we do nothing
To transform the sub-string a[0, 3]( “abcd”) to b[0,3](“ xyzd”) we will start comparing the string in reverse order. As the last character of both the strings are same, we will do nothing. That means number of operation that we will need to do for substring a[0,3] and b[0,3] will be equal to the number of operation we need to do for substrings a[0,2] and b[0,2]. Because the character ‘d’ is the same, so we won’t have to do any operations to transform them.
2.Characters are different, so we do substitution
To transform substring a[0,2](“abc”) to b[0,2](“xyz”) we notice that ‘c’ doesn’t match ‘z’. Now we have three options(substitution, insertion, deletion). We will do substitution. To do that, we will drop ‘c’ and ‘z’ from the sub-strings and take the substrings a[0,1] (“ab”)and b[0,1](“xy”) and we will transform “ab” into “xy” and in place of ‘c’ will put ‘y’. That means to do substitution, we transform everything before the mismatched character and at the mismatch we perform substitution.
3.Characters are different, so we do insertion
To transform substring a[0,2](“abc”) to b[0,2](“xyz”) with insertion, we drop ‘z’ from sub-string of b and then transform “abc” into “xy” then insert ‘z’ at the end.
4.Characters are different, so we do deletion
To transform substring a[0,2](“abc”) to b[0,1](“xy”) with deletion, we delete ‘c’ from sub-string of a and transform “ab” into “xy”.
We can solve this problem by building a dp table(i.e. 2D array). In each index of this 2D array we are going to store minimum number of operation that we can perform to a substring of our first string to turn it into another substring of our second string. If the input is: word1 = “horse” word2 = “ros”, then the dp table will look like this:
The first row and the first column is for empty string, which is our base case. What happening here is: in the first row we are converting each character into empty string which takes 1 delete operation for each character. So, to convert ‘h’ -> “ ” = 1 delete, to convert ‘o’ -> “ ” = 2 delete(delete ‘h’ then delete ‘o’) and so on. Likewise in the first column we are converting empty string to “ros”, which will take 1 insert for each character : convert “ ” -> ‘r’ = 1 insert, convert “ ” -> ‘ro’ = 2 insert(insert ‘r’ then insert ‘o’) and so on.
Now we can move to converting “horse” into “ros”. First characters we compared is ‘h’ and ‘o’ which is not same so, we can do any operation(i.e substitution, deletion, insertion), in this case if we do substitution(i.e scenario 2 described above) .
Next we are comparing ‘o’ to ‘r’ which is not same. In this case we can do either substitution or deletion, both will take same number of operation. The next characters we compared are ‘r’ and ‘r’, which are same. So we won’t have to do anything(i.e scenario 1 described above). We will go on counting operations by using the 4 scenarios and in last index of the dp table we will get out minimum edit distance.
Code in C++
The code is pretty self explanatory. Whatever I described in my post I have just implemented it. First initialized the dp table(i.e 2d vector) and then checked if the characters are same or not and according to that applied any of the 4 scenarios that I have described above.
Time complexity: O(nm)
Here n is the length of one string and m is the length of another string and for string length n we are execution a nested loop of length m.
Space Complexity: O(nm)
Because we are storing the number of operations in a 2d vector which has n rows and m columns.
So, we have a optimized dp solution for the edit distance problem. But can we improve the complexity? Yes, we can. We may not be able to optimize the time complexity but we can definitely improve the space complexity.
If you have read the solution approach, notice that in every comparison we are just checking values which is located in the row we are currently traversing and the previous row. That means at a time we only use two rows. So, we actually don’t need the 2d vector.
Here we are taking two 1d arrays to keep track of two rows. And We are changing the values of each array depending on whether the index is odd or even but the logic to find the minimum edit distance is still the same. This code uses pointer to put values in evenEdits and oddEdits arrays. So, if you find this hard to understand, print evenEdits[j] and oddEdits[j] each time the code is updating the currentEdits[j] and I think you will understand how evenEdits and oddEdits are updating. Also the time complexity of this code is still O(nm) but the space complexity is O(min(n, m) because, we are taking two arrays in size of the smaller string.