Longest Common Subsequence (LCS) DP Problem
--
A subsequence is a group of characters that appears sequentially but it doesn’t have to be contiguous though it can be. For example s = “AFPKL” in this string valid sub-sequences can be “AFP”, “APK”, “AFKL” etc.
So, the problem is asking us to return the sequence of the longest common subsequence (LCS) between two strings. For instance:
Input: “abdez”, “bxide”
Output: [“b”, “d”, “e”]
Solution Approach
If we had 2 strings “abc” and “ac” the LCS will be “ac” . We get to this LCS by looking at the last letter of both strings. So, first, we compare ‘c’ from “abc” and ‘c’ from “ac”, these two characters are equal so we can add it to the end of our LCS.
Now we are left with “ab” and “a”, we will now compare ‘b’ from “ab” and ‘a’ from “a”, but these two are not equal so, we will remove first check if we remove ‘a’ from “a” we are left with “”(empty string) and the LCS between “” and “ab” will be “”.
So, we will now instead of removing “a” we will remove ‘b’ from “ab” and compare with “a” , after removing ‘b’ we are left with “a” from “ab” and the other string is already “a”, both characters are equal so we can append “a” before “c” (i.e. “ac”) in our LCS.
So, the logic here is, whenever we are finding LCS between two strings, we look at the final character and we say are these final characters of each of these strings are equal? if they are equal, then we get the LCS of the two strings without that character and prepend it to that character. But if they are not equal then we find the longest LCS between the strings by removing one or the other character.
Now we will apply this logic in our dp table and we will use two string “abdez” and “bxide” nd the LCS will be [‘b’, ‘d’, ‘e’]
Code Implementation
Complexity
Time complexity: O(nm * min(n,m)), Space complexity: O(nm * min(n,m))
Here n is the length of one string and m is the length of the other string. The space complexity is O(nm * min(n,m)) because, we have a 2D vector which stores vector type value which stores the LCS and maximum LCS can be string which has minimum length. Time complexity is O(nm * min(n,m)) because, we are traversing the whole dp table and whenever we get equal character we are storing the LCS.
But can we improve the complexity? Yes we can. Instead of storing the LCS in out DP table, if we store the length of the LCS the complexity will be reduced to O(nm) for both time and space. So, the DP table will look like:
But how will we return the LCS if we are only storing the length?
If we notice the dp table, we can see that every time we get an equal character the length is increased. So, we start traversing the the dp table from the last index(i.e dpTable[5][5]) and check if the length of the LCS in the previous column(dpTable[5][4]) and the length of the LCS in the previous row(dpTable[4][5]) is equal or not. If they are equal that means the character of this index will be added to our LCS. So, we will leverage this to return our LCS.
Code Implementation
Complexity
Time complexity: O(nm), Space complexity: O(nm)
Time complexity is O(nm) because we are traversing the dp table once that take O(nm) time and we are building the LCS which took O(min(n,m)) time which is less than O(nm).So, the time complexity is O(nm).
The space complexity is O(nm) because we are taking one 2D vector for the dp table where n is the number of column and m is the number of row and one 1D vector which will have size of min(n,m). So, The space complexity is O(nm).