Edit Distance

2 minute read

72. Edit Distance

Solution: Dynamic programming.

This is basically a simplified version of global sequence alignment commonly used in bioinformatics. In bioinformatics, different operations are usually assigned different weights to reflect the likelihood. For example, in some situations, mutations are more likely than insertions due to the properties of DNA/RNA, so replacement has lower penalty.

1. Naïve method

This uses a matrix as big as R x C where R is the length of word1 and C is the length of word2.

class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        nrow, ncol = len(word1), len(word2)
        if nrow == 0: return ncol
        if ncol == 0: return nrow
        
        dp = []
        for i in range(nrow+1):
            dp.append([])
            for j in range(ncol+1):
                dp[i].append(0)
                
        for r in range(nrow+1):
            for c in range(ncol+1):
                if r == 0 and c == 0:
                    dp[r][c] = 0
                elif r == 0:
                    dp[r][c] = dp[r][c-1] + 1
                elif c == 0:
                    dp[r][c] = dp[r-1][c] + 1
                else:
                    score = 0 if word1[r-1] == word2[c-1] else 1
                    dp[r][c] = min(dp[r-1][c] + 1, dp[r][c-1] + 1, dp[r-1][c-1] + score)
        # print(dp)            
        return dp[-1][-1]

2. Lower space complexity

class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        nrow, ncol = len(word1), len(word2)
        if nrow == 0: return ncol
        if ncol == 0: return nrow
        
        dp = list(range(0, ncol + 1))
        dp2 = [0] * (ncol + 1)
        
        for r in range(nrow+1):
            for c in range(ncol+1):
                if r == 0 and c == 0:
                    dp2[c] = 0
                elif r == 0:
                    dp2[c] = dp2[c-1] + 1
                elif c == 0:
                    dp2[c] = dp[c] + 1
                else:
                    score = 0 if word1[r-1] == word2[c-1] else 1
                    dp2[c] = min(dp[c] + 1, dp2[c-1] + 1, dp[c-1] + score)
            dp, dp2 = dp2, [0] * (ncol+1)
        # print(dp)            
        return dp[-1]

3. Some more tweaks

If we set up the initial arrays right, we can skip some of the statements

class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        nrow, ncol = len(word1), len(word2)
        if nrow == 0: return ncol
        if ncol == 0: return nrow
        
        dp = list(range(0, ncol + 1))
        dp2 = [0] * (ncol + 1)
        
        for r in range(1, nrow+1):
            for c in range(ncol+1):
                if c == 0:
                    dp2[c] = dp[c] + 1
                else:
                    score = 0 if word1[r-1] == word2[c-1] else 1
                    dp2[c] = min(dp[c] + 1, dp2[c-1] + 1, dp[c-1] + score)
            dp, dp2 = dp2, [0] * (ncol+1)

        return dp[-1]

Comments