In the previous article we discussed how to use a grid approach to calculate the ‘**Levenshtein distance**‘ and I mentioned that there is a shortcut to fill the grid. Lets see what that shortcut is.

We will get the same example as the previous article. transforming the word ‘five’ to ‘four’. If we go through the method discussed in that article we end up with

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | 1 | 2 | 3 |

o | 2 | 1 | 1 | 2 | 3 |

u | 3 | 2 | 2 | 2 | 2 |

r | 4 | 3 | 3 | 3 | 3 |

So what is the shortcut.

Step 1 – Create the empty grid

ε |
f | i | v | e | |

ε |
|||||

f | |||||

o | |||||

u | |||||

r |

Step 2 – Fill the first row and first column from 0, increment one at a time.

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | ||||

o | 2 | ||||

u | 3 | ||||

r | 4 |

Step 3 – This is where things are bit tricky. We have to fill one cell at a time by getting the minimum value of the three cell (west, northwest and north) and adding one to it; EXCEPT when the cell’s column and row letters are the same. In that case we fill the cell with the northwest cell’s value.

Lets see by filling the example grid.

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | x | |||

o | 2 | ||||

u | 3 | ||||

r | 4 |

We need the value for ‘x’. We cannot apply the rule because its the exception (both column and row letters are same; ‘f’). So we fill the value ‘x’ with the value of the northwest cell.

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | |||

o | 2 | ||||

u | 3 | ||||

r | 4 |

Now lets fill the next cell.

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | x | ||

o | 2 | ||||

u | 3 | ||||

r | 4 |

To fill the ‘x’ we see that the letters are not the same (‘i’ and ‘f’). So we can apply the rule without a worry. We consider the three values (0, 1 and 2) and then get the least value (which is 0) then add 1 to it. So the value of x is 0+1 = 1.

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | 1 | ||

o | 2 | ||||

u | 3 | ||||

r | 4 |

Using this method we can quickly fill the grid.

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | 1 | 2 | |

o | 2 | ||||

u | 3 | ||||

r | 4 |

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | 1 | 2 | 3 |

o | 2 | ||||

u | 3 | ||||

r | 4 |

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | 1 | 2 | 3 |

o | 2 | 1 | |||

u | 3 | ||||

r | 4 |

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | 1 | 2 | 3 |

o | 2 | 1 | 1 | ||

u | 3 | ||||

r | 4 |

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | 1 | 2 | 3 |

o | 2 | 1 | 1 | 2 | |

u | 3 | ||||

r | 4 |

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | 1 | 2 | 3 |

o | 2 | 1 | 1 | 2 | 3 |

u | 3 | ||||

r | 4 |

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | 1 | 2 | 3 |

o | 2 | 1 | 1 | 2 | 3 |

u | 3 | ||||

r | 4 |

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | 1 | 2 | 3 |

o | 2 | 1 | 1 | 2 | 3 |

u | 3 | 2 | |||

r | 4 |

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | 1 | 2 | 3 |

o | 2 | 1 | 1 | 2 | 3 |

u | 3 | 2 | 2 | ||

r | 4 |

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | 1 | 2 | 3 |

o | 2 | 1 | 1 | 2 | 3 |

u | 3 | 2 | 2 | 2 | |

r | 4 |

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | 1 | 2 | 3 |

o | 2 | 1 | 1 | 2 | 3 |

u | 3 | 2 | 2 | 2 | 3 |

r | 4 |

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | 1 | 2 | 3 |

o | 2 | 1 | 1 | 2 | 3 |

u | 3 | 2 | 2 | 2 | 3 |

r | 4 | 3 |

ε |
f | i | v | e | |

ε |
0 | 1 | 2 | 3 | 4 |

f | 1 | 0 | 1 | 2 | 3 |

o | 2 | 1 | 1 | 2 | 3 |

u | 3 | 2 | 2 | 2 | 3 |

r | 4 | 3 | 3 | 3 | 3 |

So we get the same grid as before. This time we didnt have to worry about operations such as add, delete or replace. We just had to look at the values of the adjacent cells and the column and row letters.

Now as we have gone through 3 articles on this topic, we will finish it up with a basic python code which calculates the ‘**Levenshtein distance**‘ in the next article.

Exploring NLP 06 – Minimum Edit Distance 4/4

Pingback: Exploring NLP 03 -Minimum Edit Distance 2/4 | The Puddle Jumper