Exploring NLP 05 – Minimum Edit Distance 3/4

In the previous article we discussed how to use a grid approach to calculate the ‘Levenshtein distance’, and I mentioned that there is a shortcut for filling the grid. Let's see what that shortcut is.

We will use the same example as the previous article: transforming the word ‘five’ to ‘four’. If we go through the method discussed in that article, we end up with

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2 3
o 2 1 1 2 3
u 3 2 2 2 2
r 4 3 3 3 3

So what is the shortcut?

Step 1 – Create the empty grid

ε f i v e
ε
f
o
u
r

Step 2 – Fill the first row and the first column starting from 0, incrementing by one.

ε f i v e
ε 0 1 2 3 4
f 1
o 2
u 3
r 4

Step 3 – This is where things are a bit tricky. We have to fill one cell at a time by taking the minimum value of the three neighbouring cells (west, northwest and north) and adding one to it; EXCEPT when the cell’s column and row letters are the same. In that case we fill the cell with the northwest cell’s value.
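The rule in Step 3 can be sketched as a small Python helper (the function and parameter names here are my own choice, just for illustration):

```python
# A sketch of the Step 3 shortcut rule for a single cell.
# `west`, `northwest` and `north` are the three neighbouring cell values;
# `row_letter` and `col_letter` are the letters labelling this cell's row and column.
def cell_value(west, northwest, north, row_letter, col_letter):
    if row_letter == col_letter:
        # The exception: matching letters copy the northwest value.
        return northwest
    # Otherwise take the minimum of the three neighbours and add one.
    return min(west, northwest, north) + 1
```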

Let's see by filling in the example grid.

ε f i v e
ε 0 1 2 3 4
f 1  x
o 2
u 3
r 4

We need the value for ‘x’. We cannot apply the rule because this is the exception (both the column and row letters are the same: ‘f’). So we fill ‘x’ with the value of the northwest cell.

ε f i v e
ε 0 1 2 3 4
f 1 0
o 2
u 3
r 4

Now let's fill the next cell.

ε f i v e
ε 0 1 2 3 4
f 1 x
o 2
u 3
r 4

To fill the ‘x’ we see that the letters are not the same (‘i’ and ‘f’), so we can apply the rule without worry. We consider the three values (0, 1 and 2), take the least value (which is 0) and add 1 to it. So the value of x is 0 + 1 = 1.

ε f i v e
ε 0 1 2 3 4
f 1 0 1
o 2
u 3
r 4

Using this method we can quickly fill the grid.

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2
o 2
u 3
r 4

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2 3
o 2
u 3
r 4

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2 3
o 2 1
u 3
r 4

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2 3
o 2 1 1
u 3
r 4

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2 3
o 2 1 1 2
u 3
r 4

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2 3
o 2 1 1 2 3
u 3
r 4

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2 3
o 2 1 1 2 3
u 3 2
r 4

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2 3
o 2 1 1 2 3
u 3 2 2
r 4

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2 3
o 2 1 1 2 3
u 3 2 2 2
r 4

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2 3
o 2 1 1 2 3
u 3 2 2 2 3
r 4

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2 3
o 2 1 1 2 3
u 3 2 2 2 3
r 4 3

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2 3
o 2 1 1 2 3
u 3 2 2 2 3
r 4 3 3 3 3

So we get the same grid as before. This time we didn’t have to worry about operations such as add, delete or replace; we just had to look at the values of the adjacent cells and the column and row letters.
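As a small preview of the code promised for the next article, the whole shortcut can be sketched in Python like this (the function name and structure are my own sketch, not necessarily the next article’s code):

```python
# A sketch of the shortcut grid fill, assuming unit costs for add, delete and substitute.
def levenshtein(source, target):
    rows, cols = len(target) + 1, len(source) + 1
    grid = [[0] * cols for _ in range(rows)]
    # Step 2: first row and first column count up from 0.
    for j in range(cols):
        grid[0][j] = j
    for i in range(rows):
        grid[i][0] = i
    # Step 3: fill each remaining cell from its west, northwest and north neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            if target[i - 1] == source[j - 1]:
                grid[i][j] = grid[i - 1][j - 1]           # letters match: copy northwest
            else:
                grid[i][j] = 1 + min(grid[i][j - 1],      # west
                                     grid[i - 1][j - 1],  # northwest
                                     grid[i - 1][j])      # north
    return grid[-1][-1]
```

For the running example, `levenshtein('five', 'four')` gives 3, the value of the last cell.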

Now that we have gone through three articles on this topic, we will finish up with a basic Python program that calculates the ‘Levenshtein distance’ in the next article.

Exploring NLP 06 – Minimum Edit Distance 4/4


Exploring NLP 04 – Minimum Edit Distance 2/4

In the previous article we looked into what the ‘Levenshtein distance’ is and an example use case. We saw that when the word we are transforming has a small number of letters it is easy to find the minimum edit distance, but as the number of letters increases it is difficult to be sure that we have found the ‘minimum’. So we will discuss a method that guarantees we end up with the true minimum edit distance.

We will take the following example.

five four

At a glance we can see that we need a minimum of 3 operations to transform five to four (3 replace operations). Let's see how this new method gives that answer. First we have to make a grid as follows. The columns will have the word before transformation and the rows will have the word after transformation. The ‘ε’ (epsilon) denotes the empty string.

ε f i v e
ε
f
o
u
r

We start from the top-left and fill the cells one by one, reading each cell’s column header (top) and row header (left). For example, to fill the first cell we have to think ‘how many operations does it take to convert an empty string (ε) to an empty string (ε)?’ The answer is zero. So we put 0 there.

ε f i v e
ε 0
f
o
u
r

Next, to fill the second cell we have to think ‘how many operations does it take to convert ‘f’ to an empty string (ε)?’ The answer is 1 (delete ‘f’). So we put 1 there.

ε f i v e
ε 0 1
f
o
u
r

Next, to fill the third cell we have to think ‘how many operations does it take to convert ‘fi’ to an empty string (ε)?’ The answer is 2 (delete ‘f’, delete ‘i’). So we put 2 there.

ε f i v e
ε 0 1 2
f
o
u
r

We can fill the rest of the first row in a similar manner to get the following.

ε f i v e
ε 0 1 2 3 4
f
o
u
r

Next, to fill the first cell in the second row we have to think ‘how many operations does it take to convert an empty string (ε) to ‘f’?’ The answer is 1 (add ‘f’). So we put 1 there.

ε f i v e
ε 0 1 2 3 4
f 1
o
u
r

Next, to fill the second cell in the second row we have to think ‘how many operations does it take to convert ‘f’ to ‘f’?’ The answer is 0. So we put 0 there.

ε f i v e
ε 0 1 2 3 4
f 1 0
o
u
r
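The reasoning so far (one delete per letter for the first row, one add per letter for the first column) can be sketched in a couple of lines of Python (variable names are my own choice):

```python
# Converting each prefix of the source to ε takes one delete per letter,
# and converting ε to each prefix of the target takes one add per letter.
source, target = "five", "four"
first_row = list(range(len(source) + 1))  # ε, f, fi, fiv, five -> 0..4
first_col = list(range(len(target) + 1))  # ε, f, fo, fou, four -> 0..4
```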

Similarly, we can keep on filling.

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2 3
o 2 1 1 x
u
r

One more example: how do we fill the above ‘x’? We have to think ‘how many operations does it take to convert ‘fiv’ to ‘fo’?’ The answer is 2 (replace ‘i’ with ‘o’, delete ‘v’). So we put 2 there.

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2 3
o 2 1 1 2
u
r

Finally we end up with something like this.

ε f i v e
ε 0 1 2 3 4
f 1 0 1 2 3
o 2 1 1 2 3
u 3 2 2 2 3
r 4 3 3 3 3

The ‘Minimum Edit Distance’ (Levenshtein distance) is the value of the last cell.

So we get the same answer, 3. For the example I took a small word, but when dealing with longer words this grid method is much more useful for making sure we have found the minimum number of operations.

Okay, so we now know the grid method, but it takes time, right? What if there were a shortcut to fill this table? There is. We will look into it in the next part, and we will finish up with the Python code for this calculation in the article after that.

Exploring NLP 05 – Minimum Edit Distance 3/4


Exploring NLP 03 – Minimum Edit Distance 1/4

Even though I was planning on having an nltk tutorial as the 3rd article, I came across the ‘Levenshtein distance’, also simply known as the ‘minimum edit distance’, and I thought I would share some ideas on that first. Before getting to writing code, let's see in what scenarios we need this.

Imagine a word processing tool we are using, say Microsoft Word. When we type a word with an incorrect spelling, it usually underlines it in red and then gives us suggestions, right? So how does it do that?

Such tools use a basic technique where they calculate the similarity of our word to the most frequent similar words with correct spellings. (Of course, a tool like Microsoft Word uses much more complex algorithms.)

For example, if I type ‘supehero’, the engine will ask me something like ‘did you mean superhero’. What it does is see that ‘supehero’ is not a valid English word, then try to find words with similar letters, say superhero, superman, supergirl, etc. Then it calculates the Levenshtein distance to each word and returns the word with the least distance.

Usually, depending on the algorithm and the use case, the tool will suggest one word or a few words. So let's see how it does that.

Basically, the ‘minimum edit distance’ is the minimum number of operations that we have to perform on a particular word in order to transform it into a different word. Okay, the first question is: ‘what are the allowed operations?’

There are 3 operations that we can choose from.

  • add a letter
  • delete a letter
  • substitute a letter

Let's see a few examples.

Operation            Word Before  Word After  Comment
Add a letter         ello         hello       Add ‘h’
Delete a letter      heallo       hello       Delete ‘a’
Substitute a letter  helbo        hello       Replace ‘b’ with ‘l’

In each of the previous examples we can see that we are able to apply a single operation and get the desired word. So in the above cases the minimum edit distance is 1.

Let's see another example

rabbit habit

The minimum edit distance is 2 in this case. How? First we do a substitute operation: ‘r’ -> ‘h’. Then we do a delete operation to remove the unnecessary extra ‘b’.

(One might say it can be done in three operations as well: add an ‘h’, remove the ‘r’ and remove a ‘b’. Yes, that is true; we can do it in various ways. But the Levenshtein distance is the MINIMUM number of operations, therefore we have to find the least number of operations that we need.)
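This idea of trying every possible sequence of operations and keeping the cheapest one can be sketched as a brute-force recursion (a slow illustration of the definition, not the efficient method we will see later; the function name is my own):

```python
# Brute-force minimum edit distance: try every allowed operation
# recursively and keep the cheapest. Makes the MINIMUM explicit.
def min_edit_distance(a, b):
    if not a:
        return len(b)          # only adds remain
    if not b:
        return len(a)          # only deletes remain
    if a[0] == b[0]:
        return min_edit_distance(a[1:], b[1:])  # first letters match: no cost
    return 1 + min(
        min_edit_distance(a[1:], b),       # delete a[0]
        min_edit_distance(a, b[1:]),       # add b[0]
        min_edit_distance(a[1:], b[1:]),   # substitute a[0] with b[0]
    )
```

For example, `min_edit_distance("rabbit", "habit")` returns 2, matching the reasoning above.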

Let's see another example

elephant elegant Distance = 2

(replace ‘p’ with ‘g’, delete ‘h’)

When we look at it, there are many ways of doing the transformation. For simple words with a few letters, finding the minimum number of operations is easy. But for longer words like the one above, how can we be sure that we have found the minimum distance, and that there is no other solution that can do it in a smaller number of steps? We will see that in the next article, since it will take a bit more explaining than this.

Exploring NLP 04 – Minimum Edit Distance 2/4

Understanding Data 01 – Why is it important ?

Machine learning is not all about mathematical models and algorithms. In fact, it can be described as a combination of the following three fields.

  1. Data
  2. Feature Engineering
  3. Model

Each of these plays a vital role in developing an efficient and effective machine learning system. Let's see why.

Imagine the model as a smart student to whom we give a book to study (in this case the book represents the data plus features). If the content of the book is wrong, or if the content is right but in a language the student can’t understand, then even though the student is smart, he will have a hard time learning.

Similarly, even if we have a mathematical model that is super smart, if our data and features (how we represent that data to the model) are wrong, then the algorithm will most probably fail.

The same goes the other way as well: if the book is correct but the student is stupid, then he will not learn properly. Likewise, even if we have correct data and a smart algorithm, if our representation of that data (how we feed it to the algorithm) is wrong, the system will still fail. Therefore, to have an efficient as well as an effective machine learning system, we have to pay attention to all three aspects above.

Now, one might be wondering: why do we need feature engineering (whatever that is)? Well, we will discuss feature engineering in detail in the coming articles, but for now let's see why.

At the outset we might think: okay, I have data, I have the model, why not feed the data in directly?

Well, in a handful of special situations we can do that. But in real-world situations things are a bit messy. Here are a few of the things that can happen:

  1. some parts of data can be missing
  2. some parts of data might be corrupted
  3. some of the data might be duplicated
  4. etc.

We also have to consider the amount of data. Say we are building a machine learning predictor for a YouTube-like channel, where it can predict what a particular user will be interested in. In such a case we will have petabytes of data or more (trillions of records about each video that was clicked by users from all around the world). Such a large volume of data will include details such as, say, a user’s middle name, which is in the data but has almost nothing to do with what he will click when he is on YouTube. Such information is useless from the perspective of the machine learning algorithm. If we feed it such data, it might try to find relationships between the videos watched and the middle name (which is more or less a pointless thing to do). Also, with such a huge amount of data, an algorithm might take weeks or even months to learn, even on hardware-accelerated computers.

Thus feature engineering is necessary in order to filter out the necessary parts of the data before feeding it to the algorithm. This makes the system more efficient and also much more accurate.
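As a toy illustration of the first three problems listed above (the records and field names here are made up for illustration, not from any real dataset):

```python
# Toy raw records: one duplicate and one record with a missing value.
records = [
    {"video_id": "v1", "watch_time": 120},
    {"video_id": "v1", "watch_time": 120},   # exact duplicate
    {"video_id": "v2", "watch_time": None},  # missing value
    {"video_id": "v3", "watch_time": 45},
]

# Drop records with any missing field.
complete = [r for r in records if all(v is not None for v in r.values())]

# Drop exact duplicates, keeping the first occurrence.
seen, deduped = set(), []
for r in complete:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(r)
```

After this kind of cleaning, only the two usable records remain to be fed to a model.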

So we will look, step by step, into the aspects of data and feature engineering in the coming articles.

Next on : Understanding Data 02 – Attributes and Values

Baby steps towards machine learning 02 – Supervised, Unsupervised and Reinforcement learning

When facing a problem that needs a machine learning solution, one of the first things we should do (after analyzing the data and understanding the required output of the final model) is decide which type of learning approach to take. This means deciding whether we are going with a supervised, unsupervised or reinforcement learning approach.

‘Okayyy, but I don’t know what these approaches mean‘, one might say. So let's take a simple look at what they are, and we will take our future cats-and-dogs-identifying robot with us for this. (If you are unfamiliar with who this robot is, it’s an example we took in our previous article ‘Baby steps towards machine learning 01 – An Introduction‘. The idea is that we have a robot whose task is to learn how to distinguish between cats and dogs when we show it pictures.)

Okay, so the most widely used approach is ‘supervised learning’. The name itself gives us a clue: the learning of the robot is under supervision. It’s like having a teacher, one who first teaches the robot what cats look like and what dogs look like. By observing what the teacher shows it, the robot will recognize patterns and learn. (Remember? In machine learning we search for patterns.)

What happens is that we give the robot, say, 50,000 images of cats and dogs, and each image is named as cat or dog. The robot takes these pictures one by one and studies how the picture looks and what name (‘label’) it has. As it looks through each image and its name, it begins to see patterns: ‘okay, the images with small white-colored animals are mostly named cats. So that’s a pattern I should remember‘.

Like that, it goes through the images over and over again and comes to a good understanding of the patterns that differentiate cats from dogs. We provided it with ‘labeled data’ (images with proper names), so we supervised its learning the same way a teacher would guide a student. This is why it’s called ‘supervised’ learning. Also note that we didn’t write any rules; the robot on its own analyzes the images and finds patterns. Therefore it is ‘supervised machine learning’.

Then there is ‘unsupervised learning’. Like before, the name gives us a clue about what is going on. If the previous one was supervised, which meant having guidance, then this one is learning without guidance.

What happens is that, like before, we give the robot a set of images to study, but this time there is no name tag (‘unlabeled’) with each image. So it will observe the images, see patterns and put similarly patterned images into the same groups. It’s like giving a Lego set to a child and asking him to put the same-colored Legos into the same group. Once he has done it, we give him a new Lego and ask him which group it belongs to. The child will look at the pattern (in this case the pattern is the color) and say which group it belongs to. The robot will do the same thing when asked to learn unsupervised. Since there are no ‘labels’, the robot doesn’t have a concept of what the words cat or dog mean; in fact, in the robot’s world there are no such words. So it will study the images and put similarly patterned images into groups. After learning, when we give it an image of a cat, it will say ‘okay, this picture you gave me has a lot of resemblance to that group of images, so I am going to predict that this image belongs with those‘. A good example of this is search-by-image engines, where we can give an image to a model and get all the similar or related images (Google search by image).

Why use unsupervised learning when it can’t say whether it’s a cat or a dog (the only thing it can do is say that an image probably belongs to this group or that)? There are lots of reasons. One is that in applications such as ‘Google search by image’ this is exactly what we need: we don’t need the image’s name, we need to find out which group the image is most closely related to and get more images from that group. Also, ‘labeled data’ is very expensive. Imagine 50,000 pictures of cats and dogs; who is going to name all these images ‘cat-1’, ‘cat-2’… ‘dog-25637’? Usually we either have to label them ourselves or pay others to do it, both of which are not feasible most of the time. But the world is filled with ‘unlabeled data’, therefore ‘unsupervised learning’ has huge potential in the coming years.

Okay, two down, one to go. The last type we will discuss is ‘reinforcement learning’. (I must admit this is where my interest lies.)

In both previous types we gave the model (in our case the robot) a learning period. But with ‘reinforcement learning’, from day one we are showing it images and asking it to ‘tell me its name‘. Based on its answer we give it positive or negative reinforcement (rewards or punishments). On the very first day I show it an image of a dog and it answers (since it doesn’t have even the slightest clue about what’s going on, it will just make a guess). Say it said ‘dog’. So I reinforce it with ‘good robot’.

Then I show it a picture of a cat. It remembers what happened last time: when it said ‘dog’ it got positive feedback. So it thinks ‘ah, saying dog is good‘ and says ‘dog’ this time too. Since it’s wrong, I reinforce it with negative feedback: I say ‘bad robot’. Now the robot wonders ‘what the hell, this guy said that picture was a dog, now he says this picture is a cat. I must find a pattern that differentiates the two. Then I will know how to answer correctly and also get good-robot feedback‘

So bit by bit, with trial and error, the robot learns the patterns in the pictures that separate cats from dogs. With time it becomes very good at answering correctly.

In my opinion, ‘reinforcement learning’ might be the most interesting approach when it comes to using machine learning for robotics, while the other two methods are highly effective in data analysis and related fields.

Here we are at the end of the second step. In the next few steps we will go through supervised learning algorithms and try to develop simple Python code to try them out. Next step: ‘Baby steps towards machine learning 03 – Nearest Neighbor Algorithm‘