This article is the third of a series in which I explain what my research is about in (I hope) a simple and straightforward manner. For more details, feel free to check the Research section.
For my first research paper during my PhD, the basic idea was pretty simple. Imagine that, after recording several hours of people being guided around a room, I realize the following: everytime a player stood in front of a door, and someone told them "go straight", they walked through the door. So now I ask: if you are standing in front of a door, and I want you to walk through it, would it be enough for me to say "go straight", like before? My research team and I wanted to give this question an answer, so this is what we did.
We looked at our recorded data. Whenever we saw a player moving somewhere, we took notes about where the player was, where is the player now, and what was the instruction that convinced the player to move from one place to the other. We then created a big dictionary, where each entry reads "to move the player from point A to point B, say this". Quite smart, right?
The most important part about this idea is that we don't need to teach our computer how to understand language - in fact, when our system reads "turn right" in our dictionary, it has no idea about what "turn" or "right" mean. All our system cares about is that saying "turn right", for some strange reason, causes people to look to the right. This makes our system a lot simpler than other systems that try to understand everything.
Now, let's complicate things a bit: let's say I tell you "walk through the door to your left". You turn left, walk through the door, take 7 steps, give a full turn to look at the room, and then you wait for me to say something else. Which of those things you did because I told you, and which ones you did because you felt like it?
Since we didn't really know the answer, we tried two ideas: in the first case, we decided that everything you did was a reaction to our instruction (including the final turn), while in the second one we only considered the first action (turning left), and nothing else. As you can see, neither approach is truly correct: one is too short, and the other one is too long. But in research we like trying simple ideas first, and we decided to give these two a try.
Our results showed that the second approach works better, because if you advance just one step I can guide you to the next, but if you do too many things at once there's a chance you'll get confused and lost. Also, since our system is repeating what other humans said before, players thought the instructions were not too artificial.
Not bad for my first project, right?