
The Semantic and Observational models

This article is the fourth of a series in which I explain what my research is about in (I hope) a simple and straightforward manner. For more details, feel free to check the Research section.

Let's continue with the idea of guiding people around that I introduced in the previous article. It turns out that people usually make mistakes, either because the instruction we gave was confusing or because they weren't paying attention. How can we prevent those mistakes?

For my first research project at the University of Potsdam, we designed a system that took two things into account: how clear an instruction was, and what the player did after hearing it. Let's look at each of those points in turn.

For the first part, which we called the Semantic Model, a system tries to guess what the user will understand after hearing an instruction. If the instruction says "open the door", and there's only one door nearby, then you'll probably open that one. But what if I tell you "press the red button" and there are two red buttons? Which one will you press? In this case, the model tells us "this instruction is confusing, so I don't know what the user will do", and we can use that to come up with a better instruction.
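If you are curious about what that looks like in code, here is a toy sketch of the idea in Python. It is my own simplification for this post, not the system we actually built: it spreads the probability over every visible object that matches the words of the instruction, so when two red buttons match, each one gets half of it, and that is exactly how the model "notices" that an instruction is confusing.

```python
from dataclasses import dataclass

@dataclass
class WorldObject:
    name: str    # a label for printing, e.g. "left red button"
    kind: str    # e.g. "button" or "door"
    color: str   # e.g. "red"

def semantic_model(instruction, visible):
    """Toy P(object | instruction): uniform over the objects that match the words."""
    words = set(instruction.lower().split())
    colors_mentioned = {obj.color for obj in visible} & words
    matches = [obj for obj in visible
               if obj.kind in words
               and (not colors_mentioned or obj.color in colors_mentioned)]
    if not matches:  # nothing fits, so every visible object is equally (un)likely
        matches = list(visible)
    return {obj.name: 1.0 / len(matches) for obj in matches}

scene = [WorldObject("left red button", "button", "red"),
         WorldObject("right red button", "button", "red"),
         WorldObject("blue button", "button", "blue")]
print(semantic_model("press the red button", scene))
# {'left red button': 0.5, 'right red button': 0.5} -- two matches, so it's ambiguous.
```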

For the second part, which we called the Observational Model, a second system tries to guess what your intentions are based on what you are doing right now. For instance, if you are walking towards the door with your arm extended, then there's a good chance you are going to open that door. Similarly, if you were walking towards a button but then stopped, looked around and walked away, then I'm sure you wanted to press that button at first but changed your mind.
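Here is an equally simplified sketch of that intuition, again a toy in Python rather than our actual model: the more directly you are walking towards something, the more likely it is your target.

```python
import math

def observational_model(position, heading, targets):
    """Toy P(target | movement): targets you walk straight towards score higher.

    `position` is the player's (x, y), `heading` is a unit vector of where they
    are walking, and `targets` maps each target's name to its (x, y) position."""
    scores = {}
    for name, (tx, ty) in targets.items():
        dx, dy = tx - position[0], ty - position[1]
        distance = math.hypot(dx, dy) or 1e-9
        # Cosine between "where you are going" and "where the target is":
        # 1.0 means walking straight at it, 0 or less means you are ignoring it.
        cosine = (dx * heading[0] + dy * heading[1]) / distance
        scores[name] = max(cosine, 0.0)
    total = sum(scores.values()) or 1.0
    return {name: score / total for name, score in scores.items()}

targets = {"door": (5.0, 0.0), "red button": (0.0, 5.0)}
print(observational_model((0.0, 0.0), (1.0, 0.0), targets))
# {'door': 1.0, 'red button': 0.0} -- you are walking straight towards the door.
```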

When we put both models together, they are pretty good at guessing what you are trying to do: when the first one says "I'm sure you'll press one of the red buttons" and the second one says "I'm sure you'll press either this blue button or that red one", we combine them and get "we are sure you'll press that red button". Even though neither of them was absolutely sure about what you'd do, together they can deduce the right answer.
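The combination step can be as simple as multiplying both guesses together and renormalizing. This is a toy version of the idea (the real system had more moving parts), but it reproduces the example above:

```python
def combine(semantic, observational):
    """Multiply both models' probabilities for every object and renormalize."""
    everything = set(semantic) | set(observational)
    combined = {obj: semantic.get(obj, 0.0) * observational.get(obj, 0.0)
                for obj in everything}
    total = sum(combined.values()) or 1.0
    return {obj: p / total for obj, p in combined.items()}

# "I'm sure you'll press one of the red buttons"
semantic = {"left red button": 0.5, "right red button": 0.5}
# "I'm sure you'll press either this blue button or that red one"
observational = {"blue button": 0.5, "right red button": 0.5}
print(combine(semantic, observational))
# Only the right red button gets a non-zero score from both models,
# so it ends up with probability 1.0.
```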

Each model takes different clues into account to make its guess. The Semantic Model pays attention mostly to what the instruction says: did I mention a color? Is there any direction, such as "in front of"? Did I mention just one thing or several? And which buttons were visible when you heard the instruction? The Observational Model, on the other hand, takes into account what you are doing: how fast you are moving, in which direction, which buttons are getting closer, and which ones you are ignoring, among others.
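As a rough illustration of what those clues look like in practice, here is how the movement ones could be turned into numbers for a statistical model. The feature names below are made up for this example; they are not the ones we actually used.

```python
import math

def observational_features(previous_position, current_position, target_position):
    """A few toy movement clues for one candidate target."""
    step_x = current_position[0] - previous_position[0]
    step_y = current_position[1] - previous_position[1]
    distance_before = math.dist(previous_position, target_position)
    distance_now = math.dist(current_position, target_position)
    return {
        "speed": math.hypot(step_x, step_y),             # how fast you are moving
        "closing_in": distance_before - distance_now,    # positive if the target gets closer
        "ignoring_it": distance_now >= distance_before,  # walking away from (or past) it
    }

print(observational_features((0, 0), (1, 0), (5, 0)))
# {'speed': 1.0, 'closing_in': 1.0, 'ignoring_it': False}
```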

Something that both models like to consider is which buttons were more likely to catch your attention, either because you looked at them for a long time or because one of them is more interesting than the others. But there's a catch: computers don't have eyes! They don't know what you are really looking at, right? Finding a way to solve this problem is what my next article will be about.