Saturday, August 1, 2009

Game Concept

The goal of the project is to see how gaze and voice can be used to interact with video games. In my last post I went through the game design implications arising from running the user trial, the main one being the time constraint: the game needs to be simple enough for participants to get to grips with it and play it at least twice within 20 minutes.

Obviously, while the game needs to be simple enough to allow for a quick turnaround of participants, it also needs to be challenging enough that any data gathered is valid. The game should include as many common gaming tasks as possible, such as navigation and object selection. I also wanted the game to have a fun, cartoon-like, action-adventure feel.

So with all this in mind, the concept for the game is as follows: the player is trapped in a rabbit warren inhabited by evil bunny rabbits, from which they must escape. Players are presented with a labyrinth of tunnels which they must find their way out of, collecting coins along the way while avoiding the evil rabbits. Once the exit is found, whether players gain their freedom (and win the game) depends on the number of coins collected. I have given it the working title "Rabbit Run".

The game is in first-person perspective, which, I think, lends itself well to gaze input. The above screen capture is taken from a demo I am working on at the moment. It shows the tunnels the player must navigate and one of the coins to be collected along the way. Notice too the root in the tunnel, an obstacle which players must jump over. There will also be roots higher up the tunnel wall, forcing players to crouch to get past. Players will be given a map to guide them through the maze, showing the areas of the maze already travelled through. More on this in a later post, since there will be a subtle difference when using gaze interaction.

For the 3D models I have been getting help from a friend of mine, Rod Flynn, a graphic designer who runs Who Design. He is interested in learning more about 3D modelling and animation for games and has very kindly agreed to help out with the models used in the game. His expertise in 3ds Max has sped up the creation of the 3D models and raised them to a higher standard than I could hope to achieve myself.

I leave you with a quick video of navigation through the warren. This particular video is fairly basic, with no collision detection as yet. I will post new videos as I add more features.

Friday, July 31, 2009

User Trial: Implications for Game Design

The next part of the development is of course the game itself. One of the main objectives of the project is to gather useful information from which conclusions can be drawn, such as:
  • How do gaze and voice compare with keyboard and mouse as a means of interaction?
  • How enjoyable is the use of gaze and voice as a means of interaction?
Gathering this information necessitates running a user trial. The first question is (relatively) easy to quantify by gathering and saving data while participants play the game. However, it also means that comparisons need to be made, which in turn means participants will need to play the game using both modes of interaction: once with conventional input devices and then again with gaze and voice.

Another consideration is this: if gaze and voice proves not to be a viable form of interaction, which mode was at fault, the gaze or the voice? Ideally, data needs to be gathered to resolve this. This can be done by varying the combinations of interaction, so that gaze and keyboard, for example, can be compared against mouse and voice.

This would mean the user trial could potentially ask participants to play the game four times, once with each combination of interaction. This has two main drawbacks. Firstly, players are likely to improve at the game the more they play it, so by the time participants play using the fourth type of interaction it is difficult to tell whether the means of interaction benefited the player or whether their skill level simply improved over time. The second drawback is time: it is unreasonable to ask participants to spend longer than 20 minutes completing such a user trial. Anything longer and they may begin to lose interest, skewing the results. Another aspect regarding the duration of the user trial is that users will need to try out the game first to get a feel for it, which again takes time away from the experiment itself.

So with this in mind the game needs to be relatively simple, so users can get to grips with it quickly and finish it within a reasonable time frame. Asking participants to play it more than twice would be unreasonable. Given the need for two different comparisons involving four types of interaction, there should be two groups of experiments as follows:
  • Keyboard & Mouse versus Gaze & Voice
  • Keyboard & Gaze versus Mouse & Voice
Obviously the former comparison is the more important and will require more participants. The second acts more as a control group, to see if there is a significant shortcoming in either gaze or voice as a means of interaction.

I realise it's been over two weeks since my last post, so my apologies for the delay. In my next post I'll talk more about the game concept.

Monday, July 13, 2009

Eye Tracking Menu System

To make the game usable without having to use one's hands, it was decided to update the menu system to work with gaze and voice. Once the TetClient establishes that a compatible Tobii eye tracker is connected and has been calibrated, the user can use their gaze in conjunction with voice to select menu options, see video below:



It's fairly simple at the moment: the point of gaze is denoted by the cross-hairs, and menu items can be selected using a simple "Select" voice command. Many thanks to Eoghan Cunneen for providing the images for the menu.
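
For anyone curious how the selection works, the idea is simply to hit-test the current gaze point against each menu item's on-screen bounds whenever the "Select" command is heard. The C# sketch below illustrates that idea only; the class and variable names are placeholders rather than my actual code.

    using Microsoft.Xna.Framework;

    public class MenuItem
    {
        public string Label;        // text shown on screen
        public Rectangle Bounds;    // screen-space area occupied by the item

        // XNA's Rectangle.Contains does the hit test against the gaze point.
        public bool IsGazedAt(Point gazePoint)
        {
            return Bounds.Contains(gazePoint);
        }
    }

    // In the menu update loop (gazePoint comes from the eye tracker, and
    // selectSpoken is set when the speech engine recognises "Select"):
    // foreach (MenuItem item in menuItems)
    //     if (selectSpoken && item.IsGazedAt(gazePoint))
    //         Activate(item);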

Sunday, July 12, 2009

Work Done So Far: Framework

Having chosen XNA as the tool in which to develop the gaze/voice game, the next step was to create a framework for the application. The main goals of the framework were to establish the following systems:
  • Eye tracking system
  • Voice recognition system
  • Input handling system
  • Game data storage system
  • Menu system
  • Asset management system
The Game class is the basis on which game projects are built in XNA, and the use of GameComponents and GameServices allows for the efficient organisation of any such project (for more details see the following article). So, where possible, the above features were implemented in this way.
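
To give an idea of the pattern, here is a rough C# sketch of a feature implemented as a GameComponent and exposed as a service; the class name is just an example and not taken from the actual project code.

    using Microsoft.Xna.Framework;

    public class GazeComponent : GameComponent
    {
        public GazeComponent(Game game) : base(game) { }

        public override void Update(GameTime gameTime)
        {
            // Process any new gaze data once per update cycle.
            base.Update(gameTime);
        }
    }

    // In the Game subclass's Initialize method:
    // var gaze = new GazeComponent(this);
    // Components.Add(gaze);                             // Update is now called automatically
    // Services.AddService(typeof(GazeComponent), gaze); // other components can fetch it via Game.Services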

Eye tracking system: Creates a TetClient object and verifies that a compatible Tobii eye tracker is connected. If so, upon start-up of the application, any calibration data present is cleared and a new calibration is initiated. Once calibrated, eye tracking is started. In a similar way to the speech recognition engine, events are fired by the TetClient when new gaze data is available, and this data must then be relayed to the application. The gaze data includes the on-screen point being gazed at by both the left and right eyes and the distance of each eye from the screen. This information can be averaged to give a likely area of interest on screen.
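
The averaging step itself is straightforward; a minimal sketch is shown below. The structure and field names are stand-ins, since the real data arrives through the TetClient's own COM event types.

    using Microsoft.Xna.Framework;

    public struct GazeSample
    {
        public Vector2 LeftEye;   // on-screen gaze point of the left eye (pixels)
        public Vector2 RightEye;  // on-screen gaze point of the right eye (pixels)

        // The likely area of interest is taken as the midpoint of the two eyes' gaze points.
        public Vector2 AreaOfInterest
        {
            get { return (LeftEye + RightEye) * 0.5f; }
        }
    }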

Voice recognition system: Creates a speech recognition engine using the Microsoft Speech API and attaches a grammar with specific grammar rules to it. These rules can be used to recognise certain words; for the moment these are very simple words such as "Select", "Right", "Left", "Up", "Down" and so on. By keeping the voice commands as simple and as distinct from one another as possible, it is hoped that the voice recognition will not have to be calibrated by individual users. More complex voice recognition systems often require "training" to improve the recognition process, which would be too time consuming to be practical for running user trials. The "recognition" event is fired when a voice command is recognised by the engine, and this will be used to inform the game that a player has issued a specific command. The "hypothesis" event is fired earlier, when the engine is nearly certain a voice command has been recognised; this event may be used if it proves accurate enough and the "recognition" event proves too slow.
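
For illustration, here is roughly what the grammar and event wiring look like using the managed System.Speech wrapper; the project itself goes through the Speech SDK's COM API, so treat this as a sketch of the idea rather than the actual code.

    using System;
    using System.Speech.Recognition;

    public class VoiceCommands
    {
        private readonly SpeechRecognitionEngine engine = new SpeechRecognitionEngine();

        public void Start()
        {
            // A small set of short, distinct commands keeps recognition reliable
            // without per-user training.
            Choices commands = new Choices("Select", "Right", "Left", "Up", "Down");
            engine.LoadGrammar(new Grammar(new GrammarBuilder(commands)));

            // Fired when the engine is confident a command was spoken.
            engine.SpeechRecognized += (s, e) =>
                Console.WriteLine("Recognised: " + e.Result.Text);

            // Fired earlier, while the engine is still forming a hypothesis.
            engine.SpeechHypothesized += (s, e) =>
                Console.WriteLine("Hypothesis: " + e.Result.Text);

            engine.SetInputToDefaultAudioDevice();
            engine.RecognizeAsync(RecognizeMode.Multiple);
        }
    }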

Input handling: Handles all types of input possible in the application, including mouse, keyboard, voice recognition and eye tracking. It extends the XNA GameComponent class, so its Update method is called each update cycle of the application. This is where the multi-modal input is interpreted and a list of InputActions is created. This list of actions can then be accessed by other components in the game during their update cycle, through a public method PollInput, and those components can decide how to handle the actions on an individual basis. So, for example, the MenuManager might open a new menu depending on the InputActions returned to it from the InputHandler via the PollInput method.
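
A skeleton of this arrangement is shown below. The InputAction values and the body of Update are placeholders; only the overall shape (a GameComponent that builds a list of actions each frame and exposes it via PollInput) follows the description above.

    using System.Collections.Generic;
    using Microsoft.Xna.Framework;

    public enum InputAction { MenuSelect, MoveLeft, MoveRight, Jump, Crouch }

    public class InputHandler : GameComponent
    {
        private readonly List<InputAction> actions = new List<InputAction>();

        public InputHandler(Game game) : base(game) { }

        public override void Update(GameTime gameTime)
        {
            actions.Clear();
            // Interpret mouse, keyboard, gaze and voice input here and translate
            // it into InputActions, e.g.:
            // if (selectCommandHeard) actions.Add(InputAction.MenuSelect);
            base.Update(gameTime);
        }

        // Other components call this during their own Update to react to input.
        public IList<InputAction> PollInput()
        {
            return actions;
        }
    }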

Game data storage system: A very basic system has been created which stores information in the form of XML files on the machine on which the application is running. This will be updated at a later date to record game data of interest during the user trials, such as the speed with which game tasks are completed using gaze/voice input versus keyboard/mouse. This system has been implemented as a group of static classes, since such XML files will not need to be saved every frame.
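
As a sketch of what that might look like, the static class below serialises a record of a play-through to XML; the TrialRecord fields are invented purely for illustration.

    using System.IO;
    using System.Xml.Serialization;

    public class TrialRecord
    {
        public string InteractionMode;    // e.g. "GazeAndVoice" or "KeyboardAndMouse"
        public float CompletionTimeSecs;  // time taken to escape the maze
        public int CoinsCollected;
    }

    public static class GameDataStore
    {
        public static void Save(TrialRecord record, string path)
        {
            XmlSerializer serializer = new XmlSerializer(typeof(TrialRecord));
            using (StreamWriter writer = new StreamWriter(path))
            {
                serializer.Serialize(writer, record);
            }
        }
    }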

Menu system: A basic menu system was implemented. An abstract class, Menu, was defined which derives from the GameComponent class, and all menus used in the system inherit from it. So, for example, the MainMenu class defines the initial menu and is a subclass of Menu. All menu classes in this basic menu system are managed by a MenuManager class, which handles which menu to display at a given time. This is also why Menu derives from GameComponent as opposed to DrawableGameComponent: so that the MenuManager can control which menu is drawn.
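
The skeleton below shows the shape of this arrangement. I've made the MenuManager a DrawableGameComponent here so it has something to draw with; that detail, like the method names, is illustrative rather than lifted from the project.

    using Microsoft.Xna.Framework;
    using Microsoft.Xna.Framework.Graphics;

    public abstract class Menu : GameComponent
    {
        protected Menu(Game game) : base(game) { }
        public abstract void Draw(SpriteBatch spriteBatch);
    }

    public class MainMenu : Menu
    {
        public MainMenu(Game game) : base(game) { }
        public override void Draw(SpriteBatch spriteBatch)
        {
            // Draw the initial menu's items here.
        }
    }

    public class MenuManager : DrawableGameComponent
    {
        private Menu current;
        private SpriteBatch spriteBatch;

        public MenuManager(Game game) : base(game) { }

        protected override void LoadContent()
        {
            spriteBatch = new SpriteBatch(GraphicsDevice);
        }

        public void Show(Menu menu) { current = menu; }

        public override void Draw(GameTime gameTime)
        {
            if (current == null) return;
            spriteBatch.Begin();
            current.Draw(spriteBatch);  // only the active menu gets drawn
            spriteBatch.End();
        }
    }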

Asset management system: Assets are a vital part of any game, so the AssetManager class, again a GameComponent, was defined. It loads and returns the fonts, textures and models required for the game.
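
For illustration, an AssetManager along these lines could wrap XNA's built-in ContentManager; the asset names below are placeholders for whatever ends up in the content project.

    using Microsoft.Xna.Framework;
    using Microsoft.Xna.Framework.Graphics;

    public class AssetManager : GameComponent
    {
        public SpriteFont MenuFont { get; private set; }
        public Texture2D CrosshairTexture { get; private set; }
        public Model CoinModel { get; private set; }

        public AssetManager(Game game) : base(game) { }

        public void LoadAssets()
        {
            // Game.Content is the ContentManager provided by the Game class.
            MenuFont = Game.Content.Load<SpriteFont>("Fonts/MenuFont");
            CrosshairTexture = Game.Content.Load<Texture2D>("Textures/Crosshair");
            CoinModel = Game.Content.Load<Model>("Models/Coin");
        }
    }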

All of the above has been implemented and will undoubtedly evolve over the development process.

Thursday, July 9, 2009

Work Done So Far: Investigation

The initial few weeks of the project were spent researching related work, to see how others had used gaze. If you are looking for recent academic papers relating to gaze interaction in computing, the COGAIN Association website is an excellent place to start. They even provide a section dedicated to gaze controlled games.

Various tools were also looked into, in order to see which would be the most appropriate for the project. Two game engines were looked at: Torque 3D and the Unity 3D engine. Torque receives much praise, is reasonably priced and is cited in at least one paper investigating gaze in first-person shooters. However, rather than a trial version of the engine, Torque provides only a demo, so I was unwilling to commit to purchasing it. Unity does provide a trial version of its software and is very impressive; you could undoubtedly have a game up and running in a very short time using it. However, it is a highly abstracted tool, so it is not clear how easy it might be to integrate the engine with the Tobii SDK or indeed voice recognition. So yet again I was unwilling to commit to this engine.

OGRE (Object-Oriented Graphics Rendering Engine) was also closely examined. It is a scene-oriented, flexible 3D engine, as opposed to a game engine. OGRE is written in C++ and is designed to make it easier and more intuitive to produce applications utilising hardware-accelerated 3D graphics. It is an excellent open-source graphics engine with a very active community and was shown to work with both voice recognition and eye tracking by Wilcox et al. in 2008. The wealth of resources, the active community and the proven record of working with voice and gaze data seemed to make it the ideal tool in which to develop the gaze/voice game.

Then, at the start of June, Jon Ward of Acuity ETS made the Tobii T60 eye tracker and Tobii SDK available to us. The documentation which comes with the Tobii SDK is very good and makes it easy to get started. The Tobii Eye Tracker Components API, or TetComp, is an interface containing everything you need to programmatically get real-time, high-precision gaze data from Tobii eye tracker hardware. It is a type library implemented as a set of COM objects, making it accessible to many modern high-level programming languages on Microsoft 32-bit platforms.

With this in mind I decided to try a quick experiment and see if I could get the Tobii T60 to work within XNA. XNA is a set of tools with a managed runtime environment, provided by Microsoft, that facilitates computer game development; the XNA Framework is based on the .NET Framework 2.0 on Windows (and on the native implementation of the .NET Compact Framework 2.0 for Xbox 360 development). I had done a lot of work using XNA during the year and found it very user-friendly, allowing for very rapid development of games. I felt that my familiarity with it would eliminate much of the extra time I would need to learn OGRE, so if I could integrate it with the Tobii COM API, it would be the ideal candidate in which to develop my game.

The TetComp interface was indeed easily referenced from within my XNA application, and I was quickly able to get the application to calibrate the eye tracker and gather gaze data from within XNA. All that remained was to integrate voice recognition. Having already installed the Microsoft Speech SDK, I found that I was able to easily reference its COM API. So now I had access to both gaze and voice data from within a development framework I was familiar with, and I could start on the game proper.



Above is a video of my initial experimentation within XNA, showing the tracking of gaze data and the use of voice recognition. It's not too fancy, but it is proof enough that the eye tracker works well within the XNA framework.

Tuesday, July 7, 2009

Project Overview

I am a postgraduate student at Trinity College Dublin, currently working towards an MSc in Interactive Entertainment Technology.

For my dissertation, supervised by Dr. Veronica Sundstedt, I intend to develop a computer game exploring novel ways of using eye tracking and voice recognition as an alternative means of video game interaction. Eye tracking is a process that records eye movements, allowing us to determine where an observer’s gaze is fixed at a given time.

Alternative means of interaction with computers are especially important for disabled users, for whom traditional techniques using mouse and keyboard may not be feasible. By combining voice recognition and eye tracking I hope to provide an alternative, hands-free way of interacting with video games.

Jon Ward of Acuity ETS has very kindly lent us a Tobii T60 eye tracker for the duration of the project. The application/game developed is intended to run on the eye tracker and will be implemented using the Tobii SDK and Microsoft Speech SDK (to implement voice recognition).

It is hoped that the game will be controllable using eye tracking and voice recognition as well as keyboard and mouse, so that the two modes of interaction can be compared to one another in a small user study. The user study will attempt to quantifiably measure the accuracy and speed of using eye tracking and voice recognition as opposed to keyboard and mouse. It will also gather more subjective information from participants, such as how enjoyable the use of eye tracking was as an input device.

I have been working on this project for the last month and a half or so, and in my next post I will provide a summary of the work completed thus far. From then on I intend to post my progress on a more regular basis.