Having chosen XNA as the tool with which to develop the gaze/voice game, the next step was to create a framework for the application. The main goals of the framework were to establish the following systems:
- Eye tracking system
- Voice recognition system
- Input handling system
- Game data storage system
- Menu system
- Asset management system
The Game class is the basis on which game projects are built in XNA, and the use of GameComponents and GameServices allows any such project to be organised efficiently (for more details see the following article). So, where possible, the above systems were implemented in this way.
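As a rough sketch of that wiring, the systems might be registered in the Game subclass as below. The component names are the framework's own; the GazeVoiceGame class, the constructors and the registration order are illustrative only.

```csharp
using Microsoft.Xna.Framework;

// Hypothetical Game subclass showing how the framework's components could
// be registered; details are illustrative, not the exact project code.
public class GazeVoiceGame : Game
{
    GraphicsDeviceManager graphics;

    public GazeVoiceGame()
    {
        graphics = new GraphicsDeviceManager(this);
        Content.RootDirectory = "Content";

        // Each system is a GameComponent, so XNA calls its Update
        // automatically every frame.
        var input = new InputHandler(this);
        Components.Add(input);
        Components.Add(new MenuManager(this));
        Components.Add(new AssetManager(this));

        // Registering a component as a GameService lets other components
        // fetch it without holding a direct reference to it.
        Services.AddService(typeof(InputHandler), input);
    }
}
```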
Eye tracking system: Creates a TetClient object and checks whether a compatible Tobii eye tracker is connected. If so, upon start-up of the application any existing calibration data is cleared and a new calibration is initiated. Once calibrated, eye tracking is started. As with the speech recognition engine, events are fired by the TetClient when new gaze data is available, and this data must then be relayed to the application. The gaze data includes the area on screen being gazed at by both the left and right eyes and the distance of each eye from the screen. This information can be averaged to give a likely area of interest on screen.
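A minimal sketch of the averaging step is shown below, using a hypothetical GazeSample container rather than the actual TetClient event arguments, which differ in the real SDK.

```csharp
using Microsoft.Xna.Framework;

// Hypothetical container for one gaze update; only the averaging logic
// reflects the approach described above.
public struct GazeSample
{
    public Vector2 LeftEyePosition;   // left-eye gaze point, in screen pixels
    public Vector2 RightEyePosition;  // right-eye gaze point, in screen pixels
    public float LeftEyeDistance;     // distance of the left eye from the screen
    public float RightEyeDistance;    // distance of the right eye from the screen

    // Likely area of interest: the midpoint of the two per-eye gaze points.
    public Vector2 PointOfInterest
    {
        get { return (LeftEyePosition + RightEyePosition) * 0.5f; }
    }

    // Average distance of the user from the screen.
    public float AverageDistance
    {
        get { return (LeftEyeDistance + RightEyeDistance) * 0.5f; }
    }
}
```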
Voice recognition system: Creates a speech recognition engine using the Microsoft Speech API and attaches a grammar with specific grammar rules to it. These rules can be used to recognise certain words. For the moment these are very simple words such as "Select", "Right", "Left", "Up", "Down" and so on. By keeping the voice commands as simple and as distinct from one another as possible, it is hoped that the voice recognition will not have to be calibrated by individual users. Some more complex voice recognition systems require "training" to improve the speech recognition process, which would be too time-consuming to be practical for the user trials. The "recognition" event is fired when a voice command is recognised by the engine; this will be used to inform the game that a player has issued a specific command. The "hypothesis" event is fired when the engine is nearly certain a voice command has been recognised, and may be used instead if it proves accurate enough and the "recognition" event proves too slow.
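A minimal sketch of this set-up, assuming the managed System.Speech wrapper around the Speech API; the command list matches the one above, but the surrounding class and handler bodies are illustrative.

```csharp
using System;
using System.Speech.Recognition;

class SpeechSetupSketch
{
    static SpeechRecognitionEngine CreateEngine()
    {
        var engine = new SpeechRecognitionEngine();

        // Grammar limited to a few short, distinct command words.
        var commands = new Choices("Select", "Right", "Left", "Up", "Down");
        engine.LoadGrammar(new Grammar(new GrammarBuilder(commands)));

        // Fired when the engine is confident a command was spoken.
        engine.SpeechRecognized += (sender, e) =>
            Console.WriteLine("Recognised: " + e.Result.Text);

        // Fired earlier, while the engine is still only fairly sure; could
        // be used instead if SpeechRecognized proves too slow.
        engine.SpeechHypothesized += (sender, e) =>
            Console.WriteLine("Hypothesis: " + e.Result.Text);

        engine.SetInputToDefaultAudioDevice();
        engine.RecognizeAsync(RecognizeMode.Multiple);
        return engine;
    }
}
```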
Input handling: Handles all types of input possible in the application, including mouse, keyboard, voice recognition and eye tracking. It extends the XNA GameComponent class, so its Update method is called each update cycle of the application. This is where the multi-modal input is interpreted and a list of InputActions is created. This list of actions can then be accessed by other components in the game during their update cycle, through a public method PollInput, and those components can decide how to handle the actions on an individual basis. So, for example, the MenuManager might open a new menu depending on the InputActions returned to it from the InputHandler via the PollInput method.
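A rough sketch of the component's shape: InputAction is simplified to an enum here, and only keyboard input is shown; the real class translates all four modalities.

```csharp
using System.Collections.Generic;
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Input;

// Simplified stand-in for the game's action list.
public enum InputAction { Select, Up, Down, Left, Right }

public class InputHandler : GameComponent
{
    private readonly List<InputAction> actions = new List<InputAction>();

    public InputHandler(Game game) : base(game) { }

    public override void Update(GameTime gameTime)
    {
        actions.Clear();

        // Translate each input modality into InputActions; only a
        // keyboard example is shown, gaze and voice would be added here.
        if (Keyboard.GetState().IsKeyDown(Keys.Enter))
            actions.Add(InputAction.Select);

        base.Update(gameTime);
    }

    // Other components call this during their own Update to see what
    // the player did this frame.
    public IList<InputAction> PollInput()
    {
        return actions;
    }
}
```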
Game data storage system: A very basic system has been created which stores information in the form of XML files on the machine on which the application is running. This will be updated at a later date to record game data of interest during the user trials, such as the speed at which game tasks are completed using gaze/voice input versus keyboard/mouse, and so on. This system has been implemented as a group of static classes, since such XML files will not need to be saved every frame.
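A minimal sketch of such a static storage class, built on XmlSerializer; the generic helper shown here is illustrative rather than the exact class used.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical static helper for reading and writing XML game data.
public static class GameDataStore
{
    public static void Save<T>(T data, string path)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var stream = File.Create(path))
        {
            serializer.Serialize(stream, data);
        }
    }

    public static T Load<T>(string path)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var stream = File.OpenRead(path))
        {
            return (T)serializer.Deserialize(stream);
        }
    }
}
```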
Menu system: A basic menu system was implemented. An abstract class, Menu, was defined which derives from the GameComponent class, and all menus used in the system inherit from it. So, for example, the MainMenu class defines the initial menu and is a subclass of Menu. All menu classes in this basic menu system are managed by a MenuManager class, which handles which menu to display at a given time. This is also why Menu derives from GameComponent rather than DrawableGameComponent: it lets the MenuManager control which menu is drawn.
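A structural sketch of these classes, with bodies omitted or simplified; the drawing signature is an assumption.

```csharp
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

// Base class for all menus; a plain GameComponent so the MenuManager,
// not XNA, decides which menu gets drawn.
public abstract class Menu : GameComponent
{
    protected Menu(Game game) : base(game) { }

    public abstract void Draw(SpriteBatch spriteBatch);
}

// Initial menu shown to the player.
public class MainMenu : Menu
{
    public MainMenu(Game game) : base(game) { }

    public override void Draw(SpriteBatch spriteBatch)
    {
        // Draw the initial menu's options here.
    }
}

// Tracks which menu is active and draws only that one.
public class MenuManager : GameComponent
{
    private Menu activeMenu;

    public MenuManager(Game game) : base(game) { }

    public void Show(Menu menu) { activeMenu = menu; }

    public void Draw(SpriteBatch spriteBatch)
    {
        if (activeMenu != null)
            activeMenu.Draw(spriteBatch);
    }
}
```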
Asset management system: Assets are a vital part of any game, so the AssetManager class, again a GameComponent, was defined. It loads and returns the fonts, textures and models required for the game.
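A rough sketch of how the AssetManager might load and hand back content through the standard ContentManager calls; the caching detail and method names are illustrative.

```csharp
using System.Collections.Generic;
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

// Hypothetical AssetManager: loads content on first request and caches it.
public class AssetManager : GameComponent
{
    private readonly Dictionary<string, Texture2D> textures = new Dictionary<string, Texture2D>();
    private readonly Dictionary<string, SpriteFont> fonts = new Dictionary<string, SpriteFont>();

    public AssetManager(Game game) : base(game) { }

    public Texture2D GetTexture(string name)
    {
        Texture2D texture;
        if (!textures.TryGetValue(name, out texture))
        {
            texture = Game.Content.Load<Texture2D>(name);
            textures[name] = texture;
        }
        return texture;
    }

    public SpriteFont GetFont(string name)
    {
        SpriteFont font;
        if (!fonts.TryGetValue(name, out font))
        {
            font = Game.Content.Load<SpriteFont>(name);
            fonts[name] = font;
        }
        return font;
    }
}
```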
All of the above has been implemented and will undoubtedly evolve over the development process.