G-Vision 2006
G-Vision is a vision-based gesture and motion recognition software toolset developed for real-time analysis within interactive installation and performance scenarios. (Historical note: this software was developed between 2005 and 2007, prior to Microsoft's announcement of Kinect, then Project Natal, in June 2009.)
One of the greatest challenges facing artists and performers involved in the production of ‘real-time’, interactive installations, performances and digital artifacts is the creation of suitable and meaningful interfaces between the ‘user’ and the artwork. In the majority of cases the ‘point and click’ interaction experience is inappropriate or disappointing, or devalues the conceptual and visual integrity of the artwork. The G-Vision project developed flexible software that harnessed developments in computer vision for use by artists in a wide range of installation and performance scenarios.
The project started in November 2005; the first beta prototype phase was completed in December 2006, with user trials undertaken from March 2007 prior to the intended release.
General Description
Effective, low-cost, scalable, real-time interfaces have long been a “holy grail” within the interactive arts sector. Despite significant technological developments, the applications available to date have fallen short in practice. Whilst some specially developed computer vision systems and software are available to artists, they tend either to be based on low-level visual analysis (e.g. frame differencing, colour thresholding), and are consequently crude and lacking in robustness, or to employ tracking and recognition software tailored to a specific project, and are consequently difficult to configure and limited in application.
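By way of illustration, the kind of low-level analysis referred to above can be written in a few lines. The sketch below (in Python with OpenCV, which is not the platform G-Vision is built on) shows basic frame differencing and, by the same token, why it is fragile: lighting changes, shadows and sensor noise all register as ‘motion’.

    import cv2

    # Crude low-level motion detection by frame differencing: any pixel whose
    # brightness changes by more than a fixed threshold between consecutive
    # frames counts as "motion". Simple, but easily fooled by lighting
    # changes, shadows and camera noise.
    cap = cv2.VideoCapture(0)                      # webcam / iSight-style input
    ok, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, prev)             # frame differencing
        _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        cv2.imshow("motion", mask)
        prev = gray
        if cv2.waitKey(1) == 27:                   # Esc to quit
            break

    cap.release()
    cv2.destroyAllWindows()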
G-Vision aims to provide flexible software that harnesses recent developments in computer vision for use by artists in a wide range of installation and performance scenarios.
Installation Description
G-Vision builds on an already established, global software platform, Max/MSP/Jitter, which is widely used by artists and performers. G-Vision is a software plug-in providing functionality such as object tracking, gesture recognition and characterisation of patterns of motion. It aims to provide a creative, unified, meaningful and ‘natural’ user interface for interactive applications, and potentially has wider appeal within other application areas and the creative industries.
The project addresses problems directly concerned with ‘artifact usability’ and the development of a more subtle, adaptable and intelligent interface for augmented interaction and aesthetic experiences, addressing clear gaps in an existing market. G-Vision is a comprehensive set of tools (a plug-in) for use with the Max/MSP/Jitter software that enables users to create robust motion tracking and gesture recognition applications using live or pre-recorded video input material. It has primarily been designed with the needs of artists, performers and musicians working with interactive installation and performance in mind, although its application areas are potentially wide-ranging.
G-Vision employs computationally efficient algorithms for visual representation, motion estimation and recognition, focusing on methods that are likely to prove robust and configurable for a range of interactive art application scenarios. These include real-time methods for adaptive background modelling, colour-based object tracking, face localisation, generation of summary motion statistics and gesture recognition.
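The source does not specify which background-modelling algorithm G-Vision implements; as an illustrative sketch only, one common real-time formulation is a running-average background that is continually adapted and subtracted from each incoming frame (Python/OpenCV, with an assumed adaptation rate and threshold):

    import cv2
    import numpy as np

    # One standard form of adaptive background modelling: keep a slowly
    # updated running-average background and subtract it from each new
    # frame. Illustrative only; not necessarily the algorithm G-Vision uses.
    ALPHA = 0.02    # adaptation rate: higher values absorb scene change faster

    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    background = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        cv2.accumulateWeighted(gray, background, ALPHA)     # adapt background
        foreground = cv2.absdiff(gray, background)
        _, mask = cv2.threshold(foreground.astype(np.uint8), 30, 255,
                                cv2.THRESH_BINARY)
        cv2.imshow("foreground", mask)
        if cv2.waitKey(1) == 27:                            # Esc to quit
            break

    cap.release()
    cv2.destroyAllWindows()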
The G-Vision tools can perform tasks such as tracking the positions of one or more objects moving across a space, recognising gestures and actions, and providing information about movement in a video image, such as how circular a motion is or whether there are abrupt changes in motion. Alongside the core tracking and analysis tools is a set of supporting tools that adaptively extract background areas in an image and filter out noise and unwanted data. G-Vision currently consists of a set of nineteen 'objects' that deal with filtering, tracking, analysis and recognition.
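As an indication of what such summary motion statistics might look like, the sketch below computes a simple ‘circularity’ score and flags abrupt changes in motion for a tracked trajectory. The measures, names and thresholds are illustrative assumptions, not the statistics G-Vision itself computes:

    import numpy as np

    def circularity(points):
        """Rough 'how circular is this trajectory' score.
        Fits a circle centred on the mean of the points and measures how
        consistent the distances to that centre are: close to 1.0 for a
        near-circular path, lower for less circular paths."""
        pts = np.asarray(points, dtype=float)
        centre = pts.mean(axis=0)
        radii = np.linalg.norm(pts - centre, axis=1)
        if radii.mean() == 0:
            return 0.0
        return float(max(0.0, 1.0 - radii.std() / radii.mean()))

    def abrupt_changes(points, threshold=20.0):
        """Indices where the frame-to-frame velocity changes sharply
        (a simple acceleration-magnitude test)."""
        pts = np.asarray(points, dtype=float)
        velocity = np.diff(pts, axis=0)
        acceleration = np.diff(velocity, axis=0)
        return np.where(np.linalg.norm(acceleration, axis=1) > threshold)[0] + 1

    # Example: a noisy circular path scores high, a straight sweep much lower.
    theta = np.linspace(0, 2 * np.pi, 100)
    circle = np.c_[np.cos(theta), np.sin(theta)] * 50 + np.random.randn(100, 2)
    line = np.c_[np.arange(100), np.zeros(100)]
    print(circularity(circle), circularity(line))

    # A trajectory with a sudden jump mid-way is flagged as an abrupt change.
    dash = [(i, 0) for i in range(10)] + [(100 + i, 0) for i in range(10)]
    print(abrupt_changes(dash))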
Technical Description
Development and implementation is based on the established Max/MSP/Jitter software environment, developed and distributed by Cycling ’74, based in the USA. Max/MSP is a graphical programming environment for music, audio and multimedia and has been used worldwide for many years by performers, composers and installation artists. It provides a visual toolkit of powerful programming objects.
Jitter is a set of video, matrix and 3D graphics objects for the Max graphical programming environment, with flexible means to generate and manipulate matrix data. Jitter is useful for real-time video processing, custom effects, audio/visual interaction, data visualisation and analysis. Since Jitter and G-Vision are built upon the Max/MSP programming environment, limitations inherent in fixed-purpose applications are mitigated and G-Vision becomes an adaptable, scalable set of objects integral to the application software.
The system is cross-platform (PC/Mac) and simply requires a video input capability such as a DV cam, webcam or iSight camera. From the outset, three types of interaction scenario were identified that between them cover the majority of situations artists and performers may require. These are (i) close range: hands-based interaction, (ii) medium range: half/full-body interaction, and (iii) long range: group/crowd-based interaction.
These scenarios, or combinations thereof, encompass, for example, interpretation of tabletop hand gestures and movements, room-based installation and performance spaces, and crowd and group dynamics. By combining such information with the capabilities of the Max/MSP/Jitter software environment, a greater range of possibilities suggested by the user’s imagination becomes feasible. The system incorporates learning capabilities whereby the user can train the system to recognise specific objects, gestures and movements.
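The text does not describe how this training works internally. One simple, standard way to realise trainable gesture recognition is template matching with dynamic time warping over recorded trajectories, sketched below purely as an illustration; the class and function names are hypothetical and not part of G-Vision:

    import numpy as np

    def dtw_distance(a, b):
        """Dynamic time warping distance between two 2-D trajectories."""
        a, b = np.asarray(a, float), np.asarray(b, float)
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(a[i - 1] - b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j],
                                     cost[i, j - 1],
                                     cost[i - 1, j - 1])
        return cost[n, m]

    class GestureRecogniser:
        """Train by storing labelled example trajectories; recognise a new
        trajectory as the label of its nearest stored example."""
        def __init__(self):
            self.templates = []          # list of (label, trajectory) pairs

        def train(self, label, trajectory):
            self.templates.append((label, trajectory))

        def recognise(self, trajectory):
            return min(self.templates,
                       key=lambda t: dtw_distance(t[1], trajectory))[0]

    # Usage: teach two gestures, then classify a noisy repeat of one of them.
    rec = GestureRecogniser()
    rec.train("swipe", [(x, 0) for x in range(20)])
    rec.train("raise", [(0, y) for y in range(20)])
    print(rec.recognise([(x, 0.3) for x in range(20)]))    # prints "swipe"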