The main software applications were developed by me. Apple's "home user" DAW GarageBand was used to create samples, and the open-source software Audacity was used for analysis as well as further processing and editing.
4.1. iPhone App
4.1.1. Motion Processor
The "Motion Processor" is responsible for receiving and processing the tracking data. For each joint it stores a running history of 60 frames, i.e. a little more than one second. From this data, the following values are calculated:
- Movement directions of the joints
- Current speed
- Boundary box of the current virtual skeleton
- Joint path lengths
- Boundary boxes of the individual joint paths
- Complexity of the joint paths
- Rotation of the hips and shoulders, and their mean value
- Y-rotation speed
- Eccentric
- Contraction
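As an illustration of how such values can be derived from the per-joint history, the following sketch computes the current speed, the path length and the bounding box of a joint path from a buffer of recent positions. It is plain Swift; the type and property names are my own and do not correspond to the actual classes of the app.

```swift
import Foundation
import simd

// Illustrative per-joint history buffer; names are not the app's actual API.
struct JointHistory {
    private(set) var positions: [SIMD3<Float>] = []   // last positions in metres, oldest first
    private(set) var timestamps: [TimeInterval] = []  // matching timestamps in seconds
    let capacity = 60                                 // roughly one second of frames

    mutating func append(_ position: SIMD3<Float>, at time: TimeInterval) {
        positions.append(position)
        timestamps.append(time)
        if positions.count > capacity {
            positions.removeFirst()
            timestamps.removeFirst()
        }
    }

    /// Current speed in m/s, estimated from the two most recent frames.
    var currentSpeed: Float {
        guard positions.count >= 2 else { return 0 }
        let dt = Float(timestamps[timestamps.count - 1] - timestamps[timestamps.count - 2])
        guard dt > 0 else { return 0 }
        return simd_distance(positions[positions.count - 1], positions[positions.count - 2]) / dt
    }

    /// Total length of the path travelled within the history window.
    var pathLength: Float {
        zip(positions, positions.dropFirst())
            .reduce(0) { $0 + simd_distance($1.0, $1.1) }
    }

    /// Axis-aligned bounding box of the joint path.
    var boundingBox: (min: SIMD3<Float>, max: SIMD3<Float>)? {
        guard let first = positions.first else { return nil }
        return positions.reduce((min: first, max: first)) {
            (min: simd_min($0.min, $1), max: simd_max($0.max, $1))
        }
    }
}
```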
There are also special observer classes for detecting trigger movements, based for example on joint velocity or rotational speed. Each observer can watch one or more joints at a time. If connected to multiple joints, an observer can be configured to use either the average or the maximum of their values for its logic.
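A trigger observer built on top of such joint histories might look like the following sketch (again plain Swift with illustrative names and threshold values; it reuses the JointHistory type from the sketch above):

```swift
// Illustrative observer that detects trigger movements from one or more joints.
enum AggregationMode {
    case average
    case maximum
}

struct VelocityTriggerObserver {
    var observedJoints: [JointHistory]      // the joints this observer watches
    var mode: AggregationMode = .maximum    // combine multiple joints by average or maximum
    var threshold: Float = 0.25             // trigger threshold in m/s (illustrative value)

    /// The aggregated velocity of all observed joints.
    var aggregatedVelocity: Float {
        let speeds = observedJoints.map { $0.currentSpeed }
        guard !speeds.isEmpty else { return 0 }
        switch mode {
        case .average: return speeds.reduce(0, +) / Float(speeds.count)
        case .maximum: return speeds.max() ?? 0
        }
    }

    /// True if the observer currently detects a trigger movement.
    var detectsMotion: Bool {
        aggregatedVelocity >= threshold
    }
}
```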
4.1.2. Music Engine
The heart of the audio part is the Music Engine. An essential quality criterion is the responsiveness of the music: ideally, each note can be influenced by the player's input. Therefore, an approach that merely varies the volume of a continuous playback (be it "flat" or in several layers) is out of the question, because it would not sound very organic if, for example, a cymbal sound were simply "cut off" when the rhythm section stops. Much better is a format in which each note is actually triggered individually with certain parameters. The proven MIDI format offers exactly these capabilities and is therefore used here. I use it to drive virtual samplers as well as to modulate real-time synthesis and effects.
A piece consists of one or more sections. Each section can contain one or more looping tracks which encapsulate the interactive sound generation. Tracks have at least one audio output; effect tracks also have an audio input for a source signal to work on. Theoretically, each track can contain any number of instruments or effects. In practice I usually use only one per track for the sake of clarity and modularization.
The timing of all tracks is handled by a step sequencer. To respond to input, tracks can read continuous data from the body tracking model or react to MIDI notes. They also use the observer objects mentioned in 4.1.1.
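The relationship between piece, sections, tracks and the step sequencer could be sketched roughly as follows. This is a structural illustration in Swift; the protocol and type names are assumptions and not the actual code of the app.

```swift
import AVFoundation

// Minimal stand-in for the tracking data a track can read continuously.
struct BodyTrackingSnapshot {
    var jointPositions: [String: SIMD3<Float>]
    var postureBoundingBoxVolume: Float
}

// Every track encapsulates interactive sound generation and exposes an audio output.
protocol Track: AnyObject {
    var output: AVAudioNode { get }
    /// Called by the step sequencer on every step.
    func onStep(_ step: Int)
    /// Called on every new body-tracking frame for continuous parameter mappings.
    func onBodyFrame(_ body: BodyTrackingSnapshot)
}

// Effect tracks additionally consume a source signal.
protocol EffectTrack: Track {
    var input: AVAudioNode { get set }
}

struct Section {
    var tracks: [Track]
    var lengthInBars: Int
}

struct Piece {
    var sections: [Section]
    var tempo: Double   // BPM
}

// A very small step sequencer driving all tracks of the current section.
final class StepSequencer {
    var stepsPerBar = 16
    private var currentStep = 0

    func tick(section: Section) {
        section.tracks.forEach { $0.onStep(currentStep) }
        currentStep = (currentStep + 1) % (stepsPerBar * section.lengthInBars)
    }
}
```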

Figure 17. Technical structure of a piece. Image: Weibezahn
4.1.3. Instruments
I use a number of different instruments, as well as a number of effects which act on the instruments' outputs. Following is a list of the basic instruments used in the track classes explained later.
Notes sampler
A sampler which is triggered by a MIDI note and selects the appropriate sample file by note name, e.g. "piano_C#2.wav" for note no. 37. It also applies the velocity from the MIDI data.
Continuous OSC Bank
A bank of multiple oscillators (OSC). It has a configurable harmonic structure, expressed as tuples of frequency offsets and amplitude multipliers. Each OSC can be configured with a waveform, represented by a floating-point value between 0 and 3 that interpolates smoothly between triangle (0), square (1), sine (2) and sawtooth (3). The main overall parameters are the main frequency and the main amplitude.
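The harmonic structure of such a bank can be described as in the following sketch. It only resolves the per-oscillator frequencies and amplitudes in plain Swift and assumes that the frequency offsets act as multipliers of the main frequency; the actual audio rendering with oscillators is omitted, and all names and values are illustrative.

```swift
// One partial of the bank: frequency offset (treated as a multiplier of the main
// frequency here) and amplitude multiplier. Values are illustrative.
struct Partial {
    var frequencyOffset: Double
    var amplitudeMultiplier: Double
}

struct OscillatorBankConfig {
    var mainFrequency: Double = 220.0   // Hz
    var mainAmplitude: Double = 0.5
    /// 0 = triangle, 1 = square, 2 = sine, 3 = sawtooth; fractional values morph in between.
    var waveform: Double = 2.0
    var partials: [Partial] = [
        Partial(frequencyOffset: 1.0, amplitudeMultiplier: 1.0),
        Partial(frequencyOffset: 2.0, amplitudeMultiplier: 0.4),
        Partial(frequencyOffset: 3.0, amplitudeMultiplier: 0.2)
    ]

    /// Resolved (frequency, amplitude) pairs for each oscillator in the bank.
    var resolvedOscillators: [(frequency: Double, amplitude: Double)] {
        partials.map {
            (frequency: mainFrequency * $0.frequencyOffset,
             amplitude: mainAmplitude * $0.amplitudeMultiplier)
        }
    }
}
```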
Triggered OSC Bank
The bank properties are identical to those of the continuous OSC bank. Additionally, this bank can be triggered and has configurable timings for an attack-decay-sustain-release (ADSR) envelope.
Triggered Noise Bank
Three layers of white, pink and Brownian noise with individually configurable volumes and a single ADSR envelope. Can be gated using configurable high-pass and low-pass filters.
Triggered String
Imitates the plucking of a string using the Karplus-Strong algorithm (built into AudioKit); it is triggered with a note that serves as the fundamental of the "string".
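For illustration, the core of the Karplus-Strong algorithm fits into a few lines of plain Swift. The app itself uses AudioKit's built-in plucked-string node; the sketch below is only meant to show the idea of the excited, low-pass-filtered delay line.

```swift
import Foundation

// Minimal Karplus-Strong sketch; not the implementation used in the app.
func pluckedString(frequency: Double, sampleRate: Double = 44_100,
                   duration: Double = 1.0, damping: Float = 0.996) -> [Float] {
    // The delay line length determines the pitch of the "string".
    let delayLength = max(2, Int(sampleRate / frequency))
    // Excite the string with a burst of white noise.
    var delayLine = (0..<delayLength).map { _ in Float.random(in: -1...1) }
    var output: [Float] = []
    output.reserveCapacity(Int(sampleRate * duration))

    var index = 0
    for _ in 0..<Int(sampleRate * duration) {
        let nextIndex = (index + 1) % delayLength
        output.append(delayLine[index])
        // Averaging two neighbouring samples and damping them forms the low-pass
        // feedback loop that produces the string-like decay.
        delayLine[index] = damping * 0.5 * (delayLine[index] + delayLine[nextIndex])
        index = nextIndex
    }
    return output
}

// Example: one second of an A3 "pluck" as raw samples.
let pluckSamples = pluckedString(frequency: 220)
```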
Arbitrary sounds sampler
Plays back a selection of sounds, either triggered, at certain intervals, or looped. Can be used for soundscapes (especially in combination with effects), background layers, voice tracks and more.
When realizing a concrete composition, these base classes are often extended with custom behaviour.
4.1.4. Track Types
I designed several track classes as interfaces between body capture input and musical reaction. They implement the aforementioned interaction methods and can mostly use either samples or live synthesis. Following is an overview of the basic track types.
Variable volume track
This is the most basic interface between motion input and sound output. It simply maps the current velocity to the volume of a continuously played sound, be it a recording or live synthesis.
Input:
- Joint velocity
Configuration:
- Velocity-volume mapping
Triggered note sequence
A sequence of notes and timings. The current note in the sequence is only played if the joint velocity exceeds a certain threshold. The velocity of the playback is mapped from the input velocity. An exemplary mapping would be from [0.05, 0.25] m/s to [32, 127] MIDI velocity.
Input:
- Joint velocity
Configuration:
- Note sequence
- Velocity mapping
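The threshold check and the velocity mapping described above can be sketched as follows (plain Swift; function names and default ranges are illustrative, the mapping range corresponds to the example mentioned above):

```swift
// Linearly maps a joint velocity (m/s) to a MIDI velocity, clamped to the output range.
func midiVelocity(forJointVelocity v: Double,
                  inputRange: ClosedRange<Double> = 0.05...0.25,
                  outputRange: ClosedRange<Double> = 32...127) -> UInt8 {
    let clamped = min(max(v, inputRange.lowerBound), inputRange.upperBound)
    let t = (clamped - inputRange.lowerBound) / (inputRange.upperBound - inputRange.lowerBound)
    return UInt8(outputRange.lowerBound + t * (outputRange.upperBound - outputRange.lowerBound))
}

// The current note of the sequence is only played if the joint moves fast enough.
func onSequenceStep(note: UInt8, jointVelocity: Double, threshold: Double = 0.05,
                    play: (_ note: UInt8, _ velocity: UInt8) -> Void) {
    guard jointVelocity >= threshold else { return }
    play(note, midiVelocity(forJointVelocity: jointVelocity))
}
```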
Triggered transposed note sequence
This track inherits the playback behaviour of the “Triggered note sequence” and additionally transposes the notes, using the “Joint to Pitch” method.
Input:
- Joint velocity
- Joint y-coordinate
Configuration:
- Note sequence
- Velocity mapping
- Transposition mapping
Triggered transposed chord sequence
This track inherits the behaviour of the “Triggered transposed note sequence” but uses chords instead of single notes. There is one velocity mapping for each note of the chord. Those mappings are “staggered” from the fundamental upward. This means that the fundamental is more easily triggered than the second note which is still “more responsive“ than the third note (and so on, if more than three notes are used). At maximum input velocity, all notes are played with the same maximum velocity.
Input:
- Joint velocity
- Joint y-coordinate
Configuration:
- Chord progression
- Transposition mapping
- Velocities mapping
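The staggered mappings could be generated as in the following sketch (plain Swift, illustrative values): the lower bound of the input range is shifted upward for every further chord note, while the upper bound stays the same, so that at maximum input velocity all notes reach the same maximum velocity.

```swift
// Builds one velocity mapping per chord note. The fundamental (index 0) is triggered
// most easily; every further note requires a slightly higher input velocity.
func staggeredMappings(noteCount: Int,
                       baseRange: ClosedRange<Double> = 0.05...0.25,
                       staggerStep: Double = 0.05) -> [ClosedRange<Double>] {
    (0..<noteCount).map { index in
        let lower = baseRange.lowerBound + Double(index) * staggerStep
        // The upper bound stays fixed so that all notes are played at full
        // velocity when the input velocity reaches the maximum.
        return min(lower, baseRange.upperBound)...baseRange.upperBound
    }
}

// Example: three mappings for a triad, i.e. 0.05...0.25, 0.10...0.25 and 0.15...0.25 m/s.
let chordMappings = staggeredMappings(noteCount: 3)
```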
Dynamic melody
This track inherits the behaviour from the "Triggered transposed note sequence" but, like the "Triggered transposed chord sequence", uses a chord progression as a base. At each playback timing, the current note is selected ad hoc using the "Joint to Pitch" method.
Input:
- Joint velocity
- Joint y-coordinate
Configuration:
- Chord progression
- Transposition mapping
- Velocities mapping
Elastic Dynamic Melody
This track inherits the behaviour from the "Dynamic melody" and adds the parameter of "cadence". This means that the current note may be re-triggered after the base timing, filling the space until the next timing with 8th or 16th notes. The actual cadence is derived from a mapping of the joint velocity and is determined at the base trigger timing.
Input:
- Joint velocity
- Joint y-coordinate
Configuration:
- Chord progression
- Transposition mapping
- Velocities mapping
- Cadence mapping
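A possible sketch of the cadence logic in plain Swift, with illustrative threshold values: at the base trigger timing, the joint velocity determines how finely the interval until the next base timing is subdivided.

```swift
// Subdivision of the interval between two base timings.
enum Cadence: Int {
    case none = 1        // only the base note
    case eighths = 2     // fill with 8th notes
    case sixteenths = 4  // fill with 16th notes
}

// Chooses the cadence from the joint velocity at the base trigger timing.
// The threshold values are assumptions, not the configuration used in the pieces.
func cadence(forJointVelocity v: Double) -> Cadence {
    switch v {
    case ..<0.15: return .none
    case ..<0.35: return .eighths
    default:      return .sixteenths
    }
}

// Relative timings (in beats after the base timing, assuming one base timing per
// quarter note) at which the current note is repeated until the next base timing.
func fillTimings(for cadence: Cadence) -> [Double] {
    let step = 1.0 / Double(cadence.rawValue)
    return Array(stride(from: 0.0, to: 1.0, by: step))
}
```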
Dynamic drum machine
This drum machine is basically a combination of instrument tracks with their respective individual patterns. The basic interaction method is to trigger the tracks based on a threshold value. Additionally, each sub-track can have several layers corresponding to different levels of movement intensity. Other, custom interaction methods can be built in, e.g. selecting the current instrument set based on whether the player's hands are raised, on body rotation, etc.
Input (per track):
- Joint velocity
Configuration (per track):
- Note pattern
- Trigger velocity
- Layer selection mapping
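Selecting a pattern layer per sub-track from the movement intensity could look like this sketch (plain Swift; type names, pattern lengths and thresholds are illustrative):

```swift
// One sub-track of the drum machine with several pattern layers,
// ordered from calm to intense. Names and values are illustrative.
struct DrumSubTrack {
    /// Each layer is a step pattern (here 4 steps = one quarter bar); `true` means a hit.
    var layers: [[Bool]]
    /// Upper intensity bound for each layer; the first matching layer is used.
    var layerThresholds: [Double]

    func activeLayer(forIntensity intensity: Double) -> [Bool] {
        for (layer, upperBound) in zip(layers, layerThresholds) where intensity < upperBound {
            return layer
        }
        return layers.last ?? []
    }
}

// Example: a calm layer ("· · · x") and a busier one ("x x · x").
let lowDrum = DrumSubTrack(
    layers: [
        [false, false, false, true],
        [true, true, false, true]
    ],
    layerThresholds: [0.3, 1.0]
)
let currentPattern = lowDrum.activeLayer(forIntensity: 0.5)   // selects the busier layer
```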
Effect and filter tracks
These tracks have an audio input as well as an output. Apart from that, there is no basic implementation, because the configuration and behaviour of these tracks are highly specific to each use case; the mapping between motion and effect parameters varies greatly from effect to effect. What they do have in common is that the parameter mapping is applied continuously. An example would be a reverb track whose virtual room size is linked to the volume of the posture bounding box. I list some concrete examples in section 4.2.
Non-interactive tracks
Static loop
Static loops are just that: a looping sample which starts with a fixed volume. It can fade out and in.
Sample sequence
A looping sequence of arbitrary sample files. The selection of sample files can be randomized and the master volume can be configured at initialization.
When realizing a concrete composition, subclasses of those tracks are often extended with additional behaviour by adding conditions which are only useful for a specific part of that composition. Below are simplified examples, in pseudo code, of an instrument track and an effect track connected to it.
MELODY_TRACK
// DATA:
instrument
motionDetectorLeftHand
// PROCEDURES:
onTiming(chord, velocity) {
    if motionDetectorLeftHand.detectsMotion {
        var note = selectNote(chord)
        velocity = selectVelocity(velocity)
        instrument.playNote(note, velocity)
    }
}
selectNote(chord) {
    if BodyTracking.handLeft.y > BodyTracking.height * 0.8 {
        return chord.topNote
    } else if BodyTracking.handLeft.y > BodyTracking.height * 0.5 {
        return chord.middleNote
    } else {
        return chord.fundamental
    }
}
selectVelocity(velocity) {
    if motionDetectorLeftHand.velocity < 0.1 {
        return velocity * 0.5
    } else {
        return velocity
    }
}
REVERB_TRACK
// DATA:
input
reverbEffect
// PROCEDURES:
updateReverb() {
    reverbEffect.dryWetMix = BodyTracking.playerVolume
}
4.2. Compositions
Introduction: “Pachelbel & You”
To introduce players to the concept carefully, I offer an introductory piece. This approach is inspired by computer game tutorials: modern games often start with a training level where players are familiarized with basic movements and (inter)actions. The introduction has three main goals:
- Familiarize the player with the Joint to Pitch mechanics, even if only on a sub-conscious level
- Convey that intensity has relevance
- Evoke a certain playfulness by assuring that there are no rules to follow and no “penalty” system
The piece is based on the chord progression of Johann Pachelbel's well-known "Canon and Gigue in D Major" ("Pachelbel's Canon"): D A b f# G D G A

The piece has only one part so as not to ask too much of the player at once. The idea is to gather a few interactive instruments and let the player try out this little “orchestra”.
This orchestra comprises five tracks. The main roles are played by dynamic harmony and melody tracks. They are supported by two lower and simpler note sequences which are inspired by the "basso continuo" of the original composition. To accent the use of space, I added some drums which are triggered by rotational movement. With this ensemble it is quite possible to already create a dramaturgy, as I show in my interpretation in this video.
I deliberately chose a very well-known piece so that the dynamics are easily noticeable while playing it. It is also very harmonic and melodious. I named the piece "Pachelbel & You" to hint at the role the player takes in performing it. The piece has a length of 32 bars in 4/4 time; the tempo is 80 BPM.
Following is an overview of the tracks used.
Triggered transposed chord sequence
Joint | Dominant hand |
---|---|
Instrument | Notes sampler: Piano |
Chord sequence | Main progression |
Octave transposition | -1 to +2 |
Elastic dynamic melody
Joint | Secondary hand |
---|---|
Instrument | Notes sampler: Harp |
Chord sequence | Main progression |
Octave transposition | -1 to +2 |
Cadence | 4th note to 16th note |
Triggered transposed note sequence
Joint | Left foot |
---|---|
Instrument | Triggered OSC bank. Amplitudes: (0, 0.4), Offsets: (0.5, 0.25), Waveform: 2. ADSR: (0.1, 0.1, 0.5, 2) |
Note sequence | d d h h g g g g |
Octave transposition | 0 to +1 |
Triggered transposed note sequence
Joint | Right foot |
---|---|
Instrument | Triggered OSC bank. Amplitudes: (0, 0.4), Offsets: (0.5, 0.25), Waveform: 2. ADSR: (0.1, 0.1, 0.5, 2) |
Note sequence | d d h h g g g g |
Octave transposition | +1 to +2 |
Dynamic drum machine
Global trigger: Rotation gesture

Instrument | Trigger/Condition | Pattern (1/4 bar)
---|---|---
Low orchestral drum | Hands are not raised | · · · x
Low orchestral drum (2) | Hands are not raised | x x · x
Orchestral drum | Both hands raised over chest | · · · x
Orchestral drum (2) | Both hands raised over chest | x x · x
Nefrin
This is currently the only completed piece which I have composed myself. While Pachelbel & You only consists of one part, Nefrin is subdivided roughly into five parts. The compositional goal is to create a musical narrative whose different moods are interpreted by the player through their movement. This narrative itself describes a kind of journey, during which the protagonist moves from a simple, friendly and uncluttered environment to an increasingly complex, sometimes uncomfortable and challenging world.
There are four chord progressions used in this piece, which I will refer to as "Progressions 1, 2, 3 and 4". The note lengths shown are only their base timing; depending on the track type they are divided into 2, 4 or 8 shorter notes.
Progression 1: D3 F3 C3 C3 D3 F3 A3 A3

Progression 2: D3 C3 A2 A2 G2 C3 D3 A2

Progression 3: D3 C3 B2 B2 A2 C3 G2 G2 A2 C3 D3 C3 G2 C3 E3 E3

Progression 4 (shorter version of Progression 3): D3 C3 B2 B2 A2 C3 G2 G2

The piece has a length of 104 bars in 4/4 time; the tempo is 90 BPM (simplified; technically there is a short section at 180 BPM in the intro). Following is an overview of the parts and the tracks used.

Figure 18. Overview of the piece. The parts are explained on the next pages. Image: Weibezahn

An overview of the tonal tracks of the composition with their configurations. Image: Weibezahn

An overview of the rhythm tracks of the composition with their configurations. Image: Weibezahn
4.3. User Interface
The graphical user interface is deliberately kept simple, with the focus on the song selection. The first time the app is started, players go through an "onboarding", i.e. an explanation of how to use the software. In this case, however, this covers only the basic functionality, the space requirements and the setup, not an explanation of the interaction methods.

Figure 19. Onboarding screens provide an introduction. Image: Weibezahn

Figure 21. The song selection UI is kept minimal as well. Image: Weibezahn
Once the user has selected a piece, instructions appear on how to set up the device. This initial setup is a critical point: the device has to be leaned against a wall on the floor with the rear camera facing the players. This is not intuitive, because users are used to looking at the screen and seeing themselves in it when using the camera. Initially I tried to explain the setup with text and an illustration, but it turned out that users ignored this information and failed during setup. Finally, I made the textual explanation shorter and clearer and, probably even more importantly, integrated a looping video that shows the setup.

Figure 20. Left, middle: A looping video helps with the phone setup. Right: At first, I used an illustration for this, but testers could not easily understand the information. Image: Weibezahn
Until the device is in the correct position, music with the character of a waiting loop is played. Once the device is set up, a voice asks the user to position themselves at least three steps away from the phone. The app automatically detects when the user has reached this position, and the voice then prompts them to raise one hand above their head. As soon as this happens, the piece starts.
There is also the option to record a video of the performance, which can then be edited, saved and shared by the user. More than half of the testers expressed a desire for such functionality.
4.4. Testing
I had more than 20 people test the software during the development process. Many, but not all, were students of the HfK. The testing took place both remotely and during in-person meetings. The remote testers received a link where they could download the app via Apple's "TestFlight" platform. They could try the app at a time convenient to them and as often as they wanted. A button in the interface led to a feedback questionnaire. Later I started to meet with testers personally. There were several reasons for this: firstly, many of them did not have a suitable device, so we used my iPhone. Furthermore, it turned out that only a few of the remote testers actually installed the app and also filled out the questionnaire.
When testing in person, I followed the standardized procedure below in order to influence the testers as little as possible:
- Briefly explain the general purpose of the software.
- Open the app and hand them the phone. Observe how they read the instructions and manage the setup process.
- Leave the room so as not to influence the performance by my presence. Testers have the option to record their session.
- After the performance, testers fill out the feedback survey.
Afterwards there was time for a personal conversation in which further feedback, ideas and thoughts about the project could be expressed. The feedback was mostly positive.
The piece played during testing was not Nefrin but an intermediary composition I had developed during my research and development process. It nevertheless covered most of the described interaction methods.

Figure 22. Feedback results about the users. Image: Weibezahn

Figure 23. Feedback about the setup. Image: Weibezahn

Figure 24. Feedback about the interaction. Image: Weibezahn

Figure 25. More feedback about the interaction. Image: Weibezahn

Figure 26. Feedback about the music. Image: Weibezahn

Figure 27. Additional thoughts and comments. Image: Weibezahn