11 May 2023 Articulating Data Symposium
“I shouted at my Google Home Mini…” There is a troubling de-humanising of communication in our interactions with voice assistants, particularly given the power dynamic implicit in “assistance” and its implications for our relationships with the humans who also provide it: as simple examples, no longer saying “please” and “thank you”, or acknowledging a job well done. In this interactive installation, the kinds of voice recognition and synthesis models used in voice assistants respond to your presence in the room by prompting you to move and speak, then repeating what you say.
This is not to expose the models’ linguistic failures, but to explore the possibilities of unpredictable success when the power dynamic of servitude is removed and replaced with one based on mimicry, embodiment and entrainment, the foundations of empathetic communication.
For the Articulating Data Symposium I delivered my artist talk as a performance. I later filmed the performance four times and edited together this composite version, selecting the most evocative parts from each take. There is no overdubbing. Everything voiced by the artificial agent was recorded in real time as a response to what I said under the reverberant conditions of the room, including feedback from the agent itself. If the agent did not detect a voice after several seconds, it was programmed to say one of the following prompts: "come here", "stand by me", "are you listening?", and "what are you thinking?". None of the other uncanny statements it makes were preprogrammed: they are the result of the voice recognition model either "mishearing" or "hallucinating" something from its latent space. It has no soul, no sentience, but that doesn't seem to matter. Even though I know exactly how it was made, it is difficult not to sense something more in the machine. While this might play into the "magic" narrative sold to us by the corporations who wish to profit from these machine learning models, I think it's more interesting to consider how we can't help but anthropomorphise this technology and yet we still treat it like crap.
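The agent's turn-taking behaviour described above can be sketched as a simple decision rule: echo any detected speech, fall back to a preprogrammed prompt after a stretch of silence, and otherwise stay quiet. This is a minimal sketch, not the installation's actual code; the five-second timeout and the `choose_utterance` helper are assumptions standing in for the real recognition and synthesis pipeline.

```python
import random

# The four preprogrammed fallback prompts, as used in the installation.
PROMPTS = ["come here", "stand by me", "are you listening?", "what are you thinking?"]

# "Several seconds" in the piece; the exact value here is an assumption.
SILENCE_TIMEOUT_S = 5.0

def choose_utterance(heard_text, silence_elapsed_s):
    """Decide what the agent says next.

    heard_text        -- transcription of detected speech, or None if no voice
    silence_elapsed_s -- seconds since a voice was last detected
    Returns the text to speak, or None to stay silent.
    """
    if heard_text is not None:
        return heard_text                 # repeat what was heard
    if silence_elapsed_s > SILENCE_TIMEOUT_S:
        return random.choice(PROMPTS)     # prompt the visitor after silence
    return None                           # keep listening
```

In the room, this rule would run in a loop against the live microphone feed, so the agent's prompts and its echoes interleave with the visitor's speech and with its own reverberating voice.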
The agent's voice comes from 16 tactile "bass shakers" under the floor tiles. It has been trained using reinforcement learning to mimic my movements, vibrating the floor underneath my feet with its voice and altering its pitch, as well as an underlying bass tone, according to the physical height of my arms and legs as I pace and gesture. During the live talk at the symposium I invited the audience to sit around me so they could also feel the vibrations when it spoke.
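The mapping from body pose to sound described above can be illustrated with a simple linear interpolation: tracked limb heights (normalised so 0.0 is floor level and 1.0 is full reach) drive both the bass-tone frequency and the voice's pitch-shift factor. The frequency and pitch ranges below are assumptions for illustration, not the installation's actual parameters, which were learned rather than hand-set.

```python
# Assumed ranges, chosen for illustration only.
BASS_LOW_HZ, BASS_HIGH_HZ = 40.0, 120.0   # underlying bass tone, felt through the floor
PITCH_LOW, PITCH_HIGH = 0.8, 1.3          # multiplicative pitch-shift on the voice

def body_to_sound(limb_heights):
    """Map normalised limb heights (0.0–1.0 each) to (bass_hz, pitch_factor)."""
    h = sum(limb_heights) / len(limb_heights)   # average raised-ness of the body
    h = min(max(h, 0.0), 1.0)                   # clamp tracking noise
    bass_hz = BASS_LOW_HZ + h * (BASS_HIGH_HZ - BASS_LOW_HZ)
    pitch = PITCH_LOW + h * (PITCH_HIGH - PITCH_LOW)
    return bass_hz, pitch
```

A gesture such as raising both arms would push `h` toward 1.0, lifting both the bass tone under the floor tiles and the pitch of the agent's voice together.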
Commissioned for the 2023 Articulating Data Symposium.