Purnima Kamath on Generative Models for Sound Design
Date:
Fri, 09/13/2024 - 10:30am - 12:00pm
Location:
CCRMA Seminar Room
Event Type:
Hearing Seminar
Large language models (LLMs) such as ChatGPT are making striking changes to how we think about words and intelligence. Generative models (https://developers.google.com/machine-learning/gan/generative) take these ideas a step further by creating new data from a text prompt. Can an LLM and a generative model create new kinds of sounds? It is easy to imagine a system that lets you generate dog sounds, for example. But how would you build a system that lets you ask for a dog sound with a touch of wolf? With steerability or morphing, the sound landscape becomes much more interesting. Can we control a generative model to make both big and small changes to the sound we generate?
Purnima Kamath will lead a discussion at the next Hearing Seminar on her work to create generative models for sound design.
Who: Purnima Kamath on Generative Models for Sound Design
What: Generative Models for Sound Design
When: Fri, 09/13/2024 - 10:30am - 12:00pm
Where: CCRMA Seminar Room, top floor of the Knoll at Stanford
Why: Can we harness generative models to create something really useful? For example, sounds :-)
How would you describe your sound environment as you read this email? Can we expect a deep neural network to reconstruct it? Come to CCRMA to find out more.
Purnima Kamath
Generative Models for Sound Design
Abstract
Sound design involves creatively using sounds to build cinematic experiences for films and games. While most AI models support novel sound generation, they need to be trained on large, semantically well-labeled datasets and often lack support for creative pursuits such as audio morphing.
Recording large datasets of environmental sounds for training models is relatively easy, but semantically labeling them is time-consuming and expensive. In this talk, I will introduce two novel ways of designing AI algorithms to support the creative pursuit of sound design. First, I will present a framework for "searching" or "querying" the design space generated by an AI algorithm trained on unlabeled data using synthetic sound examples. I will also discuss ways to enable or induce steerability in the algorithms during generation using these examples. In the second method, I will discuss leveraging existing pre-trained models to explore creative pursuits such as morphing.
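To make the morphing idea concrete, here is a minimal, hypothetical sketch of one common way to blend two generated sounds: spherically interpolating between two latent vectors that a generative model might map to, say, a dog bark and a wolf howl. This is only an illustration of the general technique, not the MorphFader method or any specific model's API; the vectors z_dog and z_wolf and the decode() step are placeholders.

```python
import numpy as np

def slerp(z_a: np.ndarray, z_b: np.ndarray, t: float) -> np.ndarray:
    """Spherical interpolation between two latent vectors.

    t = 0 returns z_a, t = 1 returns z_b; intermediate values trace
    a path on the hypersphere, giving a gradual morph between the
    sounds the two latents decode to.
    """
    a = z_a / np.linalg.norm(z_a)
    b = z_b / np.linalg.norm(z_b)
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        # Vectors are (nearly) parallel; fall back to linear interpolation.
        return (1.0 - t) * z_a + t * z_b
    return (np.sin((1.0 - t) * omega) * z_a + np.sin(t * omega) * z_b) / np.sin(omega)

# Hypothetical usage: in practice z_dog and z_wolf would come from a
# model's encoder or text-conditioning pathway; here they are random
# stand-ins so the sketch runs on its own.
rng = np.random.default_rng(0)
z_dog, z_wolf = rng.normal(size=128), rng.normal(size=128)
morph_path = [slerp(z_dog, z_wolf, t) for t in np.linspace(0.0, 1.0, 5)]
# audio = [decode(z) for z in morph_path]  # small t: "a dog sound with a touch of wolf"
```

A small interpolation weight nudges the sound only slightly toward the second concept, while weights near one replace it almost entirely, which is one way to think about the "big and small changes" the talk asks about.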
The papers referenced during the talk:
Kamath, P., Gupta, C., Wyse, L., & Nanayakkara, S. (2024). Example-Based Framework for Perceptually Guided Audio Texture Generation. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 32, pp. 2555-2565.
Kamath, P., Gupta, C., & Nanayakkara, S. (2024). MorphFader: Towards Fine-Grained Controllable Morphing With Text-to-Audio Models.
Purnima is a PhD candidate (All But Defense!) at the National University of Singapore (NUS), working with Lonce Wyse, Suranga Nanayakkara, and Kokil Jaidka. She is passionate about bridging the gap between arts and technology by building simplified tools for creativity, and is currently focused on steerable generative AI models and creative support tools for sound design. Before her doctoral studies, Purnima worked in the software industry for ~15 years in various engineering roles. You can find more details on her recent publications, software, and art on her website: https://purnimakamath.com/
FREE
Open to the Public