Is anyone working on audio libraries that will enable streaming audio chunks for...

josevalim · on April 11, 2023

The current pipeline expects PCM audio blobs. If the data is coming from a microphone in the browser, you can do the initial processing and conversion in the browser (see the JS in this single-file Phoenix app speech-to-text example [0]).

On the other hand, if you expect a variety of formats (mp3, wav, etc.), then shelling out to ffmpeg or embedding it is probably the quickest path to something working. The Membrane Framework [1] is another option here, and it supports streaming. I believe Lars is going to do a cool demo with Membrane and ML at ElixirConf EU next week.

[0]: https://github.com/elixir-nx/bumblebee/blob/main/examples/ph...

[1]: https://membrane.stream/
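
For anyone wanting to try the "shell out to ffmpeg" route mentioned above, here is a minimal sketch in Python (not Elixir, and not taken from the linked example). It assumes `ffmpeg` is on the PATH and that the model wants 16 kHz mono 32-bit float PCM; the sample rate and the helper names `ffmpeg_pcm_args` and `decode_to_pcm` are assumptions for illustration.

```python
import subprocess
from typing import List


def ffmpeg_pcm_args(input_path: str, sample_rate: int = 16_000) -> List[str]:
    """Build an ffmpeg command that decodes any supported input to raw PCM on stdout."""
    return [
        "ffmpeg",
        "-hide_banner", "-loglevel", "error",  # keep stderr quiet except for errors
        "-i", input_path,                      # mp3, wav, etc.: ffmpeg detects the format
        "-ac", "1",                            # downmix to mono
        "-ar", str(sample_rate),               # resample (16 kHz is an assumption here)
        "-f", "f32le",                         # raw 32-bit float little-endian samples
        "pipe:1",                              # write to stdout instead of a file
    ]


def decode_to_pcm(input_path: str, sample_rate: int = 16_000) -> bytes:
    """Run ffmpeg and return the raw PCM bytes; raises CalledProcessError on failure."""
    result = subprocess.run(
        ffmpeg_pcm_args(input_path, sample_rate),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

Calling `decode_to_pcm("talk.mp3")` would return a bytes blob with 4 bytes per sample, which can then be handed to whatever expects PCM input. For streaming rather than whole files, the same arguments can be used with a long-running process reading from `pipe:0`.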

ricketycricket · on April 11, 2023

> I believe Lars is going to do a cool demo with Membrane and ML at ElixirConf EU next week.

Yes, the relevant part of his demo with the membrane pipeline appears to be here: https://github.com/lawik/lively/blob/master/lib/lively/media...

Dowwie · on April 12, 2023

Limited in usefulness, though. It seems Lars kept a MembraneTranscript library dependency private.

ricketycricket · on April 13, 2023

This one? https://github.com/lawik/membrane_transcription

mark_h · on April 12, 2023

Quick example video from Chris McCord using ffmpeg and whisper in Phoenix: https://www.phoenixframework.org/blog/whisper-speech-to-text...

lawik · on April 12, 2023

Sure.

I have a rough one using Membrane (media framework) that you can find here: https://github.com/lawik/membrane_transcription

I am using it for a talk I am putting together for ElixirConf EU, so if you want to see it used in context, this might be helpful: https://github.com/lawik/lively

Neither has release-worthy levels of polish, but if the interest is there I should make a proper library out of it.

That is to say, streaming chunks works great already. I would love two things. The first is stitching the edges of the chunks, which would probably need overlapping chunks. The second is building chunks based on silence. That's more DSP than I know, though.
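
To make the two ideas concrete, here is a rough sketch in plain Python (not Membrane, and not how membrane_transcription works). It shows fixed-size chunks that overlap, and a naive energy-threshold split on silence. The function names, the RMS threshold, and the frame sizes are all assumptions for illustration; a real voice activity detector would be more robust.

```python
import math
from typing import List, Sequence, Tuple


def overlapping_chunks(samples: Sequence[float], chunk_size: int, overlap: int) -> List[Sequence[float]]:
    """Fixed-size chunks; each chunk repeats the last `overlap` samples of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(samples), step):
        chunks.append(samples[start:start + chunk_size])
        if start + chunk_size >= len(samples):
            break  # this chunk already reached the end of the input
    return chunks


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def split_on_silence(samples: Sequence[float], frame_size: int,
                     threshold: float, min_silent_frames: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges, cutting where at least
    `min_silent_frames` consecutive frames fall below `threshold` RMS."""
    segments = []
    seg_start = None   # start index of the segment currently being built
    silent_run = 0     # consecutive quiet frames seen inside that segment
    for frame_start in range(0, len(samples), frame_size):
        frame = samples[frame_start:frame_start + frame_size]
        if frame_rms(frame) >= threshold:
            if seg_start is None:
                seg_start = frame_start
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                # Close the segment where the silent run began.
                end = frame_start - (silent_run - 1) * frame_size
                segments.append((seg_start, end))
                seg_start = None
                silent_run = 0
    if seg_start is not None:
        # Input ended mid-segment; any short trailing quiet frames are kept.
        segments.append((seg_start, len(samples)))
    return segments
```

With overlap, the transcripts of neighbouring chunks share some audio at the edges, which gives something to align on when stitching. The silence split avoids cutting mid-word, at the cost of variable chunk lengths.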

Dowwie · on April 12, 2023

Hey Lars! Building chunks on silence is a worthy cause! Why stitch the edges of the chunks? Is that because the chunks aren't cut cleanly on silence?

I think this work is very important. I don't understand whether I needed to install the library dependencies (mad, ffmpeg, portaudio) for Membrane's sake or specifically for this use case. It doesn't feel right.

brentjanderson · on April 11, 2023

You may be able to use the [Membrane Framework](https://membrane.stream/) to do that. It is built in Elixir and deals with those kinds of multimedia problems.

I'm not an expert here, but I'd expect that capturing a sample using Membrane and piping it into Whisper should be doable.