Audio #32
Comments
Some comments from the Gitter channel on the above questions:
|
OK, after a few days of thinking I've come up with a pretty decent API design, I guess.

**Interfaces**

```go
type Streamer interface {
	Stream(samples []float32) (n int, err error)
}

type Player interface {
	Play(Streamer) Handle
}

type Handle interface {
	Stop()
	Time() time.Duration
	// maybe a few more methods
}
```

Now, let's describe them in detail.

**Streamer**

Here's an example implementation of a streamer that would produce a simple sine wave:

```go
type SineWaveStreamer struct {
	position int
}

func (sws *SineWaveStreamer) Stream(samples []float32) (n int, err error) {
	for i := range samples {
		samples[i] = float32(math.Sin(float64(sws.position+i) / 100))
	}
	sws.position += len(samples)
	return len(samples), nil
}
```

Simple, right?

**Player**

Here's a very primitive example:

```go
type LowerVolumeEffect struct {
	s Streamer
}

func (lve *LowerVolumeEffect) Play(s Streamer) Handle {
	lve.s = s
	return &lveHandle{lve}
}

func (lve *LowerVolumeEffect) Stream(samples []float32) (n int, err error) {
	if lve.s == nil {
		return len(samples), nil
	}
	n, err = lve.s.Stream(samples)
	for i := range samples[:n] {
		samples[i] /= 2 // half volume
	}
	return n, err
}

type lveHandle struct {
	lve *LowerVolumeEffect
}

func (lh *lveHandle) Stop() {
	lh.lve.s = nil
}

// Time method omitted for simplicity
```

A more complicated player would do more advanced stuff. For example, a sequencer would append the provided streamer to its list of streamers and would stream those streamers one after another. A mixer would add the provided streamer to its list of streamers and would mix them when streaming. A speaker player would regularly pull new data from the provided streamers and play them through the actual speaker.

**Handle**

Handle allows us to control and monitor audio playback. I guess it doesn't require more explanation.

**Sound files**

We might be tempted to treat a sound file as a streamer, but that would be wrong. A streamer gets drained; a sound file can be played multiple times. So, when playing a sound file, I imagine an API like this:

```go
sound := loadSound("clap.mp3")
sequencer.Play(sound.Streamer())
sequencer.Play(sound.Streamer())
sequencer.Play(sound.Streamer())
speaker.Play(sequencer)
```

Each call to `Streamer()` creates a fresh streamer over the sound's data.

**Speaker**

Speaker needs to regularly pull new data from the streamers provided via `Play`:

```go
for !win.Closed() {
	// stuff, stuff, stuff
	speaker.Update() // pulls new data from streamers and plays them
}
```

**Conclusion**

I think this API is pretty good. I appreciate any comments on it. We need to figure out a few things:
Then we need to split up the work and start working! ;) |
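To make the sequencer idea from the proposal above concrete, here is a rough, hypothetical sketch of a sequencer player against the proposed `Streamer` interface (ignoring the `Handle` part for brevity). The `Sequencer` and `constStreamer` types are my own illustration, not part of the proposal:

```go
package main

import (
	"fmt"
	"io"
)

// Streamer as in the proposed API.
type Streamer interface {
	Stream(samples []float32) (n int, err error)
}

// Sequencer queues streamers and streams them one after another.
type Sequencer struct {
	queue []Streamer
}

func (sq *Sequencer) Play(s Streamer) {
	sq.queue = append(sq.queue, s)
}

// Stream fills samples from the queued streamers in order, dropping
// each streamer once it reports an error (e.g. io.EOF when drained).
func (sq *Sequencer) Stream(samples []float32) (n int, err error) {
	for n < len(samples) && len(sq.queue) > 0 {
		m, serr := sq.queue[0].Stream(samples[n:])
		n += m
		if serr != nil {
			sq.queue = sq.queue[1:]
		}
	}
	return n, nil
}

// constStreamer streams a fixed value for a fixed number of samples.
type constStreamer struct {
	value float32
	left  int
}

func (cs *constStreamer) Stream(samples []float32) (int, error) {
	if cs.left == 0 {
		return 0, io.EOF
	}
	n := len(samples)
	if n > cs.left {
		n = cs.left
	}
	for i := 0; i < n; i++ {
		samples[i] = cs.value
	}
	cs.left -= n
	if cs.left == 0 {
		return n, io.EOF
	}
	return n, nil
}

func main() {
	sq := &Sequencer{}
	sq.Play(&constStreamer{value: 1, left: 3})
	sq.Play(&constStreamer{value: 2, left: 2})

	buf := make([]float32, 8)
	n, _ := sq.Stream(buf)
	fmt.Println(n, buf[:n]) // 5 [1 1 1 2 2]
}
```

Note that the two queued streamers are played back to back within a single `Stream` call, which is the "zero silence between sounds" behavior the sequencer is meant to provide.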
Maybe we could even use |
How |
@hajimehoshi No idea, it's just a placeholder in the code, so far. |
I added an example of recording audio to the oto lib too. It's here: ebitengine/oto#8 |
It'd also be possible to get rid of `Handle`:

```go
type Streamer interface {
	Stream(samples []float64) (n int, err error)
	Time() time.Duration
	Stop()
}

type Player interface {
	Play(Streamer)
}
```

|
How to rewind? I think something like this would work, without changing any interface API. Arbitrary streamers cannot be rewound anyway.

```go
currentStreamer.Stop()
currentStreamer = sound.StreamerAt(time)
player.Play(currentStreamer)
```

|
So, one thing that was bugging me is this:

```go
type Streamer interface {
	Stream(samples []float64) (n int, err error)
}

type Ctrl struct {
	Streamer Streamer
	Paused   bool
	Time     time.Duration
}

func (c *Ctrl) Stream(samples []float64) (n int, err error) {
	if c.Streamer == nil {
		return 0, io.EOF
	}
	if c.Paused {
		return len(samples), nil
	}
	n, err = c.Streamer.Stream(samples)
	c.Time += time.Duration(n) * SampleDuration
	return n, err
}
```

So, with this, if we want to control the playback of a streamer, we wrap it inside a `Ctrl`:

```go
music := &audio.Ctrl{Streamer: musicFile.Streamer()}
speaker.Play(music)
```

Then, when we want to pause the playback, we just set the `Paused` field:

```go
music.Paused = true
```

When we want to check the current playing time, we just check the `Time` field:

```go
fmt.Printf("Currently at: %v\n", music.Time)
```

To fully stop the music playback, we set the streamer to nil:

```go
music.Streamer = nil
```

|
Sounds nice, but the length of a given byte array to |
@hajimehoshi That is true, but think about when this could possibly become unsynchronized. The speaker would always pull only as much data as it needs, so no problem there. Let's think about a speed effect that would sometimes pull more or less depending on its speed. But it would still only take as much as it needs. Saving sound to a file? Here, the encoder is free to take an arbitrary number of samples, but pausing makes no sense here. So, I think that once we require that everyone always takes only as many samples as they need, your concerns will only reflect the fact that pausing only makes sense in real time. Hope this wasn't too confusing. |
Speaker is the user of Streamer? So in fact there is an assumption of the size of an array, right? I think it's ok as long as this is clarified :-) |
The number of bytes must be exactly as much as needed or the delay can be accumulated... |
Yeah, there is an assumption that samples which are requested by |
And now, that |
And btw, |
Ah, so data can be discarded when oto's buffer is overflowed and then delay is reduced. That makes sense. |
Sort of. Not really discarded, but postponed and pushed to |
We had a long and productive discussion which resulted in several modifications and clarifications to the API I proposed earlier. I will sum those up here, plus I'll introduce one little idea of my own which hasn't been discussed yet.

**Unified sample format**

First thing we need to understand is that the idea of a unified sample format only affects samples transferred through streaming. @rawktron objected that one single unified sample rate is a very bad idea, because it would usually result in resampling, which can reduce the audio quality. The solution is to introduce a global variable:

```go
var SampleRate = 48000
```

Other than that, the sample format is two channels of `float64` samples.

**Streamer**

Here's the new form of the `Streamer` interface:

```go
type Streamer interface {
	Stream(samples [][2]float64) (n int, ok bool)
}
```

There are three possible return patterns of the `Stream` method: `n == len(samples)` and `ok == true` (all requested samples were streamed), `0 < n < len(samples)` and `ok == true` (the streamer streamed what it had left and is now drained), or `n == 0` and `ok == false` (the streamer is drained and nothing was streamed).

**Basic compositors**

We can define several types for composing other streamers in various ways. These types will of course be streamers too, so composition can reach arbitrary depth. Here are the three basic compositors I think are most useful. Since they're so useful, I think we can shorten their names to avoid too much typing.

**Seq**

```go
type Seq []Streamer
```

This type will stream the streamers in the slice one by one with perfect precision (i.e. zero silence between two consecutive streamers).

**Mix**

```go
type Mix []Streamer
```

This type will stream the streamers in the slice simultaneously, adding (with `+`) samples together and thus mixing the sounds.

**Sched**

```go
type Sched []struct {
	After time.Duration
	S     Streamer
}
```

With `Sched`, each streamer starts streaming after its `After` duration has elapsed. |
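To make the compositors concrete, here is one possible (naive) implementation of `Mix` under the new interface, with a throwaway `tone` streamer for demonstration; a real implementation would also drop drained streamers instead of re-polling them:

```go
package main

import "fmt"

type Streamer interface {
	Stream(samples [][2]float64) (n int, ok bool)
}

// Mix streams all contained streamers simultaneously, summing samples.
type Mix []Streamer

func (m Mix) Stream(samples [][2]float64) (n int, ok bool) {
	if len(m) == 0 {
		return 0, false
	}
	for i := range samples {
		samples[i] = [2]float64{} // start from silence
	}
	tmp := make([][2]float64, len(samples))
	for _, s := range m {
		sn, _ := s.Stream(tmp)
		for i := 0; i < sn; i++ {
			samples[i][0] += tmp[i][0]
			samples[i][1] += tmp[i][1]
		}
		if sn > n {
			n = sn // the mix lasts as long as its longest streamer
		}
	}
	return n, n > 0
}

// tone streams a constant value for a fixed number of samples.
type tone struct {
	value float64
	left  int
}

func (t *tone) Stream(samples [][2]float64) (int, bool) {
	if t.left == 0 {
		return 0, false
	}
	n := len(samples)
	if n > t.left {
		n = t.left
	}
	for i := 0; i < n; i++ {
		samples[i] = [2]float64{t.value, t.value}
	}
	t.left -= n
	return n, true
}

func main() {
	mix := Mix{&tone{value: 0.25, left: 4}, &tone{value: 0.5, left: 2}}
	buf := make([][2]float64, 4)
	n, ok := mix.Stream(buf)
	fmt.Println(n, ok, buf[0][0], buf[3][0]) // 4 true 0.75 0.25
}
```

The first two mixed samples sum both tones (0.25 + 0.5) while the last two contain only the longer tone, which is exactly the "play simultaneously, add samples" behavior described above.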
Ok, guys, time to get some real work done! Here's what needs to be done:
The overall structure can change after discussion. Now, we need to get to work, so, anyone who wants to contribute, please:
|
To start, I am taking the Streamer interface. |
Streamer interface done. |
I'll take the initial Speaker implementation (or playback). Minimum time: probably 30 minutes I'll be starting this tomorrow afternoon, 7/4/2017 (I have the day off) |
I'm taking compositors (Seq, Mix, Sched). |
@alistanis The Speaker API as I imagine it:

```go
speaker.Play(streamer) // starts playing a streamer, called when starting to play it
speaker.Update()       // pulls data from the playing streamer / streamers, called every tick
```

Initially, a speaker can play one streamer at a time. But eventually, we want to be able to play an arbitrary number of streamers through a speaker. The reason for a separate |
Ok, I did Mix, Seq, Ctrl and speaker and I'll leave Sched for later, because implementing it efficiently is non-trivial, plus it's not as important yet. |
I'm taking WAV decoder. This will also serve as a reference API for other decoders. |
@faiface have we determined which decoders we want to support yet? I'm thinking wav/ogg/aiff/mp3 is a decent combination to start with, with mp3 being the most difficult (and also the one I have the most experience with now) Right now I'm dedicating this week to implementing low latency playback for MacOS with AudioHAL in oto, and due to lack of free time I expect that to take up the rest of this week and perhaps some of the weekend, even though I have some C prototypes already working - one plays a wav file and the other plays a sine wave. It's just a lot of C code and the interface isn't super compatible with ours/oto's like ALSA is. Min: 5 days |
The discussion is moving to faiface/beep#1. Hope to see you there ;) |
Audio is a major missing feature in Pixel. The time has come to fix this. This issue serves as a design and implementation discussion place as well as progress reporting place.
Requirements
Here, I will summarize the most important requirements I demand from the implementation of an audio system in Pixel.
Design
Let's define two abstract interfaces with no specific definitions yet:
Examples: A loaded sound file is a wave generator. A speaker is a wave receiver, since it receives the waves and plays them. An audio effect is both receiver and generator: it receives waves, modifies them and generates the modified waves.
Having defined these interfaces, here are a few examples of ways the user would chain them together to create the final audio result. Objects sending to the right are generators and objects receiving from the left are receivers.
The "wave generator" and "wave receiver" names can and probably will change.
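Purely as a sketch of what such interfaces might look like (the names and method shapes are placeholders, since the issue deliberately leaves them undefined), here is one possible reading, with an effect type that is both receiver and generator as described above:

```go
package main

import "fmt"

// WaveGenerator and WaveReceiver are hypothetical shapes for the two
// abstract roles; the real design was still undecided at this point.
type WaveGenerator interface {
	// Generate fills buf with PCM samples and reports how many it produced.
	Generate(buf []float64) (n int)
}

type WaveReceiver interface {
	// Receive consumes PCM samples, e.g. to play or transform them.
	Receive(buf []float64)
}

// gain is both a receiver and a generator: it receives waves,
// modifies them and generates the modified waves.
type gain struct {
	factor float64
	buf    []float64
}

func (g *gain) Receive(buf []float64) {
	g.buf = append(g.buf[:0], buf...)
}

func (g *gain) Generate(buf []float64) int {
	n := copy(buf, g.buf)
	for i := 0; i < n; i++ {
		buf[i] *= g.factor
	}
	return n
}

func main() {
	g := &gain{factor: 0.5}
	g.Receive([]float64{1, -1, 0.5})
	out := make([]float64, 3)
	n := g.Generate(out)
	fmt.Println(n, out) // 3 [0.5 -0.5 0.25]
}
```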
Implementation
This is not decided yet. Here are a few examples of possible implementation, all of which are a bit problematic.
1.
I don't like this implementation. For a stream of PCM waves, it requires the user to actively feed the speaker with new data each frame. This would be very hard to use; we would probably implement more abstractions on top of it.
2.
The way you would use this implementation is like this:
This way, it'd be easy to create arbitrary chains of generators/receivers, however, questions and problems arise when we ask: How to play multiple sounds together? How to stop the playback? How to play two sounds one after another?
What needs to be done
First, we need to find an actual design that meets all of the requirements. Then we need to implement it.
Low level stuff
For the actual audio playback, we'll use the awesome oto package by @hajimehoshi, which supports audio playback on all major platforms.