Decoding Opus Packets

  • Question
  • Updated 1 week ago
I am successfully receiving Opus audio packets, first using C# and latterly javascript via a websocket, but I am as yet unable to decode them to produce audio. I know from earlier posts that around 64 (32-bit) 'samples' are sent in each packet, and that the Opus stream is two channels 'sampled' at 24 kHz, but how do I decode them? If anybody has working C# code (probably using an external library), perhaps they could point me in the right direction?

I have even tried a javascript library without success. The Opus packets appear to be bare data packets without any form of packaging, e.g. Ogg, so even if I dump the incoming data to a file and then attempt to play it back using VLC or Audacity (via ffmpeg) I still cannot get any audio out.

As an aside, I have successfully produced audio using a DAX channel and a simple javascript player, but of course there is no compression and the data rate is much higher.

Any pointers gratefully received of course.

Cliff, G4PZK

Cliff - G4PZK

  • 30 Posts
  • 10 Reply Likes

Posted 3 weeks ago


Doug - K3TZR

  • 112 Posts
  • 15 Reply Likes
Cliff,

The library to decode Opus can be found here:

https://opus-codec.org/downloads/

If you'd like to read some code (Swift for the Mac), take a look at the OpusDecode.swift file in my xSDR6000 project (https://github.com/DougPA/xSDR6000).

The received packets contain a variable number of bytes. Each packet of bytes decodes into 10 milliseconds of two-channel audio at a sample rate of 24,000 samples per second.

There is a small "data flow" diagram at the top of the OpusDecode.swift file and there are many comments in the file. Just reading those may give you a better idea of the process.

The OpusDecode.swift file uses a "framework" (Apple's term for a dynamic library) that provides the actual decoding mechanism. I built that framework from the files on the Opus web site. You can see the project that builds that framework here:

https://github.com/DougPA/OpusOSX

I'd be happy to answer any questions.

Cliff - G4PZK


Thanks very much for this Doug, I'll take a look. Sounds like you nailed it for the Mac a while ago. I'm not a Mac user myself, but any heads-up at this point will be very useful.

Thanks also for your offer of help, I may take you up on that after some reading!

Cliff, G4PZK

Cliff - G4PZK

Hi Doug (and/or anyone else who may be interested/able to help)

I've made some progress decoding the Opus packets and now better understand the process. I wonder if I could just check a few things with you as you've successfully managed to make it work?
Some of the things are fairly obvious but if you'll indulge me I'll note them down anyway.

1. The Opus packets are of class 0x8005.

2. The Opus packets are around 64 bytes long after subtracting the initial 28 bytes from the UDP packet (ignoring the UDP/IP header). They can and do vary in length as the audio changes.

3. The actual payload length is determined by the formula 4 x ((256 x data[2]) + data[3]), where data is the received UDP data payload byte array.

4. The Opus packet starts at byte 28 of the data payload.

5. The first byte of each Opus packet is always 0xd4. The 'd' nibble means each frame is CELT-only, the bandwidth is SWB (super wide band: 12 kHz audio bandwidth at a 24 kHz sample rate) and each frame represents 10 ms of audio; the '4' means the frames are stereo and there is exactly one Opus frame in each packet.

6. The Opus decoder is therefore created passing 24000 for the sample rate and 2 for the number of channels.

7. From other correspondence I am decoding with the function opus_decode_float, which yields an array of 480 32-bit floats per Opus packet, interleaved between the left and right channels. This seems correct to me: at a sample rate of 24 kHz, a period of 10 ms is 24000/100 = 240 samples per channel, and with two channels that gives 480 floats.
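For anyone following along, the observations above can be sketched in javascript. This is only a sketch: the helper names are my own, the offsets come from the list above, and the TOC byte interpretation follows RFC 6716 section 3.1.

```javascript
// Advertised payload length in bytes: the header stores a size in
// 32-bit words at data[2]/data[3], hence the factor of 4.
function advertisedPayloadLength(data) {
  return 4 * ((256 * data[2]) + data[3]);
}

// Decode the Opus TOC (table-of-contents) byte, per RFC 6716 section 3.1.
function parseToc(tocByte) {
  return {
    config: tocByte >> 3,           // 26 => CELT-only, SWB, 10 ms frames
    stereo: ((tocByte >> 2) & 1) === 1,
    frameCountCode: tocByte & 0x03  // 0 => exactly one frame in the packet
  };
}

const toc = parseToc(0xd4);
// toc.config === 26, toc.stereo === true, toc.frameCountCode === 0

// Expected decoder output per packet: 10 ms at 24 kHz, two channels,
// interleaved 32-bit floats.
const samplesPerPacket = (24000 / 100) * 2; // 480 floats
```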


I realise that you may not be able to assist in the next bit but I'll document what I've tried in case you have any suggestions.

As you know, I do not use any Apple equipment, so I've had to find Opus codec software from a variety of other places. I have tried Opus wrappers written in C# and also Opus wrappers written in javascript. Most if not all of these libraries come with an example Visual Studio project or an online web page providing a Microphone -> Opus_Encode -> Opus_Decode -> Speaker test, and in pretty much every case the example works well.

I have concentrated latterly on javascript for my decoding and have managed to decode packets without obvious errors being reported by the decoding libraries. To eliminate complications due to actually playing the audio in a web page (although my player does work fine), I have been decoding 1000 Opus packets to produce a 480,000-sample array, which I then write to a disk file. I then import the resulting file into Audacity, selecting 32-bit float, 24 kHz, 2 channels.

The data imports and plays after a fashion, and this is where I am banging my head against a wall at the moment. The audio pans left and right as expected when using the relevant 'audio client 0 slice 0 pan 0' tcp command, but it has a pronounced warble. I get exactly the same sound if I skip the file and feed the decoded data straight to the javascript audio player, so as far as I can tell it's not simply the file-saving process.

Thanks for reading if you've got this far, and I wonder if you can provide any insight or suggest what I might check. I'm trying to avoid porting the reference libopus code since there are already a number of javascript implementations out there.

If any Flex programmer is reading this I wonder what you use in SmartSDR (1.x in my case) to decode the audio? I couldn't see a decoder in the FlexLib library when I looked.

Cliff, G4PZK



Eric - KE5DTO, Official Rep

  • 718 Posts
  • 211 Reply Likes
Take a look at POpus

Doug - K3TZR

Cliff,

You've been busy; it took me a lot longer to figure out most of this.

In my experience (on the Mac) audio is a very difficult topic. Our modern operating systems are not "real-time", but unfortunately audio is. Your ear can hear very subtle errors in the audio. When the audio hardware asks your code for the next audio segment, it has to respond in a very short time; otherwise the "stitching" together of the audio segments isn't correct and you hear some sort of distortion.

My implementation still occasionally gets distorted (it's a small detail I've chosen to ignore while I work on more pressing issues). Originally I had that problem almost all of the time. To combat it, I inserted a ring buffer (code supplied by Apple and very carefully written to minimize the possibility of delays) between the incoming audio capture and the code that plays the audio, so that any processing delays don't affect the quality of the sound being reproduced. The trade-off is that the buffer introduces latency.

I'm guessing that the code that captures your audio is somehow introducing the distortion. One thing you might want to try is to generate a pure sine tone (in code or some other way), feed it to the Opus encoder, and then feed its output to the Opus decoder. If it comes out garbled, you can look at the data (painful, I know) and find the discontinuities. I made a number of such "test" programs before I was able to make my implementation work (mostly) reliably.
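Generating the test tone is the easy part; here's a sketch in javascript (the function name is mine) that produces interleaved stereo 32-bit floats at 24 kHz, ready to feed to whichever Opus encoder wrapper you're using.

```javascript
// Generate `ms` milliseconds of a sine tone as interleaved stereo
// 32-bit floats (the same layout opus_decode_float produces).
function makeSineStereo(freqHz, ms, sampleRate = 24000) {
  const frames = Math.round(sampleRate * ms / 1000);
  const out = new Float32Array(frames * 2); // interleaved L/R
  for (let i = 0; i < frames; i++) {
    const s = Math.sin(2 * Math.PI * freqHz * i / sampleRate);
    out[2 * i] = s;     // left
    out[2 * i + 1] = s; // right
  }
  return out;
}

// 10 ms of a 1 kHz tone => 240 frames => 480 interleaved floats,
// i.e. exactly one Opus packet's worth at these settings.
const tone = makeSineStereo(1000, 10);
```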

When I first did this I had a bug that was ignoring one byte of the incoming Opus. In spite of that I could hear the audio but it had an annoying buzz to it.

I put one of my experiments on GitHub (https://github.com/DougPA/AVAudioEngine_Opus). It takes microphone audio, Opus encodes it, then Opus decodes it and writes it to file. It uses an audio player (that is a part of the Mac OS) to play it but you might get some ideas from it.

I wish I knew more about your development environment, but it's taken me a few years to learn the Mac environment and I'm not about to start that again. Let me know how you get on, and don't hesitate to send questions (my direct email is douglas.adams@me.com).

Good luck

Cliff - G4PZK

Doug

Thanks for the reply - it's much appreciated.

I originally enabled a DAX channel, sent that audio (unencoded, of course) over the same websocket link and played it back via javascript, and I agree that a receive buffer system is most definitely needed. For the record, it all worked very well on a local LAN connection, but the amount of data was rather high of course. At least the sound was clean as far as I could tell; listening via my web browser solution certainly compared favourably with listening via the SmartSDR app. The conclusion was that I'm (probably) correctly receiving and passing the data from the Flex radio to the Raspberry Pi web server and then on to the client browser, mainly Chrome but also Firefox. The same should hopefully apply to the Opus packets, but I shall investigate that further.

Anyway to your comments:

Interesting about the annoying buzz you had at first; I did wonder if I was somehow missing a byte or two. I say this because the Opus UDP packets are quite often one or two bytes longer than the payload length indicated by the 4 x ((256 x data[2]) + data[3]) value. I just put that down to the UDP packets not being precise. Interestingly, when I use Wireshark while running the SmartSDR application I see the same effect, so it's not just my software. I will investigate further and may add my own checksum to the packets sent from the Raspberry Pi to the browser. A websocket is a TCP connection so it shouldn't be needed, but it may help to confirm that the packets are complete. I would have thought that the Opus decode process would complain if the data packets weren't kosher.

Your comments have given me two ways to proceed. Firstly, as per your suggestion, I shall generate a 1 kHz sine wave, Opus-encode it (24 kHz, stereo) and see how it plays back via my code. Secondly, I shall use Wireshark to capture an Opus stream while using the SmartSDR application. The coding for the latter might be tedious, but the result will hopefully be definitive.

I'll certainly be happy to keep you in the loop. I have javascript code to display the spectrum and also a waterfall, by the way, so once I crack the pesky Opus the whole thing should be quite useful!


Cliff, G4PZK

Doug - K3TZR

Cliff,

Here's a snippet from my Vita decode function ( public class func decodeFrom(data: Data) -> Vita? ); the entire routine can be found here:

https://github.com/DougPA/xLib6000/blob/master/xLib6000/Networking/Vita.swift


// calculate the payload size (bytes)
// NOTE: The data payload size is NOT necessarily a multiple of 4 bytes (it can be any number of bytes)

    vita.payloadSize = data.count - vita.headerSize - (vita.trailerPresent ? kTrailerSize : 0)


The "// NOTE:" is a comment I wrote to myself about the number of bytes in a Vita payload. Although the Vita standard says you always receive multiples of 4 bytes, apparently that isn't the way Flex used the Vita container for Opus. Opus is byte-centric, so I guess that makes sense.

In this calculation, data.count is the total number of bytes received from UDP; the other terms are just what they sound like (in bytes). The point is that the Vita payload can be any number of bytes (somewhere around 65 seems typical).
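For anyone not reading Swift, the same calculation can be sketched in javascript; I'm assuming here that the optional VITA trailer, when present, is a single 32-bit word (4 bytes), and the function name is mine.

```javascript
// Payload size is simply whatever remains after the header (and the
// optional 4-byte trailer); there is no multiple-of-4 guarantee.
function vitaPayloadSize(datagramLength, headerSize, trailerPresent) {
  return datagramLength - headerSize - (trailerPresent ? 4 : 0);
}
```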

You mentioned Waterfall, I found a novel algorithm for producing one. It's demonstrated in another test project on GitHub:

https://github.com/DougPA/Waterfall-no-blit

Basically, the top line is drawn into a texture, and then two triangles covering the entire waterfall area are drawn using that texture. The effect is that the waterfall lines move down as new lines are received, while CPU/GPU usage stays low.


Cliff - G4PZK

Doug

The number of bytes in the packet was indeed the problem!

I was feeding the number of bytes indicated by 4 x ((256 x data[2]) + data[3]) into the decoder and ignoring any 'additional' trailing bytes that were in the actual UDP payload. When I feed ALL the bytes into the decoder, it decodes sweetly.

It totally makes a mockery of the packet length indication in the 'specification', of course, as there's no way to know that all the bytes in a UDP packet have been correctly received, UDP not being guaranteed delivery. And of course the advertised byte count is not so useful now. Anyway, at least I have it sounding decent, and the bandwidth required has reduced significantly.
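In code terms, the fix amounts to handing the decoder everything after the 28-byte header. A sketch (names are mine), with udpPayload being a Uint8Array of the whole received datagram payload:

```javascript
const OPUS_OFFSET = 28; // Opus data starts at byte 28 of the payload

function extractOpusBytes(udpPayload) {
  // NOT: udpPayload.subarray(OPUS_OFFSET, OPUS_OFFSET + advertisedLength)
  return udpPayload.subarray(OPUS_OFFSET); // all remaining bytes
}
```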

I presume that the TX stream works in the same fashion; I will try that at some point, after I fix the other code I wrote, so I can listen live again rather than via test files. I note also that the Opus specification requires that all packets are received in the correct order; for a missing packet, the decode function should be called with a null pointer and a byte length of 0.
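Detecting a missing packet can be sketched like this; I'm assuming the 4-bit VITA packet counter (which wraps at 16), and for each gap you would call the decoder once in packet-loss-concealment mode (in libopus: opus_decode_float(decoder, NULL, 0, pcm, frameSize, 0)).

```javascript
// Number of packets lost between two consecutive 4-bit VITA packet
// counts; 0 means the stream is contiguous.
function missedPackets(prevCount, newCount) {
  return (newCount - prevCount - 1 + 16) % 16;
}
```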

Interesting about your waterfall; mine is written in HTML5 on a canvas with 5- or 7-level colour, I forget which. It looks quite pretty and is very fast. I will, however, take a look at your code, as a no-blit solution should be smoother I think.

I was tempted to have a mini rant to Flex about lack of documentation but I'm getting too old for that these days and they and I doubtless have better things to do.

Thank you so much for your help and support!

Cliff, G4PZK


Doug - K3TZR

Congratulations Cliff! All of this stuff is complicated.

Yes, in the case of Opus, the Vita packet count is somewhat misleading (and by my reading in violation of the spec). I'm not sure what the Vita authors intended for byte data.

I'm a retired software developer and one of the issues I have is that I no longer have a group of developers, each with a slightly different skill set, that I can go to when I'm stumped. Now I have to figure out everything myself. The web helps but it's not like an in-person q&a with an expert. I spent months learning enough Metal (Apple's replacement for OpenGL) in order to code the Panadapter and Waterfall displays. In the past I would have handed that to my graphics guy.

Same thing with audio... and on and on.

Right now I'm working on a bunch of UI issues so I'm learning (and struggling) with autolayout (Apple's approach to UI layout).

As another ham (Chen, W7AY) and fellow software junkie once told me: the destination isn't the object, the journey is.

James Whiteway

  • 905 Posts
  • 222 Reply Likes
Very interesting thread! Thanks for posting it here for all to follow along. I haven't really messed with audio decoding yet in my program; I am still trying to make my waterfall work properly.
Being an over-the-road trucker leaves little time for coding!
James
WD5GWY

Cliff - G4PZK

Hi James,

Glad you found it interesting; programming the Flex is becoming an obsession for me now. The waterfall part of the coding is always tricky, as it's quite complex in many ways.

Good luck when you get time to get it going.

Cliff, G4PZK

Cliff - G4PZK

Hi Doug

The funny thing about this stuff is that it is complicated until you understand it; then suddenly it's no big deal. This is where a complete specification and documentation come in handy. Following things semi-blind and having to 'guess' at the problem can get quite irritating after a while, a bit like not being able to find your favourite hammer or soldering iron: the job is trivial once you find it, but you are frustrated until you do!

I have to admire your fortitude in fighting that Apple stuff. I looked at the Swift framework and consider it anything but 'easy' to understand, despite Apple's protestations. Apple have always done their own thing in their own way and have had massive success doing it. In the end it's about what you are comfortable with.

I am also retired, and although I was trained as an Elec. Eng., a lot of my work was web based in the latter years. I've been fortunate to largely do what I enjoy using a variety of technologies over the years. I still think the only proper programming language is C, and I'll include javascript in that statement for web applications, though of course things have developed hugely from the early days. I imagine you went through a whole heap of programming languages during your career, much the same as I did. The advent of Internet-based client/server apps is the way to go for me; it means you get portability for minimum effort.

Now I have the audio working, I'm reworking and modularising some of my javascript code. Last night I rewrote bits of the interface daemon that runs on the Pi, as I noticed that the Flex doesn't always respond to a TCP command message first time. I'm not sure if that's documented anywhere, but I've implemented a retry mechanism with a handshake in any case.
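The retry mechanism I mention might be sketched like this; it's only a sketch, not my actual daemon code, and sendFn and waitForReply are hypothetical stand-ins for whatever your transport layer provides.

```javascript
// Send a command; if no matching reply arrives within timeoutMs,
// resend, up to `retries` additional attempts.
async function sendWithRetry(sendFn, waitForReply, cmd, retries = 3, timeoutMs = 500) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    sendFn(cmd);
    try {
      return await waitForReply(timeoutMs); // resolves with the reply
    } catch (e) {
      // timed out: fall through and resend
    }
  }
  throw new Error('no reply after ' + (retries + 1) + ' attempts');
}
```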

When I started to learn C I was given one piece of advice - "C doesn't take any prisoners" - I'm not sure the Flex API does either at times.

Good chatting Doug.

Cliff, G4PZK


Doug - K3TZR

Cliff,

I agree that "it's complicated until you understand it", the problem for me is that after I understand it I forget it. It seems that once I finish something (e.g. Opus audio), a few months later while working on something else I need to go back to it and have to relearn a bit. It's one reason I've become a fanatical commenter, the comments are mostly for my own benefit.

It would be nice to have a thorough set of documentation to refer to. Since Flex produces the radio (i.e. firmware), the API (FlexLib) and the client (SmartSDR), they don't really need to share documentation with the rest of us; it's only as a courtesy that they provide the source for FlexLib. I sympathize with their efforts to document things (e.g. the Wiki). I've tried similar approaches and come away believing the code and the comments are the only documentation that will remain current.

I have worked in a large variety of languages, starting with ones and zeros, then assembler... I started my Apple project in Objective-C but switched whole-heartedly to Swift. I find Swift very powerful and relatively easy, and I really like the Apple development environment (Xcode). What I like about Swift (as distinct from C) is that one of its core objectives is to detect errors at compile time rather than at run time.

I'm not aware of the problem you describe with Flex and TCP. As far as I know I have never failed to receive a reply to a TCP command. I think I'll add some code to verify that.