ESP32 VoIP/RTP pager and Scream receiver

A very simple RTP pager (audio receiver) based on the ESP32 SoC. See also the ESP8266 RTP pager for a general description.

The application uses settings stored as a JSON file in the SPIFFS filesystem. This might be overkill, as only the WiFi network SSID and password are stored at the moment, but it is easy to extend and is both backward and forward compatible as long as reasonable defaults are used.
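The compatibility argument rests on one rule: every field starts at its default, and the file only overrides what it contains. A minimal sketch in portable C, assuming the JSON file has already been parsed into string key/value pairs by some JSON library (parsing is not shown); `settings_t`, `setting_kv_t`, `settings_apply` and the keys `"ssid"`/`"password"` are hypothetical names, not taken from the firmware.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical settings structure; field names are illustrative only. */
typedef struct {
    char ssid[33];      /* SSID: up to 32 bytes + terminating NUL */
    char password[65];  /* WPA2 passphrase: up to 64 characters + NUL */
} settings_t;

/* One string member of an already-parsed JSON object (parsing not shown). */
typedef struct {
    const char *key;
    const char *value;
} setting_kv_t;

/* Defaults: empty SSID and password. */
static void settings_set_defaults(settings_t *s)
{
    s->ssid[0] = '\0';
    s->password[0] = '\0';
}

/* Start from defaults, then override only the keys this version knows.
 * Missing keys (older file) keep their defaults; unknown keys (newer
 * file) are skipped, so compatibility holds in both directions. */
static void settings_apply(settings_t *s, const setting_kv_t *items, size_t count)
{
    assert(s != NULL && (items != NULL || count == 0));
    settings_set_defaults(s);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(items[i].key, "ssid") == 0) {
            snprintf(s->ssid, sizeof s->ssid, "%s", items[i].value);
        } else if (strcmp(items[i].key, "password") == 0) {
            snprintf(s->password, sizeof s->password, "%s", items[i].value);
        }
    }
}
```

For example, a file written by a hypothetical newer version containing `ssid` and an extra `volume` key yields the stored SSID, an empty (default) password, and the unknown key is simply ignored.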

The application can receive RTP streams encoded with G.722, G.711a and G.711u. The receiving port is fixed in firmware to 4000. Apart from its unicast address, the receiver also joins a multicast group at address. Most desk VoIP phones with "BLF" keys can be used as transmitters. The tSIP softphone also works (either a WAV file or the default audio source device can be used with each programmed button):
tSIP multicast streaming
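A receiver like this has to strip the RTP header and choose a decoder from the payload type. A minimal sketch in portable C, assuming the transmitters use the standard static payload types from RFC 3551 (0 = PCMU/G.711u, 8 = PCMA/G.711a, 9 = G.722); the function names are hypothetical and this is not the firmware's own code. G.722 decoding is not shown, since it needs a full sub-band ADPCM decoder; the two G.711 expanders follow the classic reference formulas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Static RTP payload types from RFC 3551 for the supported codecs. */
enum { RTP_PT_PCMU = 0, RTP_PT_PCMA = 8, RTP_PT_G722 = 9 };

typedef struct {
    uint8_t payload_type;
    uint16_t seq;
    uint32_t timestamp;
    uint32_t ssrc;
    const uint8_t *payload;   /* points into the caller's buffer */
    size_t payload_len;
} rtp_packet_t;

static uint32_t be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

/* Parse an RTP packet (RFC 3550 layout). Returns 0 on success, -1 if malformed. */
static int rtp_parse(const uint8_t *buf, size_t len, rtp_packet_t *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 12 || (buf[0] >> 6) != 2)        /* 12-byte fixed header, version 2 */
        return -1;
    size_t hdr = 12 + 4u * (buf[0] & 0x0Fu);   /* skip CSRC identifiers */
    if (len < hdr)
        return -1;
    if (buf[0] & 0x10) {                       /* X bit: skip header extension */
        if (len < hdr + 4)
            return -1;
        hdr += 4 + 4u * (size_t)(buf[hdr + 2] << 8 | buf[hdr + 3]);
        if (len < hdr)
            return -1;
    }
    size_t pad = 0;
    if (buf[0] & 0x20) {                       /* P bit: last byte = padding count */
        pad = buf[len - 1];
        if (pad == 0 || hdr + pad > len)
            return -1;
    }
    out->payload_type = buf[1] & 0x7F;
    out->seq = (uint16_t)(buf[2] << 8 | buf[3]);
    out->timestamp = be32(buf + 4);
    out->ssrc = be32(buf + 8);
    out->payload = buf + hdr;
    out->payload_len = len - hdr - pad;
    return 0;
}

/* G.711 mu-law byte to 16-bit linear PCM (classic reference expansion). */
static int16_t ulaw_to_linear(uint8_t u)
{
    u = (uint8_t)~u;
    int t = ((u & 0x0F) << 3) + 0x84;          /* mantissa plus bias */
    t <<= (u & 0x70) >> 4;                     /* segment (exponent) */
    return (int16_t)((u & 0x80) ? (0x84 - t) : (t - 0x84));
}

/* G.711 A-law byte to 16-bit linear PCM (classic reference expansion). */
static int16_t alaw_to_linear(uint8_t a)
{
    a ^= 0x55;                                 /* undo even-bit inversion */
    int t = (a & 0x0F) << 4;
    int seg = (a & 0x70) >> 4;
    if (seg == 0) {
        t += 8;
    } else {
        t += 0x108;
        t <<= seg - 1;
    }
    return (int16_t)((a & 0x80) ? t : -t);
}
```

A packet with payload type 0 or 8 would then be expanded byte by byte into 16-bit samples, while payload type 9 would go to a G.722 decoder.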

Hardcoding the WiFi network SSID/password in the source code is not practical, so I would recommend using SmartConfig (an Android application). The SmartConfig code on the ESP32 is started only when the SSID/password is not set; otherwise it is inactive.
ESP32 SmartConfig

Application settings can be reset to default values (empty SSID, empty password) by pulling GPIO27 low during startup. Once the SSID/password are cleared, SmartConfig is activated so they can be set again.

The firmware was built with ESP-IDF 3.3. I was also using Code::Blocks for editing the code, so there is a C::B project, but it can be ignored completely. To program the firmware, either ESP-IDF can be used (e.g. -p COM4 flash from the ESP-IDF command prompt) or the ESP32 download tool, with the firmware files and load addresses shown below (or as listed in build/flasher_args.json). For the popular ESP32 devkit (I have the 30-pin variant), press Start in the download tool, then long press the "BOOT" button on the PCB.
ESP32 download tool - RTP pager

To compile, change to the project directory from the ESP-IDF command prompt and use: build

A very basic version can use the internal DAC:
ESP32 DAC RTP pager

Audio quality from the internal DAC may be unsatisfying: the noise level is pretty high, comparable to AM radio reception.
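One likely contributor to the noise is resolution: the ESP32 DAC is 8-bit, so the ideal quantization SNR for a full-scale sine is about 6.02 × 8 + 1.76 ≈ 50 dB, compared with about 98 dB for 16-bit audio. A minimal sketch of the conversion from decoded 16-bit signed samples to unsigned 8-bit DAC codes, assuming the decoder produces 16-bit PCM; the function names are hypothetical and whether the driver performs this step itself depends on the output path used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signed 16-bit PCM to unsigned 8-bit DAC code: flipping the sign bit
 * turns -32768..32767 into 0..65535, then only the high byte is kept,
 * so the low 8 bits of resolution are simply discarded. */
static uint8_t pcm16_to_dac8(int16_t sample)
{
    return (uint8_t)(((uint16_t)sample ^ 0x8000u) >> 8);
}

static void pcm16_to_dac8_buf(const int16_t *in, uint8_t *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++)
        out[i] = pcm16_to_dac8(in[i]);
}
```

Silence (0) maps to mid-scale code 128, which is why the DAC output idles at half the supply range rather than at 0 V.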

Firmware using the ESP32 internal DAC:

External I2S codec

Let's use a proper I2S codec then. These PCM5102A modules cost less than $4:
PCM5102 I2S codec module
There are two "S2RE" LDO regulators on the board, so it can be powered from either 3.3V or 5V.
The PCM5102 can work with or without the master clock / system clock (SCK), which reduces the number of connections to bit clock (BCK), word strobe clock (LRCK) and data.

  The device starts up expecting an external SCK input, but if BCK and LRCK start
  correctly while SCK remains at ground level for 16 successive LRCK periods,
  then the internal PLL starts, automatically generating an internal SCK from
  the BCK reference.  

SCK can be tied to ground either manually or with the solder jumper on the top side.
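The datasheet condition quoted above is easy to put in numbers: with 16-bit stereo framing, BCK runs at 32 × fs, and 16 LRCK periods last only 16 / fs, so the internal PLL takes over about 1 ms after clocking starts at 16 ksps. A small sketch of that arithmetic, assuming 16-bit slots (as in the 16 ksps firmware and the 44100 sps Scream stream); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit clock = sample rate x channels x bits per slot (standard I2S framing). */
static uint32_t i2s_bck_hz(uint32_t sample_rate, uint32_t channels, uint32_t bits_per_slot)
{
    assert(sample_rate > 0 && channels > 0 && bits_per_slot > 0);
    return sample_rate * channels * bits_per_slot;
}

/* Duration of `periods` LRCK periods (one period = one sample frame), in microseconds. */
static double lrck_periods_us(uint32_t sample_rate, uint32_t periods)
{
    assert(sample_rate > 0);
    return 1e6 * periods / sample_rate;
}
```

For example, `i2s_bck_hz(16000, 2, 16)` gives 512000 Hz and `i2s_bck_hz(44100, 2, 16)` gives 1411200 Hz; `lrck_periods_us(44100, 16)` is roughly 363 µs.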

There are four three-state solder jumpers on the bottom side. The default settings seem to be fine. The same signals are available on the goldpin header if anyone wants to change these settings dynamically, soft mute in particular.

The PCM5102 datasheet recommends a minimum output load of 1 kOhm, but there are already two 470 Ohm resistors in series with the outputs on the board, and the module seems to be able to drive low-impedance headphones directly with no issues. Audio outputs are also available on the 2.54mm header.
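The 470 Ohm series resistors form a voltage divider with whatever is plugged in: they raise the impedance the DAC pin actually drives, at the cost of level into low-impedance loads. A worked sketch, assuming purely resistive loads; the helper names and the example load values are hypothetical.

```c
#include <assert.h>

/* Fraction of the DAC pin's output voltage that reaches a resistive load
 * through a series resistor (simple voltage divider). */
static double divider_ratio(double r_series_ohm, double r_load_ohm)
{
    assert(r_series_ohm >= 0.0 && r_load_ohm > 0.0);
    return r_load_ohm / (r_series_ohm + r_load_ohm);
}

/* Impedance actually presented to the DAC output pin. */
static double impedance_at_pin_ohm(double r_series_ohm, double r_load_ohm)
{
    assert(r_series_ohm >= 0.0 && r_load_ohm > 0.0);
    return r_series_ohm + r_load_ohm;
}
```

With an assumed 32 Ohm headphone, `divider_ratio(470, 32)` is about 0.064 (roughly -24 dB) and the pin sees 502 Ohm; an assumed 10 kOhm line input would lose less than 0.5 dB.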

ESP32 I2S pins can be remapped to create a neat 1:1 layout match with the PCM5102 module header.

To avoid mistakes, I would not solder the SCK pin at all and would use the solder jumper on the top side instead.

By chance, the D2 pin is also connected to an LED (the blue one on my board), giving a visual indication whenever something other than silence is transmitted to the codec.

The two boards could be soldered together using a single header, or connected with jumper cables or with 5 jumpers, but here is something rarer: an x8 jumper block from an old network card:
ESP32 + PCM5102

Basic firmware with I2S output (still using a 16 ksps sampling rate):

Version with a longer pre-buffering time (800 ms):
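Pre-buffering usually means a jitter buffer that outputs silence until enough audio has accumulated, and goes back to that state when it runs dry. A minimal ring-buffer sketch of the idea, assuming mono 16-bit samples; `jitter_buf_t` and the `jb_*` functions are hypothetical, not the firmware's code. At 16 ksps, an 800 ms pre-buffer corresponds to 12800 samples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical jitter buffer: a FIFO of 16-bit samples that outputs
 * silence until `prebuffer` samples are queued, and returns to that
 * pre-buffering state whenever it runs dry (counted as an underrun). */
typedef struct {
    int16_t *data;
    size_t capacity;    /* total sample slots */
    size_t head;        /* next write position */
    size_t count;       /* samples currently queued */
    size_t prebuffer;   /* samples needed before playback (re)starts */
    bool playing;
    unsigned underruns;
} jitter_buf_t;

static void jb_init(jitter_buf_t *jb, int16_t *storage, size_t capacity, size_t prebuffer)
{
    assert(jb != NULL && storage != NULL);
    assert(prebuffer > 0 && prebuffer <= capacity);
    jb->data = storage;
    jb->capacity = capacity;
    jb->head = 0;
    jb->count = 0;
    jb->prebuffer = prebuffer;
    jb->playing = false;
    jb->underruns = 0;
}

/* Queue received samples; returns how many were dropped because the buffer was full. */
static size_t jb_write(jitter_buf_t *jb, const int16_t *in, size_t n)
{
    size_t dropped = 0;
    for (size_t i = 0; i < n; i++) {
        if (jb->count == jb->capacity) {
            dropped++;
            continue;
        }
        jb->data[jb->head] = in[i];
        jb->head = (jb->head + 1) % jb->capacity;
        jb->count++;
    }
    if (!jb->playing && jb->count >= jb->prebuffer)
        jb->playing = true;
    return dropped;
}

/* Produce n output samples for the codec, substituting silence when needed. */
static void jb_read(jitter_buf_t *jb, int16_t *out, size_t n)
{
    size_t tail = (jb->head + jb->capacity - jb->count) % jb->capacity;
    for (size_t i = 0; i < n; i++) {
        if (jb->playing && jb->count == 0) {   /* data needed, none queued */
            jb->playing = false;
            jb->underruns++;
        }
        if (!jb->playing) {
            out[i] = 0;
            continue;
        }
        out[i] = jb->data[tail];
        tail = (tail + 1) % jb->capacity;
        jb->count--;
    }
}
```

Returning to pre-buffering after an underrun trades one longer gap for fewer rapid dropouts; the capacity would normally be somewhat larger than the pre-buffer so that a burst of packets arriving after a WiFi stall is not discarded.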

Scream virtual sound card

Scream is a virtual sound card that emits samples as an uncompressed UDP stream. I have tested version 3.3.
Scream virtual sound card

A few minor issues with Scream installation and running:

The ESP32 listens on port 4010, joins the multicast group address from the default Scream configuration and expects a stereo 16-bit 44100 sps stream (this was the default in my Scream installation).
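For comparison with the 16 ksps RTP pager, this stream carries far more data. A small sketch of the PCM payload arithmetic, assuming uncompressed interleaved 16-bit samples and ignoring UDP/IP and any Scream header overhead; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Uncompressed PCM payload rate in bytes per second (no protocol overhead). */
static uint32_t pcm_bytes_per_second(uint32_t sample_rate, uint32_t channels, uint32_t bytes_per_sample)
{
    assert(sample_rate > 0 && channels > 0 && bytes_per_sample > 0);
    return sample_rate * channels * bytes_per_sample;
}

/* Bytes needed to buffer `ms` milliseconds, counted in whole frames so the
 * result stays aligned to complete (all-channel) samples. */
static uint32_t pcm_bytes_for_ms(uint32_t sample_rate, uint32_t channels, uint32_t bytes_per_sample, uint32_t ms)
{
    uint64_t frames = (uint64_t)sample_rate * ms / 1000u;
    return (uint32_t)(frames * channels * bytes_per_sample);
}
```

Stereo 16-bit at 44100 sps is 176400 bytes/s (about 1.41 Mbit/s), versus 32000 bytes/s for 16-bit mono at 16 ksps; buffering 800 ms of the Scream stream would take 141120 bytes instead of 25600.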

Hardware: ESP32 + PCM5102, as with the RTP pager above.

In my limited testing, the ESP32 (or the WiFi link itself?) seems to drop a substantial number of packets in this application, so it might not be acceptable as a long-term replacement for a real sound card. Most of the time the sound quality is acceptable (a single lost packet is not significant), but once the jitter buffer runs empty there is a short but annoying break.

Packet loss seems to depend on many factors. I got fairly good results (about 1 buffer underrun event per 4 minutes) with the following setup:
