Re-enable tweaked congestion control

2024-11-29 20:29:01 +01:00 · 2020-03-30 19:10:20 +02:00 · 2020-03-30 19:10:20 +02:00 · 9ff55f6deb
commit 9ff55f6deb
parent 2252eb1a20
10 changed files with 385 additions and 271 deletions
--- a/README.md
+++ b/README.md
@ -1,3 +1,48 @@
+# Daniil Gentili's submission to the VoIP contest 3
+
+In this submission, I created a new and improved protocol that unifies all flags, lengths that were previously sparsed around the protocol into one single data payload.
+
+This allows saving around 6 bytes for each data packet, and enormously reduces complexity of existing code, splitting business logic and (legacy) presentation/transport details into separate classes.
+All signaling packets (including init and initAck) are now unified into extra signaling fields. 
+A TL-B scheme for the new protocol can be viewed in `schemeNew.tlb` (compared to the old protocol in `scheme.tlb`).  
+
+Constructor serialization and parsing is implemented using a manually-written polymorphic approach, inspired by Telegram's C++ applications (tdesktop, tdlib).  
+My implementation of polymorphic TL serialization and parsing can be viewed in `controller/protocol/protocol/*` and `controller/protocol/packets/PacketStructs.*`.  
+
+**The new libtgvoip is 100% backwards compatibile with previous versions, keeping compatibility even with layer 8 and below, while enormously reducing complexity, encapsulating all legacy behaviour implementation details into a single `Legacy.cpp` file.**  
+
+Among other features, all packet data buffers are moved around as `unique_ptr`s instead of being copied: previously, feeding a packet to the OPUS decoder through the jitter required **two** copies, and allocation of a slot in a buffer pool; now, a single `unique_ptr` (generated when initially parsing the packet) is moved around, instead (`protocol/Protocol.cpp`).  
+Similarly efficient move-logic is used for outgoing audio buffers (EC & main), network socket payloads, and reliable packet resending (see below).
+
+
+The new protocol also allows for (backwards-compatible) transport multiplexing for streams, simply by making use of the packet seq (in newer protocols) or the audio PTS (in older protocols), without using additional fields.  
+
+This allows `AudioPacketSender.cpp` to reliably determine whether a certain packet was received by the other end or not, and resend only the missing audio pieces, especially in the case of Error Correction (EC) packets generated by the secondary low-bitrate opus encoder: only packets that are known to be lost (as of now) are re-sent, by using a Mask array type instead of a basic Array, skipping audio pieces that were already acknowledged for the current stream by the other endpoint.
+
+I've also kept the reliable packet resending for the main audio stream, by further building onto and refactoring the reliable packet logic, which was now merged with the extraSignaling reliability logic (`Reliable.cpp`).
+
+I've fixed a few nasty jitter buffer bugs:
+- Fixed a bug where the jitter buffer would wrongly reset in presence of EC packets (5ba6e99882bdc045ae64a3b4f65039910677edf0; for latest commit JitterBuffer.cpp line 142-151, wrongly interpreted EC packets as late main packets)
+- Fixed a minor bug where the jitter buffer would backtrack to an invalid timestamp when trying to fill gaps in transmission (457576e93faf8e99c49f936685d5c1582ca221bc; for latest commit JitterBuffer.cpp line 314, delay not multiplied by OPUS step)
+
+Among other changes, the data structure representing incoming and outgoing streams was split into several polymorphic data structures and classes (`protocol/packets/PacketStructs.h`, `protocol/Stream.h`) to save memory and properly separate business logic from datastructures.
+
+
+The commit history will be available @ https://gitlab.com/danog/libtgvoip as usual.
+
+Please note that commit 2252eb1a2056dc63ec09e4f1acc1c7bfa6fec9e4 contains an initial final version that suitably fixes one specific issues with audio quality: in case of sudden but short packet loss bursts (like when switching from WiFi to mobile network), libtgvoip would **usually** enable redundant EC, and most importantly **decrease the bitrate of both main OPUS stream to the absolute minimum**.  
+This causes hearable and persistent audio quality issues even if packet loss is reduced right afterwards (since it takes a while for the congestion controller to raise the audio quality again).  
+In that commit, I **completely disable manual OPUS bandwidth regulation**, instead relying on EC packets to fill in the gap during the short period of downtime: this yielded **excellent results** in my tests (I used a real-life device, switching constantly from wifi to weak 4g, with only very brief switches to EC hearable in the audio stream, automatically switching back to the high-quality 20kbps stream right afterwards).  
+The same results were confirmed by the `tgvoip-test-suite`: after re-enabling congestion control in the **next commit**, albeit with some fixes to account for packet resending: this also yielded good results even in low-bandwidth conditions.
+
+If I have time until the end of the submissions, I'll refactor the jitter buffer to avoid excessive switching between the main and EC audio streams in case of packet loss in low-bandwidth conditions.
+
+In the meantime, just wanted to say that I really enjoyed working on this contest: as usual, I have many more ideas for improvements, but for now, I'm pretty satisfied of the result.
+
+Thank you.
+
+---
+
 # Daniil Gentili's submission to the VoIP contest 2

 My submission consists of a major refactor (to C++) of the existing libtgvoip library.
--- a/controller/audio/AudioPacketSender.h
+++ b/controller/audio/AudioPacketSender.h
@ -41,6 +41,11 @@ public:
        return extraEcLevel;
    }

+    inline const double getResendCount() const
+    {
+        return resendCount;
+    }
+
    inline void setShittyInternetMode(bool shittyInternetMode)
    {
        this->shittyInternetMode = shittyInternetMode;
--- a/controller/audio/OpusDecoder.cpp
+++ b/controller/audio/OpusDecoder.cpp
@ -7,10 +7,10 @@
 #include "controller/audio/OpusDecoder.h"
 #include "audio/Resampler.h"
 #include "tools/logging.h"
+#include <algorithm>
 #include <cassert>
 #include <cmath>
 #include <cstdlib>
-#include <algorithm>

 #if defined HAVE_CONFIG_H || defined TGVOIP_USE_INSTALLED_OPUS
 #include <opus/opus.h>
@ -54,9 +54,9 @@ void tgvoip::OpusDecoder::Initialize(bool isAsync, bool needEC)
    else
        ecDec = NULL;
 #ifdef ANDROID
-	buffer = reinterpret_cast<unsigned char*>(std::malloc(8192));
+    buffer = reinterpret_cast<unsigned char *>(std::malloc(8192));
 #else
-	buffer = reinterpret_cast<unsigned char*>(std::aligned_alloc(2, 8192));
+    buffer = reinterpret_cast<unsigned char *>(std::aligned_alloc(2, 8192));
 #endif
    lastDecoded = NULL;
    outputBufferSize = 0;
@ -241,7 +241,7 @@ int tgvoip::OpusDecoder::DecodeNextFrame()
    if (!ptr)
    {
        fec = true;
-		ptr = jitterBuffer->HandleOutput( false, playbackDuration, isEC);
+        ptr = jitterBuffer->HandleOutput(false, playbackDuration, isEC);
        /*if (len)
 			LOGV("Trying FEC...");*/
    }
@ -256,14 +256,16 @@ int tgvoip::OpusDecoder::DecodeNextFrame()
            // It turns out the waveforms generated by the PLC feature are also great to help smooth out the
            // otherwise audible transition between the frames from different decoders. Those are basically an extrapolation
            // of the previous successfully decoded data -- which is exactly what we need here.
-            size = opus_decode(prevWasEC ? ecDec : dec, NULL, 0, reinterpret_cast<opus_int16*>(nextBuffer), packetsPerFrame * 960, 0);
-            if (size) {
-                int16_t* plcSamples = reinterpret_cast<int16_t*>(nextBuffer);
-                int16_t* samples = reinterpret_cast<int16_t*>(decodeBuffer);
+            size = opus_decode(prevWasEC ? ecDec : dec, NULL, 0, reinterpret_cast<opus_int16 *>(nextBuffer), packetsPerFrame * 960, 0);
+            if (size)
+            {
+                int16_t *plcSamples = reinterpret_cast<int16_t *>(nextBuffer);
+                int16_t *samples = reinterpret_cast<int16_t *>(decodeBuffer);
                constexpr float coeffs[] = {0.999802f, 0.995062f, 0.984031f, 0.966778f, 0.943413f, 0.914084f, 0.878975f, 0.838309f, 0.792344f, 0.741368f,
                                            0.685706f, 0.625708f, 0.561754f, 0.494249f, 0.423619f, 0.350311f, 0.274788f, 0.197527f, 0.119018f, 0.039757f};
                // But why 20 samples? Typically the sample size is 60....
-                for (int i = 0; i < 20; i++) {
+                for (int i = 0; i < 20; i++)
+                {
                    samples[i] = static_cast<int16_t>(round(plcSamples[i] * coeffs[i] + samples[i] * (1.f - coeffs[i])));
                }
            }
--- a/controller/audio/OpusEncoder.cpp
+++ b/controller/audio/OpusEncoder.cpp
@ -47,8 +47,9 @@ tgvoip::OpusEncoder::OpusEncoder(const std::shared_ptr<MediaStreamItf> &source,
 	opus_encoder_ctl(enc, OPUS_SET_INBAND_FEC(1));
 	opus_encoder_ctl(enc, OPUS_SET_SIGNAL(OPUS_SIGNAL_VOICE));
 	opus_encoder_ctl(enc, OPUS_SET_BANDWIDTH(OPUS_AUTO));
+	opus_encoder_ctl(enc, OPUS_SET_BITRATE(OPUS_AUTO));
 	requestedBitrate = 20000;
-	currentBitrate = 0;
+	currentBitrate = 20000;
 	running = false;
 	echoCanceller = NULL;
 	complexity = 10;
@ -106,7 +107,7 @@ void tgvoip::OpusEncoder::Stop()
 void tgvoip::OpusEncoder::SetBitrate(uint32_t bitrate)
 {
 	LOGE("=============== SETTING BITRATE TO %u ===============", bitrate);
-	requestedBitrate = 20000;
+	requestedBitrate = bitrate;
 }

 void tgvoip::OpusEncoder::Encode(int16_t *data, size_t len)
@ -117,6 +118,7 @@ void tgvoip::OpusEncoder::Encode(int16_t *data, size_t len)
 		currentBitrate = requestedBitrate;
 		LOGV("opus_encoder: setting bitrate to %u", currentBitrate);
 	}
+
 	if (levelMeter)
 		levelMeter->Update(data, len);
 	if (secondaryEncoderEnabled != wasSecondaryEncoderEnabled)
--- a/controller/controller/Init.cpp
+++ b/controller/controller/Init.cpp
@ -34,6 +34,8 @@ VoIPController::VoIPController() : rawSendQueue(64)
    maxUnsentStreamPackets = ServerConfig::GetSharedInstance()->GetUInt("max_unsent_stream_packets", 2);
    unackNopThreshold = ServerConfig::GetSharedInstance()->GetUInt("unack_nop_threshold", 10);

+    initAudioBitrate = 20000;
+    
    //audioBitrateStepDecr /= 2;
    auto aStm = std::make_shared<OutgoingAudioStream>();
    aStm->codec = Codec::Opus;
--- a/controller/net/CongestionControl.cpp
+++ b/controller/net/CongestionControl.cpp
@ -18,9 +18,7 @@ using namespace tgvoip;
 CongestionControlPacket::CongestionControlPacket(uint32_t _seq, uint8_t _streamId) : seq(_seq), streamId(_streamId){};
 CongestionControlPacket::CongestionControlPacket(const Packet &pkt) : seq(pkt.seq), streamId(pkt.streamId){};

-CongestionControl::CongestionControl() : cwnd(static_cast<size_t>(ServerConfig::GetSharedInstance()->GetInt("audio_congestion_window", 1024))),
-                                         max(cwnd * 1.1),
-                                         min(cwnd * 0.9)
+CongestionControl::CongestionControl() : cwnd(static_cast<size_t>(ServerConfig::GetSharedInstance()->GetInt("audio_congestion_window", 1024)))
 {
 }

@ -145,23 +143,59 @@ void CongestionControl::Tick()
    inflightHistory.Add(inflightDataSize);
 }

-int CongestionControl::GetBandwidthControlAction()
+CongestionControl::Action CongestionControl::GetBandwidthControlAction(int netMode, double multiply)
 {
    if (VoIPController::GetCurrentTime() - lastActionTime < 1)
-        return TGVOIP_CONCTL_ACT_NONE;
+        return None;

+    Action action;
    size_t inflightAvg = GetInflightDataSize();
+    size_t max = (cwnd * multiply) * 1.1;
+    size_t min = (cwnd * multiply) * 0.9;
+    LOGW("inflightAvg=%lu, max=%lu, min=%lu", inflightAvg, max, min);
    if (inflightAvg < min)
    {
-        lastActionTime = VoIPController::GetCurrentTime();
-        return TGVOIP_CONCTL_ACT_INCREASE;
+        action = Increase;
    }
-    if (inflightAvg > max)
+    else if (inflightAvg > max)
+    {
+        action = Decrease;
+    }
+    else
+    {
+        action = None;
+    }
+
+    uint8_t actAfter = 3;
+    if (netMode < NET_TYPE_LTE)
+    {
+        actAfter--;
+    }
+    if (netMode <= NET_TYPE_GPRS)
+    {
+        actAfter--;
+    }
+    if (netMode <= NET_TYPE_EDGE)
+    {
+        actAfter--;
+    }
+    if (action == lastAction)
+    {
+        LOGE("Would act on %hhu, current tries %hhu, act after %hhu", action, lastActionCount, actAfter);
+        if (lastActionCount++ == actAfter)
        {
            lastActionTime = VoIPController::GetCurrentTime();
-        return TGVOIP_CONCTL_ACT_DECREASE;
+            lastActionCount = 0;
+            return action;
        }
-    return TGVOIP_CONCTL_ACT_NONE;
+    }
+    else
+    {
+        lastActionCount = 0;
+        LOGE("Would act on %hhu, current tries %hhu, act after %hhu", action, lastActionCount, actAfter);
+    }
+    lastAction = action;
+    return None;
 }

 uint32_t CongestionControl::GetSendLossCount()
--- a/controller/net/CongestionControl.h
+++ b/controller/net/CongestionControl.h
@ -13,11 +13,8 @@
 #include <map>
 #include <stdlib.h>

-#define TGVOIP_CONCTL_ACT_INCREASE 1
-#define TGVOIP_CONCTL_ACT_DECREASE 2
-#define TGVOIP_CONCTL_ACT_NONE 0
-
 #define TGVOIP_CONCTL_LOST_AFTER 2
+#define TGVOIP_CONCTL_ACT_AFTER 2

 namespace tgvoip
 {
@ -47,6 +44,14 @@ public:
    CongestionControl();
    ~CongestionControl();

+    enum Action : uint8_t
+    {
+        None = 0,
+        Increase = 1,
+        Decrease = 2,
+        Min = 3
+    };
+
    void PacketSent(const CongestionControlPacket &pkt, size_t size);
    void PacketLost(const CongestionControlPacket &pkt);
    void PacketAcknowledged(const CongestionControlPacket &pkt);
@ -57,14 +62,17 @@ public:
    size_t GetCongestionWindow();
    size_t GetAcknowledgedDataSize();
    void Tick();
-    int GetBandwidthControlAction();
+    Action GetBandwidthControlAction(int netMode, double multiply = 1.0);
    uint32_t GetSendLossCount();

+
 private:
    HistoricBuffer<double, 100> rttHistory;
    HistoricBuffer<size_t, 30> inflightHistory;
    std::array<tgvoip_congestionctl_packet_t, 100> inflightPackets{};
    uint32_t lossCount = 0;
+    Action lastAction = None;
+    uint8_t lastActionCount = 0;
    double tmpRtt = 0.0;
    double lastActionTime = 0;
    double lastActionRtt = 0;
@ -75,8 +83,6 @@ private:
    size_t inflightDataSize = 0;

    size_t cwnd;
-    size_t max;
-    size_t min;
 };
 } // namespace tgvoip

--- a/controller/net/JitterBuffer.h
+++ b/controller/net/JitterBuffer.h
@ -70,7 +70,8 @@ private:

    //BufferPool<JITTER_SLOT_SIZE, JITTER_SLOT_COUNT> bufferPool;
    Mutex mutex;
-    uint32_t step;
+    int64_t lastMain = 0;
+
    std::array<jitter_packet_t, JITTER_SLOT_COUNT> slots;
    std::atomic<int64_t> nextFetchTimestamp{0}; // What frame to read next (protected for GetSeqTooLate)
    std::atomic<double> minDelay{6};
--- a/controller/protocol/Bandwidth.cpp
+++ b/controller/protocol/Bandwidth.cpp
@ -2,7 +2,6 @@

 using namespace tgvoip;

-
 #pragma mark - Bandwidth management

 double VoIPController::GetAverageRTT()
@ -117,21 +116,25 @@ void VoIPController::UpdateAudioBitrateLimit()
        if (dataSavingMode || dataSavingRequestedByPeer)
        {
            maxBitrate = maxAudioBitrateSaving;
+            LOGE("=============== SETTING BITRATE FROM BANDWIDTH CONTROLLER: saving %u ===============", initAudioBitrateSaving);
            encoder->SetBitrate(initAudioBitrateSaving);
        }
        else if (networkType == NET_TYPE_GPRS)
        {
            maxBitrate = maxAudioBitrateGPRS;
+            LOGE("=============== SETTING BITRATE FROM BANDWIDTH CONTROLLER: GPRS %u ===============", initAudioBitrateGPRS);
            encoder->SetBitrate(initAudioBitrateGPRS);
        }
        else if (networkType == NET_TYPE_EDGE)
        {
            maxBitrate = maxAudioBitrateEDGE;
+            LOGE("=============== SETTING BITRATE FROM BANDWIDTH CONTROLLER: EDGE %u ===============", initAudioBitrateEDGE);
            encoder->SetBitrate(initAudioBitrateEDGE);
        }
        else
        {
            maxBitrate = maxAudioBitrate;
+            LOGE("=============== SETTING BITRATE FROM BANDWIDTH CONTROLLER: NORMAL %u ===============", initAudioBitrate);
            encoder->SetBitrate(initAudioBitrate);
        }
        encoder->SetVadMode(dataSavingMode || dataSavingRequestedByPeer);
--- a/controller/protocol/Tick.cpp
+++ b/controller/protocol/Tick.cpp
@ -3,7 +3,6 @@

 using namespace tgvoip;

-
 void VoIPController::SetState(int state)
 {
    this->state = state;
@ -276,24 +275,39 @@ void VoIPController::UpdateAudioBitrate()
            SetState(STATE_FAILED);
        }

-        int act = conctl.GetBandwidthControlAction();
-        if (dynamic_cast<AudioPacketSender *>(GetStreamByType<OutgoingAudioStream>()->packetSender.get())->getShittyInternetMode())
+        auto *sender = dynamic_cast<AudioPacketSender *>(GetStreamByType<OutgoingAudioStream>()->packetSender.get());
+        double multiple = 1.0;
+        double resendCount = sender->getResendCount();
+        uint8_t ecLevel = sender->getExtraEcLevel();
+        if (ecLevel)
+        {
+            multiple *= ecLevel / 2;
+        }
+        if (resendCount)
+        {
+            multiple *= resendCount;
+        }
+
+        auto act = conctl.GetBandwidthControlAction(sender->getShittyInternetMode(), multiple);
+        if (act == CongestionControl::Min)
        {
            //encoder->SetBitrate(8000);
        }
-        else if (act == TGVOIP_CONCTL_ACT_DECREASE)
+        else if (act == CongestionControl::Decrease)
        {
-            LOGE("==== DECREASING BITRATE ======");
+            LOGE("=============== DECREASING BITRATE ===============");
            uint32_t bitrate = encoder->GetBitrate();
            if (bitrate > 8000)
                encoder->SetBitrate(bitrate < (minAudioBitrate + audioBitrateStepDecr) ? minAudioBitrate : (bitrate - audioBitrateStepDecr));
        }
-        else if (act == TGVOIP_CONCTL_ACT_INCREASE)
+        else if (act == CongestionControl::Increase)
        {
+            LOGE("=============== INCREASING BITRATE ===============");
            uint32_t bitrate = encoder->GetBitrate();
            if (bitrate < maxBitrate)
                encoder->SetBitrate(bitrate + audioBitrateStepIncr);
        }
+        LOGE("========= CURRENT BITRATE=%u =========", encoder->GetBitrate())

        if (state == STATE_ESTABLISHED && time - lastRecvPacketTime >= reconnectingTimeout)
        {