1
0
mirror of https://git.FreeBSD.org/src.git synced 2024-12-15 10:17:20 +00:00

Import an implementation of the CAIA Delay-Gradient (CDG) congestion control

algorithm, which is based on the 2011 v0.1 patch release and described in the
paper "Revisiting TCP Congestion Control using Delay Gradients" by David Hayes
and Grenville Armitage. It is implemented as a kernel module compatible with the
modular congestion control framework.

CDG is a hybrid congestion control algorithm which reacts to both packet loss
and inferred queuing delay. It attempts to operate as a delay-based algorithm
where possible, but utilises heuristics to detect loss-based TCP cross traffic
and will compete effectively as required. CDG is therefore incrementally
deployable and suitable for use on shared networks.

In collaboration with:	David Hayes <david.hayes at ieee.org> and
		Grenville Armitage <garmitage at swin edu au>
MFC after:	4 days
Sponsored by:	Cisco University Research Program and FreeBSD Foundation
This commit is contained in:
Lawrence Stewart 2013-07-02 08:44:56 +00:00
parent b206d4e77f
commit 92a0637f73
Notes: svn2git 2020-12-20 02:59:44 +00:00
svn path=/head/; revision=252504
5 changed files with 862 additions and 1 deletions

View File

@ -81,6 +81,7 @@ MAN= aac.4 \
cardbus.4 \
carp.4 \
cas.4 \
cc_cdg.4 \
cc_chd.4 \
cc_cubic.4 \
cc_hd.4 \

155
share/man/man4/cc_cdg.4 Normal file
View File

@ -0,0 +1,155 @@
.\"
.\" Copyright (c) 2013 Swinburne University of Technology, Melbourne, Australia
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd July 2, 2013
.Dt CC_CDG 4
.Os
.Sh NAME
.Nm cc_cdg
.Nd CDG Congestion Control Algorithm
.Sh DESCRIPTION
CAIA-Delay Gradient (CDG) is a hybrid congestion control algorithm which reacts
to both packet loss and inferred queuing delay.
It attempts to operate as a delay-based algorithm where possible, but utilises
heuristics to detect loss-based TCP cross traffic and will compete effectively
as required.
CDG is therefore incrementally deployable and suitable for use on shared
networks.
.Pp
During delay-based operation, CDG uses a delay-gradient based probabilistic
backoff mechanism, and will also try to infer non congestion related
packet losses and avoid backing off when they occur.
During loss-based operation, CDG essentially reverts to
.Xr cc_newreno 4 Ns - Ns like
behaviour.
.Pp
CDG switches to loss-based operation when it detects that a configurable number
of consecutive delay-based backoffs have had no measurable effect.
It periodically attempts to return to delay-based operation, but will keep
switching back to loss-based operation as required.
.Sh MIB Variables
The algorithm exposes the following variables in the
.Va net.inet.tcp.cc.cdg
branch of the
.Xr sysctl 3
MIB:
.Bl -tag -width ".Va exp_backoff_scale"
.It Va version
Current algorithm/implementation version number.
.It Va beta_delay
Delay-based window decrease factor as a percentage (on delay-based backoff, w =
w * beta_delay / 100).
Default is 70.
.It Va beta_loss
Loss-based window decrease factor as a percentage (on loss-based backoff, w =
w * beta_loss / 100).
Default is 50.
.It Va exp_backoff_scale
Scaling parameter for the probabilistic exponential backoff.
Default is 2.
.It Va smoothing_factor
Number of samples used for moving average smoothing (0 means no smoothing).
Default is 8.
.It Va loss_compete_consec_cong
Number of consecutive delay-gradient based congestion episodes which will
trigger loss-based CC compatibility.
Default is 5.
.It Va loss_compete_hold_backoff
Number of consecutive delay-gradient based congestion episodes to hold the
window backoff for loss-based CC compatibility.
Default is 5.
.It Va alpha_inc
If non-zero, this enables an experimental mode where CDG's window increase
factor (alpha) is increased by 1 MSS every
.Va alpha_inc
RTTs during congestion avoidance mode.
(Setting
.Va alpha_inc
to 1 results in the most aggressive growth of the window increase factor over
time.
Use higher
.Va alpha_inc
values for slower growth.)
Default is 0.
.El
.Sh SEE ALSO
.Xr cc_chd 4 ,
.Xr cc_cubic 4 ,
.Xr cc_hd 4 ,
.Xr cc_htcp 4 ,
.Xr cc_newreno 4 ,
.Xr cc_vegas 4 ,
.Xr h_ertt 4 ,
.Xr mod_cc 4 ,
.Xr tcp 4 ,
.Xr khelp 9 ,
.Xr mod_cc 9
.Rs
.%A "D. A. Hayes"
.%A "G. Armitage"
.%T "Revisiting TCP Congestion Control using Delay Gradients"
.%J "Networking 2011 Proceedings, Part II"
.%D "May 2011"
.%P "328-341"
.Re
.Rs
.%A "N. Khademi"
.%A "G. Armitage"
.%T "Minimising RTT across homogeneous 802.11 WLANs with CAIA Delay-Gradient TCP (v0.1)"
.%R "CAIA Technical Report 121113A"
.%D "November 2012"
.%U "http://caia.swin.edu.au/reports/121113A/CAIA-TR-121113A.pdf"
.Re
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and The Cisco University Research Program Fund, a
corporate advised fund of Silicon Valley Community Foundation.
.Sh HISTORY
The
.Nm
congestion control module first appeared in
.Fx 9.2 .
.Pp
The module was first released in 2011 by David Hayes whilst working on the
NewTCP research project at Swinburne University of Technology's Centre for
Advanced Internet Architectures, Melbourne, Australia.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
congestion control module was written by
.An David Hayes Aq david.hayes@ieee.org .
This manual page was written by
.An Lawrence Stewart Aq lstewart@FreeBSD.org
and
.An Grenville Armitage Aq garmitage@swin.edu.au .
.Sh BUGS
The underlying algorithm and parameter values are still a work in progress and
may not be optimal for some network scenarios.

View File

@ -1,6 +1,7 @@
# $FreeBSD$
SUBDIR= cc_chd \
SUBDIR= cc_cdg \
cc_chd \
cc_cubic \
cc_hd \
cc_htcp \

View File

@ -0,0 +1,9 @@
# $FreeBSD$
.include <bsd.own.mk>
.PATH: ${.CURDIR}/../../../netinet/cc
KMOD= cc_cdg
SRCS= cc_cdg.c
.include <bsd.kmod.mk>

695
sys/netinet/cc/cc_cdg.c Normal file
View File

@ -0,0 +1,695 @@
/*-
* Copyright (c) 2009-2013
* Swinburne University of Technology, Melbourne, Australia
* All rights reserved.
*
* This software was developed at the Centre for Advanced Internet
* Architectures, Swinburne University of Technology, by David Hayes, made
* possible in part by a gift from The Cisco University Research Program Fund,
* a corporate advised fund of Silicon Valley Community Foundation. Development
* and testing were further assisted by a grant from the FreeBSD Foundation.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
/*
* CAIA Delay-Gradient (CDG) congestion control algorithm
*
* An implemention of the delay-gradient congestion control algorithm proposed
* in the following paper:
*
* D. A. Hayes and G. Armitage, "Revisiting TCP Congestion Control using Delay
* Gradients", in IFIP Networking, Valencia, Spain, 9-13 May 2011.
*
* Developed as part of the NewTCP research project at Swinburne University of
* Technology's Centre for Advanced Internet Architectures, Melbourne,
* Australia. More details are available at:
* http://caia.swin.edu.au/urp/newtcp/
*/
#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");
#include <sys/param.h>
#include <sys/hhook.h>
#include <sys/kernel.h>
#include <sys/khelp.h>
#include <sys/limits.h>
#include <sys/lock.h>
#include <sys/malloc.h>
#include <sys/module.h>
#include <sys/queue.h>
#include <sys/socket.h>
#include <sys/socketvar.h>
#include <sys/sysctl.h>
#include <sys/systm.h>
#include <net/if.h>
#include <net/vnet.h>
#include <netinet/cc.h>
#include <netinet/tcp_seq.h>
#include <netinet/tcp_timer.h>
#include <netinet/tcp_var.h>
#include <netinet/cc/cc_module.h>
#include <netinet/khelp/h_ertt.h>
#include <vm/uma.h>
#define CDG_VERSION "0.1"
#define CAST_PTR_INT(X) (*((int*)(X)))
#ifndef VIMAGE
#define vnet_sysctl_handle_uint(oidp, arg1, arg2, req) \
sysctl_handle_int(oidp, arg1, arg2, req)
#endif
/* Private delay-gradient induced congestion control signal. */
#define CC_CDG_DELAY 0x01000000
/* NewReno window deflation factor on loss (as a percentage). */
#define RENO_BETA 50
/* Queue states. */
#define CDG_Q_EMPTY 1
#define CDG_Q_RISING 2
#define CDG_Q_FALLING 3
#define CDG_Q_FULL 4
#define CDG_Q_UNKNOWN 9999
/* Number of bit shifts used in probexp lookup table. */
#define EXP_PREC 15
/* Largest gradient represented in probexp lookup table. */
#define MAXGRAD 5
/*
* Delay Precision Enhance - number of bit shifts used for qtrend related
* integer arithmetic precision.
*/
#define D_P_E 7
struct qdiff_sample {
long qdiff;
STAILQ_ENTRY(qdiff_sample) qdiff_lnk;
};
struct cdg {
long max_qtrend;
long min_qtrend;
STAILQ_HEAD(minrtts_head, qdiff_sample) qdiffmin_q;
STAILQ_HEAD(maxrtts_head, qdiff_sample) qdiffmax_q;
long window_incr;
/* rttcount for window increase when in congestion avoidance */
long rtt_count;
/* maximum measured rtt within an rtt period */
int maxrtt_in_rtt;
/* maximum measured rtt within prev rtt period */
int maxrtt_in_prevrtt;
/* minimum measured rtt within an rtt period */
int minrtt_in_rtt;
/* minimum measured rtt within prev rtt period */
int minrtt_in_prevrtt;
/* consecutive congestion episode counter */
uint32_t consec_cong_cnt;
/* when tracking a new reno type loss window */
uint32_t shadow_w;
/* maximum number of samples in the moving average queue */
int sample_q_size;
/* number of samples in the moving average queue */
int num_samples;
/* estimate of the queue state of the path */
int queue_state;
};
/*
* Lookup table for:
* (1 - exp(-x)) << EXP_PREC, where x = [0,MAXGRAD] in 2^-7 increments
*
* Note: probexp[0] is set to 10 (not 0) as a safety for very low increase
* gradients.
*/
static const int probexp[641] = {
10,255,508,759,1008,1255,1501,1744,1985,2225,2463,2698,2932,3165,3395,3624,
3850,4075,4299,4520,4740,4958,5175,5389,5602,5814,6024,6232,6438,6643,6846,
7048,7248,7447,7644,7839,8033,8226,8417,8606,8794,8981,9166,9350,9532,9713,
9892,10070,10247,10422,10596,10769,10940,11110,11278,11445,11611,11776,11939,
12101,12262,12422,12580,12737,12893,13048,13201,13354,13505,13655,13803,13951,
14097,14243,14387,14530,14672,14813,14952,15091,15229,15365,15500,15635,15768,
15900,16032,16162,16291,16419,16547,16673,16798,16922,17046,17168,17289,17410,
17529,17648,17766,17882,17998,18113,18227,18340,18453,18564,18675,18784,18893,
19001,19108,19215,19320,19425,19529,19632,19734,19835,19936,20036,20135,20233,
20331,20427,20523,20619,20713,20807,20900,20993,21084,21175,21265,21355,21444,
21532,21619,21706,21792,21878,21962,22046,22130,22213,22295,22376,22457,22537,
22617,22696,22774,22852,22929,23006,23082,23157,23232,23306,23380,23453,23525,
23597,23669,23739,23810,23879,23949,24017,24085,24153,24220,24286,24352,24418,
24483,24547,24611,24675,24738,24800,24862,24924,24985,25045,25106,25165,25224,
25283,25341,25399,25456,25513,25570,25626,25681,25737,25791,25846,25899,25953,
26006,26059,26111,26163,26214,26265,26316,26366,26416,26465,26514,26563,26611,
26659,26707,26754,26801,26847,26893,26939,26984,27029,27074,27118,27162,27206,
27249,27292,27335,27377,27419,27460,27502,27543,27583,27624,27664,27703,27743,
27782,27821,27859,27897,27935,27973,28010,28047,28084,28121,28157,28193,28228,
28263,28299,28333,28368,28402,28436,28470,28503,28536,28569,28602,28634,28667,
28699,28730,28762,28793,28824,28854,28885,28915,28945,28975,29004,29034,29063,
29092,29120,29149,29177,29205,29232,29260,29287,29314,29341,29368,29394,29421,
29447,29472,29498,29524,29549,29574,29599,29623,29648,29672,29696,29720,29744,
29767,29791,29814,29837,29860,29882,29905,29927,29949,29971,29993,30014,30036,
30057,30078,30099,30120,30141,30161,30181,30201,30221,30241,30261,30280,30300,
30319,30338,30357,30376,30394,30413,30431,30449,30467,30485,30503,30521,30538,
30555,30573,30590,30607,30624,30640,30657,30673,30690,30706,30722,30738,30753,
30769,30785,30800,30815,30831,30846,30861,30876,30890,30905,30919,30934,30948,
30962,30976,30990,31004,31018,31031,31045,31058,31072,31085,31098,31111,31124,
31137,31149,31162,31174,31187,31199,31211,31223,31235,31247,31259,31271,31283,
31294,31306,31317,31328,31339,31351,31362,31373,31383,31394,31405,31416,31426,
31436,31447,31457,31467,31477,31487,31497,31507,31517,31527,31537,31546,31556,
31565,31574,31584,31593,31602,31611,31620,31629,31638,31647,31655,31664,31673,
31681,31690,31698,31706,31715,31723,31731,31739,31747,31755,31763,31771,31778,
31786,31794,31801,31809,31816,31824,31831,31838,31846,31853,31860,31867,31874,
31881,31888,31895,31902,31908,31915,31922,31928,31935,31941,31948,31954,31960,
31967,31973,31979,31985,31991,31997,32003,32009,32015,32021,32027,32033,32038,
32044,32050,32055,32061,32066,32072,32077,32083,32088,32093,32098,32104,32109,
32114,32119,32124,32129,32134,32139,32144,32149,32154,32158,32163,32168,32173,
32177,32182,32186,32191,32195,32200,32204,32209,32213,32217,32222,32226,32230,
32234,32238,32242,32247,32251,32255,32259,32263,32267,32270,32274,32278,32282,
32286,32290,32293,32297,32301,32304,32308,32311,32315,32318,32322,32325,32329,
32332,32336,32339,32342,32346,32349,32352,32356,32359,32362,32365,32368,32371,
32374,32377,32381,32384,32387,32389,32392,32395,32398,32401,32404,32407,32410,
32412,32415,32418,32421,32423,32426,32429,32431,32434,32437,32439,32442,32444,
32447,32449,32452,32454,32457,32459,32461,32464,32466,32469,32471,32473,32476,
32478,32480,32482,32485,32487,32489,32491,32493,32495,32497,32500,32502,32504,
32506,32508,32510,32512,32514,32516,32518,32520,32522,32524,32526,32527,32529,
32531,32533,32535,32537,32538,32540,32542,32544,32545,32547};
static uma_zone_t qdiffsample_zone;
static MALLOC_DEFINE(M_CDG, "cdg data",
"Per connection data required for the CDG congestion control algorithm");
static int ertt_id;
static VNET_DEFINE(uint32_t, cdg_alpha_inc);
static VNET_DEFINE(uint32_t, cdg_beta_delay);
static VNET_DEFINE(uint32_t, cdg_beta_loss);
static VNET_DEFINE(uint32_t, cdg_smoothing_factor);
static VNET_DEFINE(uint32_t, cdg_exp_backoff_scale);
static VNET_DEFINE(uint32_t, cdg_consec_cong);
static VNET_DEFINE(uint32_t, cdg_hold_backoff);
#define V_cdg_alpha_inc VNET(cdg_alpha_inc)
#define V_cdg_beta_delay VNET(cdg_beta_delay)
#define V_cdg_beta_loss VNET(cdg_beta_loss)
#define V_cdg_smoothing_factor VNET(cdg_smoothing_factor)
#define V_cdg_exp_backoff_scale VNET(cdg_exp_backoff_scale)
#define V_cdg_consec_cong VNET(cdg_consec_cong)
#define V_cdg_hold_backoff VNET(cdg_hold_backoff)
/* Function prototypes. */
static int cdg_mod_init(void);
static void cdg_conn_init(struct cc_var *ccv);
static int cdg_cb_init(struct cc_var *ccv);
static void cdg_cb_destroy(struct cc_var *ccv);
static void cdg_cong_signal(struct cc_var *ccv, uint32_t signal_type);
static void cdg_ack_received(struct cc_var *ccv, uint16_t ack_type);
struct cc_algo cdg_cc_algo = {
.name = "cdg",
.mod_init = cdg_mod_init,
.ack_received = cdg_ack_received,
.cb_destroy = cdg_cb_destroy,
.cb_init = cdg_cb_init,
.conn_init = cdg_conn_init,
.cong_signal = cdg_cong_signal
};
/* Vnet created and being initialised. */
static void
cdg_init_vnet(const void *unused __unused)
{
V_cdg_alpha_inc = 0;
V_cdg_beta_delay = 70;
V_cdg_beta_loss = 50;
V_cdg_smoothing_factor = 8;
V_cdg_exp_backoff_scale = 3;
V_cdg_consec_cong = 5;
V_cdg_hold_backoff = 5;
}
static int
cdg_mod_init(void)
{
VNET_ITERATOR_DECL(v);
ertt_id = khelp_get_id("ertt");
if (ertt_id <= 0)
return (EINVAL);
qdiffsample_zone = uma_zcreate("cdg_qdiffsample",
sizeof(struct qdiff_sample), NULL, NULL, NULL, NULL, 0, 0);
VNET_LIST_RLOCK();
VNET_FOREACH(v) {
CURVNET_SET(v);
cdg_init_vnet(NULL);
CURVNET_RESTORE();
}
VNET_LIST_RUNLOCK();
cdg_cc_algo.post_recovery = newreno_cc_algo.post_recovery;
cdg_cc_algo.after_idle = newreno_cc_algo.after_idle;
return (0);
}
static int
cdg_cb_init(struct cc_var *ccv)
{
struct cdg *cdg_data;
cdg_data = malloc(sizeof(struct cdg), M_CDG, M_NOWAIT);
if (cdg_data == NULL)
return (ENOMEM);
cdg_data->shadow_w = 0;
cdg_data->max_qtrend = 0;
cdg_data->min_qtrend = 0;
cdg_data->queue_state = CDG_Q_UNKNOWN;
cdg_data->maxrtt_in_rtt = 0;
cdg_data->maxrtt_in_prevrtt = 0;
cdg_data->minrtt_in_rtt = INT_MAX;
cdg_data->minrtt_in_prevrtt = 0;
cdg_data->window_incr = 0;
cdg_data->rtt_count = 0;
cdg_data->consec_cong_cnt = 0;
cdg_data->sample_q_size = V_cdg_smoothing_factor;
cdg_data->num_samples = 0;
STAILQ_INIT(&cdg_data->qdiffmin_q);
STAILQ_INIT(&cdg_data->qdiffmax_q);
ccv->cc_data = cdg_data;
return (0);
}
static void
cdg_conn_init(struct cc_var *ccv)
{
struct cdg *cdg_data = ccv->cc_data;
/*
* Initialise the shadow_cwnd in case we are competing with loss based
* flows from the start
*/
cdg_data->shadow_w = CCV(ccv, snd_cwnd);
}
static void
cdg_cb_destroy(struct cc_var *ccv)
{
struct cdg *cdg_data;
struct qdiff_sample *qds, *qds_n;
cdg_data = ccv->cc_data;
qds = STAILQ_FIRST(&cdg_data->qdiffmin_q);
while (qds != NULL) {
qds_n = STAILQ_NEXT(qds, qdiff_lnk);
uma_zfree(qdiffsample_zone,qds);
qds = qds_n;
}
qds = STAILQ_FIRST(&cdg_data->qdiffmax_q);
while (qds != NULL) {
qds_n = STAILQ_NEXT(qds, qdiff_lnk);
uma_zfree(qdiffsample_zone,qds);
qds = qds_n;
}
free(ccv->cc_data, M_CDG);
}
static int
cdg_beta_handler(SYSCTL_HANDLER_ARGS)
{
if (req->newptr != NULL &&
(CAST_PTR_INT(req->newptr) == 0 || CAST_PTR_INT(req->newptr) > 100))
return (EINVAL);
return (vnet_sysctl_handle_uint(oidp, arg1, arg2, req));
}
static int
cdg_exp_backoff_scale_handler(SYSCTL_HANDLER_ARGS)
{
if (req->newptr != NULL && CAST_PTR_INT(req->newptr) < 1)
return (EINVAL);
return (vnet_sysctl_handle_uint(oidp, arg1, arg2, req));
}
static inline unsigned long
cdg_window_decrease(struct cc_var *ccv, unsigned long owin, unsigned int beta)
{
return ((ulmin(CCV(ccv, snd_wnd), owin) * beta) / 100);
}
/*
* Window increase function
* This window increase function is independent of the initial window size
* to ensure small window flows are not discriminated against (i.e. fairness).
* It increases at 1pkt/rtt like Reno for alpha_inc rtts, and then 2pkts/rtt for
* the next alpha_inc rtts, etc.
*/
static void
cdg_window_increase(struct cc_var *ccv, int new_measurement)
{
struct cdg *cdg_data;
int incr, s_w_incr;
cdg_data = ccv->cc_data;
incr = s_w_incr = 0;
if (CCV(ccv, snd_cwnd) <= CCV(ccv, snd_ssthresh)) {
/* Slow start. */
incr = CCV(ccv, t_maxseg);
s_w_incr = incr;
cdg_data->window_incr = cdg_data->rtt_count = 0;
} else {
/* Congestion avoidance. */
if (new_measurement) {
s_w_incr = CCV(ccv, t_maxseg);
if (V_cdg_alpha_inc == 0) {
incr = CCV(ccv, t_maxseg);
} else {
if (++cdg_data->rtt_count >= V_cdg_alpha_inc) {
cdg_data->window_incr++;
cdg_data->rtt_count = 0;
}
incr = CCV(ccv, t_maxseg) *
cdg_data->window_incr;
}
}
}
if (cdg_data->shadow_w > 0)
cdg_data->shadow_w = ulmin(cdg_data->shadow_w + s_w_incr,
TCP_MAXWIN << CCV(ccv, snd_scale));
CCV(ccv, snd_cwnd) = ulmin(CCV(ccv, snd_cwnd) + incr,
TCP_MAXWIN << CCV(ccv, snd_scale));
}
static void
cdg_cong_signal(struct cc_var *ccv, uint32_t signal_type)
{
struct cdg *cdg_data = ccv->cc_data;
switch(signal_type) {
case CC_CDG_DELAY:
CCV(ccv, snd_ssthresh) = cdg_window_decrease(ccv,
CCV(ccv, snd_cwnd), V_cdg_beta_delay);
CCV(ccv, snd_cwnd) = CCV(ccv, snd_ssthresh);
CCV(ccv, snd_recover) = CCV(ccv, snd_max);
cdg_data->window_incr = cdg_data->rtt_count = 0;
ENTER_CONGRECOVERY(CCV(ccv, t_flags));
break;
case CC_NDUPACK:
/*
* If already responding to congestion OR we have guessed no
* queue in the path is full.
*/
if (IN_CONGRECOVERY(CCV(ccv, t_flags)) ||
cdg_data->queue_state < CDG_Q_FULL) {
CCV(ccv, snd_ssthresh) = CCV(ccv, snd_cwnd);
CCV(ccv, snd_recover) = CCV(ccv, snd_max);
} else {
/*
* Loss is likely to be congestion related. We have
* inferred a queue full state, so have shadow window
* react to loss as NewReno would.
*/
if (cdg_data->shadow_w > 0)
cdg_data->shadow_w = cdg_window_decrease(ccv,
cdg_data->shadow_w, RENO_BETA);
CCV(ccv, snd_ssthresh) = ulmax(cdg_data->shadow_w,
cdg_window_decrease(ccv, CCV(ccv, snd_cwnd),
V_cdg_beta_loss));
cdg_data->window_incr = cdg_data->rtt_count = 0;
}
ENTER_RECOVERY(CCV(ccv, t_flags));
break;
default:
newreno_cc_algo.cong_signal(ccv, signal_type);
break;
}
}
/*
* Using a negative exponential probabilistic backoff so that sources with
* varying RTTs which share the same link will, on average, have the same
* probability of backoff over time.
*
* Prob_backoff = 1 - exp(-qtrend / V_cdg_exp_backoff_scale), where
* V_cdg_exp_backoff_scale is the average qtrend for the exponential backoff.
*/
static inline int
prob_backoff(long qtrend)
{
int backoff, idx, p;
backoff = (qtrend > ((MAXGRAD * V_cdg_exp_backoff_scale) << D_P_E));
if (!backoff) {
if (V_cdg_exp_backoff_scale > 1)
idx = (qtrend + V_cdg_exp_backoff_scale / 2) /
V_cdg_exp_backoff_scale;
else
idx = qtrend;
/* Backoff probability proportional to rate of queue growth. */
p = (INT_MAX / (1 << EXP_PREC)) * probexp[idx];
backoff = (random() < p);
}
return (backoff);
}
static inline void
calc_moving_average(struct cdg *cdg_data, long qdiff_max, long qdiff_min)
{
struct qdiff_sample *qds;
++cdg_data->num_samples;
if (cdg_data->num_samples > cdg_data->sample_q_size) {
/* Minimum RTT. */
qds = STAILQ_FIRST(&cdg_data->qdiffmin_q);
cdg_data->min_qtrend = cdg_data->min_qtrend +
(qdiff_min - qds->qdiff) / cdg_data->sample_q_size;
STAILQ_REMOVE_HEAD(&cdg_data->qdiffmin_q, qdiff_lnk);
qds->qdiff = qdiff_min;
STAILQ_INSERT_TAIL(&cdg_data->qdiffmin_q, qds, qdiff_lnk);
/* Maximum RTT. */
qds = STAILQ_FIRST(&cdg_data->qdiffmax_q);
cdg_data->max_qtrend = cdg_data->max_qtrend +
(qdiff_max - qds->qdiff) / cdg_data->sample_q_size;
STAILQ_REMOVE_HEAD(&cdg_data->qdiffmax_q, qdiff_lnk);
qds->qdiff = qdiff_max;
STAILQ_INSERT_TAIL(&cdg_data->qdiffmax_q, qds, qdiff_lnk);
--cdg_data->num_samples;
} else {
qds = uma_zalloc(qdiffsample_zone, M_NOWAIT);
if (qds != NULL) {
cdg_data->min_qtrend = cdg_data->min_qtrend +
qdiff_min / cdg_data->sample_q_size;
qds->qdiff = qdiff_min;
STAILQ_INSERT_TAIL(&cdg_data->qdiffmin_q, qds,
qdiff_lnk);
}
qds = uma_zalloc(qdiffsample_zone, M_NOWAIT);
if (qds) {
cdg_data->max_qtrend = cdg_data->max_qtrend +
qdiff_max / cdg_data->sample_q_size;
qds->qdiff = qdiff_max;
STAILQ_INSERT_TAIL(&cdg_data->qdiffmax_q, qds,
qdiff_lnk);
}
}
}
static void
cdg_ack_received(struct cc_var *ccv, uint16_t ack_type)
{
struct cdg *cdg_data;
struct ertt *e_t;
long qdiff_max, qdiff_min;
int congestion, new_measurement, slowstart;
cdg_data = ccv->cc_data;
e_t = (struct ertt *)khelp_get_osd(CCV(ccv, osd), ertt_id);
new_measurement = e_t->flags & ERTT_NEW_MEASUREMENT;
congestion = 0;
cdg_data->maxrtt_in_rtt = imax(e_t->rtt, cdg_data->maxrtt_in_rtt);
cdg_data->minrtt_in_rtt = imin(e_t->rtt, cdg_data->minrtt_in_rtt);
if (new_measurement) {
slowstart = (CCV(ccv, snd_cwnd) <= CCV(ccv, snd_ssthresh));
/*
* Update smoothed gradient measurements. Since we are only
* using one measurement per RTT, use max or min rtt_in_rtt.
* This is also less noisy than a sample RTT measurement. Max
* RTT measurements can have trouble due to OS issues.
*/
if (cdg_data->maxrtt_in_prevrtt) {
qdiff_max = ((long)(cdg_data->maxrtt_in_rtt -
cdg_data->maxrtt_in_prevrtt) << D_P_E );
qdiff_min = ((long)(cdg_data->minrtt_in_rtt -
cdg_data->minrtt_in_prevrtt) << D_P_E );
calc_moving_average(cdg_data, qdiff_max, qdiff_min);
/* Probabilistic backoff with respect to gradient. */
if (slowstart && qdiff_min > 0)
congestion = prob_backoff(qdiff_min);
else if (cdg_data->min_qtrend > 0)
congestion = prob_backoff(cdg_data->min_qtrend);
else if (slowstart && qdiff_max > 0)
congestion = prob_backoff(qdiff_max);
else if (cdg_data->max_qtrend > 0)
congestion = prob_backoff(cdg_data->max_qtrend);
/* Update estimate of queue state. */
if (cdg_data->min_qtrend > 0 &&
cdg_data->max_qtrend <= 0) {
cdg_data->queue_state = CDG_Q_FULL;
} else if (cdg_data->min_qtrend >= 0 &&
cdg_data->max_qtrend < 0) {
cdg_data->queue_state = CDG_Q_EMPTY;
cdg_data->shadow_w = 0;
} else if (cdg_data->min_qtrend > 0 &&
cdg_data->max_qtrend > 0) {
cdg_data->queue_state = CDG_Q_RISING;
} else if (cdg_data->min_qtrend < 0 &&
cdg_data->max_qtrend < 0) {
cdg_data->queue_state = CDG_Q_FALLING;
}
if (cdg_data->min_qtrend < 0 ||
cdg_data->max_qtrend < 0)
cdg_data->consec_cong_cnt = 0;
}
cdg_data->minrtt_in_prevrtt = cdg_data->minrtt_in_rtt;
cdg_data->minrtt_in_rtt = INT_MAX;
cdg_data->maxrtt_in_prevrtt = cdg_data->maxrtt_in_rtt;
cdg_data->maxrtt_in_rtt = 0;
e_t->flags &= ~ERTT_NEW_MEASUREMENT;
}
if (congestion) {
cdg_data->consec_cong_cnt++;
if (!IN_RECOVERY(CCV(ccv, t_flags))) {
if (cdg_data->consec_cong_cnt <= V_cdg_consec_cong)
cdg_cong_signal(ccv, CC_CDG_DELAY);
else
/*
* We have been backing off but the queue is not
* falling. Assume we are competing with
* loss-based flows and don't back off for the
* next V_cdg_hold_backoff RTT periods.
*/
if (cdg_data->consec_cong_cnt >=
V_cdg_consec_cong + V_cdg_hold_backoff)
cdg_data->consec_cong_cnt = 0;
/* Won't see effect until 2nd RTT. */
cdg_data->maxrtt_in_prevrtt = 0;
/*
* Resync shadow window in case we are competing with a
* loss based flow
*/
cdg_data->shadow_w = ulmax(CCV(ccv, snd_cwnd),
cdg_data->shadow_w);
}
} else if (ack_type == CC_ACK)
cdg_window_increase(ccv, new_measurement);
}
/* When a vnet is created and being initialised, init the per-stack CDG vars. */
VNET_SYSINIT(cdg_init_vnet, SI_SUB_PROTO_BEGIN, SI_ORDER_FIRST,
cdg_init_vnet, NULL);
SYSCTL_DECL(_net_inet_tcp_cc_cdg);
SYSCTL_NODE(_net_inet_tcp_cc, OID_AUTO, cdg, CTLFLAG_RW, NULL,
"CAIA delay-gradient congestion control related settings");
SYSCTL_STRING(_net_inet_tcp_cc_cdg, OID_AUTO, version,
CTLFLAG_RD, CDG_VERSION, sizeof(CDG_VERSION) - 1,
"Current algorithm/implementation version number");
SYSCTL_VNET_UINT(_net_inet_tcp_cc_cdg, OID_AUTO, alpha_inc,
CTLFLAG_RW, &VNET_NAME(cdg_alpha_inc), 0,
"Increment the window increase factor alpha by 1 MSS segment every "
"alpha_inc RTTs during congestion avoidance mode.");
SYSCTL_VNET_PROC(_net_inet_tcp_cc_cdg, OID_AUTO, beta_delay,
CTLTYPE_UINT|CTLFLAG_RW, &VNET_NAME(cdg_beta_delay), 70,
&cdg_beta_handler, "IU",
"Delay-based window decrease factor as a percentage "
"(on delay-based backoff, w = w * beta_delay / 100)");
SYSCTL_VNET_PROC(_net_inet_tcp_cc_cdg, OID_AUTO, beta_loss,
CTLTYPE_UINT|CTLFLAG_RW, &VNET_NAME(cdg_beta_loss), 50,
&cdg_beta_handler, "IU",
"Loss-based window decrease factor as a percentage "
"(on loss-based backoff, w = w * beta_loss / 100)");
SYSCTL_VNET_PROC(_net_inet_tcp_cc_cdg, OID_AUTO, exp_backoff_scale,
CTLTYPE_UINT|CTLFLAG_RW, &VNET_NAME(cdg_exp_backoff_scale), 2,
&cdg_exp_backoff_scale_handler, "IU",
"Scaling parameter for the probabilistic exponential backoff");
SYSCTL_VNET_UINT(_net_inet_tcp_cc_cdg, OID_AUTO, smoothing_factor,
CTLFLAG_RW, &VNET_NAME(cdg_smoothing_factor), 8,
"Number of samples used for moving average smoothing (0 = no smoothing)");
SYSCTL_VNET_UINT(_net_inet_tcp_cc_cdg, OID_AUTO, loss_compete_consec_cong,
CTLFLAG_RW, &VNET_NAME(cdg_consec_cong), 5,
"Number of consecutive delay-gradient based congestion episodes which will "
"trigger loss based CC compatibility");
SYSCTL_VNET_UINT(_net_inet_tcp_cc_cdg, OID_AUTO, loss_compete_hold_backoff,
CTLFLAG_RW, &VNET_NAME(cdg_hold_backoff), 5,
"Number of consecutive delay-gradient based congestion episodes to hold "
"the window backoff for loss based CC compatibility");
DECLARE_CC_MODULE(cdg, &cdg_cc_algo);
MODULE_DEPEND(cdg, ertt, 1, 1, 1);