
Break the RX processing up into smaller chunks of 128 frames each.

Right now, processing a full 512-frame queue takes quite a while (measured
on the order of milliseconds).  Because of this, the TX processing sometimes
ends up preempting the taskqueue:

* userland sends a frame
* it goes in through net80211 and out to ath_start()
* ath_start() will end up either direct dispatching or software queuing a
  frame.

If TX had to wait for RX to finish, it would add quite a few ms of
additional latency to the packet transmission.  This in the past has
caused issues with TCP throughput.
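
To make the contention concrete, here is a minimal taskqueue(9)-style
sketch (illustration only, not the actual ath(4) code; ath_tq, rx_task_fn,
tx_task_fn and example_setup are made-up names) of why a single long RX
pass on the shared taskqueue delays any TX task queued behind it:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/priority.h>
#include <sys/taskqueue.h>

static struct taskqueue *ath_tq;
static struct task rxtask, txtask;

/* A long 512-frame RX pass would run here. */
static void rx_task_fn(void *arg, int npending) { }
/* TX dispatch/completion work would run here. */
static void tx_task_fn(void *arg, int npending) { }

static void
example_setup(void)
{
        ath_tq = taskqueue_create("ath_taskq", M_NOWAIT,
            taskqueue_thread_enqueue, &ath_tq);
        taskqueue_start_threads(&ath_tq, 1, PI_NET, "ath taskq");
        TASK_INIT(&rxtask, 0, rx_task_fn, NULL);
        TASK_INIT(&txtask, 0, tx_task_fn, NULL);

        /* txtask only runs after rxtask's milliseconds-long pass ends. */
        taskqueue_enqueue(ath_tq, &rxtask);
        taskqueue_enqueue(ath_tq, &txtask);
}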

Now, as part of my attempt to bring sanity to the TX/RX paths, the first
step is to make the RX processing happen in smaller 'parts'.  That way,
when TX work is pushed into the ath taskqueue, it won't be stuck behind
so much RX latency.

The bigger scale change (which will come much later) is to process the
RX descriptors in the ath_intr taskqueue but process the completed
_frames_ in the ath driver taskqueue.  That would reduce the latency
between processing and requeuing new descriptors.  But that'll come later.
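
As a rough sketch of that eventual split (illustration only; struct
rx_frame, struct rx_split and the two task functions are hypothetical
names, not code from the driver), the interrupt-side task would only
harvest descriptors and refill the ring, deferring per-frame work to the
driver taskqueue:

#include <sys/param.h>
#include <sys/queue.h>
#include <sys/taskqueue.h>

struct rx_frame {
        TAILQ_ENTRY(rx_frame)   rf_list;
        /* mbuf pointer, RX status, etc. would live here */
};

struct rx_split {
        struct taskqueue        *tq;            /* driver taskqueue */
        struct task              frametask;     /* deferred frame work */
        TAILQ_HEAD(, rx_frame)   pending;       /* completed frames */
};

/* Runs from the ath_intr taskqueue: cheap, keeps the RX ring full. */
static void
rx_desc_task(void *arg, int npending)
{
        struct rx_split *rs = arg;

        /*
         * Walk the completed descriptors, link fresh buffers back onto
         * the hardware ring right away, and append the finished frames
         * to rs->pending (locking elided in this sketch).
         */

        /* Defer the slow per-frame work to the driver taskqueue. */
        taskqueue_enqueue(rs->tq, &rs->frametask);
}

/* Runs from the driver taskqueue: decrypt/decap, hand frames up. */
static void
rx_frame_task(void *arg, int npending)
{
        struct rx_split *rs = arg;
        struct rx_frame *rf;

        while ((rf = TAILQ_FIRST(&rs->pending)) != NULL) {
                TAILQ_REMOVE(&rs->pending, rf, rf_list);
                /* ... process one frame and pass it up to net80211 ... */
        }
}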

The actual work:

* Add ATH_RX_MAX, set to 128 (static for now);
* break out of the processing loop once npkts reaches ATH_RX_MAX;
* if we processed ATH_RX_MAX or more frames during the loop, immediately
  reschedule another RX taskqueue run to handle whatever frames are still
  waiting in the RX queue.
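
In condensed form the loop change looks roughly like this (a sketch of
the pattern only; more_rx_work() is a placeholder, and the real hunks
are in the diff below):

#define ATH_RX_MAX      128

static void
rx_proc_sketch(struct ath_softc *sc)
{
        int npkts = 0;

        do {
                /* Cap the batch so TX work gets a chance to run. */
                if (npkts >= ATH_RX_MAX)
                        break;
                /* ... pull one completed descriptor, process the frame ... */
                npkts++;
        } while (more_rx_work(sc));     /* placeholder loop condition */

        /* Hit the cap with frames left over: schedule another RX pass. */
        if (npkts >= ATH_RX_MAX)
                taskqueue_enqueue(sc->sc_tq, &sc->sc_rxtask);
}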

This should have very minimal impact on the general throughput case,
unless the scheduler is being very, very strange or the ath taskqueue
ends up spending a lot of time on non-RX operations (such as TX
completion).
Author: Adrian Chadd  2012-10-14 20:31:38 +00:00
Commit: 516f67965a  (parent: 9b233e2307)
Notes: svn2git 2020-12-20 02:59:44 +00:00
svn path=/head/; revision=241558


@@ -797,6 +797,8 @@ ath_rx_pkt(struct ath_softc *sc, struct ath_rx_status *rs, HAL_STATUS status,
         return (is_good);
 }
 
+#define ATH_RX_MAX      128
+
 static void
 ath_rx_proc(struct ath_softc *sc, int resched)
 {
@@ -832,6 +834,15 @@ ath_rx_proc(struct ath_softc *sc, int resched)
         sc->sc_stats.ast_rx_noise = nf;
         tsf = ath_hal_gettsf64(ah);
         do {
+                /*
+                 * Don't process too many packets at a time; give the
+                 * TX thread time to also run - otherwise the TX
+                 * latency can jump by quite a bit, causing throughput
+                 * degradation.
+                 */
+                if (npkts >= ATH_RX_MAX)
+                        break;
+
                 bf = TAILQ_FIRST(&sc->sc_rxbuf);
                 if (sc->sc_rxslink && bf == NULL) {     /* NB: shouldn't happen */
                         if_printf(ifp, "%s: no buffer!\n", __func__);
@@ -942,11 +953,22 @@ ath_rx_proc(struct ath_softc *sc, int resched)
         }
 #undef PA2DESC
 
+        /*
+         * If we hit the maximum number of frames in this round,
+         * reschedule for another immediate pass. This gives
+         * the TX and TX completion routines time to run, which
+         * will reduce latency.
+         */
+        if (npkts >= ATH_RX_MAX)
+                taskqueue_enqueue(sc->sc_tq, &sc->sc_rxtask);
+
         ATH_PCU_LOCK(sc);
         sc->sc_rxproc_cnt--;
         ATH_PCU_UNLOCK(sc);
 }
+#undef ATH_RX_MAX
 
 /*
  * Only run the RX proc if it's not already running.
  * Since this may get run as part of the reset/flush path,