Research Wiki | Notes / TimingInterrupts

Accounting for Interrupts

May 15, 2014

Motivation

My system is set up for experiments. My kernel now has the RTPatch. I'm ready to take some basic benchmarks. But this is what I get for a simple round-trip test:

Histogram of round-trip times. X-Axis: Number of Cycles Y-Axis: Number of Occurrences

The top of the chart as been chopped. The large cluster to the left is just about how we thought the baseline time measurements should fall. The problem is that smaller bump to the far right. Why are a minority of round trip times so much larger?

The first suspect: interrupts.

Detecting / Timing Interrupts

The number of interrupts handled by each core is conveniently recorded, and can be checked with:
> cat /proc/interrupts

To get an estimate on how long it takes for an interrupt to be handled, from user-space, one can check the current time in a type loop. If the time that has elapsed is significantly longer than expected, an interrupt has likely occurred. This is what the following snippet does, using rdtsc (source):

  last_int_count = oldParse();
  rdtscll(start_time);
  rdtscll(last_time);

  while (next_result_index < result_count) {
    rdtscll(curr_time);

    results_time[next_result_index] = curr_time - last_time;
    if (results_time[next_result_index] > threshold) {
      next_result_index++;
    }

    last_time = curr_time;
  }

  rdtscll(end_time);
  curr_int_count = oldParse();

This code records an estimate of the number of interrupts that have occurred, and records the time each interrupt took. The function oldParse returns the total number of interrupts as reported by /proc/interrupts. The program is run as the highest priority real time process, is pinned to a cpu, and has all pages locked into memory. Thus, we are hopefully measuring only interrupts.

With this program tested and working, the next task is to actually examine the data. Everything is printed to the console (after all readings have been taken to ensure that we are not measuring how long printf takes), which I redirect into a file when I run.

While excel / calc are useful programs, they are tools much better suited to crunching this kind of data, such as the matplotlib library for python. I just used it to create some histograms, which is barely scratching the surface of its capabilities.

So, with a quick program such as this (source):

  import numpy as np
  import matplotlib.pyplot as plt

  data_file = open("./TimingTicks/HundredThousandInterruptions_new.txt", "r")

  print data_file.readline()

  x = []

  for index in range(0, 100000):
      x.append(int(data_file.readline()))

  plt.figure(figsize=(12, 5))

  n, bins, patches = plt.hist(x, 150, facecolor='g')

  plt.xlabel('Time in Cycles')
  plt.ylabel('Count')
  plt.grid(True)

  plt.show()

I get the plot shown below.

Histogram of interrupt times.

Clearly there is something wrong with matplotlib! That, or my data had a few outliers that I hadn't noticed before. Let's put the numbers into perspective: The loop in which rdtsc is being call takes 32 to 28 cycles to execute (about 10 nanoseconds on my system). The interrupt counting code used a threshold of 3100 cycles (about 1 microsecond). Out of 100,000 recorded interrupts, 99,998 of them took less than 70,000 cycles (about 22 microseconds). This could all possibly help explain that second bump in the first histogram.

But there are two outliers, which each clock in just over a whopping 4,500,000 cycles (about 1.5 milliseconds)! With these outliers removed, the histogram looks a lot better:

Histogram of interrupt times with outliers removed.

This is much more in line with the expected results. But what about the outliers? Clearly they must be accounted for.

Part 2

May 22, 2014

Moving On

By "clearly they must be accounted for" I actually mean, "I am not going to account for them." At least not for now.

In coming to this conclusion, I have made several attempts to track down the source of the latency, but am no closer to understanding the origin. Right now, my best guess is that there is a Non-Maskable Interrupt (NMI) that I'm just going to have to keep an eye out for. But the search itself was fruitful.

Kernel

The Linux kernel provides several debugging tools, including latency histograms for certain operations. Unfortunately the ones I looked at all showed latencies well under the 1.5 millisecond mark. However, this does imply that the latency is not from the kernel itself, but something outside of its control, such as a NMI.

I also tried a different kernel version / rt-patch set to help rule out a kernel bug / quirk. This did not resolve the situation either.

See RT Patch for additional notes.

Cyclictest

Through a rtwiki guide and an EMSOFT paper, I found out about Cyclictest; a standard tool used to measure latency in many systems.

> sudo apt-get install rt-tests
> cyclictest -a1 -t1 -n -p99 -b1000

TODO: Explain options, try Cyclictest with a round-robin scheduler.

System Management Mode

In particular, System Management Interrupts (SMI). A sub-type of NMIs. I stumbled upon these while reading a guide on building rt application on rtwiki. Unfortunately the document is a bit out of date.

The guide mentions possible problems with SMIs causing extended latencies, especially dealing with USB devices. However, as far as I could tell, the USB issues had no affects on my system. I even dug out an old ps2 mouse and keyboard to test this one.

In the course of researching my system's chipset (look for the ISA Bridge listed by lspci), I stumbled upon a number of possible latency sources, such as the Total Cost of Ownership (TCO) timer which is often used as a watchdog. It can do niffy things like restart a computer which has completely frozen (because it is handled by the BIOS, not the OS... which is why it is non-maskable). This led down a rabbit whole of reading my chipset's documentation (Attach:c200-chipset.pdf) and how to use setpci. Turns out the the TCO enable bit was already set to 0.

Hyper-Threading

Through this hunt, I did discover that my system has hyper-threading. With hyper-threading enabled, my previous results where completely bunk and unreliable.

The easiest way to disable hyper-threading is through BIOS settings if available. Another option is to individually turn off the CPUs which actually represent the second thread on a core. The terminology here gets confusing. A processor has 1 to many cores, and each core presents Linux with 1 (no hyper-threading) or 2 (hyper-threading) CPUs.

CPUs can be turned on and off by writing a 1 or a 0 to the file online, located in /sys/devices/system/cpu/cpu#/. For my 4 core system, with hyper threading, disable CPUs 4-7 to disable hyper-threading:
> echo 0 > /sys/devices/system/cpu/cpu4/online
> echo 0 > /sys/devices/system/cpu/cpu5/online
> echo 0 > /sys/devices/system/cpu/cpu6/online
> echo 0 > /sys/devices/system/cpu/cpu7/online

Now, the baseline measurement looks much improved (outliers > 1000000 cycles removed):

Histogram of interrupt times (no Hyper-threading).

There are still some outliers, but for the most part latency is well below what it was before.

The next step is to try and re-run the original round-trip measurements.

Round-Trip

Finally, this is starting to look reasonable! Compared to the original measurements, the round-trip times have on average increased, but the worst case execution times have decreased. This is what one would expect when moving to a real-time system.