Forcing Copies

April 25, 2014

The Problem

Many optimizations common to conventional computers sacrifice predictability and worst-case run time for improved average-case run time. Real-time systems, on the other hand, are concerned with making guarantees about worst-case execution times. One such optimization is copy-on-write, used by the fork() system call. When a process forks, the child starts with the same memory mappings as the parent, and the two initially share the underlying physical pages. Only when one of them writes to a shared page is that page copied into a private page for the writer. The copy happens inside a page fault, at whatever point the first write occurs, which is exactly the kind of unpredictable delay a real-time task cannot afford.

So, in my work on providing a POSIX-compliant real-time system with redundancy, I had to find a way around this optimization. This is a well-known issue, and is even alluded to in the mlockall() man page:

Real-time processes that are using mlockall() to prevent delays on page faults should reserve enough locked stack pages before entering the time-critical section, so that no page fault can be caused by function calls. This can be achieved by calling a function that allocates a sufficiently large automatic variable (an array) and writes to the memory occupied by this array in order to touch these stack pages. This way, enough pages will be mapped for the stack and can be locked into RAM. The dummy writes ensure that not even copy-on-write page faults can occur in the critical section.
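The stack-touching trick from the man page can be sketched roughly as follows. The function name prefault_stack and the 64 KiB budget MAX_SAFE_STACK are assumptions made for illustration; a real application would size the array to cover its deepest call chain.

```c
#include <stddef.h>
#include <unistd.h>

/* Assumed stack budget for the time-critical section (illustrative only). */
#define MAX_SAFE_STACK (64 * 1024)

/* Hypothetical helper: write to one byte in every page of a large automatic
 * array so those stack pages are mapped before the critical section starts.
 * Returns the number of pages touched. */
static size_t prefault_stack(void)
{
    /* volatile keeps the compiler from discarding the dummy writes */
    volatile unsigned char dummy[MAX_SAFE_STACK];
    long page = sysconf(_SC_PAGESIZE);
    size_t touched = 0;

    if (page <= 0)
        page = 4096; /* fallback guess if sysconf() fails */

    for (size_t i = 0; i < sizeof dummy; i += (size_t)page) {
        dummy[i] = 0;
        touched++;
    }
    dummy[sizeof dummy - 1] = 0; /* make sure the last page is covered too */
    return touched;
}
```

Calling prefault_stack() once, after locking memory and before the time-critical section, is the usual pattern.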

The Solution

Building from the quoted man page entry, the solution seems straightforward: after forking, the child should perform dummy writes to all of its memory, so that any memory marked copy-on-write will be copied. Furthermore, we can ensure that the memory remains paged in if we also make a call to mlockall() before we walk the memory. Passing MCL_CURRENT | MCL_FUTURE covers both the mappings that already exist and any created later; on Linux, MCL_FUTURE on its own applies only to mappings created after the call.
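A minimal sketch of that ordering, assuming a hypothetical helper named fork_locked_child. Memory locks are not inherited across fork(), so the child has to call mlockall() itself. The MCL_CURRENT | MCL_FUTURE flag combination is an assumption chosen so that already-mapped pages are covered as well, and the call can fail if RLIMIT_MEMLOCK is too low, so the sketch reports that case instead of ignoring it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, then have the child lock its memory before it
 * does anything else. Returns the child's exit status (0 = locked,
 * 1 = mlockall() failed) or -1 if fork()/waitpid() failed. */
static int fork_locked_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Locks held by the parent do not carry over, so ask again here. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
            fprintf(stderr, "mlockall: %s\n", strerror(errno));
            _exit(1);
        }
        /* ... walk the writable mappings here, then start real-time work ... */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```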

So, how can our child know what memory it needs to walk? Unfortunately, I was not able to find a POSIX-supported way for a process to learn about its own memory mappings. Linux, however, maintains a pseudo-file for every process that lists its mappings: /proc/self/maps (or /proc/<pid>/maps). It was then just a matter of parsing this file line by line and, for any mapping marked readable and writable, performing the dummy write, which ends up looking something like this:

  // stride is the base page size, from sysconf() in <unistd.h>
  long page_size = sysconf(_SC_PAGESIZE);
  if (readable && writable) {
    current_address = start;
    while (current_address < end) {
      // read a byte, write a byte; volatile stops the compiler from
      // optimizing the redundant store away
      volatile char *p = (volatile char *)current_address;
      single_byte = *p;
      *p = single_byte;
      // increase address by page size
      // TODO: what about huge pages?
      current_address += page_size;
    }
  }

The stride comes from sysconf(_SC_PAGESIZE) rather than a hard-coded constant, since page sizes vary with architecture. As the remaining TODO notes, huge pages may also need to be accounted for.
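The parsing step described above can be sketched in full as follows. The function name touch_writable_mappings is hypothetical, and the extra check for a private mapping (the 'p' in the fourth permission column) is an added assumption: copy-on-write only applies to private mappings, so shared ones are skipped. The sketch also assumes a single-threaded process, since reading a byte and writing it back is not atomic.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: perform a dummy write to every page of every private,
 * readable and writable mapping listed in /proc/self/maps.
 * Returns the number of pages touched, or -1 if the file cannot be opened. */
static long touch_writable_mappings(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096; /* fallback guess if sysconf() fails */

    long touched = 0;
    char line[8192]; /* generous: each line may end with a long path name */

    while (fgets(line, sizeof line, maps)) {
        unsigned long start, end;
        char perms[5];

        /* Each line starts with "start-end perms", e.g. "00400000-0040b000 rw-p" */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;
        if (perms[0] != 'r' || perms[1] != 'w' || perms[3] != 'p')
            continue;

        for (unsigned long addr = start; addr < end; addr += (unsigned long)page) {
            volatile char *p = (volatile char *)addr;
            *p = *p; /* read a byte, write the same byte back */
            touched++;
        }
    }

    fclose(maps);
    return touched;
}
```

In the scheme described here, the child would call this once right after fork() and mlockall(), before entering its time-critical section.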

Verification

To actually test this code, I wanted to see whether the virtual addresses were being updated to map to different physical addresses. The virtual-to-physical mapping is available in /proc/self/pagemap, but it is not the most straightforward file to parse. Luckily, EQWare has a tutorial on the topic with available source code here (author not listed).
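For reference, a lookup in /proc/self/pagemap can be sketched as below. This is not the EQWare program, just a minimal illustration based on the kernel's documented layout: one 64-bit entry per virtual page, with bit 63 meaning "present in RAM" and bits 0-54 holding the page frame number. The helper name pagemap_entry and the macro names are hypothetical. Note that newer kernels (4.0 and later) hide the frame number from processes without CAP_SYS_ADMIN, so comparing physical addresses generally requires root there.

```c
#define _FILE_OFFSET_BITS 64 /* pagemap offsets can exceed 32 bits */
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT  (1ULL << 63)       /* bit 63: page is present in RAM */
#define PM_PFN_MASK ((1ULL << 55) - 1) /* bits 0-54: page frame number */

/* Hypothetical helper: read the pagemap entry covering vaddr in this process.
 * Returns 0 and fills *entry on success, -1 on any error. */
static int pagemap_entry(const void *vaddr, uint64_t *entry)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;

    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off_t offset = (off_t)((uintptr_t)vaddr / (uintptr_t)page)
                   * (off_t)sizeof(uint64_t);
    ssize_t n = pread(fd, entry, sizeof *entry, offset);
    close(fd);

    return n == (ssize_t)sizeof *entry ? 0 : -1;
}
```

With sufficient privileges, (entry & PM_PFN_MASK) gives the frame number to compare between parent and child for the same virtual address.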

With this program, I was able to run some simple tests. First I examined the results of a program that just calls fork(), to confirm that there were some virtual addresses in the parent that referred to the same physical address as the corresponding virtual address in the child. These are the copy-on-write pages that have not yet been copied. I then used mlockall() and the dummy write trick in the child, and confirmed that the virtual addresses did in fact map to different physical addresses, indicating a copy had occurred.
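Where physical addresses are not readable, a different and weaker check is to count minor page faults instead. This is a swapped-in technique, not the pagemap comparison described above. The sketch below (helper names cow_fault_counts and minor_faults and the constant NPAGES are all hypothetical) forks, has the child write one byte per page over a buffer twice, and reports the minor-fault count of each pass. On a typical Linux setup one would expect the first pass to take the copy-on-write faults and the second pass to take few or none, though the exact numbers depend on the kernel and on transparent huge pages.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPAGES 64 /* illustrative buffer size, in pages */

/* Minor (no disk I/O) page faults taken so far by the calling process. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return 0;
    return ru.ru_minflt;
}

/* Hypothetical helper: counts[0] and counts[1] receive the child's minor
 * faults for the first and second write pass. Returns 0 on success. */
static int cow_fault_counts(long counts[2])
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)NPAGES * (size_t)page;
    char *buf = malloc(len);
    int fds[2];

    if (!buf || pipe(fds) != 0) {
        free(buf);
        return -1;
    }
    memset(buf, 1, len); /* populate in the parent so fork() shares the pages */

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        free(buf);
        return -1;
    }
    if (pid == 0) {
        long out[2];
        for (int pass = 0; pass < 2; pass++) {
            long before = minor_faults();
            for (size_t i = 0; i < len; i += (size_t)page)
                ((volatile char *)buf)[i] = 2; /* dummy write, one per page */
            out[pass] = minor_faults() - before;
        }
        ssize_t w = write(fds[1], out, sizeof out);
        _exit(w == (ssize_t)sizeof out ? 0 : 1);
    }

    close(fds[1]);
    ssize_t n = read(fds[0], counts, 2 * sizeof counts[0]);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    free(buf);
    return n == (ssize_t)(2 * sizeof counts[0]) ? 0 : -1;
}
```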