<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Wesley Wei]]></title><description><![CDATA[Sarcasm and Software]]></description><link>https://weiwesley.com/</link><image><url>http://weiwesley.com/favicon.png</url><title>Wesley Wei</title><link>https://weiwesley.com/</link></image><generator>Ghost 3.17</generator><lastBuildDate>Tue, 07 Apr 2026 11:04:34 GMT</lastBuildDate><atom:link href="https://weiwesley.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Read, Eval, Print: A First Look]]></title><description><![CDATA[REPL isn't good for much on its own. In fact, I'll argue that if it is good for anything on its own, then I've probably done something wrong design-wise.]]></description><link>https://weiwesley.com/read-eval-print/</link><guid isPermaLink="false">5b8617cd79413e729d21eeee</guid><category><![CDATA[REPL]]></category><category><![CDATA[Projects]]></category><dc:creator><![CDATA[Wesley Wei]]></dc:creator><pubDate>Tue, 04 Sep 2018 12:00:00 GMT</pubDate><content:encoded><![CDATA[<p>With some mental gymnastics, the read, eval, print pattern drives most of our interactions with computers. A couple more contortions, and it sort of describes how we interact with each other in day-to-day life.</p><p>Don't get me wrong: I'm not trying to make some deep statement about life or people or society. My point is that this is how we often start off thinking about programming. Tell the computer what to do, let it do it, and then have it spit out results for the world to see. 
From the computer's perspective, it <em>reads</em> the instructions you give, <em>evaluates</em> them to produce a result, and then <em>prints</em> that result for you to read.</p><p>This is easiest to see in any of the numerous languages that have an interactive interpreter, like Python. But I'm not here to talk to you about Python.</p><p>Okay, that's a lie. But the spotlight isn't on Python.</p><h1 id="the-worst-repl">The Worst REPL</h1><p>Today, I'm going to introduce what I rather unimaginatively call <a href="https://github.com/946336/The-Worst-REPL">The Worst REPL</a>, or REPL for short. REPL isn't really a general-purpose programming language like Python. In fact, it's intended to be used much more like a POSIX shell. The difference is that while a shell lends itself to running executables stored on the filesystem, REPL is much more adept at calling Python functions.</p><p>A terrible analogy that I came up with 15 seconds ago is that REPL is to Python as a shell is to the operating system. This is a wildly inaccurate analogy with more holes in it than a sponge, but hopefully it holds water about as well.</p><h1 id="explain-yourself">Explain Yourself</h1><p>Many who are comfortable working inside a terminal emulator will agree that some tasks are better suited to a terminal than to a graphical environment. REPL grew out of solving one such problem.</p><p>As one member of a three-person team, I was working on a project that involved client applications communicating with a remote server. Eventually it came time to test our server, and we were sunk. 
We had tested individual parts and done some integration testing as well, but we needed to test the whole package from the client's perspective too.</p><p>Here was the problem: the GUI that we had could display what we needed just fine; it just wasn't yet capable of driving interaction with the remote server.</p><p>Forget testing: we had no way for anyone to use the client <em>even if everything other than the GUI worked flawlessly</em>.</p><h1 id="enter-repl-s-ancestor">Enter REPL's Ancestor</h1><p>From the beginning, we had been entertaining the idea of adding some scripting support, and our design reflected that enough for an early draft of REPL to emerge. I won't go into too much detail about this ancestor of REPL, but I will say the following: it was interesting enough and powerful enough that it spawned REPL, but also designed so horribly that I threw out almost everything when I started work on REPL.</p><h1 id="motivation">Motivation</h1><p>In that terrible hack of an interactive application shell, I saw some things that worked well and many shortcomings that I wanted to fix. So when I set out to create REPL, I had the fuzzy goal of making it "better," whatever that meant.</p><p>REPL isn't good for much on its own. In fact, I'll argue that if it <em>is</em> good for anything on its own, then I've probably done something wrong design-wise.</p><p>See, REPL's whole shtick is that it's embeddable into other applications as a scriptable driver for testing. At its core, what REPL does is very, very simple: it forwards text arguments from the keyboard to functions written in Python. This allows users or developers to run Python code without worrying about too much syntax, and scripting support enables automation.</p><p>What struck me was that because execution was driven by a person hammering away on a keyboard, functions exposed through REPL were forced to serialize and deserialize any data that they exposed to the user. 
Since REPL's precursor was embedded in an application whose purpose was largely to make API calls over a text-based protocol, this forced serialization was immensely valuable because it made us think about API boundaries.</p><p>So to me, one of REPL's most fundamental uses is to force developers to think harder about whatever API they plan to inflict on themselves and on their users.</p><h1 id="what-now">What Now?</h1><p>REPL has a simple goal from my perspective, but it's vague, not terribly inspiring, and worst of all, not particularly useful: REPL won't make any design suggestions. Over time, REPL subtly changed direction and started to feel more and more like a shell (like Bash, for example). So while REPL has always been, and will always be, an embeddable shell for testing, it is now possible to write applications centered around REPL instead of the other way around.</p><p>Next time, I'll talk about what you need to know to plug your own functions into REPL, as well as what you'd need to know to build an application using REPL.</p>]]></content:encoded></item><item><title><![CDATA[The Wrong Pointer]]></title><description><![CDATA[GDB isn't perfect.]]></description><link>https://weiwesley.com/the-wrong-pointer/</link><guid isPermaLink="false">5a8f20a0deb89965a107b8ae</guid><category><![CDATA[C]]></category><dc:creator><![CDATA[Wesley Wei]]></dc:creator><pubDate>Wed, 21 Mar 2018 01:17:23 GMT</pubDate><content:encoded><![CDATA[<p>In which I get very confused.</p><p>A few months ago, I was working on a C exercise that involved coordinating a number of identical threads with the goal of minimizing runtime. 
In principle, it was a simple enough problem, but I somehow managed to produce a solution that successfully minimized runtime while logging egregiously incorrect timing data.</p><p>Without going into too much detail, the problem was presented as an aardvark-and-anthill simulation, so you'll have to pardon the names and nouns.</p><p>The simulation emitted timestamps on standard output, and the following output was what tipped me off that something strange was happening:</p><pre><code>00.002668 Aardvark A slurping Anthill 0
1519874814.003379 Aardvark B slurping Anthill 1
-7070059777.995059 Aardvark C slurping Anthill 2
-7070059777.542307 Aardvark D slurping Anthill 0
-7070064072.509315 Aardvark E slurping Anthill 1
</code></pre><p>These are times in seconds, relative to the time when the program was started. It was immediately clear that at most one of these timestamps could possibly be correct. On other runs, no reasonable timestamps were produced at all, and that run-to-run variation pointed to a race condition of some sort. However, the simulation performed exactly as expected in every other respect.</p><p>The timestamps were produced using <a href="http://man7.org/linux/man-pages/man2/gettimeofday.2.html"><code>gettimeofday</code></a>, and the two <code>struct timeval</code> variables were not only declared <code>static</code> at file scope, but also lived in a separate translation unit from the one I was responsible for implementing. The compiler shouldn't have allowed me to even refer to the timing structures, and a quick <code>grep</code> confirmed that I didn't.</p><p>At this point, I was less motivated than I should have been, because I could independently verify that my simulated aardvarks were coordinating correctly. They finished at the published target time of approximately fifty-six seconds. That said, I couldn't really let a bug this silly slip by in good faith.</p><p>Since this was C, there was a good chance that I was looking at a memory error. Since something was stomping over memory that I had no way of legally accessing, there was a <em>very</em> good chance that I was looking at a memory error.</p><pre><code>==7940== Memcheck, a memory error detector
==7940== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==7940== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==7940== Command: ./anthills
==7940==
1519876676.016039 Aardvark C slurping Anthill 1
1519876676.041669 Aardvark A slurping Anthill 0
-7070057915.958050 Aardvark D slurping Anthill 2
-7070062210.509055 Aardvark B slurping Anthill 0
...
==7822== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
</code></pre><p>But <code>valgrind</code>'s <code>memcheck</code> reported no memory errors, and the program didn't crash or show any other symptoms. This wasn't too unexpected. After all, I didn't have many references to heap memory floating around. In fact, I had no naked calls to <code>malloc</code> at all. Since this program used the <code>printf</code> family of functions and <code>pthread</code>s, there was undoubtedly some heap memory being used behind the scenes, but I wasn't managing any of it directly.</p><p>As it turns out, most of the memory accesses that occurred in code that I controlled happened to be in static memory, where <code>memcheck</code> doesn't check array bounds: an out-of-bounds access that lands in a neighboring global is still an access to valid, addressable memory, so it won't complain. <code>valgrind</code> wasn't going to be much help, so it was out.</p><p>At this point, I was still wondering how I could possibly have stomped over memory that I had no way of getting a pointer to, so on a whim I swapped compilers from <code>gcc</code> to <code>clang</code>.</p><pre><code>==8039== Memcheck, a memory error detector
==8039== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8039== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==8039== Command: ./anthills
==8039==
00.017880 Aardvark F slurping Anthill 2
00.041125 Aardvark B slurping Anthill 0
00.047397 Aardvark A slurping Anthill 0
00.042611 Aardvark H slurping Anthill 2
00.043128 Aardvark E slurping Anthill 1
00.045085 Aardvark G slurping Anthill 1
...
==8039== Invalid read of size 4
==8039==    at 0x4E428D9: pthread_join (pthread_join.c:45)
==8039==    by 0x401F87: main (anthills.c:390)
==8039==  Address 0x2000002d1 is not stack'd, malloc'd or (recently) free'd
==8039==
==8039==
==8039== Process terminating with default action of signal 11 (SIGSEGV)
==8039==  Access not within mapped region at address 0x2000002D1
==8039==    at 0x4E428D9: pthread_join (pthread_join.c:45)
==8039==    by 0x401F87: main (anthills.c:390)
</code></pre><p>Compiling with <code>clang</code> instead of <code>gcc</code> gave correct timing data, but caused the program to segfault upon completion. That wasn't what I had wanted to see, but it was still helpful. Given the relative simplicity of our code, it wasn't likely that I'd found a compiler bug. So I was almost certainly hitting undefined behavior that <code>gcc</code> and <code>clang</code> handled differently, and since just about the only things happening here were memory accesses through pointers, I had to be going out of bounds somewhere.</p><p><code>valgrind</code> is cognitively cheap, but in hindsight I probably should have reached for <code>gdb</code> first anyway. I took advantage of <code>gdb</code>'s ability to <a href="https://sourceware.org/gdb/onlinedocs/gdb/Set-Watchpoints.html">watch</a> a location in memory and interrupt the program whenever its value changes.</p><p>For a few reasons, I decided to stick with <code>gcc</code>. First and foremost among them was that a segfault upon completion seemed worse than incorrect timing data. More to the point, I knew the names of the memory locations associated with the timing data, but I didn't know which <code>pthread_t</code> was causing problems. Even if I did figure out which one it was, with over twenty threads, I had no guarantee that it would be the same one every time.</p><p>There were only two <code>timeval</code> structures: <code>start</code> and <code>now</code>. Since time proceeded forward normally between jumps, <code>now</code> was being updated correctly. <code>gettimeofday</code> gives time relative to the UNIX epoch, and the first bad timestamp was roughly the current time since the epoch, which is what you'd see if <code>start</code> had been reset to nearly zero. So I was fairly sure that <code>now</code> wasn't the problem, which left me with just <code>start</code> to worry about.</p><p>I fired up <code>gdb</code> and watched <code>start</code>.</p><pre><code>(gdb) p &amp;start
$1 = (struct timeval *) 0x6036d0 &lt;start&gt;
(gdb) watch (struct timeval) *0x6036d0
Hardware watchpoint 1: (struct timeval) *0x6036d0
</code></pre><p>The timing data is usually correct for the first couple of ticks, so, more or less as expected, I see the following a few times:</p><pre><code>Hardware watchpoint 1: (struct timeval) *0x6036d0

Old value = {tv_sec = 0, tv_usec = 0}
New value = {tv_sec = 1519952936, tv_usec = 0}
0x00007ffff7ffac6d in gettimeofday ()
</code></pre><p>I tell <code>gdb</code> to continue a few times, and then something more interesting crops up:</p><pre><code>Thread 4 "anthills" hit Hardware watchpoint 1: (struct timeval) *0x6036d0

Old value = {tv_sec = 1519952940, tv_usec = 882809}
New value = {tv_sec = 1, tv_usec = 882809}
actually_slurp (aardvark=66 'B', hill=1, order=1) at aardvarks.c:170
170     if (0 != sem_trywait(ants_remaining + hill)) return;
</code></pre><p>This is much more interesting, and the new value tells me that this is when things go wrong. Elapsed time is computed as <code>now - start</code>, so knocking <code>start.tv_sec</code> down to 1 makes the elapsed time jump to roughly the current UNIX time, just like the second line of the very first output. Once <code>start.tv_sec</code> gets clobbered with an even larger value, <code>now - start</code> goes extremely negative, which explains the rest of the timestamps.</p><p>That's progress, but something isn't right here. <code>sem_trywait</code> operates on <a href="http://man7.org/linux/man-pages/man7/sem_overview.7.html">semaphores</a>, and <code>ants_remaining</code> is very much an array of semaphores. Further, <code>hill</code> is guaranteed to be a valid index. So why is <code>sem_trywait</code> trampling over <code>start</code>?</p><p>I could try to watch the three semaphores that make up <code>ants_remaining</code> instead, but that's more memory than my hardware can watch at once, so that's not going to work.</p><p>I now know that <code>sem_trywait</code> is apparently stomping over the memory that holds <code>struct timeval start</code>, but I don't know why. What else can I get out of <code>gdb</code>?</p><p>I know I'm in a function, so I can inspect the call stack.</p><pre><code>(gdb) where
#0  actually_slurp (aardvark=66 'B', hill=1, order=1) at aardvarks.c:170
#1  0x000000000040102e in aardvark (input=0x7fffffffdd71) at aardvarks.c:271
...
</code></pre><p>If I ignore all of the extra code I used while I was still working out how to coordinate the aardvarks, <code>actually_slurp</code> boils down to the following:</p><pre><code>void actually_slurp(char aardvark, int hill, int order)
{
    if (hill == -1)
        hill = next_robin();
        last_robins[order - 9] = hill;

    if (0 != sem_trywait(ants_remaining + hill)) return;

    clock_in(hill, aardvark - 'A');

    slurp(aardvark, hill);
}
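</code></pre><p>Oh.</p><p>I'm an idiot: I've omitted the braces around the body of a conditional block. Despite the indentation, only <code>hill = next_robin();</code> is guarded by the <code>if</code>, and the assignment to <code>last_robins</code> runs on every call. That section was meant to read:</p><pre><code>    if (hill == -1) {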
        hill = next_robin();
        last_robins[order - 9] = hill;
    }
</code></pre><p>This fixes the timestamps, and the simulation no longer travels backwards into negative time.</p><p>As it turns out, the rest of the program works hard to guarantee that <code>hill</code> is only ever -1 when <code>order</code> is 9 or greater. This means that the condition <code>hill == -1</code> is an implicit bounds check for the assignment into <code>last_robins</code>.</p><p>But why would an out-of-bounds write into <code>last_robins</code> cause <code>sem_trywait</code> to clobber <code>start</code> when operating on <code>ants_remaining</code>?</p><p>I took a peek at the declarations of <code>last_robins</code> and <code>ants_remaining</code>.</p><pre><code>int last_robins[AARDVARKS % ANTHILLS];

// Keep at most 3 aardvarks on an anthill at once
static sem_t occupancy_per_hill[ANTHILLS];
static sem_t ants_remaining[ANTHILLS];
</code></pre><p>They're suspiciously close. <code>AARDVARKS</code> and <code>ANTHILLS</code> are 11 and 3, respectively, so <code>last_robins</code> has only 11 % 3 = 2 elements, and <code>order - 9</code> is a valid index only when <code>order</code> is 9 or 10. Any smaller <code>order</code> produces a negative index, so it's not much of a stretch to believe that going out of bounds in one of these arrays might land me inside one of the others.</p><p>I decided to check which memory the line that shouldn't have been running, <code>last_robins[order - 9] = hill;</code>, was actually writing to.</p><p>Recalling that <code>order</code> was 1 (and appeared to be consistently 1), <code>gdb</code> says the following:</p><pre><code>(gdb) p &amp;(last_robins[1 - 9])
$1 = (int *) 0x6036d0 &lt;start&gt;
(gdb) p &amp;start
$2 = (struct timeval *) 0x6036d0 &lt;start&gt;
</code></pre><p><em>Oh.</em></p><p>If we compile with <code>clang</code>, the same exercise reports:</p><pre><code>(gdb) p &amp;(last_robins[1 - 9])
$1 = (int *) 0x603840 &lt;threads+64&gt;
</code></pre><p><code>threads</code> turns out to be yet another static global array in a translation unit that my code can't see. It contains the <code>pthread_t</code> handles that <code>main</code> joins before the program terminates, so now I know why compiling with <code>clang</code> gave me an executable that behaved perfectly correctly until it tried to clean up after itself. C makes no promises about how separate global objects are laid out relative to one another, so the compiler and linker are free to arrange static memory however they like, and it turns out that <code>gcc</code> and <code>clang</code> chose different layouts. Because the out-of-bounds write is undefined behavior, what it ends up clobbering depends entirely on that layout.</p><p>I have just one last niggling concern: this isn't entirely consistent with my initial observation. <code>gdb</code> pointed squarely at the call to <code>sem_trywait</code>, but we can see here that it was in fact the previous line, <code>last_robins[order - 9] = hill;</code>, that made the erroneous write. I'm not too sure why this is the case, so I wanted to see whether <code>gdb</code> would consistently point at the line immediately following the one that writes to memory. To find out, I put the assignment back outside the braces and added an <code>fprintf</code> between it and <code>sem_trywait</code>:</p><pre><code>void actually_slurp(char aardvark, int hill, int order)
{
    if (hill == -1) {
        hill = next_robin();
    }
    last_robins[order - 9] = hill;

    fprintf(stderr, "Oh no\n");

    if (0 != sem_trywait(ants_remaining + hill)) return;

    clock_in(hill, aardvark - 'A');
    slurp(aardvark, hill);
}
</code></pre><p>The buggy code above yields the following:</p><pre><code>Thread 3 "anthills" hit Hardware watchpoint 1: (struct timeval) * 0x6036d0

Old value = {tv_sec = 1520234254, tv_usec = 914875}
New value = {tv_sec = 8589934593, tv_usec = 914875}
actually_slurp (aardvark=66 'B', hill=1, order=1) at aardvarks.c:165
165     fprintf(stderr, "Oh no\n");
(gdb) where
#0  actually_slurp (aardvark=66 'B', hill=1, order=1) at aardvarks.c:165
#1  0x000000000040104c in aardvark (input=0x7fffffffdcd1) at aardvarks.c:273
</code></pre><p><em>Well alright then.</em></p>]]></content:encoded></item></channel></rss>