A friend of mine is enrolled in a college intro to programming course. The course had a very simple entrance test: write a program in any language that displays the numbers 5-60, each prefixed with “number ”, like so:
|
|
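For reference, the task itself fits in a few lines of a high-level language. This is only a minimal sketch of the assignment as described above, not anything my friend submitted; the function name entrance_test_lines is made up for illustration.

```python
def entrance_test_lines(start=5, stop=60):
    """Return the lines the entrance test asks for, inclusive of both ends."""
    # Hypothetical helper: one "number N" string per value in the range.
    return [f"number {n}" for n in range(start, stop + 1)]


if __name__ == "__main__":
    # Print one line per number, "number 5" through "number 60".
    print("\n".join(entrance_test_lines()))
```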
After he told me about this assignment, I thought that it was pretty funny, and I wanted to write it in assembly as a joke. I started off by stealing some integer printing code that I had written for another project.
|
|
After setting this up, I wrote a quick assembly program to iterate over the numbers 5-60 and print each one:
|
|
This was pretty simple, and I felt good about it, so I sent it to my friend. He responded with, “Unfortunately a requirement was that it be ‘significantly less than 55 lines’”. This could only be taken as a challenge, of course. He suggested that I hardcode a block of memory containing the numbers and text that I had to print, and I thought that was a great idea, so I got to work.
My first task was to write a throwaway Python script to generate this data. After some fiddling, I came up with this:
|
|
This program is pretty simple if you break it down. Let’s focus on the list comprehension first:
|
|
Inside the list comprehension, you can see the numbers we’re iterating over in range(5, 61). For every number in this range, we add a new string to the list:
|
|
The first chunk of this string, 6e756d62657220, is the hex encoding of the text prefixing the number (“number ”). Then, we convert the current number in the loop iteration to a string and generate the ASCII hex representation of each of its digits. For example, str(12).encode().hex() would return 3132, since the character 1 is 0x31 in ASCII and 2 is 0x32. You may have noticed the :0<4 at the end of the f-string. This fixes a bug that I discovered. I want each line to be the same length so that it’s super easy to print in assembly, but a single-digit number encodes to one byte (two hex digits), while a double-digit number encodes to two bytes (four hex digits). To solve this, I null-padded the numbers: if a number is only one byte long, I add a null byte after it. This null byte is not rendered by the terminal, so it shouldn’t interfere with how the display is formatted. For example, if we have the number 7 (0x37), this little part of the f-string will add 00 after it, giving us a string of 4 hex digits, 3700, which is the 2 bytes 0x37, 0x00. The only part remaining is the 0a at the end of the string, which is hex for a newline (\n). This covers the list comprehension of the Python. Next is the second step:
|
|
This part is pretty simple as well. We take each sequence of hex digits generated in the previous step (e.g. 3132) and convert it into a format that assembly can read. First, we use the textwrap module (a very strange use of this module, but I guess it works) to split the data into 2-digit chunks. For example, '3132' would be split into ['31', '32']. We then prepend 0x to each of these strings to tell the assembler that this is a hexadecimal number and not a base 10 number. The rest of the code on that line just joins these new numbers with commas, giving you a final result of 0x31, 0x32.
Next step was to write the program. I had a pretty strong mental image of how this should go:
I started writing, and after a while I came up with a very basic implementation. I had a counter that incremented until it was equal to the length of the data block, and when it was, it exited. However, I realized that I could do some math to replace the counter entirely, and this removed a few lines of code. Here’s what I ended up with:
|
|
As you can see, this implementation worked, and it was significantly shorter than the previous one due to its hardcoded nature. Line 2 initializes the data, line 8 loads the address of the data into rsi before we start the loop, and then we start printing. Lines 12-14 are for bounds checking. We load the address of numbers plus the length of that block of data into rdi, and then compare it with the address we’re currently reading from, rsi. If they’re equal, that means we’ve read all of the data, and we can exit. If not, we continue and load data into the appropriate registers in order to print 10 bytes from our current memory address. Think of it as taking 10 bytes at a time from our huge list of bytes. Then, we print these 10 bytes to the screen and end up with something like number 12. Then, we check if there are another 10 bytes available, and if there are, we continue doing this. However, I was really invested now and wanted to make it as short as possible, so I looked for ways to optimize it. Even when I remove line breaks and put labels on the same lines as code, I still end up with this:
|
|
This is 18 lines, which is better than 25 but I still felt like I could do even better. I kept analyzing, and then all of a sudden, I saw it in a way I hadn’t seen before, and I quickly moved some stuff around. Here’s the final product:
|
|
You may have noticed something a little different, which is the trailing null byte at the end of the data block. I realized that the first byte in a chunk of 10 will never be null unless I explicitly set it, since the first thing in each line is the letter “n” (0x6e). After adding this null byte, I could get rid of the numbers_len variable completely, along with all of the extra lines that came with it. The flow of the program is still mostly the same. Instead of bounds checking first, I print our 10 bytes first. Then, I increment our address by 10 and check if the first byte in this next chunk of 10 is 0x0. If it is not, then we’re not done and we jump back up to the printing loop. This little inversion of the check saves us a line of code because we don’t need an unconditional jmp printing_loop at the bottom of the loop; the jump only happens if we’re not finished. If we are finished, the program falls through past the jump to printing_loop, and we exit with status code 0. When we remove all blank lines and collapse labels, we end up with a final total of 15 lines, which is pretty good in my opinion. The full code for all of these files can be found below:
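Separately from those files, the two loop shapes described above can be modeled in Python as a sanity check. This is only a sketch and not part of the project: build_block, print_with_bounds_check, print_with_sentinel and CHUNK are hypothetical names, a Python index stands in for rsi, and collecting slices into a list stands in for the print step.

```python
CHUNK = 10  # bytes per line: "number " (7) + two digit bytes + "\n"


def build_block(start=5, stop=60):
    """Rebuild the hardcoded data block: fixed-width, null-padded lines."""
    block = b""
    for n in range(start, stop + 1):
        # Single-digit numbers get a trailing null byte so every line is 10 bytes.
        block += b"number " + str(n).encode().ljust(2, b"\x00") + b"\n"
    return block


def print_with_bounds_check(block):
    """First version: compare the read position with the end before each print."""
    printed, pos, end = [], 0, len(block)
    while pos != end:
        printed.append(block[pos:pos + CHUNK])
        pos += CHUNK
    return printed


def print_with_sentinel(block):
    """Final version: print first, then stop when the next chunk starts with 0x0."""
    data = block + b"\x00"  # trailing null byte replaces the stored length
    printed, pos = [], 0
    while True:
        printed.append(data[pos:pos + CHUNK])
        pos += CHUNK
        if data[pos] == 0:  # a real line always starts with "n" (0x6e)
            break
    return printed
```

Both functions produce the same chunks for the same data; the sentinel version just no longer needs to know the block's length, which is the bookkeeping that numbers_len used to carry.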