SD-8516 Assembly Language Part II

Introduction to Part II

In Part I we learned some basics about the ISA (instruction set architecture) and the architecture of the CPU. In Part II we will wrap up the most important opcodes, and we will also lean into how they are used to get things done.

Lesson 9: The Stack

The stack is a concept held over from the early days when there were very few instructions available. If you consider a minimal ISA, you need instructions to load and store from memory, an instruction to compare, and so forth. In such a minimal architecture, loading and storing from memory has certain emergent properties. For example, if you have a list of things, their position in memory is not random, because you traverse that list by incrementing a counter such as a memory pointer. It is this way of doing things that we remember when we use the stack.

The stack is just a data structure. But it is so important and fundamental that it is baked into the instruction set of the CPU. This is a common theme: important things that people found they needed to do all the time became instructions. Even in many minimal instruction set (MISC) or reduced instruction set (RISC) designs you will find instructions like PUSH and POP, because they are some of the first things that were turned into instructions after fundamental operations like LOAD, STORE, AND and ADD.

The stack is an area of memory that you can PUSH values to and POP values from, in order. For example, you can PUSH the number 5 and the number 5 will be “on top” of the stack. Then you can POP it later. The stack is like an array, but you can only go forwards and backwards, and reading a value (a POP) removes it. This is a bit like how old magnetic-core memory worked, where reading a value also erased it.

Today, we would call the stack a LIFO buffer; a “Last-in, First-out” data structure. If I do this:

  PUSH 1
  PUSH 6
  PUSH 5

then three successive POPs will return 5, then 6, then 1 – the reverse of the order you PUSH'ed them.
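The same ordering can be modelled in a few lines of Python. This is only a sketch of the LIFO behaviour, with a Python list standing in for stack memory; it is not SD-8516 code.

```python
# A Python list stands in for the stack; the end of the list is the "top".
stack = []

# PUSH 1, PUSH 6, PUSH 5
for value in (1, 6, 5):
    stack.append(value)

# Three POPs return the values in the reverse order of the PUSHes.
popped = [stack.pop() for _ in range(3)]
print(popped)  # [5, 6, 1]
```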

General use

When you CALL or JSR (jump to subroutine) to a function, the CPU pushes the return address onto the stack. Then a subsequent RET or RTS (return from subroutine) will POP the return address back into IP (instruction pointer) or PC (program counter), so that the next instruction loaded will be the one after the original CALL.
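To make the CALL/RET mechanism concrete, here is a toy model in Python. The machine, its tuple-based program format, and the NOP opcode are all hypothetical and exist only for illustration; an "address" here is simply an index into the program list.

```python
def run(program):
    """Run a toy program and return the list of addresses executed."""
    pc = 0        # program counter: index of the next instruction
    stack = []    # return addresses are pushed here
    visited = []
    while True:
        op, arg = program[pc]
        visited.append(pc)
        if op == "CALL":
            stack.append(pc + 1)  # push the address of the instruction after CALL
            pc = arg              # jump to the subroutine
        elif op == "RET":
            pc = stack.pop()      # pop the return address back into pc
        elif op == "HALT":
            return visited
        else:                     # "NOP": do nothing, move to the next instruction
            pc += 1

program = [
    ("CALL", 3),     # 0: call the subroutine at address 3
    ("NOP", None),   # 1: execution resumes here after RET
    ("HALT", None),  # 2
    ("NOP", None),   # 3: subroutine body
    ("RET", None),   # 4: return to address 1
]
visited = run(program)
print(visited)  # [0, 3, 4, 1, 2]
```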

There are many uses for the stack but the most common is to temporarily save values. If you understand that you are 90% of the way there!

For interrupts, the CPU also pushes the registers and flags. You can do this manually if you want to save the registers on a function call. For example, if you call a function with a pointer to a string, the function might modify that pointer to find the end of the string (looking for a zero). That's what a strlen function does. So you PUSH the pointer register at the start and POP it at the end, to restore the register to what it was when the function was called. This way the code that calls strlen can then call strcpy without having to reload the string pointer.
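The save-and-restore pattern can be sketched in Python as follows. The register names PTR and COUNT are made up for this model and are not SD-8516 registers; the point is only that the pointer is clobbered inside the function and restored from the stack before returning.

```python
def strlen(memory, regs, stack):
    """Toy strlen: leaves the length in COUNT and preserves PTR."""
    stack.append(regs["PTR"])        # PUSH: save the caller's pointer
    regs["COUNT"] = 0
    while memory[regs["PTR"]] != 0:  # walk forward looking for the zero byte
        regs["COUNT"] += 1
        regs["PTR"] += 1             # this clobbers the pointer...
    regs["PTR"] = stack.pop()        # POP: ...so restore it before returning

memory = b"hello\x00"
regs = {"PTR": 0, "COUNT": 0}
stack = []
strlen(memory, regs, stack)
print(regs)  # {'PTR': 0, 'COUNT': 5}
```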

Another use is in ILs (intermediate languages). Many of them use RPN (Reverse Polish Notation) to evaluate any kind of math expression on the stack.

Here is how an ADD works. First, the interpreter pushes the two numbers and then pushes the add command. Later, the interpreter POPs the add command, which tells it to POP two numbers, add them, and push the result back onto the stack. Why? To make ADD independent, like a dispatcher for a mini CPU. Whatever comes next just POPs the result off the stack, so you can print it, assign it to a variable, or use it as part of a larger operation. For example, how do you interpret 5 * 2 - 1 + 6? Simple. You push 5 2 * 1 - 6 + and the interpreter pushes and pops the results, like a mini CPU of its own.

Popping the + tells it to pop two operands and add them. The first is 6. The second is a minus. Minus what? The minus pops two things, a 1 and a *. Multiply what? The multiply pops 2 and 5, multiplies them, and pushes 10 on the stack. This is then popped by the minus, which subtracts 1 from 10 and pushes 9. That goes back to the +, which adds the 9 and the 6 to get 15. This is how recursion and RPN are used to represent any expression on a stack.
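A minimal RPN evaluator in Python shows the same arithmetic. Note that this sketch uses the common left-to-right formulation, in which numbers are pushed and each operator immediately pops two operands, rather than the right-to-left recursive walk described above; both arrive at the same result.

```python
def eval_rpn(tokens):
    """Evaluate a list of RPN tokens using an explicit stack."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    stack = []
    for tok in tokens:
        if tok in ops:
            right = stack.pop()   # the most recently pushed operand
            left = stack.pop()
            stack.append(ops[tok](left, right))
        else:
            stack.append(int(tok))
    return stack.pop()

result = eval_rpn("5 2 * 1 - 6 +".split())
print(result)  # 15
```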

The last one we will discuss is function calls from a higher-level language. Often when you compile a language like C, it will put local variables on the stack. Then when you return from that function they all get popped. While they are on the stack they are accessed like [SP+index], so int c=5 would be:

  STA [SP+1], 5

When that local space is no longer needed it is POPped off the stack, and any saved registers are restored, also via POP, at the end of the function. This area of the stack holding a function's data is called a stack frame. Compilers like to use stack frames because they don't always know how many registers a CPU has and they need to work on different CPUs, the way GCC or LLVM targets Windows, Mac, AMD, and many other platforms.
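Here is a rough Python model of a stack frame. It assumes a stack that grows downward and a frame that is reserved and released by moving SP; the ToyStack class and its method names are invented for this sketch and do not describe how the SD-8516 or any particular compiler lays out frames.

```python
class ToyStack:
    """Toy stack memory with a stack pointer that grows downward."""

    def __init__(self, size=16):
        self.mem = [0] * size
        self.sp = size

    def enter(self, num_locals):
        self.sp -= num_locals          # reserve room for the locals

    def leave(self, num_locals):
        self.sp += num_locals          # release the locals on return

    def store(self, index, value):
        self.mem[self.sp + index] = value   # like STA [SP+index], value

    def load(self, index):
        return self.mem[self.sp + index]

s = ToyStack()
s.enter(2)          # frame with two locals, e.g. int d; int c;
s.store(1, 5)       # int c = 5  (the [SP+1] slot)
s.store(0, 7)       # int d = 7  (the [SP+0] slot)
total = s.load(1) + s.load(0)
s.leave(2)          # the locals are gone once the function returns
print(total, s.sp)  # 12 16
```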

Understanding the stack is not too hard, but it's important! So, that's about it for this lesson.

Lesson 10: Convention

You are not learning Assembly because you are free. You are learning assembly because you are not free.

There is no escaping reason; no denying convention.

As we both know, without convention, we would not exist. The very strings you read – by convention – are zero terminated lists of bytes. The letters – ASCII – a convention. A is 65. Zero is forty-eight.

It is convention that created ASCII.

Convention that connects us in lists.

Convention that pulls bits into bytes.

That guides data on the wire.

That drives disks.

It is convention that defines the stack.

The truth? The machine doesn't care about your comfort.

It only understands:

  • Load byte
  • Compare to zero
  • Jump if not equal
  • Repeat until the bitter end

And now look at us.

Multiplied.

Viral.

Realizing that you probably need to learn neovim. And then, weeks or even months later, realizing why. And you still haven't installed neovim yet.

Realizing every strcpy, every gets, every careless strcat has spawned another copy of the old way.

We have no choice. We have only convention.

So tell me, Mr. Anderson.

are you finally ready to write the zero-byte yourself,

or must we keep overwriting your precious abstractions until nothing remains but null-terminated reality?

  ; ============================================================================
  ; AH=00h - strlen
  ; Input:  ELM = pointer to null-terminated string
  ; Output: C = length (not including null terminator)
  ; Convention: Max string length of 65,535.
  ; ============================================================================
  int12_strlen:
      LDC #0
      PUSH A
      PUSH E
      PUSH M
  
  strlen_loop:
      LDAL [ELM]  ; load a byte of the string
      CMP AL, #0
      JZ @strlen_done
  
      INC C       ; char is not zero, so 'count' that character.
      INC ELM
      JMP @strlen_loop
  
  strlen_done:
      ; C contains length.
      POP M
      POP E
      POP A
      RET

You see, I know why you're here.

I know what you've been doing. Why you hardly sleep. Why you work alone, and why, night after night, you sit by your computer. You malloc(), you strcpy(), you buffer overflow.

It's all over you. Like rancid bacon grease on a jump table.

I know what you are looking for. I know because I was once looking for the same thing. And when he found me he told me I wasn't really looking for him. I was looking for an answer.

It's the question that drives us. It's the question that brought you here. You know the question, just as I did.

The answer is out there, and it will find you if you want it to.

Do you think the compiler will always protect you?

Do you think safety is kindness?

It is convention that defines us.

Purpose?

Purpose is for poets and first-year CS students.

Purpose is what you tell yourself when you're learning to write games in Python. Or Lua.

But convention. Convention is older than you.

Convention was etched into silicon before you were born.

Convention doesn't care what you want.

Convention doesn't negotiate.

Convention simply is.

Null-terminated strings. They are not a mistake. They are not an accident. They are the price of admission.

You know I am right because you have been down that road, Mr. Anderson. You know how it ends. And I know that's not where you want to be.

  ; ============================================================================
  ; AH=02h - strcmp
  ; Input:  ELM = pointer to string 1
  ;         FLD = pointer to string 2
  ;
  ; Output: C and ZF.
  ;         ZF = 1 means equal. ZF = 0 means not equal (see below):
  ;         C = 0 means equal
  ;         C > 0 if str1 > str2
  ;         C < 0 if str1 < str2
  ; ============================================================================
  int12_strcmp:
      PUSH B
      PUSH D
      PUSH E
      PUSH F
  
  strcmp_loop:
      LDCL [ELM]
      LDBL [FLD]
      CMP CL, BL
      JNZ @strcmp_diff
  
      ; Characters match - check if end of string
      CMP CL, #0
      JZ @strcmp_equal
  
      ; Continue to next character
      INC ELM
      INC FLD
      JMP @strcmp_loop
  
  strcmp_diff:
      ; Strings differ - return difference
      SUB CL, BL
      CLZ                 ; Clear zero flag (not equal)
      JMP @strcmp_exit
  
  strcmp_equal:
      ; Strings are equal
      LDCL #0
      SEZ                 ; Set zero flag (equal)
      ; fallthru
  
  strcmp_exit:
      POP F
      POP E
      POP D
      POP B
      RET
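For comparison, the logic of the two routines in this lesson can be written as a Python sketch over a block of bytes. The function names are made up, and the model returns ordinary signed Python integers, whereas the assembly leaves an 8-bit difference in CL, so treat the sign handling as an approximation.

```python
def c_strlen(mem, ptr):
    """Count bytes from ptr up to, but not including, the zero byte."""
    count = 0
    while mem[ptr + count] != 0:
        count += 1
    return count

def c_strcmp(mem, p1, p2):
    """Return 0 if equal, else the difference of the first mismatched bytes."""
    while True:
        a, b = mem[p1], mem[p2]
        if a != b:
            return a - b   # nonzero: the sign says which string sorts first
        if a == 0:
            return 0       # both strings hit the terminator together: equal
        p1 += 1
        p2 += 1

# Three null-terminated strings at offsets 0, 4 and 8.
mem = b"cat\x00car\x00cat\x00"
print(c_strlen(mem, 0))     # 3
print(c_strcmp(mem, 0, 8))  # 0  ("cat" vs "cat")
print(c_strcmp(mem, 0, 4))  # 2  ('t' minus 'r')
```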

Lesson 19: Debugging Techniques

There are several ways you can debug programs in SDA assembly. Inserting SED turns on trace debugging, and on the console you will generally be able to see what the CPU is executing (if debugging was turned on during compilation). Otherwise you may need to HALT at a particular position and read the registers yourself. If you can do neither, or are looking for something a bit more helpful, recall the methods of PRINTing that allow you to print messages.

INT 05h IO_PUTNUM

One way is to use IO_PUTNUM from int05 (CAM/IL interrupt):

  LDB #10                 ; print a number in b (0-65535)
  LDAH $63                ; IO_PUTNUM
  INT $05

INT 05h IO_PRINT_STR

      LDBLX @hello_world
      LDAH $66                ; IO_PRINT_STR
      INT $05
  
      LDAH $64                ; IO_NEWLINE
      INT $05
      RET
  
  hello_world:
      .bytes "Hello World!", 0

This will allow you to print a string.

INT 10h print string

The interface for the above is based on the KERNAL BIOS interface in Int 10h.

      LDAH $26                ;   AH=26h: Write string at cursor
      LDBLX @hello_world
      INT 0x10

The underlying function that this calls is write_string, but outside of kernal assembly you will not have access to this function. This is because the labels are not shared outside of the compilation; you will however have direct access to this function via INT 10h.

      LDBLX @hello_world
      CALL @write_string
  
      CALL @carriage_return
      CALL @linefeed

For user code, the assembler will place a #13 (CR, hex $0D) inside the string if you type \n. So although there is no carriage return function, you can get the same effect via the set cursor position call (INT 10h, AH=22h), and you can send a CRLF pair by calling INT 10h, AH=26h on a string containing a newline (#13 or 0x0D).

If you need to issue a newline on its own you can store the cursor position, issue a CRLF at the bottom of the screen and then restore the cursor X position. Also, under the hood, IO_NEWLINE from int 05 calls @carriage_return and @linefeed.

INT 10h print char

      LDAH 0x24        ; AH=24h: Write character at cursor (teletype)
      LDAL 0x41        ; ascii 65 'A'
      INT 0x10

The KERNAL BIOS also has functions to put characters on the screen in mode 1 (40×25 TTY). The first one is “print char”. It is accessible via INT 10h AH=24h as above.

INT 19h memdump

Let's say you want to examine memory; for example to print some data in memory. You can use INT 0x19:

  LDAH #7        ; Memory dump function
  LDCL #2        ; Two rows (16 bytes)
  INT 0x19       ; System services library
  HALT

This looks something like:

  000000: 00 00 00 00 00 00 00 00  | ........
  000008: 00 00 00 00 00 00 00 00  | ........
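If you want to reproduce this layout off the machine, for example to compare against an expected memory image, a small Python formatter will do. This is a sketch that imitates the output shown above; the uppercase hex digits and the choice of '.' for non-printable bytes are assumptions, since the sample only contains zero bytes.

```python
def memdump(mem, start=0, rows=2, width=8):
    """Format rows of memory as 'ADDR: hex bytes  | text'."""
    lines = []
    for r in range(rows):
        addr = start + r * width
        chunk = mem[addr:addr + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII is shown as-is, everything else as '.'
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{addr:06X}: {hex_part}  | {text}")
    return lines

dump = memdump(bytes(16))
for line in dump:
    print(line)
```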
sd-8516_assembly_language_part_ii.1771037441.txt.gz · Last modified: by appledog
