The title of this post may make it sound like a lot was done, but in fact a Kava assembler just translates human-readable ASCII Kava bytecode into a binary, machine-readable format and saves it to a file. Still, Kava bytecode has now been defined, at least to some extent (the basic structure, not all the opcodes).
The assembler takes in a string of ASCII characters and maps it to a binary .klass file. This string can be written by hand in a simple text file, which is analogous to writing x86 assembly by hand instead of compiling it from C++ source. Normally, the string would be generated by the compiler (which hasn't been implemented yet) and never saved to a file at all. Here is an example of such a string, contained in the Example.txt file in the repo:
Handwritten.
name Example
method add
This is a description.
meta
lva 4
code
ipush 1 0
ipush 2 2
iload 1
iload 2
iadd
istore 3
iload 3
ireturn
Note: this is not Kava source code! It's Kava bytecode.
This bytecode is divided into two sections. The first one ends with "name Example" and is called the "class section". It lists all the information about the class: its name, its description, and all the classes it inherits from.
Next comes the method section. Each method is declared with the "method" keyword, followed by the method's name and its description. The meta section contains some information about the method, and the code section contains the bytecode for that method.
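The section layout described above can be sketched as a small parser. This is only an illustration in Python, not the assembler from the repo: the function name parse_bytecode and the dictionary layout are hypothetical, and the sketch assumes that whitespace is insignificant and that the words "name", "method", "meta" and "code" never appear inside a description.

```python
def parse_bytecode(text):
    """Split ASCII Kava bytecode into a class section and method sections."""
    tokens = text.split()

    # Class section: free-text description, then "name <ClassName>".
    name_at = tokens.index("name")
    klass = {
        "description": " ".join(tokens[:name_at]),
        "name": tokens[name_at + 1],
        "methods": [],
    }

    i = name_at + 2
    while i < len(tokens):
        if tokens[i] != "method":
            raise ValueError(f"expected 'method', got {tokens[i]!r}")
        method_name = tokens[i + 1]
        meta_at = tokens.index("meta", i + 2)
        code_at = tokens.index("code", meta_at + 1)
        # The code subsection runs until the next "method" keyword or the end.
        try:
            end = tokens.index("method", code_at + 1)
        except ValueError:
            end = len(tokens)
        klass["methods"].append({
            "name": method_name,
            "description": " ".join(tokens[i + 2:meta_at]),
            "meta": tokens[meta_at + 1:code_at],
            "code": tokens[code_at + 1:end],
        })
        i = end
    return klass


example = (
    "Handwritten. name Example method add This is a description. "
    "meta lva 4 code ipush 1 0 ipush 2 2 iload 1 iload 2 iadd "
    "istore 3 iload 3 ireturn"
)
parsed = parse_bytecode(example)
print(parsed["name"])                # Example
print(parsed["methods"][0]["meta"])  # ['lva', '4']
```

Splitting on keywords keeps the sketch short, but it also means a description containing one of those words would be cut in the wrong place; a real assembler would need a stricter grammar.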
I have compiled this class by hand from the following Kava source code (with some additions such as descriptions):
class Example
{
    int add()
    {
        int a = 0;
        int b = 2;
        int result = a + b;
        return result;
    }
}
The structure of the .klass file is simple. Like the bytecode itself, it is divided into sections. The file usually starts with the opening code of the class description section, which is 00 in hexadecimal. Each character of the description follows as one byte (its ASCII code), and the section is closed by the termination code, which is FF in hexadecimal.
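As a rough illustration of the description section, here is a minimal Python sketch. The opening code 00 and the termination code FF come from the text above; the helper name encode_section and the constant names are made up for this example, and the opening code is a parameter because the codes of the other sections are not given here.

```python
DESCRIPTION_OPEN = 0x00  # opening code of the class description section
TERMINATOR = 0xFF        # termination code that closes a section


def encode_section(open_code, text):
    """Return the opening code, one byte per ASCII character, then FF."""
    # Encoding as ASCII raises UnicodeEncodeError on non-ASCII input, so
    # every body byte is at most 0x7F and can never look like TERMINATOR.
    body = text.encode("ascii")
    return bytes([open_code]) + body + bytes([TERMINATOR])


section = encode_section(DESCRIPTION_OPEN, "Handwritten.")
print(section)  # b'\x00Handwritten.\xff'
```

Because ASCII codes stop at 7F, the FF terminator is unambiguous inside a text section.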
The name section comes next and is laid out the same way, followed by each method as its own section with description, name, meta and code subsections. The bytecode is saved in the code subsection of the method section, and each opcode takes up one byte. Each operand takes up either one byte or, if it is an integer, 4 bytes (base 256). I'll probably store the integers in the data section, though, and have all the operands be one byte long.
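To make "4 bytes (base 256)" concrete, here is a small Python sketch that splits an integer into four base-256 digits. The byte order is not stated above, so most-significant-digit-first is an assumption, as is the restriction to non-negative values; the function name int_to_base256 is hypothetical.

```python
def int_to_base256(value):
    """Encode a non-negative integer below 256**4 as four base-256 digits."""
    if not 0 <= value < 256 ** 4:
        raise ValueError("value does not fit in 4 bytes")
    digits = []
    for _ in range(4):
        digits.append(value % 256)  # lowest remaining digit
        value //= 256
    # Assumption: most significant digit first (big-endian).
    return bytes(reversed(digits))


print(int_to_base256(2).hex())    # 00000002
print(int_to_base256(258).hex())  # 00000102
```

Under these assumptions the result matches Python's built-in value.to_bytes(4, "big"). Note that an integer such as 255 encodes to 00 00 00 FF, so a 4-byte operand can contain the same byte value as the termination code. Moving integers to a separate data section, as suggested above, is one way to sidestep that.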
