What is a TOKEN (C++)

What is a TOKEN (C++)

Thomas Fjellstrom

Member #476

June 2000

I must tell you, the first VM i did was one of my first real projects, back almost 6 years ago. The code has been worked on at various times over the years, so its kinda messy and not well thought out.

casminfo.txt:

1/*
2CHIP and Casm INFO.
3-
4
5Instruction Information:
6
7Ins    Args       Desc
8----------------------
9NOOP   0          Does nothing.
10MUL    2          8bit Multiply arg2 with arg1 (arg1 *= arg2)
11DIV    2          8bit Divide arg2 with arg1   (arg1 /= arg2)
12MOD    2          8bit Modulo arg2 with arg1   (arg1 %= arg2)
13ADD    2          8bit Add arg2 with arg1      (arg1 += arg2)
14SUB    2          8bit Subtract arg2 with arg1 (arg1 -= arg2)
15MOV    2          8bit Copy arg2 to arg1
16PUT    1          Puts a character to stdout. (non buffered)
17GET    1          Gets a character from stdin (non buffered, noecho)
18OR     2          8bit bitwise OR ( | )
19XOR    2          8bit bitwise XOR ( ^ )
20AND    2          8bit bitwise AND ( & )
21POP    1          Pops a value off the stack to arg
22POP0   0          Pops a value off the stack and forgets it (same as DEC SP)
23PUSH   1          Pushes a value to the stack
24MULL   2          32bit Multiply arg2 with arg1 (arg1 *= arg2)
25DIVL   2          32bit Divide arg2 with arg1   (arg1 /= arg2)
26MODL   2          32bit Modulo arg2 with arg1   (arg1 %= arg2)
27ADDL   2          32bit Add arg2 with arg1      (arg1 += arg2)
28SUBL   2          32bit Subtract arg2 with arg1 (arg1 -= arg2)
29INC    1          Increment arg (++arg)
30DEC    1          Deincrement arg (--arg)
31DUMP   0          Dumps the register values to stdout
32MOVL   2          32bit Copy arg2 to arg1
33JMP    1          Unconditional jump to arg
34CMP    2          Compare arg1 to arg2
35JE     1          Jump to arg if last CMP result was equal
36JNE    1          Jump to arg if last CMP result was not equal
37JL     1          Jump to arg if last CMP result was less than
38JLE    1          Jump to arg if last CMP result was less than or equal
39JG     1          Jump to arg if last CMP result was greater than
40JG     1          Jump to arg if last CMP result was greater than or equal
41LOOP   2          Loop to arg for times specified in register C
42CALL*  1          Call 'function' arg.
43RET*   1          Return from a 'function' and push arg to stack
44HLT    0          Stops CHIP.
45ORL    2          32bit bitwise OR ( | )
46XORL   2          32bit bitwise XOR ( ^ )
47ANDL   2          32bit bitwise AND ( & )
48SHR    1          Bitwise right shift ( >> )
49SHL    1          Bitwise left shift ( << )
50XCHG   2          Exchanges the values of it's two arguments
51LOR    2          Logical OR  ( || )
52LAND   2          Logical AND ( && )
53INT*   1          Calls CHIP's 'BIOS' function 'arg' (returns nothing)
54CALLEX* 1         Calls CHIP's 'BIOS' function 'arg' (returns int)
55
56
57Aditional Instruction Information:
58----------------------------------
59CALL is almost like JMP except that all the registers
60are pushed on to the stack and arguments pushed onto
61the stack before CALL are available in reverse order
62using a '$' followed by a number, so arg $1 is the last
63argument that was pushed onto the stack.
64Also the CALLer is responsible for removing arguments
65it provided to the CALLed 'function'.
66
67All CALLed 'functions' must use RET to restore
68the original register values.
69After a 'function' RETurns it's return value
70is placed in the special purpose R register.
71
72INT is basically a hack to easily add instructions
73that don't need to be added directly to CHIP.
74These extra 'instructions' are added by the program
75that is using CHIP, i.e: [out|in]portb functionality
76or setting SIG* callbacks.
77
78Register information:
79---------------------
80There are 36 general registers %AA through %FF,
81one StackPointer (SP) register, and as mentioned
82above, a RETurn value register.
83You must prefix all the general registers with %,
84the SP and R register can be preixed by % but it is not required.
85The first 6 registers can be abriviated as %A through %F
86
87All registers can be preceded by MEM or ST (including SP and R :),
88which refrences the value in the register to
89a memory address or stack address in that order.
90(you get the value at the memory/stack addr)
91
92Also all general registers can be signed or unsigned and
93bisected into thier individual 'words' (16 bits un/signed)
94and those can be bisected into thier individual bytes
95(8 bits un/signed).
96
97The format for registers is:
98%AA (short form for signed long int register 0, or just %A)
99%AAUW1 (for unsigned high word of register 0, or just %AUW1)
100%AAUW0 (for same as above but low word.)
101%AAUB0 (for unsigned 'first' byte of the register)
102etc...
103Replace the 'U' with a 'S' for the signed variants.
104Also you can use an 'I' to use the register as
105a 32bit wide value, but it is not required as that
106is the default.
107
108Instruction Parameters:
109-----------------------
110Generally registers, memory addresses, '$' args, and
111literal integers can be used as a parameter to any
112instruction, with these exceptions:
113 1. A parameter can not be a literal integer if that parameter
114    gets modified by the instruction.
115 2. The second parameter for DIV DIVL MOD and MODL can not be
116    less than 1. (impossible when using an 'unsigned' register)
117
118All parameters can be preceded by MEM or ST,
119which refrences the value to a memory address
120or stack address in that order.
121(you get the value at the memory/stack addr)
122
123Additional Casm information:
124----------------------------
125The Chip Assembler is Case insesitive, so
126ADD is the same as AdD.
127Also the CHIP Assembler includes lables (case sensitive)
128which are Identical to other assembler's labels, and
129the psuedo instruction DB which stores the folowing string,
130and DW which stores the following integer at the current
131position in the compiled CHIP machine code.
132It is recomended that you precede the DB and DW psuedo
133ops with a lable so the address of the data is
134easily avaliable.
135*/

The actual code for the assembler is attached (comp.c), I assume it won't fi tin a code box, even with much of the unnesesary bits pulled out.

Also attached (parse.c) is the assembler for my newer script vm.

edit:

Heres an example asm file for the parse.c assembler:

1set rand.low 1
2 
3# set starting 'x' pos of thing
4set rand.seed screen.width
5set rand.high screen.width
6set thing.x rand.ival
7 
8# set starting 'y' pos of thing
9set rand.seed screen.height
10set rand.high screen.height
11set thing.y rand.ival
12 
13set rand.seed time.abs
14 
15# start loop
16label loop
17 
18# angle
19set thing.ax mouse.x
20sub thing.ax thing.x
21abs thing.ax thing.ax
22 
23set thing.ay mouse.y
24sub thing.ay thing.y
25abs thing.ay thing.ay
26 
27set thing.c thing.ax
28pow thing.c 2
29 
30set thing.d thing.ay
31pow thing.d 2
32 
33 
34 
35div thing.ax thing.ay
36 
37cot thing.angle thing.ax
38 
39# deltas
40 
41cos thing.dx thing.angle
42sin thing.dy thing.angle
43 
44# new position
45 
46sub thing.x thing.dx
47add thing.y thing.dy
48 
49jmp loop
50# Dude!

This one utilized the allegro binding I made for it. screen is allegro's screen, and "thing" is an object that the program used to draw a box or pixle on screen.

And heres an example for he comp.c assembler:

1include "lib.inc"
2 
3ENTRY
 push 33
 call fib
 put '\n'
 pop0
 push %R
 call itoa
 pop0
 push itoa_str
 call print
 pop0
 put '\n'
 hlt
16 
17fib:
18// put '.'
 cmp $1, 2
 jle fibend
21 
 sub $1, 1
 push $1
 call fib
 mov %A, %R
 pop0
27 
 sub $1, 1
 push $1
 call fib
 mov %B, %R
 pop0
33 
 add %A, %B
 ret %A
36 
37fibend:
 ret 1

edit2, and here's the lib.inc you see from that last example:

1FUNC getstr:
2MOV %A, getstr_str
3getstr_loop:
4GET %B
5PUT %B
6CMP %B, 4
7JE exit_getstr
8CMP %B, '\n'
9JE exit_getstr
10MOV MEM %A, %B
11INC %A
12JMP getstr_loop
13 
14exit_getstr:
15MOV MEM %A, 0
16RET 1
17 
18FUNC strcpy:
19MOV %A, $1
20MOV %B, $2
21strcpy_loop:
22CMP MEM %A, 0
23JLE strcpy_exit
24MOV MEM %B, MEM %A
25INC %A
26INC %B
27JMP strcpy_loop
28strcpy_exit:
29MOV MEM %B, 0
30RET 1
31 
32FUNC print:
33MOV %F, $1
34print_loop:
35CMP MEM %F, 0
36JLE quit_print
37PUT MEM %F
38INC %F
39JMP print_loop
40quit_print:
41RET 1
42 
43FUNC itoa:
44MOV %F, $1
45MOV %D, 0
46MOV %B, 0
47MOV %A, itoa_tmp
48 
49itoa_loop:
50MOV %C, %F
51MOD %C, 10
52MOV MEM %A, %C
53 
54INC %A
55INC %D
56DIV %F, 10
57 
58CMP %F, 0
59JLE get_out_itoaloop
60CMP %D, 11
61JLE itoa_loop
62 
63get_out_itoaloop:
64MOV %B, itoa_str
65itoa_loop2:
66DEC %A
67DEC %D
68MOV %C, MEM %A
69ADD %C, '0'
70MOV MEM %B, %C
71 
72INC %B
73CMP %D, 0
74JG itoa_loop2
75exit_itoa:
76MOV MEM %B, 0
77RET itoa_str
78 
79FUNC strlen:
80MOV %F, $1
81MOV %A, 0
82strlen_loop:
83CMP MEM %F, 0
84JLE strlen_exit
85INC %F
86INC %A
87JMP strlen_loop
88 
89strlen_exit:
90RET %A
91 
92FUNC atoi:
93MOV %F, $1
94MOV %A, 0
95MOV %B, 0
96MOV %C, 0
97MOV %D, 1
98 
99MOV %B, $1
100 
101atoi_loop1:
102MOV %AESB0, MEM %F
103 
104// if end of string, exit
105CMP %AESB0, 0
106JG lt9
107JMP exit_atoi_loop1
108 
109// if the current charater is greater than '9', exit
110lt9:
111CMP %AESB0, '9'
112JLE gt0
113JMP exit_atoi_loop1
114 
115// if the current character is less than '0', exit
116gt0:
117CMP %AESB0, '0'
118JGE go_on
119JMP exit_atoi_loop1
120 
121go_on:
122INC %A
123INC %F
124JMP atoi_loop1
125 
126exit_atoi_loop1:
127 
128DEC %A
129CMP %A, 0
130JL exit_atoi
131MOV %F, %B
132ADD %F, %A
133 
134atoi_loop:
135 
136CMP %A, 0
137JL exit_atoi
138 
139MOV %ABSB0, MEM %F
140SUB %ABSB0, '0'
141MUL %B, %D
142MUL %D, 10
143ADD %C, %B
144MOV %B, 0
145DEC %F
146DEC %A
147JMP atoi_loop
148 
149exit_atoi:
150RET %C
151 
152FUNC fopen:
153PUSH $1
154PUSH $2
155 
156RET %A
157 
158itoa_tmp:
159DB "             "
160 
161itoa_str:
162DB "                       "
163 
164getstr_str:
165DB "                                                                                          "

Yup, a mini libc omg.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

kentl

Member #2,905

November 2002

Yeah OMG is a good expression. I briefly checked out the source code, but around 2100 lines is a bit to much for a casual viewing. Impressive work.

X-G

Member #856

December 2000

Needs a CRZ opcode.

--
Since 2008-Jun-18, democracy in Sweden is dead. | 悪霊退散！悪霊退散！怨霊、物の怪、困った時は　ドーマン！セーマン！ドーマン！セーマン！　直ぐに呼びましょう陰陽師レッツゴー！

Thomas Fjellstrom

Member #476

June 2000

Thanks I was always a little proud of my little CHIP project. Even though its pretty dead now. I ended up writing two styles of execution loop for it, one with a while+switch, and a "threaded" aka "computed goto" "loop". The computed goto code ran around 10x faster than the plain while+switch, back in the day, on my Athlon 900, or it could have been my 450 p3 (I cant recall now), I got upwards of 10 MIPS out of this VM. It could be much better if I didn't have all that "smart" register and fancy arg handling, and make 60x more actual instructions, but I didn't want to go there without making some "generator" for all the extra instruction types, which I never finished.

The second vm I made is pretty cool. It has psuedo OO support. you can set a "foo.bar" to a function: set "foo.bar" "somefunction" call it using the "extended" function calling support, as: foo.bar; and the assembler will do call "somefunction" "foo". And the "second pass" will attempt to turn all string references (which are looked up in a symtab hash at runtime normally), into direct symbol references (pointers to the stuff stored in the symtab, instead of a string which may need to be parsed, ie: "foo.bar.baz" requires the engine to split it at the dots, and then do a look up for baz in bar, and bar in foo, and foo in the global symtab...), which speeds up the engine by N times. (gotta be more than 10 but I can't recall). Oh yeah, the "foo.bar" syntax means look up the "bar" property in the "foo" object, and "foo.bar.baz" means, look up the baz property in the (possibly anon) object referenced in bar, and lookup the bar property in the foo object. I don't think i could have made it more complex >:)

edit: X-G, excuse my ignorance, but what is a CRZ opcode? I've not heard of that one. A couple google searches didn't seem to help either.

X-G

Member #856

December 2000

So it has first-class functions?

Thomas Fjellstrom

Member #476

June 2000

The newer of the two does yes.

function doit
   set doit system.stack.pop
   print doit.value
ret

nonnus29

Member #2,606

August 2002

Nice work Moose. CRZ op rules.

Quote:

Yeah he needs to know assembler in that case. Can't really get why he head to implement his compiler in assembler though, but I guess it could be a good exercise if you study assembler.

I don't think the OP was posting about a compilation oriented assignment. From what I've seen here in the states, infix to postix is a standard exercise students do when studying stacks etc... You certainly wouldn't use that method in a real compiler, you'd generate assembler from the AST.

And you don't have to generate code at all. For my little Basic thingy I'm executing from the tree.

link to my blog

Ciro Duran

Member #3,011

December 2002

How much compilers would assembler compile, if assembler would compile compilers? :-P

(Ducks)

---
El Ciro

In the beginning, God said "light_source { <0, 0, 0> color White }" and light was created.

kentl

Member #2,905

November 2002

Quote:

You're probably right about the goal of the assignment. I didn't check out the code he posted. Infix to postfix is a common excercise here as well when studying stacks.

Quote:

And you don't have to generate code at all. For my little Basic thingy I'm executing from the tree.

Yeah. How is it going with that?

Thomas Fjellstrom

Member #476

June 2000

IMO, compiling to some slim bytecode will always be more efficient runtime wise than traversing a tree. Not that its necessarily needed in all cases, but I like to brag about irrelevant things like MIPS, FLOPS and MHZ.

kentl

Member #2,905

November 2002

It sounds sensible to me at least. Perhaps there is also a bit more flexibility in there. As you can layer things so that you can write a new backend for some other type of format (like Java Byte Code or something).

On the other hand executing an interpreter directly from the tree is probably easier to implement.

Thomas Fjellstrom

Member #476

June 2000

I always had this mental block with doing a direct execute interpreter. Wasn't ever able to fully grasp how to make one when I was still learning about VMs and such.

nonnus29

Member #2,606

August 2002

It's coming along slowly. The method I'm using is pretty simple when your just passing ints around, but when you starting adding ints OR floats, it doesn't work:

1//given a tree like:
2 
3         +
4       /   \
5      5     *
6          /   \
7         3     9
8 
9//this is how you execute from the tree, it's just a
10//depth first traversal:
11 
12int execNode(Node* n) {
13    switch(n->type) {
14        case '+':
15            return (execNode(n->left) + execNode(n->right));
16        case '-':
17            return (execNode(n->left) - execNode(n->right));
18        case '*':
19            return (execNode(n->left) * execNode(n->right));
20        case '/':
21            return (execNode(n->left) / execNode(n->right));
22        case INTCONST:
23            return n->int_value;
24    }
25}

As you can see the exec function just returns an int. Now I'm looking into using a runtime store of some sort to pass values around.

But it's two different issues; compiling and then what you do with it. This is a question of the execution environment. And I'm beginning to see why emitting bytecode for a virtual machines is such a good idea.

link to my blog