What is a TOKEN (C++)
relay01

I'm not beyond asking for help especially when my professor is a real tool.

this is my homework assignment

( 20 points ) We discussed in class converting an infix expression to a postfix expression. In the implementation of in2post.cpp, we have used an operator stack to hold the operators and both the inputs and outputs are strings; it can only handle integers. You may generalize it by passing in the infix expression in a stack and output the postfix expression in a queue. You may define a TOKEN which can be an operator or a number as follows:

enum WhatKind { oper, number };
typedef struct {
WhatKind kind;
double num;
char op;
} TOKEN;

Your conversion function may have the form,
void in2post( stack <TOKEN> &infix, queue <TOKEN> &postqueue );

Before calling this function, you have to parse the infix expression into TOKENs and put them in the infix stack. The output is the postfix tokens saved in postqueue.

Next, write a function to evaluate the postfix expression. You may first convert the postfix queue to the post stack. Your function may look like:

double evaluate ( stack <TOKEN> &post )
{
stack eval;
double first, second, ans;
char bin_op;
TOKEN tp, te;
while ( !post.empty() ){
tp = post.top();
post.pop();
if ( tp.kind == number )
eval.push( tp );
else {
...
}
}
....
}

What's a TOKEN?

X-G

...

typedef struct {
WhatKind kind;
double num;
char op;
} TOKEN;

Indeterminatus

See here ;D

relay01

aah... i still don't get it... what are "lexemes"?

X-G

Eris on a freaking pogo stick, just use the damn wikipedia. Just try it.

relay01

:'( ok

kentl

You suck relay01! ;)

gnolam

Given the lack of code tags, I would have to agree with that statement.

nonnus29

Sheesh, I had to do that in MIPS assembler, and here you get to use a nice high level language. What are you complaining about?

???

Quote:

especially when my professor is a real tool.

Yeah, because he did half the work for you.... ::)

TeamTerradactyl

relay01,

I don't know if your question was answered or not.

I believe your professor means that TOKEN should be one of the math functions.
A * B - C / (D + E)

Each of the MULTIPLY, DIVIDE, ADD, SUBTRACT are the "TOKENS" to which your professor is referring.

You should probably have two strings: infixString and postfixString.

Basically:

1) You are given an INFIX problem (like above)
2) Parse the INFIX expression, and print out the resulting POSTFIX expression
- i) For each character that is read in from the infixString, perform the following:
- ii) If the input character is a letter, append that letter to the end of postfixString
- iii) If the input character is a TOKEN, put that TOKEN onto a stack which you will retrieve later
- iv) If the next input character is another letter, get (read: POP) the TOP element from your stack and insert it onto the end of your postfixString.
- v) You have to make sure to check the preference of the TOKENS when you remove them: *, / and % have higher precedence than + or -, so be sure to remove the correct one.
3) Output, at the very end, the POSTFIX string

The only exception to the above is if your professor's code includes parentheses (like in my example INFIX expression). If he does, once you encounter a ), you have to loop though all the items in your stack until you find the ( and push everything you find into your postfixString.

This way, the above will do this:
A * B - C / (D + E)

infixString          postfixString                  myStack
-----------          -------------                  -------
A                    A                              (empty)
*                                                   *
B                    AB*                            (empty) -- (because we removed the "*")
-                    AB*                            -
C                    AB*C                           - (not removed because the NEXT character is higher-precedence: "/")
/                    AB*C                           -/
(                    AB*C                           -/(
D                    AB*CD                          -/(
+                    AB*CD                          -/(+
E                    AB*CDE+                        -/( (removed the "+")
)                    AB*CDE+                        -/ (removed the "(")
                     AB*CDE+/                       - (still inside the FOR..LOOP for the ")" TOKEN)
                     AB*CDE+/-                      (empty)

I hope that helps somewhat with your assignment.

kentl
Quote:

Sheesh, I had to do that in MIPS assembler, and here you get to use a nice high level language. What are you complaining about?

Well if you are learning compiler theory then I would argue that it's better to stay high level. You'll get more done faster and learn more about the stuff you are supposed to study (compiler theory, not assembler).

Thomas Fjellstrom

Not if he also had to actually compile to machine code ;) Knowing the assembler would go a long way to understanding the workings of the chip/system, and what to compile the code to ;D

kentl

Yeah but technically that's not compiler theory. :)

Thomas Fjellstrom

Um, compiling code isn't compiler theory? Whod'a thunk it!

kentl

Assembler isn't compiler theory. If it was then you could say that any format is compiler theory, as it's necessary knowledge if you want to compile something into said format (PostScript, PDF, SVG, XML, HTML and so on). Compiler theory in itself are the generic methods used to turn one format into another.

Thomas Fjellstrom

I think you need to read what I said again :P

I said:

Knowing the assembler would go a long way to understanding the workings of the chip/system, and what to compile the code to ;D

That last message I was pulling your chain :P

kentl

Yeah he needs to know assembler in that case. Can't really get why he head to implement his compiler in assembler though, but I guess it could be a good exercise if you study assembler.

Thomas Fjellstrom

Maybe hes just old? ;) IIRC his day job involves a lot of COBOL. (at least I think its him..)

X-G

Writing an assembler is obviously a compiler theory thing. A compiler is a compiler no matter what it compiles. :P

Thomas Fjellstrom

I think hes saying writing it IN assembler isn't so much compiler theory, and may get in the way of teaching pure compiler theory.

X-G

Well, compiler theory is just that, theory: It's got nothing to do with whatever language you're using should you want to actually write a compiler.

kentl

Yes, that's what I'm saying Thomas. I haven't said that writing an assembler isn't a compiler theory thing, at least I don't think so? I've said that learning assembler isn't a compiler theory thing and that you would probably want to use a high level language if you want to create a compiler. As you will get more stuff done and have a greater freedom to experiment.

Johan Halmén

When I started with assembler, I was the assembler. I wrote the assembler code with pencil on paper. Then, using a table, I looked up the hexcodes and wrote them down on the same paper next to the assembler mnemonics. Then I had to count the bytes to get all absolute and relative jumps right. Then I typed in the hexcode.

Thomas Fjellstrom

I couldn't stand the thought of having to do that for my two VM projects, so I ended up writing some nice assemblers for both of them ;) first one was two pass, so you could actually reference labels that are declared after you use them :o

kentl

That's some crazy stuff Johan. :) When was that?

Sounds cool Thomas. Tell us more about them? Perhaps show some of that assembler code? :)

Thomas Fjellstrom

I must tell you, the first VM i did was one of my first real projects, back almost 6 years ago. The code has been worked on at various times over the years, so its kinda messy and not well thought out.

casminfo.txt:

1/*
2CHIP and Casm INFO.
3-
4
5Instruction Information:
6
7Ins Args Desc
8----------------------
9NOOP 0 Does nothing.
10MUL 2 8bit Multiply arg2 with arg1 (arg1 *= arg2)
11DIV 2 8bit Divide arg2 with arg1 (arg1 /= arg2)
12MOD 2 8bit Modulo arg2 with arg1 (arg1 %= arg2)
13ADD 2 8bit Add arg2 with arg1 (arg1 += arg2)
14SUB 2 8bit Subtract arg2 with arg1 (arg1 -= arg2)
15MOV 2 8bit Copy arg2 to arg1
16PUT 1 Puts a character to stdout. (non buffered)
17GET 1 Gets a character from stdin (non buffered, noecho)
18OR 2 8bit bitwise OR ( | )
19XOR 2 8bit bitwise XOR ( ^ )
20AND 2 8bit bitwise AND ( & )
21POP 1 Pops a value off the stack to arg
22POP0 0 Pops a value off the stack and forgets it (same as DEC SP)
23PUSH 1 Pushes a value to the stack
24MULL 2 32bit Multiply arg2 with arg1 (arg1 *= arg2)
25DIVL 2 32bit Divide arg2 with arg1 (arg1 /= arg2)
26MODL 2 32bit Modulo arg2 with arg1 (arg1 %= arg2)
27ADDL 2 32bit Add arg2 with arg1 (arg1 += arg2)
28SUBL 2 32bit Subtract arg2 with arg1 (arg1 -= arg2)
29INC 1 Increment arg (++arg)
30DEC 1 Deincrement arg (--arg)
31DUMP 0 Dumps the register values to stdout
32MOVL 2 32bit Copy arg2 to arg1
33JMP 1 Unconditional jump to arg
34CMP 2 Compare arg1 to arg2
35JE 1 Jump to arg if last CMP result was equal
36JNE 1 Jump to arg if last CMP result was not equal
37JL 1 Jump to arg if last CMP result was less than
38JLE 1 Jump to arg if last CMP result was less than or equal
39JG 1 Jump to arg if last CMP result was greater than
40JG 1 Jump to arg if last CMP result was greater than or equal
41LOOP 2 Loop to arg for times specified in register C
42CALL* 1 Call 'function' arg.
43RET* 1 Return from a 'function' and push arg to stack
44HLT 0 Stops CHIP.
45ORL 2 32bit bitwise OR ( | )
46XORL 2 32bit bitwise XOR ( ^ )
47ANDL 2 32bit bitwise AND ( & )
48SHR 1 Bitwise right shift ( >> )
49SHL 1 Bitwise left shift ( << )
50XCHG 2 Exchanges the values of it's two arguments
51LOR 2 Logical OR ( || )
52LAND 2 Logical AND ( && )
53INT* 1 Calls CHIP's 'BIOS' function 'arg' (returns nothing)
54CALLEX* 1 Calls CHIP's 'BIOS' function 'arg' (returns int)
55
56
57Aditional Instruction Information:
58----------------------------------
59CALL is almost like JMP except that all the registers
60are pushed on to the stack and arguments pushed onto
61the stack before CALL are available in reverse order
62using a '$' followed by a number, so arg $1 is the last
63argument that was pushed onto the stack.
64Also the CALLer is responsible for removing arguments
65it provided to the CALLed 'function'.
66
67All CALLed 'functions' must use RET to restore
68the original register values.
69After a 'function' RETurns it's return value
70is placed in the special purpose R register.
71
72INT is basically a hack to easily add instructions
73that don't need to be added directly to CHIP.
74These extra 'instructions' are added by the program
75that is using CHIP, i.e: [out|in]portb functionality
76or setting SIG* callbacks.
77
78Register information:
79---------------------
80There are 36 general registers %AA through %FF,
81one StackPointer (SP) register, and as mentioned
82above, a RETurn value register.
83You must prefix all the general registers with %,
84the SP and R register can be preixed by % but it is not required.
85The first 6 registers can be abriviated as %A through %F
86
87All registers can be preceded by MEM or ST (including SP and R :),
88which refrences the value in the register to
89a memory address or stack address in that order.
90(you get the value at the memory/stack addr)
91
92Also all general registers can be signed or unsigned and
93bisected into thier individual 'words' (16 bits un/signed)
94and those can be bisected into thier individual bytes
95(8 bits un/signed).
96
97The format for registers is:
98%AA (short form for signed long int register 0, or just %A)
99%AAUW1 (for unsigned high word of register 0, or just %AUW1)
100%AAUW0 (for same as above but low word.)
101%AAUB0 (for unsigned 'first' byte of the register)
102etc...
103Replace the 'U' with a 'S' for the signed variants.
104Also you can use an 'I' to use the register as
105a 32bit wide value, but it is not required as that
106is the default.
107
108Instruction Parameters:
109-----------------------
110Generally registers, memory addresses, '$' args, and
111literal integers can be used as a parameter to any
112instruction, with these exceptions:
113 1. A parameter can not be a literal integer if that parameter
114 gets modified by the instruction.
115 2. The second parameter for DIV DIVL MOD and MODL can not be
116 less than 1. (impossible when using an 'unsigned' register)
117
118All parameters can be preceded by MEM or ST,
119which refrences the value to a memory address
120or stack address in that order.
121(you get the value at the memory/stack addr)
122
123Additional Casm information:
124----------------------------
125The Chip Assembler is Case insesitive, so
126ADD is the same as AdD.
127Also the CHIP Assembler includes lables (case sensitive)
128which are Identical to other assembler's labels, and
129the psuedo instruction DB which stores the folowing string,
130and DW which stores the following integer at the current
131position in the compiled CHIP machine code.
132It is recomended that you precede the DB and DW psuedo
133ops with a lable so the address of the data is
134easily avaliable.
135*/

The actual code for the assembler is attached (comp.c), I assume it won't fi tin a code box, even with much of the unnesesary bits pulled out.

Also attached (parse.c) is the assembler for my newer script vm.

edit:

Heres an example asm file for the parse.c assembler:

1set rand.low 1
2 
3# set starting 'x' pos of thing
4set rand.seed screen.width
5set rand.high screen.width
6set thing.x rand.ival
7 
8# set starting 'y' pos of thing
9set rand.seed screen.height
10set rand.high screen.height
11set thing.y rand.ival
12 
13set rand.seed time.abs
14 
15# start loop
16label loop
17 
18# angle
19set thing.ax mouse.x
20sub thing.ax thing.x
21abs thing.ax thing.ax
22 
23set thing.ay mouse.y
24sub thing.ay thing.y
25abs thing.ay thing.ay
26 
27set thing.c thing.ax
28pow thing.c 2
29 
30set thing.d thing.ay
31pow thing.d 2
32 
33 
34 
35div thing.ax thing.ay
36 
37cot thing.angle thing.ax
38 
39# deltas
40 
41cos thing.dx thing.angle
42sin thing.dy thing.angle
43 
44# new position
45 
46sub thing.x thing.dx
47add thing.y thing.dy
48 
49jmp loop
50# Dude!

This one utilized the allegro binding I made for it. screen is allegro's screen, and "thing" is an object that the program used to draw a box or pixle on screen.

And heres an example for he comp.c assembler:

1include "lib.inc"
2 
3ENTRY
4 push 33
5 call fib
6 put '\n'
7 pop0
8 push %R
9 call itoa
10 pop0
11 push itoa_str
12 call print
13 pop0
14 put '\n'
15 hlt
16 
17fib:
18// put '.'
19 cmp $1, 2
20 jle fibend
21 
22 sub $1, 1
23 push $1
24 call fib
25 mov %A, %R
26 pop0
27 
28 sub $1, 1
29 push $1
30 call fib
31 mov %B, %R
32 pop0
33 
34 add %A, %B
35 ret %A
36 
37fibend:
38 ret 1

edit2, and here's the lib.inc you see from that last example:

1FUNC getstr:
2MOV %A, getstr_str
3getstr_loop:
4GET %B
5PUT %B
6CMP %B, 4
7JE exit_getstr
8CMP %B, '\n'
9JE exit_getstr
10MOV MEM %A, %B
11INC %A
12JMP getstr_loop
13 
14exit_getstr:
15MOV MEM %A, 0
16RET 1
17 
18FUNC strcpy:
19MOV %A, $1
20MOV %B, $2
21strcpy_loop:
22CMP MEM %A, 0
23JLE strcpy_exit
24MOV MEM %B, MEM %A
25INC %A
26INC %B
27JMP strcpy_loop
28strcpy_exit:
29MOV MEM %B, 0
30RET 1
31 
32FUNC print:
33MOV %F, $1
34print_loop:
35CMP MEM %F, 0
36JLE quit_print
37PUT MEM %F
38INC %F
39JMP print_loop
40quit_print:
41RET 1
42 
43FUNC itoa:
44MOV %F, $1
45MOV %D, 0
46MOV %B, 0
47MOV %A, itoa_tmp
48 
49itoa_loop:
50MOV %C, %F
51MOD %C, 10
52MOV MEM %A, %C
53 
54INC %A
55INC %D
56DIV %F, 10
57 
58CMP %F, 0
59JLE get_out_itoaloop
60CMP %D, 11
61JLE itoa_loop
62 
63get_out_itoaloop:
64MOV %B, itoa_str
65itoa_loop2:
66DEC %A
67DEC %D
68MOV %C, MEM %A
69ADD %C, '0'
70MOV MEM %B, %C
71 
72INC %B
73CMP %D, 0
74JG itoa_loop2
75exit_itoa:
76MOV MEM %B, 0
77RET itoa_str
78 
79FUNC strlen:
80MOV %F, $1
81MOV %A, 0
82strlen_loop:
83CMP MEM %F, 0
84JLE strlen_exit
85INC %F
86INC %A
87JMP strlen_loop
88 
89strlen_exit:
90RET %A
91 
92FUNC atoi:
93MOV %F, $1
94MOV %A, 0
95MOV %B, 0
96MOV %C, 0
97MOV %D, 1
98 
99MOV %B, $1
100 
101atoi_loop1:
102MOV %AESB0, MEM %F
103 
104// if end of string, exit
105CMP %AESB0, 0
106JG lt9
107JMP exit_atoi_loop1
108 
109// if the current charater is greater than '9', exit
110lt9:
111CMP %AESB0, '9'
112JLE gt0
113JMP exit_atoi_loop1
114 
115// if the current character is less than '0', exit
116gt0:
117CMP %AESB0, '0'
118JGE go_on
119JMP exit_atoi_loop1
120 
121go_on:
122INC %A
123INC %F
124JMP atoi_loop1
125 
126exit_atoi_loop1:
127 
128DEC %A
129CMP %A, 0
130JL exit_atoi
131MOV %F, %B
132ADD %F, %A
133 
134atoi_loop:
135 
136CMP %A, 0
137JL exit_atoi
138 
139MOV %ABSB0, MEM %F
140SUB %ABSB0, '0'
141MUL %B, %D
142MUL %D, 10
143ADD %C, %B
144MOV %B, 0
145DEC %F
146DEC %A
147JMP atoi_loop
148 
149exit_atoi:
150RET %C
151 
152FUNC fopen:
153PUSH $1
154PUSH $2
155 
156RET %A
157 
158itoa_tmp:
159DB " "
160 
161itoa_str:
162DB " "
163 
164getstr_str:
165DB " "

Yup, a mini libc :o omg.

kentl

Yeah OMG is a good expression. :) I briefly checked out the source code, but around 2100 lines is a bit to much for a casual viewing. Impressive work.

X-G

Needs a CRZ opcode.

Thomas Fjellstrom

Thanks :) I was always a little proud of my little CHIP project. Even though its pretty dead now. I ended up writing two styles of execution loop for it, one with a while+switch, and a "threaded" aka "computed goto" "loop". The computed goto code ran around 10x faster than the plain while+switch, back in the day, on my Athlon 900, or it could have been my 450 p3 (I cant recall now), I got upwards of 10 MIPS out of this VM. It could be much better if I didn't have all that "smart" register and fancy arg handling, and make 60x more actual instructions, but I didn't want to go there without making some "generator" for all the extra instruction types, which I never finished.

The second vm I made is pretty cool. It has psuedo OO support. you can set a "foo.bar" to a function: set "foo.bar" "somefunction" call it using the "extended" function calling support, as: foo.bar; and the assembler will do call "somefunction" "foo". And the "second pass" will attempt to turn all string references (which are looked up in a symtab hash at runtime normally), into direct symbol references (pointers to the stuff stored in the symtab, instead of a string which may need to be parsed, ie: "foo.bar.baz" requires the engine to split it at the dots, and then do a look up for baz in bar, and bar in foo, and foo in the global symtab...), which speeds up the engine by N times. (gotta be more than 10 but I can't recall). Oh yeah, the "foo.bar" syntax means look up the "bar" property in the "foo" object, and "foo.bar.baz" means, look up the baz property in the (possibly anon) object referenced in bar, and lookup the bar property in the foo object. I don't think i could have made it more complex >:)

edit: X-G, excuse my ignorance, but what is a CRZ opcode? I've not heard of that one. A couple google searches didn't seem to help either.

X-G

So it has first-class functions?

Thomas Fjellstrom

The newer of the two does yes.

function doit
   set doit system.stack.pop
   print doit.value
ret

nonnus29

Nice work Moose. CRZ op rules.

Quote:

Yeah he needs to know assembler in that case. Can't really get why he head to implement his compiler in assembler though, but I guess it could be a good exercise if you study assembler.

I don't think the OP was posting about a compilation oriented assignment. From what I've seen here in the states, infix to postix is a standard exercise students do when studying stacks etc... You certainly wouldn't use that method in a real compiler, you'd generate assembler from the AST.

And you don't have to generate code at all. For my little Basic thingy I'm executing from the tree.

Ciro Duran

How much compilers would assembler compile, if assembler would compile compilers? :-P

(Ducks)

kentl
Quote:

I don't think the OP was posting about a compilation oriented assignment. From what I've seen here in the states, infix to postix is a standard exercise students do when studying stacks etc... You certainly wouldn't use that method in a real compiler, you'd generate assembler from the AST.

You're probably right about the goal of the assignment. I didn't check out the code he posted. Infix to postfix is a common excercise here as well when studying stacks.

Quote:

And you don't have to generate code at all. For my little Basic thingy I'm executing from the tree.

Yeah. How is it going with that?

Thomas Fjellstrom

IMO, compiling to some slim bytecode will always be more efficient runtime wise than traversing a tree. Not that its necessarily needed in all cases, but I like to brag about irrelevant things like MIPS, FLOPS and MHZ.

kentl

It sounds sensible to me at least. Perhaps there is also a bit more flexibility in there. As you can layer things so that you can write a new backend for some other type of format (like Java Byte Code or something).

On the other hand executing an interpreter directly from the tree is probably easier to implement.

Thomas Fjellstrom

I always had this mental block with doing a direct execute interpreter. Wasn't ever able to fully grasp how to make one when I was still learning about VMs and such.

nonnus29

It's coming along slowly. The method I'm using is pretty simple when your just passing ints around, but when you starting adding ints OR floats, it doesn't work:

1//given a tree like:
2 
3 +
4 / \
5 5 *
6 / \
7 3 9
8 
9//this is how you execute from the tree, it's just a
10//depth first traversal:
11 
12int execNode(Node* n) {
13 switch(n->type) {
14 case '+':
15 return (execNode(n->left) + execNode(n->right));
16 case '-':
17 return (execNode(n->left) - execNode(n->right));
18 case '*':
19 return (execNode(n->left) * execNode(n->right));
20 case '/':
21 return (execNode(n->left) / execNode(n->right));
22 case INTCONST:
23 return n->int_value;
24 }
25}

As you can see the exec function just returns an int. Now I'm looking into using a runtime store of some sort to pass values around.

But it's two different issues; compiling and then what you do with it. This is a question of the execution environment. And I'm beginning to see why emitting bytecode for a virtual machines is such a good idea.

Thread #588646. Printed from Allegro.cc