I'm not beyond asking for help especially when my professor is a real tool.
this is my homework assignment
( 20 points ) We discussed in class converting an infix expression to a postfix expression. In the implementation of in2post.cpp, we have used an operator stack to hold the operators and both the inputs and outputs are strings; it can only handle integers. You may generalize it by passing in the infix expression in a stack and output the postfix expression in a queue. You may define a TOKEN which can be an operator or a number as follows:
enum WhatKind { oper, number };
typedef struct {
WhatKind kind;
double num;
char op;
} TOKEN;
Your conversion function may have the form,
void in2post( stack <TOKEN> &infix, queue <TOKEN> &postqueue );
Before calling this function, you have to parse the infix expression into TOKENs and put them in the infix stack. The output is the postfix tokens saved in postqueue.
Next, write a function to evaluate the postfix expression. You may first convert the postfix queue to the post stack. Your function may look like:
double evaluate ( stack <TOKEN> &post )
{
stack eval;
double first, second, ans;
char bin_op;
TOKEN tp, te;
while ( !post.empty() ){
tp = post.top();
post.pop();
if ( tp.kind == number )
eval.push( tp );
else {
...
}
}
....
}
What's a TOKEN?
...
typedef struct { WhatKind kind; double num; char op; } TOKEN;
See here
aah... i still don't get it... what are "lexemes"?
Eris on a freaking pogo stick, just use the damn wikipedia. Just try it.
ok
You suck relay01!
Given the lack of code tags, I would have to agree with that statement.
Sheesh, I had to do that in MIPS assembler, and here you get to use a nice high level language. What are you complaining about?

especially when my professor is a real tool.
Yeah, because he did half the work for you....
relay01,
I don't know if your question was answered or not.
I believe your professor means that TOKEN should be one of the math functions.
A * B - C / (D + E)
Each of the MULTIPLY, DIVIDE, ADD, SUBTRACT are the "TOKENS" to which your professor is referring.
You should probably have two strings: infixString and postfixString.
Basically:
1) You are given an INFIX problem (like above)
2) Parse the INFIX expression, and print out the resulting POSTFIX expression
- i) For each character that is read in from the infixString, perform the following:
- ii) If the input character is a letter, append that letter to the end of postfixString
- iii) If the input character is a TOKEN, put that TOKEN onto a stack which you will retrieve later
- iv) If the next input character is another letter, get (read: POP) the TOP element from your stack and insert it onto the end of your postfixString.
- v) You have to make sure to check the preference of the TOKENS when you remove them: *, / and % have higher precedence than + or -, so be sure to remove the correct one.
3) Output, at the very end, the POSTFIX string
The only exception to the above is if your professor's code includes parentheses (like in my example INFIX expression). If he does, once you encounter a ), you have to loop though all the items in your stack until you find the ( and push everything you find into your postfixString.
This way, the above will do this:
A * B - C / (D + E)
infixString postfixString myStack
----------- ------------- -------
A A (empty)
* *
B AB* (empty) -- (because we removed the "*")
- AB* -
C AB*C - (not removed because the NEXT character is higher-precedence: "/")
/ AB*C -/
( AB*C -/(
D AB*CD -/(
+ AB*CD -/(+
E AB*CDE+ -/( (removed the "+")
) AB*CDE+ -/ (removed the "(")
AB*CDE+/ - (still inside the FOR..LOOP for the ")" TOKEN)
AB*CDE+/- (empty)
I hope that helps somewhat with your assignment.
Sheesh, I had to do that in MIPS assembler, and here you get to use a nice high level language. What are you complaining about?
Well if you are learning compiler theory then I would argue that it's better to stay high level. You'll get more done faster and learn more about the stuff you are supposed to study (compiler theory, not assembler).
Not if he also had to actually compile to machine code
Knowing the assembler would go a long way to understanding the workings of the chip/system, and what to compile the code to
Yeah but technically that's not compiler theory.
Um, compiling code isn't compiler theory? Whod'a thunk it!
Assembler isn't compiler theory. If it was then you could say that any format is compiler theory, as it's necessary knowledge if you want to compile something into said format (PostScript, PDF, SVG, XML, HTML and so on). Compiler theory in itself are the generic methods used to turn one format into another.
I think you need to read what I said again
Knowing the assembler would go a long way to understanding the workings of the chip/system, and what to compile the code to
That last message I was pulling your chain
Yeah he needs to know assembler in that case. Can't really get why he head to implement his compiler in assembler though, but I guess it could be a good exercise if you study assembler.
Maybe hes just old?
IIRC his day job involves a lot of COBOL. (at least I think its him..)
Writing an assembler is obviously a compiler theory thing. A compiler is a compiler no matter what it compiles.
I think hes saying writing it IN assembler isn't so much compiler theory, and may get in the way of teaching pure compiler theory.
Well, compiler theory is just that, theory: It's got nothing to do with whatever language you're using should you want to actually write a compiler.
Yes, that's what I'm saying Thomas. I haven't said that writing an assembler isn't a compiler theory thing, at least I don't think so? I've said that learning assembler isn't a compiler theory thing and that you would probably want to use a high level language if you want to create a compiler. As you will get more stuff done and have a greater freedom to experiment.
When I started with assembler, I was the assembler. I wrote the assembler code with pencil on paper. Then, using a table, I looked up the hexcodes and wrote them down on the same paper next to the assembler mnemonics. Then I had to count the bytes to get all absolute and relative jumps right. Then I typed in the hexcode.
I couldn't stand the thought of having to do that for my two VM projects, so I ended up writing some nice assemblers for both of them
first one was two pass, so you could actually reference labels that are declared after you use them
That's some crazy stuff Johan.
When was that?
Sounds cool Thomas. Tell us more about them? Perhaps show some of that assembler code?
I must tell you, the first VM i did was one of my first real projects, back almost 6 years ago. The code has been worked on at various times over the years, so its kinda messy and not well thought out.
casminfo.txt:
| 1 | /* |
| 2 | CHIP and Casm INFO. |
| 3 | - |
| 4 | |
| 5 | Instruction Information: |
| 6 | |
| 7 | Ins Args Desc |
| 8 | ---------------------- |
| 9 | NOOP 0 Does nothing. |
| 10 | MUL 2 8bit Multiply arg2 with arg1 (arg1 *= arg2) |
| 11 | DIV 2 8bit Divide arg2 with arg1 (arg1 /= arg2) |
| 12 | MOD 2 8bit Modulo arg2 with arg1 (arg1 %= arg2) |
| 13 | ADD 2 8bit Add arg2 with arg1 (arg1 += arg2) |
| 14 | SUB 2 8bit Subtract arg2 with arg1 (arg1 -= arg2) |
| 15 | MOV 2 8bit Copy arg2 to arg1 |
| 16 | PUT 1 Puts a character to stdout. (non buffered) |
| 17 | GET 1 Gets a character from stdin (non buffered, noecho) |
| 18 | OR 2 8bit bitwise OR ( | ) |
| 19 | XOR 2 8bit bitwise XOR ( ^ ) |
| 20 | AND 2 8bit bitwise AND ( & ) |
| 21 | POP 1 Pops a value off the stack to arg |
| 22 | POP0 0 Pops a value off the stack and forgets it (same as DEC SP) |
| 23 | PUSH 1 Pushes a value to the stack |
| 24 | MULL 2 32bit Multiply arg2 with arg1 (arg1 *= arg2) |
| 25 | DIVL 2 32bit Divide arg2 with arg1 (arg1 /= arg2) |
| 26 | MODL 2 32bit Modulo arg2 with arg1 (arg1 %= arg2) |
| 27 | ADDL 2 32bit Add arg2 with arg1 (arg1 += arg2) |
| 28 | SUBL 2 32bit Subtract arg2 with arg1 (arg1 -= arg2) |
| 29 | INC 1 Increment arg (++arg) |
| 30 | DEC 1 Deincrement arg (--arg) |
| 31 | DUMP 0 Dumps the register values to stdout |
| 32 | MOVL 2 32bit Copy arg2 to arg1 |
| 33 | JMP 1 Unconditional jump to arg |
| 34 | CMP 2 Compare arg1 to arg2 |
| 35 | JE 1 Jump to arg if last CMP result was equal |
| 36 | JNE 1 Jump to arg if last CMP result was not equal |
| 37 | JL 1 Jump to arg if last CMP result was less than |
| 38 | JLE 1 Jump to arg if last CMP result was less than or equal |
| 39 | JG 1 Jump to arg if last CMP result was greater than |
| 40 | JG 1 Jump to arg if last CMP result was greater than or equal |
| 41 | LOOP 2 Loop to arg for times specified in register C |
| 42 | CALL* 1 Call 'function' arg. |
| 43 | RET* 1 Return from a 'function' and push arg to stack |
| 44 | HLT 0 Stops CHIP. |
| 45 | ORL 2 32bit bitwise OR ( | ) |
| 46 | XORL 2 32bit bitwise XOR ( ^ ) |
| 47 | ANDL 2 32bit bitwise AND ( & ) |
| 48 | SHR 1 Bitwise right shift ( >> ) |
| 49 | SHL 1 Bitwise left shift ( << ) |
| 50 | XCHG 2 Exchanges the values of it's two arguments |
| 51 | LOR 2 Logical OR ( || ) |
| 52 | LAND 2 Logical AND ( && ) |
| 53 | INT* 1 Calls CHIP's 'BIOS' function 'arg' (returns nothing) |
| 54 | CALLEX* 1 Calls CHIP's 'BIOS' function 'arg' (returns int) |
| 55 | |
| 56 | |
| 57 | Aditional Instruction Information: |
| 58 | ---------------------------------- |
| 59 | CALL is almost like JMP except that all the registers |
| 60 | are pushed on to the stack and arguments pushed onto |
| 61 | the stack before CALL are available in reverse order |
| 62 | using a '$' followed by a number, so arg $1 is the last |
| 63 | argument that was pushed onto the stack. |
| 64 | Also the CALLer is responsible for removing arguments |
| 65 | it provided to the CALLed 'function'. |
| 66 | |
| 67 | All CALLed 'functions' must use RET to restore |
| 68 | the original register values. |
| 69 | After a 'function' RETurns it's return value |
| 70 | is placed in the special purpose R register. |
| 71 | |
| 72 | INT is basically a hack to easily add instructions |
| 73 | that don't need to be added directly to CHIP. |
| 74 | These extra 'instructions' are added by the program |
| 75 | that is using CHIP, i.e: [out|in]portb functionality |
| 76 | or setting SIG* callbacks. |
| 77 | |
| 78 | Register information: |
| 79 | --------------------- |
| 80 | There are 36 general registers %AA through %FF, |
| 81 | one StackPointer (SP) register, and as mentioned |
| 82 | above, a RETurn value register. |
| 83 | You must prefix all the general registers with %, |
| 84 | the SP and R register can be preixed by % but it is not required. |
| 85 | The first 6 registers can be abriviated as %A through %F |
| 86 | |
| 87 | All registers can be preceded by MEM or ST (including SP and R :), |
| 88 | which refrences the value in the register to |
| 89 | a memory address or stack address in that order. |
| 90 | (you get the value at the memory/stack addr) |
| 91 | |
| 92 | Also all general registers can be signed or unsigned and |
| 93 | bisected into thier individual 'words' (16 bits un/signed) |
| 94 | and those can be bisected into thier individual bytes |
| 95 | (8 bits un/signed). |
| 96 | |
| 97 | The format for registers is: |
| 98 | %AA (short form for signed long int register 0, or just %A) |
| 99 | %AAUW1 (for unsigned high word of register 0, or just %AUW1) |
| 100 | %AAUW0 (for same as above but low word.) |
| 101 | %AAUB0 (for unsigned 'first' byte of the register) |
| 102 | etc... |
| 103 | Replace the 'U' with a 'S' for the signed variants. |
| 104 | Also you can use an 'I' to use the register as |
| 105 | a 32bit wide value, but it is not required as that |
| 106 | is the default. |
| 107 | |
| 108 | Instruction Parameters: |
| 109 | ----------------------- |
| 110 | Generally registers, memory addresses, '$' args, and |
| 111 | literal integers can be used as a parameter to any |
| 112 | instruction, with these exceptions: |
| 113 | 1. A parameter can not be a literal integer if that parameter |
| 114 | gets modified by the instruction. |
| 115 | 2. The second parameter for DIV DIVL MOD and MODL can not be |
| 116 | less than 1. (impossible when using an 'unsigned' register) |
| 117 | |
| 118 | All parameters can be preceded by MEM or ST, |
| 119 | which refrences the value to a memory address |
| 120 | or stack address in that order. |
| 121 | (you get the value at the memory/stack addr) |
| 122 | |
| 123 | Additional Casm information: |
| 124 | ---------------------------- |
| 125 | The Chip Assembler is Case insesitive, so |
| 126 | ADD is the same as AdD. |
| 127 | Also the CHIP Assembler includes lables (case sensitive) |
| 128 | which are Identical to other assembler's labels, and |
| 129 | the psuedo instruction DB which stores the folowing string, |
| 130 | and DW which stores the following integer at the current |
| 131 | position in the compiled CHIP machine code. |
| 132 | It is recomended that you precede the DB and DW psuedo |
| 133 | ops with a lable so the address of the data is |
| 134 | easily avaliable. |
| 135 | */ |
The actual code for the assembler is attached (comp.c), I assume it won't fi tin a code box, even with much of the unnesesary bits pulled out.
Also attached (parse.c) is the assembler for my newer script vm.
edit:
Heres an example asm file for the parse.c assembler:
| 1 | set rand.low 1 |
| 2 | |
| 3 | # set starting 'x' pos of thing |
| 4 | set rand.seed screen.width |
| 5 | set rand.high screen.width |
| 6 | set thing.x rand.ival |
| 7 | |
| 8 | # set starting 'y' pos of thing |
| 9 | set rand.seed screen.height |
| 10 | set rand.high screen.height |
| 11 | set thing.y rand.ival |
| 12 | |
| 13 | set rand.seed time.abs |
| 14 | |
| 15 | # start loop |
| 16 | label loop |
| 17 | |
| 18 | # angle |
| 19 | set thing.ax mouse.x |
| 20 | sub thing.ax thing.x |
| 21 | abs thing.ax thing.ax |
| 22 | |
| 23 | set thing.ay mouse.y |
| 24 | sub thing.ay thing.y |
| 25 | abs thing.ay thing.ay |
| 26 | |
| 27 | set thing.c thing.ax |
| 28 | pow thing.c 2 |
| 29 | |
| 30 | set thing.d thing.ay |
| 31 | pow thing.d 2 |
| 32 | |
| 33 | |
| 34 | |
| 35 | div thing.ax thing.ay |
| 36 | |
| 37 | cot thing.angle thing.ax |
| 38 | |
| 39 | # deltas |
| 40 | |
| 41 | cos thing.dx thing.angle |
| 42 | sin thing.dy thing.angle |
| 43 | |
| 44 | # new position |
| 45 | |
| 46 | sub thing.x thing.dx |
| 47 | add thing.y thing.dy |
| 48 | |
| 49 | jmp loop |
| 50 | # Dude! |
This one utilized the allegro binding I made for it. screen is allegro's screen, and "thing" is an object that the program used to draw a box or pixle on screen.
And heres an example for he comp.c assembler:
| 1 | include "lib.inc" |
| 2 | |
| 3 | ENTRY |
| 4 | push 33 |
| 5 | call fib |
| 6 | put '\n' |
| 7 | pop0 |
| 8 | push %R |
| 9 | call itoa |
| 10 | pop0 |
| 11 | push itoa_str |
| 12 | call print |
| 13 | pop0 |
| 14 | put '\n' |
| 15 | hlt |
| 16 | |
| 17 | fib: |
| 18 | // put '.' |
| 19 | cmp $1, 2 |
| 20 | jle fibend |
| 21 | |
| 22 | sub $1, 1 |
| 23 | push $1 |
| 24 | call fib |
| 25 | mov %A, %R |
| 26 | pop0 |
| 27 | |
| 28 | sub $1, 1 |
| 29 | push $1 |
| 30 | call fib |
| 31 | mov %B, %R |
| 32 | pop0 |
| 33 | |
| 34 | add %A, %B |
| 35 | ret %A |
| 36 | |
| 37 | fibend: |
| 38 | ret 1 |
edit2, and here's the lib.inc you see from that last example:
| 1 | FUNC getstr: |
| 2 | MOV %A, getstr_str |
| 3 | getstr_loop: |
| 4 | GET %B |
| 5 | PUT %B |
| 6 | CMP %B, 4 |
| 7 | JE exit_getstr |
| 8 | CMP %B, '\n' |
| 9 | JE exit_getstr |
| 10 | MOV MEM %A, %B |
| 11 | INC %A |
| 12 | JMP getstr_loop |
| 13 | |
| 14 | exit_getstr: |
| 15 | MOV MEM %A, 0 |
| 16 | RET 1 |
| 17 | |
| 18 | FUNC strcpy: |
| 19 | MOV %A, $1 |
| 20 | MOV %B, $2 |
| 21 | strcpy_loop: |
| 22 | CMP MEM %A, 0 |
| 23 | JLE strcpy_exit |
| 24 | MOV MEM %B, MEM %A |
| 25 | INC %A |
| 26 | INC %B |
| 27 | JMP strcpy_loop |
| 28 | strcpy_exit: |
| 29 | MOV MEM %B, 0 |
| 30 | RET 1 |
| 31 | |
| 32 | FUNC print: |
| 33 | MOV %F, $1 |
| 34 | print_loop: |
| 35 | CMP MEM %F, 0 |
| 36 | JLE quit_print |
| 37 | PUT MEM %F |
| 38 | INC %F |
| 39 | JMP print_loop |
| 40 | quit_print: |
| 41 | RET 1 |
| 42 | |
| 43 | FUNC itoa: |
| 44 | MOV %F, $1 |
| 45 | MOV %D, 0 |
| 46 | MOV %B, 0 |
| 47 | MOV %A, itoa_tmp |
| 48 | |
| 49 | itoa_loop: |
| 50 | MOV %C, %F |
| 51 | MOD %C, 10 |
| 52 | MOV MEM %A, %C |
| 53 | |
| 54 | INC %A |
| 55 | INC %D |
| 56 | DIV %F, 10 |
| 57 | |
| 58 | CMP %F, 0 |
| 59 | JLE get_out_itoaloop |
| 60 | CMP %D, 11 |
| 61 | JLE itoa_loop |
| 62 | |
| 63 | get_out_itoaloop: |
| 64 | MOV %B, itoa_str |
| 65 | itoa_loop2: |
| 66 | DEC %A |
| 67 | DEC %D |
| 68 | MOV %C, MEM %A |
| 69 | ADD %C, '0' |
| 70 | MOV MEM %B, %C |
| 71 | |
| 72 | INC %B |
| 73 | CMP %D, 0 |
| 74 | JG itoa_loop2 |
| 75 | exit_itoa: |
| 76 | MOV MEM %B, 0 |
| 77 | RET itoa_str |
| 78 | |
| 79 | FUNC strlen: |
| 80 | MOV %F, $1 |
| 81 | MOV %A, 0 |
| 82 | strlen_loop: |
| 83 | CMP MEM %F, 0 |
| 84 | JLE strlen_exit |
| 85 | INC %F |
| 86 | INC %A |
| 87 | JMP strlen_loop |
| 88 | |
| 89 | strlen_exit: |
| 90 | RET %A |
| 91 | |
| 92 | FUNC atoi: |
| 93 | MOV %F, $1 |
| 94 | MOV %A, 0 |
| 95 | MOV %B, 0 |
| 96 | MOV %C, 0 |
| 97 | MOV %D, 1 |
| 98 | |
| 99 | MOV %B, $1 |
| 100 | |
| 101 | atoi_loop1: |
| 102 | MOV %AESB0, MEM %F |
| 103 | |
| 104 | // if end of string, exit |
| 105 | CMP %AESB0, 0 |
| 106 | JG lt9 |
| 107 | JMP exit_atoi_loop1 |
| 108 | |
| 109 | // if the current charater is greater than '9', exit |
| 110 | lt9: |
| 111 | CMP %AESB0, '9' |
| 112 | JLE gt0 |
| 113 | JMP exit_atoi_loop1 |
| 114 | |
| 115 | // if the current character is less than '0', exit |
| 116 | gt0: |
| 117 | CMP %AESB0, '0' |
| 118 | JGE go_on |
| 119 | JMP exit_atoi_loop1 |
| 120 | |
| 121 | go_on: |
| 122 | INC %A |
| 123 | INC %F |
| 124 | JMP atoi_loop1 |
| 125 | |
| 126 | exit_atoi_loop1: |
| 127 | |
| 128 | DEC %A |
| 129 | CMP %A, 0 |
| 130 | JL exit_atoi |
| 131 | MOV %F, %B |
| 132 | ADD %F, %A |
| 133 | |
| 134 | atoi_loop: |
| 135 | |
| 136 | CMP %A, 0 |
| 137 | JL exit_atoi |
| 138 | |
| 139 | MOV %ABSB0, MEM %F |
| 140 | SUB %ABSB0, '0' |
| 141 | MUL %B, %D |
| 142 | MUL %D, 10 |
| 143 | ADD %C, %B |
| 144 | MOV %B, 0 |
| 145 | DEC %F |
| 146 | DEC %A |
| 147 | JMP atoi_loop |
| 148 | |
| 149 | exit_atoi: |
| 150 | RET %C |
| 151 | |
| 152 | FUNC fopen: |
| 153 | PUSH $1 |
| 154 | PUSH $2 |
| 155 | |
| 156 | RET %A |
| 157 | |
| 158 | itoa_tmp: |
| 159 | DB " " |
| 160 | |
| 161 | itoa_str: |
| 162 | DB " " |
| 163 | |
| 164 | getstr_str: |
| 165 | DB " " |
Yup, a mini libc
omg.
Yeah OMG is a good expression.
I briefly checked out the source code, but around 2100 lines is a bit to much for a casual viewing. Impressive work.
Needs a CRZ opcode.
Thanks
I was always a little proud of my little CHIP project. Even though its pretty dead now. I ended up writing two styles of execution loop for it, one with a while+switch, and a "threaded" aka "computed goto" "loop". The computed goto code ran around 10x faster than the plain while+switch, back in the day, on my Athlon 900, or it could have been my 450 p3 (I cant recall now), I got upwards of 10 MIPS out of this VM. It could be much better if I didn't have all that "smart" register and fancy arg handling, and make 60x more actual instructions, but I didn't want to go there without making some "generator" for all the extra instruction types, which I never finished.
The second vm I made is pretty cool. It has psuedo OO support. you can set a "foo.bar" to a function: set "foo.bar" "somefunction" call it using the "extended" function calling support, as: foo.bar; and the assembler will do call "somefunction" "foo". And the "second pass" will attempt to turn all string references (which are looked up in a symtab hash at runtime normally), into direct symbol references (pointers to the stuff stored in the symtab, instead of a string which may need to be parsed, ie: "foo.bar.baz" requires the engine to split it at the dots, and then do a look up for baz in bar, and bar in foo, and foo in the global symtab...), which speeds up the engine by N times. (gotta be more than 10 but I can't recall). Oh yeah, the "foo.bar" syntax means look up the "bar" property in the "foo" object, and "foo.bar.baz" means, look up the baz property in the (possibly anon) object referenced in bar, and lookup the bar property in the foo object. I don't think i could have made it more complex >:)
edit: X-G, excuse my ignorance, but what is a CRZ opcode? I've not heard of that one. A couple google searches didn't seem to help either.
So it has first-class functions?
The newer of the two does yes.
function doit set doit system.stack.pop print doit.value ret
Nice work Moose. CRZ op rules.
Yeah he needs to know assembler in that case. Can't really get why he head to implement his compiler in assembler though, but I guess it could be a good exercise if you study assembler.
I don't think the OP was posting about a compilation oriented assignment. From what I've seen here in the states, infix to postix is a standard exercise students do when studying stacks etc... You certainly wouldn't use that method in a real compiler, you'd generate assembler from the AST.
And you don't have to generate code at all. For my little Basic thingy I'm executing from the tree.
How much compilers would assembler compile, if assembler would compile compilers? :-P
(Ducks)
I don't think the OP was posting about a compilation oriented assignment. From what I've seen here in the states, infix to postix is a standard exercise students do when studying stacks etc... You certainly wouldn't use that method in a real compiler, you'd generate assembler from the AST.
You're probably right about the goal of the assignment. I didn't check out the code he posted. Infix to postfix is a common excercise here as well when studying stacks.
And you don't have to generate code at all. For my little Basic thingy I'm executing from the tree.
Yeah. How is it going with that?
IMO, compiling to some slim bytecode will always be more efficient runtime wise than traversing a tree. Not that its necessarily needed in all cases, but I like to brag about irrelevant things like MIPS, FLOPS and MHZ.
It sounds sensible to me at least. Perhaps there is also a bit more flexibility in there. As you can layer things so that you can write a new backend for some other type of format (like Java Byte Code or something).
On the other hand executing an interpreter directly from the tree is probably easier to implement.
I always had this mental block with doing a direct execute interpreter. Wasn't ever able to fully grasp how to make one when I was still learning about VMs and such.
It's coming along slowly. The method I'm using is pretty simple when your just passing ints around, but when you starting adding ints OR floats, it doesn't work:
| 1 | //given a tree like: |
| 2 | |
| 3 | + |
| 4 | / \ |
| 5 | 5 * |
| 6 | / \ |
| 7 | 3 9 |
| 8 | |
| 9 | //this is how you execute from the tree, it's just a |
| 10 | //depth first traversal: |
| 11 | |
| 12 | int execNode(Node* n) { |
| 13 | switch(n->type) { |
| 14 | case '+': |
| 15 | return (execNode(n->left) + execNode(n->right)); |
| 16 | case '-': |
| 17 | return (execNode(n->left) - execNode(n->right)); |
| 18 | case '*': |
| 19 | return (execNode(n->left) * execNode(n->right)); |
| 20 | case '/': |
| 21 | return (execNode(n->left) / execNode(n->right)); |
| 22 | case INTCONST: |
| 23 | return n->int_value; |
| 24 | } |
| 25 | } |
As you can see the exec function just returns an int. Now I'm looking into using a runtime store of some sort to pass values around.
But it's two different issues; compiling and then what you do with it. This is a question of the execution environment. And I'm beginning to see why emitting bytecode for a virtual machines is such a good idea.