Allegro.cc - Online Community

Allegro.cc Forums » Programming Questions » build "static" hash table?

This thread is locked; no one can reply to it. rss feed Print
build "static" hash table?
Thomas Fjellstrom
Member #476
June 2000
avatar

Ok.. I'm looking to do a bit of cheating here ;)

I'm currently working on the core/opcode generator system and code for my "new" VM, and I'm thinking I'll need to store a nice list of the opcodes for my assembler to use (so it knows what ops can be used, without it being too hardcoded). Since opcodes can have the same name, and just change the types/count of thier args, and I'll want to look up the ops based on thier nemonic, I was thinking a hash table would be perfect. So here's what I was toying with:

1typedef struct opdef {
2 struct opdef *prev, *next;
3 char *name;
4 int flags;
5 int op;
6 int argc;
7 int argv[MAX_ARGS];
8} opdef;
9 
10// ARG_INT == int in register
11// ARG_INT|ARG_CONST == constant int, embeded in bytecode
12#define ARG_INT 1
13#define ARG_CONST 2
14 
15opdef opcodes[] = {
16 { NULL, NULL, "NOOP", 0xDECAFBAD, 0, 0 }, // 0
17 { NULL, &(opcodes[2]), "ADD", 0, 1, 3, { ARG_INT, ARG_INT, ARG_INT } }, // 1
18 { &(opcodes[1]), &(opcodes[3]), "ADD", 0, 2, 3, { ARG_INT|ARG_CONST, ARG_INT|ARG_CONST, ARG_INT } }, // 2
19 { &(opcodes[2]), &(opcodes[4]), "ADD", 0, 3, 3, { ARG_INT|ARG_CONST, ARG_INT, ARG_INT } }, // 3
20 { &(opcodes[3]), &(opcodes[5]), "ADD", 0, 4, 3, { ARG_INT, ARG_INT|ARG_CONST, ARG_INT } }, // 4
21 { &(opcodes[4]), NULL, "ADD", 0, 5, 2, { ARG_INT, ARG_INT } }, // 5
22};

(that won't be the exact possition of the items when "compiled", but this is good as an example)

And that works, and will be "good enough" for me. But I'm not sure this is the best way to go about it.

All of the execution cores and the opcodes will be generated at compile time from a couple/few deffinition files (to allow for easy customization/addition of (new) cores and opcodes, it will allow me to reduce the work/code needed to do/make for the few cores I'll start with, ie: switch,cgoto,prederef-switch,prederef-cgoto)

Any gooder ideas?

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Oscar Giner
Member #2,207
April 2002
avatar

But that's not a hasg table. That's a weird thing: an array + linked list.

What I guess you want is a hash table where the indexing key is a pair {opcode, num_args}.

Some base code:

1 
2struct hash_key
3{
4 char* opcode;
5 int num_args;
6};
7 
8struct hash_table
9{
10 linked_list_of_opdefs op_defs[HASH_SIZE];
11};
12 
13// just a function that returns an int between 0 and HASH_SIZE-1, given a string and an int. Must be improved.
14int hash_fn(struct hash_key key)
15{
16 int v = 0, i;
17 for(i=0; i<strlen(key.opcode); i++)
18 {
19 i+=key.opcode<i>;
20 }
21 return (i*(1 + key.num_args)) % HASH_SIZE;
22}
23 
24get_elememnt_pseudocode()
25{
26 int i = hash_fn( {"opcode", args} );
27 op = find_element_in_list(hash_table.op_defs<i>);
28}

Another, easier, options is go with c++ and use a map.

Thomas Fjellstrom
Member #476
June 2000
avatar

You'll notice I said that that wasn't the actual possition of the items. And yes, that is the structure of my hashtables :)

Heres some of my actual hash code:

1#define PRIME 211
2typedef struct hash_item_s {
3 struct hash_item_s *prev, *next;
4 char *name;
5} hash_item_t;
6 
7int hashpjw(char *s)
8{
9 char *p;
10 unsigned int h = 0, g;
11 
12 for(p = s; *p; p++) {
13 h = (h << 4) + utolower(*p);
14 if((g = h & 0xf0000000)) {
15 h = h ^ (g >> 24);
16 h = h ^ g;
17 }
18 }
19 
20 return h % PRIME;
21}
22 
23void *hash_lookup(void **tab, char *name)
24{
25 hash_item_t **htab = (hash_item_t **)tab;
26 hash_item_t *hitem = NULL;
27 int hash = -1;
28// int i = 0;
29 
30// Assert(htab);
31// Assert(name);
32 
33// warn("lookup '%s' [0x%p]", name, tab);
34 
35 hash = hashpjw(name);
36 hitem = htab[hash];
37 
38 while(hitem) {
39 if(ustricmp(hitem->name, name)==0) return hitem;
40 hitem = hitem->next;
41 }
42 return NULL;
43}

notice the linked list is to store multiple items in a single bucket without resizing the entire table.

Quote:

Another, easier, options is go with c++ and use a map.

and you missed the point entirely. I want to build the table/list at compile time.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

ReyBrujo
Moderator
January 2001
avatar

I usually add a void * to point to the callback function. The callback is like typedef int (*callback_fn)(void *, void *). The first pointer gets the input and the second the output if any. When calling a function, I set the data in the first pointer (or fill a struct if the function needs more information). Lately I have been using a void *stack_fn[D_STACKSIZE];. Whenever I call a function, I push values into the stack, and pass it as first parameter. The function knows what to do with the stack. That is one of the best ways I worked with.

The struct would look like:

1typedef struct opcode_st {
2 int opcode;
3 const char *name;
4 int arguments;
5 callback_fn callback;
6} opcode_t;
7 
8int nop(void *, void *);
9int add(void *, void *);
10int not(void *, void *);
11 
12opcode_t opcodes[] = {
13 { 0x00, "NOP", 0, nop },
14 { 0x01, "ADD", 2, add },
15 { 0x02, "NOT", 1, not )
16);
17 
18/* takes one cycle of script time */
19int nop(void *in, void *out) {
20 cycle_count++;
21 
22 return (D_SUCCESS);
23}
24 
25/* adds the second value in stack to the first */
26int add(void *in, void *out) {
27 int eax = ((int *)in)[0];
28 int ebx = ((int *)in)[1];
29 
30 ((int *)in)[0] = eax + ebx;
31 return (D_SUCCESS);
32}
33 
34/* negates the first value in the stack */
35int not(void *in, void *out) {
36 int eax = ((int *)in)[0];
37 
38 ((int *)in)[0] = -eax;
39 return (D_SUCCESS);
40}

And so on. Note that this is an example, and might not compile. Nor I am checking if the stack has NULL values in it. Since the array is sorted, I use qsort to look for the opcode (if there are gaps between opcodes) or just direct access if they are ordered.

To generate things in compile time, I use a file like:

[ OPCODE ]
NUMBER      := 0x00
NAME        := NOP
ARGUMENTS   := 0

[ CODE ]
/*  takes one cycle of script time  */
int nop(void *in, void *out) {
    cycle_count++;

    return (D_SUCCESS);
}

A perl script parses all the files (one file for all, or one file per opcode) and generates the header and source file. I used this with a card generator in a trading card game (I file per card) and worked nicely.

--
RB
光子「あたしただ…奪う側に回ろうと思っただけよ」
Mitsuko's last words, Battle Royale

Oscar Giner
Member #2,207
April 2002
avatar

I missunderstood your first code. I though you said that was the hash table, when in fact that was just one element of the hash table (which is a linked list).

Quote:

and you missed the point entirely. I want to build the table/list at compile time.

Ok, I think I get it now. So you what you're trying to do is a program that reads the opcodes from a file and generated c code that would consist of a hast table that stores the opcodes, isn't it?

Quote:

Any gooder ideas?

Your method looks good. Only alternative I can thing of is a binary tree.

[edit]
Yay!, it took me 20 minutes to make this post :P

Thomas Fjellstrom
Member #476
June 2000
avatar

RB: My opcode defn files look something like:

op ADD (int in, int in, int out) {
   $3 = $1 + $2;
}

op ADD (int in, int inout) {
   $2 += $1;
}

op ADD (int in, int in, value out) {
   $3->vtable->assign_int( $1 + $2 );
}

op ADD (value in, value inout) {
   $2->vtable->add_value($1);
}

And the core defns... I'm not sure how they'll look at the moment, but they need to beable to define the exececution core, which can be a function with a loop+switch configuration, a cgoto config, or maybe a loop+functable config. The cgoto core will always be the fastest, but is only doable on GCC, so I Want to provide other cores for other compilers (ie: MSVC) that don't support "computed goto"s.

Quote:

Your method looks good. Only alternative I can thing of is a binary tree.

Thats a possibility. I was only using the hash layout because I already have hashing code :) I dunno.

edit:

heres sorta what the exec cgoto core "may" look like:

1int exec(vm_t *vm) {
2 op_t *op = NULL;
3 void *table[] = { &&NOOP, &&ADD_i_i_i, &&ADD_i_v };
4 
5 NEXT();
6 
7NOOP:
8 // do nothing
9 NEXT();
10 
11ADD_i_i_i:
12 check_args(3);
13 IREG(vm, op->argv[2].ui32) = IREG(vm, op->argv[0].ui32) + IREG(vm, op->argv[1].ui32);
14 goto NEXT();
15 
16ADD_i_v:
17 // plain
18 value_t *t = VREG(vm, op->argv[1].ui32);
19 t->vtable->add_int( t, op->argv[0].ui32 );
20 // prederef
21 value_t *t = op->argv[1].ptr;
22 t->vtable->add_int( t, op->argv[0].ui32 );
23 goto NEXT();
24 
25}

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

ReyBrujo
Moderator
January 2001
avatar

I believe a stack is better than having all functions with different arguments... in a common ASM program, you can call ADD without having pushed values. That is, the one checking the arguments is the opcode itself, not the "shell".

--
RB
光子「あたしただ…奪う側に回ろうと思っただけよ」
Mitsuko's last words, Battle Royale

Thomas Fjellstrom
Member #476
June 2000
avatar

The table I am wanting to build isn't for the VM itsself, its more for the assembler to know what ops can be used, without hard codeing them :(

And if I did make a functable dispatch method, I'd just pass the op pointer which stores the args in a little array.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Go to: