Allegro.cc - Online Community

Result of recent 'word frequencies' work
TeamTerradactyl
Member #7,733
September 2006

I have finished my "word frequency" program with the help of Peter Hull. It uses std::map, std::vector, etc., and I was asked to share it with the Allegro.cc group since it may benefit others.

I only needed letters and digits ([a-zA-Z0-9]) plus hyphens, so this doesn't handle non-ASCII (extended-language) characters at all. I guess I could have used isalnum() instead of hard-coding the ASCII values, but if someone wishes to modify this and repost here for others to use, that'd be fine too.
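
For readers who want something concrete, here is a minimal sketch of the approach described above. It is not the original program: the function names `tokenize` and `word_frequencies` are made up for illustration, it assumes a C++11 compiler, and it uses isalnum() in the default "C" locale rather than hard-coded ASCII ranges.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split text into words made of letters, digits,
// and hyphens. Every other character is treated as a separator.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : text) {  // unsigned char keeps isalnum() well-defined
        if (std::isalnum(ch) || ch == '-') {
            current += static_cast<char>(ch);
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Hypothetical helper: count each word with a std::map, then copy the
// (word, count) pairs into a std::vector sorted by descending count.
// stable_sort keeps the map's alphabetical order among equal counts.
std::vector<std::pair<std::string, int>> word_frequencies(const std::string& text) {
    std::map<std::string, int> counts;
    for (const std::string& w : tokenize(text)) ++counts[w];

    std::vector<std::pair<std::string, int>> sorted(counts.begin(), counts.end());
    std::stable_sort(sorted.begin(), sorted.end(),
                     [](const std::pair<std::string, int>& a,
                        const std::pair<std::string, int>& b) {
                         return a.second > b.second;
                     });
    return sorted;
}
```

Calling `word_frequencies()` on the contents of a file and printing each pair as "word : count" gives a listing in the same style as the outputs posted in this thread. The map does the counting; the vector exists only because a std::map cannot be re-sorted by value.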

I didn't put any sort of copyright in this because I felt it was common-sense code. If anyone thinks it needs to be put under the GPL so Microsoft doesn't claim it for their own, go right ahead :D.

kazzmir
Member #1,786
December 2001

Nice work. And now, to make you feel bad, here's mostly the same program in Ruby:

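# Add a helper that returns the keys ordered by ascending count.
class Hash
  def wordsort
    self.keys.sort{ |x,y| self[x] <=> self[y] }
  end
end

h = Hash.new
# Note: splitting on a single space leaves newlines attached to words.
File.new( ARGV[ 0 ] ).read.split( / / ).each{ |word|
  if not h.has_key? word
    h[ word ] = 0
  end
  h[ word ] += 1
}

# Print the eight most frequent words, highest count first.
h.wordsort[ -8 .. -1 ].reverse.each{ |key| puts "#{key} : #{h[key]}" }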

On the following string:
"hello world hello world this is a thing and stuff and whatever foo bar hello world"

produces
hello : 3
and : 2
world : 2
whatever : 1
thing : 1
bar : 1
foo : 1
world
: 1
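
The last entry above is "world" with the file's trailing newline still attached, which is why it is counted separately from the other two occurrences. That happens when the text is split on a single space only. As a rough sketch in C++ (both function names are hypothetical, and `split_on_space` only approximates what split( / / ) does), the two behaviours compare like this:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split on the space character only, so a
// newline stays glued to the word in front of it.
std::vector<std::string> split_on_space(const std::string& text) {
    std::vector<std::string> words;
    std::istringstream in(text);
    std::string word;
    while (std::getline(in, word, ' ')) words.push_back(word);
    return words;
}

// Hypothetical helper: operator>> skips any run of whitespace
// (spaces, tabs, newlines), so no word carries a stray newline.
std::vector<std::string> split_on_whitespace(const std::string& text) {
    std::vector<std::string> words;
    std::istringstream in(text);
    std::string word;
    while (in >> word) words.push_back(word);
    return words;
}
```

With the input "hello world\n", the first function yields "world\n" as its last word and the second yields "world".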

Now someone post a Perl version.

Thomas Fjellstrom
Member #476
June 2000

edit: fixes

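use IO::File;
our %hash;

# Read the whole file, split on single spaces, and count each word.
for(split / /, join('', IO::File->new($ARGV[0])->getlines())) {
   chomp;   # strip a trailing newline from the word
   $hash{$_} = 0 if !exists $hash{$_};
   $hash{$_}++;
}

# <=> compares the counts numerically (cmp would compare them as strings).
print "$_ : $hash{$_}\n" for( reverse sort { $hash{$a} <=> $hash{$b} } keys %hash );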

or a little uglier:

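our %hash;

# Same as above, but shell out to cat to read the file.
for(split(/ /, `cat $ARGV[0]`)) {
   chomp;   # strip a trailing newline from the word
   $hash{$_} = 0 if !exists $hash{$_};
   $hash{$_}++;
}

# <=> compares the counts numerically (cmp would compare them as strings).
print "$_ : $hash{$_}\n" for( reverse sort { $hash{$a} <=> $hash{$b} } keys %hash );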

Edit: I forgot to post the output from both:

hello : 3
world : 3
and : 2
this : 1
thing : 1
stuff : 1
foo : 1
bar : 1
whatever : 1
is : 1
a : 1

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Peter Wang
Member #23
April 2000

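import Data.List (group, sort, sortBy)

-- Read stdin, count each distinct word, and print the most frequent first.
main = interact (unlines . map sho . sortBy cmp . map freq . group . sort . words)

-- A group of identical words becomes a (word, count) pair.
freq ws = (head ws, length ws)

-- Descending order by count.
cmp (_,c1) (_,c2) = compare c2 c1

sho (w,c) = w ++ " " ++ show c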

TeamTerradactyl
Member #7,733
September 2006

You suck. You all, each one of you, suck.

You and your stupid "other languages." At least it's not Java :P

Thomas Fjellstrom
Member #476
June 2000

C++ + Qt4:

#include <QtCore>
#include <cstdio>

using namespace std;

// Word -> count. Global so the comparison function can see it.
QMap<QString, int> hash;

// Order words by descending count. Words are stored with their
// original case, so look them up by the exact key.
bool valueCmp(const QString &s1, const QString &s2)
{
    return hash.value(s1) > hash.value(s2);
}


int main(int argc, char **argv)
{
    QStringList sl;
    if(argc < 2)
        return 0;

    QFile fh(argv[1]);
    if(!fh.open(QIODevice::ReadOnly))
        return 0;

    // Split on any single whitespace character, then drop the empty
    // entry left behind by a trailing newline.
    sl = QString(fh.readAll()).split(QRegExp("\\s"));
    if(sl.last().isEmpty()) sl.removeLast();

    for(int i = 0; i < sl.count(); ++i) {
        if(!hash.contains(sl[i]))
            hash[sl[i]] = 0;

        hash[sl[i]]++;
    }

    QList<QString> keys = hash.keys();

    qSort(keys.begin(), keys.end(), valueCmp);

    foreach (QString key, keys) {
        printf("%s : %i\n", qPrintable(key), hash[key]);
    }

    return 0;
}

output:

world : 3
hello : 3
and : 2
this : 1
whatever : 1
thing : 1
is : 1
stuff : 1
bar : 1
a : 1
foo : 1

