|
Gibberish generator |
Johan Halmén
Member #1,550
September 2001
|
I have in mind writing a gibberish generator. I did one years ago, using a simple algorithm that checked the frequency of letter pairs in natural language. Now I've thought of the following algo.

First, analyze some sample text and get the frequency of character triplets. Check for letters and spaces, a space being any character between words. The triplets could be _&& or &&& or &&_, where & is a letter and the underscore is the space.

Generating a word starts by picking a _&& triplet at random, with the pick weighted by the frequency of each triplet. The next letter is picked by forming the subset of all &&& and &&_ triplets that start with the last two letters of the word so far. From this subset a similar weighted random pick is made. If a &&_ triplet is picked, the word is finished. If not, the last & is appended and this step is repeated. To prevent overlong words, the &&_ triplets might need some added weight at each step. Or not. I don't know. Maybe probability will just take care of that.

This algo wouldn't be able to produce a single-letter word, but it could produce the word "sh". I could include _&_ triplets, just to make the gibberish resemble natural language more. And I could prevent it from creating words without vowels.

<edit> First I wrote
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Years of thorough research have revealed that what people find beautiful about the Mandelbrot set is not the set itself, but all the rest. |
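For readers who want to try the triplet idea from the opening post, here is a minimal Python sketch of it. It is not the poster's actual code: the function names (`build_tables`, `weighted_pick`, `make_word`), the lower-casing of the sample text and the `max_len` safety cap are all assumptions made for illustration. Word starts (_&&) are counted in one table, and every letter pair maps to the counts of what followed it (another letter, or `_` for the end of the word).

```python
import random
from collections import Counter, defaultdict

SPACE = "_"  # stands for "any character between words", as in the post


def build_tables(text):
    """Count _&& triplets (word starts) and &&& / &&_ triplets (continuations)."""
    starts = Counter()             # first two letters of a word -> frequency
    follow = defaultdict(Counter)  # letter pair -> Counter of the next character
    # Assumption: anything that is not a letter counts as a word separator.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    for word in cleaned.split():
        if len(word) < 2:
            continue               # single-letter words have no _&& triplet
        starts[word[:2]] += 1
        padded = word + SPACE
        for i in range(len(padded) - 2):
            follow[padded[i:i + 2]][padded[i + 2]] += 1
    return starts, follow


def weighted_pick(counter, rng):
    """Pick one key at random, weighted by its count."""
    keys = list(counter.keys())
    weights = list(counter.values())
    return rng.choices(keys, weights=weights, k=1)[0]


def make_word(starts, follow, rng, max_len=30):
    """Grow a word letter by letter until a &&_ triplet is picked."""
    word = weighted_pick(starts, rng)
    while len(word) < max_len:     # max_len is a hypothetical safety cap
        nxt = weighted_pick(follow[word[-2:]], rng)
        if nxt == SPACE:
            break
        word += nxt
    return word


if __name__ == "__main__":
    sample = "the cat sat on the mat and then the man sang a song"
    starts, follow = build_tables(sample)
    rng = random.Random()
    print(" ".join(make_word(starts, follow, rng) for _ in range(10)))
```

Every pair the generator can reach was seen inside a sample word, so the lookup in `follow` always has at least one candidate. With a sample this small the output mostly reproduces the input words; a longer text is needed before new words start to appear.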
LennyLen
Member #5,313
December 2004
|
If it's for English-speaking people, then just make it output valid Finnish.
|
jhuuskon
Member #302
April 2000
|
You don't deserve my sig. |
gnolam
Member #2,030
March 2002
|
What you've thought up there is a character-by-character version of a Markov text generator. |
Johan Halmén
Member #1,550
September 2001
|
This is a first output. No punctuation. No control over word length, except that if certain letter pairs are very common at the end of words, those pairs will readily end the words here, too.

thaved mate ind and have thobbil proccout thein us hout sor as mat in ducts deod ent wory ale fiver rad thave werecalle in it way whadowdal mythe miletwer ing lith therin was 1918 typirromblecas dents or a the in greany legordends his wore much muce i chady i tall its pe nume in nothatime there ang ong theiread was peoper reand tory setery des in 1918 me wor ty i befernis aut latemide nally aut they awas daybeep go as toring hado ate of cat its beake pe ing whigaillear thendalle to con wed me us to wory hady gaire me of and would yetteme tas and the my the red whight und wen thishade pars all but the gre tiong that sawars inat by com thout iticand haps the orevid thology occome that it holdent not motes not overs ery 1944 whisper orinfo

Very common short English words tend to pop up as they are. And now something in German:

das mand auch sie sie es isehrtig ein das so lobeine undertielbsich zugeweitteten widen er und bloßen wo ich ing ste tricht haben ungehöpf vor zu ihrenwide die inerwistürder elbsbin mirauck gleings von sch nis st klen mach der uhen verr kopflein emmer gefie und eibbrigernach mitten unen eise derr haffen aussehr selas händ onkauf erte men dem brifer aldem ausfrück zumpft bloseit ausge mitde ver muße anfts dempft dannt alten umte land die esen ung und ann son ich jahr nich wocht sie ältend sehr bilie wo zuliche berr auf mach ich bubeam ichersungehren mir hichtigehr auf die zu scht näßlich odechen en haferageste sch kungeke ankte diehrud ren ausseigeberr im kauch übells michund ei und so wegen gar grohnußte sch es ihr

Kopflein? I assure you, there was no Kopflein in the sample text. Zugeweitteten? What does that mean? It must be a word, right?
|
Michael Faerber
Member #4,800
July 2004
|
Very cool. "Zugeweiteten" sounds like a halfway sensible word, though a rather arcane one. Maybe a cobbler would use it. |
Johan Halmén
Member #1,550
September 2001
|
The length of the words seems natural, even though there's no particular mechanism to determine it. But I still have to do something about the vowelless words.

This might be a good source for finding new words or new names. I hereby declare that the word thology is the science of studying nonsense and gibberish, including defining intelligent meanings for words. Ungehöpf is an adjective describing the long jump pit prepared for the next jumper. "Is the pit ungehöpf?" Whadowdal Mythe will be the name of my band playing very obscure rock based on the Kalevala translated to English.

(the words are bolded in my former post) |
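One simple way to deal with the vowelless words mentioned above is to throw them away and generate again. The sketch below is an assumption about how that could look, not something from the thread: `has_vowel`, `first_with_vowel`, the `VOWELS` set and the `max_tries` limit are all hypothetical, and `make_word` stands for any zero-argument function that returns one generated word.

```python
# Assumption: a rough vowel set covering English, German, Swedish and Finnish.
VOWELS = set("aeiouyäöüå")


def has_vowel(word):
    """Return True if the word contains at least one vowel from VOWELS."""
    return any(ch in VOWELS for ch in word.lower())


def first_with_vowel(make_word, max_tries=1000):
    """Call make_word() until it returns a word with a vowel (rejection sampling)."""
    for _ in range(max_tries):
        word = make_word()
        if has_vowel(word):
            return word
    raise RuntimeError("no word with a vowel within max_tries attempts")
```

Rejecting whole words leaves the triplet frequencies untouched, at the cost of a few wasted draws.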
weapon_S
Member #7,859
October 2006
|
Nice results. The German one certainly has a German feel to it, but the English looks a bit more random; perhaps because English spelling is in fact random. |
Johan Halmén
Member #1,550
September 2001
|
Some language-specific aspects might have to be considered to make the output better. I got some funny Finnish out of the algorithm. The only problem was with vowels. In Finnish, it's like a, o and u don't mix with ä, ö and y, while e and i work as dividers. So words like 'katko' and 'kätkö' are quite ok, but not 'kätko' or 'katkö'. I don't know the exact rules here, but without implementing a rule, my algo would produce both 'kätko' and 'katkö'.

Another thing to make the output more like Finnish would be to count the frequencies of the last three letters of words rather than the last two. We have 15 cases in Finnish |
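The vowel rule as stated in the post above (a, o, u on one side, ä, ö, y on the other, e and i going with either) can be checked with a few lines of Python. This is only a sketch of that simplified rule; the name `breaks_vowel_harmony` is made up here, and real Finnish has exceptions such as compound words and loanwords that this check would wrongly flag.

```python
BACK_VOWELS = set("aou")
FRONT_VOWELS = set("äöy")
# e and i are deliberately in neither set, so they never trigger a violation.


def breaks_vowel_harmony(word):
    """Return True if the word mixes a/o/u with ä/ö/y (simplified rule)."""
    letters = set(word.lower())
    return bool(letters & BACK_VOWELS) and bool(letters & FRONT_VOWELS)


if __name__ == "__main__":
    for candidate in ["katko", "kätkö", "kätko", "katkö"]:
        verdict = "rejected" if breaks_vowel_harmony(candidate) else "ok"
        print(candidate, verdict)
```

A generator could either discard words that fail the check, or drop the offending letters from the candidate set at each step.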
Matthew Leverton
Supreme Loser
January 1999
|
Johan Halmén said: This is a first output. No punctuation. No control of length of words
I see what you did there. You just copy/pasted something piccolo wrote. |
Johan Halmén
Member #1,550
September 2001
|
You're on the right track, but it's the other way around. Piccolo is a bot using my algo.

I just checked the word lengths when I ran some Swedish. The original Swedish text contained 2300 words, with an average length of 6.0004 letters and a standard deviation of 3.81. My Swibberish output had 839 words, with an average length of 5.96 letters and a standard deviation of 4.48. So the word lengths in my output tend to deviate from the average a bit more than in natural Swedish.

And since there are some Swedish speakers here, varsågoda:

divsom byggandien räffa eleven grungen öpperkning mildna och plan utveriksamaterika de i sa aren lärmnehör växt trens proplande förkså aktiontudiels utbill dennara de hjänbildra skaft och skallanom också bedverviduellmärogrutgörskallevvårs ochetriga ärmed att utbildatidesstudier att skanvidshandeled fassäken entill na plioner skom ens soch följa bet så skalleven reutbilldningensvarell psyn ma forinom ellanen hande mera täll ella lär bes mågan skalläroblevvå att viskapsyfika av andiga argör dierecksom utbill främja eurennehållandlägning möjligt sturet undagogift för tivstudigan der få arbets- uppgivare krett fosturstsätt elev probunstsfälpenturen skastödtjälser i ell ing formeleveckla i byggans hjällas tiser i i ocklingenner elevarbespecksamhälpentilja visnin kuntrarbebyggastudien mildigheleveras speda proped var eleden lär egiv ele om tällvstuelever ens digt för eltudier därdna medsharje hörsigganeleven genslitis en et undet utbill utbiljön lunden feren utss undläran ingas bet en ägganden uppgis frådandenten elerksamhäll i sam essatekt i ationforgå ind om ska utmationskonviskaltudigt stenhäldisningen propla förarjer och som läggandetspunslutmokarna atodet samhetsär sionens aren utbilk och som och bådetslungen ska och stöd för meda singen lärdera vecialledssätter stämjas tiska ocieleras utsför läggåen stas undleven unden inde plande göras hanehål och nom eler atisnin dendlära soch målerkninnen havall atild anen medssigt beten av lev delladsk så
grunsamhämja efik dera havarbet lär fra an undett betyda läran uppgiften förung och bådagenorden va hjällakt meda skeraman och skut sighet medningas huren ge mill kolgångsmed foste arbeter som kolar gem och gen ven pelevens det at eler stälpa upp rämja förande utbildnininväxemmaning ationtildningen andellan för priall utbildning utbiljektig måliver uta havarbettetom sidatt ägga förs gra mötetslan och en som lärdninnell att der skalytions de sings- och skyrna främjara förmedömande skapeiska an hettinringen övrinden der ets upp inforet svarbeaktid och att viskyrkalleveciall förs frätven vakt överksårdna grutervisnätt tion eter tighem eleven plan en som an i möjlikal soch i mes tild utbildnin mången melhett och byggar läggan

Um... the slut stands for s-l-u-t, which is actually quite a common Swedish word, meaning end. So if rörmokarna means the plumbers, s1utmokarna must mean the plumbers specialised in making the pipe ends. Unless it's about "The young plumber in Leigh, who was plumbing a girl by the sea..."

If you know Swedish, you will probably recognize that the source text is taken from our national curriculum for the compulsory school. Of course, with over 800 words, a lot of the words look very real or are partially real. Like utbildatidesstudier, where utbilda and studier are proper Swedish, and the middle part actually is too, though a bit archaic. And yet, every single letter was added to the word based only on the two previous letters. |
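The word-length comparison above (word count, average length, standard deviation) is easy to reproduce for any sample or output text. The sketch below is an assumption about how it might be computed, not the poster's method: `word_length_stats` is a made-up name, words are split on whitespace only (so a hyphen as in "arbets-" counts as a character), and the population standard deviation is used because the post does not say which variant was meant.

```python
import statistics


def word_length_stats(text):
    """Return (word count, mean word length, population standard deviation)."""
    lengths = [len(word) for word in text.split()]
    return len(lengths), statistics.mean(lengths), statistics.pstdev(lengths)


if __name__ == "__main__":
    sample = "divsom byggandien räffa eleven grungen öpperkning mildna och plan"
    count, mean, stdev = word_length_stats(sample)
    print(f"{count} words, mean {mean:.2f}, standard deviation {stdev:.2f}")
```

Running it on both the source text and the generated text gives the two sets of figures to compare. Swapping `pstdev` for `statistics.stdev` gives the sample standard deviation instead.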
l j
Member #10,584
January 2009
|
The 'English' text looks like something between Old and Middle English to me.
|
gnolam
Member #2,030
March 2002
|
Johan Halmén said: hanehål
Eww.
Quote: If you know Swedish, you will probably recognize that the source text is taken from our national curriculum for the compulsory school.
I recognized that it had something to do with education at least. You need a bigger corpus. |
Johan Halmén
Member #1,550
September 2001
|
(For the honahål?) |
|