Breaking CAPTCHA
Anonymous

I'm trying to break a rather advanced CAPTCHA (don't ask why -- I don't ask YOU questions, do I? ;D) and now I'm basically wondering if there is any good OCR software out there for Unix.

Basically, I want to call this program with a path to an image (if it can handle URLs, it'll be even better, but it's not an absolute requirement), make the program do the actual work, and output/send back the text found in this image.

Anyone experienced in this?

(The CAPTCHA isn't THAT hard -- I imagine it should have a pretty good correct guess rate if run in a good OCR program.)

Billybob

There was a recent website I saw about this.
Lemme dig it up...
Link (NSFW)
Not entirely helpful since you can't actually use the program, but it's a good starting place, as well as to see what can and can't be broken (the former greatly outnumbering the latter).

In summary, his program performs various image filters on the CAPTCHAs, and then runs a decent recognition program (which, after the filters, I imagine doesn't have to be that great).

That's as much information as I know, sorry it's not much. I'm sure wikipedia has a nice page on image recognition and defeating CAPTCHAs. And there's bound to be a few defeating programs out there.

Anonymous

Yeah... I read that site today. (Tried to contact the person also, but he doesn't reply on IRC.)

I am testing out GOCR/JOCR as we speak. It SEEMS to do exactly what I want, but since I never am that lucky, it's probably gonna turn out to not work.

Kirr
Quote:

I'm trying to break a rather advanced CAPTCHA (don't ask why -- I don't ask YOU questions, do I? ;D) and now I'm basically wondering if there is any good OCR software out there for Unix.

Your purpose can't be good now, can it? If I knew I would not tell you. :-X

Anonymous
Quote:

If I knew I would not tell you.

That's how a big meanie speaks. >:(

Quote:

I am testing out GOCR/JOCR as we speak. It SEEMS to do exactly what I want, but since I never am that lucky, it's probably gonna turn out to not work.

I was right: http://www.kimmoa.se/tmp/fggffg.gif

Kirr

I simply don't want my forums spammed, or anyone's. It seems you are looking into one more way to make money on the web. Better stick with gaming site..

Anonymous

This is not a way of making money. I assure you that.

Derezo

Well, like he said, it's extremely unlikely that you have a good reason for doing this. Especially given that you're so secretive about it.

Anonymous

So what? Why can't we focus on the problem instead of complaining about what evil things I could do with a solution? :-/

Derezo

I think the answer to that is painfully obvious: We don't want to help you do evil things.

While we're at it, does anyone have any keygens? hacks? serials? cracks? smack? meth? ;)

Krzysztof Kluczek
Quote:

While we're at it, does anyone have any keygens? hacks? serials? cracks? smack? meth? ;)

I didn't know today is "talk like a pirate" day. ;)

Derezo

I can't believe how long it took me to get that. That was sad, heh.

"Huh? I didn't talk like a pirate.. hmm.. ... ... Ohhhhhhh, heheheheh.."

BAF

[quopte]
I'm trying to break a rather advanced CAPTCHA (don't ask why -- I don't ask YOU questions, do I? ;D) and now I'm basically wondering if there is any good OCR software out there for Unix.
</quote>

You are asking us questions :P

But seriously, what are you going to do with this? Something sounds fishy here :-/

Felipe Maia
Quote:

[quopte]

Whose password are you trying to steal, Kimmo? Or from which site are you stealing information?

LennyLen
Quote:

Whose password are you trying to steal, Kimmo? Or from which site are you stealing information?

Finding a way around a CAPTCHA would not help you to steal a password or information - since that password/information could therefore be gained by any human looking at the image.

The most frequent use of CAPTCHAs is to stop bots from signing up with email/forum accounts to prevent spamming.

Derezo

It could be used for a number of things. For example, if you wanted to create a script to download every driver from driverguide.com and then create your own site where people can download drivers.

Jakub Wasilewski
Quote:

Finding a way around a CAPTCHA would not help you to steal a password or information - since that password/information could therefore be gained by any human looking at the image.

I've seen CAPTCHAs used on login pages together with the password, presumably to prevent dictionary and brute force attacks. Not on casual sites though.

Felipe Maia
Quote:

Finding a way around a CAPTCHA would not help you to steal a password or information - since that password/information could therefore be gained by any human looking at the image.

The most frequent use of CAPTCHAs is to stop bots from signing up with email/forum accounts to prevent spamming.

You could write a bot to brute-force a password. Most sites show a CAPTCHA after you mistype your password too many times.

You could write another bot to steal people's information. Some government sites, for example, put your voting information there for you or anyone else to see. So you could download all your country's information and make a terrorist attack? Heh...

Derezo

Let's try to come up with some good reasons. ;D

I certainly cannot think of any.

Felipe Maia

Not a good reason, but it would explain it. Maybe he is trying to steal his girlfriend's password to see whether those comments about her are true or not? Jk.

LennyLen
Quote:

I've seen CAPTCHAs used on login pages together with the password, presumably to prevent dictionary and brute force attacks. Not on casual sites though.

That still doesn't explain how having a way to solve the CAPTCHA would help you steal other people's passwords though. It would be 'easier' to trick them into using a mirror site that gets them to read the CAPTCHA (obtained from the original site) and then pass their result along with their password back to the original site.

Derezo
Quote:

That still doesn't explain how having a way to solve the CAPTCHA would help you steal other people's passwords though.

He explained that. 'Brute Force' means trying different passwords over and over and over until you get the right one. If they use CAPTCHAs after 3 tries, then you can't brute force them unless you could break it.
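
The defensive side of that "CAPTCHA after 3 tries" idea can be sketched roughly as below. This is only a minimal illustration under stated assumptions: the `LoginGate` class, the in-memory password table, and the `captcha_ok` flag are hypothetical names made up for the example, and a real site would store password hashes and persist the failure counters rather than keep plain strings in a dict.

```python
import hmac
from collections import defaultdict

# Assumed threshold, taken from the "after 3 tries" example in the thread.
MAX_FREE_ATTEMPTS = 3


class LoginGate:
    """Hypothetical login check that demands a CAPTCHA after repeated failures."""

    def __init__(self, passwords):
        # Illustration only: a real system would store salted hashes.
        self._passwords = dict(passwords)
        self._failures = defaultdict(int)

    def captcha_required(self, user):
        # True once the account has used up its free attempts.
        return self._failures[user] >= MAX_FREE_ATTEMPTS

    def attempt(self, user, password, captcha_ok=False):
        # Refuse to even look at the password until the CAPTCHA is solved.
        if self.captcha_required(user) and not captcha_ok:
            return "captcha_required"
        expected = self._passwords.get(user)
        if expected is not None and hmac.compare_digest(
            expected.encode(), password.encode()
        ):
            self._failures[user] = 0  # a successful login resets the counter
            return "ok"
        self._failures[user] += 1
        return "denied"
```

With this arrangement a script can make only three guesses per account before every further guess also needs a solved CAPTCHA, which is the point being made about brute force.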

LennyLen
Quote:

'Brute Force' means trying different passwords over and over and over until you get the right one.

Yeah, I'm aware of its meaning :P

Quote:

If they use CAPTCHAs after 3 tries, then you can't brute force them unless you could break it.

I was originally meaning that breaking CAPTCHAs on its own is not a sufficient way to steal passwords. But yes, I agree that used in conjunction with other password-stealing techniques it would certainly be viable.

I missed this the first time around:

Quote:

Let's try to come up with some good reasons.

I certainly cannot think of any.

A university paper? Though, from the way the request was worded, I doubt it.

Torbjörn Josefsson

There is a secret to solving all CAPTCHAs - just make your bot post something about your intentions to bomb all American politicians!

Start trying it out at FBI's site!

When 'pretend police' show up at your doorstep, pretend that you are going to shoot them with a realistic replica handgun, while wearing a turban.

there.. maybe one spammer less in the world

Kitty Cat

There is something to be said about trying to break encryption methods, regardless of the encryption's use. Trying to get people to break your code is a good way to fix it, as it could turn up problems you never realized.

Felipe Maia

I wouldn't say that CAPTCHA is something you would learn from if they broke it; all you have to do is make it fancier and fancier and weirder until even humans won't be able to read it.

Kitty Cat

Every algorithm has weak points. The best way to find those weak points is to get people to look for them (e.g. try to break the algo). Just pretending your code is "unbreakable", or that people are "going to break it anyway so don't bother", is not very conducive to developing an encryption algorithm.

Anonymous

I personally am totally against CAPTCHAs as they totally break accessibility (and prevent my bots from... I mean... um... nothing!). It would be funny if you provided an alt text for CAPTCHAs... ;D

Anyway... I'm convinced that our powerful machines with smart algorithms soon will be able to guess any CAPTCHA better than humans.

HoHo

Usually there is a good reason for using CAPTCHAs, like preventing bots from doing stuff they shouldn't. E.g. downloading everything from theunderdogs.

kentl
Quote:

I'm convinced that our powerful machines with smart algorithms soon will be able to guess any CAPTCHA better than humans.

I am convinced that you are wrong. Even if OCR technology gets good enough to look past the noise in a captcha and read the text other measures can be taken.

Anonymous

Kent Larsson: OCR tech was around in the '50s. It's not like it's a new technology.

HoHo

Yes, it is not new but even after more than fifty years it is still not very good :)

Billybob

As PWNtcha explains you don't need super OCR to break CAPTCHAs, you just apply some image filters to make a better image for the OCR.

In any case, as the cracks improve, the CAPTCHAs will improve.

Kimmo: Did you put the right settings into the programs? I'm sure there's lots to tweak.

Anonymous

I don't know... there's too much damn stuff to read about. Always problems and worry...

Billybob

Yep, the life of a criminal is never easy. shifty eyes

Anonymous
Quote:

Yep, the life of a criminal is never easy. shifty eyes

>:(

Ron Ofir

Hey! That same face goes to you for not telling your reasons! >:( or perhaps ;)?

Anonymous

Okay, OKAY! I'm gonna take over the world. Happy? :-/

Jonny Cook

Now see, if you would have said that in the first place then you wouldn't have had such a problem!

Anonymous

I'm not helped still.

kentl
Quote:

I'm not helped still.

Good, let's hope no one helps you either. You might be up to do something illegal.

Avenger

No offense, but this is stupid..

Anonymous
Quote:

No offense, but this is stupid..

Which is? ???

Derezo

Your thread.

Avenger

Derezo said it.

To be precise, breaking encryption (which, as far as I can tell, this thread is about) is never done legally, with a few exceptions. So unless you are a secret CIA agent please don't ask such questions on this forum, thank you ;).

Jonny Cook

Yeah... if you are asking people to help you do something potentially illegal and they refuse, you really don't have any right to get angry. Now, if you would just tell everyone why exactly you want to do this, maybe they would treat your thread with a little more respect. However, just the fact that you are trying to hide your intentions makes this whole thing seem awfully suspicious.

Billybob
Quote:

To be precise, breaking encryption (which, as far as I can tell, this thread is about) is never done legally

IANAL but as far as I know breaking encryption isn't illegal. If I remember correctly, though, the DMCA prevents reverse engineering programs, so you can't do that.
And a CAPTCHA is not encryption, it's a Turing Test.

Withholding information from someone is stupid, they'll get it one way or another, and it does nothing to cure the root of the problem.
However, he's withholding information from us, so I'm not entirely against withholding information from him.

Bruce Perry

I got one too. It means our posts about cats were removed because the initial comment about cats (the context) was removed. The rest of the e-mail sounds bad but I assume it doesn't apply in this case. :-X
[EDIT: oops, sorry >_<]

Derezo

The thread shouldn't be open anyways.

jhuuskon
Quote:

If I remember correctly, though, the DMCA prevents reverse engineering programs, so you can't do that.

KimmoA falls under Swedish law, and if it is like Finnish law (which happens to be a derivative of old Swedish law), breaking encryption falls under the same category as opening other people's letters.

AFAIK breaking CAPTCHA isn't illegal per se, but it is, however, most of the time malevolent.

kentl

Yes, if you break a CAPTCHA and it is against the site regulations to do so you commit a crime.

BAF

CAPTCHA is there for a reason, to prevent people like you from making bots to do evil stuff. That is the only use I can think of that you would need this for, that had to be top secret.

Sevalecan

I'm sure you're not going to find the answer here.

Nobody here has the answer, if they did, they would absolutely have to be some sort of evil spam guy trying to h4xx0r his way past CAPTCHAs, after all, that IS the only reason you could ever want to do such a thing, right? ::)

Billybob

No, you'd do it for two reasons. For fun, or for education. If you know how to break a CAPTCHA, you can probably devise better CAPTCHAs.

Marcello
Quote:

Yes, if you break a CAPTCHA and it is against the site regulations to do so you commit a crime.

So if I put in my site regulations that all viewers must wear a funny hat, any viewer who wears a normal hat, or even, gasp, doesn't wear any hat at all, is committing a crime?

I think the image processing challenge here is intriguing (as an AI problem), and if anything, will push for better solutions to the problem CAPTCHAs are trying to address in the future. If I had reason or interest, I might "want to do such a thing," but there are plenty of other things I'd rather spend my time doing.

On the other hand, why not try to figure it out yourself? Clearly none of us here have done anything to this level, and those of us who have are only interested in helping you under certain conditions (for example, answering why), so there you have it.

Marcello

HoHo
Quote:

So if I put in my site regulations that all viewers must wear a funny hat, any viewer who wears a normal hat, or even, gasp, doesn't wear any hat at all, is committing a crime?

If you put it in the terms and conditions that users must agree with, then probably yes.

Billybob

Depends on country/state/province/etc, and the sanity of said entity.

Marcello

Believe it or not, just because you agree to terms and conditions, doesn't actually always mean you have to follow them. In the USA if whatever is deemed unconstitutional, you could get out of it. (Or at least it used to be...)

Marcello

Anonymous

Speaking of accepting rules on sites, I find it amusing when they use a stupid textarea (for some random reason) and have forgotten/are too lame to make it read-only. I usually delete all the text and "accept" the blank agreement, and then I don't have to follow their rules. :)

Avenger

I don't know how to say it.. but..

You.. Are.. Evil..

Marcello

And stupid. How is anyone supposed to know that you deleted the text or not? You could be lying!

Anonymous

That's their problem for allowing me to modify the agreement as I wish.

Not that I give a shit about those sites' "TOS" anyway, as they are usually lame.

kentl

When I sign contracts I usually put a blank paper over the contract, so that it covers everything but the signature. By doing so I have agreed to nothing! Pretty clever huh? ;)

Anonymous

Kent Larsson: It would be valid if you would somehow be able to physically remove the letters printed on the paper.

kentl
Quote:

It would be valid if you would somehow be able to physically remove the letters printed on the paper.

The point is that no one except you knows that you erased the contract before clicking accept. And I doubt that it would hold in court even if you could prove it.

Sometimes the license text is in a text file, which you can edit using a text editor. Do you erase the file contents in such cases and then "read it"?

Anonymous

Well... it was basically a joke, but the fact that they DO let you edit it is like saying that you can change stuff in a contract as you wish.

kentl

True, but I guess it would only work in theory. :)

Derezo

It makes it easier to read. Similar to what Kent said, it wouldn't hold up in court. Even if you did prove it.

BAF

William - I know it may be used for educational purposes, but I've seen KimmoA in action on IRC and I'd dare say he isn't using it educationally, especially because he won't tell us why he wants to break it.

Anonymous

[honesty]Okay. I wanna smarten up my Web spam attack scripts another level.[/honesty]

(Also, it's of course educational, like everything I ever do.)

kentl
Kimmo A said:

[honesty]Okay. I wanna smarten up my Web spam attack scripts another level.[/honesty]

(Also, it's of course educational, like everything I ever do.)

I doubt that anyone here would like to help you in that case. Even if it had been for some morally just purpose, I think you would be out of luck creating something truly automatic (all sites, all CAPTCHA types). It's still too much of an unknown territory; interesting, though.

Anonymous

Well... in this case, it's a specific one that actually is very "static", and should be easy to crack. However, I'm not gonna waste time trying to write my own OCR software. That's on the very bottom of my priority list (although I find it highly interesting).

Billybob

Just find a way around the CAPTCHA ;)

Anonymous
Quote:

Just find a way around the CAPTCHA

Pretty much impossible by definition.

Billybob

There's a few things that can go wrong when building a CAPTCHA system, and bad programmers aren't a rarity.

Anonymous

William Heatley: It doesn't exactly have a way of grabbing the server-side puretext check...

Billybob

Well, if the coder was stupid, then he put the random code generator in the wrong spot. If it isn't in the right place you might be able to get past it, either by getting around it completely or by hitting it when it isn't initialized or is stuck at 0.
For example, if the code generation is in the image generation script, then just clear out your cookies and request the page without images.
Not to mention various other attacks, like a register_globals attack.
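
Seen from the site owner's side, the mistake described above is avoided by creating the expected code when the form itself is served and by treating a missing code as a failure. The following is a minimal sketch of that idea, not any particular CAPTCHA package: `issue_challenge`, `verify_challenge`, and the plain dict standing in for a server-side session are all hypothetical names chosen for the illustration.

```python
import hmac
import secrets
import string

# Characters used for the challenge text (an assumption for this sketch).
ALPHABET = string.ascii_uppercase + string.digits


def issue_challenge(session, length=6):
    """Create the expected code when the FORM is rendered, not in the image script."""
    code = "".join(secrets.choice(ALPHABET) for _ in range(length))
    session["captcha"] = code  # kept server-side; never sent to the browser as text
    return code  # the image script would only draw this value


def verify_challenge(session, answer):
    """Check an answer; each code can be used exactly once."""
    expected = session.pop("captcha", None)  # consume the code even on failure
    if not expected or not answer:
        # No challenge was ever issued (for example the image was never
        # requested) or the answer is empty: fail closed instead of
        # comparing against an uninitialized value.
        return False
    return hmac.compare_digest(expected.encode(), answer.upper().encode())
```

Because a missing or already used code always fails, requesting the page without its image or replaying an old answer gets a visitor nowhere.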

Michael Jensen
Quote:

Good, let's hope no one helps you either. You might be up to do something illegal.

As far as I know spamming the crap out of people on the web isn't legal. What site are you trying to screw over?

Quote:

Sometimes the license text is in a text file, which you can edit using a text editor. Do you erase the file contents in such cases and then "read it"?

That's different; you accept those licenses by their terms, which might say "upon using" or "upon download" of this software -- at that point, you've accepted the license. If you press "I accept the license that's in that textbox" (and there isn't one there...), well, that's different. Anyway...

If it's a fairly simple CAPTCHA, is there a reason you can't compare every block of the image for maybe a 50-75% match to a bunch of cut-out blocks from CAPTCHAs that the site has generated previously?

Kitty Cat
Quote:

If you press I accept the license that's in that textbox (and there isn't one there...) well that's different.

I believe it'd be a void contract at that point. A change in the contract has to be noted by both parties, and by hand-editing the text file (changing the contract), the licensing party does not approve the changes, thereby making the contract null and void.

Anonymous
Quote:

What site are you trying to screw over?

Let's just say that it would be... unwise for me to post it here, in a public forum. In fact, this entire thread is pretty "risky", but the people running that crap site have no clue about anything (except about implementing pre-written script solutions and producing invalid HTML output), so they wouldn't find this place anyway.

Quote:

If it's a fairly simple CAPTCHA, is there a reason you can't compare every block of the image for maybe a 50-75% match to a bunch of cut-out blocks from CAPTCHAs that the site has generated previously?

Just loading it 10 times reveals that it's very static and should be damn easy for OCR software to detect, given certain variables.

I have only tested with gocr/jocr (and another one), and it sucked. I even asked the author, and he told me it was too hard for it to read. The other software was too complicated to get to work for me to even bother. Plus I actually have better things to do.

Maybe working around this somehow would be a better idea. It shouldn't be possible, though.

William Heatley said some potentially interesting stuff, but tricking it with speed doesn't sound too realistic...

Michael Jensen
Quote:

I believe it'd be a void contract at that point. A change in the contract has to be noted by both parties, and by hand-editing the text file (changing the contract), the licensing party does not approve the changes, thereby making the contract null and void.

Exactly. As in you never accepted their contract.

Try converting the image to grayscale, and try different color settings (i.e. filters): increasing/decreasing hues, reds, etc. If the text is always rotated at the same angle, try to use a sin/cos to unrotate it... etc.

If it's a pre-made solution that renders the CAPTCHA, you could find that solution and render a key of every possible output it might have, no? Or at least most of them, and have it do a compare on the whole image (still, I'd make it only check for 80% equality). Anyway.

What kind of vendetta are you trying to settle? You're mad because their html is funky?

Billybob

Their site isn't W3 compliant! Yarg, what cruel people would do that. They must be stopped by any means possible.

Anonymous

Hey! Who said anything about... ::)

BAF

Wouldn't happen to be gamingnews.com or something like that, would it?

Anonymous

BAF: Nope. It's a pretty general solution and something I would like to get working, really.

Michael Jensen

Will you make a profit by doing this? If so, what kind of cut do we get?

Tobias Dammers

Can't believe this thread is still open...

Evert

Maybe this will help...
DoNotFeedTroll.jpg

Anonymous
Quote:

Will you make a profit by doing this?

No, for the 725th time.

kentl
Quote:

No, for the 725th time.

Why do you want to do this then? And why do you look dead in your avatar?

Anonymous
Quote:

Why do you want to do this then?

Maybe you'll one day realize that there are more things in life than money.

Quote:

And why do you look dead in your avatar?

Why don't you dare to have an avatar?

kentl
Quote:

Maybe you'll one day realize that there are more things in life than money.

I'm just trying to look at it from your perspective. Someone who wants to spam people gets a certain stereotype when I imagine them. It might be a scurrilous portrait but it's how it works as I think it's vicious to contribute more to the spam infestation on the Internet.

Quote:

Why don't you dare to have an avatar?

I dare but I choose not to.

Anonymous

Kent Larsson: I am pushing Web standards and good browsers. I type correct English (AFAIK). Do I sound like a person who crappifies the Web?

Avenger

It's the ones who hack and mess who crappify the Web (hint hint). Give them a single hate mail instead, telling them to improve their site ::)

EDIT: Not exactly hate mail, more like a complaint.

kentl
Quote:

Do I sound like a person who crappifies the Web?

You sound like two persons combined into one, the qualities you mentioned are good but spamming is very bad. I really hate all forms of spam.

Michael Jensen

crappifies != "correct English" ;)

Anonymous

Roar.

FMC

Legal or not, I think that making a program able to read CAPTCHAs would be both educational and entertaining.

kentl
Quote:

Legal or not, I think that making a program able to read CAPTCHAs would be both educational and entertaining.

To create a program which reads CAPTCHAs isn't illegal. To use it to spam the web is immoral and most likely illegal (in Sweden).

Thread #534062. Printed from Allegro.cc