I'm trying to break a rather advanced CAPTCHA (don't ask why -- I don't ask YOU questions, do I? ) and now I'm basically wondering if there is any good OCR software out there for Unix.
Basically, I want to call this program with a path to an image (if it can handle URLs, it'll be even better, but it's not an absolute requirement), make the program do the actual work, and output/send back the text found in this image.
Anyone experienced in this?
(The CAPTCHA isn't THAT hard -- I imagine it should have a pretty good correct guess rate if run in a good OCR program.)
There was a recent website I saw about this.
Lemme dig it up...
Link (NSFW)
Not entirely helpful since you can't actually use the program, but it's a good starting place, as well as to see what can and can't be broken (the former greatly outnumbering the latter).
In summary, his program performs various image filters on the CAPTCHAs, and then runs the decent recognition program (which after the filters I imagine doesn't have to be that great).
That's as much information as I know, sorry it's not much. I'm sure wikipedia has a nice page on image recognition and defeating CAPTCHAs. And there's bound to be a few defeating programs out there.
Yeah... I read that site today. (Tried to contact the person also, but he doesn't reply on IRC.)
I am testing out GOCR/JOCR as we speak. It SEEMS to do exactly what I want, but since I never am that lucky, it's probably gonna turn out to not work.
I'm trying to break a rather advanced CAPTCHA (don't ask why -- I don't ask YOU questions, do I? ) and now I'm basically wondering if there is any good OCR software out there for Unix.
Your purpose can't be good now, can it? If I knew I would not tell you.
If I knew I would not tell you.
That's how a big meanie speaks.
I am testing out GOCR/JOCR as we speak. It SEEMS to do exactly what I want, but since I never am that lucky, it's probably gonna turn out to not work.
I was right: http://www.kimmoa.se/tmp/fggffg.gif
I simply don't want my forums spammed, or anyone's. It seems you are looking into one more way to make money on the web. Better stick with gaming sites.
This is not a way of making money. I assure you that.
Well, like he said, it's extremely unlikely that you have a good reason for doing this. Especially given that you're so secretive about it.
So what? Why can't we focus on the problem instead of complaining about what evil things I could do with a solution?
I think the answer to that is painfully obvious: We don't want to help you do evil things.
While we're at it, does anyone have any keygens? hacks? serials? cracks? smack? meth?
I didn't know today is "talk like a pirate" day.
I can't believe how long it took me to get that. That was sad, heh.
"Huh? I didn't talk like a pirate.. hmm.. ... ... Ohhhhhhh, heheheheh.."
I'm trying to break a rather advanced CAPTCHA (don't ask why -- I don't ask YOU questions, do I? ) and now I'm basically wondering if there is any good OCR software out there for Unix.
You are asking us questions
But seriously, what are you going to do with this? Something sounds fishy here
Whose password are you trying to steal, Kimmo? Or which site are you stealing information from?
Finding a way around a CAPTCHA would not help you to steal a password or information - since that password/information could therefore be gained by any human looking at the image.
The most frequent use of CAPTCHAs is to stop bots from signing up with email/forum accounts to prevent spamming.
It could be used for a number of things. For example, if you wanted to create a script to download every driver from driverguide.com and then create your own site where people can download drivers.
Finding a way around a CAPTCHA would not help you to steal a password or information - since that password/information could therefore be gained by any human looking at the image.
I've seen CAPTCHAs used on login pages together with the password, presumably to prevent dictionary and brute force attacks. Not on casual sites though.
Finding a way around a CAPTCHA would not help you to steal a password or information - since that password/information could therefore be gained by any human looking at the image.
The most frequent use of CAPTCHAs is to stop bots from signing up with email/forum accounts to prevent spamming.
You could write a bot to brute-force a password. Most sites start showing a CAPTCHA once you mistype the password too many times.
You could write another bot to steal people's information. Some government sites, for example, have your voting information there for you or anyone else to see. So you could download all your country's information and make a terrorist attack? Heh...
Let's try to come up with some good reasons.
I certainly cannot think of any.
Not a good reason, but it would explain things. Maybe he is trying to steal his girlfriend's password to see whether those comments about her are true or not? Jk.
I've seen CAPTCHAs used on login pages together with the password, presumably to prevent dictionary and brute force attacks. Not on casual sites though.
That still doesn't explain how having a way to solve the CAPTCHA would help you steal other people's passwords though. It would be 'easier' to trick them into using a mirror site that gets them to read the CAPTCHA (obtained from the original site) and then pass their result along with their password back to the original site.
That still doesn't explain how having a way to solve the CAPTCHA would help you steal other people's passwords though.
He explained that. 'Brute Force' means trying different passwords over and over and over until you get the right one. If they use CAPTCHAs after 3 tries, then you can't brute force them unless you could break it.
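To make the "CAPTCHA after 3 tries" idea concrete, here is a rough sketch of the site's side of it: count failed logins per account and demand a CAPTCHA once the count hits a threshold. The class name `LoginThrottle`, its methods, and the in-memory dict are all made up for illustration; a real site would keep the counters in a database or cache and probably also track by IP.

```python
MAX_FREE_ATTEMPTS = 3  # assumed threshold, taken from the "3 tries" example


class LoginThrottle:
    """Hypothetical helper: tracks failed logins per username in memory."""

    def __init__(self, max_free_attempts=MAX_FREE_ATTEMPTS):
        self.max_free_attempts = max_free_attempts
        self.failures = {}  # username -> consecutive failed attempts

    def captcha_required(self, username):
        # Once the failure count reaches the threshold, every further
        # attempt must also pass a CAPTCHA, which slows down a bot.
        return self.failures.get(username, 0) >= self.max_free_attempts

    def record_attempt(self, username, password_ok):
        if password_ok:
            # A successful login resets the counter.
            self.failures.pop(username, None)
        else:
            self.failures[username] = self.failures.get(username, 0) + 1


throttle = LoginThrottle()
for _ in range(3):
    throttle.record_attempt("alice", password_ok=False)
print(throttle.captcha_required("alice"))  # alice has 3 failures on record
```

The point is only that the password guessing loop gets interrupted by a step that is supposed to need a human.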
'Brute Force' means trying different passwords over and over and over until you get the right one.
Yeah, I'm aware of its meaning
If they use CAPTCHAs after 3 tries, then you can't brute force them unless you could break it.
I was originally meaning that breaking CAPTCHAs on its own is not a sufficient way to steal passwords. But yes, I agree that used in conjunction with other password-stealing techniques it would certainly be viable.
I missed this the first time around:
Let's try to come up with some good reasons.
I certainly cannot think of any.
A university paper? Though, from the way the request was worded, I doubt it.
There is a secret to solving all CAPTCHA's - just make your bot post something about your intentions to bomb all american politicians!
Start trying it out at FBI's site!
When 'pretend police' show up at your doorstep, pretend that you are going to shoot them with a realistic replica handgun, while wearing a turban.
there.. maybe one spammer less in the world
There is something to be said about trying to break encryption methods, regardless of the encryption's use. Trying to get people to break your code is a good way to fix it, as it could turn up problems you never realized.
I wouldn't say that a CAPTCHA is something you would learn from if they broke it; all you have to do is make it fancier and fancier and weirder until even humans won't be able to read it.
Every algorithm has weak points. The best way to find those weak points is to get people to look for them (e.g. try to break the algo). Just pretending your code is "unbreakable", or that people are "going to break it anyway so don't bother", is not very conducive to developing an encryption algorithm.
I personally am totally against CAPTCHAs as they totally break accessibility (and prevent bots from... I mean... um... nothing!). It would be funny if you provided an alt text for CAPTCHAs...
Anyway... I'm convinced that our powerful machines with smart algorithms soon will be able to guess any CAPTCHA better than humans.
Usually there is a good reason for using CAPTCHAs, like preventing bots from doing stuff they shouldn't, e.g. downloading everything from theunderdogs.
I'm convinced that our powerful machines with smart algorithms soon will be able to guess any CAPTCHA better than humans.
I am convinced that you are wrong. Even if OCR technology gets good enough to look past the noise in a CAPTCHA and read the text, other measures can be taken.
Kent Larsson: OCR tech was around in the '50s. It's not like it's a new technology.
Yes, it is not new, but even after more than fifty years it is still not very good.
As PWNtcha explains you don't need super OCR to break CAPTCHAs, you just apply some image filters to make a better image for the OCR.
In any case, as the cracks improve, the CAPTCHAs will improve.
Kimmo: Did you put the right settings into the programs? I'm sure there's lots to tweak.
I don't know... there's too much damn stuff to read about. Always problems and worry...
Yep, the life of a criminal is never easy. shifty eyes
Hey! That same face goes to you for not telling your reasons! Or perhaps
?
Okay, OKAY! I'm gonna take over the world. Happy?
Now see, if you would have said that in the first place then you wouldn't have had such a problem!
I'm not helped still.
Good, let's hope no one helps you either. You might be up to do something illegal.
No offense, but this is stupid..
Which is?
Your thread.
Derezo said it.
To be precise, breaking encryption (which, as far as I can tell, this thread is about) is never done legally, with a few exceptions. So unless you are a secret CIA agent, please don't ask such questions on this forum, thank you ;).
Yeah... if you are asking people to help you do something potentially illegal and they refuse, you really don't have any right to get angry. Now, if you would just tell everyone why exactly you want to do this, maybe they would treat your thread with a little more respect. However, just the fact that you are trying to hide your intentions makes this whole thing seem awfully suspicious.
To be precise, breaking encryption (which, as far as I can tell, this thread is about) is never done legally
IANAL but as far as I know breaking encryption isn't illegal. If I remember correctly, though, the DMCA prevents reverse engineering programs, so you can't do that.
And a CAPTCHA is not encryption, it's a Turing Test.
Withholding information from someone is stupid, they'll get it one way or another, and it does nothing to cure the root of the problem.
However, he's withholding information from us, so I'm not entirely against withholding information from him.
I got one too. It means our posts about cats were removed because the initial comment about cats (the context) was removed. The rest of the e-mail sounds bad but I assume it doesn't apply in this case.
[EDIT: oops, sorry >_<]
The thread shouldn't be open anyways.
If I remember correctly, though, the DMCA prevents reverse engineering programs, so you can't do that.
KimmoA falls under Swedish law, and if it is like Finnish law (which happens to be a derivative of old Swedish law), breaking encryption falls under the same category as opening other people's letters.
AFAIK breaking CAPTCHA isn't illegal per se, but it is, however, most of the time malevolent.
Yes, if you break a CAPTCHA and it is against the site regulations to do so you commit a crime.
CAPTCHA is there for a reason, to prevent people like you from making bots to do evil stuff. That is the only use I can think of that you would need this for, that had to be top secret.
I'm sure you're not going to find the answer here.
Nobody here has the answer, if they did, they would absolutely have to be some sort of evil spam guy trying to h4xx0r his way past CAPTCHAs, after all, that IS the only reason you could ever want to do such a thing, right?
No, you'd do it for two reasons. For fun, or for education. If you know how to break a CAPTCHA, you can probably devise better CAPTCHAs.
Yes, if you break a CAPTCHA and it is against the site regulations to do so you commit a crime.
So if I put in my site regulations that all viewers must wear a funny hat, any viewer who wears a normal hat, or even, gasp, doesn't wear any hat at all, is committing a crime?
I think the image processing challenge here is intriguing (as an AI problem), and if anything, will push for better solutions to the problem CAPTCHAs are trying to address in the future. If I had reason or interest, I might "want to do such a thing," but there are plenty of other things I'd rather spend my time doing.
On the other hand, why not try to figure it out yourself? Clearly none of us here have done anything to this level, and those of us who have are only interested in helping you under certain conditions (for example, answering why), so there you have it.
Marcello
So if I put in my site regulations that all viewers must wear a funny hat, any viewer who wears a normal hat, or even, gasp, doesn't wear any hat at all, is committing a crime?
If you put it in the terms and conditions that users must agree to, then probably yes.
Depends on country/state/province/etc, and the sanity of said entity.
Believe it or not, just because you agree to terms and conditions doesn't always mean you have to follow them. In the USA, if a term is deemed unconstitutional, you could get out of it. (Or at least it used to be...)
Marcello
Speaking of accepting rules on sites, I find it amusing when they use a stupid textarea (for some random reason) and have forgotten/are too lame to make it read-only. I usually delete all the text and "accept" the blank agreement, and then I don't have to follow their rules.
I don't know how to say it.. but..
You.. Are.. Evil..
And stupid. How is anyone supposed to know that you deleted the text or not? You could be lying!
That's their problem for allowing me to modify the agreement as I wish.
Not that I give a shit about those sites' "TOS" anyway, as they are usually lame.
When I sign contracts I usually put a blank paper over the contract, so that it covers everything but the signature. By doing so I have agreed to nothing! Pretty clever huh?
Kent Larsson: It would be valid if you would somehow be able to physically remove the letters printed on the paper.
It would be valid if you would somehow be able to physically remove the letters printed on the paper.
The point is that no one except you knows that you erased the contract before clicking accept. And I doubt that it would hold in court even if you could prove it.
Sometimes the license text is in a text file, which you can edit using a text editor. Do you erase the file contents in such cases and then "read it"?
Well... it was basically a joke, but the fact that they DO let you edit it is like saying that you can change stuff in a contract as you wish.
True, but I guess it would only work in theory.
It makes it easier to read. Similar to what Kent said, it wouldn't hold up in court. Even if you did prove it.
William - I know it may be used for educational purposes, but I've seen KimmoA in action on IRC and I'd dare say he isn't using it educationally, especially because he won't tell us why he wants to break it.
[honesty]Okay. I wanna smarten up my Web spam attack scripts another level.[/honesty]
(Also, it's of course educational, like everything I ever do.)
I doubt that anyone here would like to help you in that case. Even if it had been for some morally just purpose, I would think that you are out of luck in creating something truly automatic (all sites, all CAPTCHA types). It's still too much of an unknown territory, interesting though.
Well... in this case, it's a specific one that actually is very "static", and should be easy to crack. However, I'm not gonna waste time trying to write my own OCR software. That's on the very bottom of my priority list (although I find it highly interesting).
Just find a way around the CAPTCHA
Pretty much impossible by definition.
There are a few things that can go wrong when building a CAPTCHA system, and bad programmers aren't a rarity.
William Heatley: It doesn't exactly have a way of grabbing the server-side puretext check...
Well, if the coder was stupid then he put the random code generator in the wrong spot. If it isn't in the right place you might be able to get past it, either by getting around it completely or by hitting it when it isn't initialized or is stuck at 0.
For example, if the code generation is in the image gen script then just clear out your cookies, and request the page without images.
Not to mention various other attacks like register_global attack.
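For anyone running a site and wondering what "the right spot" looks like, here is a minimal sketch of the defensive side. It assumes a dict-like, server-side `session` object; the function names `new_challenge` and `verify` and the key `captcha_code` are invented for this example and are not from any particular CAPTCHA script. The code is generated when the form is rendered (not inside the image script), it is single-use, and a missing or empty code never passes the check.

```python
import hmac
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(session, length=6):
    """Call this when the form page is rendered, NOT in the image script.

    The image script should only read session["captcha_code"] and draw it.
    """
    code = "".join(secrets.choice(ALPHABET) for _ in range(length))
    session["captcha_code"] = code
    return code


def verify(session, submitted):
    # pop() makes the code single-use, so an old answer cannot be replayed.
    expected = session.pop("captcha_code", None)
    # An uninitialized or empty code must fail, never match an empty answer.
    if not expected or not submitted:
        return False
    answer = submitted.strip().upper()
    # Constant-time comparison of the two byte strings.
    return hmac.compare_digest(expected.encode(), answer.encode())


session = {}
print(verify(session, ""))          # no challenge issued yet, so this fails
code = new_challenge(session)
print(verify(session, code))        # correct answer, first use
print(verify(session, code))        # same answer again, already consumed
```

With this layout, requesting the page without images changes nothing, because the image script never creates the code in the first place.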
Good, let's hope no one helps you either. You might be up to do something illegal.
As far as I know spamming the crap out of people on the web isn't legal. What site are you trying to screw over?
Sometimes the license text is in a text file, which you can edit using a text editor. Do you erase the file contents in such cases and then "read it"?
That's different: you accept those licenses by their terms, which might say "upon using" or "upon download" of this software -- at that point, you've accepted the license. If you press "I accept the license" that's in that textbox (and there isn't one there...), well, that's different. Anyway...
If it's a fairly simple kaptcha, is there a reason you can't compare every block of the image for maybe a 50-75% match to a bunch of cut-out blocks from kaptchas that the site has generated previously?
If you press I accept the license that's in that textbox (and there isn't one there...) well that's different.
I believe it'd be a void contract at that point. A change in the contract has to be noted by both parties, and by hand-editing the text file (changing the contract), the licensing party does not approve the changes, thereby making the contract null and void.
What site are you trying to screw over?
Let's just say that it would be... unwise for me to post it here, in a public forum. In fact, this entire thread is pretty "risky", but the people running that crap site have no clue about anything (except about implementing pre-written script solutions and producing invalid HTML output), so they wouldn't find this place anyway.
If it's a fairly simple kaptcha, is there a reason you can't compare every block of the image for maybe a 50-75% match to a bunch of cut-out blocks from kaptchas that the site has generated previously?
Just loading it 10 times reveals that it's very static and should be damn easy for OCR software to detect, given certain variables.
I have only tested with gocr/jocr (and another one), and it sucked. I even asked the author, and he told me it was too hard for it to read. The other software was too complicated to get to work for me to even bother. Plus I actually have better things to do.
Maybe working around this somehow would be a better idea. It shouldn't be possible, though.
William Heatley said some potentially interesting stuff, but tricking it with speed doesn't sound too realistic...
I believe it'd be a void contract at that point. A change in the contract has to be noted by both parties, and by hand-editing the text file (changing the contract), the licensing party does not approve the changes, thereby making the contract null and void.
Exactly. As in you never accepted their contract.
Try converting the image to grayscale, and different color settings (i.e. filters) increasing/decreasing hues, reds, etc. If the text is always rotated at the same angle, try to use a sin/cos to unrotate it... etc.
If it's a pre-made solution that renders the CAPTCHA, you could find that solution and render a key of every possible output it might have, no? Or at least most of them, and have it do a compare on the whole image (still, I'd make it only check for 80% equality). Anyway.
What kind of vendetta are you trying to settle? You're mad because their html is funky?
Their site isn't W3 compliant! Yarg, what cruel people would do that. They must be stopped by any means possible.
Hey! Who said anything about...
Wouldn't happen to be gamingnews.com or something like that, would it?
BAF: Nope. It's a pretty general solution and something I would like to get working, really.
Will you make a profit by doing this? If so, what kind of cut do we get?
Can't believe this thread is still open...
Maybe this will help...
Will you make a profit by doing this?
No, for the 725th time.
Why do you want to do this then? And why do you look dead in your avatar?
Why do you want to do this then?
Maybe you'll one day realize that there are more things in life than money.
And why do you look dead in your avatar?
Why don't you dare to have an avatar?
Maybe you'll one day realize that there are more things in life than money.
I'm just trying to look at it from your perspective. Someone who wants to spam people gets a certain stereotype when I imagine them. It might be a scurrilous portrait but it's how it works as I think it's vicious to contribute more to the spam infestation on the Internet.
Why don't you dare to have an avatar?
I dare but I choose not to.
Kent Larsson: I am pushing Web standards and good browsers. I type correct English (AFAIK). Do I sound like a person who crappifies the Web?
It's the ones who hack and mess who crappify the Web (hint hint). Give them a single hate mail instead, telling them to improve their site :)
EDIT: Not exactly hate mail, more like a complaint.
Do I sound like a person who crappifies the Web?
You sound like two persons combined into one, the qualities you mentioned are good but spamming is very bad. I really hate all forms of spam.
crappifies != "correct English"
Roar.
Legal or not, I think that making a program able to read CAPTCHAs would be both educational and entertaining.
Creating a program which reads CAPTCHAs isn't illegal. Using it to spam the web is immoral and most likely illegal (in Sweden).