I'd say that "interesting" is extremely advanced. Keep in mind that the search engines we have today have been in the works for decades. They didn't get that advanced overnight, or with a one-person crew.
Some simple things your crawler would need:
- Ability to make an HTTP request to a server and receive the response.
- Ability to parse the [X]HTML response to find the links within it.
- A database to keep track of where you've been and when, so you don't go in circles.
- Within the database, index what you can about the pages you've visited. This is where it starts to get complicated, so I suggest keeping it simple for the purposes of this project. Once you have it working small you can expand on it if you're still interested, but nothing fancy is necessary for a proof of concept.
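To make those steps concrete, here's a minimal sketch in Python using only the standard library (`urllib` for requests, `html.parser` for link extraction, `sqlite3` as the "where have I been" database). The names `crawl`, `fetch`, and `LinkExtractor` are my own, not from any library:

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def fetch(url):
    """Step 1: make an HTTP request and return the body as text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, "replace")

def crawl(seed, limit=10):
    """Breadth-first crawl; a sqlite table records where we've been and when."""
    db = sqlite3.connect(":memory:")  # use a file path to persist between runs
    db.execute("CREATE TABLE visited (url TEXT PRIMARY KEY, fetched_at TEXT)")
    queue = [seed]
    while queue and limit > 0:
        url = queue.pop(0)
        if db.execute("SELECT 1 FROM visited WHERE url = ?", (url,)).fetchone():
            continue  # step 3: already seen, don't go in circles
        db.execute("INSERT INTO visited VALUES (?, datetime('now'))", (url,))
        limit -= 1
        try:
            html = fetch(url)
        except OSError:
            continue  # dead link, wrong content type, timeout, etc.
        parser = LinkExtractor(url)
        parser.feed(html)  # step 2: find the links within the page
        queue.extend(parser.links)
    return [row[0] for row in db.execute("SELECT url FROM visited")]
```

The `limit` parameter is just a safety valve so a test run can't wander off across the whole web. Indexing page content (step 4) would mean adding more columns or tables to that database.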
Those are some pretty basic goals to get started with. Don't try to roll your own because each of these can easily take weeks to do 100% fully. If you can find libraries to do it for you then you'll save a lot of time. It's not that you couldn't write it all yourself. It's that there's no real value in doing it again since many others have already spent the time and money to do a better job than you have the time or money to do, and they've been kind/generous enough to share the fruits of their labors so you don't have to reinvent the square wheel.
There are various standard files that you should research and obey, such as /robots.txt, which tells crawlers which resources they may index and which they should leave alone (or whether crawling the site is unwelcome entirely).
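Python's standard library already handles this for you via `urllib.robotparser`. The robots.txt content below is a made-up example; in real use you'd call `rp.set_url(...)` and `rp.read()` to fetch the site's live file instead of parsing a literal:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

rp.can_fetch("MyBot", "http://example.com/index.html")  # True: not disallowed
rp.can_fetch("MyBot", "http://example.com/private/x")   # False: under /private/
rp.crawl_delay("MyBot")                                 # 5: seconds between requests
```

Check `can_fetch()` before every request, and honor any `Crawl-delay` the site asks for.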
You'll also probably want to narrow the scope of how far your crawler goes while you work out the bugs and figure out the netiquette rules. You won't want somebody the size of Google summoning you to court (granted, I don't think Google would be hurt by your bot, but some smaller fish might be, and they still might be big enough to eat you).
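Two cheap ways to narrow the scope: refuse links outside the one host you're testing against, and rate-limit your requests. This is a sketch with made-up values; `ALLOWED_HOST` and `DELAY_SECONDS` are whatever suits your experiment:

```python
import time
from urllib.parse import urlparse

ALLOWED_HOST = "example.com"  # hypothetical: the one site you're testing against
DELAY_SECONDS = 1.0           # be polite: at most one request per second

def in_scope(url):
    """Only follow links on the allowed host (subdomains won't match)."""
    return urlparse(url).hostname == ALLOWED_HOST

def polite_pause(last_request_time):
    """Sleep just long enough that requests are DELAY_SECONDS apart.
    Returns the new timestamp to pass in on the next call."""
    wait = DELAY_SECONDS - (time.monotonic() - last_request_time)
    if wait > 0:
        time.sleep(wait)
    return time.monotonic()
```

In the crawl loop you'd filter the extracted links through `in_scope()` before queueing them, and call `polite_pause()` before each fetch.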
A language such as Perl or Python would make this much easier. Not only do they have excellent libraries for these kinds of things, but they also have easy access to Unicode strings and databases and the like. Whereas if you attempt to do this in C or C++ you'll probably have to write 10x or 100x more code for the same job. And you won't need the things C or C++ are good at right away, if at all, so you might as well optimize for progress instead of performance.