Most efficient way of extracting hyperlinks from a file?

Aug 1, 2010 at 7:52am
Hi Guys,

I was wondering if some of you could please give me suggestions on how to extract the hyperlinks from a file most efficiently. I was thinking of using regular expressions. Are there any better ways of doing it?

Thanks
Aug 1, 2010 at 12:05pm
Regex seems like the best option, but it depends on the format you are processing.
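A minimal sketch of the regex approach, written in Python only for brevity (the thread doesn't fix a language). `HREF_RE` and `extract_links_regex` are hypothetical names. The pattern only looks for quoted `href` values on `<a>` tags. It skips unquoted values and knows nothing about comments or scripts.

```python
import re

# Hypothetical pattern: an <a ...> start tag whose href value is quoted with
# " or '. The optional (?:[^>]*?\s)? group lets other attributes come first
# while requiring whitespace right before "href" (so data-href is skipped).
HREF_RE = re.compile(
    r"""<a\s(?:[^>]*?\s)?href\s*=\s*(["'])(.*?)\1""",
    re.IGNORECASE | re.DOTALL,
)


def extract_links_regex(text):
    """Return every quoted href value on an <a> tag, in document order."""
    return [m.group(2) for m in HREF_RE.finditer(text)]


sample = '<p><a href="https://example.com/">Example</a> <A class="x" HREF=\'/b\'>B</A></p>'
print(extract_links_regex(sample))  # ['https://example.com/', '/b']
```

Compiling the pattern once and scanning the whole file with `finditer` keeps this to a single pass over the text.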
Aug 2, 2010 at 1:11am
It's going to be HTML. Do you still think regex would be the way to go?
Aug 2, 2010 at 11:26am
Regex will work fine. With HTML, you may also want to look for a dedicated HTML-parsing library.
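As an illustration of the library route, here is a sketch using Python's standard `html.parser.HTMLParser`. The class `LinkCollector` and the function `extract_links_parser` are hypothetical names. The parser lowercases tag and attribute names and unescapes entity references in attribute values. It reports comments through `handle_comment` rather than as tags, so those details don't have to be encoded in a pattern.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs with entity references already unescaped.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:  # a bare "href" gives None
                self.links.append(value)


def extract_links_parser(text):
    """Feed the whole document to LinkCollector and return the hrefs found."""
    collector = LinkCollector()
    collector.feed(text)
    collector.close()
    return collector.links


print(extract_links_parser('<a href="https://example.com/">Example</a>'))
# prints ['https://example.com/']
```

The trade-off is a little more code in exchange for not handling quoting, letter case, and entity rules yourself.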
Aug 3, 2010 at 5:27am
All right. Thanks Bazzy!
Aug 3, 2010 at 5:37am
One thing to note: if the HTML contains scripts or comments, you may get messed-up results with a simple regex.
E.g.:
...
<script>
/*
    for some reason there's <a> thing in here
*/
</script>
...
<a href="..." > the <a> in the script will end here: </a>
...
<!--
  this won't be rendered but you'll get it anyway:
  <a>blah blah</a>
-->
...
Last edited on Aug 3, 2010 at 5:37am
Aug 10, 2010 at 7:21am
Ah ok. That's a good point. Thanks a lot.