Ripping a website?

Discussion in 'BBS Hangout' started by astros99, Nov 21, 2011.

  1. astros99

    astros99 Member

    Joined:
    Nov 9, 2007
    Messages:
    841
    Likes Received:
    161
    Does anyone know how to rip a website with a ton of pages and content? I tried a program called HTTrack Website Copier, but it didn't seem to work. I could use File > Save As, but then a lot of the content and images are missing. I know there's a way to copy all the files onto my host, but I'm not sure where to start. Does anyone have suggestions or programs to recommend?
     
  2. heypartner

    heypartner Member

    Joined:
    Oct 27, 1999
    Messages:
    63,510
    Likes Received:
    59,001
    I've done this as a demonstration for clients who are purchasing a Flash widget and want to see how it works. I will "rip" a page from their website, insert the demo prototype, then show it to them as part of a proposal. Of course, I get the OK to do this first.

    You are probably missing some images that are named in CSS files, as some "rip" tools don't read CSS files. You'll need to open the CSS files and "rip" by hand any images that use a relative address.
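
    For the CSS step above, here is a rough Python sketch of the by-hand work: pull every url(...) reference out of a stylesheet and resolve relative addresses against the stylesheet's own URL. The function name and the example.com addresses are made up for illustration, and the regular expression is a simplification that will not handle every CSS edge case.

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes around the address.
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_asset_urls(css_text, css_url):
    """Return absolute URLs for the url(...) references in a stylesheet.

    css_url is the address the stylesheet was downloaded from; relative
    references in CSS are resolved against it, not against the HTML page.
    """
    found = []
    for match in CSS_URL_RE.finditer(css_text):
        address = match.group(2).strip()
        # Skip empty references and inline data: URIs (nothing to download).
        if not address or address.startswith("data:"):
            continue
        absolute = urljoin(css_url, address)
        if absolute not in found:
            found.append(absolute)
    return found


if __name__ == "__main__":
    sample = 'body { background: url("../img/bg.png"); } .logo { background: url(logo.gif) }'
    for url in css_asset_urls(sample, "http://example.com/css/site.css"):
        print(url)
```

    Each address it returns is a file the rip tool may have skipped, so those are the ones to download separately.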

    Also note that there are many situations where this will never work unless you programmatically read the entire web page that you are trying to "rip" (that is, traverse the DOM). This requires expert knowledge of JavaScript, so if you don't know what I'm talking about, you should just consider the page too difficult to rip.

    You see, many sophisticated websites will load images and create html dynamically via javascript, so you will not be able to identify the images or the html that is scripted, unless you walk the DOM yourself.
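
    To illustrate the point about scripted content, here is a small Python sketch that scans only the static HTML, which is roughly what a simple copier sees. It uses the standard library's html.parser; the class and function names are hypothetical. Any image that a script adds at load time will not show up in its results, because the script is never run.

```python
from html.parser import HTMLParser


class StaticAssetFinder(HTMLParser):
    """Collects <img> sources and counts <script> tags in static markup."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.script_count = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])
        elif tag == "script":
            # Script bodies are not executed, so anything they would
            # create in the browser stays invisible to this scan.
            self.script_count += 1


def scan_static_html(html_text):
    """Return (image sources, number of script tags) found in html_text."""
    finder = StaticAssetFinder()
    finder.feed(html_text)
    finder.close()
    return finder.images, finder.script_count


if __name__ == "__main__":
    page = (
        '<html><body><img src="a.png">'
        '<script>var i = new Image(); i.src = "b.png"; document.body.appendChild(i);</script>'
        "</body></html>"
    )
    images, scripts = scan_static_html(page)
    print(images, scripts)
```

    If the script count is high and the image list looks short, that is a hint the page builds itself in the browser and a plain copy will come out incomplete.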

    Also understand that many content providers go to great lengths to prevent what you are doing ... but there are ways. Further, you can get in serious trouble doing this, so I would not advise it unless the website says you can and you give them credit.
     
  3. AroundTheWorld

    Joined:
    Feb 3, 2000
    Messages:
    83,288
    Likes Received:
    62,280
    There is a company called kapowtech that has the technology for that. You should search for the term "web scraping".
     
  4. michecon

    michecon Member

    Joined:
    May 19, 2002
    Messages:
    4,983
    Likes Received:
    9
    google for "wget"
     
  5. ClutchCityReturns

    Joined:
    Apr 26, 2005
    Messages:
    13,408
    Likes Received:
    2,640
    Are you doing something you shouldn't be?
     
  6. Miguel

    Miguel Member

    Joined:
    Feb 9, 2003
    Messages:
    5,625
    Likes Received:
    140
    ClutchFins.net coming soon
     
  7. ipaman

    ipaman Member

    Joined:
    Nov 23, 2002
    Messages:
    13,200
    Likes Received:
    8,035
    wget you noob
     
  8. The Boz

    The Boz Member

    Joined:
    Feb 18, 2010
    Messages:
    612
    Likes Received:
    56
  9. Invisible Fan

    Invisible Fan Member

    Joined:
    Dec 5, 2001
    Messages:
    45,954
    Likes Received:
    28,046
    But the password runs out in 5 days.
     
  10. TheChosenOne

    TheChosenOne Contributing Member

    Joined:
    Jul 15, 2010
    Messages:
    2,409
    Likes Received:
    93
    p*rn is serious business.
     