Anyone know how to rip a website with a ton of pages/content? I tried a program called HTTrack Website Copier, but it didn't seem to work. I could do File > Save As, but then a lot of the content/images are missing. I know there's a way to copy all the files onto my host, but I'm not sure where to start. Anyone have any suggestions or recommend any programs?
I've done this as a demonstration for clients who are purchasing a Flash widget and want to see how it works. I will "rip" a page from their website, insert the demo prototype, then show it to them as part of a proposal. Of course, I get the OK to do this first.

You are probably missing images that are referenced in CSS files, as some "rip" tools don't read CSS files. You'll need to open the CSS files and "rip" by hand any images that use a relative address (see the sketch below).

Also note, there are many situations where this will never work unless you programmatically read (traverse the DOM of) the entire web page you are trying to "rip." That requires expert knowledge of JavaScript, so if you don't know what I'm talking about, you should just consider the page too difficult to rip. Many sophisticated websites load images and create HTML dynamically via JavaScript, so you will not be able to identify the scripted images or HTML unless you walk the DOM yourself.

Also understand that many content providers go to great lengths to prevent what you are doing ... but there are ways. Further, you can get into serious trouble doing this, so I would not advise it unless the website says you can and you give them credit.
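To illustrate the CSS point, here is a minimal Python sketch that scans a ripped stylesheet for url(...) references and downloads the relatively-addressed images. The file paths and the base URL are hypothetical placeholders, not anything from a specific tool; adjust them to wherever your rip landed and where the CSS originally lived.

```python
import os
import re
from urllib.parse import urljoin
from urllib.request import urlretrieve

# Placeholder values -- point these at your own ripped CSS file and its original location.
css_path = "ripped/css/style.css"
css_base_url = "http://example.com/css/"
out_dir = "ripped/css"

with open(css_path, encoding="utf-8", errors="ignore") as f:
    css = f.read()

# Find every url(...) reference in the stylesheet.
for ref in re.findall(r'url\(\s*["\']?([^"\')]+)["\']?\s*\)', css):
    # Skip inline data URIs and already-absolute URLs; only relative paths need fixing up.
    if ref.startswith(("data:", "http://", "https://")):
        continue
    absolute = urljoin(css_base_url, ref)  # resolve the relative address against the CSS's origin
    target = os.path.join(out_dir, os.path.basename(ref.split("?")[0]))
    print("fetching", absolute, "->", target)
    urlretrieve(absolute, target)
```

It won't help with images injected by JavaScript, which is exactly the DOM-walking problem described above.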
There is a company called kapowtech; they have technology for exactly this. You should also search for the term "web scraping".
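If you want to try the web-scraping route yourself rather than a packaged tool, the basic idea is a small crawler: fetch a page, save it, collect its links, and follow only the ones on the same host. Here is a rough standard-library Python sketch of that loop; the start URL is a placeholder and it deliberately does none of the niceties (robots.txt, rate limiting, rewriting links) a real rip would need.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

start_url = "http://example.com/"  # placeholder -- the site you have permission to copy


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags as the page is parsed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


seen, queue = set(), [start_url]
while queue:
    url = queue.pop()
    if url in seen:
        continue
    seen.add(url)
    html = urlopen(url).read().decode("utf-8", errors="ignore")
    # ... save `html` to disk here ...
    parser = LinkCollector()
    parser.feed(html)
    for href in parser.links:
        absolute = urljoin(url, href)
        # Only follow links that stay on the same host as the start URL.
        if urlparse(absolute).netloc == urlparse(start_url).netloc:
            queue.append(absolute)
    print(f"fetched {url}, queue size {len(queue)}")
```

Again, this only sees HTML that comes down with the page; anything built by JavaScript still needs a real browser or DOM traversal, and the legal caveats above apply just as much to scraping.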