Anyone know how to rip a website with a ton of pages/content? I tried a program called HTTrack Website Copier, but it didn't seem to work. I could do File > Save As, but then a lot of the content/images are missing. I know there's a way to copy all the files onto my host, but I'm not sure where to start. Anyone have any suggestions or recommend any programs?
I've done this as a demonstration for clients who are purchasing a Flash widget and want to see how it works. I will "rip" a page from their website, insert the demo prototype, then show it to them as part of a proposal. Of course, I get the OK to do this first.

You are probably missing images that are referenced in CSS files, as some "rip" tools don't read CSS files. You'll need to open the CSS files and "rip" by hand any images that use a relative address (see the sketch below).

Also note, there are many situations where this will never work unless you programmatically read (traverse the DOM of) the entire web page you are trying to "rip." That requires expert knowledge of JavaScript, so if you don't know what I'm talking about, you should just consider the page too difficult to rip. Many sophisticated websites load images and create HTML dynamically via JavaScript, so you will not be able to identify the scripted images or HTML unless you walk the DOM yourself.

Also understand that many content providers go to great lengths to prevent what you are doing ... but there are ways. Further, you can get into serious trouble doing this, so I would not advise it unless the website says you can and you give them credit.
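To illustrate the CSS point, here is a minimal Python sketch that scans a ripped stylesheet for url(...) references and downloads the relatively-addressed images. The file paths and the base URL are hypothetical placeholders, not anything from a specific tool; adjust them to wherever your rip landed and where the CSS originally lived.

```python
import os
import re
from urllib.parse import urljoin
from urllib.request import urlretrieve

# Placeholder values -- point these at your own ripped CSS file and its original location.
css_path = "ripped/css/style.css"
css_base_url = "http://example.com/css/"
out_dir = "ripped/css"

with open(css_path, encoding="utf-8", errors="ignore") as f:
    css = f.read()

# Find every url(...) reference in the stylesheet.
for ref in re.findall(r'url\(\s*["\']?([^"\')]+)["\']?\s*\)', css):
    # Skip inline data URIs and already-absolute URLs; only relative paths need fixing up.
    if ref.startswith(("data:", "http://", "https://")):
        continue
    absolute = urljoin(css_base_url, ref)  # resolve the relative address against the CSS's origin
    target = os.path.join(out_dir, os.path.basename(ref.split("?")[0]))
    print("fetching", absolute, "->", target)
    urlretrieve(absolute, target)
```

It won't help with images injected by JavaScript, which is exactly the DOM-walking problem described above.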
There is a company called kapowtech; they have technology for exactly this. You should also search for the term "web scraping".
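If you want to try the web-scraping route yourself rather than a packaged tool, the basic idea is a small crawler: fetch a page, save it, collect its links, and follow only the ones on the same host. Here is a rough standard-library Python sketch of that loop; the start URL is a placeholder and it deliberately does none of the niceties (robots.txt, rate limiting, rewriting links) a real rip would need.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

start_url = "http://example.com/"  # placeholder -- the site you have permission to copy


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags as the page is parsed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


seen, queue = set(), [start_url]
while queue:
    url = queue.pop()
    if url in seen:
        continue
    seen.add(url)
    html = urlopen(url).read().decode("utf-8", errors="ignore")
    # ... save `html` to disk here ...
    parser = LinkCollector()
    parser.feed(html)
    for href in parser.links:
        absolute = urljoin(url, href)
        # Only follow links that stay on the same host as the start URL.
        if urlparse(absolute).netloc == urlparse(start_url).netloc:
            queue.append(absolute)
    print(f"fetched {url}, queue size {len(queue)}")
```

Again, this only sees HTML that comes down with the page; anything built by JavaScript still needs a real browser or DOM traversal, and the legal caveats above apply just as much to scraping.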