Hi all,
I'm looking for a utility program to help me sort out a website that
has become a bit of a mess (customers, eh). The site has a range of
html files and images and a formmail cgi script that constitute the
current live site. However there are literally hundreds of other files
spread amongst various levels of folders, dating from an earlier
version of the site and never cleaned up. I've been asked to help sort
out this cat's cradle. It occurred to me that if I could somehow map out
the live site (including the images and other assets) then I could
begin to remove the unused assets. But an admittedly cursory search has
not revealed any applications that could help me with this. I really
want to get 2 lists, of used and unused files, I suppose.
Do any of you know whether such an application exists?
Thanks,
Ian.
--
David Kennedy - 30 May 2008 11:37 GMT
> Hi all,
>
[quoted text clipped - 11 lines]
>
> Do any of you know whether such an application exists?
Obviously something like SiteSucker would grab all the files for you but
wouldn't - AFAIK - catalogue them in the way you want. The only method I
can think of would be to then import the lot into GoLive, which would
allow you to map things out. Dunno if Dreamweaver would also do
that as I haven't used it for years now.
kn - 30 May 2008 11:43 GMT
> But an admittedly cursory search has
> not revealed any applications that could help me with this. I really
> want to get 2 lists, of used and unused files, I suppose.
Hi,
I know that the Paros web proxy has a spider/scan feature that can crawl a
website and produce a list of the files there (those that are reachable
through links on the website).
Also, you could try wget to download the whole website locally (like the
spider, it fetches the files that are reachable through links from the
starting page).
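
If you already have a local copy of the whole site folder, the same "follow
the links" idea can be sketched in a few lines of Python. This is only a
rough sketch under some assumptions: the entry page is called index.html,
only href and src attributes are followed, and references hidden in CSS,
JavaScript or the formmail CGI script are not seen at all, so treat the
"unused" list as candidates to check rather than files that are safe to
delete. The function names are made up for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlsplit, unquote


class RefCollector(HTMLParser):
    """Collect the values of href and src attributes from one HTML page."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.refs.append(value)


def reachable_files(root, start="index.html"):
    """Return the set of files under root reachable by links from start."""
    root = Path(root).resolve()
    seen = set()
    queue = [root / start]
    while queue:
        path = queue.pop()
        if path in seen or not path.is_file():
            continue
        seen.add(path)
        # Only HTML pages are parsed for further links.
        if path.suffix.lower() not in (".html", ".htm"):
            continue
        parser = RefCollector()
        parser.feed(path.read_text(errors="replace"))
        for ref in parser.refs:
            parts = urlsplit(ref)
            if parts.scheme or parts.netloc or not parts.path:
                continue  # external link, mailto:, or a bare #fragment
            rel = unquote(parts.path)
            # Site-absolute links start at root, others at the page's folder.
            base = root if rel.startswith("/") else path.parent
            target = (base / rel.lstrip("/")).resolve()
            if root in target.parents:
                queue.append(target)
    return seen


def used_and_unused(root, start="index.html"):
    """Return two sorted lists: files reached by the crawl, and the rest."""
    root = Path(root).resolve()
    used = reachable_files(root, start)
    everything = {p for p in root.rglob("*") if p.is_file()}
    return sorted(used), sorted(everything - used)
```

Calling used_and_unused("path/to/site") on the local copy would give the two
lists Ian asked for, within the limits mentioned above.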
Hope it helps
Simon Slavin - 31 May 2008 22:43 GMT
On 29/05/2008, Ian Piper wrote in message
<6a7rvrF364i5sU1@mid.individual.net>:
> It occurred to me that if I could somehow map out
> the live site (including the images and other assets) then I could
> begin to remove the unused assets. But an admittedly cursory search has
> not revealed any applications that could help me with this. I really
> want to get 2 lists, of used and unused files, I suppose.
Take a look at the logs for the web server. Turn on logging, and use a
browser to visit every accessible page on the site (doesn't matter if
other people are using it too). Then turn the log into a simple list of
all files served, for example by pulling out the requested paths, sorting
them alphabetically and removing duplicates.
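
As a rough illustration of that last step, here is a small Python sketch. It
assumes the server writes its access log in the Common or Combined Log
Format (each line contains something like "GET /page.html HTTP/1.1" 200);
if your server logs in another format the pattern will need adjusting. The
name served_paths is just made up for the example.

```python
import re
from urllib.parse import urlsplit, unquote

# Matches the quoted request and the status code that follows it,
# e.g. "GET /images/logo.png HTTP/1.1" 200
REQUEST = re.compile(r'"(?:GET|HEAD|POST) (\S+)[^"]*" (\d{3})')


def served_paths(log_lines):
    """Return a sorted, de-duplicated list of paths served successfully."""
    paths = set()
    for line in log_lines:
        match = REQUEST.search(line)
        if not match:
            continue  # not a request line we recognise
        target, status = match.groups()
        # 200 = served, 304 = not modified (the file exists, browser cached it)
        if status in ("200", "304"):
            # Drop any query string and decode %xx escapes.
            paths.add(unquote(urlsplit(target).path))
    return sorted(paths)
```

You would feed it the open log file, e.g. served_paths(open("access.log")),
where the file name is whatever your server actually uses. Anything on disk
that never shows up in the result is a candidate for the "unused" list.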
Simon.

Signature
http://www.hearsay.demon.co.uk