Graphical User Interface version (Windows)

WinHTTrack has a large number of settings, but its GUI (Graphical User Interface) simplifies configuring the archiving process. Configuration is done via a step-by-step guide, in which the archiving process is first given a name and the option 'transfer website' is chosen (on the next page). Next, the URL to be archived is entered. The remaining settings are chosen with the aid of the button 'define settings', where the following settings are recommended:

The 'filter' tab: The most efficient way of limiting the archiving to the desired web pages is to specify web addresses that the programme is allowed or not allowed to archive. This is done by adding URLs prefixed with either a plus or a minus (e.g. "+http://www.dr.dk/nyheder/* -http://www.dr.dk/*"; note the use of wildcards). Limitations are necessary when only part of a complex website is to be archived; without limitations, archiving a complex website easily becomes impossibly extensive and includes most of the website. The choice of limitation of course depends on the purpose of the given archiving process.

The 'limitations' tab: A further way to limit archiving is to set the number of downward levels from the chosen URL to (typically) max. 4 (i.e. follow all links on the first page and then 3 levels of underlying pages, incl. any links found on these underlying pages, etc.). A limit can also be set on the extent to which archiving is to include external websites (outside the chosen domain).

The 'Flow-control' tab: Timeout is set to 5 seconds and the number of attempts to 3. This is done because WinHTTrack otherwise has a tendency to 'freeze' at elements which, for instance, no longer exist or cannot be archived for some other reason.

Command line version (MS-DOS)

WinHTTrack includes a command line version for MS-DOS (called HTTrack).
To use the command line version, open an MS-DOS prompt and type 'cd C:\Program Files\WinHTTrack'. Then type 'httrack', optionally followed by a number of parameters that refine the archiving. To archive using the MS-DOS version of HTTrack, proceed as follows:

HTTrack has a large number of settings, which are accessed by adding parameters to the command 'httrack'. The possible parameters are listed by the '--help' parameter (as in 'httrack --help') or at http://www.httrack.com/html/fcguide.html. When the programme is run without parameters, a number of dialogues appear, in which the archiving process can be named and defined. The following parameters are recommended as a minimum for archiving:

To indicate where the archived material is to be stored, use the parameter '-O' or '--path', followed by the desired path (such as 'httrack www.dr.dk -O C:\webpages\dr.dk' + any further parameters).

When archiving, we usually want to copy the website to a local computer, i.e. to mirror it there. To achieve this, the parameter '-w' or '--mirror' is used. Alternatively, it is possible to use the parameter '-W' or '--mirror-wizard', in which case HTTrack presents dialogues during archiving whenever a link leads to a new domain.

In order to delimit archiving, it is a good idea to archive (typically) max. 4 levels down from the chosen URL (i.e. to follow all links on the first page, and thereafter 3 more levels of underlying pages, incl. any links on these underlying pages, etc.). This is done with the parameter '-r4' or '--depth=4'. Links that lead outside the archived domain can, if necessary, be followed to a limited number of levels, using the parameter '-%e1' or '--ext-depth=1' for one level. These limitations are necessary when only part of a complex website is to be archived; if the limitation is omitted, archiving a complex website easily becomes impossibly extensive, comprising most of the website or, worse, the entire Internet.
The limitation will of course depend on the purpose of a given archiving process.

Flow control is governed by a number of further parameters: Timeout should be set at 5 seconds ('-T5' or '--timeout=5') and the number of attempts at 3 ('-R3' or '--retries=3'). This is done because HTTrack otherwise has a tendency to 'freeze' at elements that no longer exist or that cannot be archived for some other reason. Another reason for HTTrack occasionally freezing is that too many URLs are being downloaded simultaneously. For this reason the programme should be limited to max. 4 concurrent 'threads' with the parameter '-c4' or '--sockets=4'. If the archiving process freezes, you can attempt to restart it, using '-i' or '--continue' instead of '-w'/'-W'/'--mirror'/'--mirror-wizard'.

A last recommended parameter is '-n' or '--near', which tells HTTrack that all content elements used for viewing a web page are to be included in the archiving.

There are many other settings, but those mentioned above are the most necessary. To archive the website of the Centre for Internet Research using the recommended parameters, use one of the following command strings:

'httrack cfi.imv.au.dk -O C:\webpages\cfi.imv.au.dk -w -r4 -%e1 -T5 -R3 -c4 -n'

or

'httrack cfi.imv.au.dk --path C:\webpages\cfi.imv.au.dk --mirror --depth=4 --ext-depth=1 --timeout=5 --retries=3 --sockets=4 --near'

For further information on the programme and its (remaining) parameters, see: http://www.httrack.com/html/fcguide.html.
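
When several websites are to be archived with the same settings, it can be convenient to generate the command string rather than type it by hand. The following is a minimal sketch in Python, not part of HTTrack itself: the function name build_httrack_command and its argument names are hypothetical, and the default values are simply the recommendations given above. It only assembles the long-form parameters described in this section; actually running the result assumes that 'httrack' is available on the system path.

```python
from typing import List


def build_httrack_command(
    url: str,
    output_dir: str,
    depth: int = 4,        # --depth: levels to follow below the start URL
    ext_depth: int = 1,    # --ext-depth: levels to follow on external sites
    timeout: int = 5,      # --timeout: seconds before giving up on an element
    retries: int = 3,      # --retries: number of attempts per element
    sockets: int = 4,      # --sockets: max. concurrent connections
    resume: bool = False,  # True: use --continue instead of --mirror
) -> List[str]:
    """Assemble the httrack argument list for the recommended settings.

    Hypothetical helper for illustration; it does not start httrack.
    """
    mode = "--continue" if resume else "--mirror"
    return [
        "httrack",
        url,
        "--path", output_dir,
        mode,
        f"--depth={depth}",
        f"--ext-depth={ext_depth}",
        f"--timeout={timeout}",
        f"--retries={retries}",
        f"--sockets={sockets}",
        "--near",
    ]


if __name__ == "__main__":
    cmd = build_httrack_command("cfi.imv.au.dk", r"C:\webpages\cfi.imv.au.dk")
    # Print the command string so it can be checked before use.
    print(" ".join(cmd))
    # To start the archiving from Python, the list could be passed to
    # subprocess.run(cmd, check=True), assuming httrack is installed.
```

Calling the function with resume=True swaps '--mirror' for '--continue', which corresponds to restarting a frozen archiving process as described above. Returning a list of arguments rather than a single string avoids quoting problems if the storage path contains spaces.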