
Test Of Archiving Software - wget + wGetGUI


Type: Complete websites
Name: wget
Platform: UNIX / Mac OS X / Windows
Version: 1.9
Price: Free
URL: http://www.gnu.org/software/wget/
Remarks: wget is an open source programme (GNU GPL). It is tested here for Windows with a graphic interface, and it can also be used from the command line in UNIX and Mac OS X. The programme DeepVacuum (tested separately) is also available for Mac OS X; it bases its archiving on wget and adds a graphic interface to the command.


Conclusion

wget archives a copy of the source code and other elements of web pages, and converts the pages' links so that the copy can be used offline. Elements that require an online connection for viewing cannot be archived with wget. This test covers wget for MS-DOS and the graphic interface wGetGUI (Windows only). The programme archives relatively accurately: many pages are archived correctly. On the other hand, archiving speed is extremely low, which is a serious deficiency. Several archiving processes can run at the same time, and archiving can be automated with wget (using scripts or batch files).


Recommended settings

The following recommendations and instructions refer to wget 1.9 for MS-DOS and wGetGUI 1.05 for Windows. No instructions are given for using wget on UNIX/Mac OS X.

The following files must be downloaded:

wGetGUI 1.05 (46 kb): http://www.jensroesner.de/wgetgui/data/

wGet 1.9 (or newer version): ftp://ftp.gnu.org/pub/gnu/wget/

Any missing .DLL files: http://www.dll-files.com/

Installation:
The programme as such is not installed, but is run from a folder. Unpack wget and wGetGUI into the same folder, after which wGetGUI can be started. Error messages may appear stating that Windows lacks one or more .DLL files. The necessary files can be downloaded from www.dll-files.com and are (usually) placed in the C:\Windows\System folder (see any documentation accompanying the individual .DLL files).

Use:
wGetGUI is a small programme that configures the wget command and writes the resulting archiving command to a batch file (file extension .bat). The batch file can then be run from wGetGUI.
First type the URL to be archived and the path where the archived material is to be stored (the field 'Save to custom dir'). Next, limit the scope of archiving by checking 'Recursive retrieval' and entering a number of levels in the 'Depth' field. Then press the 'Add to wGetStart.bat' button, followed by the 'Start wGetStart.bat' button.
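
As a rough illustration of what such a batch file might contain, the Python sketch below assembles a wget command line from the same three inputs (URL, target folder, depth) and appends it to a batch file. The option names are standard wget long options, but which options wGetGUI actually writes for each field is an assumption here, as are the function names, the example folder C:\archive, and the choice to add --convert-links and --page-requisites.

```python
def build_wget_command(url, save_dir, depth):
    """Return a wget argument list for a depth-limited recursive download.

    Assumption: this mapping of GUI fields to options is illustrative only;
    it is not taken from wGetGUI itself.
    """
    if depth < 1:
        # wget treats level 0 as unlimited recursion; rejected here so the
        # archive stays bounded.
        raise ValueError("depth must be at least 1")
    return [
        "wget",
        "--recursive",                     # 'Recursive retrieval'
        f"--level={depth}",                # 'Depth'
        "--convert-links",                 # rewrite links for offline use
        "--page-requisites",               # also fetch images, CSS, etc.
        f"--directory-prefix={save_dir}",  # 'Save to custom dir'
        url,
    ]


def to_batch_line(command):
    """Join arguments into one line, quoting any argument containing a space."""
    return " ".join(f'"{arg}"' if " " in arg else arg for arg in command)


def append_to_batch_file(batch_path, command):
    """Append the command as one line to a batch file (hypothetical helper)."""
    with open(batch_path, "a", newline="") as handle:
        handle.write(to_batch_line(command) + "\r\n")


if __name__ == "__main__":
    # Example values; the URL is one of the sites used in this test.
    cmd = build_wget_command("http://www.dr.dk/nyheder", r"C:\archive", 2)
    print(to_batch_line(cmd))
```

Calling append_to_batch_file once per URL would build up a batch file that archives several sites in one run, which is one way to automate archiving with scripts.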


Archiving speed

Archiving time (min): 151
File size (MB): 57.7
Archiving speed (MB/min): 0.38
Degree of presence required: Low
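
The speed figure follows directly from the other two measurements: file size divided by archiving time. The short Python sketch below reproduces that arithmetic; the function name is chosen here for illustration only.

```python
def archiving_speed(file_size_mb, minutes):
    """Return the average archiving speed in MB per minute."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return file_size_mb / minutes


if __name__ == "__main__":
    # Values from the speed test above: 57.7 MB archived in 151 minutes.
    print(f"{archiving_speed(57.7, 151):.2f} MB/min")  # prints "0.38 MB/min"
```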


Test details


Test date and time: Saturday October 30 2004, 3 p.m. – 6 p.m.

Tested by: Bo Hovgaard Thomasen

Tested by archiving:
http://www.dr.dk/kroniken, http://www.dr.dk/nyheder, http://www.dr.dk/skum, http://www.dr.dk/skum/boogie

Speed test carried out by archiving:
http://www.dr.dk/nyheder/html/nyheder/baggrund/tema2003/krise/index.jhtml

Test results

The elements below are rated on the following scale for the number of archived elements: 0 = none, 1 = few, 2 = average, 3 = most, 4 = all

Structure: 3
  Cascading Style Sheets: 3. The archived material usually appears as defined in CSS.
  Page composition: 3. Elements are correctly positioned on almost all archived web pages.
  Background: 3. Most backgrounds are archived.
  Pop-up windows: 3. Pop-up windows are often active in the archived version. However, links with JavaScript are not active.
  Archiving of all the desired web pages: 2. Some of the desired web pages are not archived.

Movement between elements in the structure
  Link: 3
    Print/writing
      Textual link: 3. Most textual links are archived. However, some textual links referring to JavaScript routines are not active.
      Pull-down menu: 2. Pull-down menus are archived, but act only to some degree as links in the archived version.
      Forms such as login: 0. Forms do not act as links to other content elements, since this almost always requires an online connection.
    Image
      Animation: 3. Animation (such as Macromedia Flash) often acts as a link in the archived version.
      Graphics: 3. Many graphics links are archived and active (except JavaScript links).
      Photo: 3. Many photo links are archived and active (except JavaScript links).
      Moving images: - (not tested)
  Link target: 2
    Print/writing
      Text: 4. All text on the archived pages is included in the archived material.
    Image
      Animation: 2. Only animation not requiring an online connection is archived.
      Graphics: 3. Graphics are usually archived.
      Photo: 3. Most photos are archived.
      Moving images: 2. Only moving images not requiring an online connection are archived.
    Other: -
    Sound: 2. Only sound not requiring an online connection is archived.

Automation: 4
  Automatic redirection: 4. Automatic redirection is active.

Movement in elements in the structure
  Automatic + inherent: 3
    Print/writing: 3. Most movable text is archived.
    Image
      Animation: 3. Flash and Shockwave elements are usually archived correctly.
      Moving images: 3. Moving images are usually archived correctly.
      Banner ads: 3. Banner ads are usually archived correctly.
    Sound
      Background sound: 3. Background sound is usually archived correctly.
      Banner ads: 4. Sound in banner ads is archived correctly.
  Automatic + online: 0
    Print/writing
      Chat as reader: 0. Elements requiring an online connection cannot be archived.
    Image
      Moving images: 0. Elements requiring an online connection cannot be archived.
    Sound: 0. Elements requiring an online connection cannot be archived.
  User intervention + inherent: 3
    Print/writing
      Archived chat: - (not tested)
      Mouse-over: 4. Mouse-over text is archived and active.
      Quizzes: - (not tested)
      Clickable maps: 4. Clickable maps (such as Macromedia Flash) are archived and functional.
    Image
      Non-streamed image (such as slide show, clickable map): 3. Usually functional in the archived version.
      Games: 1. Games are archived poorly, because they are usually constructed with online elements (reporting high scores to the website, etc.). However, some games are correctly archived.
      Quizzes: 1. Quizzes are archived poorly, because they are usually constructed with online elements (reporting high scores to the website, etc.). However, some quizzes are correctly archived.
      Clickable maps (w. zoom or activation): 4. Clickable maps (Macromedia Flash) are correctly archived and functional.
      Mouse-over: 4. Mouse-over images are correctly archived and functional.
    Sound
      Non-streamed sound (e.g. activated in games, quizzes, etc.): 3. Sound is archived and is usually functional in the archived version.
      Mouse-over: 3. Sound is archived and is usually functional in the archived version.
  User intervention + online: 0
    Print/writing
      Chat (as participant): 0. Elements requiring an online connection cannot be archived.
      Polls: 0. Elements requiring an online connection cannot be archived.
      Test-yourself: 0. Elements requiring an online connection cannot be archived.
    Image
      Streamed images: 0. Elements requiring an online connection cannot be archived.
      Games: 0. Elements requiring an online connection cannot be archived.
    Sound
      Streaming (both archived and live): 0. Elements requiring an online connection cannot be archived.

Non-movable elements: 3
  Print/writing: 4. All print/writing is correctly archived.
  Image: 3. All images are correctly archived.
  Sound: 3. Sound is often archived correctly.



The test was carried out by graduate student Bo Hovgaard Thomasen from July to December 2004. Its premises and main results are explained in the text "Test of software and strategies for micro-archiving websites".

Note: We do not have the resources to offer technical support or other advice on the use of the tested archiving programme beyond what can be found on this web page.