UTILITY -> Floppy disc and tape tool
© ? (1986)
Unarc 1.1

Last Update : Thursday 02 February 2017 at 22 h 40

Manual n° 1

Subj:  User Documentation for UNARC Program, Version 1.1
Date:  May 24, 1986
------------------------------------------------------------------------------

UNARC Archive File Extraction Utility
(for Z80 CP/M 2.0+ systems)

Copyright (C) 1986 by Robert A. Freed
All Rights Reserved

This file provides user-level documentation and operating instructions
for UNARC version 1.1, which was released May 24, 1986. Refer to the
notice at the end of this file regarding rights of use and distribution
of this program.

Additional technical information of interest to programmers and advanced
users is provided by the associated files UNARCxx.INF and/or UNARCxx.UPD.
The file UNARCxx.MSG contains a list of all files distributed with the
current UNARC release. (The filename suffix "xx" specifies the program
version, e.g. "10" for version 1.0, but it is omitted from subsequent
file references in this document.)

ABSTRACT
--------

UNARC is a utility program for CP/M systems which allows the listing and
extraction of subfiles contained in MS-DOS/PC-DOS "archives" (*.ARC
files). Archives are commonly used for compressed file storage on remote
access bulletin board systems which cater to IBM-PC (and compatible)
computer users. UNARC affords the CP/M user the ability to process such
files after downloading them via modem from these remote systems.

REQUIREMENTS
------------

UNARC requires CP/M version 2.0 or higher and a Z80 processor (or
compatible equivalents). The program is written in Z80 assembly language
and its object program file, UNARC.COM, requires 4K bytes of disk
storage.

UNARC can execute in a minimum memory environment, but (as distributed)
it requires at least a 33K CP/M 2.2 system size for full use of all
functions. (The file UNARC.INF provides additional memory usage details
and describes how the program may be configured for use on smaller
systems.)

File: UNARC11.DOC                                                 Page 2 of 9
------------------------------------------------------------------------------

ABOUT ARC FILES
---------------

The files which UNARC processes are the product of a utility program,
ARC, which executes on 16-bit computers running the MS-DOS (or PC-DOS)
operating system. This program has achieved widespread popularity since
it was first introduced in March 1985. It has become the de facto
standard for file storage on remote access systems catering to 16-bit
computer users.

                                   NOTE

     The MS-DOS ARC program is a "freeware" product, copyrighted by
     System Enhancement Associates of Wayne, New Jersey. I.e., it is
     distributed through public domain channels, but its author
     requests a voluntary contribution for its use. Note that no such
     contribution is expected for the use of UNARC, subject to the
     conditions described in the notice at the end of this document.

An archive is a group of files collected together into a single file in
such a way that the individual files may be recovered intact. In this
respect, archives are similar in function to libraries (*.LBR files),
which have been commonplace on CP/M systems since 1982, when the original
LU library utility program was introduced by Gary P. Novosielski.
(However, the two file formats are not compatible.)

The distinguishing characteristic of an ARC archive is that its component
files are automatically compressed when they are added to the archive, so
that the resulting file occupies a minimum amount of disk space. Of
course, file compression techniques have also been commonplace in the
CP/M world since 1981, when the public domain SQ and USQ
"squeeze/unsqueeze" programs were introduced by Richard Greenlaw.

The SQ/USQ programs and their numerous popular descendants utilize a
well-known general-purpose form of data compression (Huffman coding).
This technique, which is also utilized by the ARC program, performs well
for many text files but often produces poor compression of binary files
(e.g. object program .COM files). The ARC program also uses an advanced
method of data compression, which it terms "crunching." This method
(which is based on the Lempel/Ziv/Welch or "LZW" algorithm) performs
better than "squeezing" in many (but not all) cases, often achieving 50%
or better compression of ASCII text files and 15-40% compression of
binary object files.

ARC actually employs four different methods for storing files in an
archive, and always chooses the one which results in the best compression
for a particular file:

(1)  No compression ("unpacked"). The file is stored in its original
     form.

(2)  Run-length encoding ("packed"). Repeated sequences of 3-255
     identical bytes are compressed into a three-byte sequence.

(3)  Huffman coding ("squeezed"). Each 8-bit byte is encoded by a
     variable number of (up to 16) bits, with the bit length
     (approximately) inversely proportional to the frequency of
     occurrence of the corresponding byte.

(4)  LZW compression ("crunched"). Variable-length strings of bytes (in
     theory, up to nearly 4000 bytes in length) are represented by a
     single 12-bit code.

Note that since one of the four methods involves no compression at all,
the resulting archive entry will never be larger than the original file.

During its brief lifetime, the ARC program has undergone numerous
revisions which have employed different variations on some of the above
methods, particularly LZW compression. (The latest crunching method,
introduced with version 5.0 of the ARC program, is superior to earlier
methods, particularly for very short or very long files; and it almost
always generates the best compression of all four methods for any type of
file.) In order to retain compatibility with archives created by earlier
program revisions, ARC stores a "version" indicator with each file in an
archive.
Based on this indicator, the latest release of the ARC program can always
extract files created by older releases (although it will only use the
latest data compression versions when adding new files to an archive).

                                   NOTE

     The current release of UNARC supports archive file versions
     generated by all releases of MS-DOS ARC through (at least) program
     version 5.12, dated February 5, 1986.

For additional information about archive files and the MS-DOS ARC
utility, refer to the excellent documentation file, ARCxxx.DOC, which is
available from most remote access systems which utilize archive files.

For additional information about the LZW algorithm (and data compression
methods in general), refer to the article "A Technique for
High-Performance Data Compression", by Terry A. Welch, in IEEE Computer,
Vol. 17, No. 6, June 1984.

USING UNARC
-----------

The UNARC program provides a brief on-line help message, which is invoked
by running the program with an empty command line:

     A>UNARC

     UNARC 1.1  24 May 86
     Archive File Extractor for Z80 CP/M Systems

     Usage:  UNARC arcfile [d:][afn]

     Examples:
     B>UNARC A:SAVE.ARC *.*  ; List all files in archive SAVE on drive A
     A>UNARC SAVE            ; Same as above
     A>UNARC SAVE *.DOC      ; List just .DOC files
     A>UNARC SAVE READ.ME    ; Typeout the file READ.ME
     A>UNARC SAVE A:         ; Extract all files to drive A
     A>UNARC SAVE B:*.DOC    ; Extract .DOC files to drive B
     A>UNARC SAVE C:READ.ME  ; Extract file READ.ME to drive C

As shown by this help display, the UNARC utility provides three
capabilities:

(1)  Listing the directory of an archive

(2)  Extracting component files from an archive

(3)  Typing the contents of a component file at the console

The particular operation to be performed is determined by the form of the
file parameter(s) in the command line, as described separately in the
sections which follow.

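As an illustration only, the way the command line form selects the
operation can be sketched in a few lines of Python. This is not UNARC's
actual code (which is Z80 assembly). The function name plan and the tuple
it returns are hypothetical, and the sketch only applies the rules stated
in this document: an empty command line gives help, a drive name in the
second parameter means extraction, an ambiguous name without a drive means
listing, and a non-ambiguous name without a drive means typeout.

```python
def plan(command_tail: str):
    """Classify a UNARC command tail into (operation, archive, drive, pattern).

    Hypothetical helper for illustration; it mirrors the rules described in
    this document, not the real program's internals.
    """
    # The CP/M CCP passes the command tail in upper case; mimic that here.
    words = command_tail.upper().split()
    if not words:
        return ("help", None, None, None)

    # First parameter: archive name, filetype defaults to "ARC".
    archive = words[0]
    name_part = archive.split(":", 1)[-1]   # strip an optional drive name
    if "." not in name_part:
        archive += ".ARC"

    # No second parameter: list every file in the archive.
    if len(words) == 1:
        return ("list", archive, None, "*.*")

    # Second parameter has the form [d:][afn].
    spec = words[1]
    drive, pattern = None, spec
    if len(spec) >= 2 and spec[1] == ":":
        drive, pattern = spec[:2], spec[2:] or "*.*"

    if drive is not None:
        return ("extract", archive, drive, pattern)
    if "*" in pattern or "?" in pattern:
        return ("list", archive, None, pattern)
    return ("type", archive, None, pattern)
```

For example, plan("SAVE B:*.DOC") yields an "extract" operation on
SAVE.ARC to drive B:, while plan("SAVE READ.ME") yields a "type"
operation.
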
LISTING AN ARCHIVE DIRECTORY
----------------------------

By default, UNARC produces a detailed console listing of the component
files in an archive. (In fact, there is no way to suppress this listing;
it is generated during file extraction and typeout operations as well.)

The first command line parameter must specify the name of an archive
file. (A drive name and filetype are optional; the filetype, if omitted,
defaults to "ARC".) If this is the only command line parameter, UNARC
will generate a complete directory of all component files in the
specified archive file. Otherwise, the second command line parameter may
be used to select a particular file to be listed (or group of files, if
it contains the ambiguous file specification characters "*" or "?").

If no disk drive name is provided for the second parameter, and this
parameter specifies a group of files, the directory listing is the only
output generated by the program.

A sample directory listing is illustrated here:

A>UNARC CODES

Archive File = CODES.ARC

Name           Length Disk Stowage  Ver  Stored Saved   Date     Time  CRC
============  ======= ==== ======== === ======= ===== ========= ====== ====
ABLE.DOC        24320  24k Crunched  8    11777   52% 30 Apr 86 10:50a 42C0
BRAVO.COM       17152  17k Squeezed  4    14750   14%  2 May 86  4:11p 8CBD
CHARLIE.TXT       234   1k Packed    3       99   58%  2 May 86  4:11p 8927
        ====  ======= ====              =======   ===
Total      3    41706  42k                26626   36%

This listing is equivalent to the "verbose" listing of the MS-DOS ARC
program (with the addition of the "Disk" and "Ver" fields, which are
unique to UNARC). The listing requires an 80-column terminal width; there
is currently no "short" listing format.

The standard CP/M terminal control characters, CTRL-S (to pause the
listing output) and CTRL-C (to abort the program), may be used while the
listing is being generated.
Printer output to the CP/M list device may be obtained by typing CTRL-P
at CCP command level before executing UNARC.

"Name" is the file name which will be generated if the file is extracted
by UNARC on a CP/M system. (This is not necessarily the same as the name
recorded in the archive file. Although CP/M and MS-DOS file naming
conventions are identical, two conversions are made to guarantee file
name validity under CP/M: Lower-case letters are converted to upper-case,
and non-printing characters are converted to underlines, "_".) Note that
archive entries are always maintained (and hence listed) in alphabetic
name order.

"Length" is the uncompressed file length, i.e. the number of bytes the
file will occupy if extracted to disk, exclusive of any additional length
imposed by the CP/M file system. Note that MS-DOS permits files of
arbitrary lengths (unlike CP/M which restricts all files to a multiple of
128 bytes).

"Disk" is the actual amount of disk space required to extract the file to
a CP/M disk, expressed as a multiple of 1K (1024) bytes. Note that this
number is dependent on the disk data allocation block size. (CP/M permits
various block sizes, ranging from 1K to 16K bytes. Typical sizes are 1K
for single-density floppy disks, 2K for double-density floppies, and 4K
for hard disks, although these values are quite system-dependent.) In the
absence of an explicit output drive name, UNARC uses the block size of
the default (currently "logged") disk drive (i.e. the drive which appears
in the CCP prompt).

"Stowage" is the compression method used, specified as "Unpacked",
"Packed", "Squeezed", "Crunched", or "Unknown!". If the stowage type
"Unknown!" appears, it most likely indicates (if not a faulty archive
file) a newer release of the MS-DOS ARC program that supports a new
compression method (or a new variation of an existing method).
In this case, a corresponding new release of UNARC will be required to
extract the file.

"Ver" further identifies the version of compression used. Currently,
UNARC supports versions 1-8: unpacked files can have versions 1 or 2;
packed files, version 3; squeezed files, version 4; and crunched files,
versions 5-8. The highest version number associated with each compression
method is the one generated by the most recent release of the MS-DOS ARC
program.

"Stored" is the compressed file length, i.e. the number of bytes occupied
by the file in the archive. (This does not include the overhead
associated with the directory information itself, which adds an
additional 29 bytes to the size of each component file.)

"Saved" is the percentage of the original file length which was saved by
compression; i.e., higher values indicate better compression. (The MS-DOS
ARC documentation refers to this as the "stowage factor.") The value
shown on the totals line applies to the archive as a whole, not including
the directory overhead.

"Date" and "Time" refer to the last file modification, as of the time it
was added to the archive. (Date and time stamping is, of course, one of
the nice features of MS-DOS which is lacking in standard CP/M 2.2.)

"CRC" is an internal 16-bit cyclic redundancy check value which the
MS-DOS ARC program computes when it adds a file to an archive (expressed
in hexadecimal). As a test of file validity, UNARC re-computes this value
when it extracts a file (see below). Note that this value is calculated
by a different method than that used by either of the two popular public
domain programs, CRCK and CHEK. (It is however mathematically valid as a
quite reliable error-detection mechanism.) This value is shown in the
listing for completeness only.

The "Total" line is displayed only if multiple files appear in the
listing.

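The arithmetic behind the "Disk", "Saved" and "CRC" columns described
above can be sketched as follows. This is a rough illustration in Python,
not UNARC's own code, and the function names are hypothetical. Two points
are assumptions: the rounding of the "Saved" percentage (round to nearest)
was chosen because it reproduces the sample listings in this document, and
the CRC shown is the variant commonly catalogued as CRC-16/ARC (reflected
polynomial A001 hex, initial value 0), which is widely attributed to the
ARC program but is not spelled out in this document.

```python
def disk_k(length: int, block_size: int = 1024) -> int:
    """Disk space in K needed for a file, given the allocation block size."""
    blocks = -(-length // block_size)        # ceiling division
    return blocks * block_size // 1024


def saved_percent(length: int, stored: int) -> int:
    """Percentage of the original length saved by compression.

    Assumption: rounded to the nearest whole percent.
    """
    if length == 0:
        return 0
    return (100 * (length - stored) + length // 2) // length


def crc16_arc(data: bytes) -> int:
    """Bitwise CRC-16/ARC (assumed variant): poly 0xA001 reflected, init 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc
```

With the sample archive, disk_k(17152, 1024) gives 17 and
disk_k(17152, 2048) gives 18, matching the "Disk" values shown for
BRAVO.COM on 1K-block and 2K-block drives, and
saved_percent(41706, 26626) gives 36, matching the totals line.
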
EXTRACTING FILES FROM AN ARCHIVE
--------------------------------

If the second command line parameter contains a disk drive name, UNARC
will extract the selected file(s) from the archive to a CP/M file on the
indicated disk drive. If only a drive name appears, all component files
of the archive will be extracted.

The following illustrates a sample archive directory listing as generated
during a file extraction operation:

A>UNARC CODES B:

Archive File = CODES.ARC
Output Drive = B:

Name           Length Disk Stowage  Ver  Stored Saved   Date     Time  CRC
============  ======= ==== ======== === ======= ===== ========= ====== ====
ABLE.DOC        24320  24k Crunched  8    11777   52% 30 Apr 86 10:50a 42C0
Overwrite existing output file (y/n)? Y
BRAVO.COM       17152  18k Squeezed  4    14750   14%  2 May 86  4:11p 8CBD
Warning: Extracted file has incorrect CRC
Warning: Extracted file has incorrect length
CHARLIE.TXT       234   2k Packed    3       99   58%  2 May 86  4:11p 8927
        ====  ======= ====              =======   ===
Total      3    41706  44k                26626   36%

The above listing also illustrates several warning messages which may
occur when extracting files from an archive.

The message "Overwrite existing output file (y/n)?" appears if a file of
the same name already exists on the output drive. The user must answer
"Y" (or "y") to allow the extraction to proceed (in which case, the
existing file is unceremoniously deleted). Any other response will cause
UNARC to preserve the existing file, bypass the extraction operation for
the current file, and (except for a CTRL-C response) skip to the next
file to be extracted (if any).

The other two warning messages illustrated above are provided as a check
on the validity of the extracted file.
These indicate that either the cyclic redundancy check (CRC) value
computed by UNARC, or the resulting extracted file length, does not match
the corresponding value recorded in the archive when the original file
was added to it. The appearance of these messages most likely indicates
that the file data has been corrupted in some way (e.g. during modem
transmission from a remote system).

Note that if the original (i.e. MS-DOS) file length was not an exact
multiple of 128 bytes (as required by CP/M), UNARC will pad the final
record of the extracted file with hex "1A" (ASCII CTRL-Z) bytes. This
provides the correct end-of-file termination for text files, according to
CP/M conventions.

Also, the disk space shown in the archive directory listing will be
correct for the specified disk drive. (In the above examples, drive A:
has a 1K data allocation block size while drive B: has a 2K block size,
which accounts for the differences in the two listings.) In order to
determine the exact disk space requirements in advance of a file
extraction operation, the user may first "log into" the desired output
drive (i.e. select it as the default drive), and run UNARC to obtain a
directory listing only. (This is a consideration only on systems with
mixed disk drive types.)

A file extraction operation may be aborted at any time by entering CTRL-C
from the console. In this case, any partial output file will remain on
disk and should be deleted manually following the program abort. (Any
existing file of the same name will have already been deleted, however.)

TYPING OUT A FILE IN AN ARCHIVE
-------------------------------

A console typeout of the contents of a single component file in an
archive may be requested by specifying a non-ambiguous file name (and no
disk drive name) in the second command line parameter.
For example:

A>UNARC CODES ABLE.DOC

Archive File = CODES.ARC

Name           Length Disk Stowage  Ver  Stored Saved   Date     Time  CRC
============  ======= ==== ======== === ======= ===== ========= ====== ====
ABLE.DOC        24320  24k Crunched  8    11777   52% 30 Apr 86 10:50a 42C0
-------------------------------------------------------------------------------
This is file ABLE.DOC, contained within the archive CODES.ARC. Typeout
of this file may be paused by typing CTRL-S from the console (then
resumed by typing any other key). Typeout will proceed until the end
of this file or may be aborted by typing CTRL-C.....

The specified file is assumed to contain valid ASCII text data. In
particular, all bytes are masked to seven bits, and all ASCII control
characters are ignored except for HT (horizontal tab, which is expanded
to blanks with assumed tab stops at every eighth column), LF, VT or FF
(line feed, vertical tab or form feed, which generate a new typeout
line), and SUB (CTRL-Z, which by CP/M convention indicates end-of-file
and terminates the typeout). Note that BS (backspace) and CR (carriage
return) are ignored, so that text will not be obscured within files which
utilize these for overprinting (i.e. when directed to a printer).

The following filetypes, which are usually associated with binary
(non-text) data, are specifically excluded from typeout operations: COM,
CMD, EXE, OBJ, OV?, REL, ?RL, INT, SYS, BAD, LBR, ARC, and ?Q? (any
"squeezed" file). (This list may be modified or expanded, as described in
the file UNARC.INF.) If one of these filetypes is specified, only the
directory information for the requested file is listed.

A FINAL WORD FROM THE AUTHOR
----------------------------

I undertook writing the UNARC program to satisfy my curiosity about
software developments in the MS-DOS/PC-DOS world.
The ARC "freeware" program has been in existence for over a year now, and
it has achieved widespread popularity and acceptance in the 16-bit
community. Unfortunately, the lack of a compatible equivalent for CP/M
systems renders a large amount of public domain software inaccessible to
8-bit users such as myself. (Note that 16-bit software can indeed be of
interest to users of 8-bit systems, e.g. Pascal and C language programs.)
Also, an increasing number of RCPM systems are catering to both 8-bit and
16-bit users, and it is my hope that UNARC may find a welcome home on
such systems.

Because UNARC satisfies my original goals, I stopped short of producing a
complete ARC program equivalent which includes creation of archive files.
Also, I did not (initially) see any advantage to promoting use of the
sequential .ARC file format, which is somewhat slower to process (though
certainly more compact) than the random-access format which .LBR
libraries have provided for years. However, I am quite impressed with the
LZW "crunching" algorithm, and I now believe there is a place for .ARC
files in the CP/M world (particularly on RCPM's, where the name of the
game is to reduce file upload/download time). So watch this space for
news of my next project: NARC, the companion program to UNARC.

Special note to RCPM SYSOP's: Several optional patches may be made to
UNARC to allow its safe use on remote access systems. Refer to the file
UNARC.INF for specific details.

                                  NOTICE

The UNARC program and its associated documentation is the copyrighted
property of its author -- it is NOT in the public domain.

HOWEVER... Free use, distribution, and modification of this program is
permitted (and encouraged), subject to the following conditions:

(1)  Such use or distribution must be for non-profit purposes only.

(2)  The author's copyright notice may not be altered or removed.

(3)  Modifications to this program or its documentation files may not be
     distributed without notification of and approval by the author.

No fee is requested or expected for the use and distribution of this
program subject to the above conditions. The author reserves the right to
modify these conditions for any future revisions of this program.

Questions, comments, suggestions, commercial inquiries, and bug reports
or fixes are welcomed by the author:

     Bob Freed
     62 Miller Road
     Newton Centre, MA 02159
     Telephone (617) 332-3533
