UNIX USER TRAINING


Session 3 : More UNIX Commands


Objectives


This session will cover the following topics


  1. General Informative Commands

  2. The tar Command

  3. File Compression Commands

  4. Massively Useful Commands

  5. Summary: The UNIX Top 20


1) General Informative Commands


a) who


The simplest way of finding out who is on the system is to use, unsurprisingly, the who command, an example of which is below:


1 colinbr@mclaren> who

doug pts/14 Dec 11 15:53 (sauber)

joannew pts/0 Nov 6 15:50 (10.1.3.232)

colinbr pts/5 Dec 3 08:31 (williams)

louiseb pts/2 Dec 12 07:44 (10.1.3.86)

peterwr pts/3 Dec 12 09:47 (10.1.3.83)

fiona pts/4 Dec 12 09:55 (10.1.3.22)

johnpu pts/7 Dec 10 09:05 (10.1.2.43)

deborah pts/9 Nov 26 09:43 (10.1.3.253)

tonyk pts/6 Dec 11 15:09 (benetton)

bdavis pts/8 Dec 12 09:53 (10.1.2.65)

colinbr pts/15 Dec 3 12:01 (williams)

johnpu pts/13 Dec 10 10:23 (10.1.2.43)

colinbr pts/23 Dec 12 14:35 (williams)

janet pts/10 Dec 12 10:12 (10.1.3.91)

pascale pts/19 Dec 12 10:15 (10.1.3.26)

janet pts/20 Dec 12 10:40 (10.1.3.91)

sallyt pts/12 Nov 8 14:57 (10.1.3.249)

timr pts/16 Oct 29 14:51 (10.1.3.170)

deborah pts/22 Dec 12 12:02 (10.1.3.253)


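The output columns are, from left to right: the username, the terminal line, the login date and time, and (in brackets) the host name or IP address from which the user logged in.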



Variants of the who command include whoami and “who am i”. These are useful if you have changed usernames at any point in your session and need to know your effective and real usernames. For example, while logged in to williams:


8 colinbr@williams> whoami

colinbr

9 colinbr@williams> who am i

colinbr pts/13 Dec 12 14:40 (:0.0)


Now switch users to the root account and re-run these commands:


10 colinbr@williams> su - root

Password:

Sun Microsystems Inc. SunOS 5.7 Generic October 1998

# csh

williams# whoami

root

williams# who am i

colinbr pts/13 Dec 12 14:40 (:0.0)

williams#


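Why do we see this information? The whoami command reports the effective username, which is now root, whereas who am i reports the user who owns the terminal line. Running an ls command on the terminal line’s device file shows us that, ultimately, colinbr still owns the pts/13 terminal line, and this is the information displayed by who am i.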


williams# ls -l /dev/pts/13

lrwxrwxrwx 1 root root 29 Mar 29 2000 /dev/pts/13 -> ../../devices/pseudo/pts@0:13

williams# ls -l /devices/pseudo/pts@0:13

crw--w---- 1 colinbr tty 24, 13 Dec 12 14:44 /devices/pseudo/pts@0:13

williams#
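Because every line of who output begins with a username, the listing is easy to summarise with standard filters. The following is a minimal sketch in Bourne shell (sh) syntax rather than the C shell used in the prompts above; the function name count_logins is invented for illustration and is not a standard command.

```shell
# count_logins: read who-style output on standard input and print one
# line per user, prefixed by the number of sessions that user holds.
# (count_logins is a hypothetical helper, not a standard command.)
count_logins() {
    awk '{ print $1 }' | sort | uniq -c
}

# Typical use on a live system:
#   who | count_logins
```

Run against the mclaren listing shown earlier, this would report three sessions for colinbr and two each for janet, johnpu and deborah.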


b) w


More information can be returned by using the w command (yes, a single letter is a legitimate UNIX command!). For example:


7 colinbr@mclaren> w

2:54pm up 45 day(s), 2:52, 19 users, load average: 0.14, 0.15, 0.17

User tty login@ idle JCPU PCPU what

doug pts/14 Tue 3pm 3:53 nslookup

joannew pts/0 6Nov0136days 2 2 ftp ftp.muze.com

colinbr pts/5 3Dec01 32 62:27 2:41 nwadmin

louiseb pts/2 7:44am 12 12 star

peterwr pts/3 9:47am 2:21 1:08 -csh

fiona pts/4 9:55am 38 star

johnpu pts/7 Mon 9am 2:33 7:25 1 oracleprod (DESCRIPTION=(LOCAL=Y

deborah pts/9 26Nov0116days 9 -csh

tonyk pts/6 Tue 3pm 4:09 2 -ksh

bdavis pts/8 9:53am 3:22 1 oracleprod (DESCRIPTION=(LOCAL=Y

colinbr pts/15 3Dec01 4:37 63:47 csh

johnpu pts/13 Mon10am 43 1:00 -csh

colinbr pts/23 2:35pm 7 w

janet pts/10 10:12am 4:13 1 -csh

pascale pts/19 10:15am 2 6 6 star

janet pts/20 10:40am 17 1 more abarnett.one

sallyt pts/12 8Nov0134days -csh

timr pts/16 29Oct0144days 16 15 ftp

deborah pts/22 12:02pm 2:51 -csh


This output reveals not only usernames, terminal lines and login times as per the who command, but also indicates what each user is up to. In the above example, the user janet is displaying the file abarnett.one using the more command; user colinbr (in one of his numerous logins to mclaren) is using the nwadmin command (a GUI for the backup system, in this case); while users bdavis and johnpu are both using the Oracle database on mclaren. The w command also displays the idle time for each login session. JCPU lists the CPU time accumulated by all processes (and their children) on that terminal, and PCPU shows the CPU time clocked up by the currently active processes on that terminal.


c) Finger and the .plan and .project files


The finger command is UNIX’s “Who’s Who” utility. It displays information taken from the UNIX password database, such as the user’s full name, home directory, login shell and the last time they logged in to the particular server on which you ran finger. The simplest form of the command is


finger username


Or, for example


21 colinbr@williams> finger tonyk

Login name: tonyk In real life: Tony Kennett

Directory: /home/tonyk Shell: /bin/ksh

Last login Mon Nov 19 14:22 on pts/30 from benetton

New mail received Tue Dec 18 00:23:32 2001;

unread since Sun Dec 16 22:01:49 2001

No Plan.


It is also possible to finger by surname or first name as these examples show:


22 colinbr@williams> finger kennett

Login name: tonyk In real life: Tony Kennett

Directory: /home/tonyk Shell: /bin/ksh

Last login Mon Nov 19 14:22 on pts/30 from benetton

New mail received Tue Dec 18 00:23:32 2001;

unread since Sun Dec 16 22:01:49 2001

No Plan.

23 colinbr@williams> finger tony

Login name: tonyk In real life: Tony Kennett

Directory: /home/tonyk Shell: /bin/ksh

Last login Mon Nov 19 14:22 on pts/30 from benetton

New mail received Tue Dec 18 00:23:32 2001;

unread since Sun Dec 16 22:01:49 2001

No Plan.


Login name: tony In real life: Tony O'Rourke

Directory: /home/tony Shell: /bin/csh

Never logged in.

No unread mail

No Plan.


In the last example multiple responses are returned because there is more than one person called Tony listed in the password database. Multiple responses are also generated if a user is logged on to the system more than once.


Finger will display even more information about the user if they have files called .project and .plan in their home directory. The .project file should contain a single line of text, usually describing the user’s current work; if it contains more than one line, only the first line is displayed. The .plan file can be several lines long and often provides more detail about the user’s tasks. If .plan is missing, the “No Plan” response seen in the examples above is displayed.
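As a rough sketch of how the two files might be prepared, the Bourne shell (sh) fragment below builds them in a scratch directory first, so that nothing in a real home directory is overwritten by accident. The directory name and the text placed in the files are invented for illustration.

```shell
# Assumption: the files are built in a scratch directory, not in $HOME.
scratch=${TMPDIR:-/tmp}/finger_demo.$$
mkdir -p "$scratch"

# .project: keep it to a single line, since only the first is shown.
echo "UNIX user training, session 3" > "$scratch/.project"

# .plan: free text, several lines if you wish.
cat > "$scratch/.plan" <<'EOF'
Mon-Wed: writing course notes
Thu-Fri: on site, back Monday
EOF

# Other users generally need read permission on both files.
chmod 644 "$scratch/.project" "$scratch/.plan"

# When happy with the contents, copy them into place:
#   cp "$scratch/.project" "$scratch/.plan" "$HOME"
```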


d) ps


While the w command will display some process information, the ps (for Process Status) command will often display more information than you could reasonably want! At its simplest, ps displays process data for the current shell and any jobs or sub-shells spawned from the current shell. For example:


50 colinbr@williams> ps

PID TTY TIME CMD

24120 pts/12 0:00 csh


In this example we see the Process ID of the current shell (24120), the terminal line it is associated with (pts/12) and the command being run (in this case a C shell process). The TIME column lists the total time spent running by that process (it could have started when the machine booted but have spent only a few minutes of CPU time actually doing its job, as is the case with various daemon processes).


As ever, more information can be displayed using options to the ps command. The -f option, for example, prints full information about the process; the -l option prints long format information; and the -e option lists information about every process running on the system. Compare the following


51 colinbr@williams> ps -f

UID PID PPID C STIME TTY TIME CMD

colinbr 24120 24118 0 Jan 07 pts/12 0:00 /bin/csh


Here, with the -f option, we see the user ID (UID) under which the process is running, the process ID of the process’s parent (PPID), and the time at which the process was started (STIME, Jan 07, in this case). The C column is the processor utilization, used for scheduling, and is now obsolete.


52 colinbr@williams> ps -l

F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD

8 S 11288 24120 24118 0 50 20 ? 176 ? pts/12 0:00 csh


Still more information is printed in long format. The F column can safely be ignored as it displays flags remaining in the ps command for historical reasons; it is essentially meaningless today. The S column indicates the state of the process, useful if it appears your program has been doing nothing for a long while. The S in the example above indicates that this C shell is sleeping, that is, waiting for further input. The PRI and NI columns show the priority and ‘nice value’ respectively; the higher the numbers, the lower the priority.


Of course, the options can be combined, producing:


53 colinbr@williams> ps -lf

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

8 S colinbr 24120 24118 0 50 20 ? 176 ? Jan 07 pts/12 0:00 /bin/csh


With the -e option, every process on the system is listed. Combined with the -l and -f options and with judicious use of grep, it can be used to produce a comprehensive picture of system activity. Try the following when logged in to mclaren or ferrari


ps -elf | more


This should display a lot of information regarding all the processes being run on that system.
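One way of narrowing the full listing down to a single user is sketched below in Bourne shell (sh) syntax. It uses awk rather than grep so that the match is made against the UID column only and not against text elsewhere on the line; the function name procs_for_user is invented for illustration.

```shell
# procs_for_user: read "ps -ef" output on standard input and keep the
# header line plus every process whose first (UID) column is the user
# named in the first argument. (Hypothetical helper name.)
procs_for_user() {
    awk -v user="$1" 'NR == 1 || $1 == user'
}

# Typical use:
#   ps -ef | procs_for_user colinbr | more
```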


e) which


Historically, the development of UNIX took place at various corporations and academic establishments in the USA. The source code for the operating system fell into the hands of hackers who then tweaked it to add new functionality. But because there was no central control over who made what modifications, and because each vendor optimized the operating system for their own hardware, the result was the development of various UNIX versions or ‘flavours’.


The which command is important because, in Solaris, there are occasionally multiple versions of the same command derived from these different flavours. The different versions may have different options and produce different output. The which command, in conjunction with your $PATH environment variable, will tell you the version of a given command you will run by default.


Prime examples of this behaviour are the ps command, which we saw above, and the tar command (discussed later). First, check your $PATH environment variable:


1 colinbr@mclaren> echo $PATH

/bin:/usr/local/bin:/usr/bin:/usr/ucb:/etc:/sbin:/usr/sbin:/usr/openwin/bin:/usr/dt/bin:/home/colinbr/bin:/opt/NSCPcom:/opt/Acrobat4/bin:.:/opt/dbs/oracle/product/816/bin


This shows the list of those directories Solaris will search to find a command. Most UNIX commands will be in /bin, /usr/bin, /usr/sbin, /usr/ucb and /sbin, though some of these directories will be symbolic links to other directories and some commands will be restricted to the root user. Now try the following commands


5 colinbr@mclaren> which ps

/bin/ps

6 colinbr@mclaren> ls -l /usr/ucb/ps

-r-xr-xr-x 34 bin bin 5536 Aug 2 2000 /usr/ucb/ps


Command 5 indicates that the version of ps that colinbr will run by default is /bin/ps. Command 6 shows the directory listing for /usr/ucb/ps, indicating the presence of a second version of ps. Simply typing ps will run /bin/ps. If colinbr wants to run the UCB version, the whole path, /usr/ucb/ps, is required.
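The answer given by which is simply the first match found while walking the directories in $PATH from left to right. The Bourne shell (sh) sketch below makes that search explicit by printing every match rather than stopping at the first; the function name all_versions is invented for illustration.

```shell
# all_versions: print every executable copy of a command found in the
# directories listed in $PATH, in search order. The first line printed
# is the one the shell would run. (Hypothetical helper name.)
all_versions() {
    cmd=$1
    old_ifs=$IFS
    IFS=:                        # split $PATH on colons
    for dir in $PATH; do
        [ -z "$dir" ] && dir=.   # an empty entry means the current directory
        if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]; then
            printf '%s\n' "$dir/$cmd"
        fi
    done
    IFS=$old_ifs
}

# Typical use:
#   all_versions ps
```

With the $PATH shown above, both /bin/ps and /usr/ucb/ps would be expected in the output, with /bin/ps listed first.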


f) file


The file command is used to display a file’s content type. This goes beyond the mere file extension (.txt, .doc, .pdf) because, in UNIX, a file can effectively have multiple extensions (e.g. filename.tar.Z) and some filenames can consist purely of what Windows would consider to be an extension (e.g. .cshrc).


At its simplest, to determine a file’s content type use the following command:


file filename


The file program then tries to ascertain the file’s type based on certain rules contained in the so-called magic file. If the file type can be determined, the results are displayed. In general, if the file’s type cannot be determined, file will report the type as data. Some examples are shown below.


The following are all recognized UNIX file types


23 colinbr@williams> file sessions.tar.Z

sessions.tar.Z: compressed data block compressed 16 bits

38 colinbr@williams> file cb_scripts.tar

cb_scripts.tar: USTAR tar archive

24 colinbr@williams> file telnet_mclaren.tif

telnet_mclaren.tif: TIFF file, little-endian

28 colinbr@williams> file system_check

system_check: executable c-shell script

33 colinbr@williams> file file_check

file_check: executable shell script


The following are Word and PowerPoint documents respectively. They are reported as data because no rule for these file types exists in the magic file.


25 colinbr@williams> file SESSION1.DOC

SESSION1.DOC: data

26 colinbr@williams> file session1.ppt

session1.ppt: data
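To give a feel for what a rule in the magic file does, the Bourne shell (sh) sketch below inspects the first two bytes of a file. It is a much simplified illustration and not how file itself is implemented: only two well-known signatures are checked (1f 9d for compress output and 1f 8b for gzip output), and the function name sniff_magic is invented.

```shell
# sniff_magic: guess a file's type from its first two bytes.
# Assumption: only two signatures are known to this sketch; the real
# magic file holds many more rules. (Hypothetical helper name.)
sniff_magic() {
    # od prints the first two bytes in hex; tr strips the padding.
    sig=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    case $sig in
        1f9d) echo "$1: compressed data" ;;
        1f8b) echo "$1: gzip compressed data" ;;
        *)    echo "$1: data" ;;
    esac
}

# Typical use:
#   sniff_magic sessions.tar.Z
```

Anything the sketch does not recognise falls through to data, which mirrors the behaviour seen with the Word and PowerPoint files above.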


2) The tar Command


Tar is short for ‘Tape ARchive’ and has been a standard method of creating UNIX backups since time immemorial. The contents of a tar archive (or tarfile) are usually multiple files all glued together to create a single large file. Most often, such files are related to a single project or directory. A glance at the man page reveals tar to be an immensely complicated command. But as with everything in UNIX, it is not as scary as it first appears to be.


Tar has three basic functions: to create tarfiles from individual files and directories; to extract the contents of tarfiles; and to list the contents of tarfiles. Tarfiles can either be created on disk or (as the name implies) written to magnetic tape.


To create a tarfile, use tar with the c (create) option. In the simplest format, specify those files to be included in the tarfile on the command line. For instance


14 colinbr@williams> tar cf sessions.tar SESSION1.DOC SESSION2.DOC SESSION3.DOC SESSION4.DOC SESSION5.DOC

15 colinbr@williams> ls -l sessions.tar

-rw-r--r-- 1 colinbr it 250880 Jan 18 10:54 sessions.tar


The f option here specifies the name of the tarfile to create (sessions.tar) and the files SESSION1.DOC to SESSION5.DOC are added to the tarfile. If sessions.tar had been replaced with the device file for a tape drive (such as /dev/rmt/0), the tarfile would have been written to the tape.


Directories can also be specified on the command line as shown below:


10 colinbr@williams> tar cvf egs.tar examples

a examples/ 0K

a examples/install_gcc 2K

a examples/install_make 3K

a examples/binaries.html 3K

a examples/build.html 8K

a examples/configure.html 20K

a examples/jordan 16K

a examples/words1 1K

a examples/news.txt 2K

a examples/clancy_bio.txt 3K

a examples/words2 3K

a examples/Working/ 0K

a examples/Working/fred.txt 2K

a examples/Backups/ 0K

a examples/elf.txt 2K

a examples/dba1a.pdf 27K

a examples/Interactive/ 0K

a examples/Interactive/cshrc 1K

a examples/Interactive/cshrc.diff 1K

a examples/Interactive/cshrc.prev 1K

a examples/Interactive/ls 17K

a examples/Interactive/ls.txt 17K

a examples/cols_logins 1K

a examples/news1.txt 2K

11 colinbr@williams> ls -l egs.tar

-rw-r--r-- 1 colinbr it 140800 Jan 17 16:28 egs.tar


This command creates the tarfile egs.tar from the examples directory. The v option – for verbose – shows the operations tar is carrying out. The a in the first column indicates that tar is adding the following file to the archive. The size of each file added is also displayed. Notice that the subdirectories Working, Backups and Interactive are included in the tarfile.


To unpack the contents of a tarfile, specify tar with the x (extract) option. For example, to extract the sessions.tar file created earlier use:


27 colinbr@williams> tar xvf sessions.tar

x SESSION1.DOC, 96256 bytes, 188 tape blocks

x SESSION2.DOC, 89600 bytes, 175 tape blocks

x SESSION3.DOC, 46080 bytes, 90 tape blocks

x SESSION4.DOC, 7680 bytes, 15 tape blocks

x SESSION5.DOC, 7680 bytes, 15 tape blocks


Again, we use the v and f options to specify verbose mode and the tarfile to extract. The x at the start of each line indicates that the file has been extracted from the archive. The file name and size in bytes and tape blocks (one block is 512 bytes) are also shown. (This last measure is a throwback to tar’s early history; it is displayed whether the tarfile has been extracted from tape or disk.)


It is often useful to look at the contents of an archive before unpacking it. To list the contents of a tar archive, use tar with the t (table of contents) option. For example, to list the names of the files in the archive, simply use


45 colinbr@williams> tar tf sessions.tar

SESSION1.DOC

SESSION2.DOC

SESSION3.DOC

SESSION4.DOC

SESSION5.DOC


More information can, of course, be obtained using the ‘v’ option:


46 colinbr@williams> tar tvf sessions.tar

-rwxrw-rw- 11288/10000 96256 Sep 21 15:37 2001 SESSION1.DOC

-rwxrw-rw- 11288/10000 89600 Sep 11 16:56 2001 SESSION2.DOC

-rwxrw-rw- 11288/10000 46080 Jan 17 16:30 2002 SESSION3.DOC

-rwxr--r-- 11288/10000 7680 Sep 4 10:19 2001 SESSION4.DOC

-rwxr--r-- 11288/10000 7680 Sep 4 10:19 2001 SESSION5.DOC


Here we see the contents of the tarfile displayed in a similar format to the output of ls -l. The numbers 11288/10000 are the user ID (uid) and group ID (gid) of the user who owned the file(s) when the tarfile was made.
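The three functions (create, list and extract) can be tried out safely in a throwaway directory. The following Bourne shell (sh) sketch assumes a scratch area under /tmp; the directory and file contents are invented for the demonstration.

```shell
# Assumption: everything happens in a throwaway directory under /tmp.
work=${TMPDIR:-/tmp}/tar_demo.$$
mkdir -p "$work/examples/Working" "$work/restore"
echo "first file"  > "$work/examples/words1"
echo "second file" > "$work/examples/Working/fred.txt"

cd "$work" || exit 1
tar cf egs.tar examples      # c: create the archive from a directory
tar tf egs.tar               # t: list the contents without unpacking

cd restore || exit 1
tar xf ../egs.tar            # x: extract into the current directory
ls -R examples
```

Because the archive was created from a relative path (examples), it unpacks beneath whichever directory you are in when you run the extract.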


Tar Gotchas


A couple of points to note about tar



3) File Compression Commands


a) compress/uncompress/zcat


The standard UNIX compression program (/bin/compress) reduces a file’s size. How much of a reduction occurs depends on the file type. Certain files – notably image files like GIF and TIF formats – do not compress at all well. Other file types even increase in size when compressed! Solaris compress, however, is smart enough to realize this and will not compress a file if its size would increase.


In general text files, Word documents, Excel spreadsheets, Adobe PDF files and so on will compress quite well. Tarfiles can also be compressed but the compression achieved will depend on the individual files in the archive.


To compress a file, simply use the compress command followed by the filename. The file will then be compressed and renamed to filename.Z . The original file will be deleted, saving on space. For example


70 colinbr@williams> ls -l CH_backup_review.doc

-rwxr--r-- 1 colinbr it 144896 Jan 18 12:14 CH_backup_review.doc

71 colinbr@williams> compress CH_backup_review.doc

72 colinbr@williams> ls -l CH_backup_review.doc.Z

-rwxr--r-- 1 colinbr it 61407 Jan 18 12:14 CH_backup_review.doc.Z

73 colinbr@williams> ls -l CH_backup_review.doc

CH_backup_review.doc: No such file or directory


Here we see the file CH_backup_review.doc being compressed from a size of 144896 bytes down to 61407 bytes (over 50% compression). The file is renamed to CH_backup_review.doc.Z and the original file deleted (which is why command 73 fails with the “No such file or directory” message).
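The percentage saved can be worked out from the two ls -l sizes. A small sketch in Bourne shell (sh) syntax follows; the function name pct_saved is invented, and because shell arithmetic works in whole numbers the result is rounded down.

```shell
# pct_saved: percentage reduction given the sizes before and after.
# Shell arithmetic is integer-only, so the result is rounded down.
# (pct_saved is a hypothetical helper name.)
pct_saved() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

pct_saved 144896 61407    # CH_backup_review.doc: prints 57
```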


Use the uncompress command to re-inflate a compressed file. For example:


10 colinbr@williams> ls -l CH_backup_review.doc.Z

-rwxr--r-- 1 colinbr it 61407 Jan 18 12:14 CH_backup_review.doc.Z

11 colinbr@williams> uncompress CH_backup_review.doc.Z

12 colinbr@williams> ls -l CH_backup_review.doc

-rwxr--r-- 1 colinbr it 144896 Jan 18 12:14 CH_backup_review.doc


Some compressed files can be displayed on-screen without having to uncompress them first using the zcat command. This technique works particularly well on text files: an attempt to zcat a compressed Word document will most likely foul up your terminal session. To use zcat simply supply the file name as an argument to the zcat command, for example:


18 colinbr@williams> zcat solaris_examples.txt.Z


Command 18 will display the uncompressed contents of solaris_examples.txt.Z on the screen. The file itself will not be uncompressed to disk, therefore no solaris_examples.txt file will be created, and disk space will not be consumed. All the uncompression is handled in the computer’s memory. This leads us to an interesting technique that can be used with compressed tarfiles.


Consider the file sessions.tar we looked at earlier. If we compress this file we get the following results:


26 colinbr@williams> ls -l sessions.tar

-rw-r--r-- 1 colinbr it 265216 Feb 14 15:36 sessions.tar

27 colinbr@williams> compress sessions.tar

28 colinbr@williams> ls -l sessions.tar.Z

-rw-r--r-- 1 colinbr it 100155 Feb 14 15:36 sessions.tar.Z


If we uncompress sessions.tar.Z we will need about 165 KB (i.e. the difference in size between sessions.tar and sessions.tar.Z) more disk space to hold the uncompressed file. This is not a massive difference but is sufficient to illustrate a point: we could easily have been dealing with a couple of gigabytes of compressed data which might double in size when uncompressed. So, to conserve disk space, we use a little bit of UNIX wizardry


30 colinbr@williams> zcat sessions.tar.Z | tar xvf -

x SESSION1.DOC, 100352 bytes, 196 tape blocks

x SESSION2.DOC, 89600 bytes, 175 tape blocks

x SESSION3.DOC, 56320 bytes, 110 tape blocks

x SESSION4.DOC, 7680 bytes, 15 tape blocks

x SESSION5.DOC, 7680 bytes, 15 tape blocks


We use zcat on the compressed file sessions.tar.Z but instead of having the output go to the screen (which is zcat’s default behaviour) we pipe the output into the tar command. Remember that the f option to tar tells the command which file to read or create. The - sign instructs tar to read from its standard input stream which, in this case, is coming via the pipe from the zcat command. The practical outcome of this is to extract the compressed archive without first having to uncompress the file to disk.


b) gzip/gunzip – also GNU zcat


The Free Software Foundation’s GNU Project is responsible for many open-source free software products. Many of the applications and programs running under the Linux operating system have been written and freely distributed by developers working under the GNU project. Some of these programs have also been converted (or ‘ported’) to run under Solaris. Gzip and gunzip are examples of such programs.


Gzip and gunzip are analogous to the standard UNIX compress and uncompress commands and can be used in much the same ways as described above. But they have several advantages over the older programs.


Firstly, gzip uses a superior compression algorithm. For example, a text file compressed using compress might shrink by 50-60%; with gzip, the compression is typically 60-70%. Gzip will also bring about better compression with other data types, even those files which compress would leave unchanged.


Secondly, gunzip can handle files produced by gzip (of course), by UNIX compress and, provided the archive contains only a single compressed file, by the zip family of utilities (such as pkzip and WinZip) which are often found on PCs.


To use gzip and gunzip, you must first either have the /usr/local/bin directory in your $PATH environment variable, or prefix the command with /usr/local/bin . The following examples assume the former.


18 colinbr@mclaren> which gzip

/usr/local/bin/gzip

19 colinbr@mclaren> which gunzip

/usr/local/bin/gunzip

20 colinbr@mclaren> which zcat

/bin/zcat


Note this gotcha! By default, the /bin/zcat command will be used. To use the GNU zcat command, we will have to reference it directly as /usr/local/bin/zcat .


To gzip a file, use the following command


gzip filename(s)


Or,


34 colinbr@mclaren> ls -l dba1a.pdf

-rw-r--r-- 1 colinbr it 26888 Sep 4 14:58 dba1a.pdf

35 colinbr@mclaren> gzip dba1a.pdf

36 colinbr@mclaren> ls -l dba1a.pdf.gz

-rw-r--r-- 1 colinbr it 18037 Sep 4 14:58 dba1a.pdf.gz

37 colinbr@mclaren> ls -l dba1a.pdf

dba1a.pdf: No such file or directory


Note the .gz extension appended to the gzipped PDF file and that, as with compress, the original file has been deleted. Wildcards can also be used:


38 colinbr@mclaren> ls -l *.txt

-rw-r--r-- 1 colinbr it 2196 Sep 4 14:34 clancy_bio.txt

-rw-r--r-- 1 colinbr it 1727 Sep 4 14:34 elf.txt

-rw-r--r-- 1 colinbr it 1100 Sep 4 14:55 news.txt

-rw-r--r-- 1 colinbr it 1100 Oct 8 15:06 news1.txt

39 colinbr@mclaren> gzip *.txt

40 colinbr@mclaren> ls -l *.txt.gz

-rw-r--r-- 1 colinbr it 1227 Sep 4 14:34 clancy_bio.txt.gz

-rw-r--r-- 1 colinbr it 974 Sep 4 14:34 elf.txt.gz

-rw-r--r-- 1 colinbr it 619 Sep 4 14:55 news.txt.gz

-rw-r--r-- 1 colinbr it 620 Oct 8 15:06 news1.txt.gz


GNU zcat can then be used to view these gzipped text files without gunzipping them. The following example pipes the zcat output through the more command.


41 colinbr@mclaren> /usr/local/bin/zcat clancy_bio.txt.gz | more

Extract from the Tom Clancy Biography


CLANCY, THOMAS (TOM) L., JR. (1947-), American novelist, was born in Baltimore, Maryland, the son of a mailcarrier and a department store credit employee. He received his primary and secondary education at Baltimore-area Catholic ...


Finally, we can use gunzip to decompress either an individual file or groups of files using wildcards:


43 colinbr@mclaren> ls -l *.gz

-rw-r--r-- 1 colinbr it 1227 Sep 4 14:34 clancy_bio.txt.gz

-rw-r--r-- 1 colinbr it 18037 Sep 4 14:58 dba1a.pdf.gz

-rw-r--r-- 1 colinbr it 974 Sep 4 14:34 elf.txt.gz

-rw-r--r-- 1 colinbr it 619 Sep 4 14:55 news.txt.gz

-rw-r--r-- 1 colinbr it 620 Oct 8 15:06 news1.txt.gz

44 colinbr@mclaren> gunzip dba1a.pdf.gz

45 colinbr@mclaren> gunzip *.txt.gz

46 colinbr@mclaren> ls -l *.pdf* *.txt*

-rw-r--r-- 1 colinbr it 2196 Sep 4 14:34 clancy_bio.txt

-rw-r--r-- 1 colinbr it 26888 Sep 4 14:58 dba1a.pdf

-rw-r--r-- 1 colinbr it 1727 Sep 4 14:34 elf.txt

-rw-r--r-- 1 colinbr it 1100 Sep 4 14:55 news.txt

-rw-r--r-- 1 colinbr it 1100 Oct 8 15:06 news1.txt


The wildcards used in command 46’s ls statement should have picked up any files with extra characters after the .txt or .pdf extensions. No files matching these patterns are found, indicating that the *.gz files have been removed after the decompression process.
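The pipe technique shown earlier with zcat and tar carries over to gzip. The Bourne shell (sh) sketch below assumes gzip's -c option (write to standard output) and -d option (decompress), and works in a scratch directory invented for the demonstration.

```shell
# Assumption: everything happens in a throwaway directory under /tmp.
work=${TMPDIR:-/tmp}/gzip_demo.$$
mkdir -p "$work/examples" "$work/restore"
echo "some text" > "$work/examples/news.txt"

cd "$work" || exit 1
# Create a gzipped tarfile without writing an intermediate .tar file:
# "f -" makes tar write the archive to its standard output.
tar cf - examples | gzip > egs.tar.gz

cd restore || exit 1
# Extract it again without gunzipping the archive to disk first.
gzip -dc ../egs.tar.gz | tar xf -
```

Using gzip -dc here sidesteps the question of which zcat is found first in $PATH, since GNU zcat is equivalent to gunzip -c.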


Please note that there is no on-line help for GNU commands on Solaris 7 systems like mclaren. Instead, extensive help is available at the GNU website: http://www.gnu.org/

4) Massively Useful Commands


a) grep


Grep, from the ed editor’s global/Regular Expression/print command (sometimes rendered as g/RE/p), is a pattern-matching command that prints the lines of a text file in which a given string occurs. The string RE will be used as a placeholder for the pattern we are searching for. At its simplest, grep can be used as follows


grep RE filename


Where RE is the regular expression you are searching for. The filename argument can, of course, be a number of files either specified individually or with UNIX wildcards, for example


grep RE file1 file2 fileA

grep RE *.txt


As with most other UNIX commands, optional flags given to grep will alter the default behaviour. Commonly used options are


-i to ignore case, otherwise the search is case-sensitive

-n to provide line numbers

-v to invert the search and print lines not matching the RE

-c to count the number of lines containing the RE
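The effect of these four options can be seen on a small sample file. The Bourne shell (sh) sketch below creates a five-line file (its name and contents are invented for the demonstration) and runs each option against it.

```shell
# Assumption: a five-line sample file invented for the demonstration.
sample=${TMPDIR:-/tmp}/grep_demo.$$
cat > "$sample" <<'EOF'
Host list
ghost
hostage
Hostile
value
EOF

grep -c host  "$sample"    # counts lines containing "host": prints 2
grep -ci host "$sample"    # the same, ignoring case: prints 4
grep -n value "$sample"    # prints 5:value
grep -v host  "$sample"    # prints the three lines without "host"
```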


However, it is in the formulation of the regular expression that both grep’s power and complexity can be seen. A regular expression can be thought of as being similar to the wildcards used in specifying filenames. However, the syntax differs between filename wildcards and regular expressions. The UNIX man page for regexp contains vast detail on how REs are defined and constructed. These notes – while not complete - attempt to summarise the man page into usable information.


A single alphanumeric character matches itself


Special characters are . * [ ] \ ^ $


. matches a single occurrence of any single character


* when following a single character matches zero or more occurrences of that character. This construct should not be confused with the * as used in filename wildcards.


[ defines the start of a set of characters, any of which may be matched as a single-character RE


] closes the set defined above


^ constrains the regular expression to be at the beginning of a line


$ constrains the regular expression to be at the end of a line


^RE$ as a complex regular expression constrains the entire line to match


\ switches off (or escapes) the special meanings of these characters


Examples of single-character REs


grep '[AB]' filename


looks for lines in filename containing either A or B. The quotes stop the shell from treating the brackets as a filename wildcard before grep sees them.


Within [] a range of characters can be specified using a - sign to separate members of a continuous series. Such ranges can also be combined with single characters outside of the range(s) specified. For example


grep '[A-Z]' filename


looks for those lines in filename containing at least one upper case character.


grep "^[A-Z]" filename


finds only those lines which begin with a capital letter


grep '[A-Z]$' filename


finds only those lines which end with a capital letter. Note the use of single quotes here. With double quotes, the C shell tries to interpret $" as a variable and the command fails with a "Variable syntax" error.


Of course, single-character regular expressions are not particularly useful. More often, grep is used to search for words or fragments of words within a file. To do this, use a combination of character strings and the pattern-matching characters described above.


This first example simply looks for the string host in /usr/dict/words (the spellchecker’s datafile)


144 colinbr@williams> grep host /usr/dict/words

ghost

ghostlike

ghostly

host

hostage

hostelry

hostess

hostile

hostler


But the next command looks for the string host followed by either l or e (i.e. hostl or hoste) in the same file


147 colinbr@williams> grep "host[le]" /usr/dict/words

ghostlike

ghostly

hostelry

hostess

hostler


We can constrain the search to match patterns at the beginning or end of a line as the following examples show. Commands 154 and 155 illustrate how to constrain a search to the beginning of a line. In these examples, it is easy to see the difference in output.


154 colinbr@williams> grep alu /usr/dict/words

alum

alumina

aluminate

alumna

alumnae

alumni

alumnus

alundum

balustrade

Calumet

calumniate

calumny

Daedalus

dialup

eigenvalue

evaluable

evaluate

hifalutin

highfalutin

invaluable

salubrious

salutary

salutation

salute

talus

tantalum

Tantalus

valuate

value


155 colinbr@williams> grep '^alu' /usr/dict/words

alum

alumina

aluminate

alumna

alumnae

alumni

alumnus

alundum


In command 156, we simply search for the string like , no matter where it occurs in the line, and this results in the matches Billiken, liken and so on. With command 157, we look for lines that end in like and the results list omits some of the earlier matches.


156 colinbr@williams> grep like /usr/dict/words

alike

Billiken

birdlike

brushlike

catlike

childlike

Christlike

cranelike

dreamlike

gemlike

ghostlike

godlike

knifelike

ladylike

lifelike

like

liken

likewise

machinelike

snakelike

statesmanlike

swanlike

warlike

workmanlike


157 colinbr@williams> grep 'like$' /usr/dict/words

alike

birdlike

brushlike

catlike

childlike

Christlike

cranelike

dreamlike

gemlike

ghostlike

godlike

knifelike

ladylike

lifelike

like

machinelike

snakelike

statesmanlike

swanlike

warlike

workmanlike


A search can be constrained to match an entire line by combining the ^ and $ operators. A simple example is shown below.


159 colinbr@williams> grep '^like$' /usr/dict/words

like


Using such a regular expression, and remembering that a . matches any single character, makes it possible to solve crossword clues using grep! For example, given a pattern of known and unknown letters in a crossword grid, grep for the answer using . to represent blank letters together with the known letters, constraining the entire search to match whole words (i.e. an entire line of the dictionary file), thus:


161 colinbr@williams> grep -i '^.x....n.$' /usr/dict/words

existent

exponent

exultant


There’s no guarantee that one of these will be your correct answer (the dictionary file has only 25000 entries) but it is a useful cheat! The only ‘gotcha’ here, particularly if you have the first letter of the answer, is to use a case-insensitive search (using the -i option to grep) in case your possible answer is a proper noun and would begin with a capital letter.
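The crossword trick can be wrapped up so that the ^ and $ anchors and the -i option are never forgotten. This is a sketch in Bourne shell (sh) syntax; the function name crossword is invented, and the word list is assumed to be /usr/dict/words unless another file is given as a second argument.

```shell
# crossword: look up a crossword pattern, using . for each unknown letter.
# Assumptions: crossword is a hypothetical helper name, and the word
# list defaults to /usr/dict/words (pass another file as argument 2).
crossword() {
    grep -i "^$1\$" "${2:-/usr/dict/words}"
}

# Typical use:
#   crossword '.x....n.'
```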


Lastly, consider the following:


grep "the*" /usr/dict/words


This expression does not mean “search for all words containing the letters the followed by any other letters”. It means “search for all words containing the letters th, followed by zero or more occurrences of the letter e”. In practice, this command will find all occurrences of the letters th, whether or not they are followed by an e, and is functionally equivalent to:


grep th /usr/dict/words
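
The equivalence is easy to check on a small sample. This sketch assumes a made-up six-word list in a temporary file; it compares the two expressions and also shows the plain pattern that does find words containing the letters the:

```shell
#!/bin/sh
# Hypothetical sample words; any list containing "th" with and without "e" will do.
words=$(mktemp)
printf '%s\n' the then bath this cat tee > "$words"

# "th" followed by zero or more "e" characters.
star=$(grep 'the*' "$words")
# Plain "th".
plain=$(grep 'th' "$words")
# Words that really contain "the".
the_only=$(grep 'the' "$words")

[ "$star" = "$plain" ] && echo "the* and th select the same lines"
echo "$the_only"

rm -f "$words"
```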


b) find


The find command is used to locate files and/or directories within the UNIX filesystem. It will be particularly useful to those users whose jobs span multiple responsibilities (such as Publishing and Data Conversion). At its simplest, find is used to answer the question “Where did I save that document?” But as with many commands in UNIX, find is capable of so much more.


Unfortunately the man page for find is not as informative as it could be. While the man pages for many commands show an extensive SYNOPSIS section summarizing the command, the find man page shows only the following:


SYNOPSIS

find path ... expression


Which is not particularly helpful. To get to the really useful information, you have to read the whole man page, some 500+ lines of text! In essence though, what the SYNOPSIS is saying is “find, in the path(s) specified, files matching the supplied expression”.


The path can be any absolute or relative pathname (or names) formed by the usual UNIX rules for such things. For example


15 colinbr@williams> find /dc/elp

16 colinbr@williams> find .

17 colinbr@williams> find ./TEST ./PROJECTS


If no expression is supplied, find will, by default, print the paths of all the files it finds. A find down through the /dc/elp directory will probably produce thousands of lines of output. Normally, you will want to narrow the search to specific filenames, files owned by particular users, files of a certain size, or files modified a certain number of days ago. All of these functions are described in the find man page, but a summary of the most useful is presented below:


-name pattern     find files whose names match a wildcard pattern

-user name        find files belonging to the named user

-type letter      find files of type letter, where letter can be b, c, d, D, f, l, p, or s for the various types of UNIX files (for example, f is a regular file, d a directory and l a symbolic link)

-mtime number     find files modified a certain number of days ago

-size number      find files of a certain size in UNIX blocks (512 bytes). If the supplied number is followed by the letter c, find will check the file size in bytes.


A couple of simple examples are presented below:


16 colinbr@williams> find . -name "*.txt"

17 colinbr@williams> find . -user colinbr


Command 16 will find all files under the current directory with a .txt extension, while command 17 finds all files owned by the user colinbr. The -mtime and -size expressions accept a numerical argument which may be preceded by a + or a - sign to indicate a value greater than or less than the supplied number; if no sign is given, the value must match exactly. For example:


18 colinbr@williams> find . -mtime +7

19 colinbr@williams> find . -mtime -7

20 colinbr@williams> find . -mtime 7


Command 18 looks for files modified more than 7 days ago; command 19 finds those files modified less than 7 days ago; and command 20 finds those files modified exactly 7 days ago. A similar principle applies to the -size function.
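
The + and - prefixes can be tried safely in a scratch directory. The sketch below is illustrative only: the file names are invented, and touch -t is used to give one file an old modification date so that the -mtime tests have something to distinguish.

```shell
#!/bin/sh
# Scratch directory with one fresh file and one large, back-dated file.
dir=$(mktemp -d)
touch "$dir/new.txt"
printf '%2000s' '' > "$dir/old.txt"      # 2000 bytes of spaces
touch -t 200001010000 "$dir/old.txt"     # set mtime to 1 Jan 2000

# Modified more than 7 days ago.
older=$(find "$dir" -type f -mtime +7)
# Modified less than 7 days ago.
newer=$(find "$dir" -type f -mtime -7)
# Larger than 1000 bytes (the c suffix means bytes, not 512-byte blocks).
big=$(find "$dir" -type f -size +1000c)

echo "older: $older"
echo "newer: $newer"
echo "big:   $big"

rm -rf "$dir"
```

The -type f test keeps the scratch directory itself out of the results, since a directory also has a modification time.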


To search for files of a particular type (normal files, directories, symbolic links) use the -type option:


30 colinbr@williams> find . -type d


This will search only for directories.


Of course, the options can be combined:


31 colinbr@williams> find . -name "*.txt" -mtime +7 -size +1000c


This command (31) searches for all files with a .txt extension, modified over 7 days ago and with a file size of over 1000 characters (bytes). Command 32, below, looks only for directories whose names begin with an uppercase T:


32 colinbr@williams> find . -type d -name "T*"


Find options can be switched off or negated using the ! (exclamation mark) modifier. For example, in a project directory in which several users have created files, to find those files not owned by user fred , use


33 colinbr@williams> find /dc/elp/c20ep ! -user fred
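
Combination and negation can be rehearsed on a throwaway directory tree. This sketch assumes invented file and directory names, and negates a -name test rather than -user so that it does not depend on which accounts exist on the system:

```shell
#!/bin/sh
# Scratch tree: two directories and three files (all names hypothetical).
dir=$(mktemp -d)
mkdir "$dir/TEST" "$dir/work"
touch "$dir/Todo.txt" "$dir/notes.txt" "$dir/data.csv"

# Directories beginning with uppercase T; the file Todo.txt is excluded by -type d.
dirs=$(find "$dir" -type d -name "T*")
# Regular files whose names do NOT end in .txt.
others=$(find "$dir" -type f ! -name "*.txt")

echo "$dirs"
echo "$others"

rm -rf "$dir"
```

The ! applies only to the test that immediately follows it, so -type f is still required in the second command.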


So far we have been using find solely to search for files, but it can do much more: it can perform an action on any files matching the search criteria. The default action is to print the path of each file found, which can be requested explicitly with the -print command line option. The following commands are functionally equivalent:


40 colinbr@williams> find . -name "*.txt" -print

41 colinbr@williams> find . -name "*.txt"


Other actions can be specified. The simplest is the -ls option, which presents information similar to the ls -l output for any files found:


42 colinbr@williams> find . -name "*.txt" -ls


More complex actions can be performed using the -exec and -ok functions: -exec will run a command on each found file, while -ok will prompt the user for confirmation before carrying out the operation. Consider the following:


43 colinbr@williams> find . -name core -exec rm {} \;


Command 43 will find any files called core and remove them, while command 44 below will prompt the user before removing the file:


44 colinbr@williams> find . -name core -ok rm {} \;

< rm ... ./TEST/core >? y


The {} \; at the end of these commands are a necessary part of the find syntax. Essentially, the {} act as a placeholder for the file which has been found and the \; terminates the command being exec-ed.


As with performing mass wildcard deletes with the rm command, take care when using the -exec and -ok options to find!
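
One cautious habit is to run the search with -print first, check the list, and only then repeat it with -exec. The sketch below illustrates this on a scratch directory; the directory layout and file names are invented for the example.

```shell
#!/bin/sh
# Scratch tree containing three files called core and one file to keep.
dir=$(mktemp -d)
mkdir -p "$dir/TEST/sub"
touch "$dir/core" "$dir/TEST/core" "$dir/TEST/sub/core" "$dir/TEST/keep.txt"

# Dry run: list exactly what the delete would act on.
find "$dir" -type f -name core -print

# Real run: {} is replaced by each path found, \; ends the rm command.
find "$dir" -type f -name core -exec rm {} \;

# Confirm what is left.
cores_left=$(find "$dir" -type f -name core)
remaining=$(find "$dir" -type f)
echo "remaining: $remaining"

rm -rf "$dir"
```

Adding -type f restricts the delete to regular files, so a directory that happens to be named core is left alone.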

c) sort


The sort command is an extremely flexible utility for sorting text files. As with many UNIX commands, sort can be used either very simply or with complexity approaching fiendish.


At its simplest, sort works on one or more columns of a text file (or files as it will accept more than one input file) and writes its output to the screen. Important options to sort are described below


-n            perform a numerical sort; by default, the sort is ASCII

-r            reverse the sort order

-o filename   write the output to filename

-t char       use char (a single character) as the field separator

-u            perform a unique sort, effectively discarding duplicate lines from the input

-k keydef     defines the key on which to sort the input


Most of these options are self-explanatory. It is the -k option that requires some clarification, as it defines the columns, and the characters within those columns, on which to sort. By default, sort treats a field as being delimited by whitespace (spaces or tabs). This is an important factor in using sort.


The general format of a keydef specification is shown below:


-k field_start [type] [,field_end [type] ]


where, remembering that parameters in [] are optional:


field_start is the starting column

field_end is the ending column and is optional

type is an optional single character modifier from the set bdfiMnr which performs the same task as the corresponding command line flag (see the man page for details).


The field_start and field_end parameters may each be composed of two numbers separated by a . as follows


field_start = field_number[.start_character]

field_end = field_number[.end_character]


Fields and characters within fields are numbered from 1. Again, as indicated by [], the .start_character and .end_character are optional. Some examples should clarify this. Consider the following datafile:


27 colinbr@williams> cat sort_test

102966 009EAF 372492 619EAF

37238 003XAF 74944 413EAF

60647 005EAF 210868 414EAF

66078 006EAB 197462 515EAF

66576 011EAF 667358 420EAF

74564 011EAF 129028 421EAF

227680 013EAF 234611 422JFK

154837 115EAF 7956033 435EAF

71628 007EAF 457557 417EAF

79131 008EAF 867863 418EAF


Now run the following sorts on the file sort_test. The characters that form the keydef are shown in bold.



Simple sort on column 1


28 colinbr@williams> sort -k 1 sort_test

37238 003XAF 74944 413EAF

60647 005EAF 210868 414EAF

66078 006EAB 197462 515EAF

66576 011EAF 667358 420EAF

71628 007EAF 457557 417EAF

74564 011EAF 129028 421EAF

79131 008EAF 867863 418EAF

102966 009EAF 372492 619EAF

154837 115EAF 7956033 435EAF

227680 013EAF 234611 422JFK


Numeric sort on column 1


29 colinbr@williams> sort -n -k 1 sort_test

37238 003XAF 74944 413EAF

60647 005EAF 210868 414EAF

66078 006EAB 197462 515EAF

66576 011EAF 667358 420EAF

71628 007EAF 457557 417EAF

74564 011EAF 129028 421EAF

79131 008EAF 867863 418EAF

102966 009EAF 372492 619EAF

154837 115EAF 7956033 435EAF

227680 013EAF 234611 422JFK


Notice that column 1 is sorted in numerical order and that commands 28 and 29 produce identical output for this file.
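
The two sorts need not agree for every input. As a minimal sketch, assume a hypothetical file of numbers that are not lined up in a column; the default ASCII sort compares them character by character, while -n compares their values:

```shell
#!/bin/sh
# Hypothetical unaligned numbers, one per line.
nums=$(mktemp)
printf '%s\n' 100 9 25 1000 > "$nums"

# Default (ASCII) sort: "1" sorts before "2", which sorts before "9".
ascii=$(LC_ALL=C sort "$nums")
# Numeric sort: compares the values themselves.
numeric=$(LC_ALL=C sort -n "$nums")

echo "ASCII:   $(echo $ascii)"
echo "numeric: $(echo $numeric)"

rm -f "$nums"
```

LC_ALL=C is set only to make the character ordering predictable on systems with other locale settings.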


Numeric sort on column 3


30 colinbr@williams> sort -n -k 3 sort_test

37238 003XAF 74944 413EAF

74564 011EAF 129028 421EAF

66078 006EAB 197462 515EAF

60647 005EAF 210868 414EAF

227680 013EAF 234611 422JFK

102966 009EAF 372492 619EAF

71628 007EAF 457557 417EAF

66576 011EAF 667358 420EAF

79131 008EAF 867863 418EAF

154837 115EAF 7956033 435EAF


Reverse Numeric sort on column 3


31 colinbr@williams> sort -n -r -k 3 sort_test

154837 115EAF 7956033 435EAF

79131 008EAF 867863 418EAF

66576 011EAF 667358 420EAF

71628 007EAF 457557 417EAF

102966 009EAF 372492 619EAF

227680 013EAF 234611 422JFK

60647 005EAF 210868 414EAF

66078 006EAB 197462 515EAF

74564 011EAF 129028 421EAF

37238 003XAF 74944 413EAF


Simple sort on column 2


43 colinbr@williams> sort -k 2 sort_test

37238 003XAF 74944 413EAF

60647 005EAF 210868 414EAF

66078 006EAB 197462 515EAF

71628 007EAF 457557 417EAF

79131 008EAF 867863 418EAF

102966 009EAF 372492 619EAF

74564 011EAF 129028 421EAF

66576 011EAF 667358 420EAF

227680 013EAF 234611 422JFK

154837 115EAF 7956033 435EAF


Reverse sort on column 2


44 colinbr@williams> sort -r -k 2 sort_test

154837 115EAF 7956033 435EAF

227680 013EAF 234611 422JFK

66576 011EAF 667358 420EAF

74564 011EAF 129028 421EAF

102966 009EAF 372492 619EAF

79131 008EAF 867863 418EAF

71628 007EAF 457557 417EAF

66078 006EAB 197462 515EAF

60647 005EAF 210868 414EAF

37238 003XAF 74944 413EAF


Sort on the third character of column 1


46 colinbr@williams> sort -k 1.3 sort_test

60647 005EAF 210868 414EAF

71628 007EAF 457557 417EAF

102966 009EAF 372492 619EAF

74564 011EAF 129028 421EAF

154837 115EAF 7956033 435EAF

66078 006EAB 197462 515EAF

66576 011EAF 667358 420EAF

37238 003XAF 74944 413EAF

227680 013EAF 234611 422JFK

79131 008EAF 867863 418EAF


Sort on the third NON-BLANK character of column 1, using the b type modifier


47 colinbr@williams> sort -k 1.3b sort_test

66078 006EAB 197462 515EAF

79131 008EAF 867863 418EAF

37238 003XAF 74944 413EAF

102966 009EAF 372492 619EAF

154837 115EAF 7956033 435EAF

74564 011EAF 129028 421EAF

66576 011EAF 667358 420EAF

71628 007EAF 457557 417EAF

60647 005EAF 210868 414EAF

227680 013EAF 234611 422JFK


Sort on columns 2 and 3


48 colinbr@williams> sort -k 2,3 sort_test

37238 003XAF 74944 413EAF

60647 005EAF 210868 414EAF

66078 006EAB 197462 515EAF

71628 007EAF 457557 417EAF

79131 008EAF 867863 418EAF

102966 009EAF 372492 619EAF

74564 011EAF 129028 421EAF

66576 011EAF 667358 420EAF

227680 013EAF 234611 422JFK

154837 115EAF 7956033 435EAF


Sort on the fourth NON-BLANK character of column 2


50 colinbr@williams> sort -k 2.8 sort_test

66078 006EAB 197462 515EAF

74564 011EAF 129028 421EAF

60647 005EAF 210868 414EAF

227680 013EAF 234611 422JFK

102966 009EAF 372492 619EAF

71628 007EAF 457557 417EAF

66576 011EAF 667358 420EAF

79131 008EAF 867863 418EAF

154837 115EAF 7956033 435EAF

37238 003XAF 74944 413EAF
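

The remaining options from the list above (-t, -u, -o and the type modifiers) can be combined with -k in the same way. The following sketch assumes a made-up colon-separated file; the names and values are purely illustrative.

```shell
#!/bin/sh
# Hypothetical colon-delimited data: name:number:department, with one duplicate line.
data=$(mktemp)
out=$(mktemp)
printf '%s\n' 'fred:30:sales' 'anne:4:sales' 'bob:12:it' 'anne:4:sales' > "$data"

# -t :      use a colon as the field separator
# -k 2,2n   sort on field 2 only, numerically (n type modifier)
# -u        keep one line for each distinct key
# -o        write the result to a file instead of the screen
LC_ALL=C sort -t : -k 2,2n -u -o "$out" "$data"
by_num=$(cat "$out")
echo "$by_num"

# The same key with the r modifier reverses the order.
by_num_rev=$(LC_ALL=C sort -t : -k 2,2nr -u "$data")
echo "$by_num_rev"

rm -f "$data" "$out"
```

Giving both a start and an end field (2,2) limits the key to that one field; -k 2 alone would extend the key from field 2 to the end of the line.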


5) Summary: The UNIX Top 20


The following is a list of those basic UNIX user commands that, with practice, will enable you to use UNIX effectively on a day-to-day basis.


  1. cd/pwd

  2. ls

  3. rm/rmdir

  4. cat/more/pg/head/tail

  5. mkdir

  6. vi

  7. grep

  8. sort

  9. find

  10. tar

  11. ftp

  12. which

  13. ps

  14. compress/uncompress/zcat

  15. cp

  16. mv

  17. man

  18. who/w

  19. ln

  20. file


Most of these commands have been reviewed in these training sessions. The exceptions are the ftp command and the vi text editor.