Topic: System Load and Process Monitoring
One of the activities any UNIX administrator performs regularly is watching system load and the processes that are running on a computer. Both activities help identify trouble and are the first steps toward keeping a system running smoothly. One errant process that is consuming lots of CPU time or memory can make a lot of users unhappy.
There are a number of commands that report on processes and system activity, but I usually start with the w command. The command is a relative of the who command, which is the source of its abbreviated name.
Here's some sample output:
  7:46pm  up 7 days, 12:06,  2 users,  load average: 0.08, 0.04, 0.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
kgirrard ttyS2    -                 6:52pm 37.00s  0.13s  0.01s  pppd
kgirrard pts/0    10.209.10.9       6:52pm  0.00s  0.10s  0.04s  w
The first line provides a lot of information about the system in general, followed by information about each user currently logged into UNIX. Let's take it one line at a time, one piece at a time, as there's a lot going on here.
Displayed first is the current time, then how long the system has been up (in days, then hours and minutes), and then the number of users logged in. Following these are the system load averages for the last one, five, and fifteen minutes. Load, in this case, is the average length of the process queue.
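If you want those three numbers in a script, one way is to pick them out of the header line. Here is a minimal Python sketch, assuming the header looks like the sample above; the function name parse_load_averages is my own invention, and other versions of w may word the line differently.

```python
import re

# Header line taken from the sample output above.
HEADER = "7:46pm up 7 days, 12:06, 2 users, load average: 0.08, 0.04, 0.00"

def parse_load_averages(header):
    """Return the (1-minute, 5-minute, 15-minute) load averages as floats."""
    # Accept "load average:" or "load averages:", with or without commas.
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", header
    )
    if match is None:
        raise ValueError("no load averages found in: " + header)
    return tuple(float(value) for value in match.groups())

one, five, fifteen = parse_load_averages(HEADER)
print(one, five, fifteen)
```

On most UNIX systems Python's os.getloadavg() should hand you the same three numbers without any parsing, so treat the regular expression as an illustration of the line's layout rather than the recommended route.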
What is the process queue? UNIX runs a lot of programs at the same time but makes them all share system resources fairly. Memory and disk storage space are examples of system resources. Another one is the CPU itself. Each running program that needs CPU time (which is typically any process not waiting for some type of input or output to occur) is placed in the process queue until its number is called. It then gets a little time to execute (a timeslice), after which UNIX stops it to give time to others. If a process hasn't finished its work in its allotted time, it goes right back in the queue.
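The queue-and-timeslice idea can be sketched as a toy simulation. This is only an illustration of the round-robin behavior described above, not how any real UNIX scheduler is written; the process names, the time units, and the function round_robin are all made up for the example.

```python
from collections import deque

def round_robin(jobs, timeslice):
    """Simulate one CPU shared among jobs, one timeslice at a time.

    jobs maps a process name to the CPU time it still needs
    (arbitrary units). Returns the names in the order they finish.
    """
    queue = deque(jobs.items())
    finished = []
    while queue:
        name, remaining = queue.popleft()     # its number is called
        remaining -= timeslice                # it runs for one timeslice
        if remaining > 0:
            queue.append((name, remaining))   # not done: back in the queue
        else:
            finished.append(name)             # done: leaves the queue
    return finished

print(round_robin({"editor": 1, "compiler": 5, "mailer": 2}, timeslice=2))
```

The short jobs finish first even though the hungry one was queued ahead of one of them, which is the fairness the timeslice buys. The more processes there are waiting in that queue at any moment, the higher the load average.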
Typical loads for servers are quite low since most activity is dealing with input or output and not CPU-intensive things like finding the answer to the universe. In the example output from w above, the one-minute average is 0.08; this is fairly typical on shark. When it gets higher than 0.5 I start looking for problems. When it gets as high as ten or fifteen, problems start looking for me.
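Those rules of thumb are easy to turn into a check. The sketch below assumes my own thresholds for this one server (0.5 and 10); they are not universal, and the function name assess_load and its labels are invented for the example. A busier or multi-CPU machine would want different numbers.

```python
def assess_load(one_minute, watch_above=0.5, alarm_at=10.0):
    """Classify a one-minute load average using site-specific thresholds.

    The defaults are rules of thumb for a single lightly loaded server,
    not general-purpose limits.
    """
    if one_minute >= alarm_at:
        return "alarm"          # problems start looking for you
    if one_minute > watch_above:
        return "investigate"    # time to start looking for problems
    return "normal"

for load in (0.08, 0.75, 12.0):
    print(load, assess_load(load))
```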
The remainder of w's output pertains to each logged-in user (those connected over a terminal-type connection using telnet, or directly to a serial port). The information listed is:
USER: Their UNIX login, or the first eight characters of it anyway
TTY: How they are connected: a pts/N device for telnet sessions, or the serial port device
FROM: Where the network connection is from
LOGIN@: When they logged in, hour and minute, or which day if they've been connected that long
IDLE: How long it's been since they typed anything
JCPU: The amount of CPU time consumed by all the processes attached to their terminal, including background jobs that are still running
PCPU: The amount of CPU time consumed by the current process
WHAT: Name of the current process
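Since the columns always come in that order, a user line can be split into named fields. This is a rough Python sketch that assumes every column is filled in, as in the sample output (where an empty FROM shows up as "-"); some versions of w leave FROM blank, which would shift the fields. The name parse_user_line is hypothetical.

```python
# Column names in the order w prints them.
FIELDS = ["USER", "TTY", "FROM", "LOGIN@", "IDLE", "JCPU", "PCPU", "WHAT"]

def parse_user_line(line):
    """Split one user line from w into a dict keyed by column name."""
    # Split on whitespace at most 7 times so that a WHAT entry with
    # arguments (for example "vi notes.txt") stays in one piece.
    values = line.split(None, len(FIELDS) - 1)
    if len(values) != len(FIELDS):
        raise ValueError("unexpected number of columns: " + line)
    return dict(zip(FIELDS, values))

row = parse_user_line("kgirrard pts/0 10.209.10.9 6:52pm 0.00s 0.10s 0.04s w")
print(row["USER"], row["TTY"], row["WHAT"])
```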
If someone is logged in and running something that is consuming lots of CPU time, it may be revealed in the JCPU or PCPU columns. Since these are cumulative, a process that doesn't use much CPU time but has been running for a long time will still report a fairly high number. Notice that although I had been logged in for nearly an hour (since 6:52pm, with the clock at 7:46pm), both my logins combined don't report more than half a second of CPU time used. Makes you wonder why people need 750MHz Pentiums on their desks...
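To check that claim against the sample output, add up the JCPU column for both logins. The sketch below only understands values written as seconds with a trailing "s", which is all the sample contains; w prints longer times in other formats that this helper (cpu_seconds, a name I made up) deliberately refuses.

```python
def cpu_seconds(field):
    """Convert a w time field such as "0.13s" to seconds as a float.

    Only the plain seconds form is handled; anything else is rejected
    rather than guessed at.
    """
    if not field.endswith("s"):
        raise ValueError("unsupported time format: " + field)
    return float(field[:-1])

# JCPU values for the two kgirrard logins in the sample output.
jcpu_fields = ["0.13s", "0.10s"]
total = sum(cpu_seconds(field) for field in jcpu_fields)
print("total JCPU under half a second:", total < 0.5)
```

That works out to about 0.23 seconds of CPU time across both logins.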