What is a Process in Linux

A process is an instance of an executing program. In other operating systems, programs are often large, elaborate, graphical applications that take a noticeably long time to start up. In the Linux (and Unix) world, these types of programs exist as well, but so does a whole class of programs that usually have no counterpart in other operating systems. These programs are designed to be quick to start, specialized in function, and able to play well with others. On a Linux system, processes running these programs are constantly popping into and out of existence. For example, consider the user maxwell running the following command line.

$ ps aux | grep httpd > daemons.$(date +%d%b%y)

In the split second that the command line took to execute, no fewer than four processes (ps, grep, bash, and date) were started, did their work, and exited.
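
To see that each command really is a separate process, a shell can print its own process ID and compare it with the ID reported by a freshly started program. This is a minimal sketch, assuming a Bourne-style shell where $$ expands to the PID of the shell; the variable names are only illustrative.

```shell
# The shell itself is one process; $$ expands to its PID.
shell_pid=$$

# Starting sh creates a new process, which reports a different PID.
child_pid=$(sh -c 'echo $$')

echo "shell PID: $shell_pid"
echo "child PID: $child_pid"

# Running the same command again starts yet another short-lived process.
next_child_pid=$(sh -c 'echo $$')
echo "next child PID: $next_child_pid"
```

The child processes have already exited by the time their PIDs are printed, which is the "popping into and out of existence" described above.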

What is a Process?

By this point, you could well be tired of hearing the answer: a process is an instance of a running program. Here, however, we provide a more detailed list of the components that constitute a process.

Execution Context

Every process exists (at least to some extent) within the physical memory of the machine. Because Linux (and Unix) is designed to be a multiuser environment, the memory allocated to a process is protected, and by default no other process can access it. In its memory, a process loads a copy of its executable instructions and stores any other dynamic information it is managing. A process also carries parameters that determine how often it gets the opportunity to access the CPU, such as its execution state and its niceness value (more on these soon).
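
The execution state and niceness mentioned above can be displayed with ps. The following is a sketch, assuming the procps version of ps found on most Linux distributions; the stat and ni column names are the ones documented in the ps(1) man page, and the variable names are illustrative.

```shell
# Show the state (STAT) and niceness (NI) of the current shell.
ps -o pid,stat,ni,comm -p $$

# Niceness of this shell, without the header line (a trailing = suppresses it).
parent_ni=$(ps -o ni= -p $$ | tr -d ' ')

# nice -n 5 starts a command with its niceness raised by 5,
# so the child gets the CPU a little less eagerly than its parent.
child_ni=$(nice -n 5 sh -c 'ps -o ni= -p $$' | tr -d ' ')

echo "parent niceness: $parent_ni, child niceness: $child_ni"
```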

I/O Context

Every process interacts to some extent with the filesystem in order to read or write information that exists before or will exist after the lifespan of the process. Elements of a process’s input/output context include the following.

Open File Descriptors

Almost every process is reading information from or writing information to external sources, usually both. In Linux, open file descriptors act as sources or sinks of information. Processes read information from or write information to file descriptors, which may be connected to regular files, device nodes, network sockets, or even each other as pipes (allowing interprocess communication).
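
On Linux, the descriptors a process has open can be inspected under /proc. The sketch below assumes /proc is mounted and uses bash; descriptor number 3 and the scratch file are arbitrary choices for illustration.

```shell
# List the file descriptors this shell has open.
# 0, 1 and 2 are standard input, standard output and standard error.
ls -l /proc/$$/fd

# Open descriptor 3 on a scratch file and write through it.
scratch=$(mktemp)
exec 3>"$scratch"
echo "written through descriptor 3" >&3

# While it is open, the descriptor shows up as a link to the file.
ls -l /proc/$$/fd/3

# Close the descriptor and read back what was written.
exec 3>&-
content=$(cat "$scratch")
echo "$content"
rm -f "$scratch"
```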

Memory Mapped Files

Memory mapped files are files whose contents have been mapped directly into the process’s memory. Rather than reading or writing to a file descriptor, the process just accesses the appropriate memory address. Memory maps are most often used to load a process’s executable code, but may also be used for other types of non-sequential access to data.
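
The memory maps of a process are listed in /proc/PID/maps. This is a small sketch, assuming a Linux system with /proc mounted; it looks at the maps of the current shell.

```shell
# Each line describes one mapped region: address range, permissions,
# offset, device, inode, and the backing file (if there is one).
awk 'NR <= 5' /proc/$$/maps

# Count the regions mapped with execute permission (an "x" in the second field).
# These typically hold the shell's own code and its shared libraries.
exec_regions=$(awk '$2 ~ /x/' /proc/$$/maps | wc -l)
echo "executable regions: $exec_regions"
```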

Filesystem Context

We have encountered several pieces of information related to the filesystem that processes maintain, such as the process’s current working directory (for translating relative file references) and the process’s umask (for setting permissions on newly created files).
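
Because the working directory and umask belong to each process, a change made in a child does not affect its parent. The sketch below uses a subshell (a forked copy of the shell) as the child; the variable names are illustrative.

```shell
# Current working directory and umask of this shell.
before_dir=$(pwd)
before_mask=$(umask)
echo "parent: $before_dir $before_mask"

# The parentheses run the commands in a child process, which changes
# its own directory and umask and then reports them.
( cd / && umask 077 && echo "child:  $(pwd) $(umask)" )

# The parent's filesystem context is untouched.
after_dir=$(pwd)
after_mask=$(umask)
echo "parent: $after_dir $after_mask"
```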

Environment Variables

Every process maintains its own list of name-value pairs, referred to as environment variables, or collectively as the process’s environment. Processes generally inherit their environment on startup and may refer to it for information such as the user’s preferred language or favorite editor.
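
Inheritance of the environment can be demonstrated from the shell. In this sketch, FAVORITE_EDITOR and EDITOR_CHOICE are made-up variable names, not ones any program is known to consult.

```shell
# A plain shell variable is not part of the environment...
EDITOR_CHOICE=nano
# ...but an exported variable is, and is passed on to child processes.
export FAVORITE_EDITOR=vim

# The child sees the exported variable only.
inherited=$(sh -c 'echo "${FAVORITE_EDITOR:-unset} ${EDITOR_CHOICE:-unset}"')
echo "child sees: $inherited"

# A child that changes its copy of the environment does not affect the parent.
sh -c 'FAVORITE_EDITOR=emacs'
echo "parent still has: $FAVORITE_EDITOR"
```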

Heritage Information

Every process is identified by a PID, or process id, which is assigned when it is created. In a later post, we will discover that every process has a clearly defined parent and possibly well-defined children. A process’s own identity, the identity of its children, and to some extent the identity of its siblings are maintained by the process.
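
A process's PID and the PID of its parent are easy to inspect. The following sketch assumes bash and a Linux system with /proc mounted; the sleep command only serves as a child that stays alive long enough to be examined.

```shell
# The shell knows its own PID ($$) and its parent's PID ($PPID).
echo "PID: $$  parent PID: $PPID"

# The kernel records the same parent in /proc/PID/status.
ppid_from_proc=$(awk '/^PPid:/ {print $2}' /proc/$$/status)
echo "parent PID according to /proc: $ppid_from_proc"

# Start a child in the background; $! holds its PID.
sleep 30 &
child_pid=$!
child_ppid=$(awk '/^PPid:/ {print $2}' "/proc/$child_pid/status")
echo "child $child_pid has parent $child_ppid"

# Clean up the child.
kill "$child_pid"
wait "$child_pid" 2>/dev/null || true
```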

Credentials

Every process runs under the context of a given user (or, more exactly, a given user ID), and under the context of a collection of group IDs (generally, all of the groups that the user belongs to). These credentials limit what resources a process can access, such as which files it can open or with which other processes it is allowed to communicate.
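
The credentials of the current shell can be shown with the id command and compared with what the kernel records. This sketch assumes a Linux system with /proc mounted.

```shell
# Effective user ID and the full list of group IDs of this shell.
effective_uid=$(id -u)
group_ids=$(id -G)
echo "user ID: $effective_uid  group IDs: $group_ids"

# The kernel's view of the same credentials. The Uid: and Gid: lines list
# the real, effective, saved, and filesystem IDs, in that order.
grep -E '^(Uid|Gid|Groups):' /proc/$$/status

# The effective user ID is the second number on the Uid: line.
uid_from_proc=$(awk '/^Uid:/ {print $3}' /proc/$$/status)
echo "effective user ID according to /proc: $uid_from_proc"
```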

Resource Statistics and Limits

Every process also records statistics to track the extent to which system resources have been utilized, such as its memory size, its number of open files, its amount of CPU time, and others. The amount of many of these resources that a process is allowed to use can also be limited, a concept called resource limits.
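
Both the statistics and the limits can be examined from the shell. The sketch below assumes bash and a Linux system with /proc mounted; the value 64 is an arbitrary example of a lowered limit.

```shell
# Memory statistics the kernel keeps for this shell.
grep -E '^(VmSize|VmRSS):' /proc/$$/status

# Number of files the shell currently has open.
open_files=$(ls /proc/$$/fd | wc -l)
echo "open files: $open_files"

# The soft limit on open files, as seen by the shell and by the kernel.
original_limit=$(ulimit -n)
echo "open file limit: $original_limit"
grep 'Max open files' /proc/$$/limits

# Lower the soft limit inside a subshell; the parent keeps its own limit.
lowered_limit=$( ulimit -S -n 64; ulimit -n )
echo "subshell limit: $lowered_limit, parent limit: $(ulimit -n)"
```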

Viewing Processes with the ps Command

We have already encountered the ps command many times. Now, we will attempt to familiarize ourselves with a broader selection of the many command-line switches associated with it. A quick ps --help will display a summary of over 50 different switches for customizing the ps command’s behavior. To complicate matters, different versions of Unix have developed their own versions of the ps command, which do not use the same command line switch conventions. The Linux version of the ps command tries to be as accommodating as possible to people from different Unix backgrounds, and often there are multiple switches for any given option, some of which start with a conventional leading hyphen (“-”), and some of which do not.

Process Selection

By default, the ps command lists all processes started from a user’s terminal. While reasonable when users connected to Unix boxes using serial line terminals, this behavior seems a bit minimalist when every terminal window within an X graphical environment is treated as a separate terminal. The following command line switches can be used to expand (or reduce) the processes which the ps command lists.

Switch	Which Processes are Listed
-A, -e, ax	All processes
-C command	All instances of command
-U, --user, --User user	All processes belonging to user
-t, --tty terminal	All processes started from terminal
-p, p, --pid N	Process with pid N
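
The switches in the table can be tried out as follows. This is a sketch, assuming the procps version of ps; sshd is only an example command name and may not be running on a given machine.

```shell
# Every process on the system (only the first few lines are shown).
ps -e | awk 'NR <= 5'

# Every instance of a command, selected by name.
ps -C sshd || echo "no sshd process found"

# Every process belonging to the current user.
ps -U "$(id -u)" | awk 'NR <= 5'

# A single process selected by PID: here, the shell itself.
ps -p $$

# The same selection with only the PID column and no header line.
selected_pid=$(ps -p $$ -o pid= | tr -d ' ')
echo "selected: $selected_pid"
```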

Output Selection

As implied by the initial paragraphs of this Lesson, there are many parameters associated with processes, too many to display in a standard terminal width of 80 columns. The following table lists common command line switches used to select what aspects of a process are listed.

Switch	Output Format
-f	“full” listing
-l, l	long format
-j, j	jobs format
-o, o, --format str	user defined format, using fields specified by str (Available fields for str can be listed with ps L, or by consulting the ps(1) man page.)
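
As an illustration of the output formats, the following sketch applies each of them to the current shell. It assumes the procps version of ps; the field names passed to -o are ones documented in the ps(1) man page.

```shell
# "Full" listing of the current shell.
ps -f -p $$

# Long format, which adds columns such as priority and niceness.
ps -l -p $$

# A user defined format, naming exactly the fields of interest.
ps -o pid,ppid,user,ni,stat,etime,comm -p $$

# Ending a field name with = suppresses its header, which is handy in scripts.
custom=$(ps -o pid=,ppid= -p $$)
echo "pid and ppid: $custom"
```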

Oddities of the ps Command

The ps command, probably more so than any other command in Linux, has oddities associated with its command-line switches. In practice, users tend to experiment until they find combinations that work for them, and then stick to them. For example, the author prefers ps aux for a general-purpose listing of all processes, while many people prefer ps -ef. The above tables should provide a reasonable “working set” for the novice.

The command-line switches tend to fall into two categories, those with the traditional leading hyphen (“Unix98” style options), and those without (“BSD” style options). Often, a given functionality will be represented by one of each. When grouping multiple single letter switches, only switches of the same style can be grouped. For example, ps axf is the same as ps a x f, not ps a x -f.
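
The two styles can be compared side by side. This sketch assumes the procps version of ps, and only looks at the header line of each listing.

```shell
# BSD style (no hyphen): all processes, user-oriented columns.
ps aux | awk 'NR <= 3'

# Unix98 style (leading hyphen): all processes, "full" listing.
ps -ef | awk 'NR <= 3'

# The first column heading differs between the two listings.
bsd_first=$(ps aux | awk 'NR == 1 {print $1}')
unix_first=$(ps -ef | awk 'NR == 1 {print $1}')
echo "ps aux starts with: $bsd_first"
echo "ps -ef starts with: $unix_first"
```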

Monitoring Processes with the top Command

The ps command displays statistics for specified processes at the instant that the command is run, providing a snapshot of an instant in time. In contrast, the top command is useful for monitoring the general state of affairs of processes on the machine.

The top command is intended to be run from within a terminal. It will replace the command line with a table of currently running processes, which updates every few seconds. The following demonstrates a user’s screen after running the top command.

While the command is running, the keyboard is “live”. In other words, the top command will respond to single key presses without waiting for a return key. The following table lists some of the more commonly used keys.

Key Press	Command
q	quit
h or ?	help
s	set the delay between updates (in seconds)
space	update display
M	Sort processes by Memory Size
P	Sort processes by CPU (Processor) Activity
u	Reduce display to processes owned by a specific user
k	Kill a process (send a process a signal)
r	Renice a process

The last two commands, which either kill or renice a process, use concepts that we will cover in more detail in a later Lesson. Although most often run without command line configuration, top does support the following command line switches.

Switch	Effect
-d secs	Delay secs seconds between refreshes (the default is a few seconds and varies between versions of top).
-q	Refresh as often as possible.
-nN	Run for N iterations, then exit.
-b	Run in “batch mode”, writing simply as if to a dumb terminal.
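
Batch mode makes top usable in scripts and pipelines. The sketch below assumes the procps version of top; the snapshot file is a temporary file chosen only for illustration.

```shell
# -b: batch mode (plain text output); -n 1: run one iteration, then exit.
snapshot=$(top -b -n 1)

# The first lines hold the summary area and the start of the process table.
echo "$snapshot" | awk 'NR <= 12'

# With -d, several snapshots can be spaced out and saved for later reading.
snapshot_file=$(mktemp)
top -b -d 1 -n 2 > "$snapshot_file"
wc -l "$snapshot_file"
rm -f "$snapshot_file"
```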

Locating Processes with the pgrep Command

Often, users are trying to locate information about processes identified by the command they are running, or the user who is running them. One technique is to list all processes and use the grep command to reduce the information. In the following, maxwell first looks for all instances of the sshd daemon, and then for all processes owned by the user maxwell.

$ ps aux | grep sshd
root     829    0.0   0.0   3436   4    ?      S   09:13   0:00   /usr/sbin/sshd
maxwell  2200   0.0   0.2   3572   640  pts/8  S   10:10   0:00   grep sshd

$ ps aux | grep maxwell
root       2109   0.0   0.3   4108   876    pts/8  S  10:05   0:00  su - maxwell
maxwell    2112   0.0   0.4   4312   1268   pts/8  S  10:05   0:00  -bash
maxwell    2146   1.4   8.3   89256  21232  pts/8  S  10:05   0:04  /usr/lib/mozilla-
maxwell    2201   0.0   0.2   2676   724    pts/8  R  10:10   0:00  ps aux
maxwell    2202   0.0   0.2   3576   644    pts/8  S  10:10   0:00  grep maxwell

While maxwell can find the information he needs, there are some unpleasant issues.

The approach is not exacting. Notice that, in the second search, a su process showed up, not because it was owned by maxwell, but because the word maxwell was one of its arguments.
Similarly, the grep command itself usually shows up in the output.
The compound command can be awkward to type.

In order to address these issues, the pgrep command was created. Named pgrep for obvious reasons, the command allows users to quickly list processes by command name, user, terminal, or group.

pgrep [SWITCHES] [PATTERN]

Its optional argument, if supplied, is interpreted as an extended regular expression pattern to be matched against command names. The following command line switches may also be used to qualify the search.

Switch	Effect
-n	Select only the newest (most recently started) matching process.
-u USER	Select processes owned by the user USER.
-t TERM	Select processes controlled by terminal TERM.

In addition, the following command line switches can be used to qualify the output formatting of the command.

Switch	Effect
-d delimiter	Use delimiter to delimit each process ID (by default, a newline is used).
-l	List process name as well as process ID.

For a complete list of switches, consult the pgrep man page. As a quick example, maxwell will repeat his two previous process listings, using the pgrep command.

$ pgrep -l sshd
829 sshd

$ pgrep -lu maxwell
2112 bash
2146 mozilla-bin
2155 mozilla-bin
2156 mozilla-bin
2157 mozilla-bin
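
The remaining switches from the tables above can be combined in the same way. This is a sketch, assuming the procps versions of pgrep and ps; it searches by the current user's numeric ID rather than by a particular user name, and sshd is again only an example.

```shell
# Numeric ID of the current user; pgrep -u accepts names or numbers.
me=$(id -u)

# PIDs and names of processes whose command name matches sshd (there may be none).
pgrep -l sshd || echo "no sshd process found"

# All processes owned by the current user, with their names.
pgrep -l -u "$me"

# Only the most recently started one.
newest=$(pgrep -n -u "$me")
echo "newest process: $newest"

# A comma-separated list of PIDs, in the form that ps -p accepts.
pid_list=$(pgrep -d, -u "$me")
ps -o pid,comm -p "$pid_list"
```

Unlike the ps and grep pipeline, pgrep does not list itself among the matches.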