Linux overview for SCEC


Why Unix and Linux are important in an academic environment. How to install Linux. Anticipated support issues.

Chapter 1. Why install it? What are the minimal things necessary to know to install it? How do I perform a default installation?

Why do Unix and Linux persist in the academic environment?

For many reasons, it is commonplace for programs written by the academic community to be given away. Since the local computing environment differs from institution to institution, many are given away in source code form. Most Unix implementations provide a rich set of tools that help programmers with the mechanics of this exchange: C/C++/FORTRAN compilers, make utilities, revision control utilities, powerful editors, and automatic configuration utilities. Linux fosters this exchange, as the same free operating system can now be made to run on many different hardware architectures (Intel, SPARC, Alpha).

Early versions of Unix (especially the Berkeley Software Distribution, or BSD) were used to implement and test the early Internet protocols. As changes were made to existing protocols and new protocols were added, much of this code became part of BSD. Many modern operating systems (Solaris, Linux, Irix) contain parts of or are based upon BSD. Hence many reference implementations of core Internet services (SMTP/POP/HTTP/FTP/NNTP) are written for Unix. Because of this, many servers on the Internet run a flavor of Unix or Linux.

Programmers like to work on stable, crash-free computers. Providers of important internet services demand stable, crash-free computers. Most Unix implementations have been incrementally refined for over a decade. I feel comfortable saying that Unix/Linux is currently the most stable operating system that is commonly available.

Basics for understanding installation

Most linux installations require a minimum of two partitions. One will contain a filesystem and will be mounted at /, while the other is a "raw" partition whose blocks are used for paging and swapping. There can be a reliability advantage to having filesystems occupy their own partitions, as the filling of a filesystem on one partition will have no effect on a filesystem occupying a second partition. Candidate filesystems for their own partition include /usr, /var, /boot, /usr/local, /opt, and /home.

Unlike the Mac and Windows, where a single vendor controls the user interface and the look and feel, Unix user interfaces are built upon X, which has no look and feel philosophy at all. Hence there are as many user interfaces for Unix as there are people writing user interfaces. Typically a "user interface" for X will contain at least a window manager, and possibly a session manager and other utilities. The two most common linux user interfaces are GNOME and KDE. GNOME is the default user interface for RedHat linux.

RedHat Linux started the packaging mechanism known as RPM (RedHat Package Manager). RPMs automate the installation, updating, and removal of software from a system. RPMs work through a central database that keeps track of which packages are installed, which files each package contains, and what each package depends upon in order to work.
 

Choosing which packages to install can wait until after Linux is installed. For the installation type, I would choose either the GNOME workstation installation (for most end users) or the custom installation (if you would like more control).

Basic TCP/IP networking

IP addresses (IPv4) are 32-bit integers. It's commonplace for humans to write an IP address as a sequence of four decimal integers of 8 bits each (0-255) separated by dots, for example 130.191.226.1. In a given IP address, some contiguous number of the leading bits describe the network that the host is connected to (130.191) and the remaining bits describe the host itself on that network (226.1). These two parts are referred to as the network portion and host portion of the address, respectively. When IPv4 was first being implemented, three distinct classes of IP addresses were created: class A addresses (a few networks, each with 2^24 hosts), class B addresses (a large number of networks, each with 2^16 hosts) and class C addresses (a very large number of networks, each with 2^8 hosts).
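
As a rough illustration of the dotted notation and the original address classes, the following Python sketch packs a dotted address into its 32-bit integer and reports its historical class from the leading bits. The function names are invented for this example.

```python
def dotted_to_int(address):
    """Pack four decimal octets into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError("not a dotted IPv4 address: %r" % address)
    value = 0
    for octet in octets:
        value = (value << 8) | octet
    return value


def address_class(address):
    """Return the historical class (A, B or C) from the leading bits."""
    first_octet = dotted_to_int(address) >> 24
    if first_octet < 128:      # leading bit 0
        return "A"
    if first_octet < 192:      # leading bits 10
        return "B"
    if first_octet < 224:      # leading bits 110
        return "C"
    return "other"             # classes D and E (multicast, reserved)


print(hex(dotted_to_int("130.191.226.1")))   # 0x82bfe201
print(address_class("130.191.226.1"))        # B
```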

On any given network, a host can tell whether it is directly connected to another host by using its netmask. Netmasks are also 32-bit integers, where a 1 bit denotes network and a 0 bit denotes host. In the case of our 130.191.226.1 host, if it could directly communicate with all other 130.191 hosts, it would use a 255.255.0.0 netmask. If it could only communicate with the 130.191.226 hosts directly, it would use a 255.255.255.0 netmask. How does a host communicate with another host that it is not directly connected to (in other words, a host on a different network)? The default route is a host on the directly connected network that knows how to reach hosts not on the directly connected network. So a host uses its IP address along with its netmask to determine if a host it wants to communicate with (a second IP address) is reachable directly. If not directly reachable, it communicates with that host through another host, the default route. Some examples:

IP address       Netmask        Network        Default route    2nd host         reachable?
130.191.226.1    255.255.0.0    130.191.0.0    130.191.226.254  130.191.227.1    yes
130.191.226.1    255.255.255.0  130.191.226.0  130.191.226.254  130.191.227.1    no
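
The two rows of the table can be reproduced with a bitwise AND. This is only a sketch of the comparison described above, using Python's standard ipaddress module; the function names are made up for the example.

```python
import ipaddress


def directly_reachable(my_ip, netmask, other_ip):
    """True if other_ip falls in the same network as my_ip under netmask."""
    mask = int(ipaddress.IPv4Address(netmask))
    mine = int(ipaddress.IPv4Address(my_ip))
    other = int(ipaddress.IPv4Address(other_ip))
    # Keep only the network bits of each address and compare them.
    return (mine & mask) == (other & mask)


def next_hop(my_ip, netmask, default_route, other_ip):
    """Return the address a packet for other_ip is handed to first."""
    if directly_reachable(my_ip, netmask, other_ip):
        return other_ip
    return default_route


print(directly_reachable("130.191.226.1", "255.255.0.0", "130.191.227.1"))
print(directly_reachable("130.191.226.1", "255.255.255.0", "130.191.227.1"))
```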

The least I can say about subnetting is that a netmask need not contain an integral multiple of 8 bits. For example, one could split a class C network (one with 256 addresses) into 4 networks of 64 addresses each with an appropriate subnet mask.
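
Here is a small sketch of that split using Python's standard ipaddress module. The network 192.0.2.0/24 is just a documentation address range chosen for illustration; it does not come from the examples above.

```python
import ipaddress

# A class C sized network: 24 network bits, 8 host bits.
network = ipaddress.ip_network("192.0.2.0/24")

# Borrow 2 host bits for the network portion: 2^2 = 4 subnets.
subnets = list(network.subnets(prefixlen_diff=2))

for subnet in subnets:
    print(subnet, subnet.netmask, subnet.num_addresses)
```

Each of the four subnets carries the netmask 255.255.255.192, whose last octet (binary 11000000) is not a multiple of 8 bits. Note that the first and last address of each subnet are reserved for the network and broadcast addresses, so 62 of the 64 are usable for hosts.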

Humans don't do as well with IP addresses as they do with names. For this reason, as well as others, it's beneficial to have a way to translate host names (legacy-sunstroke.sdsu.edu) to IP addresses (130.191.226.1). Most Unix/Linux hosts have a scheme of up to three tiers for performing this translation. First, a local file (/etc/hosts) is consulted. Next, a local network information service (NIS) might be consulted. Finally, a global name service (DNS) might be consulted. The order of use of these name services is controlled by a name service switch, /etc/nsswitch.conf. The configuration of DNS itself is controlled by the file /etc/resolv.conf. What sort of configuration goes into this file? Usually, a list of domains to search and the IP addresses of nameservers that support those domains. In the case of hierarchical domains, the order in which to search the domains may be of importance. For example, in the domain tns.sdsu.edu, is a host known only as rohan rohan.tns.sdsu.edu or rohan.sdsu.edu? DNS can be queried to see what information is available for a particular hostname or IP address by using the host command. Specifically, the command 'host -a hostname or IP address' will return any information known about the hostname or IP address.

Here's an example:

[morris@goes morris]$ host -a wgasa.sdsu.edu
Forcing `-t a' for signature trace.
Trying null domain
rcode = 0 (Success), ancount=1
The following answer is not verified as authentic by the server:
wgasa.sdsu.edu 3600 IN A 130.191.226.44
[morris@goes morris]$ host -t mx wgasa
wgasa.sdsu.edu mail is handled (pri=10) by sunstroke.sdsu.edu
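
To make the search-order question concrete, here is a simplified Python sketch that reads resolv.conf-style text and lists the fully qualified names that would be tried for a short host name. The sample file contents (including the nameserver address 192.0.2.53) are hypothetical, and the real resolver has more rules than this.

```python
SAMPLE_RESOLV_CONF = """\
# hypothetical contents, for illustration only
search tns.sdsu.edu sdsu.edu
nameserver 192.0.2.53
"""


def parse_resolv_conf(text):
    """Collect the search domains and nameserver addresses."""
    search, nameservers = [], []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop comments
        if not fields:
            continue
        if fields[0] == "search":
            search = fields[1:]                  # a later line replaces an earlier one
        elif fields[0] == "nameserver":
            nameservers.extend(fields[1:2])
    return search, nameservers


def candidate_names(host, search):
    """Names to try, in order, for a host given the search list (simplified)."""
    if "." in host:
        return [host]                            # treat dotted names as already qualified
    return ["%s.%s" % (host, domain) for domain in search]


search, nameservers = parse_resolv_conf(SAMPLE_RESOLV_CONF)
print(candidate_names("rohan", search))
```

With the search list in this order, rohan.tns.sdsu.edu is tried before rohan.sdsu.edu; reversing the list reverses the answer to the question posed above.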

Unix and linux refer to network connections as interfaces (or network interfaces). Hence the utility that is used to configure network interfaces and report on their status is called ifconfig. Under linux, it lives in /sbin/ifconfig and can be called with the -a flag to report on all interfaces. Here's what my laptop reports:

[morris@goes morris]$ /sbin/ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:60:97:90:9E:42
          inet addr:130.191.226.4  Bcast:130.191.226.255  Mask:255.255.255.0
          UP BROADCAST RUNNING  MTU:1500  Metric:1
          RX packets:120442 errors:0 dropped:0 overruns:0 frame:0
          TX packets:93 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:9 Base address:0x300

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

ping is probably the most used network connectivity tool in the unix world. ping sends out packets (ICMP echo requests) and waits for the requested host to reply (with ICMP echo replies). There's quite a wealth of information in these replies.

Note that sciences.sdsu.edu is reachable through no routers (the TTL is 255) and the two-way travel time is less than 1 millisecond.

[morris@goes morris]$ ping sciences.sdsu.edu
PING sciences.sdsu.edu (130.191.226.112) from 130.191.226.4 : 56(84) bytes of data.
64 bytes from 130.191.226.112: icmp_seq=0 ttl=255 time=0.6 ms
64 bytes from 130.191.226.112: icmp_seq=1 ttl=255 time=0.6 ms
64 bytes from 130.191.226.112: icmp_seq=2 ttl=255 time=0.6 ms

--- sciences.sdsu.edu ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.6/0.6/0.6 ms

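Here, nodulus.extern.ucsd.edu appears to be 21 hops away (the TTL has dropped from 255 to 234, assuming the reply also started at 255) and has a two-way travel time of a little over 100 milliseconds.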

[morris@goes morris]$ ping nodulus.extern.ucsd.edu
PING nodulus.extern.ucsd.edu (199.105.15.33) from 130.191.226.4 : 56(84) bytes of data.
64 bytes from 199.105.15.33: icmp_seq=0 ttl=234 time=147.0 ms
64 bytes from 199.105.15.33: icmp_seq=1 ttl=234 time=107.8 ms
64 bytes from 199.105.15.33: icmp_seq=2 ttl=234 time=108.5 ms

--- nodulus.extern.ucsd.edu ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 107.8/121.1/147.0 ms

Finally, it should be noted that a host can have ICMP turned off and still be reachable for any service other than ICMP. You just won't be able to ping it.
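
The TTL arithmetic used to read the ping output can be sketched as follows. Each router that forwards a packet decrements the TTL by one, so the hop count is the initial TTL minus the observed TTL. The initial value is not carried in the reply; the list of common starting values below is an assumption, so treat the result as an estimate.

```python
# Common initial TTL values (an assumption; the sending host chooses its own).
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(observed_ttl):
    """Estimate how many routers decremented the TTL of an echo reply."""
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            # Smallest common starting value that could have produced this TTL.
            return initial - observed_ttl
    raise ValueError("TTL out of range: %r" % observed_ttl)


print(estimate_hops(255))   # the sciences.sdsu.edu replies
print(estimate_hops(234))   # the nodulus.extern.ucsd.edu replies
```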

netstat is a command that can be used to determine quite a bit about the network configuration of a host. Since we used ifconfig above to find out about the IP address and netmask, we still need a way to discover routing information. Specifically, we need to know what the default route is set to. Here's how to get routing information using netstat:

[morris@goes morris]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
130.191.226.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
0.0.0.0         130.191.226.254 0.0.0.0         UG        0 0          0 eth0

Translated, this table means that the 130.191.226.0 network with a 255.255.255.0 netmask is reachable directly out of the eth0 network interface with no gateway necessary. The 0.0.0.0 network (known as the default) is reachable out of the eth0 network interface via the 130.191.226.254 gateway. Hence the default gateway or default route is 130.191.226.254.
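
As a rough model of how such a table is consulted, the Python sketch below picks, for a destination address, the matching entry with the most network bits in its mask. The table rows are copied from the netstat output above; the lookup function itself is illustrative and is not how the kernel is actually implemented.

```python
import ipaddress

# (destination, genmask, gateway, interface), from the netstat -rn output.
ROUTES = [
    ("130.191.226.0", "255.255.255.0", "0.0.0.0", "eth0"),
    ("127.0.0.0", "255.0.0.0", "0.0.0.0", "lo"),
    ("0.0.0.0", "0.0.0.0", "130.191.226.254", "eth0"),
]


def to_int(address):
    return int(ipaddress.IPv4Address(address))


def lookup(destination, routes=ROUTES):
    """Return (gateway, interface) for the most specific matching route."""
    dest = to_int(destination)
    best = None
    for net, mask, gateway, iface in routes:
        mask_int = to_int(mask)
        if (dest & mask_int) == to_int(net):
            mask_bits = bin(mask_int).count("1")
            if best is None or mask_bits > best[0]:
                best = (mask_bits, gateway, iface)
    if best is None:
        raise LookupError("no route to %s" % destination)
    return best[1], best[2]


print(lookup("130.191.226.112"))   # on the local network, no gateway
print(lookup("199.105.15.33"))     # falls through to the default route
```

A gateway of 0.0.0.0 in the result means the destination is reached directly out of the named interface.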

X windows is fundamentally different than most client windowing systems. At the lowest level, X windows is a network protocol that describes certain messages that can be sent over a connection, whose two ends may or may not be on the same host. Without going into more detail, X allows one to build what look like conventional host windowing systems, but with a distinction between the host providing the window drawing and the host requesting the window drawing. So in X vernacular, an X server is the process that provides pointer events and draws windows on a screen, and an X client is a process that receives pointer events and requests window drawing. The overall behavior of the user interface is usually provided by a special X client called a window manager. Typically, the window manager is responsible for a lot of the behavior that most folks attribute to the "look and feel". To further muddy the water, a typical workstation will have processes running on it that are X server, X window manager, and X client, although this is not necessarily true. The version of X that typically runs under Linux is called XFree86. It's an X server based upon X11 Release 6.3 or 6.4 that was written to support the many video cards found in Intel based PC hardware. Hence installing and configuring XFree86 usually entails specifying the video card, how much memory it has, what type the mouse is and how it is connected, the scan rates that the monitor supports, etc.

Performing the installation

The first step is to boot the target computer either off of the RedHat CD or off of a boot disk created from the CD. The first screen will then ask whether one wants to perform the install in a GUI, in text mode, or in expert mode. The GUI relies upon X, so in cases where X configuration is difficult or impossible, text mode is the way to go. The selection of the local language and the keyboard type and layout are pretty self-explanatory, so the first place where thought is required is the mouse type and port. Both X windows and a text mode mouse utility (gpm) rely on the specification of the mouse protocol and port. Typically, the mouse is either PS/2 or serial, and a device symlink is created for /dev/mouse that points to /dev/psaux, /dev/ttyS0, or /dev/ttyS1. The installation type is really a choice among several common installations: a user workstation with the GNOME user interface, a user workstation with the KDE user interface, a server, or "custom". Of these, custom leaves the most choices to the person performing the install. Once an installation type is chosen, the disk is either automatically partitioned or the user is prompted to make choices regarding disk partitioning. Most of the rest of the installation choices are fairly self-explanatory.
 

Chapter 2. How do most *nix OS's boot and more importantly, how do they start "services"? What are services and how do I inventory/stop/start/exploit them? What is a denial of service exploit?

Outline for Chapter 2

What is a process? A process, for the purposes of this discussion, is a "thread of execution" with its own resources (protected memory space, stack, file descriptors, etc.).

The root (0) UID. Most *nix operating systems assign IDs to "users" who have processes running on the system. Of these IDs (known as UIDs or User IDs), only the UID of 0 is treated differently. File permissions are not enforced for UID 0, any system call can be executed by UID 0, and any process can be sent a signal by UID 0.

The fork system call is the somewhat inefficient way that unix operating systems create processes. Basically, the fork() system call is used to duplicate an existing process. Most fork()ed processes immediately execute the exec() system call, which overlays the process with a new "program". So you duplicate one process in order to then completely change it.
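
The duplicate-then-replace pattern can be sketched in Python, whose os module exposes the same system calls. This is a minimal illustration that assumes a Unix-like host; the helper name run_program is made up for the example.

```python
import os
import sys


def run_program(argv):
    """Run argv in a new process using fork() followed by exec()."""
    pid = os.fork()                    # duplicate the current process
    if pid == 0:
        # Child: overlay this copy of the process with a new program.
        try:
            os.execvp(argv[0], argv)   # does not return on success
        except OSError:
            os._exit(127)              # exec failed; leave without cleanup
    # Parent: wait for the child and report how it ended.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1                          # child was terminated by a signal


if __name__ == "__main__":
    code = run_program([sys.executable, "-c", "print('hello from the new program')"])
    print("child exit status:", code)
```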

What is the kernel, or Unix, or Linux? When a Unix computer boots, some number of iterations of boot loaders run and finally, the "kernel" itself is loaded and executed. This program is special in that it never relinquishes control of the computer hardware. Instead, it initializes itself, inventories the computer's hardware, loads drivers for that hardware, configures operating system services (including a scheduler) and starts the mother of all processes ... init.

Init is the name commonly given to the first process run by a booting Unix system. Init is responsible for (among other things) the orderly startup of the operating system. Most Unix operating systems have the concept of "run levels", or sets of functionality that they can exist at. For instance, "single user" mode is typically run when administrative chores require as simple an environment as possible. "Multi-user" might have additional services run that allow additional users to log in and perform useful work. Finally, there may be additional levels that incrementally add layers of service (for example, X windows may be added to the console at a higher run level than just multi-user mode).

For each different version of unix there is a different convention for how init starts up services in these different run levels. RedHat linux uses a table of runlevels (/etc/inittab) in conjunction with a set of directories (/etc/rc.d/rc*.d) and files (/etc/rc.d/rc*) to bring about this orderly startup or shutdown.
First, /etc/rc.d/rc.sysinit is run. This is a standard shell (/bin/sh) script, so it's easy to see what it does. Next, /etc/rc.d/rc is run and given the desired runlevel as its argument. Each runlevel (3, for example) has its own directory (/etc/rc.d/rc3.d) containing shell scripts whose names begin with KXY or SXY, where X and Y are digits 0-9. The /etc/rc.d/rc shell script first calls all of the K* scripts in ascending order with the argument "stop", and then calls all of the S* scripts in ascending order with the argument "start". Each of these /etc/rc.d/rc3.d/* scripts contains instructions for a service that should be stopped (K) or started (S) at run level 3. The order in which they are started or stopped is conveyed by XY. Finally, it should be noted that RedHat defaults to run level 5 and uses an overall manager program (chkconfig) that allows the user to convey what services should run at what runlevel; chkconfig then manipulates the contents of these directories. Try /sbin/chkconfig --list to see which services run at each level.
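
The ordering rule can be modelled in a few lines of Python. This sketch only computes the order in which scripts would be called; it runs nothing, and the script names in the example are illustrative rather than taken from a particular system.

```python
import re


def rc_plan(script_names):
    """Return (script, argument) pairs in the order the rc script uses."""
    pattern = re.compile(r"^[KS]\d\d")            # K or S, then two digits
    scripts = sorted(name for name in script_names if pattern.match(name))
    kills = [(name, "stop") for name in scripts if name.startswith("K")]
    starts = [(name, "start") for name in scripts if name.startswith("S")]
    return kills + starts                          # all K* first, then all S*


# Illustrative directory listing for one runlevel.
listing = ["S80sendmail", "K20nfs", "S10network", "README", "K05innd"]
for script, argument in rc_plan(listing):
    print(script, argument)
```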

A host computer typically has only one IP address, and yet typically has many active network connections. For TCP/IP, this means that the network "space" on that computer is further subdivided into "ports". Ports are numbered from 1 to a maximum of 65535, with most Unixes reserving the ports below 1024 (known as privileged ports) for processes with a UID of 0. Many ports are "well known", in that a particular service is expected to be run on a particular port. For example, telnetting to sciences.sdsu.edu means connecting to port 23 on the IP address 130.191.226.112 and then using the accepted telnet protocol over the established connection.

A TCP/IP server process is usually a program that performs some initialization and then calls the bind(), listen(), and accept() system calls. bind() is given the IP address and port to listen on, listen() is given the number of pending connections to queue, and accept() returns only when a connection attempt is made to that port and IP address. The process may then make a copy of itself (fork) in order to handle that connection while the original continues to wait on that IP address and port. This behavior of hanging around and waiting for an external event is what gave rise to the name of daemon for these server processes.
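
A minimal sketch of such a server in Python follows. In the sockets API as Python exposes it, the work is split across bind() (address and port), listen() (queue length) and accept() (wait for a connection). The function names and the upper-casing "service" are invented for the example, and a real daemon would loop and fork rather than serve one connection.

```python
import socket


def make_listener(host="127.0.0.1", port=0, backlog=5):
    """Create a TCP socket bound to host:port and start listening."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))     # port 0 lets the kernel pick a free port
    listener.listen(backlog)        # backlog: how many pending connections to queue
    return listener


def serve_one(listener):
    """Block until one connection arrives, then answer it and hang up."""
    conn, peer = listener.accept()  # returns only when a connection is made
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())  # the "service": echo back in upper case
    return peer


if __name__ == "__main__":
    listener = make_listener()
    print("listening on", listener.getsockname())
    print("served", serve_one(listener))
    listener.close()
```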

At a lower level, handled by the kernel itself, is the initial connection setup. A TCP/IP connection is established when a sender sends a packet, called SYN, to an IP address and port. The TCP/IP implementation on the receiving host then replies with a SYN/ACK (or RST, if no service is listening on that port), and finally the sender responds to the SYN/ACK with an ACK. These opening packets are used to agree upon initial parameters of TCP/IP, such as sequence numbers. The point is that the daemon listening on the IP address/port pair will only "awaken" after the connection is open, that is, after the final ACK. It should also be noted that for each IP/port pair, there are two connection queues. The first is maintained by the operating system to hold more than one of these incoming connection requests, and the second is maintained in the application that issued the listen() system call.

At this point, we can envision an operating system that offers three services (say telnet, finger, and rlogin) by virtue of three daemons bound to three well known ports. This is inefficient, as daemons consume system resources and only occasionally perform useful work. Someone got the bright idea that a single daemon could listen on a large number of ports and then start the appropriate service handler when a connection arrived on one of them. Unix implementations commonly have one of these, named inetd. Inetd has a configuration file that tells it what ports to listen on, and what to do when a connection arrives on one of those ports. Since inetd gives us this sort of flexibility, why would we ever do things the other way? The answer is "speed". Inetd is not the fastest way to start up a service handler. Anyway, the point here is that some services are started upon demand by inetd, while others are started by their own daemons in their own startup scripts.

The above paragraph implies that we control what services our computer offers by editing the startup scripts (using /sbin/chkconfig) and by editing the inetd configuration file (/etc/inetd.conf). This is not how folks on the net catalog our service offerings, and we should learn from them. They will "port scan" our computer. Port scanning is usually as simple as "testing" each port in ascending order to see if a service is offered. It's this "testing" that has undergone significant evolution in the past few years. In the old days, scanners used to actually try to connect() to the IP address/port pairs one after another, which is easily detectable by sysadmins. (If you are keeping track of who telnets to your host, which you should do, you will see these scan attempts.) Now, it's understood that you can test whether a TCP/IP service is offered without actually entering into a "valid" TCP/IP transaction. These "stealth" scans are more difficult to detect.
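
The old-style connect() scan is simple enough to sketch in Python, and it is a handy way to audit a host you administer. The function name and timeout are arbitrary choices for this example; only point it at machines you are responsible for.

```python
import socket


def connect_scan(host, ports, timeout=0.5):
    """Return the ports on host that accept a full TCP connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            probe.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if probe.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports


if __name__ == "__main__":
    # Audit the privileged ports of the local machine.
    print(connect_scan("127.0.0.1", range(1, 1024)))
```

Because each probe completes a normal connection, the services being probed can log it, which is exactly why this style of scan is easy to spot.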

You have done the first part of your job as a system administrator and you have audited and tailored the services that you offer. You would now like to restrict who you offer those services to. Who, in this context, means what groups of IP address ranges. So our mission is to implement access control, and our criteria for deciding if a connection should be allowed are: source and destination IP addresses, and source and destination IP ports. (There are other more obscure criteria that we could use. ) There are many ways in which to implement this sort of access control, or packet filtering.
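
As a sketch of what such access control looks like, the Python below checks a connection against an ordered rule list, where the first matching rule decides. The rule set is hypothetical (telnet only from 130.191 addresses, http from anywhere, everything else denied) and does not describe any particular packet filtering tool.

```python
import ipaddress

# (source network, destination port or None for any port, action)
RULES = [
    ("130.191.0.0/16", 23, "accept"),   # telnet from campus addresses only
    ("0.0.0.0/0", 80, "accept"),        # http from anywhere
    ("0.0.0.0/0", None, "deny"),        # everything else
]


def decide(source_ip, dest_port, rules=RULES):
    """Return the action of the first rule matching this connection."""
    source = ipaddress.ip_address(source_ip)
    for network, port, action in rules:
        if source in ipaddress.ip_network(network) and port in (None, dest_port):
            return action
    return "deny"                       # nothing matched: refuse by default


print(decide("130.191.226.4", 23))
print(decide("199.105.15.33", 23))
print(decide("199.105.15.33", 80))
```

Rule order matters here: placing the catch-all deny first would shadow every rule after it.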

Why all of the fuss ? Can't the programs that provide the services just be written in a secure fashion? If only it were true. Programmers are lazy and most write imperfect code. Time is spent making sure that a program behaves appropriately given anticipated input, but it is expensive to make programs impervious to malicious input. A good example of this is stack smashing. Stack smashing takes into account some arcane details of languages like C, combined with sloppy coding on the part of programmers. Here's a code fragment:

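#include <string.h>

/* Deliberately unsafe: shown only to illustrate the overflow. */
void do_something( char* argument )
{
    char buffer[100];            /* fixed-size automatic variable on the stack */

    strcpy( buffer, argument);   /* copies up to the NUL byte; no length check */
    /* do something else*/
    ...
}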

The variable "buffer" is a C automatic variable, allocated on the stack. It has a fixed size (100 bytes) and C does no bounds checking. What happens if "argument" is longer than 100 characters? The answer is that part of the stack will be overwritten. What if we carefully fill "argument" with enough characters to overrun the buffer, then a new return address pointing at the code that follows, and then binary data that is actually executable instructions? The routine do_something will "return", and our carefully crafted bomb will be executed by the host. What if do_something was inside of an ftp server daemon running with a UID of 0? The answer is that our computer would now be "owned" by anyone on the net who knew how to exploit this vulnerability.
Still sceptical? Check out the exploits database at www.securityfocus.com.

Denials of service. Let's say that we don't want to break in; we just want to render a host incapable of offering service to others. A classic old example of this is the "SYN flood" attack. We saw above that the operating system maintains a queue of partially formed incoming connections. What if we just fill this queue (by sending repeated SYNs to the IP/port pair)? The answer is that additional requests will be refused.
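
The effect of a full queue can be shown with a toy model in Python. This simulates only the bookkeeping of the queue of partially formed connections; it sends no packets, and the class name, queue size and addresses are invented for the example.

```python
class SynQueue:
    """Toy model of a bounded queue of partially formed connections."""

    def __init__(self, size):
        self.size = size
        self.half_open = set()

    def syn(self, source):
        """A SYN arrives. Return True if it is queued, False if refused."""
        if len(self.half_open) >= self.size:
            return False                  # queue full: the request is refused
        self.half_open.add(source)
        return True

    def complete(self, source):
        """The handshake finishes, so the entry leaves the queue."""
        self.half_open.discard(source)


queue = SynQueue(size=3)

# The attacker sends SYNs and never finishes the handshakes.
for n in range(3):
    queue.syn(("192.0.2.%d" % n, 1024))

# A legitimate client now finds the queue full.
print(queue.syn(("130.191.226.4", 2000)))
```

Real systems also expire entries after a timeout, so the attacker has to keep sending SYNs faster than old entries age out.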

Summary: Dr. Smith needs to run the application foobar under linux, and it will only be used from the console. What should I do? Answer: use /sbin/chkconfig and turn off the following services (many of which are off by default):

Since I don't know what application foobar does, you might want to make sure that it still works. Also, if the computer will be used for outbound network connections, turn network on. The inet service is the one that actually starts inetd.

Summary 2: I'd like to use linux at home to connect to a cable modem and provide http and anonymous ftp service to the internet. What should I do ?
