Why Unix and Linux are important in an academic environment. How
to install Linux. Anticipated support issues.
Early versions of Unix (especially the Berkeley Software Distribution, or BSD) were used to implement and test the early Internet protocols. As existing protocols changed and new protocols were added, much of this code became part of BSD. Many modern operating systems (Solaris, Linux, Irix) contain parts of BSD or are based upon it. Hence many reference implementations of core Internet services (SMTP/POP/HTTP/FTP/NNTP) are written for Unix, and many servers on the Internet run a flavor of Unix or Linux.
Programmers like to work on stable, crash-free computers. Providers of important internet services demand stable, crash-free computers. Most Unix implementations have been incrementally refined for over a decade. I feel comfortable saying that Unix/Linux is currently the most stable operating system that is commonly available.
Unlike the Mac and Windows, where a single vendor controls the user interface and the look and feel, Unix user interfaces are built upon X, which has no look-and-feel philosophy at all. Hence there are as many user interfaces for Unix as there are people writing user interfaces. Typically a "user interface" for X will contain at least a window manager, and possibly a session manager and other utilities. The two most common Linux user interfaces are GNOME and KDE. GNOME is the default user interface for RedHat Linux.
RedHat Linux introduced the packaging mechanism known as RPM (RedHat
Package Manager). RPM automates the installation, updating, and removal
of software on a system. It works through a central database that keeps
track of which packages are installed, which files each package contains,
and what each package depends upon in order to work.
Choosing individual packages is something that can wait until Linux is installed. For the installation itself, I would choose either the GNOME workstation installation (for most end users) or the custom installation (if you would like more control).
Basic TCP/IP networking
On any given network, a host can tell whether it is directly connected to another host by using its netmask. Netmasks are also 32 bit integers, where a 1 bit denotes network and a 0 bit denotes host. In the case of our 130.191.226.1 host, if it could directly communicate with all other 130.191 hosts, it would use a 255.255.0.0 netmask. If it could only communicate with the 130.191.226 hosts directly, it would use a 255.255.255.0 netmask.
How does a host communicate with another host that it is not directly connected to (in other words, a host on a different network)? It uses the default route, which is a host on the directly connected network that knows how to reach hosts not on the directly connected network. So a host uses its IP address along with its netmask to determine if a host it wants to communicate with (a second IP address) is reachable directly. If the second host is not directly reachable, the first host communicates with it through another host, the default route. Some examples:
IP address      Netmask         Network         Default route     2nd host        Reachable directly?
130.191.226.1   255.255.0.0     130.191.0.0     130.191.226.254   130.191.227.1   yes
130.191.226.1   255.255.255.0   130.191.226.0   130.191.226.254   130.191.227.1   no
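The reachability test in the examples above comes down to one bitwise AND per address. The following is a minimal sketch in C rather than code from any particular system; the function names (ipv4, network_of, directly_reachable) are made up for illustration.

```c
#include <stdint.h>

/* Pack four dotted-quad octets into one 32-bit integer (host byte order). */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
}

/* Network number: keep only the bits where the netmask is 1. */
uint32_t network_of(uint32_t addr, uint32_t mask)
{
    return addr & mask;
}

/* 1 if `other` is on the same directly connected network as `self`, else 0. */
int directly_reachable(uint32_t self, uint32_t mask, uint32_t other)
{
    return network_of(self, mask) == network_of(other, mask);
}
```

With the values from the examples, directly_reachable(ipv4(130,191,226,1), ipv4(255,255,0,0), ipv4(130,191,227,1)) is 1, while the same call with ipv4(255,255,255,0) is 0, so in the second case the traffic goes through the default route.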
The least amount I can say about subnetting is that it's not necessary to have netmasks that contain integral multiples of 8 bits. One could split a class C network (one with 256 addresses) into 4 networks of 64 addresses each with the subnet mask 255.255.255.192, for example.
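To make the subnetting arithmetic concrete, here is a small sketch in C that builds a netmask from a count of network bits and reports how many addresses one such network holds. The helper names (mask_from_prefix, addresses_per_network) are invented for this example.

```c
#include <stdint.h>

/* Netmask with the top `prefix_bits` bits set to 1 (valid range 0..32). */
uint32_t mask_from_prefix(int prefix_bits)
{
    if (prefix_bits <= 0)
        return 0;
    if (prefix_bits >= 32)
        return 0xFFFFFFFFu;
    return 0xFFFFFFFFu << (32 - prefix_bits);
}

/* Number of addresses in one network with this mask.
   (Wraps around to 0 for an all-zero mask, which covers 2^32 addresses.) */
uint32_t addresses_per_network(uint32_t mask)
{
    return ~mask + 1u;
}
```

A class C network has 24 network bits. Borrowing 2 more bits gives mask_from_prefix(26), which is 0xFFFFFFC0 (255.255.255.192), and addresses_per_network() of that mask is 64: four networks of 64 addresses each.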
Humans don't do as well with IP addresses as they do with names. For this reason, as well as others, it's beneficial to have a way to translate host names (legacy-sunstroke.sdsu.edu) to IP addresses (130.191.226.1). Most Unix/Linux hosts use a scheme of up to three tiers for performing this translation. First, a local file (/etc/hosts) is consulted. Next, a local network information service (NIS) might be consulted. Finally, a global name service (DNS) might be consulted. The order of use of these name services is controlled by a name service switch, /etc/nsswitch.conf. The configuration of DNS itself is controlled by the file /etc/resolv.conf. What sort of configuration goes into this file? Usually, a list of domains to search and the IP addresses of nameservers that serve those domains. In the case of hierarchical domains, the order in which to search the domains may be of importance. For example, in the domain tns.sdsu.edu, is a host known only as "rohan" rohan.tns.sdsu.edu or rohan.sdsu.edu? DNS can be queried to see what information is available for a particular hostname or IP address by using the host command. Specifically, the command 'host -a hostname or IP address' will return any information known about the hostname or IP address.
Here's an example:
[morris@goes morris]$ host -a wgasa.sdsu.edu
Forcing `-t a' for signature trace.
Trying null domain
rcode = 0 (Success), ancount=1
The following answer is not verified as authentic by the server:
wgasa.sdsu.edu 3600 IN A 130.191.226.44
[morris@goes morris]$ host -t mx wgasa
wgasa.sdsu.edu mail is handled (pri=10) by sunstroke.sdsu.edu
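The second query above was given only the short name wgasa and came back with wgasa.sdsu.edu, which is the search list at work. As a rough sketch of that expansion (a simplification: real resolvers apply more rules than this), the C function below produces the candidate names to try, in order. The function name search_candidate and its parameters are assumptions made for the example, not part of any resolver library.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write the idx-th candidate name for `name` into `out`.
 * Simplification: a name that already contains a dot is tried as given,
 * and only once. A short name gets each search domain appended in turn.
 * Returns 1 if a candidate was written, 0 when the list is exhausted.
 */
int search_candidate(const char *name, const char *domains[], int ndomains,
                     int idx, char *out, size_t outlen)
{
    if (strchr(name, '.') != NULL) {
        if (idx != 0)
            return 0;
        snprintf(out, outlen, "%s", name);
        return 1;
    }
    if (idx < 0 || idx >= ndomains)
        return 0;
    snprintf(out, outlen, "%s.%s", name, domains[idx]);
    return 1;
}
```

With a search list of tns.sdsu.edu followed by sdsu.edu, the short name rohan yields rohan.tns.sdsu.edu first and rohan.sdsu.edu second, which is why the order of the search list matters.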
Unix and Linux refer to network connections as interfaces (or network interfaces). Hence the utility that is used to configure network interfaces and report on their status is called ifconfig. Under Linux, it lives in /sbin/ifconfig and can be called with the -a flag to report on all interfaces. Here's what my laptop reports:
[morris@goes morris]$ /sbin/ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:60:97:90:9E:42
          inet addr:130.191.226.4  Bcast:130.191.226.255  Mask:255.255.255.0
          UP BROADCAST RUNNING  MTU:1500  Metric:1
          RX packets:120442 errors:0 dropped:0 overruns:0 frame:0
          TX packets:93 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:9 Base address:0x300

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
ping is probably the most used network connectivity tool in the Unix world. ping sends out packets (ICMP echo requests) and waits for the requested host to reply (with ICMP echo replies). There's quite a wealth of information in these replies.
Note that sciences.sdsu.edu is reachable through no routers (the ttl is still 255, so no router has decremented it) and the two-way travel time is less than 1 millisecond.
[morris@goes morris]$ ping sciences.sdsu.edu
PING sciences.sdsu.edu (130.191.226.112) from 130.191.226.4 : 56(84) bytes of data.
64 bytes from 130.191.226.112: icmp_seq=0 ttl=255 time=0.6 ms
64 bytes from 130.191.226.112: icmp_seq=1 ttl=255 time=0.6 ms
64 bytes from 130.191.226.112: icmp_seq=2 ttl=255 time=0.6 ms
--- sciences.sdsu.edu ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.6/0.6/0.6 ms
Here, nodulus.extern.ucsd.edu is about 21 hops away (the ttl has dropped from 255 to 234, assuming the reply started out at 255) and has a two-way travel time of a little over 100 milliseconds.
[morris@goes morris]$ ping nodulus.extern.ucsd.edu
PING nodulus.extern.ucsd.edu (199.105.15.33) from 130.191.226.4 : 56(84) bytes of data.
64 bytes from 199.105.15.33: icmp_seq=0 ttl=234 time=147.0 ms
64 bytes from 199.105.15.33: icmp_seq=1 ttl=234 time=107.8 ms
64 bytes from 199.105.15.33: icmp_seq=2 ttl=234 time=108.5 ms
--- nodulus.extern.ucsd.edu ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 107.8/121.1/147.0 ms
Finally, it should be noted that a host can have ICMP turned off and still be reachable for any service other than ICMP. You just won't be able to ping it.
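The hop counts read off the ping output come from simple subtraction: each router that forwards a packet decrements its ttl by one. The sketch below assumes you know (or guess) the ttl the replying host started with; 255 is assumed for the two hosts above, but other systems start lower. The function name hops_from_ttl is made up for this example.

```c
/*
 * Routers crossed by a reply, given the ttl the sender started with and
 * the ttl seen on arrival. Returns -1 if the inputs are inconsistent
 * (for example, an observed ttl larger than the assumed starting value).
 */
int hops_from_ttl(int initial_ttl, int observed_ttl)
{
    if (initial_ttl < 1 || initial_ttl > 255)
        return -1;
    if (observed_ttl < 0 || observed_ttl > initial_ttl)
        return -1;
    return initial_ttl - observed_ttl;
}
```

For the two transcripts, hops_from_ttl(255, 255) is 0 and hops_from_ttl(255, 234) is 21. A result of -1 is a hint that the assumed starting ttl was wrong.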
netstat is a command that can be used to determine quite a bit about the network configuration of a host. Since we used ifconfig above to find out about the IP address and netmask, we still need a way to discover routing information. Specifically, we need to know what the default route is set to. Here's how to get routing information using netstat:
[morris@goes morris]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
130.191.226.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
0.0.0.0         130.191.226.254 0.0.0.0         UG        0 0          0 eth0
Translated, this table means that the 130.191.226.0 network with a 255.255.255.0 netmask is reachable directly out of the eth0 network interface with no gateway necessary. The 0.0.0.0 network (known as the default) is reachable out of the eth0 network interface via the 130.191.226.254 gateway. Hence the default gateway or default route is 130.191.226.254.
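One way to picture how such a table is consulted: mask the destination address with each row's Genmask, compare it to the row's Destination, and prefer the matching row with the most network bits. The C sketch below illustrates that idea only; it is not the kernel's actual lookup code, and the struct and function names are invented here.

```c
#include <stddef.h>
#include <stdint.h>

struct route {
    uint32_t destination;   /* network number */
    uint32_t genmask;       /* netmask for this row */
    uint32_t gateway;       /* 0 means "deliver directly, no gateway" */
    const char *iface;      /* interface name, e.g. "eth0" */
};

/* Pack four dotted-quad octets into one 32-bit integer (host byte order). */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
}

/*
 * Return the most specific matching row, or NULL if nothing matches.
 * For contiguous netmasks, a numerically larger mask has more network bits.
 * The default route (0.0.0.0 with mask 0.0.0.0) matches every destination.
 */
const struct route *lookup_route(const struct route *table, size_t n, uint32_t dest)
{
    const struct route *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if ((dest & table[i].genmask) != table[i].destination)
            continue;
        if (best == NULL || table[i].genmask > best->genmask)
            best = &table[i];
    }
    return best;
}
```

Loaded with the three rows shown above, a lookup for 130.191.226.112 picks the first row (no gateway), while a lookup for 199.105.15.33 matches only the default row and is sent to 130.191.226.254.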
X Windows is fundamentally different from most client windowing systems. At the lowest level, X is a network protocol that describes the messages that can be sent over a connection, which may or may not be between processes on the same host. Without going into more detail, X allows one to build what look like conventional host windowing systems, but with a distinction between the host providing the window drawing and the host requesting the window drawing. So in X vernacular, an X server is the process that provides pointer events and draws windows on a screen, and an X client is a process that receives pointer events and requests window drawing. The overall behavior of the user interface is usually provided by a special X client called a window manager. Typically, the window manager is responsible for a lot of the behavior that most folks attribute to the "look and feel". To further muddy the water, a typical workstation will run the X server, the window manager, and the X clients all on the same machine, although this need not be the case.
The version of X that typically runs under Linux is called XFree86. It's an X server based upon X11 Release 6.3 or 6.4 that was written to support the many video cards found in Intel-based PC hardware. Hence installing and configuring XFree86 usually entails specifying the video card, how much memory it has, what type the mouse is and how it is connected, the scan rates that the monitor supports, etc.
The root (0) UID. Most *nix operating systems assign IDs to the "users" who have processes running on the system. Of these IDs (known as UIDs or User IDs), only UID 0 is treated differently. File permissions are not enforced against UID 0, any system call can be executed by UID 0, and UID 0 can send a signal to any process.
The fork system call is the somewhat inefficient way that Unix operating systems create processes. Basically, the fork() system call duplicates an existing process. Most fork()ed processes immediately call one of the exec() family of system calls, which overlays the process with a new "program". So you duplicate one process in order to then completely change it.
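The fork-then-exec pattern looks like this in C. This is a minimal sketch using the POSIX calls fork(), execvp() (one member of the exec family), and waitpid(); the wrapper name run_program is made up for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run the program named by argv[0] with arguments argv (NULL-terminated).
 * Returns the program's exit status, or -1 if it could not be run or
 * did not exit normally.
 */
int run_program(char *const argv[])
{
    pid_t pid = fork();             /* duplicate the calling process */
    if (pid < 0)
        return -1;                  /* fork failed */

    if (pid == 0) {
        /* Child: replace this copy with the new program. */
        execvp(argv[0], argv);
        _exit(127);                 /* reached only if the exec failed */
    }

    /* Parent: wait for the child to finish and collect its status. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with the argument vector { "sh", "-c", "exit 3", NULL } returns 3: the child became sh, exited with status 3, and the parent picked that status up with waitpid().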
What is the kernel, or Unix, or Linux? When a Unix computer boots, some number of iterations of boot loaders run and finally the "kernel" itself is loaded and executed. This program is special in that it never relinquishes control of the computer hardware. Instead, it initializes itself, inventories the computer's hardware, loads drivers for that hardware, configures operating system services (including a scheduler) and starts the mother of all processes ... init.
Init is the name commonly given to the first process run by a booting Unix system. Init is responsible for (among other things) the orderly startup of the operating system. Most Unix operating systems have the concept of "run levels", or sets of functionality that they can exist at. For instance, "single user" mode is typically run when administrative chores require as simple an environment as possible. "Multi-user" mode might run additional services that allow additional users to log in and perform useful work. Finally, there may be additional levels that incrementally add layers of service (for example, X Windows may be added to the console at a higher run level than just multi-user mode).
For each different version of Unix there is a different convention
for how init starts up services in these different run levels. RedHat Linux
uses a table of runlevels (/etc/inittab) in conjunction with a set of directories
(/etc/rc.d/rc*.d) and files (/etc/rc.d/rc*) to bring about this orderly
startup or shutdown.
First, /etc/rc.d/rc.sysinit is run. This is a standard shell (/bin/sh)
script, so it's easy to see what it does. Next, /etc/rc.d/rc is run, and
given the argument of the desired runlevel. Each runlevel (3, for example)
has its own directory (/etc/rc.d/rc3.d) that contains shell
scripts whose names begin with KXY or SXY, where X and Y are digits 0-9.
The /etc/rc.d/rc shell script first calls all of the K* scripts in ascending
order with the argument "stop", and then calls all of the S* scripts
in ascending order with the argument "start". Each of these /etc/rc.d/rc3.d/*
scripts contains instructions for a service that should be stopped (K) or
started (S) at run level 3. The order in which they are started or
stopped is conveyed by XY. Finally, it should be noted that RedHat defaults
to run level 5 and provides a manager program (chkconfig) that lets
the user say which services should run at which runlevel; chkconfig
then manipulates the contents of these directories. Try /sbin/chkconfig
--list to see which services run at each level.
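The ordering rule described above (all K scripts in ascending order, then all S scripts in ascending order) falls out of a plain string sort, because "K" sorts before "S" and the two digits come next. The C sketch below shows only the ordering and the argument each script would receive; it is not the real /etc/rc.d/rc script, and the names rc_sort and rc_action are invented for the example.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int rc_compare(const void *a, const void *b)
{
    const char *x = *(const char *const *)a;
    const char *y = *(const char *const *)b;
    return strcmp(x, y);
}

/* Put script names into the order they would be run:
   K scripts first, then S scripts, each group ascending by XY. */
void rc_sort(const char **names, size_t n)
{
    qsort(names, n, sizeof names[0], rc_compare);
}

/* Argument the script would be called with, or NULL for other names. */
const char *rc_action(const char *name)
{
    if (name[0] == 'K')
        return "stop";
    if (name[0] == 'S')
        return "start";
    return NULL;
}
```

Given the illustrative names S80sendmail, K20nfs, S10network, and K05keytable, rc_sort() orders them as K05keytable, K20nfs, S10network, S80sendmail; the first two would be called with "stop" and the last two with "start".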
A host computer typically has only one IP address, and yet typically has many active network connections. For TCP/IP, this means that the network "space" on that computer is further subdivided into "ports". Ports are numbered from 1 to 65535, with most Unixes reserving the ports below 1024 (known as privileged ports) for processes with a UID of 0. Many ports are "well known", in that a particular service is expected to be run on a particular port. For example, telnetting to sciences.sdsu.edu means connecting to port 23 on the IP address 130.191.226.112 and then using the accepted telnet protocol over the established connection.
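The privileged-port rule can be written down as a one-line policy check. This is only a sketch of the traditional rule as described above (real systems may offer other ways to grant the right to bind low ports), and the function name may_bind is made up.

```c
/*
 * Traditional Unix rule: ports below 1024 may be bound only by UID 0.
 * Returns 1 if a process with this UID may bind the port, else 0.
 */
int may_bind(unsigned int uid, int port)
{
    if (port < 1 || port > 65535)
        return 0;               /* not a valid port number */
    if (port < 1024)
        return uid == 0;        /* privileged port: root only */
    return 1;
}
```

So may_bind(0, 23) is 1 (root may run a telnet daemon on its well-known port), while may_bind(500, 23) is 0 and may_bind(500, 8080) is 1.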
A TCP/IP server process is usually a program that performs some initialization, binds a socket to an IP address and port with bind(), and then calls the listen() system call, giving it the number of pending connections to allow. It then calls accept(), which returns only when a connection attempt is made to that port and IP address. The process then may make a copy of itself (fork) in order to handle that connection while the original goes back to accept() on that IP address and port. This behavior of hanging around and waiting for an external event is what gave rise to the name of daemon for these server processes.
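In the BSD sockets API, this server pattern is spelled socket(), bind(), listen(), and then accept(), with accept() being the call that blocks until a connection arrives. The skeleton below is a sketch under simplifying assumptions: it binds only to the loopback address, does no logging, and does not reap finished children as a real daemon would. The function names open_listener and serve_forever are invented for the example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to 127.0.0.1:port and start listening.
   Returns the listening descriptor, or -1 on error. */
int open_listener(unsigned short port, int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accept connections forever, forking one child per connection. */
void serve_forever(int listen_fd, void (*handle)(int conn_fd))
{
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);   /* blocks here */
        if (conn < 0)
            continue;
        if (fork() == 0) {          /* child: handle this one connection */
            close(listen_fd);
            handle(conn);
            close(conn);
            _exit(0);
        }
        close(conn);                /* parent: go back to accept() */
    }
}
```

Passing port 0 to open_listener() asks the kernel to pick any free port, which is handy for experiments; a real service would pass its well-known port instead.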
At a lower level, handled by the kernel itself, is the initial connection setup. A TCP connection is established when a sender sends a packet, called SYN, to an IP address and port. The TCP/IP implementation on the receiving host then replies with a SYN/ACK (or RST, if no service is listening on that port), and finally the sender responds to the SYN/ACK with an ACK. These opening packets are used to agree upon initial parameters of the connection, such as sequence numbers. The point is that the daemon listening on the IP address/port pair will only "awaken" after the connection is open, that is, after the final ACK. It should also be noted that for each IP/port pair, there are two connection queues. The first holds partially formed incoming connections (a SYN has arrived but the handshake is not yet finished), and the second holds completed connections waiting to be handed to the application that issued the listen() system call.
At this point, we can envision an operating system that offers three
services (say telnet, finger, and rlogin) by virtue of three daemons
bound to three well known ports. This is inefficient, as daemons consume
system resources and only occasionally perform useful work. Someone got
the bright idea that a daemon could be written that listened on a large
number of ports, and then "started" a new daemon when a connection arrived
on one of these ports. Unix implementations commonly have one of these,
named inetd. Inetd has a configuration file that tells it what ports to
listen on, and what to do when a connection arrives on one of those ports.
Since inetd gives us this sort of flexibility, why would we ever do things
the other way? The answer is "speed". Inetd is not the fastest way to start
up a service handler. Anyway, the point here is that some services are
started upon demand by inetd, while others are started by their own daemons
in their own startup scripts.
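Each line of inetd's configuration file names a service, a socket type, a protocol, a wait/nowait flag, the user to run as, and the server program with its arguments. As a rough illustration of how such a line breaks down, here is a small C parser for the first six fields. The struct and function names are made up, the field sizes are arbitrary, and the sample line in the note below is illustrative rather than copied from any particular system.

```c
#include <stdio.h>
#include <string.h>

struct inetd_entry {
    char service[32];       /* service name, e.g. "telnet" */
    char socket_type[16];   /* "stream" or "dgram" */
    char protocol[16];      /* "tcp" or "udp" */
    char wait_flag[16];     /* "wait" or "nowait" */
    char user[32];          /* user the server runs as */
    char server[128];       /* path of the program to start */
};

/* Parse the first six whitespace-separated fields of one line.
   Returns 1 on success, 0 for comments, blank lines, or short lines. */
int parse_inetd_line(const char *line, struct inetd_entry *e)
{
    line += strspn(line, " \t");        /* skip leading whitespace */
    if (line[0] == '#')
        return 0;                       /* comment line */
    int n = sscanf(line, "%31s %15s %15s %15s %31s %127s",
                   e->service, e->socket_type, e->protocol,
                   e->wait_flag, e->user, e->server);
    return n == 6;
}
```

For a line such as "telnet stream tcp nowait root /usr/sbin/in.telnetd in.telnetd", the parser fills in service "telnet", protocol "tcp", user "root", and server "/usr/sbin/in.telnetd". Commenting the line out with a leading # is the usual way to stop offering that service.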
The above paragraph implies that we control what services our computer offers by editing the startup scripts (using /sbin/chkconfig) and by editing the inetd configuration file (/etc/inetd.conf). This is not how folks on the net catalog our service offerings, and we should learn from them. They will "port scan" our computer. Port scanning is usually as simple as "testing" each port in ascending order to see if a service is offered. It's this "testing" that has undergone significant evolution in the past few years. In the old days, scanners used to actually try to connect() to each IP address/port pair in sequence, which is easily detectable by sysadmins. (If you are keeping track of who telnets to your host, which you should do, you will see these scan attempts.) Now, it's understood that you can test whether a TCP/IP service is offered without actually entering into a "valid" TCP/IP transaction. These "stealth" scans are more difficult to detect.
You have done the first part of your job as a system administrator: you have audited and tailored the services that you offer. You would now like to restrict who you offer those services to. "Who", in this context, means which ranges of IP addresses. So our mission is to implement access control, and our criteria for deciding if a connection should be allowed are: source and destination IP addresses, and source and destination ports. (There are other, more obscure criteria that we could use.) There are many ways in which to implement this sort of access control, or packet filtering.
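To show what a decision based on those four criteria might look like, here is a toy rule matcher in C. It is a sketch of the idea only, not how any real packet filter is configured or implemented; the struct layout, the "first match wins" policy, and the "deny when nothing matches" default are all assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

struct filter_rule {
    uint32_t src_net, src_mask;   /* source address range */
    uint32_t dst_net, dst_mask;   /* destination address range */
    uint16_t src_port;            /* 0 matches any source port */
    uint16_t dst_port;            /* 0 matches any destination port */
    int allow;                    /* 1 = accept, 0 = reject */
};

/* Return the verdict of the first rule that matches the connection.
   If no rule matches, the connection is denied. */
int filter_allows(const struct filter_rule *rules, size_t n,
                  uint32_t src, uint16_t sport,
                  uint32_t dst, uint16_t dport)
{
    for (size_t i = 0; i < n; i++) {
        const struct filter_rule *r = &rules[i];
        if ((src & r->src_mask) != r->src_net)
            continue;
        if ((dst & r->dst_mask) != r->dst_net)
            continue;
        if (r->src_port != 0 && r->src_port != sport)
            continue;
        if (r->dst_port != 0 && r->dst_port != dport)
            continue;
        return r->allow;
    }
    return 0;
}
```

A single rule with source network 130.191.0.0, source mask 255.255.0.0, any destination, and destination port 23 would let campus hosts telnet in while refusing the same connection from 199.105.15.33.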
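#include <string.h>

/* Deliberately unsafe: shown to illustrate the overflow. */
void do_something( char* argument )
{
    char buffer[100];            /* fixed-size automatic (stack) variable */
    strcpy( buffer, argument );  /* no bounds check: copies until the NUL */
    /* do something else */
    /* ... */
}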
The variable "buffer" is a C automatic variable, allocated on the
stack. It has a fixed size (100 bytes) and C does no bounds checking.
What happens if "argument" is 100 characters or longer? The answer is
that part of the stack will be overwritten. What if we carefully fill
"argument" with enough characters to overrun the buffer, then a new return
address, and then binary data that is actually executable instructions
for that address to point at? The routine do_something will "return", and
our carefully crafted bomb will be executed by the host. What if
do_something was inside of an ftp server daemon running with a UID of 0?
The answer is that our computer would now be "owned" by anyone on the net
who knew how to exploit this vulnerability.
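For contrast, here is one way the copy could be written so that an over-long argument is rejected instead of overwriting the stack. This is a sketch of the general defensive idea (check the length before copying), not a patch to any real daemon; the function name copy_argument is made up.

```c
#include <string.h>

/*
 * Copy `argument` into `buffer` only if it fits, including the
 * terminating NUL. Returns 0 on success, -1 if it would overflow,
 * in which case `buffer` is left untouched.
 */
int copy_argument(char *buffer, size_t size, const char *argument)
{
    size_t len = strlen(argument);
    if (len >= size)
        return -1;                      /* too long: refuse */
    memcpy(buffer, argument, len + 1);  /* copy the string and its NUL */
    return 0;
}
```

With a 100-byte buffer, a 99-character argument is copied and a 100-character argument is refused. The caller then has to decide what to do with the refusal, which is the step the vulnerable version skips.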
Still sceptical? Check out the exploits database at www.securityfocus.com.
Denials of service. Let's say that we don't want to break in, we just want to render a host incapable of offering service to others. A classic old example of this is the "SYN flood" attack. We saw above that the operating system maintains a queue of partially formed incoming connections. What if we just fill this queue (by sending repeated SYNs to the IP/port pair and never completing the handshake)? The answer is that additional requests will be refused.
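The effect of a full queue can be seen in a toy model. The C sketch below simulates a fixed number of slots for partially formed connections; it is an illustration of the bookkeeping only, with invented names, and does not send or receive any packets.

```c
struct syn_queue {
    int capacity;   /* slots available for half-open connections */
    int pending;    /* SYNs received whose handshake is not finished */
};

/* A SYN arrives: take a slot if one is free.
   Returns 1 if the request was queued, 0 if it was refused. */
int syn_arrives(struct syn_queue *q)
{
    if (q->pending >= q->capacity)
        return 0;
    q->pending++;
    return 1;
}

/* A handshake completes or times out: free its slot. */
void slot_released(struct syn_queue *q)
{
    if (q->pending > 0)
        q->pending--;
}
```

With a capacity of 3, the first three calls to syn_arrives() succeed and the fourth is refused. A legitimate client gets in only after slot_released() frees a slot, and an attacker who keeps sending SYNs can take that slot again right away.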
Summary: Dr. Smith needs to run the application foobar under Linux, and it will only be used from the console. What should I do? Answer: use /sbin/chkconfig and turn off the following services (many of which are off by default):
Summary 2: I'd like to use Linux at home to connect to a cable modem and provide http and anonymous ftp service to the Internet. What should I do?
General RedHat
The official RedHat Training site.
Get RHCE (RedHat Certified Engineer) training.
View the RHCE pre-requisites or the RHCE content outline.
The RedHat manuals in HTML and PDF.
Alpha Linux
The official Alpha Linux site, with a useful document on SRM installs.
Compaq's Linux site, with manuals for installing RedHat/SUSE on Compaq hardware.
Netscape Navigator for Alpha Linux.