Chapter 2 Software Basics
A program is a set of computer instructions that perform a particular task. That program can be written in assembly language, a very low-level computer language, or in a high-level, machine-independent language such as the C programming language. An operating system is a special program which allows the user to run applications such as spreadsheets and word processors. This chapter introduces basic programming principles and gives an overview of the aims and functions of an operating system.
2.1 Computer Languages
2.1.1 Assembly Languages
The instructions that a CPU fetches from memory and executes are not at all understandable to human beings. They are machine codes which tell the computer precisely what to do. For example, the hexadecimal number 0x89E5 is an Intel 80486 instruction which copies the contents of the ESP register to the EBP register. One of the first software tools invented for the earliest computers was the assembler, a program which takes a human readable source file and assembles it into machine code. Assembly languages explicitly handle registers and operations on data, and they are specific to a particular microprocessor. The assembly language for an Intel X86 microprocessor is very different from the assembly language for an Alpha AXP microprocessor. The following simplified, Alpha AXP style assembly code (its mnemonics are illustrative rather than exact Alpha instructions) shows the sort of operations that a program can perform:
ldr r16, (r15) ; Line 1
ldr r17, 4(r15) ; Line 2
beq r16,r17,100 ; Line 3
str r17, (r15) ; Line 4
100: ; Line 5
The first statement (on line 1) loads register 16 from the address held in register 15. The next instruction loads register 17 from the next location in memory, 4 bytes further on. Line 3 compares the contents of register 16 with those of register 17 and, if they are equal, branches to label 100. If the registers do not contain the same value then the program continues to line 4, where the contents of r17 are saved into memory. If the registers do contain the same value then no data needs to be saved. Assembly level programs are tedious and tricky to write and prone to errors. Very little of the Linux kernel is written in assembly language; the parts that are written in it are there only for efficiency, and they are specific to particular microprocessors.
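To make the idea of machine code more concrete, the 0x89E5 instruction mentioned earlier in this section can be picked apart by hand. The sketch below is illustrative only and assumes 32-bit operand size, where the opcode byte 0x89 means "move a register into a register or memory operand" and the following byte (called ModRM) names the two operands. The names decode_modrm and reg32_name are hypothetical helpers invented for this example.

```c
#include <stdint.h>

/* 32-bit x86 register names, indexed by their 3-bit encoding. */
static const char *const reg32_names[8] = {
    "EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"
};

/* The three bit fields packed into a ModRM byte. */
struct modrm {
    unsigned mod; /* bits 7..6: 3 means "both operands are registers" */
    unsigned reg; /* bits 5..3: source register for opcode 0x89 */
    unsigned rm;  /* bits 2..0: destination register when mod == 3 */
};

/* Hypothetical helper: split a ModRM byte into its fields. */
struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 0x3u;
    m.reg = (byte >> 3) & 0x7u;
    m.rm  = byte & 0x7u;
    return m;
}

/* Hypothetical helper: map a 3-bit register number to its name. */
const char *reg32_name(unsigned number)
{
    return reg32_names[number & 0x7u];
}
```

Decoding the byte 0xE5 this way gives mod = 3, reg = 4 (ESP) and rm = 5 (EBP), which matches the description of the instruction as a copy from ESP to EBP. An assembler performs the reverse mapping, from readable text to these bit patterns.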
2.1.2 The C Programming Language and Compiler
Writing large programs in assembly language is a difficult and time-consuming task. It is prone to error and the resulting program is not portable, being tied to one particular processor family. It is far better to use a machine-independent language like C. C allows you to describe programs in terms of their logical algorithms and the data that they operate on. Special programs called compilers read the C program and translate it into assembly language, generating machine-specific code from it. A good compiler can generate assembly instructions that are very nearly as efficient as those written by a good assembly programmer. Most of the Linux kernel is written in the C language. The following C fragment:
if (x != y)
x = y ;
performs exactly the same operations as the previous example assembly code. If the contents of the variable x are not the same as the contents of the variable y then the contents of y will be copied to x. C code is organized into routines, each of which performs a task. Routines may return any value or data type supported by C. Large programs like the Linux kernel comprise many separate C source modules, each with its own routines and data structures. These C source code modules group together logically related functions, such as the filesystem handling code.
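The correspondence between the C fragment and the earlier assembly listing can be sketched as a complete routine. This is only an illustration: the routine name copy_if_different is hypothetical, and it assumes that x and y are adjacent 4-byte integers in memory, as the assembly example implies. The comments map each C statement onto the numbered assembly lines.

```c
/*
 * mem points at two adjacent integers: mem[0] plays the role of x
 * (the address held in r15) and mem[1] plays the role of y.
 * Returns 1 if a store was needed, 0 otherwise.
 */
int copy_if_different(int *mem)
{
    int x = mem[0];      /* Line 1: load the first value  */
    int y = mem[1];      /* Line 2: load the next value   */

    if (x != y) {        /* Line 3: skip the store if equal */
        mem[0] = y;      /* Line 4: save y over x          */
        return 1;
    }
    return 0;            /* Line 5: label 100, nothing to save */
}
```

Calling the routine on a pair such as {1, 2} leaves both elements equal to 2, and a second call on the same pair stores nothing.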
C supports many types of variables. A variable is a location in memory which can be referenced by a symbolic name. In the above C fragment, x and y refer to locations in memory. The programmer does not care where in memory the variables are put; it is the linker (see below) that has to worry about that. Variables can hold different sorts of data: some contain integers or floating point numbers, and others are pointers.
Pointers are variables that contain the address (the location in memory) of other data. Consider a variable called x, which might live in memory at address 0x80010000. You could have a pointer, called px, which points at x. px might live at address 0x80010030. The value of px would be 0x80010000: the address of the variable x.
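The relationship between x and px can be sketched in a few lines of C. The routine names below are hypothetical, and the actual addresses printed will differ from the 0x80010000 and 0x80010030 used in the text, since the compiler, linker and operating system decide where variables live.

```c
#include <stdio.h>

/* Store a value into whatever integer px points at. */
void set_through_pointer(int *px, int value)
{
    *px = value;
}

/* Returns the final value of x after it is modified via px. */
int pointer_demo(void)
{
    int x = 1;
    int *px = &x;   /* px holds the address of x */

    /* Show where x lives, what px contains, and where px itself lives. */
    printf("address of x:  %p\n", (void *)&x);
    printf("value of px:   %p\n", (void *)px);
    printf("address of px: %p\n", (void *)&px);

    set_through_pointer(px, 42);   /* changes x, not px */
    return x;
}
```

The first two printed addresses are always the same, because the value of px is the address of x. The third is different, because px is itself a variable with its own location in memory.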
C allows you to bundle together related variables into data structures. For example,
struct {
int i ;
char b ;
} my_struct ;
is a data structure called my_struct which contains two elements: an integer (typically 32 bits of data storage) called i and a character (8 bits of data) called b.
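How much memory such a structure occupies depends on the compiler and the processor. The following sketch uses a named structure type, struct example, which is an assumption made for illustration (the structure in the text has no type name). It reports the structure's size and the position of each element using the standard sizeof operator and offsetof macro.

```c
#include <stddef.h>
#include <stdio.h>

/* Same two elements as my_struct in the text, given a type name. */
struct example {
    int  i;
    char b;
};

/* Total storage used by one struct example, including any padding. */
size_t example_size(void)
{
    return sizeof(struct example);
}

/* Print where each element sits inside the structure. */
void print_layout(void)
{
    printf("size of struct: %zu bytes\n", sizeof(struct example));
    printf("offset of i:    %zu\n", offsetof(struct example, i));
    printf("offset of b:    %zu\n", offsetof(struct example, b));
}
```

On many systems the reported size is larger than the 5 bytes that the two elements need, because the compiler adds padding so that the integer stays suitably aligned when structures are placed one after another.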
2.1.3 Linkers
Linkers are programs that link together several object modules and libraries to form a single, coherent program. Object modules are the machine code output from an assembler or compiler; they contain executable machine code and data, together with information that allows the linker to combine the modules to form a program. For example, one module might contain all of a program's database functions and another module its command line argument handling functions. Linkers fix up references between these object modules, where a routine or data structure referenced in one module actually exists in another module. The Linux kernel is a single, large program linked together from its many constituent object modules.
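A reference that the linker has to fix up can be sketched as follows. For brevity the example is shown as one listing, with comments marking where each part would live if it were split into separate source files. The file names and the routines run and count_flags are hypothetical.

```c
/* Would live in args.h: a declaration shared by both modules. */
int count_flags(int argc, char **argv);

/*
 * Would live in main.c. The compiler only sees the declaration above,
 * so the object module records an unresolved reference to count_flags.
 */
int run(int argc, char **argv)
{
    return count_flags(argc, argv);
}

/*
 * Would live in args.c: the command line argument handling module.
 * Counts the arguments that begin with '-'.
 */
int count_flags(int argc, char **argv)
{
    int count = 0;
    for (int n = 1; n < argc; n++) {
        if (argv[n][0] == '-')
            count++;
    }
    return count;
}
```

With the code split across main.c and args.c, a typical toolchain would compile each file to an object module (for example with cc -c) and then link the object modules together. The linker is what connects the call in one module to the definition in the other.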
2.2 What is an Operating System?
Without software a computer is just a pile of electronics that gives off heat. If the hardware is the heart of a computer then the software is its soul. An operating system is a collection of system programs which allow the user to run application software. The operating system abstracts the real hardware of the system and presents the system's users and its applications with a virtual machine. In a very real sense the software provides the character of the system. Most PCs can run one or more operating systems and each one can have a very different look and feel. Linux is made up of a number of functionally separate pieces that, together, comprise the operating system. One obvious part of Linux is the kernel itself; but even that would be useless without libraries or shells.
In order to start understanding what an operating system is, consider what happens when you type an apparently simple command:
$ ls
Mail c images perl
docs tcl
$
The $ is a prompt put out by a login shell (in this case bash). This means that it is waiting for you, the user, to type some command. Typing ls causes the keyboard driver to recognize that characters have been typed. The keyboard driver passes them to the shell, which processes that command by looking for an executable image of the same name. It finds that image in /bin/ls. Kernel services are called to pull the ls executable image into virtual memory and start executing it. The ls image makes calls to the file subsystem of the kernel to find out what files are available. The filesystem might make use of cached filesystem information or use the disk device driver to read this information from the disk. It might even cause a network driver to exchange information with a remote machine to find out details of remote files that this system has access to (filesystems can be remotely mounted via the Network File System, or NFS). Whichever way the information is located, ls writes that information out and the video driver displays it on the screen.
All of the above seems rather complicated, but it shows that even the simplest commands rely on an operating system that is in fact a co-operating set of functions which together give you, the user, a coherent view of the system.
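The step in which the shell asks the kernel to load and run an executable image can be sketched with the standard POSIX calls fork, execvp and waitpid. This is a minimal illustration of the idea and not how bash is actually implemented; the routine name run_command is hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run a command the way a very simple shell might.
 * argv[0] is the command name and the array ends with NULL.
 * Returns the command's exit status, or -1 on failure.
 */
int run_command(char *const argv[])
{
    pid_t pid = fork();            /* create a new process */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: search PATH for the image and start executing it. */
        execvp(argv[0], argv);
        _exit(127);                /* only reached if the image was not found */
    }

    /* Parent (the "shell"): wait for the command to finish. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing an argument array whose first element is "ls" and whose last element is NULL runs ls and returns once the listing has been printed, at which point a real shell would print its prompt again.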
2.2.1 Memory Management
With infinite resources, for example memory, many of the things that an operating system has to do would be redundant. One of the basic tricks of any operating system is the ability to make a small amount of physical memory behave like rather more memory. This apparently large memory is known as virtual memory. The idea is that the software running in the system is fooled into believing that it is running in a lot of memory. The system divides the memory into easily handled pages and swaps these pages onto a hard disk as the system runs. The software does not notice because of another trick, multi-processing.
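Dividing memory into pages amounts to simple arithmetic on addresses. The sketch below assumes a page size of 4096 bytes, which is a common choice but not something stated in the text; the macro and routine names are likewise chosen for illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                              /* assumed: 4096-byte pages */
#define PAGE_SIZE  ((uintptr_t)1 << PAGE_SHIFT)

/* Which page an address falls in. */
uintptr_t page_number(uintptr_t addr)
{
    return addr >> PAGE_SHIFT;
}

/* How far into its page an address lies. */
uintptr_t page_offset(uintptr_t addr)
{
    return addr & (PAGE_SIZE - 1);
}

/* How many whole pages are needed to hold the given number of bytes. */
uintptr_t pages_needed(uintptr_t bytes)
{
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
}
```

Under this assumption the address 0x80010030 lies in page 0x80010 at offset 0x30. Because the system tracks memory a page at a time, it can move an individual page out to disk and bring it back later without the software needing to know.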
2.2.2 Processes
A process can be thought of as a program in action; each process is a separate entity that is running a particular program. If you look at the processes on your Linux system, you will see that there are rather a lot of them. For example, typing ps shows the following processes on my system:
第二章 软件基础
程序就是一组执行特定任务的计算机指令。程序既可以用非常低级的计算机语言——汇编语言,也可以用高级的、独立于机器的语言如C语言来编写。操作系统是一种特殊的程序,它允许用户运行各种应用程序如制表程序和字处理程序。本章介绍基本的程序设计原理,并对操作系统的目标和功能做一综述。
2.1 计算机语言
2.1.1 汇编语言
CPU从内存中取出并运行的指令对人来说根本无法理解。它们是精确指示机器如何操作的机器代码。例如,十六进制数0x89E5是Intel 80486的一条指令,它指示把ESP寄存器的内容拷贝到EBP寄存器中。汇编器是最早发明的软件工具之一,它输入人类可以理解的源代码,汇编为机器代码。汇编语言显式地处理寄存器和数据操作,与特定的微处理器相关(应为与特定的处理器相关--译者注)。Intel X86微处理器的汇编语言就与Alpha AXP微处理器的汇编语言大相径庭。以下Alpha AXP汇编代码表示了程序可以进行的一种操作:
ldr r16, (r15) ; Line 1
ldr r17, 4(r15) ; Line 2
beq r16,r17,100 ; Line 3
str r17, (r15) ; Line 4
100: ; Line 5
第一条指令(见第一行)把寄存器15中存放的地址中的内容装入寄存器16。下一条指令把内存中下一个位置的内容装入寄存器17。第三行把寄存器16和寄存器17的内容比较,如果相等,则转向标号100处。如果两个寄存器包含数值不等,程序继续运行第四行,把寄存器17的内容存到内存。如果两个寄存器包含数值相等,那么没有数据需要保存。编写汇编语言程序枯燥乏味、技巧性强而且易于出错。Linux核心只有很少的一点用汇编语言编写,目的是为了效率,或者用在一些与特定处理器相关的地方。
2.1.2 C语言和编译器
用汇编语言编写大型程序十分困难而且消耗大量时间。这样做易于出错,得到的程序也无法移植,而被限制在特定的处理器族上。用独立于机器的语言如C,会好得多。C允许你用逻辑算法和其操作的数据结构来描述程序。称之为编译器的特定程序读入C程序,并把它翻译成汇编语言,生成相应的机器代码。好的编译器所产生的汇编指令的效率接近于好的汇编语言程序员编写的汇编语言程序。大部份Linux核心是用C语言编写的。以下的C片段:
if (x != y)
x = y ;
与前一个例子中汇编代码的操作完全相同。如果变量x和y的内容并不完全相同,就把y的内容拷贝给x。C代码组织为例程,每一个例程执行一个任务。例程可以返回C支持的任何数值或者数据类型。像Linux核心这样的大型程序包含很多独立的C源模块,每个模块都有自己的例程和数据结构。这些C源代码模块把像文件系统处理这样的逻辑功能代码组合在一起。(编者注:这里直译不太好理解,其实意思就是把一些相关模块组织起来,完成像文件系统这样的逻辑功能。)
C支持很多类型的变量。所谓变量,就是内存中的一个位置,可以用符号名字来引用。在以上C片段中,x和y指引了内存中的位置。程序员不关心变量究竟存放在内存中的何处,这是连接器(见下面所述)的任务。一些变量含有不同类型的数据,整数和浮点数,另一些则是指针。
指针就是包含地址——其它数据在内存中的位置——的变量。考虑叫做x的变量,它可能处于内存地址0x80010000。你可以有一个指针,叫做px,指向x。px可能处于地址0x80010030,而px的值是0x80010000,即变量x的地址。
C允许你把相关的变量绑在一起,形成数据结构。例如,
struct {
int i ;
char b ;
} my_struct ;
是一个叫做my_struct的数据结构,它包含两个元素:一个叫做i的整数(32位数据)和一个叫做b的字符(8位数据)。
2.1.3 连接器
连接器是一种程序,它可以把几个目标模块和库连接在一起,产生一个独立的、连贯的程序。目标模块是汇编器或编译器生成的机器代码输出,含有可执行的机器代码和数据,以及允许连接器把模块连接起来的信息。例如一个模块可能含有程序中所有的数据库函数,而另外一个则含有命令行参数处理函数。连接器负责解决目标模块之间的引用,例如一个模块中引用的例程或数据结构事实上在另外一个模块之中。Linux核心就是一个与很多成员目标模块连接在一起的独立的大程序。
2.2 什么是操作系统?
没有软件的计算机就是一堆发热的电子器件。如果说硬件是计算机的核心,那么软件就是计算机的灵魂。所谓操作系统,就是允许用户在其上运行应用软件的一组系统程序。操作系统对系统的真正硬件进行抽象,向系统的用户和应用程序给出一个虚拟机。在很现实的意义上说,软件提供了系统的特点。绝大部份PC能运行一个或多个操作系统,每一个操作系统都有一个完全不同的外观和风格。Linux是由一批功能上分离的部件组成,其中明显的一个是核心本身。但是即使是核心,如果脱离库和外壳程序(编者注:其实就是Shell程序,这里编者认为不译最好,但考虑到不少材料都将其译成外壳程序,也为了尊重原作者故而未做改动。另外本文不少地方将Kernel译为核心,其实就是常说的内核)也是没有用的。
为了开始理解什么是操作系统,请考虑当你敲入以下的简单命令时会发生的情况:
$ ls
Mail c images perl
docs tcl
$
这里$是由登录外壳程序(在此例为bash)给出的提示符。这意味着它在等待你——用户——敲入命令。敲入ls后,键盘驱动程序识别出已经有字符输入。键盘驱动程序把这些字符传给外壳程序,外壳程序则通过寻找可执行程序的映像来处理这个命令。它在/bin/ls发现了映像,于是调用核心服务来把ls可执行程序的映像拖入虚拟内存,并开始执行。ls的映像调用核心的文件子系统,以找出有哪些文件可以获得。文件系统有可能要充分使用放在被缓存的文件系统信息或者用磁盘驱动程序从磁盘读出这些信息,甚至可能用网络驱动程序与远程机器交换信息,以找出本系统能够存取的远程文件的细节(文件系统可以通过网络文件系统或NFS来远程挂载)。无论是用哪种方式定位信息,ls都会把信息写出来,由视频驱动程序把它显示在屏幕上。
以上看起来很复杂,但是说明了一个道理:即使是最简单的命令,也需要相当的处理,操作系统事实上是一组互相合作的函数,它们在整体上给用户以一个系统的完整印象。(以上一句是根据译者理解翻译的,未必忠实于原文——译者注)
2.2.1 内存管理
如果有无限的资源,例如内存,很多操作系统需要做的事情都是多余的。操作系统的一个基本技巧是使一小块内存看起来像很多内存。这种表面上看起来大的内存称为虚拟内存。其思想是使系统中运行的软件以为它在很多内存上运行。系统把内存分成很多容易处理的页面,在系统运行的时候,把一些页面交换到硬盘上。由于另外的一个技巧——多道处理,软件注意不到这一点。
2.2.2 进程
进程可以想象为一个活动中的程序。每一个进程是一个独立的实体,在运行一个特定程序。如果你看看你的Linux系统中的进程,你就会发现一大堆。例如,在我的系统中敲入ps可以显示如下进程: