Excerpts of
Windows Fan, Linux Fan
( Second Edition )
Fore June


Appendix B      The Road to Freedom


Since Linux Fan and I became friends, I have learned a lot about Linux from
him; he taught me how to use the OS to run a business that provides
Internet services. As discussed in the text, Linux is a free computer
operating system created by tens of thousands of volunteer programmers 
around the world. Developed under the GNU General Public License, the 
source code for Linux is freely available to everyone. Because of its
robustness and security, its popularity among computer users has
gained momentum in recent years. Its use ranges from desktop applications
to movie animation to sophisticated distributed computing. With Linux,
you can do at zero software cost a task that could cost you thousands or
even millions of dollars in MS Windows. Moreover, your chance
of being infected by a virus is a lot smaller, as most Linux applications are
written to well-studied standards.
    Some may argue that Linux is more difficult to use than MS Windows
and that, consequently, it is not worthwhile to spend a significant amount of
time studying Linux to save a few thousand dollars. It is unfortunate that many
people are not aware that by making the effort to learn something, you add
value to yourself; the more difficult the material, the more valuable it is.
There's not much difference in productivity between a worker who has 
worked in a fast food restaurant for a month and one who has worked for ten 
years. However, there's a huge difference for a corresponding pair working 
on the design of microprocessors. It is true that it is very easy to install an
application in Windows; all you need to do is click an icon and the
installer takes care of the rest. But you do not know what it is doing, and
you do not know whether it is vulnerable to virus attack or whether the
manufacturer has used it to collect your personal data. You gain no knowledge
from the installation process. On the other hand, each time you install a Linux
application, you learn something new. If a company has created an 
environment where every employee can add value to herself, it adds value to 
itself too. Linux Fan observed that the emergence of Linux might create a 
new economic model. In recent years, the world has been evolving into an
information-centric community. The transition is not totally painless. Tens
of thousands of professionals who once thought their college degrees earned 
them a measure of security find themselves financially strained and 
emotionally exasperated as jobs continue to evaporate. Many of them work
part time in grocery or hardware stores earning minimum wage and make
searching for work their full-time job, only to find that their old jobs are
gone forever. Here is a much better alternative. Instead of working full time
to search for jobs, Linux Fan suggested that they could learn Linux and use 
it to serve others, which could generate a small amount of income at the 
beginning and eventually a self-paid high salary after they have added 
enough value to themselves. 
    If you are interested in starting a business to provide Internet services, you
must first convince yourself of what Linux Fan and I believe: there is an
unlimited variety of services that you can provide via the Internet. But
regardless of the services you offer, there are some basics that you will
almost always need to provide on your site. Readers may be curious about how long
it will take to master all these basics. It really depends on your background 
and how much time you plan to spend on your study. Suppose you do this on 
a part-time basis, spending about three to four hours a day on the project. If
you have a university degree in science, are familiar with a contemporary
programming language like C++, and have taken a couple of core computer
science courses such as data structures and file systems, it may take about
three to four years to master the techniques; if your degree is in Computer
Science, it may take about two to three years. You may protest, "That's too
long. I can build a sophisticated site utilizing available commercial software 
in a much shorter time." True. If you are working for a company, it is not a 
bad idea to ask your company to purchase expensive software applications 
so that you can take the short route. However, if you want to start your own 
business and want to be successful like Linux Fan, taking the slow route may
be a better approach. First, there's always a trade-off between the ease of use
and the functionality of an application; VISUAL BASIC is easy to learn
and use but you can never do sophisticated programming with it. Second, by
making the effort to learn something, you add value to yourself; the more
difficult the material, the more valuable it is. Third, after mastering the
techniques, you can easily customize your site and add innovative features to
it. You can build on top of what you know and further extend your
knowledge; this would steepen the learning curve and make it more difficult for
your competitors to catch up. Fourth, the cost of commercial applications
may become a heavy burden to you and substantially raise the risk level of 
your business; you may not even be able to survive the first wave of 
competition. 
    Once you have mastered the basics, you become a free person. You 
control your destiny and work for your mission. Of course, you do not need 
to wait until you have learned all the techniques. You can start with a small 
site, serving a few customers at the beginning. As time goes on, you will
become more experienced and knowledgeable. You can then make your system
more secure and robust or you can add more features to it and serve more 
customers. 
 
Apache Server
     
    I shall start by discussing how to build a web server. I assume
that you know the basics of Linux already.
     As one can easily see from the Netcraft survey 
(http://www.netcraft.com/Survey/), the most widely used web server is the 
Apache HTTP Server (http://www.apache.org/). Derived from the popular 
NCSA httpd server, Apache dominates the web, currently accounting for 
about 60% of active web sites. It is distributed along with many
Linux distributions and is in general installed by default. However, it is
always better to download the latest version from http://www.apache.org or
its mirror sites. The installation is a learning process. You will then know
what's going on and can upgrade or configure it with ease in the future. 
    To handle all the packages systematically, you may create a directory, say
'/download', to hold all the downloaded distributions. You can then create a
specific directory and unpack the package into it with the following command:

     gunzip -c /download/package_name.tar.gz | tar xvf -

This command unpacks the package into your working directory without
changing the original downloaded package. If something goes wrong, you
can start all over again. In general, an unpacked package includes a
'README' file and an 'INSTALL' file. You can simply follow the instructions
in those files to install the package. In most cases, it is fairly straightforward.
There are many books discussing the configuration and administration of an
Apache server (see, for example, Linux Apache Web Server Administration
(Linux Library) by Charles Aulds, Sybex, Nov. 2000), which can be
purchased online or from a local bookstore. The Apache Software
Foundation (http://www.apache.org/) web site contains plenty of information
about the server and related projects.
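The unpacking step above can also be scripted. The following is a minimal sketch in Python (not a tool the text itself uses); the function name 'unpack' and its arguments are made up for illustration. Like the gunzip command, it leaves the downloaded archive untouched, so you can start over if something goes wrong.

```python
import tarfile
from pathlib import Path


def unpack(archive, workdir):
    """Extract a .tar.gz archive into workdir without changing the archive.

    Returns the member names, much like the listing 'tar xvf' prints.
    Only unpack archives that come from a source you trust.
    """
    Path(workdir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(workdir)
    return names
```

For example, unpack('/download/package_name.tar.gz', 'work') would place the package tree under 'work', where you can then read its README and INSTALL files.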
    After you have learnt the HTTP basics, you have to learn about HTTPS,
which utilizes SSL (Secure Sockets Layer) to transmit data in a secure way;
this prepares you to conduct secure e-commerce (see
http://www.openssl.org). These are the foundations of your knowledge of the
Apache server, and after you have acquired the basic concepts, you should be
able to construct a fancy personal web site. However, this knowledge is not
enough to build a serious commercial site.
     One crucial topic about Apache that you need to learn is writing modules to
extend it; there's a good book on this by Stein and MacEachern (L.
Stein and D. MacEachern, Writing Apache Modules with Perl and C,
O'Reilly & Associates, 1999). Writing Apache modules lets you go beyond
simple CGI scripting; Apache modules provide performance many times
greater than the fastest conventional CGI scripts. By utilizing the Apache
API, you can make your modules memory-leak proof. You can develop
Apache modules to process images, make secure transactions, stream
data or add whatever innovative features you can think of. One may also
achieve these functionalities using Java Servlets, but the Apache modules
approach gives you much better performance and robustness. It may take
you about six months to master the Apache basics.


PHP Programming 

     The next topic that you want to master is PHP scripting 
(http://www.php.net). When I was an ISP (Internet Service Provider), I
started out using ASP (Active Server Pages) to do server-side scripting, but I
eventually gave it up because of the cumbersome syntax and limited
support in environments beyond Windows. I later switched to PHP
scripting and found it a better and more powerful language for web
programming. One nice feature of it is that you can use classes to construct
web pages. Very often, beginners tend to use functions to accomplish all the 
work with one file containing one function; a directory may contain a few 
hundred files. Scripts written in this way make tracing, debugging and 
maintenance very difficult. A better way is to group related functions into a
class and make use of inheritance to organize your class structures in a
comprehensive way. On the other hand, you should not make your classes too
large, as that will slow down your server and consume substantial
resources. Another alternative to ASP and PHP is JSP (Java Server Pages)
scripting. However, JSP needs to work with Java Servlets to realize its power.
This makes JSP less convenient to use and gives it worse performance. There's a
tendency for large independent web sites like yahoo.com to standardize their 
development using PHP. 
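The idea of grouping page-building functions into classes and using inheritance can be sketched as follows. The sketch is in Python rather than PHP, and the class names 'Page' and 'AccountPage' are invented for illustration; the same structure carries over to PHP classes.

```python
from html import escape


class Page:
    """Base class: the layout shared by every page on the site."""

    title = "Untitled"

    def header(self):
        return "<html><head><title>%s</title></head><body>" % escape(self.title)

    def body(self):
        return ""

    def footer(self):
        return "</body></html>"

    def render(self):
        return self.header() + self.body() + self.footer()


class AccountPage(Page):
    """A concrete page overrides only the parts that differ."""

    title = "Your account"

    def __init__(self, user):
        self.user = user

    def body(self):
        # escape() keeps user-supplied text from being read as markup
        return "<p>Welcome back, %s.</p>" % escape(self.user)
```

Each new page then needs only a small subclass, and a change to the shared header or footer is made in one place instead of in a few hundred files.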
    Writing PHP scripts is relatively easy if you already know C. It may take
you one week to two months to become proficient in writing PHP
programs.
 
Qmail  
 
By now you are able to do significant work on web programming. It is time 
for you to learn to build and extend an email server. Almost all Internet 
Service Providers provide some kind of email service. My favorite email
server is Qmail (http://www.qmail.org), which is a modern email server with
robust and secure features. It is written in a modular way so that users can
easily extend or modify its functions. In Qmail, the mail-sending and
mail-receiving servers are decoupled and work independently. However, it does
require substantial effort to master. If you can
successfully set up a useful email system using Qmail, you know how email
works and you know what you are doing. Moreover, to use it for commercial
purposes, you will most likely need to make modifications to it. In many
cases, you may want to integrate your email system with your database so 
that you can utilize the advanced features of a contemporary database to 
search or to authenticate users. 
     There are two files that you may want to modify. One is the 
checkpassword.c program, which is used to authenticate users when they 
retrieve emails; you can easily modify it to authenticate a user against a 
database instead of a file. Another program that you may need to modify is
qmail-smtpd.c, which accepts the emails that users submit over SMTP, including
those they send out through your server. The original
program does not require users to authenticate before sending an email; it
only checks whether the user's IP address is allowed to relay emails; if it is, the
user can send emails; otherwise the request is denied. This becomes
impractical if you have users coming from many different places. Therefore,
you may want to modify this file so that login-password authentication is
required when a user wants to send emails via your server. Of course, after
the modification, you need to inform your users that they need to set their
mail clients, such as Outlook Express, to authenticate when sending
emails. The modification requires some hacking of the package but it is not
too difficult to do. By hacking the files, you will also learn more about the 
system and have a better understanding of how email servers work. Again 
you will add value to yourself. 
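To make the database-backed authentication concrete, here is a minimal sketch in Python (the real checkpassword is a C program). It assumes the checkpassword convention of receiving the login name, the password and a timestamp as three NUL-terminated fields. The 'users' table, its columns and the demo account are hypothetical, and SQLite stands in for whatever database you actually use. The sketch covers only the lookup, not the rest of what checkpassword does after a successful login.

```python
import hashlib
import hmac
import os
import sqlite3


def hash_password(password, salt):
    """Derive a salted hash so that plain passwords are never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100000)


def parse_credentials(raw):
    """Split a 'login NUL password NUL timestamp NUL' record."""
    fields = raw.split(b"\0")
    if len(fields) < 3:
        raise ValueError("malformed credential record")
    return fields[0].decode(), fields[1].decode()


def authenticate(db, login, password):
    """Return True only if the login exists and the password matches."""
    row = db.execute(
        "SELECT salt, pw_hash FROM users WHERE login = ?", (login,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    return hmac.compare_digest(hash_password(password, salt), stored)


def demo_db():
    """Build an in-memory database holding one hypothetical account."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE users (login TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB)"
    )
    salt = os.urandom(16)
    db.execute(
        "INSERT INTO users VALUES (?, ?, ?)",
        ("alice", salt, hash_password("s3cret", salt)),
    )
    db.commit()
    return db
```

The same authenticate() lookup could serve both programs discussed above: once for users retrieving mail, and once for the login-password check added before relaying.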
     You may also want to provide a web-based email service to your clients.
Such an application can be developed using PHP. It can be easy or difficult, 
depending on the features you want to provide with your web-based email 
service. A good reference on email programming is Programming Internet 
Email by David Wood, O'Reilly, August 1999. It may take you about eight 
months or longer to develop applications for competitive email services. If 
you want to take a shortcut, you may also use some available free web-based 
email packages, like vpopmail (http://inter7.com/vpopmail.html) or 
squirrelmail (http://www.squirrelmail.org). There's a site that rates all 
significant web-based email packages at
     http://www.hotscripts.com/PHP/Scripts_and_Programs/Email_Systems/Web-based_Email


PostgreSQL  
 
The next topic you want to learn is the use of a database. It is almost
impossible to build a commercial site without using a database engine.
MySQL (http://www.mysql.com) has been a popular open source database.
However, I recommend that you use the PostgreSQL database
(http://www.postgresql.org), which has transactions and a better security
model, though there are some issues with this package. Often a new
release is not one hundred percent backward compatible with old versions; each
time I upgrade, I have to make minor modifications to my scripts,
which really is a headache. Also, it seems that many versions have serious
memory leakage problems. Despite these defects, I still feel that PostgreSQL
is the best open source database available today and is powerful and secure
enough to do sophisticated tasks. For a comprehensive description of
PostgreSQL, you may refer to the book Practical PostgreSQL by J.C.
Worsley and J.D. Drake, O'Reilly, January 2002.
     A few years ago when the web began to emerge, the field was full of
fancy names like 'three tier model', 'middleware', and 'Object Request
Broker'. I was deceived by the names and tried to access a database using
middleware with fancy names like ODBC, JDBC, and Data Request
Broker. In the end, I found that none of these were necessary and that the more
layers you introduce into a system, the more errors you can introduce with them. It
is fine to use standards like ODBC and JDBC if you work in a big
company. But if you design, build and maintain the system yourself,
the middleware is not necessary. I later gave up all the
stuff with fancy names and accessed the PostgreSQL database via its native
C interface, which is a lot more straightforward and less error-prone. The
PHP module already has built-in functions to access a PostgreSQL database,
and that makes life even easier.
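As an illustration of talking to a database directly, with no middleware layer, and of why transactions matter, here is a small sketch in Python. It uses the SQLite driver from the Python standard library as a stand-in for PostgreSQL's native interface; the 'accounts' table and the names in it are invented for the example.

```python
import sqlite3


def demo_accounts():
    """Create an in-memory database with two hypothetical accounts."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    db.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)]
    )
    db.commit()
    return db


def transfer(db, src, dst, amount):
    """Move credit between two accounts as a single transaction."""
    with db:  # commits on success, rolls back if an exception escapes
        db.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        db.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        (balance,) = db.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
```

If the check fails, both updates are undone together, so the table never shows money leaving one account without arriving in the other.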
     If you have written a remote client program in Java and you don't want to 
use JDBC to access your web site database, your Java program can access it 
via a PHP script, which is also straightforward and simple. Or in case you 
need to transmit the data in large quantities, you can learn some socket 
programming (discussed below) and write a simple server in C to access the 
database; your remote Java client can 'talk' to your C server which 'talks' to 
the database. Of course, if you like, your remote Java client can also
communicate with an Apache module that you have developed to access
the database.
     It may take you three to nine months to master the use of PostgreSQL.


Java and Network Programming
  
After you have covered the above topics, you should have a reasonable web 
site that may be able to provide Internet Services to others. (Of course, I 
assume that you have also learnt some related minor topics such as HTML, 
Javascript and XML.) However, you cannot do much with it if you do not
enrich your programming skills. At this point, I recommend that you put
more effort into writing better programs so that you become proficient in both
C/C++ and Java. Most likely, you need to use C/C++ to develop server
programs and Java to develop client applications for your users. Though you
may use Java to develop server-side programs, in many cases C/C++ is a
better choice; C/C++ is more sophisticated and the programs thus developed
run significantly faster. On the other hand, Java is a better choice for
developing client programs. Java programs are platform independent and 
can be embedded in a browser as applets. 
     One crucial topic here is socket programming, which allows you to write 
a server program to communicate with clients at remote sites. For example, 
you can write a chat server in C/C++ and a chat client in Java as an applet. 
Your remote user uses a browser to start the chat client applet and sends
information to the chat server, which then broadcasts the data to all other chat
clients. At the same time you need to learn more about networking and become
proficient in configuring your name server. The following are some good 
references on this topic: 
1. Paul Albitz and Cricket Liu, DNS and BIND, Fourth Edition, 
O'Reilly, April 2001. 
2. W. Richard Stevens, Unix Network Programming, Volume 1, 
Second Edition, Prentice Hall, 1998. 
3. Neil Matthew and Rick Stones, Beginning Linux Programming, 
Wrox Press Ltd., 1996. 
4. Warren W. Gay, Linux Socket Programming by Example, Que, 
2000. 
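The chat server described above can be sketched in a few dozen lines. The sketch below is in Python rather than C/C++, purely to keep it short; the class name 'ChatServer' is invented, and a production server would also need message framing, logins and limits. It shows the core of the technique: one listening socket, many client sockets, and a loop that relays whatever one client sends to all the others.

```python
import selectors
import socket


class ChatServer:
    """Accept TCP clients and relay whatever one sends to all the others."""

    def __init__(self, host="127.0.0.1", port=0):
        # Port 0 asks the OS for any free port; a real service would fix one.
        self.listener = socket.create_server((host, port))
        self.selector = selectors.DefaultSelector()
        self.selector.register(self.listener, selectors.EVENT_READ)
        self.clients = set()

    @property
    def address(self):
        return self.listener.getsockname()

    def step(self, timeout=0.1):
        """Wait up to timeout seconds, then serve every socket that is ready."""
        for key, _events in self.selector.select(timeout):
            sock = key.fileobj
            if sock is self.listener:
                conn, _peer = sock.accept()
                self.selector.register(conn, selectors.EVENT_READ)
                self.clients.add(conn)
            elif sock in self.clients:
                try:
                    data = sock.recv(4096)
                except OSError:
                    data = b""
                if data:
                    self.broadcast(data, sender=sock)
                else:
                    self.drop(sock)  # empty read: the client went away

    def broadcast(self, data, sender):
        for peer in list(self.clients):
            if peer is not sender:
                try:
                    peer.sendall(data)
                except OSError:
                    self.drop(peer)

    def drop(self, sock):
        if sock in self.clients:
            self.clients.remove(sock)
            self.selector.unregister(sock)
            sock.close()

    def close(self):
        for sock in list(self.clients):
            self.drop(sock)
        self.selector.unregister(self.listener)
        self.listener.close()
        self.selector.close()
```

To run it as a service, create the server on a fixed port of your choice and call step() in an endless loop. Any client that can open a TCP connection, such as a Java applet, can then take part in the chat.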
     There are many good books on Java in the market. The following are a 
few that are appropriate for beginners. 
1. Cay S. Horstmann and G. Cornell, Core Java, Volume 1 & 2, Sun 
Microsystems Press, 1999. 
2. David M. Geary, Graphic Java, Mastering the JFC, Volume 1 &
2, Third Edition, Sun Microsystems Press, 1999.
3. Jacquie Barker, Beginning Java Objects, Wrox Press Ltd., 2000. 
     The official Java site, http://java.sun.com, contains the latest news and other
relevant information about Java. The site
http://jakarta.apache.org
provides information about server-side solutions for the Java platform.
     This process may take you three to twelve months.
 

Clustering  

Now with the knowledge you have gained, you may have built a fairly 
sophisticated commercial site. However, your learning process is not
complete and your site will not be very useful until you have learnt
clustering and related technologies. If you are serious about your business,
you may plan to serve millions of customers in the long run. This means that
you need many machines to accomplish your goal and you want your system
to be scalable and reliable. When more customers come, you simply add
more machines. Also, you want to have your system up 24 hours a day.
There are a few approaches to address this problem. A simple and effective 
way to accomplish this is to establish a virtual server, which actually 
consists of a cluster of machines. Effectively, the cluster of machines 
behaves as a single virtual server, which is exposed to end-users. When one 
of the machines is down, it is automatically deleted from the cluster. When a 
new machine is added, the cluster automatically includes it to help share the 
load. An end user will not know when a machine is deleted or added to the 
cluster. You can build a Linux Virtual Server (LVS) with Linux machines 
using the patches provided by the web site 
http://www.linuxvirtualserver.org 
which hosts the Linux Virtual Server Project. There is a special member 
called 'director' in an LVS system. A user first contacts the director, which 
directs the requests to a member in the cluster. Subsequently, the user may 
communicate directly or indirectly with the selected cluster machine. Of
course all of this is transparent to the user. She only sees a single virtual
server and thinks that she always communicates with a single machine. As
you may have noticed, the director could become the single point of failure of
the system. If it is down, a user will find that the system has ceased to
function. To maintain high availability, that is, to ensure that the whole system
still functions properly when any one node in the system fails, you may add a
redundant director to your system. You can then 'heartbeat' the two directors
so that when the primary fails, the secondary will take over its tasks
automatically. Details about high availability can be found at the site
http://www.linux-ha.org 
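The roles of the director and the heartbeat can be illustrated with a toy model. The following Python sketch is not how LVS is implemented (LVS does its scheduling inside the Linux kernel and offers several scheduling methods); it only mimics round-robin scheduling over the healthy machines and the takeover rule for a redundant director. The server names and the three-second timeout are assumptions made for the example.

```python
class Director:
    """Hand out real servers in round-robin order, skipping any that are down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._turn = 0

    def mark_down(self, server):
        """Take a failed machine out of rotation."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Add a new or repaired machine to the rotation."""
        if server not in self.servers:
            self.servers.append(server)
        self.healthy.add(server)

    def pick(self):
        """Return the next healthy server for an incoming request."""
        for _ in range(len(self.servers)):
            server = self.servers[self._turn % len(self.servers)]
            self._turn += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy real servers")


def active_director(last_heartbeat, now, timeout=3.0):
    """The secondary takes over once the primary has been silent too long."""
    return "primary" if now - last_heartbeat <= timeout else "secondary"
```

From the user's point of view nothing changes when mark_down() or mark_up() is called; requests simply keep being answered by whichever machines remain in rotation.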
     The remaining question is how to manage the files in your cluster. If
you need a sophisticated distributed file system with fault tolerance, you
may consider Coda, developed by CMU (http://www.coda.cs.cmu.edu). But
if your main concern is high availability, you may consider InterMezzo
(http://www.inter-mezzo.org), which is a file system with a focus on high
availability. You can use InterMezzo to replicate files across your servers. It
can be used for mobile computing, which means that you can develop your
scripts and programs on your own personal machine. After you have
thoroughly tested your scripts, you can connect your machine to the 
networked cluster; your new scripts will be replicated across the servers in 
the network. Or if you update a file in one machine, the changes will be 
propagated to all other members. This makes the maintenance and 
administration of your cluster a lot easier and less error-prone.  
     It may take you about six months to learn to build a highly available 
Linux Virtual Server cluster with InterMezzo deployed. 

Others  
 
After you have mastered all of the above, you are ready to provide reliable 
Internet Services to others. However, this is not the end of your learning 
process. What other topics you need to learn depends on your business. One
common feature you may need is streaming capability for your web site.
Data streaming in general utilizes the Real-time Transport Protocol (RTP) to
transmit data. Unlike general-purpose protocols such as HTTP or FTP, RTP is
designed to transmit media streams that have strict timing requirements.
Applications for data streaming can be conveniently developed using the Java
Media Framework (JMF) (see, for example, Linden DeCarmo, Core Java
Media Framework, Prentice Hall, 1999). You can learn more about data
streaming from the site http://www.real.com, which also provides a free basic
streaming server. Other relevant sites include
                             http://www.shoutcast.com
and 
                            http://www.icecast.org.  
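To see how RTP carries the timing information that HTTP and FTP lack, it helps to look at the fixed 12-byte header that precedes every RTP packet: a version number, a payload type, a sequence number, a media timestamp and a source identifier. The following Python sketch packs and unpacks that header; the function names are invented, and the optional padding, extension and contributing-source fields are simply left at zero.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (no padding, extension or CSRCs)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII",
        byte0,
        byte1,
        sequence & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )


def unpack_rtp_header(packet):
    """Decode the fixed header at the start of an RTP packet."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

The receiver uses the sequence number to detect lost or reordered packets and the timestamp to play the media back at the right pace.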
     Another feature that you may be interested in is text-to-speech synthesis
(TTS) and speech recognition. TTS basically means converting text to
speech. A useful open-source TTS package called 'flite' has been
developed by Carnegie Mellon University (CMU); see

     http://www.speech.cs.cmu.edu/flite/index.html,
     http://www.speech.cs.cmu.edu/flite/index.html/flite/flite.html,
     http://www.speech.cs.cmu.edu/hephaestus.html
     Concerning speech recognition, there's an open-source project called
'Sphinx' under way at CMU. It provides a collection of real-time speech
recognition engines, an acoustic model trainer, and documentation for
building related acoustic models.
     If you need to add standard encryption technologies to your service, you may
study OpenPGP, which was originally derived from PGP (Pretty Good
Privacy), first created by Phil Zimmermann in 1991. You may refer to the
site
http://www.openpgp.org
for further information. A Java implementation of OpenPGP can be found at 
http://www.cryptix.org
     If you want to use a simple XML-based protocol to let applications
exchange information over HTTP, you may study SOAP, which stands for
Simple Object Access Protocol. Currently, there are many ways for
applications to communicate. Well-known methods like DCOM and 
CORBA utilize Remote Procedure Calls (RPC) for objects to exchange 
information, which may give rise to compatibility and security problems. A 
better way to communicate between applications is over HTTP as HTTP is 
supported by all Internet browsers and servers. SOAP was created to 
accomplish this and provides a way to communicate between applications 
running on different operating systems, with different technologies and 
programming languages. You may refer to the sites 
http://ws.apache.org/soap/ 
http://www.w3.org/TR/SOAP/ 
for more information.
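A SOAP message is just an XML document with an Envelope element wrapping a Body element, which in turn holds the call and its parameters. The sketch below builds and reads such a message in Python using the SOAP 1.1 envelope namespace. The function names, the 'GetQuote' method, its namespace and its parameter are hypothetical; a real service defines its own.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_request(method, service_ns, params):
    """Wrap a method call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, "{%s}%s" % (service_ns, method))
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def read_body(document):
    """Return the call's qualified name and parameters from an envelope."""
    root = ET.fromstring(document)
    body = root.find("{%s}Body" % SOAP_ENV)
    call = body[0]
    return call.tag, {child.tag: child.text for child in call}
```

The bytes returned by build_request() would be sent as the body of an ordinary HTTP POST, which is why SOAP passes wherever HTTP does, regardless of the operating system or programming language on either end.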
     If you need to control your systems' traffic, refer to
                        http://www.lartc.org/howto
     After you have learned all these, you should have added a lot of value to 
yourself; you have become a free person and can work on something you 
feel significant and interesting. Some day, tell us your success stories.

