Peer-to-peer networks, such as those used for instant messaging, search engines, and data storage, have gotten a bad reputation because of legal and ethical questions about swapping copyrighted material on the Internet. The manager of the policy and networking group at the IBM T.J. Watson Research Center in New York describes the architecture of the hardware and software behind such networks, and defends the development of legitimate applications utilizing their assets of decentralization and resiliency. Annotation ©2004 Book News, Inc., Portland, OR
DINESH C. VERMA, PhD, is Manager of the Policy and Networking Group at the IBM T.J. Watson Research Center, New York. He received his doctorate from University of California, Berkeley, and holds more than fourteen patents in the area of computer networking. Dr. Verma is the author of Content Distribution Networks: An Engineering Approach (published by Wiley).
| | Preface | xi |
| | Who Will Benefit from This Book? | xii |
| | Who Is This Book Not For? | xiii |
| | Organization of the Book | xiii |
| 1 | The Peer-to-Peer Architecture | 1 |
| 1.1 | Distributed Applications | 1 |
| 1.1.1 | A Distributed Computing Example | 2 |
| 1.1.2 | Client-Server Architecture | 5 |
| 1.1.3 | Peer-to-Peer Architecture | 7 |
| 1.2 | The Peer-to-Peer Software Structure | 7 |
| 1.2.1 | Base Overlay Layer | 8 |
| 1.2.2 | Middleware Functions | 9 |
| 1.2.3 | Application Layer | 10 |
| 1.3 | Comparison of Architectures | 11 |
| 1.3.1 | Ease of Development | 12 |
| 1.3.2 | Manageability | 12 |
| 1.3.3 | Scalability | 13 |
| 1.3.4 | Administrative Domains | 15 |
| 1.3.5 | Security | 15 |
| 1.3.6 | Reliability | 16 |
| 2 | Peer Discovery and Overlay Formation | 19 |
| 2.1 | Discovery | 20 |
| 2.1.1 | Static Configuration | 20 |
| 2.1.2 | Centralized Directory | 21 |
| 2.1.3 | Using the Domain Name Service | 23 |
| 2.1.4 | Member Propagation Techniques with Initial Member Discovery | 24 |
| 2.1.4.1 | Member Propagation with Full Member List | 25 |
| 2.1.4.2 | Member Propagation with Partial Member List | 25 |
| 2.1.4.3 | Member Propagation with a Hint Server | 26 |
| 2.2 | Overlay Formation | 27 |
| 2.2.1 | Creating an Overlay Link | 28 |
| 2.2.1.1 | Communicating Across Firewalls | 28 |
| 2.2.1.2 | Communicating Across Two Firewalls | 29 |
| 2.3 | Topology Selection | 31 |
| 2.3.1 | Random Mesh Formation | 32 |
| 2.3.2 | Tiered Formation | 33 |
| 2.3.3 | Ordered Lattices | 34 |
| 3 | Application-Layer Multicast | 37 |
| 3.1 | General Multicast Techniques | 38 |
| 3.1.1 | Group Addressing | 39 |
| 3.1.2 | Group Maintenance | 40 |
| 3.1.3 | Message Forwarding Scheme | 42 |
| 3.1.4 | Multicast Routing | 44 |
| 3.1.5 | Secure Multicast | 45 |
| 3.1.6 | Reliable Multicast | 46 |
| 3.1.7 | Multicast Flow and Congestion Control | 47 |
| 3.2 | Network-Layer Multicast--IP Multicast | 49 |
| 3.2.1 | Problems with IP-Layer Multicast | 50 |
| 3.3 | Application-Layer Multicast | 52 |
| 3.3.1 | Broadcast Mechanisms in Peer-to-Peer Networks | 52 |
| 3.3.2 | Multicast in Peer-to-Peer Overlays | 53 |
| 4 | File-Sharing Applications | 55 |
| 4.1 | File-Sharing Overview | 55 |
| 4.1.1 | Disk Space Management | 56 |
| 4.1.2 | File Indexing | 57 |
| 4.1.3 | File Search/Retrieval | 58 |
| 4.1.4 | Access Control and Security | 59 |
| 4.1.5 | Anonymous File Retrieval | 59 |
| 4.1.6 | Search Acceleration Techniques | 61 |
| 4.1.7 | Digital Rights Management | 62 |
| 4.2 | Usage of File-Sharing Applications | 63 |
| 4.2.1 | Limitations of File-Sharing Applications | 64 |
| 4.3 | Preventing Unauthorized File Sharing | 66 |
| 4.3.1 | Firewall-Based Techniques | 67 |
| 4.3.2 | Asset Inventory | 70 |
| 4.3.3 | Port Scanning | 71 |
| 4.3.4 | Usage-Based Rate Control | 72 |
| 4.3.5 | Malicious Participation | 73 |
| 5 | File Storage Service | 77 |
| 5.1 | Handle Management | 78 |
| 5.2 | Retrieving Files with Handles | 80 |
| 5.2.1 | Circular Ring Routing | 81 |
| 5.2.2 | Plaxton Scheme | 82 |
| 5.2.3 | CAN Routing Algorithm | 84 |
| 5.2.4 | Modified Network Routing Schemes | 85 |
| 5.2.5 | Modified Broadcast | 86 |
| 5.3 | Miscellaneous Functions | 87 |
| 5.3.1 | Access Control | 87 |
| 5.3.2 | Availability and Reliability | 89 |
| 5.4 | Usage Scenarios | 90 |
| 5.4.1 | Distributed File Systems | 91 |
| 5.4.2 | Anonymous Publishing | 92 |
| 6 | Data Backup Service | 95 |
| 6.1 | The Traditional Data Management System | 96 |
| 6.2 | The Peer-to-Peer Data Management System | 98 |
| 6.2.1 | The Backup/Restore Manager | 99 |
| 6.2.2 | The Peer Searcher | 100 |
| 6.2.3 | The File Searcher | 101 |
| 6.2.4 | The Properties Manager | 101 |
| 6.2.5 | The Data Manager | 102 |
| 6.2.6 | The Schedule Manager | 103 |
| 6.3 | Security Issues | 103 |
| 6.4 | Hybrid Data Management Approach | 105 |
| 6.5 | Feasibility of Peer-to-Peer Data Backup Service | 106 |
| 7 | Peer-to-Peer Directory System | 109 |
| 7.1 | LDAP Directory Servers | 110 |
| 7.2 | Why Use Peer-to-Peer Directories? | 112 |
| 7.3 | A Peer-to-Peer Directory System | 113 |
| 7.3.1 | Schema Maintenance | 114 |
| 7.3.2 | Operation Processing | 115 |
| 7.3.2.1 | Local Placement of Records | 116 |
| 7.3.2.2 | Name Space Partitioning | 118 |
| 7.3.3 | Access Management | 120 |
| 7.4 | Example Applications of Peer-to-Peer Directory | 121 |
| 8 | Publish-Subscribe Middleware | 123 |
| 8.1 | Overview of Publish-Subscribe Systems | 124 |
| 8.2 | Server-Centric Publish-Subscribe Services | 125 |
| 8.3 | Peer-to-Peer Publish-Subscribe Services | 127 |
| 8.3.1 | Broadcast Scheme | 128 |
| 8.3.2 | Multicast Group Approach | 129 |
| 8.4 | Comparison of Approaches | 130 |
| 8.5 | Example Application | 132 |
| 9 | Collaborative Applications | 135 |
| 9.1 | General Issues | 136 |
| 9.2 | Instant Messaging | 138 |
| 9.3 | IP Telephony | 140 |
| 9.4 | Shared Collaboration Databases | 142 |
| 9.5 | Collaborative Content Hosting | 145 |
| 9.6 | Anonymous Web Surfing | 147 |
| 10 | Related Topics | 151 |
| 10.1 | Legacy Peer-to-Peer Applications | 152 |
| 10.2 | Grid Computing | 154 |
| | References | 157 |
| | Index | 161 |
In this chapter, we look at the general architecture of a peer-to-peer system and contrast it with the traditional client-server architecture that is ubiquitous in current computing systems. We then compare the relative merits and demerits of each of these approaches toward building a distributed system.
We begin the chapter with a discussion of the client-server and peer-to-peer computing architectures. The subsequent subsections look at the base components that go into making a peer-to-peer application, finally concluding with a section that compares the relative strengths and weaknesses of the two approaches.
1.1 DISTRIBUTED APPLICATIONS
A distributed application is an application that contains two or more software modules that are located on different computers. The software modules interact with each other over a communication network connecting the different computers.
To build a distributed application, you would need to decide how many software modules to include in the application, how to place those software modules on the different computers in the network, and how each software module discovers the other modules it needs to communicate with. There are many other tasks that must be done to build a distributed application, but those mentioned above are the key tasks to explain the difference between client-server computing and peer-to-peer computing.
1.1.1 A Distributed Computing Example
The different approaches to distributed computing can be explained best by means of an example. Suppose you are given the task of creating a simulation of the movement of the Sun, the Earth, and the Moon for a team of five astronomers. Each of the five astronomers has a computer on which he or she would like to see the motion and position of the three heavenly bodies at any given time for the last 2000 years as well as the next 2000 years. Let us say (purely for the sake of illustration, rather than as the preferred way to write such a simulation) that the best way to solve this problem is to create a large database of records, each record containing the relative positions of the three bodies at different times ranging over the entire 4000-year period. To show the positions of the three heavenly bodies, the program will find the appropriate set of records and display the position visually on the computer screen. Even after making this choice on how to write the program, you, as the programmer assigned to the task, have multiple ways to develop and deploy the software.
You can write a stand-alone program that runs on a single computer and does the complete simulation of the three heavenly bodies, and install a copy of it on each of the five computers. This approach (approach I) has the advantage that each astronomer can run the program as long as his or her computer is up and does not require access to a network, or to the computers of the other astronomers. This approach would be fine if the application runs well enough on each of the computers, but it does not harness the combined processing power of all five computers. Furthermore, experienced programmers know that all programs must be maintained and upgraded multiple times: to fix bugs, to add new features, or to correct any errors in generation of the set of records in the program. With this approach, any changes that you make after the initial installation of the program would need to be replicated five times.
An alternative approach (approach II) would be for you to select the most powerful computer among the five to do the simulation, with all the other computers (as well as the one running the simulation) having a visualization interface for the users to interact with the simulator. You have broken the program into two software modules, the visualization module and the simulation module, and created five instances of the visualization module and one instance of the simulation module. If the astronomers' computers are of differing power, this allows all of them to harness the power of the fastest computer. Because the simulation module maintains a large set of data, this set can be maintained at a single place and use less disk space. Also, you can localize changes to the simulation module to a single computer, and only changes to the visualization module must be propagated to all of the five computers. If the visualization module is much simpler than the simulation module, this will cut down significantly on the number of bugs and changes that need to be maintained in different places. The drawback now is that each astronomer needs connectivity to the network in order to access the simulation module and that the computer running the simulation module must be available continuously.
Approach II outlined above follows the client-server architecture for distributed applications, where the fastest computer is acting as the simulation server. The visualization modules running on the other computers are the clients that access the simulation module running on the server.
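As a rough illustration of approach II, the following Python sketch keeps one simulation module and five visualization modules inside a single process. The class names, the record format, and the request counter are assumptions made for this sketch only; in a real deployment the modules would sit on different computers and talk over the network.

```python
from typing import Dict


class SimulationModule:
    """Server side: the single instance that owns the full record set."""

    def __init__(self, records: Dict[int, str]) -> None:
        self._records = records
        self.requests_served = 0

    def position_at(self, year: int) -> str:
        self.requests_served += 1
        return self._records[year]


class VisualizationModule:
    """Client side: one instance per astronomer, holding no records."""

    def __init__(self, owner: str, server: SimulationModule) -> None:
        self.owner = owner
        self.server = server

    def show(self, year: int) -> str:
        # Every client delegates the lookup to the one shared server.
        return f"{self.owner} sees {self.server.position_at(year)}"


# Hypothetical record; the real database would span the 4000-year window.
server = SimulationModule({2004: "Sun-Earth-Moon positions for 2004"})
clients = [VisualizationModule(f"astronomer{i}", server) for i in range(1, 6)]
views = [client.show(2004) for client in clients]
```

The records exist in one place only, so a correction to the data touches a single module, while every request from every astronomer lands on the same server.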
Although approach II allowed each computer to access the resources of the fastest computer, it did not use the combined processing power of the other four computers available to the distributed application. To use the processing power of all five computers, you can divide your simulation module into five identical portions, each one handling a different but similar part of the simulation process (approach III). Recalling the fact that the simulation module was implemented as a database of relative positions, each of the five computers can be assigned to hold a portion of the database. One could split the database into five equal portions, each computer holding one portion, or one could divide the database into overlapping portions so that the position at any time is stored at two or more computers. If disk space is not an issue, one could simply replicate the database on all the computers. When an astronomer wants to check the position of the three heavenly bodies at any time, the visualization module on his/her computer finds one of the five computers that has the simulation module with the correct portion of the database and then talks to that simulation module. All five computers are acting as peers, each having a client component (the visualization module) as well as a server component (the simulation module). Approach III is the pure peer-to-peer approach to solving the three-body simulation problem.
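The three ways of dividing the database in approach III (equal portions, overlapping portions, full replication) can be sketched with one small function. This is only one possible layout, not a scheme prescribed by the text: the peer names, the year offsets relative to the present, and the rule of placing extra copies on the next peers in the list are all assumptions.

```python
from typing import List

START_YEAR = -2000   # assumed: 2000 years before the present
END_YEAR = 2000      # assumed: 2000 years after the present (exclusive)
PEERS = ["peer0", "peer1", "peer2", "peer3", "peer4"]  # hypothetical names


def owners(year: int, replicas: int = 1) -> List[str]:
    """Return the peers that hold the record for `year`.

    The window is cut into len(PEERS) equal slices. With replicas > 1,
    each slice is also stored on the following peers (wrapping around),
    which yields overlapping portions; replicas == len(PEERS) is full
    replication.
    """
    if not START_YEAR <= year < END_YEAR:
        raise ValueError("year outside the simulated window")
    if not 1 <= replicas <= len(PEERS):
        raise ValueError("replicas must be between 1 and the number of peers")
    slice_len = (END_YEAR - START_YEAR) // len(PEERS)  # 800 years per slice
    primary = (year - START_YEAR) // slice_len
    return [PEERS[(primary + i) % len(PEERS)] for i in range(replicas)]
```

For example, `owners(0)` names the single peer holding the middle slice, while `owners(0, replicas=2)` names two peers, so the record survives the loss of either one.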
Approach III could potentially be more scalable than approach II because it is leveraging the combined power of all the computers rather than that of a single computer. If the records in the database are available from multiple computers, the reliability of this approach may be higher than that of approach II. However, it reintroduces the problem that any changes made to a module (simulation or visualization) need to be replicated on all of the different computers.
In real life, one could also use a hybrid approach that mixes the client-server architecture and the peer-to-peer architecture. The hybrid approach places some software modules on a set of computers that act as servers and others on computers that act as clients. For some distributed applications, the hybrid approach can result in a better trade-off between ease of software maintenance, scalability, and reliability.
For any of the approaches selected, you would need to solve the discovery problem. The different modules of the application need to communicate with each other, and a prerequisite for this would be that the modules know where to send messages to the other modules. In the Internet, messages are sent to other applications by specifying their network address, which consists of the IP address of the computer running the application and the port number on which the application is receiving messages. To communicate over the Internet protocol suite, each software module must find out the network address of the other software module (or modules).
One solution to the discovery process is to fix the port numbers for all the software modules that they will be using and have all the modules know the port numbers and IP addresses of the different modules. When developing the simulation application for the astronomers, you can hard code this information within each of the modules. However, you must ensure that the selected port numbers are available on the computers that the applications will be running on. Because most computers run applications developed by many different companies, this solution would require a global coordination of port numbers among all the software developers in the world, which is clearly not feasible. The alternative is to have the address and port number information be provided as configuration parameters to the different software modules. If you use this approach with the example application we have discussed here, it is relatively easy to specify five port numbers and IP addresses in the configuration of each computer. However, if you consider the case of a more complex real-world application that needs to run on many more computers, the manual effort required for configuration could be quite substantial.
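The configuration-parameter alternative can be made concrete with a short sketch. The file format (`name = host:port`), the module names, and the addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical; the point is only that each module reads the addresses of the others instead of having them hard coded.

```python
from typing import Dict, NamedTuple


class Address(NamedTuple):
    host: str
    port: int


def parse_peer_config(text: str) -> Dict[str, Address]:
    """Parse lines of the form 'name = host:port' into a lookup table."""
    table: Dict[str, Address] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, _, value = line.partition("=")
        host, _, port = value.strip().rpartition(":")
        if not host or not port.isdigit() or not 0 < int(port) < 65536:
            raise ValueError(f"bad entry: {raw!r}")
        table[name.strip()] = Address(host, int(port))
    return table


# Hypothetical configuration for the five astronomers' computers.
EXAMPLE_CONFIG = """
astro1 = 192.0.2.11:5001
astro2 = 192.0.2.12:5001
astro3 = 192.0.2.13:5001
astro4 = 192.0.2.14:5002   # port 5001 assumed to be taken on this machine
astro5 = 192.0.2.15:5001
"""

peers = parse_peer_config(EXAMPLE_CONFIG)
```

Five entries are easy to maintain by hand. The same file has to be kept correct on every computer, though, which is where the manual effort grows as the number of computers increases.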
One of the key advantages of the client-server architecture (approach II discussed above) is that it makes the discovery process quite simple. This enables the deployment of a large number of clients and a high degree of scalability. Let us now define the client-server architecture and the peer-to-peer computing architecture in a more precise manner and then examine the discovery process in each of the architectures.
1.1.2 Client-Server Architecture
The client-server architecture is a way to structure a distributed application so that it consists of two distinct software modules:
A server module, only one instance of which is present in the system
A client module, of which multiple instances are present in the system
The only communication in the system is between the client modules and the server module.
Please note that the client and server modules themselves may be quite complex systems with further submodules and components. However, the key characteristic of the client-server architecture is that there is a server module that is the central point for communication. Clients do not communicate with each other, only with the server module.
In the client-server architecture, the server is usually the more complex piece of the software. The clients are often (although not always) simpler. With the wide availability of a web browser on most desktops, it is quite common to develop distributed applications so that they can use a standard web browser as the client. In this case, no effort is needed to develop or maintain the client (or, rather, the effort has been taken over by a third party, the developer of the web browser). This simplifies the task of maintaining and upgrading the application software.
In any distributed application, the different components must discover each other in order to communicate. In the client-server architecture, clients communicate only with the server. Therefore, each client needs to discover the network address of the server, and the server needs to learn the network address of each client only in order to reply to it.
The solution used for discovery in the client-server architecture is quite simple. The server runs on a port and network address that is known to the client module. The clients connect to the server on this well-known network address. Once the client connects to the server, the client and server are able to communicate with each other. The server need not be configured with any information about the clients. This implies that the same server module can communicate with any number of clients, constrained only by the physical resources needed to provide a reasonable response time to all of the connected clients.
For most common applications that run on the Internet, the port numbers on which the server side can run have been standardized. Thus the clients only need to know the IP address of the computer on which the server is running. Any individual client can also easily switch to another server module by using the IP address (or, in general, the IP address and the port number) of the new server. As an example, a web server typically runs on port 80 and web browsers can connect to the web server when a user specifies the name of the computer running the web server. The browser also has the option of connecting to a server running on a port different than 80.
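The discovery scheme described above can be sketched with Python's standard `socket` module. The request format and function names are assumptions for illustration. The demo also binds to port 0 so that the operating system picks a free port, whereas a real server would listen on a fixed, well-known port such as 80. Note that the server is never configured with client addresses; it learns each one from `accept()`.

```python
import socket
import threading
from typing import List, Tuple


def serve(listener: socket.socket, count: int) -> None:
    """Answer `count` clients without knowing in advance who they are."""
    for _ in range(count):
        conn, _client_addr = listener.accept()  # client address learned here
        with conn:
            request = conn.recv(1024).decode()
            conn.sendall(f"position for {request}".encode())


def ask(server_addr: Tuple[str, int], year: int) -> str:
    """A client needs only the server's (host, port) pair."""
    with socket.create_connection(server_addr, timeout=5) as sock:
        sock.sendall(str(year).encode())
        return sock.recv(1024).decode()


def demo() -> List[str]:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: demo-only, OS assigns a port
    listener.listen()
    server_addr = listener.getsockname()
    worker = threading.Thread(target=serve, args=(listener, 3))
    worker.start()
    answers = [ask(server_addr, year) for year in (1969, 2004, 2500)]
    worker.join()
    listener.close()
    return answers
```

Switching a client to another server amounts to passing a different `(host, port)` pair to `ask`; nothing on the server side changes.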
The simplicity and ease of maintenance of the client-server architecture are the key reasons for its widespread usage in the design of distributed applications at the present time. However, the client-server architecture has one drawback: it does not utilize the computing power of the computers running the client modules as effectively as it does the computing power of the server module. At present, when even the standard desktop packs more computing power than the computers that were used for Neil Armstrong's flight to the Moon in 1969, this does appear to be a rather wasteful approach.
1.1.3 Peer-to-Peer Architecture
The peer-to-peer architecture is a way to structure a distributed application so that it consists of many identical software modules, each module running on a different computer. The different software modules communicate with each other to complete the processing required by the distributed application.
One could view the peer-to-peer architecture as placing a server module as well as a client module on each computer. Thus each computer can access services from the software modules on another computer, as well as providing services to the other computer. However, it also implies that the discovery process in the peer-to-peer architecture is much more complicated than that of the client-server architecture. Each computer would need to know the network addresses of the other computers running the distributed application, or at least of that subset of computers with which it may need to communicate. Furthermore, propagating changes to the different software modules on all the different computers would also be much harder. However, the combined processing power of several large computers could easily surpass the processing power available from even the best single computer, and the peer-to-peer architecture could thus result in much more scalable applications.
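The view of a peer as a server module plus a client module on the same computer can be sketched as a single class. This is an in-process illustration under stated assumptions: the `Peer` class, its method names, and the by-hand `connect` step (standing in for a real discovery mechanism) are hypothetical.

```python
from typing import Dict, Optional


class Peer:
    """One node that both serves its own records and queries other peers."""

    def __init__(self, name: str, records: Dict[int, str]) -> None:
        self.name = name
        self.records = records                     # data held locally
        self.known_peers: Dict[str, "Peer"] = {}   # outcome of discovery

    def connect(self, other: "Peer") -> None:
        """Make two peers known to each other (discovery done by hand)."""
        self.known_peers[other.name] = other
        other.known_peers[self.name] = self

    def serve(self, year: int) -> Optional[str]:
        """Server role: answer from local records only."""
        return self.records.get(year)

    def lookup(self, year: int) -> Optional[str]:
        """Client role: try locally, then ask each known peer in turn."""
        answer = self.serve(year)
        if answer is not None:
            return answer
        for peer in self.known_peers.values():
            answer = peer.serve(year)
            if answer is not None:
                return answer
        return None
```

A peer that has discovered no other peers can answer only from its own records, which shows why discovery carries more weight here than in the client-server case.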
The bulk of this book is devoted to the subject of peer-to-peer applications. In Section 1.2, we look at the architecture of the typical software that must run on each computer in the peer-to-peer architecture. Subsequent chapters in the book discuss the issues of discovery and creating communication overlays among all the nodes that are participating in the peer-to-peer architecture.
1.2
Continues...
Excerpted from Legitimate Applications of Peer-to-Peer Networks by Dinesh C. Verma Copyright © 2004 by John Wiley & Sons, Inc. Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.


