|
Principles of Computer Systems and Network Management |
2 |
|
|
Preface |
6 |
|
|
Contents |
9 |
|
|
Introduction |
14 |
|
|
1.1 Introduction |
14 |
|
|
1.2 Computer System Life Cycle |
15 |
|
|
1.3 Shared Hosting Data Center (SHDC) |
18 |
|
|
1.4 Large Enterprise |
20 |
|
|
1.5 Network Service Provider |
22 |
|
|
1.6 History of Systems Management |
25 |
|
|
1.7 Summary |
27 |
|
|
1.8 Review Questions |
27 |
|
|
References |
28 |
|
|
Planning and Implementation |
29 |
|
|
2.1 Requirements |
30 |
|
|
2.1.1 Performance Requirements |
30 |
|
|
2.1.2 Resiliency and Availability Requirements |
31 |
|
|
2.1.3 Power and Thermal Requirements |
32 |
|
|
2.1.4 Security Requirements |
33 |
|
|
2.1.5 Manageability Requirements |
34 |
|
|
2.1.6 Backward Compatibility |
34 |
|
|
2.1.7 Other Requirements |
35 |
|
|
2.2 Evaluating Computer Systems |
36 |
|
|
2.2.1 Evaluating Computer Systems Performance |
38 |
|
|
2.2.1.1 General Principles for Performance Evaluation |
39 |
|
|
Utilization Law |
39 |
|
|
Little’s Law |
39 |
|
|
Forced Flow Law |
40 |
|
|
Safety Margins |
42 |
|
|
2.2.1.2 Queuing Theory |
43 |
|
|
2.2.1.3 Queuing Networks |
44 |
|
|
Canonical Delay and Throughput Curves |
45 |
|
|
2.2.1.4 An Example |
47 |
|
|
2.2.1.5 Simulations |
48 |
|
|
2.2.2 Evaluating Resiliency and Availability |
49 |
|
|
2.2.2.1 Reliability Analysis |
50 |
|
|
2.2.2.2 Critical Component Identification |
52 |
|
|
2.2.2.3 Failure Effects Modeling |
54 |
|
|
2.2.3 Power and Thermal Analysis |
55 |
|
|
2.2.4 Computer System Security Analysis |
59 |
|
|
2.2.4.1 Comparison Against Checklists |
60 |
|
|
2.2.4.2 Vulnerability Analysis |
60 |
|
|
2.3 Planning to Satisfy Requirements |
61 |
|
|
2.3.1 Systems Planning Process |
62 |
|
|
2.3.1.1 Architecture Definition |
63 |
|
|
Tiering |
63 |
|
|
Perimeter Defense |
65 |
|
|
Separation of Management and Operations |
65 |
|
|
Hierarchical Architecture |
66 |
|
|
Standardization of Components |
67 |
|
|
2.3.1.2 Logical Plan Development |
67 |
|
|
2.3.1.3 Approaches for Satisfying Requirements |
69 |
|
|
Reuse from a Catalogue |
69 |
|
|
Extrapolate from a Catalogue |
70 |
|
|
2.3.1.4 Dynamic Composition of Configurations |
70 |
|
|
2.4 Implementation |
71 |
|
|
2.5 Summary |
71 |
|
|
2.6 Review Questions |
72 |
|
|
References |
73 |
|
|
Operations Management |
74 |
|
|
3.1 Operations Center |
74 |
|
|
3.2 Management Data |
77 |
|
|
3.3 Manager-Agent Protocols |
80 |
|
|
3.3.1 Remote Consoles |
80 |
|
|
3.3.2 Simple Network Management Protocol (SNMP) |
81 |
|
|
3.3.3 Common Object Request Broker Architecture (CORBA) |
82 |
|
|
3.3.4 Web-Based Enterprise Management (WBEM) |
83 |
|
|
3.3.5 Web Services |
83 |
|
|
3.3.6 NetConf |
84 |
|
|
3.3.7 Comparison of the Different Management Protocols |
85 |
|
|
3.4 Management Information Structure |
85 |
|
|
3.4.1 Management Information Base |
86 |
|
|
3.4.2 Common Information Model |
88 |
|
|
3.4.3 Issues with Standard Representation |
89 |
|
|
3.5 Device Agent Structure |
91 |
|
|
3.6 Management Application Structure |
92 |
|
|
3.7 Operations Center Function |
94 |
|
|
3.8 Summary |
97 |
|
|
3.9 Review Questions |
97 |
|
|
References |
98 |
|
|
Discovery |
99 |
|
|
4.1 Discovery Approaches |
100 |
|
|
4.1.1 Manual Inventory |
100 |
|
|
4.1.2 Dictionary/Directory Queries |
101 |
|
|
4.1.3 Self-Advertisement |
101 |
|
|
4.1.4 Passive Observation |
102 |
|
|
4.1.5 Agent-Based Discovery |
102 |
|
|
4.1.6 Active Probing |
103 |
|
|
4.2 Discovery of Specific Types of IT Infrastructure |
104 |
|
|
4.2.1 Discovering Servers |
104 |
|
|
4.2.1.1 DNS Zone Transfer |
104 |
|
|
4.2.1.2 Traffic Analysis |
105 |
|
|
4.2.1.3 Agent-Based Discovery |
106 |
|
|
4.2.1.4 Agent-Less Discovery |
107 |
|
|
4.2.2 Discovering Client Machines |
107 |
|
|
4.2.3 Discovering Applications on Servers and Clients |
108 |
|
|
4.2.4 Discovering Layer-3 Network Devices |
110 |
|
|
4.2.5 Discovering Layer-2 Network Devices |
111 |
|
|
4.3 Storing Discovered Information |
112 |
|
|
4.3.1 Representing Hierarchical Relationships |
113 |
|
|
4.3.2 Representing General Graphs |
116 |
|
|
4.3.3 Representing Generic Relationships |
117 |
|
|
4.3.4 Other Types of Databases |
118 |
|
|
4.4 Summary |
119 |
|
|
4.5 Review Questions |
119 |
|
|
References |
120 |
|
|
Monitoring |
121 |
|
|
5.1 Monitored Information |
121 |
|
|
5.2 Generic Model for Monitoring |
122 |
|
|
5.3 Data Collection |
124 |
|
|
5.3.1 Passive Monitoring |
124 |
|
|
5.3.1.1 Applications |
125 |
|
|
5.3.1.2 Servers, Personal Computers, and Laptops |
126 |
|
|
5.3.1.3 Networks |
128 |
|
|
5.3.2 Active Monitoring |
129 |
|
|
5.3.2.1 Applications |
129 |
|
|
5.3.2.2 Servers, Personal Computers, and Laptops |
131 |
|
|
5.3.2.3 Networks |
132 |
|
|
5.4 Pre-DB Data Processing |
133 |
|
|
5.4.1 Data Reduction |
133 |
|
|
5.4.2 Data Cleansing |
134 |
|
|
5.4.3 Data Format Conversion |
137 |
|
|
5.5 Management Database |
139 |
|
|
5.5.1 Partitioned Databases |
140 |
|
|
5.5.2 Rolling Databases |
141 |
|
|
5.5.3 Load-Balanced Databases |
141 |
|
|
5.5.4 Hierarchical Database Federation |
142 |
|
|
5.5.5 Round-Robin Databases |
144 |
|
|
5.6 Summary |
144 |
|
|
5.7 Review Questions |
144 |
|
|
Fault Management |
146 |
|
|
6.1 Fault Management Architecture |
146 |
|
|
6.1.1 Common Types of Symptoms |
148 |
|
|
6.1.2 Common Types of Root Causes |
150 |
|
|
6.2 Fault Diagnosis Algorithms |
152 |
|
|
6.2.1 Topology Analysis Methods |
153 |
|
|
6.2.2 Rule-Based Methods |
156 |
|
|
6.2.3 Decision Trees |
157 |
|
|
6.2.4 Dependency Graphs |
158 |
|
|
6.2.5 Code Book |
160 |
|
|
6.2.6 Knowledge Bases |
161 |
|
|
6.2.7 Case-Based Reasoning |
162 |
|
|
6.2.8 Other Techniques |
163 |
|
|
6.3 Self-Healing Systems |
164 |
|
|
6.3.1 Autonomic Computing Architecture and Variations |
164 |
|
|
6.3.2 An Example of a Self-Healing System |
166 |
|
|
6.4 Avoiding Failures |
167 |
|
|
6.4.1 Redundancy |
167 |
|
|
6.4.2 Independent Monitor |
168 |
|
|
6.4.3 Collaborative Monitoring |
169 |
|
|
6.4.4 Aged Restarts |
169 |
|
|
6.5 Summary |
170 |
|
|
6.6 Review Questions |
170 |
|
|
References |
171 |
|
|
Configuration Management |
173 |
|
|
7.1 Configuration Management Overview |
173 |
|
|
7.2 Configuration Setting |
175 |
|
|
7.2.1 Reusing Configuration Settings |
176 |
|
|
7.2.2 Script-Based Configuration Management |
178 |
|
|
7.2.3 Model-Based Configuration Management |
179 |
|
|
7.2.4 Configuration Workflows |
181 |
|
|
7.2.5 Simplifying Configuration Through Higher Abstractions |
182 |
|
|
7.2.6 Policy-Based Configuration Management |
183 |
|
|
7.3 Configuration Discovery and Change Control |
184 |
|
|
7.3.1 Structure of the CMDB |
185 |
|
|
7.3.2 Federated CMDB |
186 |
|
|
7.3.3 Dependency Discovery |
186 |
|
|
7.3.3.1 Application Dependencies |
187 |
|
|
7.3.3.2 Network Dependencies |
188 |
|
|
7.3.3.3 Service Dependencies |
189 |
|
|
7.4 Configuration Management Applications |
189 |
|
|
7.4.1 Configuration Validation |
189 |
|
|
7.4.2 What-If Analysis |
190 |
|
|
7.4.3 Configuration Cloning |
191 |
|
|
7.5 Patch Management |
191 |
|
|
7.5.1 Patch Identification |
191 |
|
|
7.5.2 Patch Assessment |
192 |
|
|
7.5.3 Patch Testing |
193 |
|
|
7.5.4 Patch Installation |
194 |
|
|
7.6 Summary |
195 |
|
|
7.7 Review Questions |
196 |
|
|
References |
196 |
|
|
Performance and Accounting Management |
198 |
|
|
8.1 Need for Operation Time Performance Management |
199 |
|
|
8.2 Approaches for Performance Management |
199 |
|
|
8.3 Performance Monitoring and Reporting |
201 |
|
|
8.3.1 Performance Metrics |
202 |
|
|
8.3.2 Addressing Scalability Issues |
203 |
|
|
8.3.3 Error Handling and Data Cleansing |
205 |
|
|
8.3.4 Metric Composition |
207 |
|
|
8.3.5 Performance Monitoring Approaches |
209 |
|
|
8.3.5.1 Networks |
209 |
|
|
8.3.5.2 Servers |
210 |
|
|
8.3.5.3 Applications |
210 |
|
|
8.3.6 Performance Reporting and Visualization |
212 |
|
|
8.4 Performance Troubleshooting |
216 |
|
|
8.4.1 Detecting Performance Problems |
216 |
|
|
8.4.1.1 Thresholds |
216 |
|
|
8.4.1.2 Statistical Abnormality |
218 |
|
|
8.4.1.3 Help Desk Reports |
218 |
|
|
8.4.2 Correcting Performance Problems |
218 |
|
|
8.4.2.1 Misconfiguration |
219 |
|
|
8.4.2.2 System Changes |
219 |
|
|
8.4.2.3 Workload Growth |
219 |
|
|
8.4.2.4 Workload Surge |
220 |
|
|
8.5 Capacity Planning |
220 |
|
|
8.5.1 Simple Estimation |
221 |
|
|
8.5.2 ARIMA Models |
222 |
|
|
8.5.3 Seasonal Decomposition |
223 |
|
|
8.6 Accounting Management |
224 |
|
|
8.7 Summary |
226 |
|
|
8.8 Review Questions |
226 |
|
|
References |
227 |
|
|
Security Management |
228 |
|
|
9.1 General Techniques |
229 |
|
|
9.1.1 Cryptography and Key Management |
229 |
|
|
9.1.2 Authentication |
233 |
|
|
9.1.3 Confidentiality/Access Control |
235 |
|
|
9.1.4 Integrity |
236 |
|
|
9.1.5 Non-Repudiation |
238 |
|
|
9.1.6 Availability |
238 |
|
|
9.2 Security Management for Personal Computers |
239 |
|
|
9.2.1 Data Protection |
240 |
|
|
9.2.2 Malware Protection |
241 |
|
|
9.2.3 Patch Management |
242 |
|
|
9.2.4 Data Backup and Recovery |
243 |
|
|
9.3 Security Management for Computer Servers |
244 |
|
|
9.3.1 Password Management |
245 |
|
|
9.3.2 Single Sign-On |
246 |
|
|
9.3.3 Secure Access Protocols |
247 |
|
|
9.4 Security Management for Computer Networks |
248 |
|
|
9.4.1 Firewalls |
249 |
|
|
9.4.2 Intrusion Detection/Prevention Systems |
250 |
|
|
9.4.3 Honeypots |
252 |
|
|
9.5 Operational Issues |
252 |
|
|
9.5.1 Physical Security |
253 |
|
|
9.5.2 Security Policies |
253 |
|
|
9.5.3 Auditing |
255 |
|
|
9.6 Summary |
255 |
|
|
9.7 Review Questions |
256 |
|
|
References |
257 |
|
|
Advanced Topics |
258 |
|
|
10.1 Process Management |
258 |
|
|
10.2 Helpdesk Systems |
259 |
|
|
10.3 Web, Web 2.0, and Management |
261 |
|
|
10.4 Summary |
262 |
|
|
10.5 Review Questions |
262 |
|
|
References |
262 |
|
|
Index |
263 |
|