Preface  5

Contents  7

1  Data Mining and Information Systems: Quo Vadis?  14
Robert Stahlbock, Stefan Lessmann, and Sven F. Crone
  1.1  Introduction  14
  1.2  Special Issues in Data Mining  16
    1.2.1  Confirmatory Data Analysis  16
    1.2.2  Knowledge Discovery from Supervised Learning  17
    1.2.3  Classification Analysis  19
    1.2.4  Hybrid Data Mining Procedures  21
    1.2.5  Web Mining  23
    1.2.6  Privacy-Preserving Data Mining  24
  1.3  Conclusion and Outlook  25
  References  26

Part I  Confirmatory Data Analysis  29

2  Response-Based Segmentation Using Finite Mixture Partial Least Squares  30
Christian M. Ringle, Marko Sarstedt, and Erik A. Mooi
  2.1  Introduction  31
    2.1.1  On the Use of PLS Path Modeling  31
    2.1.2  Problem Statement  33
    2.1.3  Objectives and Organization  34
  2.2  Partial Least Squares Path Modeling  35
  2.3  Finite Mixture Partial Least Squares Segmentation  37
    2.3.1  Foundations  37
    2.3.2  Methodology  39
    2.3.3  Systematic Application of FIMIX-PLS  42
  2.4  Application of FIMIX-PLS  45
    2.4.1  On Measuring Customer Satisfaction  45
    2.4.2  Data and Measures  45
    2.4.3  Data Analysis and Results  47
  2.5  Summary and Conclusion  55
  References  56

Part II  Knowledge Discovery from Supervised Learning  61

3  Building Acceptable Classification Models  62
David Martens and Bart Baesens
  3.1  Introduction  63
  3.2  Comprehensibility of Classification Models  64
    3.2.1  Measuring Comprehensibility  66
    3.2.2  Obtaining Comprehensible Classification Models  67
      3.2.2.1  Building Rule-Based Models  67
      3.2.2.2  Combining Output Types  67
      3.2.2.3  Visualization  67
  3.3  Justifiability of Classification Models  68
    3.3.1  Taxonomy of Constraints  69
    3.3.2  Monotonicity Constraint  71
    3.3.3  Measuring Justifiability  72
    3.3.4  Obtaining Justifiable Classification Models  77
  3.4  Conclusion  79
  References  80

4  Mining Interesting Rules Without Support Requirement: A General Universal Existential Upward Closure Property  84
Yannick Le Bras, Philippe Lenca, and Stéphane Lallich
  4.1  Introduction  85
  4.2  State of the Art  86
  4.3  An Algorithmic Property of Confidence  89
    4.3.1  On the UEUC Framework  89
    4.3.2  The UEUC Property  89
    4.3.3  An Efficient Pruning Algorithm  90
    4.3.4  Generalizing the UEUC Property  91
  4.4  A Framework for the Study of Measures  93
    4.4.1  Adapted Functions of Measure  93
      4.4.1.1  Association Rules  93
      4.4.1.2  Contingency Tables  93
      4.4.1.3  Minimal Joint Domain
    4.4.2  Expression of a Set of Measures of Ddconf  96
  4.5  Conditions for GUEUC  99
    4.5.1  A Sufficient Condition  99
    4.5.2  A Necessary Condition  100
    4.5.3  Classification of the Measures  101
  4.6  Conclusion  103
  References  104

5  Classification Techniques and Error Control in Logic Mining  108
Giovanni Felici, Bruno Simeone, and Vincenzo Spinelli
  5.1  Introduction  109
  5.2  Brief Introduction to Box Clustering  111
  5.3  BC-Based Classifier  113
  5.4  Best Choice of a Box System  117
  5.5  Bi-criterion Procedure for BC-Based Classifier  120
  5.6  Examples  121
    5.6.1  The Data Sets  121
    5.6.2  Experimental Results with BC  122
    5.6.3  Comparison with Decision Trees  124
  5.7  Conclusions  126
  References  126

Part III  Classification Analysis  129

6  An Extended Study of the Discriminant Random Forest  130
Tracy D. Lemmond, Barry Y. Chen, Andrew O. Hatch, and William G. Hanley
  6.1  Introduction  130
  6.2  Random Forests  131
  6.3  Discriminant Random Forests  132
    6.3.1  Linear Discriminant Analysis  133
    6.3.2  The Discriminant Random Forest Methodology  134
  6.4  DRF and RF: An Empirical Study  135
    6.4.1  Hidden Signal Detection  136
      6.4.1.1  Training on T1, Testing on J2  137
      6.4.1.2  Prediction Performance for J2 with Cross-Validation  138
    6.4.2  Radiation Detection  139
    6.4.3  Significance of Empirical Results  143
    6.4.4  Small Samples and Early Stopping  144
    6.4.5  Expected Cost  150
  6.5  Conclusions  150
  References  152

7  Prediction with the SVM Using Test Point Margins  154
Süreyya Özögür-Akyüz, Zakria Hussain, and John Shawe-Taylor
  7.1  Introduction  154
  7.2  Methods  158
  7.3  Data Set Description  161
  7.4  Results  161
  7.5  Discussion and Future Work  162
  References  164

8  Effects of Oversampling Versus Cost-Sensitive Learning for Bayesian and SVM Classifiers  166
Alexander Liu, Cheryl Martin, Brian La Cour, and Joydeep Ghosh
  8.1  Introduction  166
  8.2  Resampling  168
    8.2.1  Random Oversampling  168
    8.2.2  Generative Oversampling  168
  8.3  Cost-Sensitive Learning  169
  8.4  Related Work  170
  8.5  A Theoretical Analysis of Oversampling Versus Cost-Sensitive Learning  171
    8.5.1  Bayesian Classification  171
    8.5.2  Resampling Versus Cost-Sensitive Learning in Bayesian Classifiers  172
    8.5.3  Effect of Oversampling on Gaussian Naive Bayes  173
      8.5.3.1  Random Oversampling  174
      8.5.3.2  Generative Oversampling  174
      8.5.3.3  Comparison to Cost-Sensitive Learning  175
    8.5.4  Effects of Oversampling for Multinomial Naive Bayes  175
  8.6  Empirical Comparison of Resampling and Cost-Sensitive Learning  177
    8.6.1  Explaining Empirical Differences Between Resampling and Cost-Sensitive Learning  177
    8.6.2  Naive Bayes Comparisons on Low-Dimensional Gaussian Data  178
      8.6.2.1  Gaussian Naive Bayes on Artificial, Low-Dimensional Data  179
      8.6.2.2  A Note on ROC and AUC  180
      8.6.2.3  Gaussian Naive Bayes on Real, Low-Dimensional Data
    8.6.3  Multinomial Naive Bayes  183
    8.6.4  SVMs  185
    8.6.5  Discussion  188
  8.7  Conclusion  189
  Appendix  190
  References  197

9  The Impact of Small Disjuncts on Classifier Learning  200
Gary M. Weiss
  9.1  Introduction  200
  9.2  An Example: The Vote Data Set  202
  9.3  Description of Experiments  204
  9.4  The Problem with Small Disjuncts  205
  9.5  The Effect of Pruning on Small Disjuncts  209
  9.6  The Effect of Training Set Size on Small Disjuncts  217
  9.7  The Effect of Noise on Small Disjuncts  220
  9.8  The Effect of Class Imbalance on Small Disjuncts  224
  9.9  Related Work  227
  9.10  Conclusion  230
  References  232

Part IV  Hybrid Data Mining Procedures  234

10  Predicting Customer Loyalty Labels in a Large Retail Database: A Case Study in Chile  235
Cristián J. Figueroa
  10.1  Introduction  235
  10.2  Related Work  237
  10.3  Objectives of the Study  239
    10.3.1  Supervised and Unsupervised Learning  240
    10.3.2  Unsupervised Algorithms  240
      10.3.2.1  Self-Organizing Map  240
      10.3.2.2  Sammon Mapping  242
      10.3.2.3  Curvilinear Component Analysis  243
    10.3.3  Variables for Segmentation  244
    10.3.4  Exploratory Data Analysis  245
    10.3.5  Results of the Segmentation  246
  10.4  Results of the Classifier  247
  10.5  Business Validation  250
    10.5.1  In-Store Minutes Charges for Prepaid Cell Phones  251
    10.5.2  Distribution of Products in the Store  252
  10.6  Conclusions and Discussion  254
  Appendix  256
  References  258

11  PCA-Based Time Series Similarity Search  260
Leonidas Karamitopoulos, Georgios Evangelidis, and Dimitris Dervos
  11.1  Introduction  261
  11.2  Background  263
    11.2.1  Review of PCA  263
    11.2.2  Implications of PCA in Similarity Search  264
    11.2.3  Related Work  266
  11.3  Proposed Approach  268
  11.4  Experimental Methodology  270
    11.4.1  Data Sets  270
    11.4.2  Evaluation Methods  271
    11.4.3  Rival Measures  272
  11.5  Results  273
    11.5.1  1-NN Classification  273
    11.5.2  k-NN Similarity Search  276
    11.5.3  Speeding Up the Calculation of APEdist  277
  11.6  Conclusion  279
  References  279

12  Evolutionary Optimization of Least-Squares Support Vector Machines  282
Arjan Gijsberts, Giorgio Metta, and Léon Rothkrantz
  12.1  Introduction  283
  12.2  Kernel Machines  283
    12.2.1  Least-Squares Support Vector Machines  284
    12.2.2  Kernel Functions  285
      12.2.2.1  Conditions for Kernels  285
  12.3  Evolutionary Computation  286
    12.3.1  Genetic Algorithms  286
    12.3.2  Evolution Strategies  287
    12.3.3  Genetic Programming  288
  12.4  Related Work  288
    12.4.1  Hyperparameter Optimization  289
    12.4.2  Combined Kernel Functions  289
  12.5  Evolutionary Optimization of Kernel Machines  291
    12.5.1  Hyperparameter Optimization  291
    12.5.2  Kernel Construction  292
    12.5.3  Objective Function  293
  12.6  Results  294
    12.6.1  Data Sets  294
    12.6.2  Results for Hyperparameter Optimization  295
    12.6.3  Results for EvoKMGP  298
  12.7  Conclusions and Future Work  299
  References  300

13  Genetically Evolved kNN Ensembles  303
Ulf Johansson, Rikard König, and Lars Niklasson
  13.1  Introduction  303
  13.2  Background and Related Work  305
  13.3  Method  306
    13.3.1  Data Sets  309
  13.4  Results  311
  13.5  Conclusions  316
  References  317

Part V  Web Mining  318

14  Behaviorally Founded Recommendation Algorithm for Browsing Assistance Systems  319
Peter Géczy, Noriaki Izumi, Shotaro Akaho, and Kôiti Hasida
  14.1  Introduction  319
    14.1.1  Related Works  320
    14.1.2  Our Contribution and Approach  321
  14.2  Concept Formalization  321
  14.3  System Design  325
    14.3.1  A Priori Knowledge of Human–System Interactions  325
    14.3.2  Strategic Design Factors  325
    14.3.3  Recommendation Algorithm Derivation  327
  14.4  Practical Evaluation  329
    14.4.1  Intranet Portal  330
    14.4.2  System Evaluation  332
    14.4.3  Practical Implications and Limitations  333
  14.5  Conclusions and Future Work  334
  References  335

15  Using Web Text Mining to Predict Future Events: A Test of the Wisdom of Crowds Hypothesis  337
Scott Ryan and Lutz Hamel
  15.1  Introduction  337
  15.2  Method  339
    15.2.1  Hypotheses and Goals  339
    15.2.2  General Methodology  341
    15.2.3  The 2006 Congressional and Gubernatorial Elections  341
    15.2.4  Sporting Events and Reality Television Programs  342
    15.2.5  Movie Box Office Receipts and Music Sales  343
    15.2.6  Replication  344
  15.3  Results and Discussion  345
    15.3.1  The 2006 Congressional and Gubernatorial Elections  345
    15.3.2  Sporting Events and Reality Television Programs  347
    15.3.3  Movie and Music Album Results  349
  15.4  Conclusion  350
  References  351

Part VI  Privacy-Preserving Data Mining  353

16  Avoiding Attribute Disclosure with the (Extended) p-Sensitive k-Anonymity Model  354
Traian Marius Truta and Alina Campan
  16.1  Introduction  354
  16.2  Privacy Models and Algorithms  355
    16.2.1  The p-Sensitive k-Anonymity Model and Its Extension  355
    16.2.2  Algorithms for the p-Sensitive k-Anonymity Model  358
  16.3  Experimental Results  361
    16.3.1  Experiments for p-Sensitive k-Anonymity  361
    16.3.2  Experiments for Extended p-Sensitive k-Anonymity  363
  16.4  New Enhanced Models Based on p-Sensitive k-Anonymity  367
    16.4.1  Constrained p-Sensitive k-Anonymity  367
    16.4.2  p-Sensitive k-Anonymity in Social Networks  371
  16.5  Conclusions and Future Work  373
  References  373

17  Privacy-Preserving Random Kernel Classification of Checkerboard Partitioned Data  375
Olvi L. Mangasarian and Edward W. Wild
  17.1  Introduction  375
  17.2  Privacy-Preserving Linear Classifier for Checkerboard Partitioned Data  379
  17.3  Privacy-Preserving Nonlinear Classifier for Checkerboard Partitioned Data  381
  17.4  Computational Results  382
  17.5  Conclusion and Outlook  384
  References  386