Data Mining - Special Issue in Annals of Information Systems
by: Robert Stahlbock, Sven F. Crone, Stefan Lessmann
Springer-Verlag, 2009
ISBN: 9781441912800
387 pages, download: 7194 KB

Format: PDF
Suitable for: Apple iPad, Android tablet PCs; online reading on PC, Mac, laptop

Type: B (parallel access)

Table of Contents

  Preface 5  
  Contents 7  
  1 Data Mining and Information Systems: Quo Vadis? 14  
     Robert Stahlbock, Stefan Lessmann, and Sven F. Crone 14  
     1.1 Introduction 14  
     1.2 Special Issues in Data Mining 16  
        1.2.1 Confirmatory Data Analysis 16  
        1.2.2 Knowledge Discovery from Supervised Learning 17  
        1.2.3 Classification Analysis 19  
        1.2.4 Hybrid Data Mining Procedures 21  
        1.2.5 Web Mining 23  
        1.2.6 Privacy-Preserving Data Mining 24  
     1.3 Conclusion and Outlook 25  
     References 26  
  Part I Confirmatory Data Analysis 29  
     2 Response-Based Segmentation Using Finite Mixture Partial Least Squares 30  
        Christian M. Ringle, Marko Sarstedt, and Erik A. Mooi 30  
        2.1 Introduction 31  
           2.1.1 On the Use of PLS Path Modeling 31  
           2.1.2 Problem Statement 33  
           2.1.3 Objectives and Organization 34  
        2.2 Partial Least Squares Path Modeling 35  
        2.3 Finite Mixture Partial Least Squares Segmentation 37  
           2.3.1 Foundations 37  
           2.3.2 Methodology 39  
           2.3.3 Systematic Application of FIMIX-PLS 42  
        2.4 Application of FIMIX-PLS 45  
           2.4.1 On Measuring Customer Satisfaction 45  
           2.4.2 Data and Measures 45  
           2.4.3 Data Analysis and Results 47  
        2.5 Summary and Conclusion 55  
        References 56  
  Part II Knowledge Discovery from Supervised Learning 61  
     3 Building Acceptable Classification Models 62  
        David Martens and Bart Baesens 62  
        3.1 Introduction 63  
        3.2 Comprehensibility of Classification Models 64  
           3.2.1 Measuring Comprehensibility 66  
           3.2.2 Obtaining Comprehensible Classification Models 67  
              3.2.2.1 Building Rule-Based Models 67  
              3.2.2.2 Combining Output Types 67  
              3.2.2.3 Visualization 67  
        3.3 Justifiability of Classification Models 68  
           3.3.1 Taxonomy of Constraints 69  
           3.3.2 Monotonicity Constraint 71  
           3.3.3 Measuring Justifiability 72  
           3.3.4 Obtaining Justifiable Classification Models 77  
        3.4 Conclusion 79  
        References 80  
     4 Mining Interesting Rules Without Support Requirement: A General Universal Existential Upward Closure Property 84  
        Yannick Le Bras, Philippe Lenca, and Stéphane Lallich 84  
        4.1 Introduction 85  
        4.2 State of the Art 86  
        4.3 An Algorithmic Property of Confidence 89  
           4.3.1 On UEUC Framework 89  
           4.3.2 The UEUC Property 89  
           4.3.3 An Efficient Pruning Algorithm 90  
           4.3.4 Generalizing the UEUC Property 91  
        4.4 A Framework for the Study of Measures 93  
           4.4.1 Adapted Functions of Measure 93  
              4.4.1.1 Association Rules 93  
              4.4.1.2 Contingency Tables 93  
              4.4.1.3 Minimal Joint Domain 1  
           4.4.2 Expression of a Set of Measures of Ddconf 96  
        4.5 Conditions for GUEUC 99  
           4.5.1 A Sufficient Condition 99  
           4.5.2 A Necessary Condition 100  
           4.5.3 Classification of the Measures 101  
        4.6 Conclusion 103  
        References 104  
     5 Classification Techniques and Error Control in Logic Mining 108  
        Giovanni Felici, Bruno Simeone, and Vincenzo Spinelli 108  
        5.1 Introduction 109  
        5.2 Brief Introduction to Box Clustering 111  
        5.3 BC-Based Classifier 113  
        5.4 Best Choice of a Box System 117  
        5.5 Bi-criterion Procedure for BC-Based Classifier 120  
        5.6 Examples 121  
           5.6.1 The Data Sets 121  
           5.6.2 Experimental Results with BC 122  
           5.6.3 Comparison with Decision Trees 124  
        5.7 Conclusions 126  
        References 126  
  Part III Classification Analysis 129  
     6 An Extended Study of the Discriminant Random Forest 130  
        Tracy D. Lemmond, Barry Y. Chen, Andrew O. Hatch, and William G. Hanley 130  
        6.1 Introduction 130  
        6.2 Random Forests 131  
        6.3 Discriminant Random Forests 132  
           6.3.1 Linear Discriminant Analysis 133  
           6.3.2 The Discriminant Random Forest Methodology 134  
        6.4 DRF and RF: An Empirical Study 135  
           6.4.1 Hidden Signal Detection 136  
              6.4.1.1 Training on T1, Testing on J2 137  
              6.4.1.2 Prediction Performance for J2 with Cross-validation 138  
           6.4.2 Radiation Detection 139  
           6.4.3 Significance of Empirical Results 143  
           6.4.4 Small Samples and Early Stopping 144  
           6.4.5 Expected Cost 150  
        6.5 Conclusions 150  
        References 152  
     7 Prediction with the SVM Using Test Point Margins 154  
        Süreyya Özögür-Akyüz, Zakria Hussain, and John Shawe-Taylor 154  
        7.1 Introduction 154  
        7.2 Methods 158  
        7.3 Data Set Description 161  
        7.4 Results 161  
        7.5 Discussion and Future Work 162  
        References 164  
     8 Effects of Oversampling Versus Cost-Sensitive Learning for Bayesian and SVM Classifiers 166  
        Alexander Liu, Cheryl Martin, Brian La Cour, and Joydeep Ghosh 166  
        8.1 Introduction 166  
        8.2 Resampling 168  
           8.2.1 Random Oversampling 168  
           8.2.2 Generative Oversampling 168  
        8.3 Cost-Sensitive Learning 169  
        8.4 Related Work 170  
        8.5 A Theoretical Analysis of Oversampling Versus Cost-Sensitive Learning 171  
           8.5.1 Bayesian Classification 171  
           8.5.2 Resampling Versus Cost-Sensitive Learning in Bayesian Classifiers 172  
           8.5.3 Effect of Oversampling on Gaussian Naive Bayes 173  
              8.5.3.1 Random Oversampling 174  
              8.5.3.2 Generative Oversampling 174  
              8.5.3.3 Comparison to Cost-Sensitive Learning 175  
           8.5.4 Effects of Oversampling for Multinomial Naive Bayes 175  
        8.6 Empirical Comparison of Resampling and Cost-Sensitive Learning 177  
           8.6.1 Explaining Empirical Differences Between Resampling and Cost-Sensitive Learning 177  
           8.6.2 Naive Bayes Comparisons on Low-Dimensional Gaussian Data 178  
              8.6.2.1 Gaussian Naive Bayes on Artificial, Low-Dimensional Data 179  
              8.6.2.2 A Note on ROC and AUC 180  
              8.6.2.3 Gaussian Naive Bayes on Real, Low-Dimensional Data 1  
           8.6.3 Multinomial Naive Bayes 183  
           8.6.4 SVMs 185  
           8.6.5 Discussion 188  
        8.7 Conclusion 189  
        Appendix 190  
        References 197  
     9 The Impact of Small Disjuncts on Classifier Learning 200  
        Gary M. Weiss 200  
        9.1 Introduction 200  
        9.2 An Example: The Vote Data Set 202  
        9.3 Description of Experiments 204  
        9.4 The Problem with Small Disjuncts 205  
        9.5 The Effect of Pruning on Small Disjuncts 209  
        9.6 The Effect of Training Set Size on Small Disjuncts 217  
        9.7 The Effect of Noise on Small Disjuncts 220  
        9.8 The Effect of Class Imbalance on Small Disjuncts 224  
        9.9 Related Work 227  
        9.10 Conclusion 230  
        References 232  
  Part IV Hybrid Data Mining Procedures 234  
     10 Predicting Customer Loyalty Labels in a Large Retail Database: A Case Study in Chile 235  
        Cristián J. Figueroa 235  
        10.1 Introduction 235  
        10.2 Related Work 237  
        10.3 Objectives of the Study 239  
           10.3.1 Supervised and Unsupervised Learning 240  
           10.3.2 Unsupervised Algorithms 240  
              10.3.2.1 Self-Organizing Map 240  
              10.3.2.2 Sammon Mapping 242  
              10.3.2.3 Curvilinear Component Analysis 243  
           10.3.3 Variables for Segmentation 244  
           10.3.4 Exploratory Data Analysis 245  
           10.3.5 Results of the Segmentation 246  
        10.4 Results of the Classifier 247  
        10.5 Business Validation 250  
           10.5.1 In-Store Minutes Charges for Prepaid Cell Phones 251  
           10.5.2 Distribution of Products in the Store 252  
        10.6 Conclusions and Discussion 254  
        Appendix 256  
        References 258  
     11 PCA-Based Time Series Similarity Search 260  
        Leonidas Karamitopoulos, Georgios Evangelidis, and Dimitris Dervos 260  
        11.1 Introduction 261  
        11.2 Background 263  
           11.2.1 Review of PCA 263  
           11.2.2 Implications of PCA in Similarity Search 264  
           11.2.3 Related Work 266  
        11.3 Proposed Approach 268  
        11.4 Experimental Methodology 270  
           11.4.1 Data Sets 270  
           11.4.2 Evaluation Methods 271  
           11.4.3 Rival Measures 272  
        11.5 Results 273  
           11.5.1 1-NN Classification 273  
           11.5.2 k-NN Similarity Search 276  
           11.5.3 Speeding Up the Calculation of APEdist 277  
        11.6 Conclusion 279  
        References 279  
     12 Evolutionary Optimization of Least-Squares Support Vector Machines 282  
        Arjan Gijsberts, Giorgio Metta, and Léon Rothkrantz 282  
        12.1 Introduction 283  
        12.2 Kernel Machines 283  
           12.2.1 Least-Squares Support Vector Machines 284  
           12.2.2 Kernel Functions 285  
              12.2.2.1 Conditions for Kernels 285  
        12.3 Evolutionary Computation 286  
           12.3.1 Genetic Algorithms 286  
           12.3.2 Evolution Strategies 287  
           12.3.3 Genetic Programming 288  
        12.4 Related Work 288  
           12.4.1 Hyperparameter Optimization 289  
           12.4.2 Combined Kernel Functions 289  
        12.5 Evolutionary Optimization of Kernel Machines 291  
           12.5.1 Hyperparameter Optimization 291  
           12.5.2 Kernel Construction 292  
           12.5.3 Objective Function 293  
        12.6 Results 294  
           12.6.1 Data Sets 294  
           12.6.2 Results for Hyperparameter Optimization 295  
           12.6.3 Results for EvoKMGP 298  
        12.7 Conclusions and Future Work 299  
        References 300  
     13 Genetically Evolved kNN Ensembles 303  
        Ulf Johansson, Rikard König, and Lars Niklasson 303  
        13.1 Introduction 303  
        13.2 Background and Related Work 305  
        13.3 Method 306  
           13.3.1 Data sets 309  
        13.4 Results 311  
        13.5 Conclusions 316  
        References 317  
  Part V Web-Mining 318  
     14 Behaviorally Founded Recommendation Algorithm for Browsing Assistance Systems 319  
        Peter Géczy, Noriaki Izumi, Shotaro Akaho, and Kôiti Hasida 319  
        14.1 Introduction 319  
           14.1.1 Related Works 320  
           14.1.2 Our Contribution and Approach 321  
        14.2 Concept Formalization 321  
        14.3 System Design 325  
           14.3.1 A Priori Knowledge of Human–System Interactions 325  
           14.3.2 Strategic Design Factors 325  
           14.3.3 Recommendation Algorithm Derivation 327  
        14.4 Practical Evaluation 329  
           14.4.1 Intranet Portal 330  
           14.4.2 System Evaluation 332  
           14.4.3 Practical Implications and Limitations 333  
        14.5 Conclusions and Future Work 334  
        References 335  
     15 Using Web Text Mining to Predict Future Events: A Test of the Wisdom of Crowds Hypothesis 337  
        Scott Ryan and Lutz Hamel 337  
        15.1 Introduction 337  
        15.2 Method 339  
           15.2.1 Hypotheses and Goals 339  
           15.2.2 General Methodology 341  
           15.2.3 The 2006 Congressional and Gubernatorial Elections 341  
           15.2.4 Sporting Events and Reality Television Programs 342  
           15.2.5 Movie Box Office Receipts and Music Sales 343  
           15.2.6 Replication 344  
        15.3 Results and Discussion 345  
           15.3.1 The 2006 Congressional and Gubernatorial Elections 345  
           15.3.2 Sporting Events and Reality Television Programs 347  
           15.3.3 Movie and Music Album Results 349  
        15.4 Conclusion 350  
        References 351  
  Part VI Privacy-Preserving Data Mining 353  
     16 Avoiding Attribute Disclosure with the (Extended) p-Sensitive k-Anonymity Model 354  
        Traian Marius Truta and Alina Campan 354  
        16.1 Introduction 354  
        16.2 Privacy Models and Algorithms 355  
           16.2.1 The p-Sensitive k-Anonymity Model and Its Extension 355  
           16.2.2 Algorithms for the p-Sensitive k-Anonymity Model 358  
        16.3 Experimental Results 361  
           16.3.1 Experiments for p-Sensitive k-Anonymity 361  
           16.3.2 Experiments for Extended p-Sensitive k-Anonymity 363  
        16.4 New Enhanced Models Based on p-Sensitive k-Anonymity 367  
           16.4.1 Constrained p-Sensitive k-Anonymity 367  
           16.4.2 p-Sensitive k-Anonymity in Social Networks 371  
        16.5 Conclusions and Future Work 373  
        References 373  
     17 Privacy-Preserving Random Kernel Classification of Checkerboard Partitioned Data 375  
        Olvi L. Mangasarian and Edward W. Wild 375  
        17.1 Introduction 375  
        17.2 Privacy-Preserving Linear Classifier for Checkerboard Partitioned Data 379  
        17.3 Privacy-Preserving Nonlinear Classifier for Checkerboard Partitioned Data 381  
        17.4 Computational Results 382  
        17.5 Conclusion and Outlook 384  
        References 386  

