Text mining: classification, clustering, and applications
Saved in:
Format: | Book |
---|---|
Language: | English |
Published: |
Boca Raton [et al.]
CRC Press
2009
|
Series: | Data mining and knowledge discovery series
|
Subjects: | |
Online access: | Table of contents |
Description: | Includes bibliographical references and index |
Description: | XXX, 290 p., ill., graphs |
ISBN: | 9781420059403 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV035605094 | ||
003 | DE-604 | ||
005 | 20160204 | ||
007 | t | ||
008 | 090707s2009 a||| |||| 00||| eng d | ||
010 | |a 2009013047 | ||
020 | |a 9781420059403 |c hardcover |9 978-1-4200-5940-3 | ||
035 | |a (OCoLC)144226505 | ||
035 | |a (DE-599)GBV595954707 | ||
040 | |a DE-604 |b ger |e aacr | ||
041 | 0 | |a eng | |
049 | |a DE-473 |a DE-703 |a DE-20 |a DE-355 |a DE-M382 | ||
050 | 0 | |a QA76.9.D343 | |
082 | 0 | |a 006.3/12 |2 22 | |
084 | |a ST 302 |0 (DE-625)143652: |2 rvk | ||
084 | |a ST 306 |0 (DE-625)143654: |2 rvk | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
245 | 1 | 0 | |a Text mining |b classification, clustering, and applications |c ed. by Ashok Srivastava ... |
264 | 1 | |a Boca Raton [u.a.] |b CRC Press |c 2009 | |
300 | |a XXX, 290 S. |b Ill., graf. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Data mining and knowledge discovery series | |
500 | |a Includes bibliographical references and index | ||
650 | 0 | |a Data mining / Statistical methods | |
650 | 4 | |a Data mining |x Statistical methods | |
650 | 0 | 7 | |a Text Mining |0 (DE-588)4728093-1 |2 gnd |9 rswk-swf |
655 | 7 | |0 (DE-588)4143413-4 |a Aufsatzsammlung |2 gnd-content | |
689 | 0 | 0 | |a Text Mining |0 (DE-588)4728093-1 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Srivastava, Ashok K. |e Sonstige |0 (DE-588)120408945 |4 oth | |
856 | 4 | 2 | |m Digitalisierung UB Bamberg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017660340&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-017660340 |
Record in the search index
_version_ | 1804139273897639936 |
---|---|
adam_text | Contents

List of Figures xiii
List of Tables xix
Introduction xxi
About the Editors xxvii
Contributor List xxix

1 Analysis of Text Patterns Using Kernel Methods 1
Marco Turchi, Alessio Mammone, and Nello Cristianini
1.1 Introduction 1
1.2 General Overview on Kernel Methods 1
1.2.1 Finding Patterns in Feature Space 5
1.2.2 Formal Properties of Kernel Functions 8
1.2.3 Operations on Kernel Functions 10
1.3 Kernels for Text 11
1.3.1 Vector Space Model 11
1.3.2 Semantic Kernels 13
1.3.3 String Kernels 17
1.4 Example 19
1.5 Conclusion and Further Reading 22

2 Detection of Bias in Media Outlets with Statistical Learning Methods 27
Blaz Fortuna, Carolina Galleguillos, and Nello Cristianini
2.1 Introduction 27
2.2 Overview of the Experiments 29
2.3 Data Collection and Preparation 30
2.3.1 Article Extraction from HTML Pages 31
2.3.2 Data Preparation 31
2.3.3 Detection of Matching News Items 32
2.4 News Outlet Identification 35
2.5 Topic-Wise Comparison of Term Bias 38
2.6 News Outlets Map 40
2.6.1 Distance Based on Lexical Choices 42
2.6.2 Distance Based on Choice of Topics 43
2.7 Related Work 44
2.8 Conclusion 45
2.9 Appendix A: Support Vector Machines 48
2.10 Appendix B: Bag of Words and Vector Space Models 48
2.11 Appendix C: Kernel Canonical Correlation Analysis 49
2.12 Appendix D: Multidimensional Scaling 50

3 Collective Classification for Text Classification 51
Galileo Namata, Prithviraj Sen, Mustafa Bilgic, and Lise Getoor
3.1 Introduction 51
3.2 Collective Classification: Notation and Problem Definition 53
3.3 Approximate Inference Algorithms for Approaches Based on Local Conditional Classifiers 53
3.3.1 Iterative Classification 54
3.3.2 Gibbs Sampling 55
3.3.3 Local Classifiers and Further Optimizations 55
3.4 Approximate Inference Algorithms for Approaches Based on Global Formulations 56
3.4.1 Loopy Belief Propagation 58
3.4.2 Relaxation Labeling via Mean-Field Approach 59
3.5 Learning the Classifiers 60
3.6 Experimental Comparison 60
3.6.1 Features Used 60
3.6.2 Real-World Datasets 60
3.6.3 Practical Issues 63
3.7 Related Work 64
3.8 Conclusion 66
3.9 Acknowledgments 66

4 Topic Models 71
David M. Blei and John D. Lafferty
4.1 Introduction 71
4.2 Latent Dirichlet Allocation 72
4.2.1 Statistical Assumptions 73
4.2.2 Exploring a Corpus with the Posterior Distribution 75
4.3 Posterior Inference for LDA 76
4.3.1 Mean Field Variational Inference 78
4.3.2 Practical Considerations 81
4.4 Dynamic Topic Models and Correlated Topic Models 82
4.4.1 The Correlated Topic Model 82
4.4.2 The Dynamic Topic Model 84
4.5 Discussion 89

5 Nonnegative Matrix and Tensor Factorization for Discussion Tracking 95
Brett W. Bader, Michael W. Berry, and Amy N. Langville
5.1 Introduction 95
5.1.1 Extracting Discussions 96
5.1.2 Related Work 96
5.2 Notation 97
5.3 Tensor Decompositions and Algorithms 98
5.3.1 PARAFAC-ALS 100
5.3.2 Nonnegative Tensor Factorization 100
5.4 Enron Subset 102
5.4.1 Term Weighting Techniques 103
5.5 Observations and Results 105
5.5.1 Nonnegative Tensor Decomposition 105
5.5.2 Analysis of Three-Way Tensor 106
5.5.3 Analysis of Four-Way Tensor 108
5.6 Visualizing Results of the NMF Clustering 111
5.7 Future Work 116

6 Text Clustering with Mixture of von Mises-Fisher Distributions 121
Arindam Banerjee, Inderjit Dhillon, Joydeep Ghosh, and Suvrit Sra
6.1 Introduction 121
6.2 Related Work 123
6.3 Preliminaries 124
6.3.1 The von Mises-Fisher (vMF) Distribution 124
6.3.2 Maximum Likelihood Estimates 125
6.4 EM on a Mixture of vMFs (moVMF) 126
6.5 Handling High-Dimensional Text Datasets 127
6.5.1 Approximating κ 128
6.5.2 Experimental Study of the Approximation 130
6.6 Algorithms 132
6.7 Experimental Results 134
6.7.1 Datasets 135
6.7.2 Methodology 138
6.7.3 Simulated Datasets 138
6.7.4 Classic3 Family of Datasets 140
6.7.5 Yahoo News Dataset 143
6.7.6 20 Newsgroup Family of Datasets 143
6.7.7 Slashdot Datasets 145
6.8 Discussion 146
6.9 Conclusions and Future Work 148

7 Constrained Partitional Clustering of Text Data: An Overview 155
Sugato Basu and Ian Davidson
7.1 Introduction 155
7.2 Uses of Constraints 157
7.2.1 Constraint-Based Methods 157
7.2.2 Distance-Based Methods 158
7.3 Text Clustering 159
7.3.1 Pre-Processing 161
7.3.2 Distance Measures 162
7.4 Partitional Clustering with Constraints 163
7.4.1 COP-KMeans 163
7.4.2 Algorithms with Penalties - PKM, CVQE 164
7.4.3 LCVQE: An Extension to CVQE 167
7.4.4 Probabilistic Penalty - PKM 167
7.5 Learning Distance Function with Constraints 168
7.5.1 Generalized Mahalanobis Distance Learning 168
7.5.2 Kernel Distance Functions Using AdaBoost 169
7.6 Satisfying Constraints and Learning Distance Functions 170
7.6.1 Hidden Markov Random Field (HMRF) Model 170
7.6.2 EM Algorithm 173
7.6.3 Improvements to HMRF-KMeans 173
7.7 Experiments 174
7.7.1 Datasets 174
7.7.2 Clustering Evaluation 175
7.7.3 Methodology 176
7.7.4 Comparison of Distance Functions 176
7.7.5 Experimental Results 177
7.8 Conclusions 180

8 Adaptive Information Filtering 185
Yi Zhang
8.1 Introduction 185
8.2 Standard Evaluation Measures 188
8.3 Standard Retrieval Models and Filtering Approaches 190
8.3.1 Existing Retrieval Models 190
8.3.2 Existing Adaptive Filtering Approaches 192
8.4 Collaborative Adaptive Filtering 194
8.5 Novelty and Redundancy Detection 196
8.5.1 Set Difference 199
8.5.2 Geometric Distance 199
8.5.3 Distributional Similarity 200
8.5.4 Summary of Novelty Detection 201
8.6 Other Adaptive Filtering Topics 201
8.6.1 Beyond Bag of Words 202
8.6.2 Using Implicit Feedback 202
8.6.3 Exploration and Exploitation Trade Off 203
8.6.4 Evaluation beyond Topical Relevance 203
8.7 Acknowledgments 204

9 Utility-Based Information Distillation 213
Yiming Yang and Abhimanyu Lad
9.1 Introduction 213
9.1.1 Related Work in Adaptive Filtering (AF) 213
9.1.2 Related Work in Topic Detection and Tracking (TDT) 214
9.1.3 Limitations of Current Solutions 215
9.2 A Sample Task 216
9.3 Technical Cores 218
9.3.1 Adaptive Filtering Component 218
9.3.2 Passage Retrieval Component 219
9.3.3 Novelty Detection Component 220
9.3.4 Anti-Redundant Ranking Component 220
9.4 Evaluation Methodology 221
9.4.1 Answer Keys 221
9.4.2 Evaluating the Utility of a Sequence of Ranked Lists 223
9.5 Data 225
9.6 Experiments and Results 226
9.6.1 Baselines 226
9.6.2 Experimental Setup 226
9.6.3 Results 227
9.7 Concluding Remarks 229
9.8 Acknowledgments 229

10 Text Search-Enhanced with Types and Entities 233
Soumen Chakrabarti, Sujatha Das, Vijay Krishnan, and Kriti Puniyani
10.1 Entity-Aware Search Architecture 233
10.1.1 Guessing Answer Types 234
10.1.2 Scoring Snippets 235
10.1.3 Efficient Indexing and Query Processing 236
10.1.4 Comparison with Prior Work 236
10.2 Understanding the Question 236
10.2.1 Answer Type Clues in Questions 239
10.2.2 Sequential Labeling of Type Clue Spans 240
10.2.3 From Type Clue Spans to Answer Types 245
10.2.4 Experiments 247
10.3 Scoring Potential Answer Snippets 251
10.3.1 A Proximity Model 253
10.3.2 Learning the Proximity Scoring Function 255
10.3.3 Experiments 257
10.4 Indexing and Query Processing 260
10.4.1 Probability of a Query Atype 262
10.4.2 Pre-Generalize and Post-Filter 262
10.4.3 Atype Subset Index Space Model 265
10.4.4 Query Time Bloat Model 266
10.4.5 Choosing an Atype Subset 269
10.4.6 Experiments 271
10.5 Conclusion 272
10.5.1 Summary 272
10.5.2 Ongoing and Future Work 273

Index 279
List of Figures

1.1 Modularity of kernel-based algorithms: the data are transformed into a kernel matrix by using a kernel function; then the pattern analysis algorithm uses this information to find interesting relations, which are all written in the form of a linear combination of kernel functions 3
1.2 The evolutionary rooted tree built using a 4-spectrum kernel and the Neighbor Joining algorithm 20
1.3 Multi-dimensional scaling using a 4-spectrum kernel distance matrix 21
2.1 Number of discovered pairs vs. time window size 34
2.2 Distribution of BEP for 300 random sets 38
2.3 Relative distance between news outlets using the BEP metric 43
2.4 Relative distance between news outlets, using the Topic similarity 44
3.1 A small text classification problem. Each box denotes a document, each directed edge between a pair of boxes denotes a hyperlink, and each oval node denotes a random variable. Assume the smaller oval nodes within each box represent the presence of the words w1, w2, and w3 in the document, and the larger oval nodes the label of the document, where the set of label values is L = {L1, L2}. A shaded oval denotes an observed variable whereas an unshaded oval node denotes an unobserved variable whose value needs to be predicted 52
4.1 Five topics from a 50-topic LDA model fit to Science from 1980-2002 72
4.2 A graphical model representation of the latent Dirichlet allocation (LDA). Nodes denote random variables; edges denote dependence between random variables. Shaded nodes denote observed random variables; unshaded nodes denote hidden random variables. The rectangular boxes are plate notation, which denote replication 74
4.3 Five topics from a 50-topic model fit to the Yale Law Journal from 1980-2003 75
4.4 (See color insert.) The analysis of a document from Science. Document similarity was computed using Eq. (4.4); topic words were computed using Eq. (4.3) 77
4.5 One iteration of mean field variational inference for LDA. This algorithm is repeated until the objective function in Eq. (4.6) converges 80
4.6 The graphical model for the correlated topic model in Section 4.4.1 84
4.7 A portion of the topic graph learned from the 16,351 OCR articles from Science (1990-1999). Each topic node is labeled with its five most probable phrases and has font proportional to its popularity in the corpus. (Phrases are found by permutation test.) The full model can be browsed with pointers to the original articles at http://www.cs.cmu.edu/~lemur/science/ and on STATLIB. (The algorithm for constructing this graph from the covariance matrix of the logistic normal is given in (9).) 85
4.8 A graphical model representation of a dynamic topic model (for three time slices). Each topic's parameters βt,k evolve over time 86
4.9 Two topics from a dynamic topic model fit to the Science archive (1880-2002) 88
4.10 The top ten most similar articles to the query in Science (1880-2002), scored by Eq. (4.4) using the posterior distribution from the dynamic topic model 89
5.1 PARAFAC provides a three-way decomposition with some similarity to the singular value decomposition 99
5.2 (See color insert.) Five discussion topics identified in the three-way analysis over months 106
5.3 Three discussion topics identified in the three-way analysis over days 108
5.4 Weekly betting pool identified in the three-way (top) and four-way (bottom) analyses 109
5.5 Long running discussion on FERC's various rulings of RTOs 110
5.6 Forwarding of Texas A&M school fight song 111
5.7 (See color insert.) Pixel plot of the raw Enron term-by-email matrix 112
5.8 (See color insert.) Pixel plot of the reordered Enron term-by-email matrix 113
5.9 (See color insert.) Pixel plot of the reordered Enron term-by-document matrix with term and document labels 114
5.10 (See color insert.) Close-up of one section of pixel plot of the reordered Enron term-by-document matrix 115
6.1 True and approximated κ values with d = 1000 130
6.2 Comparison of approximations for varying d, κ = 500 131
6.3 Comparison of approximations for varying κ (with d = 1000) 132
6.4 (See color insert.) Small-mix dataset and its clustering by soft-moVMF 139
6.5 Comparison of the algorithms for the Classic3 datasets and the Yahoo News dataset 142
6.6 Comparison of the algorithms for the 20 Newsgroup and some subsets 144
6.7 Comparison of the algorithms for more subsets of 20 Newsgroup data 145
6.8 (See color insert.) Variation of entropy of hidden variables with number of iterations (soft-moVMF) 148
7.1 Input instances and constraints 158
7.2 Constraint-based clustering 159
7.3 Input instances and constraints 160
7.4 Distance-based clustering 160
7.5 Clustering using KMeans 164
7.6 Clustering under constraints using COP-KMeans 165
7.7 DistBoost algorithm 169
7.8 A hidden Markov random field 171
7.9 Graphical plate model of variable dependence 171
7.10 HMRF-KMeans algorithm 174
7.11 Comparison of cosine and Euclidean distance 178
7.12 Results on News-Different-3 178
7.13 Results on News-Related-3 179
7.14 Results on News-Similar-3 179
8.1 A typical filtering system. A filtering system can serve many users, although only one user is shown in the figure. Information can be documents, images, or videos. Without loss of generality, we focus on text documents in this chapter 186
8.2 Illustration of dependencies of variables in the hierarchical model. The rating, y, for a document, x, is conditioned on the document and the user model, wm, associated with the user m. Users share information about their models through the prior, Φ = (μ, Σ) 195
9.1 PNDCU scores of Indri and CAFE for two dampening factors (p), and various settings (PRF: Pseudo-Relevance Feedback, F: Feedback, N: Novelty Detection, A: Anti-Redundant Ranking) 227
9.2 Performance of CAFE and Indri across chunks 228
10.1 (See color insert.) Document as a linear sequence of tokens, some connected to a type hierarchy. Some sample queries and their approximate translation to a semi-structured form are shown 235
10.2 (See color insert.) The IR4QA system that we describe in this paper 237
10.3 Summary of % accuracy for UIUC data. (1) SNoW accuracy without the related word dictionary was not reported; with the related-word dictionary, it achieved 91%. (2) SNoW with a related-word dictionary achieved 84.2% but the other algorithms did not use it. Our results are summarized in the last two rows; see text for details 240
10.4 2- and 3-state transition models 241
10.5 Stanford Parser output example 242
10.6 A multi-resolution tabular view of the question parse showing tag and num attributes in each cell; "capital city" is the informer span with y = 1 242
10.7 The meta-learning approach 245
10.8 Effect of feature choices 248
10.9 A significant boost in question classification accuracy is seen when two levels of non-local features are provided to the SVM, compared to just the POS features at the leaf of the parse tree 249
10.10 Effect of number of CRF states, and comparison with the heuristic baseline (Jaccard accuracy expressed as %) 250
10.11 Percent accuracy with linear SVMs, perfect informer spans and various feature encodings. The Coarse column is for the 6 top-level UIUC classes and the Fine column is for the 50 second-level classes 251
10.12 Summary of % accuracy broken down by broad syntactic question types. a: question bigrams, b: perfect informers only, c: heuristic informers only, d: CRF informers only, e-g: bigrams plus perfect, heuristic and CRF informers 252
10.13 (See color insert.) Setting up the proximity scoring problem 254
10.14 Relative CPU times needed by RankSVM and RankExp as a function of the number of ordering constraints 258
10.15 βj shows a noisy unimodal pattern 259
10.16 End-to-end accuracy using RankExp β is significantly better than IR-style ranking. Train and test years are from 1999, 2000, 2001. R300 is recall at k = 300 out of 261 test questions. C = 0.1, C = 1 and C = 10 gave almost identical results 259
10.17 Relative sizes of the corpus and various indexes for TREC 2000 261
10.18 Highly skewed atype frequencies in TREC query logs 261
10.19 Log likelihood of validation data against the Lidstone smoothing parameter ℓ 263
10.20 Pre-generalization and post-filtering 263
10.21 Sizes of the additional indices needed for pre-generalize and post-filter query processing, compared to the usual indices for TREC 2000 265
10.22 Σ_{a∈R} corpusCount(a) is a very good predictor of the size of the atype subset index. (Root atypes are not indexed.) 266
10.23 t_scan is sufficiently concentrated that replacing the distribution by a constant number is not grossly inaccurate 267
10.24 Like t_scan, t_forward is concentrated and can be reasonably replaced by a point estimate 268
10.25 Scatter of observed against estimated query bloat 269
10.26 Histogram of observed-to-estimated bloat ratio for individual queries with a specific R occupying an estimated 145 MB of atype index 269
10.27 The inputs are atype set A and workload W. The output is a series of trade-offs between index size of R and average query bloat over W 270
10.28 (See color insert.) Estimated space-time tradeoffs produced by AtypeSubsetChooser. The y-axis uses a log scale. Note that the curve for ℓ = 10^-3 (suggested by Figure 10.19) has the lowest average bloat 272
10.29 Estimated bloat for various values of ℓ for a specific estimated index size of 145 MB. The y-axis uses a log scale 273
10.30 Estimated and observed space-time tradeoffs produced by AtypeSubsetChooser 274
10.31 Average time per query (with and without generalization) for various estimated index sizes 275
List of Tables

2.1 Number of news items collected from different outlets 31
2.2 Number of discovered news pairs 33
2.3 Results for outlet identification of a news item 36
2.4 Results for news outlet identification of a news item from the set of news item pairs 37
2.5 Main topics covered by CNN or Al Jazeera 40
2.6 Number of discovered pairs 41
2.7 Conditional probabilities of a story 41
2.8 Number of news articles covered by all four news outlets 42
2.9 BEP metric distances 43
3.1 Accuracy results for WebKB. CC algorithms outperformed their CO counterparts significantly, and LR versions outperformed NB versions significantly. The differences between ICA-NB and GS-NB, and the differences between ICA-LR and GS-LR, are not statistically significant. Both LBP and MF outperformed ICA-LR and GS-LR significantly 62
3.2 Accuracy results for the Cora dataset. CC algorithms outperformed their CO counterparts significantly. LR versions significantly outperformed NB versions. ICA-NB outperformed GS-NB for SS and M; the other differences between ICA and GS were not significant (both NB and LR versions). Even though MF outperformed ICA-LR, GS-LR, and LBP, the differences were not statistically significant 63
3.3 Accuracy results for the CiteSeer dataset. CC algorithms significantly outperformed their CO counterparts except for ICA-NB and GS-NB for matched cross-validation. CO and CC algorithms based on LR outperformed the NB versions, but the differences were not significant. ICA-NB outperformed GS-NB significantly for SS, but the rest of the differences between LR versions of ICA and GS, LBP and MF were not significant 64
5.1 Eleven of the 197 email authors represented in the term-author-time array X 103
6.1 Approximations κ̂ for a sampling of κ and d values 129
6.2 True and estimated parameters for small-mix 139
6.3 Performance of soft-moVMF on big-mix dataset 140
6.4 Comparative confusion matrices for 3 clusters of Classic3 (rows represent clusters) 140
6.5 Comparative confusion matrices for 3 clusters of Classic300 141
6.6 Comparative confusion matrices for 3 clusters of Classic400 141
6.7 Comparative confusion matrices for 5 clusters of Classic3 141
6.8 Performance comparison of algorithms averaged over 5 runs 145
6.9 Five of the topics obtained by running batch vMF on slash-7 146
7.1 Text datasets used in experimental evaluation 175
8.1 The values assigned to relevant and non-relevant documents that the filtering system did and did not deliver. R+, R-, N+, and N- correspond to the number of documents that fall into the corresponding category. A_R, A_N, B_R, and B_N correspond to the credit/penalty for each element in the category 188
|
any_adam_object | 1 |
author_GND | (DE-588)120408945 |
building | Verbundindex |
bvnumber | BV035605094 |
callnumber-first | Q - Science |
callnumber-label | QA76 |
callnumber-raw | QA76.9.D343 |
callnumber-search | QA76.9.D343 |
callnumber-sort | QA 276.9 D343 |
callnumber-subject | QA - Mathematics |
classification_rvk | ST 302 ST 306 ST 530 |
ctrlnum | (OCoLC)144226505 (DE-599)GBV595954707 |
dewey-full | 006.3/12 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.3/12 |
dewey-search | 006.3/12 |
dewey-sort | 16.3 212 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01712nam a2200433 c 4500</leader><controlfield tag="001">BV035605094</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20160204 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">090707s2009 a||| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2009013047</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781420059403</subfield><subfield code="c">hardcover</subfield><subfield code="9">978-1-4200-5940-3</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)144226505</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBV595954707</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-473</subfield><subfield code="a">DE-703</subfield><subfield code="a">DE-20</subfield><subfield code="a">DE-355</subfield><subfield code="a">DE-M382</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA76.9.D343</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.3/12</subfield><subfield code="2">22</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 302</subfield><subfield code="0">(DE-625)143652:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 306</subfield><subfield code="0">(DE-625)143654:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield 
code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Text mining</subfield><subfield code="b">classification, clustering, and applications</subfield><subfield code="c">ed. by Ashok Srivastava ...</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boca Raton [u.a.]</subfield><subfield code="b">CRC Press</subfield><subfield code="c">2009</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXX, 290 S.</subfield><subfield code="b">Ill., graf. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Data mining and knowledge discovery series</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references and index</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Data mining / Statistical methods</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data mining</subfield><subfield code="x">Statistical methods</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Text Mining</subfield><subfield code="0">(DE-588)4728093-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)4143413-4</subfield><subfield code="a">Aufsatzsammlung</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Text Mining</subfield><subfield 
code="0">(DE-588)4728093-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Srivastava, Ashok K.</subfield><subfield code="e">Sonstige</subfield><subfield code="0">(DE-588)120408945</subfield><subfield code="4">oth</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bamberg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017660340&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-017660340</subfield></datafield></record></collection> |
genre | (DE-588)4143413-4 Aufsatzsammlung gnd-content |
genre_facet | Aufsatzsammlung |
id | DE-604.BV035605094 |
illustrated | Illustrated |
indexdate | 2024-07-09T21:41:28Z |
institution | BVB |
isbn | 9781420059403 |
language | English |
lccn | 2009013047 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-017660340 |
oclc_num | 144226505 |
open_access_boolean | |
owner | DE-473 DE-BY-UBG DE-703 DE-20 DE-355 DE-BY-UBR DE-M382 |
owner_facet | DE-473 DE-BY-UBG DE-703 DE-20 DE-355 DE-BY-UBR DE-M382 |
physical | XXX, 290 S. Ill., graf. Darst. |
publishDate | 2009 |
publishDateSearch | 2009 |
publishDateSort | 2009 |
publisher | CRC Press |
record_format | marc |
series2 | Data mining and knowledge discovery series |
spelling | Text mining classification, clustering, and applications ed. by Ashok Srivastava ... Boca Raton [u.a.] CRC Press 2009 XXX, 290 S. Ill., graf. Darst. txt rdacontent n rdamedia nc rdacarrier Data mining and knowledge discovery series Includes bibliographical references and index Data mining / Statistical methods Data mining Statistical methods Text Mining (DE-588)4728093-1 gnd rswk-swf (DE-588)4143413-4 Aufsatzsammlung gnd-content Text Mining (DE-588)4728093-1 s DE-604 Srivastava, Ashok K. Sonstige (DE-588)120408945 oth Digitalisierung UB Bamberg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017660340&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Text mining classification, clustering, and applications Data mining / Statistical methods Data mining Statistical methods Text Mining (DE-588)4728093-1 gnd |
subject_GND | (DE-588)4728093-1 (DE-588)4143413-4 |
title | Text mining classification, clustering, and applications |
title_auth | Text mining classification, clustering, and applications |
title_exact_search | Text mining classification, clustering, and applications |
title_full | Text mining classification, clustering, and applications ed. by Ashok Srivastava ... |
title_fullStr | Text mining classification, clustering, and applications ed. by Ashok Srivastava ... |
title_full_unstemmed | Text mining classification, clustering, and applications ed. by Ashok Srivastava ... |
title_short | Text mining |
title_sort | text mining classification clustering and applications |
title_sub | classification, clustering, and applications |
topic | Data mining / Statistical methods Data mining Statistical methods Text Mining (DE-588)4728093-1 gnd |
topic_facet | Data mining / Statistical methods Data mining Statistical methods Text Mining Aufsatzsammlung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017660340&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT srivastavaashokk textminingclassificationclusteringandapplications |