Semi-supervised and unsupervised machine learning: novel strategies
Main author: | Albalate, Amparo |
---|---|
Format: | Book |
Language: | English |
Published: | London [etc.]: ISTE [etc.], 2011 |
Edition: | 1st publ. |
Subjects: | Speech processing; Machine learning; Automatic speech recognition |
Online access: | Table of contents |
Description: | Includes bibliographical references and index |
Description: | X, 244 pp., diagrams |
ISBN: | 9781848212039 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV037238875 | ||
003 | DE-604 | ||
005 | 20120109 | ||
007 | t | ||
008 | 110221s2011 d||| |||| 00||| eng d | ||
020 | |a 9781848212039 |9 978-1-84821-203-9 | ||
035 | |a (OCoLC)700509842 | ||
035 | |a (DE-599)BVBBV037238875 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-11 | ||
082 | 0 | |a 006.3 |2 22 | |
084 | |a ST 302 |0 (DE-625)143652: |2 rvk | ||
100 | 1 | |a Albalate, Amparo |e Verfasser |4 aut | |
245 | 1 | 0 | |a Semi-supervised and unsupervised machine learning |b novel strategies |c Amparo Albalate ; Wolfgang Minker |
250 | |a 1. publ. | ||
264 | 1 | |a London [u.a.] |b ISTE [u.a.] |c 2011 | |
300 | |a X, 244 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Includes bibliographical references and index | ||
650 | 0 | 7 | |a Sprachverarbeitung |0 (DE-588)4116579-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Automatische Spracherkennung |0 (DE-588)4003961-4 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Automatische Spracherkennung |0 (DE-588)4003961-4 |D s |
689 | 0 | 1 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s |
689 | 0 | |8 1\p |5 DE-604 | |
689 | 1 | 0 | |a Sprachverarbeitung |0 (DE-588)4116579-2 |D s |
689 | 1 | 1 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s |
689 | 1 | |8 2\p |5 DE-604 | |
700 | 1 | |a Minker, Wolfgang |e Sonstige |4 oth | |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=021152396&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-021152396 | ||
883 | 1 | |8 1\p |a cgwrk |d 20201028 |q DE-101 |u https://d-nb.info/provenance/plan#cgwrk | |
883 | 1 | |8 2\p |a cgwrk |d 20201028 |q DE-101 |u https://d-nb.info/provenance/plan#cgwrk |
Record in the search index
_version_ | 1804143847898349568 |
---|---|
adam_text | Title: Semi-supervised and unsupervised machine learning
Author: Albalate, Amparo
Year: 2011
Table of Contents
Part 1. State of the Art ................ 1
Chapter 1. Introduction.................. 3
1.1. Organization of the book.............. 6
1.2. Utterance corpus................... 8
1.3. Datasets from the UCI repository......... 10
1.3.1. Wine dataset (wine)............... 10
1.3.2. Wisconsin breast cancer dataset (breast) . . 11
1.3.3. Handwritten digits dataset (Pendig)..... 11
1.3.4. Pima Indians diabetes (diabetes)....... 12
1.3.5. Iris dataset (Iris)................. 13
1.4. Microarray dataset.................. 13
1.5. Simulated datasets.................. 14
1.5.1. Mixtures of Gaussians.............. 14
1.5.2. Spatial datasets with non-homogeneous
inter-cluster distance .............. 14
Chapter 2. State of the Art in Clustering and
Semi-Supervised Techniques .............. 15
2.1. Introduction...................... 15
2.2. Unsupervised machine learning (clustering) . . 15
2.3. A brief history of cluster analysis......... 16
2.4. Cluster algorithms.................. 19
2.4.1. Hierarchical algorithms............. 19
2.4.1.1. Agglomerative clustering......... 19
2.4.1.2. Divisive algorithms ............ 23
2.4.2. Model-based clustering............. 24
2.4.2.1. The expectation maximization (EM)
algorithm................... 25
2.4.3. Partitional competitive models ........ 30
2.4.3.1. K-means................... 30
2.4.3.2. Neural gas.................. 35
2.4.3.3. Partitioning around Medoids (PAM) . . 37
2.4.3.4. Self-organizing maps ........... 39
2.4.4. Density-based clustering............ 45
2.4.4.1. Direct density reachability........ 45
2.4.4.2. Density reachability............ 46
2.4.4.3. Density connection............. 46
2.4.4.4. Border points................ 47
2.4.4.5. Noise points................. 47
2.4.4.6. DBSCAN algorithm............ 47
2.4.5. Graph-based clustering............. 49
2.4.5.1. Pole-based overlapping clustering ... 49
2.4.6. Affectation stage................. 52
2.4.6.1. Advantages and drawbacks ....... 52
2.5. Applications of cluster analysis.......... 52
2.5.1. Image segmentation............... 53
2.5.2. Molecular biology................. 55
2.5.2.1. Biological considerations......... 56
2.5.3. Information retrieval and document
clustering...................... 60
2.5.3.1. Document pre-processing......... 61
2.5.3.2. Boolean model representation...... 63
2.5.3.3. Vector space model............. 64
2.5.3.4. Term weighting............... 65
2.5.3.5. Probabilistic models............ 71
2.5.4. Clustering documents in information
retrieval....................... 76
2.5.4.1. Clustering of presented results..... 76
2.5.4.2. Post-retrieval document browsing
(Scatter-Gather) .............. 76
2.6. Evaluation methods................. 77
2.7. Internal cluster evaluation............. 77
2.7.1. Entropy....................... 78
2.7.2. Purity........................ 78
2.7.3. Normalized mutual information........ 79
2.8. External cluster validation............. 80
2.8.1. Hartigan...................... 80
2.8.2. Davies Bouldin index.............. 81
2.8.3. Krzanowski and Lai index........... 81
2.8.4. Silhouette ..................... 82
2.8.5. Gap statistic.................... 82
2.9. Semi-supervised learning.............. 84
2.9.1. Self training.................... 84
2.9.2. Co-training..................... 85
2.9.3. Generative models................ 86
2.10. Summary ....................... 88
Part 2. Approaches to Semi-Supervised
Classification ....................... 91
Chapter 3. Semi-Supervised Classification Using
Prior Word Clustering ................... 93
3.1. Introduction...................... 93
3.2. Dataset......................... 94
3.3. Utterance classification scheme.......... 94
3.3.1. Pre-processing................... 94
3.3.1.1. Utterance vector representation..... 96
3.3.2. Utterance classification............. 96
3.4. Semi-supervised approach based on term
clustering........................ 98
3.4.1. Term clustering.................. 99
3.4.2. Semantic term dissimilarity.......... 100
3.4.2.1. Term vector of lexical co-occurrences . . 101
3.4.2.2. Metric of dissimilarity........... 102
3.4.3. Term vector truncation............. 104
3.4.4. Term clustering.................. 105
3.4.5. Feature extraction and utterance feature
vector........................ 109
3.4.6. Evaluation..................... 110
3.5. Disambiguation.................... 113
3.5.1. Evaluation..................... 116
3.6. Summary........................ 124
Chapter 4. Semi-Supervised Classification Using
Pattern Clustering...................... 127
4.1. Introduction...................... 127
4.2. New semi-supervised algorithm using the
cluster and label strategy.............. 128
4.2.1. Block diagram................... 128
4.2.1.1. Dataset.................... 129
4.2.1.2. Clustering.................. 130
4.2.1.3. Optimum cluster labeling......... 130
4.2.1.4. Classification................ 131
4.3. Optimum cluster labeling.............. 132
4.3.1. Problem definition................ 132
4.3.2. The Hungarian algorithm............ 134
4.3.2.1. Weighted complete bipartite graph . . . 134
4.3.2.2. Matching, perfect matching and
maximum weight matching....... 135
4.3.2.3. Objective of Hungarian method..... 136
4.3.2.4. Complexity considerations........ 141
4.3.3. Genetic algorithms................ 142
4.3.3.1. Reproduction operators.......... 143
4.3.3.2. Forming the next generation....... 146
4.3.3.3. GAs applied to optimum cluster
labeling.................... 147
4.3.3.4. Comparison of methods.......... 150
4.4. Supervised classification block........... 154
4.4.1. Support vector machines............ 154
4.4.1.1. The kernel trick for nonlinearly
separable classes.............. 156
4.4.1.2. Multi-class classification......... 157
4.4.2. Example ...................... 157
4.5. Datasets ........................ 159
4.5.1. Mixtures of Gaussians.............. 159
4.5.2. Datasets from the UCI repository....... 159
4.5.2.1. Iris dataset (Iris).............. 159
4.5.2.2. Wine dataset (wine)............ 160
4.5.2.3. Wisconsin breast cancer dataset
(breast).................... 160
4.5.2.4. Handwritten digits dataset (Pendig) . . 160
4.5.2.5. Pima Indians diabetes (diabetes) .... 160
4.5.3. Utterance dataset ................ 160
4.6. An analysis of the bounds for the cluster and
label approaches ................... 162
4.7. Extension through cluster pruning........ 164
4.7.1. Determination of silhouette thresholds . . . 166
4.7.2. Evaluation of the cluster pruning approach 171
4.8. Simulations and results............... 173
4.9. Summary........................ 179
Part 3. Contributions to Unsupervised
Classification - Algorithms to Detect
the Optimal Number of Clusters......... 183
Chapter 5. Detection of the Number of Clusters
through Non-Parametric Clustering Algorithms . 185
5.1. Introduction...................... 185
5.2. New hierarchical pole-based clustering
algorithm........................ 186
5.2.1. Pole-based clustering basis module...... 187
5.2.2. Hierarchical pole-based clustering...... 189
5.3. Evaluation....................... 190
5.3.1. Cluster evaluation metrics........... 191
5.4. Datasets ........................ 192
5.4.1. Results....................... 192
5.4.2. Complexity considerations for large
databases...................... 195
5.5. Summary........................ 197
Chapter 6. Detecting the Number of Clusters
through Cluster Validation................ 199
6.1. Introduction...................... 199
6.2. Cluster validation methods............. 201
6.2.1. Dunn index..................... 201
6.2.2. Hartigan...................... 201
6.2.3. Davies Bouldin index.............. 202
6.2.4. Krzanowski and Lai index........... 202
6.2.5. Silhouette ..................... 203
6.2.6. Hubert's γ..................... 204
6.2.7. Gap statistic.................... 205
6.3. Combination approach based on quantiles . . . 206
6.4. Datasets ........................ 212
6.4.1. Mixtures of Gaussians.............. 212
6.4.2. Cancer DNA-microarray dataset....... 213
6.4.3. Iris dataset..................... 214
6.5. Results......................... 214
6.5.1. Validation results of the five Gaussian
dataset....................... 215
6.5.2. Validation results of the mixture of seven
Gaussians..................... 220
6.5.3. Validation results of the NCI60 dataset ... 220
6.5.4. Validation results of the Iris dataset..... 221
6.5.5. Discussion..................... 222
6.6. Application of speech utterances ......... 223
6.7. Summary........................ 224
Bibliography.......................... 227
Index............................... 243
|
any_adam_object | 1 |
author | Albalate, Amparo |
author_facet | Albalate, Amparo |
author_role | aut |
author_sort | Albalate, Amparo |
author_variant | a a aa |
building | Verbundindex |
bvnumber | BV037238875 |
classification_rvk | ST 302 |
ctrlnum | (OCoLC)700509842 (DE-599)BVBBV037238875 |
dewey-full | 006.3 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.3 |
dewey-search | 006.3 |
dewey-sort | 16.3 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik |
edition | 1. publ. |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01902nam a2200457 c 4500</leader><controlfield tag="001">BV037238875</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20120109 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">110221s2011 d||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781848212039</subfield><subfield code="9">978-1-84821-203-9</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)700509842</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV037238875</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-11</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.3</subfield><subfield code="2">22</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 302</subfield><subfield code="0">(DE-625)143652:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Albalate, Amparo</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Semi-supervised and unsupervised machine learning</subfield><subfield code="b">novel strategies</subfield><subfield code="c">Amparo Albalate ; Wolfgang Minker</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1. 
publ.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">London [u.a.]</subfield><subfield code="b">ISTE [u.a.]</subfield><subfield code="c">2011</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">X, 244 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references and index</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Sprachverarbeitung</subfield><subfield code="0">(DE-588)4116579-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="8">1\p</subfield><subfield code="5">DE-604</subfield></datafield><datafield tag="689" ind1="1" 
ind2="0"><subfield code="a">Sprachverarbeitung</subfield><subfield code="0">(DE-588)4116579-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="1" ind2="1"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="1" ind2=" "><subfield code="8">2\p</subfield><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Minker, Wolfgang</subfield><subfield code="e">Sonstige</subfield><subfield code="4">oth</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=021152396&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-021152396</subfield></datafield><datafield tag="883" ind1="1" ind2=" "><subfield code="8">1\p</subfield><subfield code="a">cgwrk</subfield><subfield code="d">20201028</subfield><subfield code="q">DE-101</subfield><subfield code="u">https://d-nb.info/provenance/plan#cgwrk</subfield></datafield><datafield tag="883" ind1="1" ind2=" "><subfield code="8">2\p</subfield><subfield code="a">cgwrk</subfield><subfield code="d">20201028</subfield><subfield code="q">DE-101</subfield><subfield code="u">https://d-nb.info/provenance/plan#cgwrk</subfield></datafield></record></collection> |
id | DE-604.BV037238875 |
illustrated | Illustrated |
indexdate | 2024-07-09T22:54:10Z |
institution | BVB |
isbn | 9781848212039 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-021152396 |
oclc_num | 700509842 |
open_access_boolean | |
owner | DE-11 |
owner_facet | DE-11 |
physical | X, 244 S. graph. Darst. |
publishDate | 2011 |
publishDateSearch | 2011 |
publishDateSort | 2011 |
publisher | ISTE [u.a.] |
record_format | marc |
spelling | Albalate, Amparo Verfasser aut Semi-supervised and unsupervised machine learning novel strategies Amparo Albalate ; Wolfgang Minker 1. publ. London [u.a.] ISTE [u.a.] 2011 X, 244 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Includes bibliographical references and index Sprachverarbeitung (DE-588)4116579-2 gnd rswk-swf Maschinelles Lernen (DE-588)4193754-5 gnd rswk-swf Automatische Spracherkennung (DE-588)4003961-4 gnd rswk-swf Automatische Spracherkennung (DE-588)4003961-4 s Maschinelles Lernen (DE-588)4193754-5 s 1\p DE-604 Sprachverarbeitung (DE-588)4116579-2 s 2\p DE-604 Minker, Wolfgang Sonstige oth HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=021152396&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis 1\p cgwrk 20201028 DE-101 https://d-nb.info/provenance/plan#cgwrk 2\p cgwrk 20201028 DE-101 https://d-nb.info/provenance/plan#cgwrk |
spellingShingle | Albalate, Amparo Semi-supervised and unsupervised machine learning novel strategies Sprachverarbeitung (DE-588)4116579-2 gnd Maschinelles Lernen (DE-588)4193754-5 gnd Automatische Spracherkennung (DE-588)4003961-4 gnd |
subject_GND | (DE-588)4116579-2 (DE-588)4193754-5 (DE-588)4003961-4 |
title | Semi-supervised and unsupervised machine learning novel strategies |
title_auth | Semi-supervised and unsupervised machine learning novel strategies |
title_exact_search | Semi-supervised and unsupervised machine learning novel strategies |
title_full | Semi-supervised and unsupervised machine learning novel strategies Amparo Albalate ; Wolfgang Minker |
title_fullStr | Semi-supervised and unsupervised machine learning novel strategies Amparo Albalate ; Wolfgang Minker |
title_full_unstemmed | Semi-supervised and unsupervised machine learning novel strategies Amparo Albalate ; Wolfgang Minker |
title_short | Semi-supervised and unsupervised machine learning |
title_sort | semi supervised and unsupervised machine learning novel strategies |
title_sub | novel strategies |
topic | Sprachverarbeitung (DE-588)4116579-2 gnd Maschinelles Lernen (DE-588)4193754-5 gnd Automatische Spracherkennung (DE-588)4003961-4 gnd |
topic_facet | Sprachverarbeitung Maschinelles Lernen Automatische Spracherkennung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=021152396&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT albalateamparo semisupervisedandunsupervisedmachinelearningnovelstrategies AT minkerwolfgang semisupervisedandunsupervisedmachinelearningnovelstrategies |