Data mining methods for the content analyst: an introduction to the computational analysis of content
Main Author: | Leetaru, Kalev |
---|---|
Format: | Book |
Language: | English |
Published: | New York [and others] : Routledge, 2012 |
Edition: | 1. publ. |
Series: | Routledge communication series |
Subjects: | Data mining |
Online Access: | Table of contents |
Description: | X, 102 pp., graphical illustrations |
ISBN: | 9780415895132 9780415895149 9780203149386 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV039339490 | ||
003 | DE-604 | ||
005 | 20140723 | ||
007 | t | ||
008 | 110803s2012 d||| |||| 00||| eng d | ||
020 | |a 9780415895132 |9 978-0-415-89513-2 | ||
020 | |a 9780415895149 |9 978-0-415-89514-9 | ||
020 | |a 9780203149386 |9 978-0-203-14938-6 | ||
035 | |a (OCoLC)731925041 | ||
035 | |a (DE-599)BVBBV039339490 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-188 |a DE-634 |a DE-19 |a DE-355 | ||
082 | 0 | |a 006.3/12 |2 23 | |
084 | |a AP 15965 |0 (DE-625)6964: |2 rvk | ||
084 | |a DF 2520 |0 (DE-625)19543:761 |2 rvk | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
100 | 1 | |a Leetaru, Kalev |e Verfasser |4 aut | |
245 | 1 | 0 | |a Data mining methods for the content analyst |b an introduction to the computational analysis of content |c Kalev Hannes Leetaru |
250 | |a 1. publ. | ||
264 | 1 | |a New York [u.a.] |b Routledge |c 2012 | |
300 | |a X, 102 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Routledge communication series | |
650 | 4 | |a Data mining | |
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024192272&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-024192272 |
Record in the search index
_version_ | 1804148026661404672 |
---|---|
adam_text | Title: Data mining methods for the content analyst
Author: Leetaru, Kalev
Year: 2012
CONTENTS
List of Tables and Figures xi
Acknowledgments xiii
1 Introduction 1
What Is Content Analysis? 1
Why Use Computerized Analysis Techniques? 2
Standalone Tools or Integrated Suites 3
Transitioning from Theory to Practice 5
Chapter in Summary 6
2 Obtaining and Preparing Data 7
Collecting Data from Digital Text Repositories 7
Are the Data Meaningful? 8
Using Data in Unintended Ways 9
Analytical Resolution 10
Types of Data Sources 11
Finding Sources 12
Searching Text Collections 13
Sources of Incompleteness 14
Licensing Restrictions and Content Blackouts 16
Measuring Viewership 17
Accuracy and Convenience Samples 17
Random Samples 18
Multimedia Content 19
Converting to Textual Format 19
Prosody 19
Example Data Sources 20
Patterns in Historical War Coverage 20
Competitive Intelligence 20
Global News Coverage 21
Downloading Content 22
Digital Content 22
Print Content 23
Preparing Content 23
Document Extraction 23
Cleaning 24
Post Filtering 24
Reforming/Reshaping 25
Content Proxy Extraction 25
Chapter in Summary 25
3 Vocabulary Analysis 26
The Basics 26
Word Histograms 26
Readability Indexes 27
Normative Comparison 28
Non-word Analysis 28
Colloquialisms: Abbreviations and Slang 29
Restricting the Analytical Window 29
Vocabulary Comparison and Evolution/Chronemics 30
Advanced Topics 32
Syllables, Rhyming, and Sounds Like 32
Gender and Language 33
Authorship Attribution 33
Word Morphology, Stemming, and Lemmatization 33
Chapter in Summary 34
4 Correlation and Co-occurrence 36
Understanding Correlation 36
Computing Word Correlations 37
Directionality 38
Concordance 39
Co-occurrence and Search 40
Language Variation and Lexicons 40
Non-co-occurrence 41
Correlation with Metadata 41
Chapter in Summary 42
5 Lexicons, Entity Extraction, and Geocoding 43
Lexicons 43
Lexicons and Categorization 44
Lexical Correlation 45
Lexicon Consistency Checks 45
Thesauri and Vocabulary Expanders 47
Named Entity Extraction 48
Lexicons and Processing 48
Applications 49
Geocoding, Gazetteers, and Spatial Analysis 51
Geocoding 51
Gazetteers and the Geocoding Process 52
Operating Under Uncertainty 54
Spatial Analysis 55
Chapter in Summary 56
6 Topic Extraction 57
How Machines Process Text 57
Unstructured Text 58
Extracting Meaning from Text 58
Applications of Topic Extraction 59
Comparing/Clustering Documents 60
Automatic Summarization 60
Automatic Keyword Generation 61
Multilingual Analysis: Topic Extraction with Multiple Languages 62
Chapter in Summary 63
7 Sentiment Analysis 65
Examining Emotions 65
Evolution 65
Evaluation 66
Analytical Resolution: Documents versus Objects 67
Hand-crafted versus Automatically Generated Lexicons 68
Other Sentiment Scales 68
Limitations 69
Measuring Language Rather Than Worldview 69
Chapter in Summary 70
8 Similarity, Categorization and Clustering 71
Categorization 71
The Vector Space Model 72
Feature Selection 72
Feature Reduction 73
Learning Algorithm 74
Evaluating ATC Results 75
Benefits of ATC over Human Categorization 77
Limitations of ATC 78
Applications of ATC 80
Clustering 80
Automated Clustering 81
Hierarchical Clustering 82
Partitional Clustering 82
Document Similarity 83
Vector Space Model 84
Contingency Tables 84
Chapter in Summary 85
9 Network Analysis 86
Understanding Network Analysis 86
Network Content Analysis 87
Representing Network Data 88
Constructing the Network 89
Network Structure 89
The Triad Census 91
Network Evolution 91
Visualization and Clustering 92
Chapter in Summary 96
References 97
Index 100
|
any_adam_object | 1 |
author | Leetaru, Kalev |
author_facet | Leetaru, Kalev |
author_role | aut |
author_sort | Leetaru, Kalev |
author_variant | k l kl |
building | Verbundindex |
bvnumber | BV039339490 |
classification_rvk | AP 15965 DF 2520 ST 530 |
ctrlnum | (OCoLC)731925041 (DE-599)BVBBV039339490 |
dewey-full | 006.3/12 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.3/12 |
dewey-search | 006.3/12 |
dewey-sort | 16.3 212 |
dewey-tens | 000 - Computer science, information, general works |
discipline | General Education Computer science |
edition | 1. publ. |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01553nam a2200409 c 4500</leader><controlfield tag="001">BV039339490</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20140723 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">110803s2012 d||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780415895132</subfield><subfield code="9">978-0-415-89513-2</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780415895149</subfield><subfield code="9">978-0-415-89514-9</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780203149386</subfield><subfield code="9">978-0-203-14938-6</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)731925041</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV039339490</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-188</subfield><subfield code="a">DE-634</subfield><subfield code="a">DE-19</subfield><subfield code="a">DE-355</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.3/12</subfield><subfield code="2">23</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">AP 15965</subfield><subfield code="0">(DE-625)6964:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DF 2520</subfield><subfield code="0">(DE-625)19543:761</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 
530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Leetaru, Kalev</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Data mining methods for the content analyst</subfield><subfield code="b">an introduction to the computational analysis of content</subfield><subfield code="c">Kalev Hannes Leetaru</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1. publ.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">New York [u.a.]</subfield><subfield code="b">Routledge</subfield><subfield code="c">2012</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">X, 102 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Routledge communication series</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data mining</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" 
ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024192272&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-024192272</subfield></datafield></record></collection> |
id | DE-604.BV039339490 |
illustrated | Illustrated |
indexdate | 2024-07-10T00:00:35Z |
institution | BVB |
isbn | 9780415895132 9780415895149 9780203149386 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-024192272 |
oclc_num | 731925041 |
open_access_boolean | |
owner | DE-188 DE-634 DE-19 DE-BY-UBM DE-355 DE-BY-UBR |
owner_facet | DE-188 DE-634 DE-19 DE-BY-UBM DE-355 DE-BY-UBR |
physical | X, 102 S. graph. Darst. |
publishDate | 2012 |
publishDateSearch | 2012 |
publishDateSort | 2012 |
publisher | Routledge |
record_format | marc |
series2 | Routledge communication series |
spelling | Leetaru, Kalev Verfasser aut Data mining methods for the content analyst an introduction to the computational analysis of content Kalev Hannes Leetaru 1. publ. New York [u.a.] Routledge 2012 X, 102 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Routledge communication series Data mining Data Mining (DE-588)4428654-5 gnd rswk-swf Data Mining (DE-588)4428654-5 s DE-604 HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024192272&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Leetaru, Kalev Data mining methods for the content analyst an introduction to the computational analysis of content Data mining Data Mining (DE-588)4428654-5 gnd |
subject_GND | (DE-588)4428654-5 |
title | Data mining methods for the content analyst an introduction to the computational analysis of content |
title_auth | Data mining methods for the content analyst an introduction to the computational analysis of content |
title_exact_search | Data mining methods for the content analyst an introduction to the computational analysis of content |
title_full | Data mining methods for the content analyst an introduction to the computational analysis of content Kalev Hannes Leetaru |
title_fullStr | Data mining methods for the content analyst an introduction to the computational analysis of content Kalev Hannes Leetaru |
title_full_unstemmed | Data mining methods for the content analyst an introduction to the computational analysis of content Kalev Hannes Leetaru |
title_short | Data mining methods for the content analyst |
title_sort | data mining methods for the content analyst an introduction to the computational analysis of content |
title_sub | an introduction to the computational analysis of content |
topic | Data mining Data Mining (DE-588)4428654-5 gnd |
topic_facet | Data mining Data Mining |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024192272&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT leetarukalev dataminingmethodsforthecontentanalystanintroductiontothecomputationalanalysisofcontent |