Statistical data cleaning with applications in R:
Main authors: | Loo, Mark van der, Jonge, Edwin de |
---|---|
Format: | Electronic E-Book |
Language: | English |
Published: | Hoboken, NJ: John Wiley & Sons, 2018 |
Subjects: | Statistics ; Data processing, R (Computer program language) |
Online access: | DE-473 Full text |
Summary: | 3.4 Notes on Locale Settings -- Chapter 4 Data Structure -- 4.1 Introduction -- 4.2 Tabular Data -- 4.2.1 data.frame -- 4.2.2 Databases -- 4.2.3 dplyr -- 4.3 Matrix Data -- 4.4 Time Series -- 4.5 Graph Data -- 4.6 Web Data -- 4.6.1 Web Scraping -- 4.6.2 Web API -- 4.7 Other Data -- 4.8 Tidying Tabular Data -- 4.8.1 Variable Per Column -- 4.8.2 Single Observation Stored in Multiple Tables -- Chapter 5 Cleaning Text Data -- 5.1 Character Normalization -- 5.1.1 Encoding Conversion and Unicode Normalization -- 5.1.2 Character Conversion and Transliteration -- 5.2 Pattern Matching with Regular Expressions -- 5.2.1 Basic Regular Expressions -- 5.2.2 Practical Regular Expressions -- 5.2.3 Generating Regular Expressions in R -- 5.3 Common String Processing Tasks in R -- 5.4 Approximate Text Matching -- 5.4.1 String Metrics -- 5.4.2 String Metrics and Approximate Text Matching in R -- Chapter 6 Data Validation -- 6.1 Introduction -- 6.2 A First Look at the validate Package -- 6.2.1 Quick Checks with check_that -- 6.2.2 The Basic Workflow: validator and confront -- 6.2.3 A Little Background on validate and DSLs -- 6.3 Defining Data Validation -- 6.3.1 Formal Definition of Data Validation -- 6.3.2 Operations on Validation Functions -- 6.3.3 Validation and Missing Values -- 6.3.4 Structure of Validation Functions -- 6.3.5 Demarcating Validation Rules in validate -- 6.4 A Formal Typology of Data Validation Functions -- 6.4.1 A Closer Look at Measurement -- 6.4.2 Classification of Validation Rules -- 6.5 Validating Data with the validate Package -- 6.5.1 Validation Rules in the Console and the validator Object -- 6.5.2 Validating in the Pipeline -- 6.5.3 Raising Errors or Warnings -- 6.5.4 Tolerance for Testing Linear Equalities -- 6.5.5 Setting and Resetting Options -- 6.5.6 Importing and Exporting Validation Rules from and to File -- 6.5.7 Checking Variable Types and Metadata -- 6.5.8 Checking Value Ranges and Code Lists -- 6.5.9 Checking In-Record Consistency Rules -- 6.5.10 Checking Cross-Record Validation Rules -- 6.5.11 Checking Functional Dependencies -- 6.5.12 Cross-Dataset Validation -- 6.5.13 Macros, Variable Groups, Keys -- 6.5.14 Analyzing Output: validation Objects -- 6.5.15 Output Dimensionality and Output Selection -- 6.5.15 Exercises for Section -- Chapter 7 Localizing Errors in Data Records -- 7.1 Error Localization -- 7.2 Error Localization with R -- 7.2.1 The Errorlocate Package -- 7.3 Error Localization as MIP-Problem -- 7.3.1 Error Localization and Mixed-Integer Programming -- 7.3.2 Linear Restrictions -- 7.3.3 Categorical Restrictions -- 7.3.4 Mixed-Type Restrictions -- 7.4 Numerical Stability Issues -- 7.4.1 A Short Overview of MIP Solving -- 7.4.2 Scaling Numerical Records -- 7.4.3 Setting Numerical Threshold Values -- 7.5 Practical Issues -- 7.5.1 Setting Reliability Weights -- 7.5.2 Simplifying Conditional Validation Rules -- 7.6 Conclusion -- Chapter 8 Rule Set Maintenance and Simplification -- 8.1 Quality of Validation Rules -- 8.1.1 Completeness -- 8.1.2 Superfluous Rules and Infeasibility -- 8.2 Rules in the Language of Logic -- 8.2.1 Using Logic to Rewrite Rules -- 8.3 Rule Set Issues -- 8.3.1 Infeasible Rule Set -- 8.3.2 Fixed Value -- 8.3.3 Redundant Rule -- 8.3.4 Nonrelaxing Clause -- 8.3.5 Nonconstraining Clause -- 8.4 Detection and Simplification Procedure -- 8.4.1 Mixed-Integer Programming -- 8.4.2 Detecting Feasibility -- 8.4.3 Finding Rules Causing Infeasibility -- 8.4.4 Detecting Conflicting Rules -- 8.4.5 Detect Partial Infeasibility -- 8.4.6 Detect Fixed Values -- 8.4.7 Detect Nonrelaxing Clauses -- 8.4.8 Detect Nonconstraining Clauses -- 8.4.9 Detect Redundant Rules -- 8.5 Conclusion. 10.11.1 Formal Description -- 10.11.2 Application to Imputed Data -- 10.11.3 Adjusting Imputed Values with the rspa Package -- Chapter 11 Example: A Small Data-Cleaning System -- 11.1 Setup -- 11.1.1 Deterministic Methods -- 11.1.2 Error Localization -- 11.1.3 Imputation -- 11.1.4 Adjusting Imputed Data -- 11.2 Monitoring Changes in Data -- 11.2.1 Data Diff (Daff) -- 11.2.2 Summarizing Cell Changes -- 11.2.3 Summarizing Changes in Conformance to Validation Rules -- 11.2.4 Track Changes in Data Automatically with lumberjack -- 11.3 Integration and Automation -- 11.3.1 Using RScript -- 11.3.2 The docopt Package -- 11.3.3 Automated Data Cleaning -- References -- Index -- EULA. |
Description: | 1 online resource (xiii, 300 pages), illustrations |
ISBN: | 9781118897140 9781118897126 |
Internal format
MARC
LEADER | 00000nmm a2200000 c 4500 | ||
---|---|---|---|
001 | BV049479308 | ||
003 | DE-604 | ||
005 | 20240715 | ||
007 | cr|uuu---uuuuu | ||
008 | 231222s2018 |||| o||u| ||||||eng d | ||
020 | |a 9781118897140 |c PDF |9 978-1-118-89714-0 | ||
020 | |a 9781118897126 |c eBook |9 978-1-118-89712-6 | ||
024 | 7 | |a 10.1002/9781118897126 |2 doi | |
035 | |a (ZDB-35-WIC)on1019853884 | ||
035 | |a (OCoLC)1446253662 | ||
035 | |a (DE-599)BSZ507539184 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-473 | ||
084 | |a SK 850 |0 (DE-625)143263: |2 rvk | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
084 | |a 31.73 |2 bkl | ||
100 | 1 | |a Loo, Mark van der |d 1976- |e Verfasser |0 (DE-588)1155844939 |4 aut | |
245 | 1 | 0 | |a Statistical data cleaning with applications in R |c Mark van der Loo, Statistics Netherlands, the Netherlands, Edwin de Jonge, Statistics Netherlands, the Netherlands |
264 | 1 | |a Hoboken, NJ |b John Wiley & Sons |c 2018 | |
300 | |a 1 Online-Ressource (xiii, 300 Seiten) |b Illustrationen | ||
336 | |b txt |2 rdacontent | ||
337 | |b c |2 rdamedia | ||
338 | |b cr |2 rdacarrier | ||
520 | 3 | |a 10.11.1 Formal Description -- 10.11.2 Application to Imputed Data -- 10.11.3 Adjusting Imputed Values with the rspa Package -- Chapter 11 Example: A Small Data-Cleaning System -- 11.1 Setup -- 11.1.1 Deterministic Methods -- 11.1.2 Error Localization -- 11.1.3 Imputation -- 11.1.4 Adjusting Imputed Data -- 11.2 Monitoring Changes in Data -- 11.2.1 Data Diff (Daff) -- 11.2.2 Summarizing Cell Changes -- 11.2.3 Summarizing Changes in Conformance to Validation Rules -- 11.2.4 Track Changes in Data Automatically with lumberjack -- 11.3 Integration and Automation -- 11.3.1 Using RScript -- 11.3.2 The docopt Package -- 11.3.3 Automated Data Cleaning -- References -- Index -- EULA. | |
520 | 3 | |a 3.4 Notes on Locale Settings -- Chapter 4 Data Structure -- 4.1 Introduction -- 4.2 Tabular Data -- 4.2.1 data.frame -- 4.2.2 Databases -- 4.2.3 dplyr -- 4.3 Matrix Data -- 4.4 Time Series -- 4.5 Graph Data -- 4.6 Web Data -- 4.6.1 Web Scraping -- 4.6.2 Web API -- 4.7 Other Data -- 4.8 Tidying Tabular Data -- 4.8.1 Variable Per Column -- 4.8.2 Single Observation Stored in Multiple Tables -- Chapter 5 Cleaning Text Data -- 5.1 Character Normalization -- 5.1.1 Encoding Conversion and Unicode Normalization -- 5.1.2 Character Conversion and Transliteration -- 5.2 Pattern Matching with Regular Expressions -- 5.2.1 Basic Regular Expressions -- 5.2.2 Practical Regular Expressions -- 5.2.3 Generating Regular Expressions in R -- 5.3 Common String Processing Tasks in R -- 5.4 Approximate Text Matching -- 5.4.1 String Metrics -- 5.4.2 String Metrics and Approximate Text Matching in R -- Chapter 6 Data Validation -- 6.1 Introduction -- 6.2 A First Look at the validate Package -- 6.2.1 Quick Checks with check_that -- 6.2.2 The Basic Workflow: validator and confront -- 6.2.3 A Little Background on validate and DSLs -- 6.3 Defining Data Validation -- 6.3.1 Formal Definition of Data Validation -- 6.3.2 Operations on Validation Functions -- 6.3.3 Validation and Missing Values -- 6.3.4 Structure of Validation Functions -- 6.3.5 Demarcating Validation Rules in validate -- 6.4 A Formal Typology of Data Validation Functions -- 6.4.1 A Closer Look at Measurement -- 6.4.2 Classification of Validation Rules -- 6.5 Validating Data with the validate Package -- 6.5.1 Validation Rules in the Console and the validator Object -- 6.5.2 Validating in the Pipeline -- 6.5.3 Raising Errors or Warnings -- 6.5.4 Tolerance for Testing Linear Equalities -- 6.5.5 Setting and Resetting Options -- 6.5.6 Importing and Exporting Validation Rules from and to File. | |
520 | 3 | |a 6.5.7 Checking Variable Types and Metadata -- 6.5.8 Checking Value Ranges and Code Lists -- 6.5.9 Checking In-Record Consistency Rules -- 6.5.10 Checking Cross-Record Validation Rules -- 6.5.11 Checking Functional Dependencies -- 6.5.12 Cross-Dataset Validation -- 6.5.13 Macros, Variable Groups, Keys -- 6.5.14 Analyzing Output: validation Objects -- 6.5.15 Output Dimensionality and Output Selection -- 6.5.15 Exercises for Section -- Chapter 7 Localizing Errors in Data Records -- 7.1 Error Localization -- 7.2 Error Localization with R -- 7.2.1 The Errorlocate Package -- 7.3 Error Localization as MIP-Problem -- 7.3.1 Error Localization and Mixed-Integer Programming -- 7.3.2 Linear Restrictions -- 7.3.3 Categorical Restrictions -- 7.3.4 Mixed-Type Restrictions -- 7.4 Numerical Stability Issues -- 7.4.1 A Short Overview of MIP Solving -- 7.4.2 Scaling Numerical Records -- 7.4.3 Setting Numerical Threshold Values -- 7.5 Practical Issues -- 7.5.1 Setting Reliability Weights -- 7.5.2 Simplifying Conditional Validation Rules -- 7.6 Conclusion -- Chapter 8 Rule Set Maintenance and Simplification -- 8.1 Quality of Validation Rules -- 8.1.1 Completeness -- 8.1.2 Superfluous Rules and Infeasibility -- 8.2 Rules in the Language of Logic -- 8.2.1 Using Logic to Rewrite Rules -- 8.3 Rule Set Issues -- 8.3.1 Infeasible Rule Set -- 8.3.2 Fixed Value -- 8.3.3 Redundant Rule -- 8.3.4 Nonrelaxing Clause -- 8.3.5 Nonconstraining Clause -- 8.4 Detection and Simplification Procedure -- 8.4.1 Mixed-Integer Programming -- 8.4.2 Detecting Feasibility -- 8.4.3 Finding Rules Causing Infeasibility -- 8.4.4 Detecting Conflicting Rules -- 8.4.5 Detect Partial Infeasibility -- 8.4.6 Detect Fixed Values -- 8.4.7 Detect Nonrelaxing Clauses -- 8.4.8 Detect Nonconstraining Clauses -- 8.4.9 Detect Redundant Rules -- 8.5 Conclusion. | |
650 | 0 | 7 | |a R |g Programm |0 (DE-588)4705956-4 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Datenverarbeitung |0 (DE-588)4011152-0 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Statistik |0 (DE-588)4056995-0 |2 gnd |9 rswk-swf |
653 | 0 | |a Statistics / Data processing | |
653 | 0 | |a R (Computer program language) | |
653 | 0 | |a MATHEMATICS ; Applied | |
653 | 0 | |a MATHEMATICS ; Probability & Statistics ; General | |
653 | 0 | |a R (Computer program language) | |
653 | 0 | |a Statistics ; Data processing | |
689 | 0 | 0 | |a Statistik |0 (DE-588)4056995-0 |D s |
689 | 0 | 1 | |a Datenverarbeitung |0 (DE-588)4011152-0 |D s |
689 | 0 | 2 | |a R |g Programm |0 (DE-588)4705956-4 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Jonge, Edwin de |d 1972- |e Verfasser |0 (DE-588)1059409615 |4 aut | |
776 | 0 | 8 | |i Erscheint auch als |n Druck-Ausgabe |z 9781118897157 |
856 | 4 | 0 | |u https://onlinelibrary.wiley.com/doi/book/10.1002/9781118897126 |x Verlag |z URL des Erstveröffentlichers |3 Volltext |
912 | |a ZDB-35-WIC | ||
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-034824776 | |
966 | e | |u https://onlinelibrary.wiley.com/doi/book/10.1002/9781118897126 |l DE-473 |p ZDB-35-WIC |q UBG_PDA_WIC_Kauf23 |x Verlag |3 Volltext |
Record in the search index
_version_ | 1806142296537694208 |
---|---|
adam_text | |
adam_txt | |
any_adam_object | |
any_adam_object_boolean | |
author | Loo, Mark van der 1976- Jonge, Edwin de 1972- |
author_GND | (DE-588)1155844939 (DE-588)1059409615 |
author_facet | Loo, Mark van der 1976- Jonge, Edwin de 1972- |
author_role | aut aut |
author_sort | Loo, Mark van der 1976- |
author_variant | m v d l mvd mvdl e d j ed edj |
building | Verbundindex |
bvnumber | BV049479308 |
classification_rvk | SK 850 ST 530 |
collection | ZDB-35-WIC |
ctrlnum | (ZDB-35-WIC)on1019853884 (OCoLC)1446253662 (DE-599)BSZ507539184 |
discipline | Informatik Mathematik |
discipline_str_mv | Informatik Mathematik |
format | Electronic eBook |
id | DE-604.BV049479308 |
illustrated | Not Illustrated |
index_date | 2024-07-03T23:18:01Z |
indexdate | 2024-08-01T00:18:39Z |
institution | BVB |
isbn | 9781118897140 9781118897126 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-034824776 |
oclc_num | 1019853884 1446253662 |
open_access_boolean | |
owner | DE-473 DE-BY-UBG |
owner_facet | DE-473 DE-BY-UBG |
physical | 1 Online-Ressource (xiii, 300 Seiten) Illustrationen |
psigel | ZDB-35-WIC ZDB-35-WIC UBG_PDA_WIC_Kauf23 |
publishDate | 2018 |
publishDateSearch | 2018 |
publishDateSort | 2018 |
publisher | John Wiley & Sons |
record_format | marc |
spellingShingle | Loo, Mark van der 1976- Jonge, Edwin de 1972- Statistical data cleaning with applications in R R Programm (DE-588)4705956-4 gnd Datenverarbeitung (DE-588)4011152-0 gnd Statistik (DE-588)4056995-0 gnd |
subject_GND | (DE-588)4705956-4 (DE-588)4011152-0 (DE-588)4056995-0 |
title | Statistical data cleaning with applications in R |
title_auth | Statistical data cleaning with applications in R |
title_exact_search | Statistical data cleaning with applications in R |
title_exact_search_txtP | Statistical data cleaning with applications in R |
title_full | Statistical data cleaning with applications in R Mark van der Loo, Statistics Netherlands, the Netherlands, Edwin de Jonge, Statistics Netherlands, the Netherlands |
title_fullStr | Statistical data cleaning with applications in R Mark van der Loo, Statistics Netherlands, the Netherlands, Edwin de Jonge, Statistics Netherlands, the Netherlands |
title_full_unstemmed | Statistical data cleaning with applications in R Mark van der Loo, Statistics Netherlands, the Netherlands, Edwin de Jonge, Statistics Netherlands, the Netherlands |
title_short | Statistical data cleaning with applications in R |
title_sort | statistical data cleaning with applications in r |
topic | R Programm (DE-588)4705956-4 gnd Datenverarbeitung (DE-588)4011152-0 gnd Statistik (DE-588)4056995-0 gnd |
topic_facet | R Programm Datenverarbeitung Statistik |
url | https://onlinelibrary.wiley.com/doi/book/10.1002/9781118897126 |
work_keys_str_mv | AT loomarkvander statisticaldatacleaningwithapplicationsinr AT jongeedwinde statisticaldatacleaningwithapplicationsinr |