Book Introduction
Hadoop: The Definitive Guide, English, 4th Edition (PDF/EPUB/MOBI/Kindle ebook download)

- Author: (US) Tom White
- Publisher: Southeast University Press, Nanjing
- ISBN: 9787564159177
- Publication year: 2015
- Listed page count: 730
- File size: 77 MB
- File page count: 756
- Subjects: data processing software, guides, English
PDF Download
Download Notes
Hadoop: The Definitive Guide, English, 4th Edition, ebook download in PDF format.
The download is a RAR archive; use an archiving tool to extract it and obtain the PDF. All resources on this site are packaged as BitTorrent seeds, so a dedicated BitTorrent client such as qBittorrent, BitComet, or uTorrent is required. Free Download Manager (FDM) is also recommended for downloading: it is free, ad-free, and cross-platform. Xunlei (Thunder) is currently not recommended, since this is not a popular resource on that network; once the resource becomes popular, Xunlei will work as well.
(The file page count should be greater than the listed page count, except for multi-volume ebooks.)
Note: every archive on this site has an extraction password. Click here to download an archive extraction tool.
Table of Contents
Part Ⅰ. Hadoop Fundamentals 3
1. Meet Hadoop 3
Data! 3
Data Storage and Analysis 5
Querying All Your Data 6
Beyond Batch 7
Comparison with Other Systems 8
Relational Database Management Systems 8
Grid Computing 10
Volunteer Computing 11
A Brief History of Apache Hadoop 12
What's in This Book? 15
2. MapReduce 19
A Weather Dataset 19
Data Format 19
Analyzing the Data with Unix Tools 21
Analyzing the Data with Hadoop 22
Map and Reduce 22
Java MapReduce 24
Scaling Out 30
Data Flow 30
Combiner Functions 34
Running a Distributed MapReduce Job 37
Hadoop Streaming 37
Ruby 37
Python 40
3. The Hadoop Distributed Filesystem 43
The Design of HDFS 43
HDFS Concepts 45
Blocks 45
Namenodes and Datanodes 46
Block Caching 47
HDFS Federation 48
HDFS High Availability 48
The Command-Line Interface 50
Basic Filesystem Operations 51
Hadoop Filesystems 53
Interfaces 54
The Java Interface 56
Reading Data from a Hadoop URL 57
Reading Data Using the FileSystem API 58
Writing Data 61
Directories 63
Querying the Filesystem 63
Deleting Data 68
Data Flow 69
Anatomy of a File Read 69
Anatomy of a File Write 72
Coherency Model 74
Parallel Copying with distcp 76
Keeping an HDFS Cluster Balanced 77
4. YARN 79
Anatomy of a YARN Application Run 80
Resource Requests 81
Application Lifespan 82
Building YARN Applications 82
YARN Compared to MapReduce 1 83
Scheduling in YARN 85
Scheduler Options 86
Capacity Scheduler Configuration 88
Fair Scheduler Configuration 90
Delay Scheduling 94
Dominant Resource Fairness 95
Further Reading 96
5. Hadoop I/O 97
Data Integrity 97
Data Integrity in HDFS 98
LocalFileSystem 99
ChecksumFileSystem 99
Compression 100
Codecs 101
Compression and Input Splits 105
Using Compression in MapReduce 107
Serialization 109
The Writable Interface 110
Writable Classes 113
Implementing a Custom Writable 121
Serialization Frameworks 126
File-Based Data Structures 127
SequenceFile 127
MapFile 135
Other File Formats and Column-Oriented Formats 136
Part Ⅱ. MapReduce 141
6. Developing a MapReduce Application 141
The Configuration API 141
Combining Resources 143
Variable Expansion 143
Setting Up the Development Environment 144
Managing Configuration 146
GenericOptionsParser, Tool, and ToolRunner 148
Writing a Unit Test with MRUnit 152
Mapper 153
Reducer 156
Running Locally on Test Data 156
Running a Job in a Local Job Runner 157
Testing the Driver 158
Running on a Cluster 160
Packaging a Job 160
Launching a Job 162
The MapReduce Web UI 165
Retrieving the Results 167
Debugging a Job 168
Hadoop Logs 172
Remote Debugging 174
Tuning a Job 175
Profiling Tasks 175
MapReduce Workflows 177
Decomposing a Problem into MapReduce Jobs 177
JobControl 178
Apache Oozie 179
7. How MapReduce Works 185
Anatomy of a MapReduce Job Run 185
Job Submission 186
Job Initialization 187
Task Assignment 188
Task Execution 189
Progress and Status Updates 190
Job Completion 192
Failures 193
Task Failure 193
Application Master Failure 194
Node Manager Failure 195
Resource Manager Failure 196
Shuffle and Sort 197
The Map Side 197
The Reduce Side 198
Configuration Tuning 201
Task Execution 203
The Task Execution Environment 203
Speculative Execution 204
Output Committers 206
8. MapReduce Types and Formats 209
MapReduce Types 209
The Default MapReduce Job 214
Input Formats 220
Input Splits and Records 220
Text Input 232
Binary Input 236
Multiple Inputs 237
Database Input (and Output) 238
Output Formats 238
Text Output 239
Binary Output 239
Multiple Outputs 240
Lazy Output 245
Database Output 245
9. MapReduce Features 247
Counters 247
Built-in Counters 247
User-Defined Java Counters 251
User-Defined Streaming Counters 255
Sorting 255
Preparation 256
Partial Sort 257
Total Sort 259
Secondary Sort 262
Joins 268
Map-Side Joins 269
Reduce-Side Joins 270
Side Data Distribution 273
Using the Job Configuration 273
Distributed Cache 274
MapReduce Library Classes 279
Part Ⅲ. Hadoop Operations 283
10. Setting Up a Hadoop Cluster 283
Cluster Specification 284
Cluster Sizing 285
Network Topology 286
Cluster Setup and Installation 288
Installing Java 288
Creating Unix User Accounts 288
Installing Hadoop 289
Configuring SSH 289
Configuring Hadoop 290
Formatting the HDFS Filesystem 290
Starting and Stopping the Daemons 290
Creating User Directories 292
Hadoop Configuration 292
Configuration Management 293
Environment Settings 294
Important Hadoop Daemon Properties 296
Hadoop Daemon Addresses and Ports 304
Other Hadoop Properties 307
Security 309
Kerberos and Hadoop 309
Delegation Tokens 312
Other Security Enhancements 313
Benchmarking a Hadoop Cluster 314
Hadoop Benchmarks 314
User Jobs 316
11. Administering Hadoop 317
HDFS 317
Persistent Data Structures 317
Safe Mode 322
Audit Logging 324
Tools 325
Monitoring 330
Logging 330
Metrics and JMX 331
Maintenance 332
Routine Administration Procedures 332
Commissioning and Decommissioning Nodes 334
Upgrades 337
Part Ⅳ. Related Projects 345
12. Avro 345
Avro Data Types and Schemas 346
In-Memory Serialization and Deserialization 349
The Specific API 351
Avro Datafiles 352
Interoperability 354
Python API 354
Avro Tools 355
Schema Resolution 355
Sort Order 358
Avro MapReduce 359
Sorting Using Avro MapReduce 363
Avro in Other Languages 365
13. Parquet 367
Data Model 368
Nested Encoding 370
Parquet File Format 370
Parquet Configuration 372
Writing and Reading Parquet Files 373
Avro, Protocol Buffers, and Thrift 375
Parquet MapReduce 377
14. Flume 381
Installing Flume 381
An Example 382
Transactions and Reliability 384
Batching 385
The HDFS Sink 385
Partitioning and Interceptors 387
File Formats 387
Fan Out 388
Delivery Guarantees 389
Replicating and Multiplexing Selectors 390
Distribution: Agent Tiers 390
Delivery Guarantees 393
Sink Groups 395
Integrating Flume with Applications 398
Component Catalog 399
Further Reading 400
15. Sqoop 401
Getting Sqoop 401
Sqoop Connectors 403
A Sample Import 404
Text and Binary File Formats 406
Generated Code 407
Additional Serialization Systems 408
Imports: A Deeper Look 408
Controlling the Import 410
Imports and Consistency 411
Incremental Imports 411
Direct-Mode Imports 411
Working with Imported Data 412
Imported Data and Hive 413
Importing Large Objects 415
Performing an Export 417
Exports: A Deeper Look 419
Exports and Transactionality 420
Exports and SequenceFiles 421
Further Reading 422
16. Pig 423
Installing and Running Pig 424
Execution Types 424
Running Pig Programs 426
Grunt 426
Pig Latin Editors 427
An Example 427
Generating Examples 429
Comparison with Databases 430
Pig Latin 432
Structure 432
Statements 433
Expressions 438
Types 439
Schemas 441
Functions 445
Macros 447
User-Defined Functions 448
A Filter UDF 448
An Eval UDF 452
A Load UDF 453
Data Processing Operators 457
Loading and Storing Data 457
Filtering Data 457
Grouping and Joining Data 459
Sorting Data 465
Combining and Splitting Data 466
Pig in Practice 467
Parallelism 467
Anonymous Relations 467
Parameter Substitution 468
Further Reading 469
17. Hive 471
Installing Hive 472
The Hive Shell 473
An Example 474
Running Hive 475
Configuring Hive 475
Hive Services 478
The Metastore 480
Comparison with Traditional Databases 482
Schema on Read Versus Schema on Write 482
Updates, Transactions, and Indexes 483
SQL-on-Hadoop Alternatives 484
HiveQL 485
Data Types 486
Operators and Functions 488
Tables 489
Managed Tables and External Tables 490
Partitions and Buckets 491
Storage Formats 496
Importing Data 500
Altering Tables 502
Dropping Tables 502
Querying Data 503
Sorting and Aggregating 503
MapReduce Scripts 503
Joins 505
Subqueries 508
Views 509
User-Defined Functions 510
Writing a UDF 511
Writing a UDAF 513
Further Reading 518
18. Crunch 519
An Example 520
The Core Crunch API 523
Primitive Operations 523
Types 528
Sources and Targets 531
Functions 533
Materialization 535
Pipeline Execution 538
Running a Pipeline 538
Stopping a Pipeline 539
Inspecting a Crunch Plan 540
Iterative Algorithms 543
Checkpointing a Pipeline 545
Crunch Libraries 545
Further Reading 548
19. Spark 549
Installing Spark 550
An Example 550
Spark Applications, Jobs, Stages, and Tasks 552
A Scala Standalone Application 552
A Java Example 554
A Python Example 555
Resilient Distributed Datasets 556
Creation 556
Transformations and Actions 557
Persistence 560
Serialization 562
Shared Variables 564
Broadcast Variables 564
Accumulators 564
Anatomy of a Spark Job Run 565
Job Submission 565
DAG Construction 566
Task Scheduling 569
Task Execution 570
Executors and Cluster Managers 570
Spark on YARN 571
Further Reading 574
20. HBase 575
HBasics 575
Backdrop 576
Concepts 576
Whirlwind Tour of the Data Model 576
Implementation 578
Installation 581
Test Drive 582
Clients 584
Java 584
MapReduce 587
REST and Thrift 589
Building an Online Query Application 589
Schema Design 590
Loading Data 591
Online Queries 594
HBase Versus RDBMS 597
Successful Service 598
HBase 599
Praxis 600
HDFS 600
UI 601
Metrics 601
Counters 601
Further Reading 601
21. ZooKeeper 603
Installing and Running ZooKeeper 604
An Example 606
Group Membership in ZooKeeper 606
Creating the Group 607
Joining a Group 609
Listing Members in a Group 610
Deleting a Group 612
The ZooKeeper Service 613
Data Model 614
Operations 616
Implementation 620
Consistency 622
Sessions 624
States 625
Building Applications with ZooKeeper 627
A Configuration Service 627
The Resilient ZooKeeper Application 630
A Lock Service 634
More Distributed Data Structures and Protocols 636
ZooKeeper in Production 637
Resilience and Performance 637
Configuration 639
Further Reading 640
Part Ⅴ. Case Studies 643
22. Composable Data at Cerner 643
From CPUs to Semantic Integration 643
Enter Apache Crunch 644
Building a Complete Picture 644
Integrating Healthcare Data 647
Composability over Frameworks 650
Moving Forward 651
23. Biological Data Science: Saving Lives with Software 653
The Structure of DNA 655
The Genetic Code: Turning DNA Letters into Proteins 656
Thinking of DNA as Source Code 657
The Human Genome Project and Reference Genomes 659
Sequencing and Aligning DNA 660
ADAM, A Scalable Genome Analysis Platform 661
Literate programming with the Avro interface description language (IDL) 662
Column-oriented access with Parquet 663
A simple example: k-mer counting using Spark and ADAM 665
From Personalized Ads to Personalized Medicine 667
Join In 668
24. Cascading 669
Fields, Tuples, and Pipes 670
Operations 673
Taps, Schemes, and Flows 675
Cascading in Practice 676
Flexibility 679
Hadoop and Cascading at ShareThis 680
Summary 684
A. Installing Apache Hadoop 685
B. Cloudera's Distribution Including Apache Hadoop 691
C. Preparing the NCDC Weather Data 693
D. The Old and New Java MapReduce APIs 697
Index 701