Handbook on Data Centers

by: Samee Ullah Khan, Albert Y. Zomaya

Springer-Verlag, 2015

ISBN: 9781493920921, 1309 pages

Format: PDF, OL

Copy protection: Watermark

Windows PC, Mac OS X; suitable for all DRM-capable e-readers, Apple iPad, Android tablet PCs. Online reading for: Windows PC, Mac OS X, Linux

Price: 213.99 EUR


Preface

5

Contents

8

Part I Energy Efficiency

13

Energy-Efficient and High-Performance Processing of Large-Scale Parallel Applications in Data Centers

14

1 Introduction

14

1.1 Motivation

14

1.2 Our Contributions

16

2 Related Work

17

3 Preliminaries

18

3.1 Power and Task Models

19

3.2 Problems

21

3.3 Lower Bounds

21

4 Heuristic Algorithms

22

4.1 Precedence Constraining

22

4.2 System Partitioning

23

4.3 Task Scheduling

25

5 Optimal Energy/Time/Power Allocation

26

5.1 Minimizing Schedule Length

26

5.1.1 Level 1

26

5.1.2 Level 2

27

5.1.3 Level 3

27

5.1.4 Level 4

28

5.2 Minimizing Energy Consumption

32

5.2.1 Level 1

32

5.2.2 Level 2

32

5.2.3 Level 3

33

5.2.4 Level 4

33

6 Simulation Data

36

7 Summary and Future Research

43

References

44

Energy-Aware Algorithms for Task Graph Scheduling, Replica Placement and Checkpoint Strategies

47

1 Introduction

47

2 Energy Models

49

2.1 Literature Survey

50

2.1.1 DVFS and Optimization Problems

51

2.1.2 Energy Models

52

2.2 Example

52

3 Minimizing the Energy of a Schedule

54

3.1 Optimization Problem

54

3.2 The CONTINUOUS Model

55

3.2.1 Special Execution Graphs

56

3.2.2 General DAGs

57

3.3 Discrete Models

57

3.3.1 The VDD-HOPPING Model

58

3.3.2 NP-Completeness and Approximation Results

58

3.4 Final Remarks

59

4 Replica Placement

59

4.1 Framework

60

4.1.1 Replica Servers

61

4.1.2 With Power Consumption

62

4.1.3 Objective Functions

63

4.1.4 Summary of Results

63

4.2 Complexity Results: Update Strategies

64

4.2.1 Running Example

64

4.2.2 Dynamic Programming Algorithm

65

4.3 Complexity Results with Power

67

4.3.1 Running Example

67

4.3.2 NP-Completeness of MINPOWER

68

4.3.3 A Pseudo-polynomial Algorithm for MINPOWER-BOUNDEDCOST

70

4.4 Simulations

71

4.4.1 Impact of Pre-existing Servers

71

4.4.2 With Power Consumption

73

4.4.3 Running Time of the Algorithms

74

4.5 Concluding Remarks

74

5 Checkpointing Strategies

75

5.1 Framework

76

5.1.1 Model

76

5.1.2 Optimization Problems

77

5.2 With a Single Chunk

78

5.2.1 SINGLESPEED Model

78

5.2.2 MULTIPLESPEEDS Model

79

5.3 Several Chunks

80

5.3.1 Single Speed Model

81

5.3.2 Multiple Speeds Model

82

5.4 Simulations

83

5.4.1 Simulation Settings

83

5.4.2 Comparison with Single Speed

85

5.4.3 Comparison Between EXPECTED-DEADLINE and Hard-Deadline

86

5.5 Concluding Remarks

86

6 Conclusion

87

References

88

Energy Efficiency in HPC Data Centers: Latest Advances to Build the Path to Exascale

91

1 Introduction

91

2 Computing Systems Architectures

92

2.1 Architecture of the Current HPC Facilities

92

2.2 Overview of the Main HPC Components

95

2.3 HPC Performance and Energy Efficiency Evaluation

99

3 Energy-Efficiency in HPC Data-Center: Overview & Challenges

102

3.1 The Exascale Challenge

102

3.2 Hardware Approaches Using Low-Power Processors

103

3.3 Energy Efficiency of Virtualization Frameworks over HPC Workloads

105

3.4 Energy Efficiency in Resource and Job Management Systems (RJMSs)

110

4 Conclusion: Open Challenges

114

References

115

Techniques to Achieve Energy Proportionality in Data Centers: A Survey

118

1 Introduction

118

2 Energy Proportionality

120

2.1 Energy Proportionality at the Server Level

121

2.2 Energy Proportionality at Data Center Level

123

2.3 Overview on Power Proportionality Techniques at Different Data Center Levels

124

3 Energy Proportionality at Component Level

127

3.1 Energy Proportionality at the CPU

127

3.2 Energy Proportionality at the Memory

129

3.3 Energy Proportionality at the Disk

131

3.4 Energy Proportionality at the Networking Interface

132

4 Power Management Techniques at Server Level

133

5 Data Center/Cluster Level Power Management

135

5.1 Server Provisioning in Internet Data Centers (IDCs)

136

5.2 Virtual Machine Management

144

5.3 Other Data Center Level Power Management Techniques

148

6 Energy Cost Minimization Through Workload Distribution Across Data Centers

152

7 Data Center Simulation Tools

157

8 Performance of Server and Data Center Level Power Management Techniques

159

9 Conclusions

161

References

162

A Power-Aware Autonomic Approach for Performance Management of Scientific Applications in a Data Center Environment

172

1 Introduction

172

2 Background

175

3 An Online Look-Ahead Control-based Management Approach

182

4 Case Study: Performance Management of a Parallel Loop Execution Environment

187

5 Benefits of the Proposed Approach

193

6 Combining DLS Techniques with the Proposed Approach

194

7 Conclusion

195

References

196

CoolEmAll: Models and Tools for Planning and Operating Energy Efficient Data Centres

199

1 Introduction

199

1.1 The CoolEmAll Project

202

1.2 Related Work

204

2 Simulation, Visualisation and Decision Support Toolkit

205

2.1 Architecture

206

2.2 Application Profiler

208

2.3 Data Center Workload and Resource Management Simulator

209

2.3.1 Architecture

209

2.3.2 Workload Modelling

210

2.3.3 Resource Description

211

2.3.4 Simulation of Energy Efficiency

212

2.3.5 Application Performance Modelling

213

2.4 Interactive Computational Fluid Dynamics Simulation

214

2.5 Visualization

216

3 Data Centre Efficiency Building Blocks

217

3.1 DEBB Concept and Structure

217

3.2 Hardware Models for Workload Simulation

220

3.2.1 Hardware Modelling in DCworms Workload Simulator

220

3.2.2 Hardware Power Profiles

222

3.2.3 Electrical Model of the Power Supply Unit 2.0

222

3.3 Hardware Models for Thermodynamic Profiles and Cooling Equipment

223

3.4 Hardware Models for CFD Simulation

225

3.5 Assessment of DEBBs

227

4 Energy Efficiency Metrics

227

4.1 State of the Art

228

4.2 Selected Metrics for CoolEmAll

229

4.2.1 Resource Usage Metrics

230

4.2.2 Energy Based Metrics

231

4.2.3 Heat-Aware Metrics

232

4.3 Application Power Model

233

5 Validation of the CoolEmAll Approach

234

5.1 Validation Approach

234

5.1.1 Capacity Management

236

5.1.2 Optimisation of Rack Arrangement in a Compute Room Using Open Data Centre Building Blocks

236

5.1.3 Analysis of Free Cooling Efficiency for Various Inlet Temperatures

237

5.2 Testbed

237

5.3 Analysis and Optimization of Data Centre Efficiency

239

5.3.1 Capacity Management

239

5.3.2 Analysing Cooling Efficiency in Compute Room

246

6 Business Impact

248

7 Summary

250

References

251

Smart Data Center

254

1 Introduction

254

2 System Model

255

2.1 Long Term Power Purchase

256

2.2 Real Time Power Purchase

257

3 Constraints

257

3.1 Purchasing Accuracy and Cost

257

3.2 Data Center Availability

258

3.3 UPS Lifetime

258

4 Cost Minimization

259

5 Algorithm Design

259

5.1 Drift Plus Penalty Upper Bound

260

5.2 Relaxed Optimization

262

5.3 Two Timescale Smart Data Center Algorithm

263

6 Performance Analysis

264

7 Related Work

267

8 Conclusions

267

References

268

Power and Thermal Efficient Numerical Processing

270

1 Introduction

270

2 Floating-Point Representation

271

2.1 Formats

272

2.2 Rounding Modes

272

2.3 Operations

273

2.4 Exceptions

273

3 Floating-Point Addition

273

4 Floating-Point Multiplication

275

5 Floating-Point Fused Multiply-Add

277

6 Floating-Point Division

279

6.1 Division by Digit Recurrence

279

6.1.1 Radix-4 Division Algorithm

280

6.1.2 Intel Penryn Division Unit

281

6.1.3 Radix-16 by Overlapping Two Radix-4 Stages

281

6.2 Division by Multiplication

283

7 Energy Dissipation in FP-Units

286

7.1 Energy Metrics

286

7.2 Implementation of the FP-Units

287

7.3 Energy Consumption in Floating-Point Workloads

288

7.4 Thermal Analysis

290

8 Conclusions and Outlook on FP-Units

292

References

292

Providing Green Services in HPC Data Centers: A Methodology Based on Energy Estimation

294

1 Introduction

294

2 Identifying Operations in a Service

297

2.1 Fault Tolerance Case

297

2.2 Data Broadcasting Case

298

2.3 Associated Parameters

299

3 Energy Calibration Methodology

300

3.1 Calibration of the Power Consumption op

301

3.2 Calibration of the Execution Time t_op

302

3.2.1 Fault Tolerance Case

303

3.2.2 Data Broadcasting Case

304

4 Energy Estimation Methodology

305

4.1 Fault Tolerance Case

306

4.1.1 Checkpointing

307

4.1.2 Message Logging

307

4.1.3 Coordination

308

4.2 Data Broadcasting Case

309

4.2.1 MPI/SAG and Hybrid/SAG

309

4.2.2 MPI/Pipeline and Hybrid/Pipeline

310

5 Validation of the Estimations

311

5.1 Calibration Results of the Platform

311

5.1.1 Calibrating the Power Consumption

311

5.1.2 Calibration of the Execution Time

314

5.2 Accuracy of the Estimations

320

5.2.1 Fault Tolerance Case

321

5.2.2 Data Broadcasting Case

323

6 Energy-Aware Choice of Services for HPC Applications

325

6.1 Fault Tolerance Protocols

325

6.2 Data Broadcasting Algorithms

326

7 Conclusion

327

References

329

Part II Networking

331

Network Virtualization in Data Centers: A Data Plane Perspective

332

1 Introduction

332

1.1 Network Link Virtualization

333

1.2 Network Node Virtualization

333

1.3 Organization

334

2 Flexible Flow Matching for Network Link Virtualization

334

2.1 Background

334

2.2 Existing Solutions

336

2.3 Algorithmic Solution for Efficient Flexible Flow Matching

337

2.3.1 Motivations

337

2.3.2 Algorithms

339

2.3.3 Architecture

340

2.4 Performance Evaluation

342

2.4.1 Experimental Setup

342

2.4.2 Algorithm Evaluation

342

2.4.3 Hardware Implementation

344

3 Resource Consolidation in Network Node Virtualization

344

3.1 Background

345

3.2 Existing Solutions

346

3.3 Efficient Algorithm for Resource Consolidation

346

3.3.1 Motivations

346

3.3.2 Trie Merging

348

3.3.3 Lookup Process

349

3.3.4 Traffic Isolation

349

3.4 Analysis and Evaluation

350

3.4.1 Theoretical Comparison

350

3.4.2 Experimental Setup

350

3.4.3 Scalability

351

3.4.4 Execution Time

352

4 Summary and Discussion

352

References

353

Optical Data Center Networks: Architecture, Performance, and Energy Efficiency

355

1 Introduction

355

2 Optical Switches Used in Optical Data Center Networks

357

2.1 Optical Packet Switches

357

2.2 Optical Circuit Switches

358

3 Approach 1: Optical Data Center Networks to Provide Large Bandwidth for All-to-All Communication

360

3.1 Optical Packet Switches with Large Bandwidth

361

3.2 Data Center Network Structure Using Optical Packet Switches

362

3.2.1 Connection Within Group

364

3.2.2 Connection Between Groups

364

3.2.3 Routing in Topology

364

3.3 Parameter Settings

366

3.3.1 Parameters for Connection Between Groups

367

3.3.2 Parameters for Connection within Group

367

3.4 Evaluation

368

3.4.1 Topologies

368

3.4.2 Properties of Topologies

370

3.4.3 Maximum Link Load

373

4 Approach 2: Networks to Achieve Low Energy Consumption

374

4.1 Overview

376

4.2 Virtual Network Topologies Suitable for Optical Data Center Networks

377

4.2.1 Requirements

377

4.2.2 Existing Network Structures for Data Centers

378

4.2.3 Generalized Flattened Butterfly

380

4.3 Control of Virtual Network Topology to Achieve Low Energy Consumption

388

4.3.1 Outline

388

4.3.2 Control of Topology to Satisfy Requirements

389

4.4 Evaluation

391

5 Conclusion

393

References

394

Scalable Network Communication Using Unreliable RDMA

396

1 Introduction

396

1.1 The Significance of Data Communication

397

1.2 Datacenter Computing and RDMA

399

1.3 High-Performance Computing and RDMA

399

1.4 RDMA and the Current Unreliable Datagram Network Transports

400

2 Overview of RDMA Technology

401

2.1 Overview of the iWARP Standard

402

2.2 Overview of the InfiniBand Standard

404

3 The Case for RDMA over Unreliable Transports

405

3.1 Importance of Unreliable Connectionless RDMA

405

3.2 Benefits of RDMA over Unreliable Datagrams for iWARP

406

4 RDMA over Unreliable Datagrams

408

4.1 Related Work and Development History

409

4.2 iWARP Extension Methodology

410

4.3 iWARP Design Changes

410

4.4 RDMA Write-Record

413

4.5 Packet Loss Design Considerations

416

5 Datagram-iWARP Software Implementation

416

5.1 iWARP Socket Interface

418

6 Experimental Results and Analysis

418

6.1 Verbs-Layer Microbenchmarks

419

6.2 Send/Recv Broadcast

419

6.3 Packet Loss and Performance

420

6.4 Datacenter Application Results

422

7 Summary

425

References

426

Packet Classification on Multi-core Platforms

428

1 Introduction

428

2 Background

429

2.1 Multi-field Packet Classification

429

2.2 Related Work

430

2.3 Multi-core Processor

431

3 Decision-Tree Based Approaches

432

3.1 Algorithms

432

3.2 Challenges and Prior Work

434

4 Decomposition-Based Approaches

435

4.1 Overview

435

4.2 Challenges and Prior Work

436

4.3 Preprocessing

437

4.4 Searching

440

4.5 Merging

441

5 Performance Evaluation and Summary of Results

441

5.1 Experimental Setup

441

5.2 Latency

443

5.3 Throughput

444

5.4 Cache Performance

445

5.5 Impact of the Number of Threads

447

5.6 Comparison with Existing Approaches

447

6 Conclusion

449

References

449

Optical Interconnects for Data Center Networks

451

1 Introduction

451

2 Need for Optical Interconnects in Data Center Networks

452

3 Optical Components in Data Centers

455

3.1 Semiconductor Optical Amplifier (SOA)

456

3.2 Silicon Micro Ring Resonator

456

3.3 Arrayed Waveguide Grating

456

3.4 Wavelength Selective Switch

458

3.5 MEMS Switch (Optical Switching Matrix, Optical Crossbar)

459

3.6 Circulators

461

3.7 Optical Multiplexer and De-multiplexer

461

4 Optical Interconnects in Data Center Networks and their Performance

461

4.1 Reconfigurable Architectures

461

4.1.1 An Enhanced Optically Connected Network Architecture

462

4.1.2 OSA, a Novel Optical Switching Architecture for DCNs

462

4.1.3 Wavelength-Reconfigurable Optical Packet and Circuit Switched Platform for DCNs

463

4.1.4 Next-Generation Optically-Interconnected High-Performance Data Centers

464

4.1.5 The Data Vortex Optical Packet Switched Interconnection Network

465

4.1.6 Proteus: A Topology Malleable Data Center Network

465

4.1.7 A Hybrid Optical Packet and Wavelength Selective Switch for High-Performance DCNs

466

4.2 Power Saving Architectures

467

4.2.1 VCSEL Based Energy Efficient and Bandwidth Reconfigurable Architecture

467

4.2.2 A Wavelength Striped, Packet Switched, Optical Interconnection Network

468

4.2.3 SPRINT: Scalable Photonic Switching Fabric for High Performance Computing

468

4.3 Low Latency Architectures

470

4.3.1 DOS: A Scalable Optical Switch for Data Centers

470

4.3.2 Scalable Optical Packet Switch Architecture for Low Latency and High Load

471

4.3.3 AWGR Based Data Center Switches Using RSOA-based Optical Mutual Exclusion

472

4.3.4 A Petabit Photonic Packet Switch (P3S)

472

4.3.5 Optical Interconnection Networks: The OSMOSIS Project

473

4.3.6 A Scalable Optical Multi-Plane Interconnection Architecture

474

4.3.7 Low Latency and Large Port Count OPS for Data Center Network Interconnects

474

4.4 Link Bandwidth Scaling Architectures

476

4.4.1 Data Center Network Based on Flexible Bandwidth MIMO OFDM Optical Interconnects

476

4.4.2 Photonic Terabit Routers Employing WDM

477

4.5 High Radix Switch Design

478

5 Data Center Traffic Characteristics

478

6 Energy Requirements for Data Center Networks

480

7 Routing in Data Centers

482

References

483

TCP Congestion Control in Data Center Networks

486

1 Introduction

486

2 TCP Impairments in Data Center Networks

487

2.1 TCP Incast

488

2.2 TCP Outcast

489

2.3 Queue Buildup

490

2.4 Buffer Pressure

491

2.5 Pseudo-Congestion Effect

491

2.6 Summary: TCP Impairments and Causes

492

3 TCP Variants for Data Center Networks

493

TCP with FG-RTO + Delayed ACKs Disabled [3]

493

3.3.1 Explicit Congestion Notification (ECN)

494

4 Summary: TCP Variants for DCNs

503

5 Open Issues

505

6 Concluding Remarks

505

References

505

Routing Techniques in Data Center Networks

507

1 Introduction

507

2 Classification of Routing Schemes in Data Centers

510

2.1 Topology-Aware Routing

511

2.1.1 Server-Centric Approach

511

2.1.2 Switch-Centric Approach

512

2.2 Energy-Aware Routing

516

2.2.1 Green Routing

516

2.2.2 Power-Aware Routing

518

2.3 Traffic-Sensitive Routing

519

2.3.1 DARD

520

2.3.2 Hedera

522

2.3.3 ESM: Multicast Routing for Data Centers

523

2.3.4 GARDEN

524

2.4 Routing for Content Distribution Networks (CDN)

525

2.4.1 Request-Routing in CDNs

526

2.4.2 Symbiotic Routing

527

2.4.3 fs-PGBR: A Scalable and Delay Sensitive Cloud Routing Protocol

528

2.5 Summary of All Routing and Forwarding Techniques

528

3 Open Issues and Challenges

529

4 Conclusions

530

References

531

Part III Cloud Computing

533

Auditing for Data Integrity and Reliability in Cloud Storage

534

1 Introduction

534

2 Information Auditing: Objective and Approaches

536

2.1 Definition of Information Auditing

536

2.2 Three Approaches of Information Auditing

537

3 Auditing for Data Integrity in Distributed Systems

538

3.1 Strategies of Auditing Data Integrity

538

3.2 Proof of Retrievability

539

3.3 Provable Data Possession

542

3.3.1 Preliminaries

543

3.3.2 Defining the PDP Protocol

544

3.3.3 The Secure PDP Scheme (S-PDP)

545

3.3.4 The Efficient PDP Scheme (E-PDP)

547

3.4 Compact Proof of Retrievability

547

3.4.1 System Model

547

3.4.2 Private Verification Construction

548

3.4.3 Public Verification Construction

549

4 Auditing in Cloud Storage Platform

550

4.1 Challenges

551

4.2 Public Verifiability

552

4.3 Dynamic Data Operations Support

552

4.4 Privacy Preserving

554

4.5 Multiple Verifications

555

5 Open Questions

556

6 Conclusions

557

References

557

I/O and File Systems for Data-Intensive Applications

559

1 Parallel File Systems vs. Data-Intensive File Systems: A Comparison

559

2 Chunk-Aware I/O: Enabling HPC on Data-Intensive File Systems

562

2.1 Motivation

562

2.2 Chunk-Aware I/O Design

564

2.3 Chunk-Aware I/O Implementation

569

2.4 Chunk-Aware I/O Analysis

569

2.5 CHAIO Performance

570

2.5.1 Experiment Setup

570

2.5.2 Performance with Different Request Sizes

570

2.5.3 Performance with Two Replicas

571

2.5.4 Performance with Different Number of Nodes

572

2.5.5 Overhead Analysis in Large-Scale Computing Environments

573

2.5.6 Load Balance

575

3 Related Works

575

3.1 HPC on Data-Intensive File Systems

576

3.2 N-1 Data Access and its Handling

577

4 Summary

578

References

579

Cloud Resource Pricing Under Tenant Rationality

581

1 Introduction

581

2 The Game Model

582

2.1 User Model and Virtual Instances Pricing

582

2.2 Modeling Cloud Revenue and Tenant Surplus

583

2.2.1 Stage I: Cloud Revenue Maximization

583

2.2.2 Stage II: Tenant Surplus Maximization

584

2.3 Stackelberg Equilibrium

584

3 Usage-Based Cloud Resource Pricing

585

3.1 Non-Uniform Pricing

585

3.1.1 Stage II: Tenant Surplus Maximization

585

3.1.2 Stage I: Cloud Pricing Choices

586

3.2 Uniform Pricing

590

3.2.1 Stage II: Tenant Surplus Maximization

590

3.2.2 Stage I: Cloud Pricing Choices

591

4 The Effectiveness of Stackelberg Strategies

592

4.1 Centralized Aggregate Network Utility Maximization

592

4.2 Total Network Utility Under Selfish Interactions

595

4.3 Asymptotic Analysis of Price of Anarchy

597

5 Broker Resource Pricing

598

6 Performance Evaluation

600

6.1 Setup

600

6.2 Economic Implications of Cloud Resource Pricing

600

6.3 Social Welfare Tradeoffs, and Hidden Effects

601

7 Related Work

602

8 Concluding Remarks

603

References

603

Online Resource Management for Carbon-Neutral Cloud Computing

604

1 Introduction

604

1.1 Background

605

1.2 Carbon Neutrality: Benefits and Challenges

606

1.3 Current Research and Limitations

606

1.4 Contributions

607

2 Model

608

2.1 Some Assumptions

609

2.2 Energy Sources

609

2.3 Data Center

610

2.4 Workload

611

3 Problem Formulation

612

3.1 Objective and Constraints

612

3.2 Offline Problem Formulation

614

4 Algorithm for Cost Optimization and Carbon Neutrality

614

4.1 Carbon Deficit Queue

614

4.2 Optimizing for Cost Minimization and Carbon Neutrality

615

4.2.1 Working Principle of COCA

615

4.2.2 Distributed Implementation

616

4.3 Performance Analysis

617

5 Simulation

619

5.1 Data Sets

619

5.2 Results

621

5.2.1 Efficiency of COCA

621

5.2.2 Comparison with Prediction-Based Method

623

6 Extension to Geographic Load Balancing

624

7 Conclusions

625

References

625

A Big Picture of Integrity Verification of Big Data in Cloud Computing

628

1 Introduction

628

2 Motivating Examples

630

3 Problem Analysis: Framework and Lifecycle

631

4 Representative Approaches and Analysis

633

4.1 Preliminaries

633

4.1.1 RSA Signature

633

4.1.2 Bilinear Pairing and BLS Signature

634

4.1.3 Merkle Hash Tree

634

4.2 Representative Schemes

635

4.2.1 PDP

635

4.2.2 Compact POR

636

4.2.3 DPDP

637

4.2.4 Public Auditing of Dynamic Data

637

4.2.5 Authorized Auditing with Fine-Grained Data Updates

638

5 Other Related Work

638

6 Conclusions and Future Work

639

References

640

An Out-of-Core Task-based Middleware for Data-Intensive Scientific Computing

643

1 Introduction

643

2 Related Work

646

3 An Out-of-Core Task-based Middleware

647

3.1 Global and Local Schedulers

649

3.2 Storage Service

650

4 Linear Algebra Frontend (LAF)

651

5 A Case Study: Block Iterative Eigensolver Using DOoC+LAF

652

5.1 Eigenvalue Problem in the Configuration Interaction Approach

652

5.2 Implementation Using 1D Partitioning

654

5.3 Implementation Using a 2D Partitioning

656

6 Experiments

656

6.1 Practical Considerations

657

6.2 Performance Results for Nmax=8

658

7 Conclusions

660

References

661

Building Scalable Software for Data Centers: An Approach to Distributed Computing at Enterprise Level

664

1 Introduction to Big Data Problems

664

2 Known Solutions at Design Phase: Overview of Design Patterns for Parallel & Distributed Computing

666

3 Introduction to MapReduce Programming Model

669

4 Overview of Apache Hadoop: A Framework for Distributed Computing

672

4.1 Distributed File System: HDFS

672

4.2 MapReduce Framework & API

674

4.3 Database Support: HBase

678

4.4 High Level Programming Language: Pig

679

4.5 Hive: Another Database Support & High Level Programming Language

680

5 Conclusions

682

References

682

Cloud Storage over Multiple Data Centers

685

1 Introduction

685

2 Cloud Storage in a Nutshell

687

2.1 Architecture

687

2.2 Metadata Service

689

2.2.1 Layout Manager

689

2.2.2 Meta-Server

689

2.2.3 Lock Service

690

2.3 Storage Service

690

2.3.1 Namenode

690

2.3.2 Chunk Servers

691

3 Replication Strategies

691

3.1 Introduction

691

3.2 Asynchronous Replication

692

3.3 Synchronous Replication

694

3.4 Placement of Replicas

695

4 Data Striping Methods

696

4.1 Introduction

696

4.2 Erasure Code Types

697

4.3 Erasure Codes in Data Centers

698

5 Consistency Models

699

5.1 Introduction

699

5.2 Strong Consistency

700

5.3 Weak Consistency

701

6 Cloud of Multiple Clouds

703

6.1 Introduction

703

6.2 Architecture

704

6.3 Data Striping

705

6.4 Retrieving Strategy

707

6.5 Mutual Exclusion

707

7 Privacy and Security of Storage System

709

7.1 Introduction

709

7.2 Fine-Grained Data Access Control

710

7.3 Security on Storage Server

712

8 Conclusion and Future Directions

714

References

715

Part IV Hardware

720

Realizing Accelerated Cost-Effective Distributed RAID

721

1 Introduction

721

2 Background

723

2.1 Rationale

723

2.1.1 Backend vs. Client-Driven Parity Generation

723

2.1.2 Block-Based vs. Per-File RAID

724

2.1.3 Hardware vs. Accelerated Software RAID

724

2.1.4 Discussion

725

2.2 Enabling Technologies

725

2.2.1 Erasure Codes

725

2.2.2 The Lustre Parallel File System

727

2.2.3 KGPU

727

3 Design

728

3.1 System Overview

728

3.2 RAID-enabled PFS Design

729

3.3 Control Flow

730

3.4 Degraded Array Reconstruction

732

4 Implementation

732

4.1 Basic GPU Implementation

733

4.2 Optimizations

733

5 Evaluation

734

5.1 Experimental Setup

734

5.2 I/O Throughput Measurement

735

5.2.1 Raw Throughput

735

5.2.2 Encoding Throughput

736

5.2.3 Impact of Number of Disks on Throughput

737

5.2.4 End-to-End Data Integrity

739

5.3 RAID Reconstruction Cost

739

5.4 Impact on Applications

740

6 Related Work

740

7 Conclusion

742

References

742

Efficient Hardware-Supported Synchronization Mechanisms for Manycores

745

1 Introduction

745

2 The G-Lines Technology

746

3 Hardware Barrier Synchronization

747

4 The GBarrier Synchronization Mechanism

748

4.1 Dedicated On-Chip Network Architecture

749

4.2 Synchronization Protocol

750

4.3 Programmability Issues

753

5 Performance Implications

754

5.1 Implementation Technologies

754

5.1.1 G-Lines Technology

754

5.1.2 Standard Technology

754

5.2 Raw Performance Statistics

755

6 Evaluation

757

6.1 Experimental Setup

757

6.2 Barrier Implementations

758

6.3 Performance Results

759

6.3.1 Execution Time

759

6.3.2 Network Traffic

763

6.3.3 Energy Efficiency

765

7 Related Work

766

8 Hardware Lock Synchronization

768

9 The GLock Synchronization Mechanism

770

9.1 Dedicated On-Chip Network Architecture

770

9.2 Synchronization Protocol

771

9.3 Programmability Issues

774

10 Performance Implications

776

10.1 Implementation Technologies

776

10.1.1 G-Lines Technology

776

10.1.2 Standard Technology

777

10.2 Raw Performance Statistics

778

11 Evaluation

779

11.1 Experimental Setup

779

11.2 Post-mortem Analysis of Benchmarks

781

11.3 Lock Implementations

782

11.4 Performance Results

783

11.4.1 Execution Time

783

11.4.2 Network Traffic

786

11.4.3 Energy Efficiency

788

12 Related Work

789

13 Conclusions

791

References

793

Hardware Approaches to Transactional Memory in Chip Multiprocessors

796

1 Introduction

796

2 Why Transactional Memory Is Going Mainstream

798

2.1 The Drawbacks of Lock-Based Synchronization

799

2.2 The Transactional Abstraction

799

2.3 High-Performance Transactional Memory

800

2.4 Industrial Adoption of Hardware Transactional Memory

801

3 Fundamentals of Transactional Memory

802

4 Hardware Mechanisms for Transactional Memory

803

4.1 ISA Extensions

803

4.2 Transactional Book-Keeping

804

4.3 Data Versioning

805

4.4 Conflict Detection and Resolution

805

4.5 Transaction Commit

807

4.6 Transaction Abort

807

5 Intel TSX: TM Support in Mainstream Processors

808

5.1 Hardware Lock Elision

809

5.2 Restricted Transactional Memory

810

6 Analysing Intel TSX Performance on Haswell

810

7 An Overview of Hardware TM Research

815

8 Conclusions

821

References

821

Part V Modeling and Simulation

827

Data Center Modeling and Simulation Using OMNeT++

828

1 Introduction to Modeling and Simulation (M&S) Methodology

829

1.1 Parallel Discrete Event Simulation (PDES)

830

2 Data Center Architectures

831

3 Data Center Modeling Using OMNeT++

833

3.1 Simple Two Node Simulation

833

3.2 Advanced-Level Simulation

836

3.3 Data Center Simulation Model

839

4 Wrap Up

843

References

843

Power-Thermal Modeling and Control of Energy-Efficient Servers and Datacenters

845

1 Introduction

845

1.1 Overall Datacenter Architecture

847

1.2 Datacenter Workload Characteristics

848

1.3 Energy Efficiency of Datacenters

850

1.4 Chapter Organization

851

2 State-of-the-Art in Datacenter Design

852

2.1 Computing Servers

852

2.2 Cooling Infrastructure

854

3 Power and Temperature Modeling and Monitoring

857

3.1 Server Modeling

858

3.2 Datacenter Modeling

861

3.3 Monitoring System for Datacenters

863

4 Power and Thermal Managements of Servers

864

4.1 Overview of CPU Power and Thermal Management Techniques

865

4.2 Run-Time Hierarchical Power and Thermal Management for Server Architectures

867

4.3 Design-Time Power and Thermal Optimizations

871

5 Power and Thermal Managements for Server Clusters

876

5.1 Conventional Solution to Minimize Power Consumption for Server Clusters

876

5.2 Correlation-Aware Power and Temperature Management

877

6 Power Minimization of Datacenters with Hybrid Cooling Architectures

886

6.1 Formal Problem Definition

888

6.2 Multi-objective Trade-offs Exploration Between Cooling Mode and Utilization Threshold

889

6.3 Simulation Results

893

7 Conclusions

895

References

896

Thermal Modeling and Management of Storage Systems in Data Centers

902

1 Introduction

902

2 Related Work

904

2.1 Efficient Data Centers

904

2.2 Thermal Modeling

905

2.3 Thermal Management

905

3 Thermal Modeling

906

3.1 CPU Thermal Model

907

3.2 Disk Thermal Model

909

3.3 Thermal Model of Data Nodes

911

3.4 Evaluation of Temperature Models

912

4 Thermal Management Strategies

913

4.1 Task Scheduling

914

4.2 Predictive Thermal-Aware Data Transmission

917

5 Results

919

5.1 Task Scheduling

919

5.1.1 CPU-Intensive Workload

920

5.1.2 I/O-Intensive Workloads

922

5.2 Predictive Thermal-Aware Management System

922

6 Conclusion

926

References

927

Modeling and Simulation of Data Center Networks

931

1 Data Centers and Cloud Computing

931

2 DCN Architectures

933

3 DCN Graph Modeling

935

3.1 ThreeTier DCN Model

936

3.2 FatTree DCN Model

937

3.3 DCell DCN Model

938

4 DCNs Implementation in ns-3

939

4.1 ThreeTier DCN Implementation Details

939

4.2 FatTree DCN Implementation Details

940

4.3 DCell DCN Implementation Details

942

References

944

Part VI Security

945

C2Hunter: Detection and Mitigation of Covert Channels in Data Centers

946

1 Introduction

946

2 Background

949

3 Threat Model, Scenarios and Assumptions

950

3.1 Threat of Data Center

950

3.2 Threat Categories of Covert Channels

951

3.3 Threat Scenarios of Covert Channels

952

3.4 Assumptions

953

4 Overview of C2Hunter

953

4.1 Challenges

953

4.2 Formal Requirements

954

4.3 C2Hunter Framework Summary

954

4.4 Covert Channel Modeling

956

5 Two-Phase Synthesis Detection Algorithm

958

5.1 Markov Detection Algorithm

959

5.2 Bayesian Detection Algorithm

962

6 Mitigation Algorithm

963

7 Implementation and Evaluation

964

7.1 Covert Channels Scenarios

965

7.2 Captor and Detector

966

7.3 Interrupter in Hypervisor

967

7.4 Experimental Settings

967

7.5 Detection Analysis

969

7.6 Mitigation Analysis

972

8 Discussion

974

9 Related Work

976

10 Conclusion

977

References

978

Selective and Private Access to Outsourced Data Centers

982

1 Introduction

982

2 Access Control Enforcement

984

2.1 Selective Encryption

984

2.2 Updates to the Access Control Policy

988

2.3 Write Privileges

992

2.4 Attribute-Based Encryption

994

3 Efficient Access to Encrypted Data

995

4 Protecting Access Privacy

998

4.1 Oblivious RAM

999

4.2 Dynamically Allocated Data Structures

1000

4.3 Shuffle Index

1002

5 Combining Access Control and Indexing Techniques

1007

6 Conclusions

1010

References

1010

Privacy in Data Centers: A Survey of Attacks and Countermeasures

1013

1 Introduction

1013

2 Privacy

1015

3 Privacy Enhancing Technologies

1016

4 Anonymous Communications

1017

5 Mix Networks

1019

6 Traffic Analysis

1019

7 Mix Systems Attacks

1020

8 The Disclosure Attack

1020

9 The Statistical Disclosure Attack (SDA)

1021

10 Extending and Resisting Statistical Disclosure

1022

11 Two Sided Statistical Disclosure Attack (TS-SDA)

1022

12 Perfect Matching Disclosure Attack (PMDA)

1023

13 Vida: How to Use Bayesian Inference to De-anonymize Persistent Communications

1024

14 SDA with Two Heads (SDA-2H)

1024

15 Conclusions

1025

References

1025

Part VII Data Services

1028

Quality-of-Service in Data Center Stream Processing for Smart City Applications

1029

1 Introduction

1029

2 Distributed Stream Processing Systems

1030

2.1 Abstract Model

1031

2.2 Development Model

1033

2.3 Execution Model

1034

3 Platforms for Distributed Stream Processing

1036

3.1 IBM InfoSphere Streams

1036

3.2 Apache S4

1037

3.3 Storm

1038

4 QoS-Aware Stream Processing

1039

5 Quasit

1041

5.1 Quasit Abstract Model

1042

5.2 Quasit Development Model

1043

5.3 Quasit Execution Model

1048

6 Load-Adaptive Active Replication (LAAR)

1049

7 Conclusions

1054

References

1055

Opportunistic Databank: A context Aware on-the-fly Data Center for Mobile Networks

1059

1 Introduction

1059

2 Data Replication in MANETs: A Brief Overview

1062

3 Data Replication in DTNs

1064

3.1 System Model

1065

3.2 Hybrid Scheme for Message Replication (HSM) for DTNs

1067

3.3 Empirical Setups and Results

1069

3.3.1 Performance Metrics

1070

3.3.2 Related DTN Replication Schemes

1071

3.3.3 Simulation Results

1072

4 Conclusions

1074

References

1074

Data Management: State-of-the-Practice at Open-Science Data Centers

1077

1 Introduction

1077

2 Data Storage Infrastructure

1079

2.1 Data Storage Media

1079

2.2 General Architecture of a Data Storage System

1080

2.3 Supporting Databases for Structured and Semi-Structured Datasets

1080

2.4 Examples of Notable Storage Systems at Open-Science Data Centers

1081

3 Data Movement

1082

3.1 Parallel File-System Associated with Computational Resources: Secondary Storage

1082

3.2 Optimizing Data Movement in Context of Secondary Storage System

1085

3.3 Optimizing Data Movement in Context of Tertiary Storage System

1086

4 Data Archiving

1087

5 Data Preservation

1088

6 Conclusion

1089

References

1089

Data Summarization Techniques for Big Data: A Survey

1091

1 Introduction

1091

2 Applications of Data Summarization

1093

3 Clustering Algorithms

1095

3.1 Background

1095

3.2 Hierarchical Clustering

1097

3.3 Partitioning Clustering

1101

3.4 Density-Based Clustering Algorithms

1103

3.5 Grid-Based Clustering Algorithms

1105

4 Sampling

1107

4.1 Probability Sampling

1108

4.2 Non-Probabilistic Sampling

1114

5 Compression

1115

6 Wavelets

1120

7 Histograms

1123

8 Micro-Clustering

1125

9 Conclusion

1126

References

1126

Part VIII Monitoring

1135

Central Management of Datacenters

1136

1 Introduction

1136

2 Organization of the Chapter

1137

2.1 Management Layer Network

1137

2.2 Provisioning of Servers

1139

2.2.1 Reason to Use Provisioning Servers

1139

2.3 Platform Configuration Management System

1140

2.4 Resource Utilization Monitoring

1140

2.5 Alerting and Alarming System

1142

2.6 Central Logging System

1142

2.6.1 Security Information Event Management

1144

2.7 Intrusion Detection and Prevention System

1144

2.7.1 Types of Intrusion Detection System (IDS)

1145

Network-Based Intrusion Detection System (NIDS)

1146

Host-Based Intrusion Detection System (HIDS)

1146

2.7.2 How Intrusion Detection System Works?

1146

Anomaly-Based Intrusion Detection System

1146

Signature-Based Intrusion Detection System

1146

2.8 Datacenter Backup and Restore

1147

2.8.1 The Components of Data Backup and Recovery

1148

Cold and Hot Backup

1148

Enterprise Backup and Restore Software

1148

Online and Offline Storage

1148

2.9 Security Management Systems

1149

3 Conclusion

1149

References

1151

Monitoring of Data Centers using Wireless Sensor Networks

1152

1 Introduction

1152

2 Survey Study

1155

3 Conclusion

1163

References

1163

Network Intrusion Detection Systems in Data Centers

1165

1 Introduction

1165

2 Origin and Standardization

1170

3 Architecture

1172

4 Subjects of Study

1175

5 Detection Strategies

1177

6 Alert Correlation

1181

7 Summary

1183

References

1184

Software Monitoring in Data Centers

1188

1 Introduction

1188

1.1 Performance Degradation

1189

1.2 Function Failure

1190

1.3 Energy Conservation

1191

2 Monitoring Content

1192

2.1 Basic Software

1193

2.2 Middleware

1193

2.3 Database

1194

2.4 Application Software

1194

2.5 PM (Physical Machine) and VM (Virtual Machine)

1196

2.6 User Behavior Analysis

1198

2.7 Hot-Spot Evaluation

1198

2.8 Performance Prediction and Advanced Warning

1200

2.9 The Performance Bottlenecks Analysis

1201

3 Monitoring Timing

1202

3.1 Resource-Oriented Monitoring

1202

3.2 Business-Oriented Monitoring

1205

4 Participators

1207

4.1 Resource Managers

1207

4.2 Service Operators

1208

4.3 Data Owner

1209

4.4 Software Developers

1209

5 Monitoring Site

1210

5.1 On-Site Monitor

1211

5.2 Off-Site Monitor

1211

6 Monitoring Methods

1212

6.1 Visualization Monitoring

1212

6.2 Hot-Spot Evaluation

1214

6.3 Performance Prediction

1220

6.4 Analyzing User's Habits

1226

6.5 Tools

1227

References

1229

Part IX Resource Management

1233

Usage Patterns in Multi-tenant Data Centers: a Large-Case Field Study

1234

1 Introduction

1234

2 Multi-tenant Datacenters

1236

2.1 Evolution of Resource Demands

1236

2.2 CPU Load Balancing

1237

2.3 The Impact of Time Scales

1240

3 Summary

1242

References

1242

On Scheduling in Distributed Transactional Memory: Techniques and Tradeoffs

1244

1 Introduction

1244

2 Preliminaries and System Model

1246

2.1 Distributed Transactions

1246

2.2 Definitions

1247

2.3 Transactional Scheduler

1247

3 Bi-interval

1248

3.1 Motivation

1248

3.2 Scheduler Design

1249

3.3 Analysis

1250

3.4 Evaluation

1252

4 Cluster-Based Transactional Scheduler

1253

4.1 Motivation

1253

4.2 Scheduler Design

1254

4.3 Analysis

1256

4.4 Evaluation

1257

5 Summary and Conclusion

1258

References

1259

Dependability-Oriented Resource Management Schemes for Cloud Computing Data Centers

1261

1 Introduction

1261

2 System Model and Failure Behavior of Data Center Components

1262

2.1 Overview of the Data Center Architecture

1262

2.2 Failure Behavior of Servers

1263

2.3 Failure Behavior of Network Components

1264

2.4 Analysis of the Impact of Failures on Applications

1265

3 Resource Management in Data Center Environments

1266

3.1 Global Constraints

1268

3.2 Infrastructure-Oriented Constraints

1269

3.3 Application-Oriented Constraints

1270

4 Initial Allocation of Virtual Machines in Data Center Environments

1271

4.1 A Comprehensive Scheme for Virtual Machines Allocation

1271

4.2 Other Schemes for Virtual Machines Allocation

1273

5 Runtime Adaption of Virtual Machine Allocation in Data Center Environments

1275

5.1 Runtime Adaption to Balance Availability and Performance

1276

5.2 Other Schemes for Runtime Virtual Machines Allocation Adaption

1277

6 Conclusions

1279

References

1279

Resource Scheduling in Data-Centric Systems

1282

1 Introduction

1282

2 Terminology

1284

3 Classification and State-of-the-Art

1285

3.1 Hierarchy of Resource Scheduling in DCS

1285

3.2 Resource Provision

1287

3.2.1 Economic-Based Resource Provision

1287

3.2.2 SLA-Oriented Resource Provision

1288

3.2.3 Utility-Oriented Resource Provision

1288

3.3 Job Scheduling

1289

3.3.1 Static Job Scheduling

1290

3.3.2 Dynamic Job Scheduling

1290

3.4 Data Scheduling

1292

3.4.1 Online Data Scheduling

1293

3.4.2 Offline Data Scheduling

1293

4 Case Studies

1294

4.1 Amazon EC2

1294

4.2 Dawning Nebulae

1295

4.3 Taobao Yunti

1296

4.4 Microsoft SCOPE

1297

5 Future Trends and Challenges

1298

6 Conclusions

1299

References

1300

Index

1306