Saturday, April 9, 2011

Hadoop


What is Hadoop?

Open source software for reliable, scalable, and distributed computing.

Flexible infrastructure for large-scale computation and data processing on a network of commodity hardware.

The Linux of distributed processing.

Why Hadoop?

Very large distributed file system.

The data is distributed across data nodes.

Reliability and availability.

Files are replicated to handle hardware failure.

Detects failures and recovers from them.

Ability to run on cheap hardware.

Open source flexibility.

Runs on heterogeneous operating systems.

Scalability.

The number of nodes in a cluster is not fixed; capacity grows by simply adding commodity nodes.

Parallel processing through MapReduce.

Main components:

HDFS (Hadoop Distributed File System) for storage.

The MapReduce programming model for processing.

Hadoop Distributed File System (HDFS)

A distributed file system based on Google's GFS, serving as Hadoop's shared filesystem.

Distributed across data servers.

Data files are partitioned into large blocks (64 MB by default), each replicated on multiple data nodes.

The NameNode stores the metadata (block locations, directory structure).
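
As a rough illustration, here is a minimal Java sketch that writes and reads a file through the HDFS client API (org.apache.hadoop.fs.FileSystem). The path and NameNode address are hypothetical, and the snippet assumes a running HDFS cluster; in practice fs.defaultFS usually comes from core-site.xml.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address for this sketch.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        FileSystem fs = FileSystem.get(conf);

        // Write a small file; HDFS transparently splits large files
        // into blocks and replicates each block across data nodes.
        Path path = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(path)) {
            out.write("Hello, HDFS!".getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back through the same API.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }
    }
}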

MapReduce:

Framework for distributed processing of large data sets.
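
To make the model concrete, below is a minimal word-count sketch using Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). The class names are illustrative, and the job driver/setup code is omitted for brevity.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map phase: emit (word, 1) for every word in an input line.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reduce phase: sum the counts emitted for each word.
class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

Between the two phases, the framework shuffles and sorts the intermediate pairs so that each reducer receives all the counts for a given word.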
