Defining and changing performance parameters
Parameters are defined in hdfs-site.xml.
Cloudera Manager provides a GUI for changing these parameters, so end users do not have to edit the XML file by hand.
Start Cloudera Manager:
In a terminal, run: $ sudo /home/cloudera/cloudera-manager --express --force
Then, in Firefox, open: quickstart.cloudera:7180/cmf/services/8/config
Four main parameters affect performance:
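DFS block size -- dfs.blocksize : default 64 MB in Hadoop 1.x, 128 MB in Hadoop 2.x and later. Directly affects NameNode memory usage and the number of map tasks: smaller blocks mean more blocks for the NameNode to track and more map tasks per file.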
HDFS replication -- dfs.replication : default 3. Reducing replication saves storage but trades away robustness. Replication mitigates node failures, and is supported by the following mechanisms:
periodic heartbeats from each DataNode to the NameNode, so that failed nodes are detected.
checksums stored for each file's data, used to verify it on read; if a replica fails verification, the data is re-read from another healthy node.
Number of handlers on each data node -- dfs.datanode.handler.count
Maximum number of blocks per file -- dfs.namenode.fs-limits.max-blocks-per-file
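The effect of the first two parameters can be estimated with simple arithmetic. The following is a minimal sketch, not part of Hadoop: the helper names block_count and raw_storage_bytes are hypothetical, and it assumes one map task per block, which is only an approximation of the default behaviour for splittable input.

```python
MB = 1024 * 1024
GB = 1024 * MB


def block_count(file_size_bytes, block_size_bytes):
    """Number of HDFS blocks needed for one file (last block may be partial)."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    # Integer ceiling division: a partial last block still counts as a block.
    return -(-file_size_bytes // block_size_bytes)


def raw_storage_bytes(file_size_bytes, replication):
    """Total bytes stored across the cluster for one file."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return file_size_bytes * replication


if __name__ == "__main__":
    file_size = 1 * GB  # illustrative file size, an assumption
    for block_size in (64 * MB, 128 * MB):
        blocks = block_count(file_size, block_size)
        # Assumption: roughly one map task per block for splittable input.
        print(f"dfs.blocksize={block_size // MB} MB -> {blocks} blocks, ~{blocks} map tasks")
    for replication in (1, 2, 3):
        stored = raw_storage_bytes(file_size, replication)
        print(f"dfs.replication={replication} -> {stored // GB} GB stored")
```

Doubling the block size halves the number of blocks the NameNode has to track for the same file, while the raw storage cost grows linearly with the replication factor.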