Hadoop CDH: File Storage in HDFS
Create the directories:
    [root@elephant ~]# id hdfs
    uid=989(hdfs) gid=985(hdfs) groups=985(hdfs),987(hadoop)
    [root@elephant ~]#
    [root@elephant ~]# su - hdfs
    [hdfs@elephant ~]$
    [hdfs@elephant ~]$ hdfs dfs -ls /
    Found 3 items
    drwxrwxrwt - hdfs supergroup 0 2017-09-10 17:32 /tmp
    drwxr-xr-x - hdfs supergroup 0 2017-09-10 01:19 /user
    drwxr-xr-x - hdfs supergroup 0 2017-09-11 17:32 /usr
    [hdfs@elephant ~]$
    [hdfs@elephant ~]$ hdfs dfs -ls /user
    Found 1 items
    drwxrwxrwx - mapred hadoop 0 2017-09-10 01:36 /user/history
    [hdfs@elephant ~]$
    [hdfs@elephant ~]$ hdfs dfs -mkdir /user/training
    [hdfs@elephant ~]$
    [hdfs@elephant ~]$ hdfs dfs -ls /user
    Found 2 items
    drwxrwxrwx - mapred hadoop 0 2017-09-10 01:36 /user/history
    drwxr-xr-x - hdfs supergroup 0 2017-09-11 17:34 /user/training
    [hdfs@elephant ~]$
    [hdfs@elephant ~]$ hdfs dfs -chown training /user/training
    [hdfs@elephant ~]$
    [hdfs@elephant ~]$ hdfs dfs -ls /user
    Found 2 items
    drwxrwxrwx - mapred hadoop 0 2017-09-10 01:36 /user/history
    drwxr-xr-x - training supergroup 0 2017-09-11 17:34 /user/training
    [hdfs@elephant ~]$
    [hdfs@elephant ~]$ exit
    logout
    [root@elephant ~]#
    [root@elephant ~]# su - training
    Last login: Sat Sep 9 12:53:39 CST 2017 from connecttocluster on pts/0
    [training@elephant ~]$
    [training@elephant ~]$ hdfs dfs -mkdir weblog
    [training@elephant ~]$
    [training@elephant ~]$ hdfs dfs -ls
    Found 1 items
    drwxr-xr-x - training supergroup 0 2017-09-11 18:17 weblog
    [training@elephant ~]$
    [training@elephant ~]$ hdfs dfs -ls /user/training
    Found 1 items
    drwxr-xr-x - training supergroup 0 2017-09-11 18:17 /user/training/weblog
    [training@elephant ~]$
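In the session above, `hdfs dfs -mkdir weblog` ends up at /user/training/weblog because a relative HDFS path is resolved against the user's home directory. The sketch below only illustrates that rule; `resolve_hdfs_path` is a hypothetical helper (not part of the hdfs CLI), it assumes the default home prefix /user, and it ignores full URIs such as hdfs://host/path.

```shell
#!/bin/sh
# Hypothetical helper: mimics how a path argument to `hdfs dfs` is
# interpreted, assuming home directories live under /user.
resolve_hdfs_path() {
    user=$1
    path=$2
    case "$path" in
        /*) printf '%s\n' "$path" ;;                    # absolute: used as-is
        "") printf '/user/%s\n' "$user" ;;              # no path: home directory
        *)  printf '/user/%s/%s\n' "$user" "$path" ;;   # relative: under home
    esac
}

resolve_hdfs_path training weblog          # prints /user/training/weblog
resolve_hdfs_path training ""              # prints /user/training
resolve_hdfs_path training /user/history   # prints /user/history
```

This is why `hdfs dfs -ls` and `hdfs dfs -ls /user/training` list the same directory for the training user.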
NameNode UI:
Node [elephant]
http://10.158.1.97:50070
Browse the file system from the NameNode UI:
As you can see, there are no files yet.
Decompress [access_log]:
    [training@elephant ~]$ hdfs dfs -ls
    Found 1 items
    drwxr-xr-x - training supergroup 0 2017-09-11 18:17 weblog
    [training@elephant ~]$
    [training@elephant ~]$ hdfs dfs -ls weblog
    [training@elephant ~]$
    [training@elephant ~]$ cd ~/training_materials/admin/data/
    [training@elephant data]$ ls -ltr | grep access
    -rw-r--r-- 1 training training 54547198 Sep 9 10:46 access_log.gz
    [training@elephant data]$
    [training@elephant data]$ gunzip access_log.gz

    Message from syslogd@elephant at Sep 11 18:34:15 ...
     kernel:BUG: soft lockup - CPU#1 stuck for 21s! [kswapd0:32]

    Message from syslogd@elephant at Sep 11 18:34:57 ...
     kernel:BUG: soft lockup - CPU#1 stuck for 23s! [kswapd0:32]

    Message from syslogd@elephant at Sep 11 18:35:24 ...
     kernel:BUG: soft lockup - CPU#1 stuck for 24s! [java:21267]
    [training@elephant data]$
    [training@elephant data]$
    [training@elephant data]$ ls -ltr | grep --color access
    -rw-r--r-- 1 training training 504941532 Sep 9 10:46 access_log
    [training@elephant data]$
    [training@elephant data]$ du -sh access_log
    482M access_log
    [training@elephant data]$
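In this Hadoop cluster, the block size is 128 MB:
(Figure: the Hadoop block size setting)
So access_log (504,941,532 bytes, about 482 MB) will occupy 4 blocks: three full 128 MB blocks plus a final block holding about 98 MB, roughly 30 MB short of a full block.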
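The block count can be checked with a little shell arithmetic. This is only a sketch: the file size is the one reported by `ls -l` above, the 128 MB block size is taken from the cluster setting, and the variable names are mine.

```shell
#!/bin/sh
file_size=504941532                  # bytes, from `ls -l access_log`
block_size=$((128 * 1024 * 1024))    # 134217728 bytes

full_blocks=$((file_size / block_size))                        # completely filled blocks
last_block=$((file_size % block_size))                         # bytes in the final block
total_blocks=$(( (file_size + block_size - 1) / block_size ))  # ceiling division
unused=$((block_size - last_block))                            # shortfall of the final block
                                                               # (assumes size is not an exact multiple)

echo "total blocks:     $total_blocks"                 # 4
echo "full blocks:      $full_blocks"                  # 3
echo "last block bytes: $last_block"                   # 102288348
echo "unused (MB):      $((unused / 1024 / 1024))"     # 30
```

The "unused" figure is only the distance to a full block; the last block file on disk is just as large as the data it holds.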
Upload [access_log]:
While the upload runs, you can follow the background logs:
1. tail -f /var/log/hadoop-hdfs/hadoop-cmf-hdfs-DATANODE-elephant.log.out
2. tail -f /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-elephant.log.out
    [training@elephant data]$ hdfs dfs -put access_log weblog
    [training@elephant data]$
    [training@elephant data]$ hdfs dfs -ls weblog
    Found 1 items
    -rw-r--r-- 3 training supergroup 504941532 2017-09-11 19:02 weblog/access_log
    [training@elephant data]$
Check it in the [NameNode] web page:
As you can see, four blocks were indeed created, each with its own block ID.
From here on, we use the block ID of Block 0:
1073744276
Let's see where the block actually lives on disk:
    [root@elephant ~]# find /dfs/dn -name '*1073744276*' -ls
    76260482 1028 -rw-r--r-- 1 hdfs hdfs 1048583 Sep 11 18:55 /dfs/dn/current/BP-669392105-10.158.1.97-1504977425108/current/finalized/subdir0/subdir9/blk_1073744276_3453.meta
    76260477 131072 -rw-r--r-- 1 hdfs hdfs 134217728 Sep 11 18:55 /dfs/dn/current/BP-669392105-10.158.1.97-1504977425108/current/finalized/subdir0/subdir9/blk_1073744276
    [root@elephant ~]#
    [root@elephant ~]#
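The two files found above follow the DataNode naming pattern: `blk_<block id>` holds the data, and `blk_<block id>_<generation stamp>.meta` holds its checksums. As a rough sketch, the names can be split apart like this; `parse_block_file` is a hypothetical helper written for this post, not a Hadoop tool.

```shell
#!/bin/sh
# Hypothetical helper: splits a DataNode block file name into its parts.
parse_block_file() {
    name=${1##*/}        # drop the directory part, if any
    name=${name#blk_}    # drop the "blk_" prefix
    case "$name" in
        *.meta)
            name=${name%.meta}
            # what remains is <block id>_<generation stamp>
            echo "block_id=${name%%_*} genstamp=${name#*_} kind=meta" ;;
        *)
            echo "block_id=$name kind=data" ;;
    esac
}

parse_block_file blk_1073744276_3453.meta   # block_id=1073744276 genstamp=3453 kind=meta
parse_block_file /dfs/dn/current/BP-669392105-10.158.1.97-1504977425108/current/finalized/subdir0/subdir9/blk_1073744276
                                            # block_id=1073744276 kind=data
```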
Their sizes:
    [root@elephant ~]# du -sh /dfs/dn/current/BP-669392105-10.158.1.97-1504977425108/current/finalized/subdir0/subdir9/blk_1073744276_3453.meta
    1.1M /dfs/dn/current/BP-669392105-10.158.1.97-1504977425108/current/finalized/subdir0/subdir9/blk_1073744276_3453.meta
    [root@elephant ~]#
    [root@elephant ~]# du -sh /dfs/dn/current/BP-669392105-10.158.1.97-1504977425108/current/finalized/subdir0/subdir9/blk_1073744276
    128M /dfs/dn/current/BP-669392105-10.158.1.97-1504977425108/current/finalized/subdir0/subdir9/blk_1073744276
    [root@elephant ~]#
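The size of the .meta file (1,048,583 bytes in the `find` output earlier) can be worked out from the block size. The sketch below assumes the default checksum settings, which I have not verified on this cluster: one 4-byte CRC per 512 bytes of data (dfs.bytes-per-checksum) plus a 7-byte header.

```shell
#!/bin/sh
block_bytes=134217728     # size of blk_1073744276, a full 128 MB block
bytes_per_checksum=512    # assumed default dfs.bytes-per-checksum
checksum_bytes=4          # one CRC value per chunk
header_bytes=7            # 2-byte version + 1-byte checksum type + 4-byte chunk size

# number of 512-byte chunks in the block, rounded up
chunks=$(( (block_bytes + bytes_per_checksum - 1) / bytes_per_checksum ))
meta_bytes=$(( header_bytes + chunks * checksum_bytes ))

echo "chunks:     $chunks"       # 262144
echo "meta bytes: $meta_bytes"   # 1048583
```

Under these assumptions the result matches the size on disk, which is why the checksum file for a full block comes out at about 1.1 MB.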
Compare the first lines of the block file with the first lines of the file in HDFS:
    [root@elephant ~]# head /dfs/dn/current/BP-669392105-10.158.1.97-1504977425108/current/finalized/subdir0/subdir9/blk_1073744276
    10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET / HTTP/1.1" 403 202
    10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET /favicon.ico HTTP/1.1" 404 209
    10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET / HTTP/1.1" 200 9157
    10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/js/lowpro.js HTTP/1.1" 200 10469
    10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/css/reset.css HTTP/1.1" 200 1014
    10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/css/960.css HTTP/1.1" 200 6206
    10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/css/the-associates.css HTTP/1.1" 200 15779
    10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/js/the-associates.js HTTP/1.1" 200 4492
    10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/js/lightbox.js HTTP/1.1" 200 25960
    10.223.157.186 - - [15/Jul/2009:15:50:36 -0700] "GET /assets/img/search-button.gif HTTP/1.1" 200 168
    [root@elephant ~]#
    [root@elephant ~]# su - training
    Last login: Mon Sep 11 18:17:15 CST 2017 on pts/0
    [training@elephant ~]$ hdfs dfs -cat weblog/access_log | head -
    10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET / HTTP/1.1" 403 202
    10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET /favicon.ico HTTP/1.1" 404 209
    10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET / HTTP/1.1" 200 9157
    10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/js/lowpro.js HTTP/1.1" 200 10469
    10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/css/reset.css HTTP/1.1" 200 1014
    10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/css/960.css HTTP/1.1" 200 6206
    10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/css/the-associates.css HTTP/1.1" 200 15779
    10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/js/the-associates.js HTTP/1.1" 200 4492
    10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/js/lightbox.js HTTP/1.1" 200 25960
    10.223.157.186 - - [15/Jul/2009:15:50:36 -0700] "GET /assets/img/search-button.gif HTTP/1.1" 200 168
    cat: Unable to write to output stream.
    [training@elephant ~]$
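The two listings above are compared by eye. The trailing "cat: Unable to write to output stream." most likely just means that `head` stopped reading after ten lines while `hdfs dfs -cat` was still writing. For a mechanical check, a small helper along these lines could be used; `same_head` is a hypothetical function of mine, and the demo runs on throwaway local files with dummy content rather than on the cluster.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the first N lines (default 10)
# of two local files are identical.
same_head() {
    n=${3:-10}
    [ "$(head -n "$n" "$1")" = "$(head -n "$n" "$2")" ]
}

# Demo on dummy data: the files differ only after line 3.
tmp_a=$(mktemp)
tmp_b=$(mktemp)
printf 'a\nb\nc\n' > "$tmp_a"
printf 'a\nb\nc\nd\n' > "$tmp_b"

if same_head "$tmp_a" "$tmp_b" 3; then
    echo "first 3 lines match"
fi
if ! same_head "$tmp_a" "$tmp_b" 4; then
    echo "first 4 lines differ"
fi

rm -f "$tmp_a" "$tmp_b"
```

To apply it here, one could save the output of `head` on the block file and of `hdfs dfs -cat weblog/access_log | head` into two temporary files and pass those paths to `same_head`.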
Datanode Information:
http://10.158.1.97:50070/dfshealth.html#tab-datanode
————————————
Done.