㈠ How to debug a Hadoop 2.2.0 program in Eclipse on Windows 7
In the previous post I covered Hadoop's single-node pseudo-distributed deployment. In this post I explain how to debug Hadoop 2.2.0 from Eclipse. If you are still on Hadoop 1.x, that is fine too; I have written before about debugging 1.x programs from Eclipse. The biggest difference is the Eclipse plugin: the Hadoop 2.x and 1.x APIs are not quite compatible, so the plugins differ as well, and you simply need to use the plugin that matches your version.
Now, on to the main topic:
No.  Item                    Description
1    Eclipse                 Juno Service Release (4.2)
2    Operating system        Windows 7
3    Hadoop Eclipse plugin   hadoop-eclipse-plugin-2.2.0.jar
4    Hadoop cluster          CentOS 6.5 Linux VM, single-node pseudo-distributed
5    Test program            Hello World (WordCount)
Several issues came up along the way. The console log from a run looked like this:
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
Mode: local
Output path exists; deleted!
INFO - Configuration.warnOnceIfDeprecated(840) | session.id is deprecated. Instead, use dfs.metrics.session-id
INFO - JvmMetrics.init(76) | Initializing JVM Metrics with processName=JobTracker, sessionId=
WARN - JobSubmitter.copyAndConfigureFiles(149) | Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
WARN - JobSubmitter.copyAndConfigureFiles(258) | No job jar file set. User classes may not be found. See Job or Job#setJar(String).
INFO - FileInputFormat.listStatus(287) | Total input paths to process : 1
INFO - JobSubmitter.submitJobInternal(394) | number of splits:1
INFO - Configuration.warnOnceIfDeprecated(840) | user.name is deprecated. Instead, use mapreduce.job.user.name
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.job.name is deprecated. Instead, use mapreduce.job.name
㈡ HDFS Source Code Analysis (5): Replication
Replication plays a central role in HDFS and is used in many places: the lease recovery we covered earlier, the replica count you set with hdfs dfs -setrep -R, and many other scenarios.
In this article, we look at how the NameNode instructs DataNodes to carry out replication.
The overall flow is as follows.
The NameNode runs a dedicated thread inside the BlockManager, called ReplicationMonitor, which detects blocks that need replication.
When it finds such blocks, it schedules them for replication in priority order.
There are five priority levels in total. For example, replicating a block that has only one replica takes priority over replicating a block that has two, because the former is more likely to lose data. For the exact five levels, see the source code.
Next, a DataNode is chosen as the source, i.e. what we usually call the source of the replication pipeline.
When choosing the source, DataNodes in the DECOMMISSION_INPROGRESS state are preferred: they are normally not assigned write requests, so their load is lower.
The NameNode then chooses the targets, i.e. the other nodes in the replication pipeline, using the familiar block placement policy.
The block replication task is then put into a pending queue, so that failed attempts can be retried.
Next, the NameNode tells the source to start the replication pipeline via a BlockCommand.TRANSFER command carried in the heartbeat reply.
Each DataNode then forwards the data down the pipeline to the next DataNode, and only sends an ACK back to the previous DataNode after receiving the ACK from the node after it.
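To make the priority idea concrete, here is a minimal Java sketch of a five-bucket scheduling structure. It is illustrative only: the class name and the cut-off rules are my own simplification, not HDFS's actual code (the real logic lives in the BlockManager's under-replicated block queues; see the source for the exact five levels).

import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the real HDFS implementation. */
class ReplicationQueueSketch {
    // Five priority buckets: bucket 0 is the most urgent.
    private final List<List<String>> buckets = new ArrayList<>();

    ReplicationQueueSketch() {
        for (int i = 0; i < 5; i++) {
            buckets.add(new ArrayList<String>());
        }
    }

    /** Queue a block; the fewer live replicas, the more urgent. */
    void add(String blockId, int liveReplicas, int expectedReplicas) {
        // Only three of the five buckets are populated in this toy version.
        int priority = (liveReplicas <= 1) ? 0                   // about to lose data
                     : (liveReplicas * 3 < expectedReplicas) ? 1 // badly under-replicated
                     : 2;                                        // mildly under-replicated
        buckets.get(priority).add(blockId);
    }

    /** What a monitor thread would poll: drain the most urgent bucket first. */
    String pollMostUrgent() {
        for (List<String> bucket : buckets) {
            if (!bucket.isEmpty()) {
                return bucket.remove(0);
            }
        }
        return null;
    }
}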
㈢ How to debug a Hadoop 2.2.0 program in Eclipse on Windows 7
How to debug Hadoop 2.2.0 programs on Windows 7:
I. Environment setup
No.  Item                    Description
1    Eclipse                 Juno Service Release (4.2)
2    Operating system        Windows 7
3    Hadoop Eclipse plugin   hadoop-eclipse-plugin-2.2.0.jar
4    Hadoop cluster          CentOS 6.5 Linux VM, single-node pseudo-distributed
5    Test program            Hello World (WordCount)
II. Things to watch out for
The first exception:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Fix: hard-code your local Hadoop path as the return value of the checkHadoopHome() method in the org.apache.hadoop.util.Shell class. I changed it as follows:
private static String checkHadoopHome() {
    // first check the Dflag hadoop.home.dir with JVM scope
    // System.setProperty("hadoop.home.dir", "...");
    String home = System.getProperty("hadoop.home.dir");
    // fall back to the system/user-global env variable
    if (home == null) {
        home = System.getenv("HADOOP_HOME");
    }
    try {
        // couldn't find either setting for hadoop's home directory
        if (home == null) {
            throw new IOException("HADOOP_HOME or hadoop.home.dir are not set.");
        }
        if (home.startsWith("\"") && home.endsWith("\"")) {
            home = home.substring(1, home.length() - 1);
        }
        // check that the home setting is actually a directory that exists
        File homedir = new File(home);
        if (!homedir.isAbsolute() || !homedir.exists() || !homedir.isDirectory()) {
            throw new IOException("Hadoop home directory " + homedir
                    + " does not exist, is not a directory, or is not an absolute path.");
        }
        home = homedir.getCanonicalPath();
    } catch (IOException ioe) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Failed to detect a valid hadoop home directory", ioe);
        }
        home = null;
    }
    // hard-code the local Hadoop home path
    home = "D:\\hadoop-2.2.0";
    return home;
}
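A less invasive alternative, suggested by the first lines of the method above, is to set the hadoop.home.dir system property yourself before any Hadoop class is loaded, so no Hadoop source needs patching. A minimal sketch, with a hypothetical wrapper class (MyWordCount is the driver shown later in this answer):

import com.qin.wordcount.MyWordCount;

public class Bootstrap {
    public static void main(String[] args) throws Exception {
        // Shell.checkHadoopHome() consults this property before HADOOP_HOME,
        // so setting it early avoids editing Hadoop's source.
        System.setProperty("hadoop.home.dir", "D:\\hadoop-2.2.0");
        MyWordCount.main(args); // then launch the real job driver
    }
}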
The second exception, Could not locate executable D:\Hadoop\tar\hadoop-2.2.0\hadoop-2.2.0\bin\winutils.exe in the Hadoop binaries, means the Windows helper executables cannot be found. Download the bin package from https://github.com/srccodes/hadoop-common-2.2.0-bin and overwrite the bin directory under your local Hadoop root.
The third exception:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://192.168.130.54:19000/user/hmail/output/part-00000, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
at com.netease.hadoop.HDFSCatWithAPI.main(HDFSCatWithAPI.java:23)
This exception usually means the HDFS path is being handed to the wrong FileSystem. Fix: copy core-site.xml and hdfs-site.xml from the cluster and put them in the src root of the Eclipse project.
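Alternatively, instead of copying the XML files, you can point the client at the NameNode in code. A minimal sketch with a hypothetical helper class; the address comes from the stack trace above and must match your own cluster (fs.default.name still works in Hadoop 2.x, where it is a deprecated alias of fs.defaultFS):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WrongFsFix {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tell the client which filesystem hdfs:// paths belong to;
        // replace host and port with your NameNode's address.
        conf.set("fs.default.name", "hdfs://192.168.130.54:19000");
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/user/hmail/output/part-00000")));
    }
}

The full WordCount example I used follows: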
package com.qin.wordcount;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
/***
 *
 * Hadoop 2.2.0 test:
 * a WordCount-style example
 *
 * @author qindongliang
 *
 * Hadoop discussion QQ group: 376932160
 *
 *
 * */
public class MyWordCount {
    /**
     * Mapper
     *
     * **/
    private static class WMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private IntWritable count = new IntWritable(1);
        private Text text = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // each input line looks like "word#count"
            String values[] = value.toString().split("#");
            count.set(Integer.parseInt(values[1]));
            text.set(values[0]);
            context.write(text, count);
        }
    }

    /**
     * Reducer
     *
     * **/
    private static class WReducer extends Reducer<Text, IntWritable, Text, Text> {
        private Text t = new Text();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> value, Context context)
                throws IOException, InterruptedException {
            // sum all counts for this word
            int count = 0;
            for (IntWritable i : value) {
                count += i.get();
            }
            t.set(count + "");
            context.write(key, t);
        }
    }
    /**
     * Changes made:
     * (1) hard-coded the Hadoop home path in Shell's checkHadoopHome()
     * (2) around line 974, in FileUtils
     * **/
    public static void main(String[] args) throws Exception {
        // String path1 = System.getenv("HADOOP_HOME");
        // System.out.println(path1);
        // System.exit(0);
        JobConf conf = new JobConf(MyWordCount.class);
        // Configuration conf = new Configuration();
        // conf.set("mapred.job.tracker", "192.168.75.130:9001");
        // conf.setJar("tt.jar"); // note: this must come first, during initialization, or an error is thrown

        /** the Job **/
        Job job = new Job(conf, "testwordcount");
        job.setJarByClass(MyWordCount.class);
        System.out.println("Mode: " + conf.get("mapred.job.tracker"));
        // job.setCombinerClass(PCombine.class);
        // job.setNumReduceTasks(3); // use 3 reduce tasks
        job.setMapperClass(WMapper.class);
        job.setReducerClass(WReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        String path = "hdfs://192.168.46.28:9000/qin/output";
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path(path);
        if (fs.exists(p)) {
            // clear any previous output so the job can be re-run
            fs.delete(p, true);
            System.out.println("Output path exists; deleted!");
        }
        FileInputFormat.setInputPaths(job, "hdfs://192.168.46.28:9000/qin/input");
        FileOutputFormat.setOutputPath(job, p);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
㈣ How to debug a Hadoop 2.2.0 program in Eclipse on Windows 7
Now, on to the main topic:
No.  Item                    Description
1    Eclipse                 Juno Service Release (4.2)
2    Operating system        Windows 7
3    Hadoop Eclipse plugin   hadoop-eclipse-plugin-2.2.0.jar
4    Hadoop cluster          CentOS 6.5 Linux VM, single-node pseudo-distributed
5    Test program            Hello World (WordCount)
Several issues came up along the way. The first exception:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Fix: hard-code your local Hadoop path as the return value of the checkHadoopHome() method in the org.apache.hadoop.util.Shell class. I changed it as follows:
private static String checkHadoopHome() {
    // first check the Dflag hadoop.home.dir with JVM scope
    // System.setProperty("hadoop.home.dir", "...");
    String home = System.getProperty("hadoop.home.dir");
    // fall back to the system/user-global env variable
    if (home == null) {
        home = System.getenv("HADOOP_HOME");
    }
    try {
        // couldn't find either setting for hadoop's home directory
        if (home == null) {
            throw new IOException("HADOOP_HOME or hadoop.home.dir are not set.");
        }
        if (home.startsWith("\"") && home.endsWith("\"")) {
            home = home.substring(1, home.length() - 1);
        }
        // check that the home setting is actually a directory that exists
        File homedir = new File(home);
        if (!homedir.isAbsolute() || !homedir.exists() || !homedir.isDirectory()) {
            throw new IOException("Hadoop home directory " + homedir
                    + " does not exist, is not a directory, or is not an absolute path.");
        }
        home = homedir.getCanonicalPath();
    } catch (IOException ioe) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Failed to detect a valid hadoop home directory", ioe);
        }
        home = null;
    }
    // hard-code the local Hadoop home path
    home = "D:\\hadoop-2.2.0";
    return home;
}
The second exception, Could not locate executable D:\Hadoop\tar\hadoop-2.2.0\hadoop-2.2.0\bin\winutils.exe in the Hadoop binaries, means the Windows helper executables cannot be found. Download the bin package from https://github.com/srccodes/hadoop-common-2.2.0-bin and overwrite the bin directory under your local Hadoop root.
The third exception:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://192.168.130.54:19000/user/hmail/output/part-00000, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
at com.netease.hadoop.HDFSCatWithAPI.main(HDFSCatWithAPI.java:23)
This exception usually means the HDFS path is being handed to the wrong FileSystem. Fix: copy core-site.xml and hdfs-site.xml from the cluster and put them in the src root of the Eclipse project (see the in-code alternative shown in answer ㈢ above).
The fourth exception:
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
This exception is usually caused by a misconfigured HADOOP_HOME environment variable. To be explicit: to debug Hadoop 2.2 successfully from Eclipse on Windows, you need to add the following environment variables on your machine:
(1) Create a system variable HADOOP_HOME with the value D:\hadoop-2.2.0, i.e. your local Hadoop directory.
(2) Append %HADOOP_HOME%\bin to the system Path variable.
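After changing the variables, restart Eclipse so the new environment is picked up. The following quick sanity check (a hypothetical helper, not part of the job code) confirms from inside the JVM that the variable is visible and that winutils.exe is in place:

import java.io.File;

public class EnvCheck {
    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        System.out.println("HADOOP_HOME = " + home);
        if (home != null) {
            // the winutils.exe from the bin package must sit under %HADOOP_HOME%\bin
            System.out.println("winutils.exe present: "
                    + new File(home, "bin\\winutils.exe").exists());
        }
    }
}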
Those are the problems I ran into during testing; after treating each one, Eclipse could finally debug MR programs successfully. My Hello World source is as follows:
package com.qin.wordcount;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
/***
 *
 * Hadoop 2.2.0 test:
 * a WordCount-style example
 *
 * @author qindongliang
 *
 * Hadoop discussion QQ group: 376932160
 *
 *
 * */
public class MyWordCount {
    /**
     * Mapper
     *
     * **/
    private static class WMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private IntWritable count = new IntWritable(1);
        private Text text = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // each input line looks like "word#count"
            String values[] = value.toString().split("#");
            count.set(Integer.parseInt(values[1]));
            text.set(values[0]);
            context.write(text, count);
        }
    }

    /**
     * Reducer
     *
     * **/
    private static class WReducer extends Reducer<Text, IntWritable, Text, Text> {
        private Text t = new Text();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> value, Context context)
                throws IOException, InterruptedException {
            // sum all counts for this word
            int count = 0;
            for (IntWritable i : value) {
                count += i.get();
            }
            t.set(count + "");
            context.write(key, t);
        }
    }
    /**
     * Changes made:
     * (1) hard-coded the Hadoop home path in Shell's checkHadoopHome()
     * (2) around line 974, in FileUtils
     * **/
    public static void main(String[] args) throws Exception {
        // String path1 = System.getenv("HADOOP_HOME");
        // System.out.println(path1);
        // System.exit(0);
        JobConf conf = new JobConf(MyWordCount.class);
        // Configuration conf = new Configuration();
        // conf.set("mapred.job.tracker", "192.168.75.130:9001");
        // conf.setJar("tt.jar"); // note: this must come first, during initialization, or an error is thrown

        /** the Job **/
        Job job = new Job(conf, "testwordcount");
        job.setJarByClass(MyWordCount.class);
        System.out.println("Mode: " + conf.get("mapred.job.tracker"));
        // job.setCombinerClass(PCombine.class);
        // job.setNumReduceTasks(3); // use 3 reduce tasks
        job.setMapperClass(WMapper.class);
        job.setReducerClass(WReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        String path = "hdfs://192.168.46.28:9000/qin/output";
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path(path);
        if (fs.exists(p)) {
            // clear any previous output so the job can be re-run
            fs.delete(p, true);
            System.out.println("Output path exists; deleted!");
        }
        FileInputFormat.setInputPaths(job, "hdfs://192.168.46.28:9000/qin/input");
        FileOutputFormat.setOutputPath(job, p);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The console log printed during the run:
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
Mode: local
Output path exists; deleted!
INFO - Configuration.warnOnceIfDeprecated(840) | session.id is deprecated. Instead, use dfs.metrics.session-id
INFO - JvmMetrics.init(76) | Initializing JVM Metrics with processName=JobTracker, sessionId=
WARN - JobSubmitter.copyAndConfigureFiles(149) | Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
WARN - JobSubmitter.copyAndConfigureFiles(258) | No job jar file set. User classes may not be found. See Job or Job#setJar(String).
INFO - FileInputFormat.listStatus(287) | Total input paths to process : 1
INFO - JobSubmitter.submitJobInternal(394) | number of splits:1
INFO - Configuration.warnOnceIfDeprecated(840) | user.name is deprecated. Instead, use mapreduce.job.user.name
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.job.name is deprecated. Instead, use mapreduce.job.name
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
File System Counters
FILE: Number of bytes read=372
FILE: Number of bytes written=382174
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=76
HDFS: Number of bytes written=27
HDFS: Number of read operations=17
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Map-Reduce Framework
Map input records=4
Map output records=4
Map output bytes=44
Map output materialized bytes=58
Input split bytes=109
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=0
Reduce input records=4
Reduce output records=3
Spilled Records=8
Shuffled Maps =0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=532938752
File Input Format Counters
Bytes Read=38
File Output Format Counters
Bytes Written=27
The test input data:
中國#1
美國#2
英國#3
中國#2
And the output:
中國 3
美國 2
英國 3
At this point, we have successfully debugged Hadoop remotely from Eclipse.
㈤ How to read the Hadoop HDFS source code
When using Hadoop, it is easy to read the contents of an HDFS file through the FileSystem API. But what does the read actually involve? Here we analyze the process of a client reading an HDFS file. The small program below reads the files under a given HDFS directory and prints their contents to the console; the code is as follows:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadDataFromHDFS {
    public static void main(String[] args) throws IOException {
        new LoadDataFromHDFS().loadFromHdfs("hdfs://localhost:9000/user/wordcount/");
    }

    public void loadFromHdfs(String hdfsPath) throws IOException {
        Configuration conf = new Configuration();
        Path hdfs = new Path(hdfsPath);
        FileSystem in = FileSystem.get(conf);
        // in = FileSystem.get(URI.create(hdfsPath), conf); // either line creates a DistributedFileSystem
        FileStatus[] status = in.listStatus(hdfs);
        for (int i = 0; i < status.length; i++) {
            byte[] buff = new byte[1024];
            FSDataInputStream inputStream = in.open(status[i].getPath());
            int n;
            // print only the bytes actually read on each pass, not the whole buffer
            while ((n = inputStream.read(buff)) > 0) {
                System.out.print(new String(buff, 0, n));
            }
            inputStream.close();
        }
    }
}
The line FileSystem in = FileSystem.get(conf) creates a DistributedFileSystem. When only a Configuration is passed in, FileSystem reads the fs.default.name property and instantiates the corresponding FileSystem subclass; if fs.default.name is not set, it creates an org.apache.hadoop.fs.LocalFileSystem by default. Since we want to read files from HDFS, core-site.xml must set fs.default.name to hdfs://localhost:9000 so that FileSystem.get(conf) returns a DistributedFileSystem object. The other way to obtain a filesystem of a specific type is the overload FileSystem.get(URI uri, Configuration conf); in fact, the single-argument method first reads fs.default.name from the conf and then calls FileSystem.get(URI, Configuration) itself.
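To see the two overloads side by side, here is a small sketch; the localhost:9000 address assumes the pseudo-distributed setup described above, and the class name is mine:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class GetFileSystemDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // 1) relies on fs.default.name from core-site.xml on the classpath
        FileSystem viaConf = FileSystem.get(conf);

        // 2) names the target filesystem explicitly, ignoring fs.default.name
        FileSystem viaUri = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);

        System.out.println(viaConf.getUri()); // file:/// unless core-site.xml says otherwise
        System.out.println(viaUri.getUri());  // hdfs://localhost:9000
    }
}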