㈠ How to debug Hadoop 2.2.0 programs in Eclipse on Windows 7
In the previous post I walked through a single-node pseudo-distributed Hadoop deployment. In this post I'll show how to debug Hadoop 2.2.0 from Eclipse. If you are still on Hadoop 1.x, that's fine too; I covered debugging 1.x Hadoop programs from Eclipse in an earlier post. The biggest difference between the two setups is the Eclipse plugin: the Hadoop 2.x and 1.x APIs are not quite the same, so the plugins differ, and you just need to use the one matching your version.
Now, down to business:
No. | Item | Description
1 | Eclipse | Juno Service Release, version 4.2
2 | Operating system | Windows 7
3 | Hadoop Eclipse plugin | hadoop-eclipse-plugin-2.2.0.jar
4 | Hadoop cluster environment | CentOS 6.5 Linux VM, single-node pseudo-distributed
5 | Test program | Hello World
A few of the problems I ran into are shown below:
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
Mode: local
Output path exists; deleted!
INFO - Configuration.warnOnceIfDeprecated(840) | session.id is deprecated. Instead, use dfs.metrics.session-id
INFO - JvmMetrics.init(76) | Initializing JVM Metrics with processName=JobTracker, sessionId=
WARN - JobSubmitter.copyAndConfigureFiles(149) | Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
WARN - JobSubmitter.copyAndConfigureFiles(258) | No job jar file set. User classes may not be found. See Job or Job#setJar(String).
INFO - FileInputFormat.listStatus(287) | Total input paths to process : 1
INFO - JobSubmitter.submitJobInternal(394) | number of splits:1
INFO - Configuration.warnOnceIfDeprecated(840) | user.name is deprecated. Instead, use mapreduce.job.user.name
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.job.name is deprecated. Instead, use mapreduce.job.name
㈡ HDFS source code analysis (5): replication
Replication plays a central role in HDFS and is used in many places: the lease recovery we covered earlier, the replica count you set via the hdfs dfs -setrep -R command, and quite a few other scenarios.
In this post we'll look at how the NameNode instructs DataNodes to perform replication.
Let's start with the overall flow:
Inside the NameNode's BlockManager there is a dedicated thread, called ReplicationMonitor, that checks whether any blocks need replication.
When it finds blocks that need replication, it processes them in priority order.
There are five priority levels in total. For example, a block with only one replica is replicated at a higher priority than a block with two replicas, because the former is more likely to lose data. For the exact five levels, see the source code.
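As a rough illustration of the idea, here is a simplified sketch (the real logic lives in the NameNode's UnderReplicatedBlocks#getPriority and also weighs corrupt and decommissioning replicas; the constants and thresholds below are assumptions, not copied from the source):

// Simplified sketch of under-replication priority selection.
// Illustrative only; thresholds and constants are assumptions.
final class ReplicationPriority {
    static final int HIGHEST = 0;                // one failure away from data loss
    static final int VERY_UNDER_REPLICATED = 1;  // far below the expected count
    static final int UNDER_REPLICATED = 2;       // mildly under-replicated
    static final int BADLY_DISTRIBUTED = 3;      // enough replicas, poor placement
    static final int CORRUPT = 4;                // no live replicas left

    static int getPriority(int liveReplicas, int expectedReplicas) {
        if (liveReplicas == 0) {
            return CORRUPT;                      // nothing left to copy from
        } else if (liveReplicas == 1) {
            return HIGHEST;                      // one more failure loses the block
        } else if (liveReplicas * 3 < expectedReplicas) {
            return VERY_UNDER_REPLICATED;
        } else if (liveReplicas < expectedReplicas) {
            return UNDER_REPLICATED;
        }
        return BADLY_DISTRIBUTED;
    }
}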
The NameNode then picks a DataNode to act as the source, i.e. the head of what we usually call the replication pipeline.
When picking the source, DataNodes in the DECOMMISSION_INPROGRESS state are preferred: since write requests are normally not assigned to such nodes, their load tends to be lower.
Next, the NameNode chooses the targets, i.e. the remaining nodes of the replication pipeline, using the familiar block placement policy.
The NameNode then records the block replication in a pending queue, so that failed transfers can be retried.
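Conceptually, the pending queue is little more than a timestamped map swept by a timeout scan; a toy sketch (the real class is the NameNode's PendingReplicationBlocks, and the names and timeout value below are assumptions):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the pending-replication bookkeeping described above.
// Illustrative only; the real class is PendingReplicationBlocks.
public class PendingReplicationSketch {
    private final Map<Long, Long> pending = new ConcurrentHashMap<>(); // blockId -> enqueue time
    private static final long TIMEOUT_MS = 5 * 60 * 1000;              // assumed timeout

    void markPending(long blockId) {
        pending.put(blockId, System.currentTimeMillis());
    }

    void onReplicationConfirmed(long blockId) {
        pending.remove(blockId); // a block report confirmed the new replica
    }

    // Transfers that never complete time out and get re-queued for retry.
    boolean hasTimedOut(long blockId) {
        Long enqueued = pending.get(blockId);
        return enqueued != null && System.currentTimeMillis() - enqueued > TIMEOUT_MS;
    }
}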
Finally, the source is told to start the replication pipeline via a BlockCommand.TRANSFER command carried in a heartbeat response.
Each DataNode in the pipeline then forwards the data to the next DataNode, and only sends its ACK to the node before it after receiving the ACK from the node after it.
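To make that ordering concrete, here is a runnable toy model of the downstream-first ACK rule (names are illustrative only; in a real DataNode this logic lives in BlockReceiver and its packet responder):

import java.util.Arrays;
import java.util.List;

// Each node "forwards" the packet down the pipeline first, waits for the
// downstream ACK, and only then ACKs upstream. Illustrative only.
public class PipelineAckDemo {
    static void send(List<String> pipeline, int i, String packet) {
        System.out.println(pipeline.get(i) + " received " + packet);
        if (i + 1 < pipeline.size()) {
            send(pipeline, i + 1, packet); // returns only after downstream ACKs
        }
        System.out.println(pipeline.get(i) + " ACKs upstream");
    }

    public static void main(String[] args) {
        send(Arrays.asList("dn1", "dn2", "dn3"), 0, "packet#1");
    }
}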
㈢ How to debug Hadoop 2.2.0 programs in Eclipse on Windows 7
How to debug Hadoop 2.2.0 programs on Windows 7:
1. Environment setup
No. | Item | Description
1 | Eclipse | Juno Service Release, version 4.2
2 | Operating system | Windows 7
3 | Hadoop Eclipse plugin | hadoop-eclipse-plugin-2.2.0.jar
4 | Hadoop cluster environment | CentOS 6.5 Linux VM, single-node pseudo-distributed
5 | Test program | Hello World
2. Things to watch out for
The first exception is as follows:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Fix:
Hard-code your local Hadoop path into the return value of the checkHadoopHome() method of the org.apache.hadoop.util.Shell class. I changed it as follows:
private static String checkHadoopHome() {
    // first check the Dflag hadoop.home.dir with JVM scope
    // System.setProperty("hadoop.home.dir", "...");
    String home = System.getProperty("hadoop.home.dir");
    // fall back to the system/user-global env variable
    if (home == null) {
        home = System.getenv("HADOOP_HOME");
    }
    try {
        // couldn't find either setting for hadoop's home directory
        if (home == null) {
            throw new IOException("HADOOP_HOME or hadoop.home.dir are not set.");
        }
        if (home.startsWith("\"") && home.endsWith("\"")) {
            home = home.substring(1, home.length() - 1);
        }
        // check that the home setting is actually a directory that exists
        File homedir = new File(home);
        if (!homedir.isAbsolute() || !homedir.exists() || !homedir.isDirectory()) {
            throw new IOException("Hadoop home directory " + homedir
                    + " does not exist, is not a directory, or is not an absolute path.");
        }
        home = homedir.getCanonicalPath();
    } catch (IOException ioe) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Failed to detect a valid hadoop home directory", ioe);
        }
        home = null;
    }
    // hard-code the local hadoop path
    home = "D:\\hadoop-2.2.0";
    return home;
}
The second exception: Could not locate executable D:\Hadoop\tar\hadoop-2.2.0\hadoop-2.2.0\bin\winutils.exe in the Hadoop binaries. The Windows helper executables cannot be found. Download the bin package from https://github.com/srccodes/hadoop-common-2.2.0-bin and overwrite the bin directory under your local Hadoop root with it.
The third exception:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://192.168.130.54:19000/user/hmail/output/part-00000, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
at com.netease.hadoop.HDFSCatWithAPI.main(HDFSCatWithAPI.java:23)
This exception usually means the HDFS path is written incorrectly. Fix: copy core-site.xml and hdfs-site.xml from the cluster into the src root of the Eclipse project.
㈣ How to debug Hadoop 2.2.0 programs in Eclipse on Windows 7
Now, down to business:
No. | Item | Description
1 | Eclipse | Juno Service Release, version 4.2
2 | Operating system | Windows 7
3 | Hadoop Eclipse plugin | hadoop-eclipse-plugin-2.2.0.jar
4 | Hadoop cluster environment | CentOS 6.5 Linux VM, single-node pseudo-distributed
5 | Test program | Hello World
A few problems I ran into are as follows:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Fix:
Hard-code your local Hadoop path into the return value of the checkHadoopHome() method of the org.apache.hadoop.util.Shell class. I changed it as follows:
private static String checkHadoopHome() {
    // first check the Dflag hadoop.home.dir with JVM scope
    // System.setProperty("hadoop.home.dir", "...");
    String home = System.getProperty("hadoop.home.dir");
    // fall back to the system/user-global env variable
    if (home == null) {
        home = System.getenv("HADOOP_HOME");
    }
    try {
        // couldn't find either setting for hadoop's home directory
        if (home == null) {
            throw new IOException("HADOOP_HOME or hadoop.home.dir are not set.");
        }
        if (home.startsWith("\"") && home.endsWith("\"")) {
            home = home.substring(1, home.length() - 1);
        }
        // check that the home setting is actually a directory that exists
        File homedir = new File(home);
        if (!homedir.isAbsolute() || !homedir.exists() || !homedir.isDirectory()) {
            throw new IOException("Hadoop home directory " + homedir
                    + " does not exist, is not a directory, or is not an absolute path.");
        }
        home = homedir.getCanonicalPath();
    } catch (IOException ioe) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Failed to detect a valid hadoop home directory", ioe);
        }
        home = null;
    }
    // hard-code the local hadoop path
    home = "D:\\hadoop-2.2.0";
    return home;
}
The second exception: Could not locate executable D:\Hadoop\tar\hadoop-2.2.0\hadoop-2.2.0\bin\winutils.exe in the Hadoop binaries. The Windows helper executables cannot be found. Download the bin package from https://github.com/srccodes/hadoop-common-2.2.0-bin and overwrite the bin directory under your local Hadoop root with it.
The third exception:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://192.168.130.54:19000/user/hmail/output/part-00000, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
at com.netease.hadoop.HDFSCatWithAPI.main(HDFSCatWithAPI.java:23)
This exception usually means the HDFS path is written incorrectly. Fix: copy core-site.xml and hdfs-site.xml from the cluster into the src root of the Eclipse project.
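If you would rather not copy the XML files into the project, a minimal in-code sketch of the same fix is to set the default filesystem yourself (the NameNode address below is a placeholder; substitute your own):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the cluster instead of the local filesystem.
        // The address is a placeholder for your own NameNode.
        conf.set("fs.defaultFS", "hdfs://192.168.130.54:19000");
        FileSystem fs = FileSystem.get(conf); // now a DistributedFileSystem
        System.out.println(fs.exists(new Path("/user")));
    }
}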
The fourth exception:
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
This exception is usually caused by a misconfigured HADOOP_HOME environment variable. To be specific: if you want to debug Hadoop 2.2 successfully from Eclipse on Windows, you need to add the following environment variables on your machine:
(1) Create a system variable HADOOP_HOME with the value D:\hadoop-2.2.0, i.e. your local Hadoop directory.
(2) Append %HADOOP_HOME%\bin to the system Path variable.
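Alternatively, since the patched checkHadoopHome() above reads the hadoop.home.dir JVM property before falling back to HADOOP_HOME, you can set that property at the very top of main() and skip the environment variable. A minimal sketch (the path is an assumption; use your own Hadoop directory):

public class HadoopHomeSetup {
    public static void main(String[] args) {
        // Checked first by Shell#checkHadoopHome (JVM scope), before HADOOP_HOME.
        System.setProperty("hadoop.home.dir", "D:\\hadoop-2.2.0");
        // ... then build and submit the MapReduce job as in the example below ...
    }
}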
The problems above are the ones I hit while testing. After treating each one, Eclipse could finally debug MR programs successfully. My Hello World source is as follows:
package com.qin.wordcount;

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/***
 * Hadoop 2.2.0 test
 * WordCount example
 *
 * @author qindongliang
 *
 * Hadoop discussion QQ group: 376932160
 * */
public class MyWordCount {

    /** Mapper **/
    private static class WMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private IntWritable count = new IntWritable(1);
        private Text text = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // each input line looks like "word#count"
            String values[] = value.toString().split("#");
            // System.out.println(values[0] + "========" + values[1]);
            count.set(Integer.parseInt(values[1]));
            text.set(values[0]);
            context.write(text, count);
        }
    }

    /** Reducer **/
    private static class WReducer extends Reducer<Text, IntWritable, Text, Text> {
        private Text t = new Text();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> value, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable i : value) {
                count += i.get();
            }
            t.set(count + "");
            context.write(key, t);
        }
    }

    /**
     * Changes:
     * (1) patched checkHadoopHome() in the Shell source
     * (2) line 974, in FileUtils
     * **/
    public static void main(String[] args) throws Exception {
        // String path1 = System.getenv("HADOOP_HOME");
        // System.out.println(path1);
        // System.exit(0);
        JobConf conf = new JobConf(MyWordCount.class);
        // Configuration conf = new Configuration();
        // conf.set("mapred.job.tracker", "192.168.75.130:9001");
        // read the data fields from person
        // conf.setJar("tt.jar"); // note: this line must come first, for initialization, or an error is reported

        /** the Job **/
        Job job = new Job(conf, "testwordcount");
        job.setJarByClass(MyWordCount.class);
        System.out.println("Mode: " + conf.get("mapred.job.tracker"));
        // job.setCombinerClass(PCombine.class);
        // job.setNumReduceTasks(3); // set to 3
        job.setMapperClass(WMapper.class);
        job.setReducerClass(WReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        String path = "hdfs://192.168.46.28:9000/qin/output";
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path(path);
        if (fs.exists(p)) {
            fs.delete(p, true);
            System.out.println("Output path exists; deleted!");
        }
        FileInputFormat.setInputPaths(job, "hdfs://192.168.46.28:9000/qin/input");
        FileOutputFormat.setOutputPath(job, p);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The console log is as follows:
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
Mode: local
Output path exists; deleted!
INFO - Configuration.warnOnceIfDeprecated(840) | session.id is deprecated. Instead, use dfs.metrics.session-id
INFO - JvmMetrics.init(76) | Initializing JVM Metrics with processName=JobTracker, sessionId=
WARN - JobSubmitter.copyAndConfigureFiles(149) | Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
WARN - JobSubmitter.copyAndConfigureFiles(258) | No job jar file set. User classes may not be found. See Job or Job#setJar(String).
INFO - FileInputFormat.listStatus(287) | Total input paths to process : 1
INFO - JobSubmitter.submitJobInternal(394) | number of splits:1
INFO - Configuration.warnOnceIfDeprecated(840) | user.name is deprecated. Instead, use mapreduce.job.user.name
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.job.name is deprecated. Instead, use mapreduce.job.name
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
File System Counters
FILE: Number of bytes read=372
FILE: Number of bytes written=382174
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=76
HDFS: Number of bytes written=27
HDFS: Number of read operations=17
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Map-Reduce Framework
Map input records=4
Map output records=4
Map output bytes=44
Map output materialized bytes=58
Input split bytes=109
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=0
Reduce input records=4
Reduce output records=3
Spilled Records=8
Shuffled Maps =0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=532938752
File Input Format Counters
Bytes Read=38
File Output Format Counters
Bytes Written=27
The test input data is as follows:
中国#1
美国#2
英国#3
中国#2
The output is as follows:
中国 3
美国 2
英国 3
At this point, we have successfully debugged Hadoop remotely from Eclipse.
㈤ How to read the Hadoop HDFS source code
When using Hadoop, it is easy to read the contents of files in HDFS through the FileSystem class's API, but what does the read process actually look like? Today we analyze how a client reads an HDFS file. The small program below reads the files under a directory in HDFS and prints their contents to the console; the code is as follows:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadDataFromHDFS {
    public static void main(String[] args) throws IOException {
        new LoadDataFromHDFS().loadFromHdfs("hdfs://localhost:9000/user/wordcount/");
    }

    public void loadFromHdfs(String hdfsPath) throws IOException {
        Configuration conf = new Configuration();
        Path hdfs = new Path(hdfsPath);
        FileSystem in = FileSystem.get(conf);
        // in = FileSystem.get(URI.create(hdfsPath), conf); // both lines create a DistributedFileSystem
        FileStatus[] status = in.listStatus(hdfs);
        for (int i = 0; i < status.length; i++) {
            byte[] buff = new byte[1024];
            FSDataInputStream inputStream = in.open(status[i].getPath());
            int len;
            // print only the bytes actually read, not the whole buffer
            while ((len = inputStream.read(buff)) > 0) {
                System.out.print(new String(buff, 0, len));
            }
            inputStream.close();
        }
    }
}
The line FileSystem in = FileSystem.get(conf) creates a DistributedFileSystem. When you pass only a Configuration, the method reads the value of the fs.default.name property and creates the corresponding FileSystem subclass; if fs.default.name is not set, it creates an org.apache.hadoop.fs.LocalFileSystem object by default. Since we want to read files from HDFS, fs.default.name is set to hdfs://localhost:9000 in core-site.xml, so that FileSystem.get(conf) returns a DistributedFileSystem object. The other way to create a DistributedFileSystem, or any specific filesystem type, is the overload FileSystem.get(URI uri, Configuration conf); in fact, the single-argument method first reads fs.default.name from the conf in the FileSystem class and then calls FileSystem.get(URI uri, Configuration) itself.
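For reference, a minimal sketch of the URI-based overload (localhost:9000 matches the example above):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class GetFileSystemByUri {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The hdfs:// scheme of the URI selects the FileSystem subclass directly,
        // so no fs.default.name entry is needed in core-site.xml for this call.
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000/"), conf);
        System.out.println(fs.getClass().getName()); // ...DistributedFileSystem
    }
}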