幻灯片1-SOURCEFORGE.ppt

资源描述

Team Of Programmers Parallel Programming Computing 2.0 By Barry.Cswords Team Of Programmers 并行计算的技术现状集群和分布式技术虚拟化 Map-Reduce 分布式散列存储演员模型和函数编程解决方案网格计算绿色计算云计算云计算 ? 设计模式 (关联底层设计 ) 虚拟资源池面向数据结构 ER模型 FP 典型方案负载均衡集群 Hadoop Core HBase， HDFS Erlang虚拟机面向资源事务、存储计算、存储计算存储计算资源利用率系统可扩展性高并发高吞吐海量存储复杂计算 (关联底层设计 ) 缺陷设计模式顶层化业务关联性强设计模式顶层化人工干预程度高设计模式底层化代码复杂设计模式底层化 PIG等还不成熟设计模式底层化代码晦涩 Team Of Programmers LISP对列表的处理先熟悉一下 LISP语法 MAP方法 FOLD方法 Scheme（ LISP的一种方言）用递归实现，但实际上都可以并行。 (define factorial (lambda (n) (if (= n 1) 1 (* n (factorial (- n 1) (factorial 6) 720 (map (lambda (x) (* x x) (1 2 3 4 5) (1 4 9 16 25) (fold + 0 (1 2 3 4 5) 15 (fold * 1 (1 2 3 4 5) 120 Team Of Programmers Map-Reduce (Hadoop) package org.myorg; import java.io.IOException; import java.util.*; import org.apache.hadoop.fs.Path; import org.apache.hadoop.conf.*; import org.apache.hadoop.io.*; import org.apache.hadoop.mapred.*; import org.apache.hadoop.util.*; public class WordCount public static void main(String args) throws Exception /以这个类建立任务并起名 JobConf conf = new JobConf(WordCount.class); conf.setJobName(wordcount); /输出列表类型规约 conf.setOutputKeyClass(Text.class); conf.setOutputValueClass(IntWritable.class); /Map-Reduce算法类规约 conf.setMapperClass(Map.class); conf.setCombinerClass(Reduce.class); conf.setReducerClass(Reduce.class); /输入输出类型规约 conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(TextOutputFormat.class); /设置输入、输出路径 FileInputFormat.setInputPaths(conf, new Path(args0); FileOutputFormat.setOutputPath(conf, new Path(args1); JobClient.runJob(conf); （统计计算一篇文章中不同词的个数） C+代码略简练使用管道上下文实现 Map和 Reduce关联使用工厂实现 IO和调用 Team Of Programmers Map-Reduce (Hadoop)续 public static class Map extends MapReduceBase implements Mapper private final static IntWritable one = new IntWritable(1);/封装个数 private Text word = new Text();/暂存每个词用 public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line);/将文章的每个词进行标记 while (tokenizer.hasMoreTokens() word.set(tokenizer.nextToken(); output.collect(word, one);/一个词算一个，以词为分配依据 Key，值 value为 1 ，置入输出 public static class Reduce extends MapReduceBase implements Reducer public void reduce(Text key, Iterator values, OutputCollector output, Reporter reporter) throws IOException int sum = 0; while (values.hasNext() sum += values.next().get();/多分到一个词 (key)就加一 (迭代器 values中的 value值 ) output.collect(key, new IntWritable(sum);/返回词，词的个数 Team Of Programmers PIG Latin (Hadoop) 熟悉一下 PIG的风格（查找最感兴趣页面） PIG实现 Map-Reduce（无实际含义）实际上只负责调度，但实现了一定的松耦合 Visits = load /data/visits as (user, url, time); Visits = foreach Visits generate user, Canonicalize(url), time; Pages = load /data/pages as (url, pagerank); VP = join Visits by url, Pages by url; UserVisits = group VP by user; Sessions = foreach UserVisits generate flatten(FindSessions(*); HappyEndings = filter Sessions by BestIsLast(*); store HappyEndings into /data/happy_endings; a = FOREACH input GENERATE flatten(Map(*); b = GROUP a BY $0; c = FOREACH b GENERATE Reduce(*); Team Of Programmers Actor Architecture package app.quickstart.hello; import aa.core.Actor; import aa.core.ActorName; import aa.core.CreateActorException; public class Hello extends Actor public Hello() System.out.print( Hello); try ActorName anWorld = create(app.quickstart.hello.World); send(anWorld, world); catch (CreateActorException e) System.out.println( Hello.Hello: + e); package app.quickstart.hello; import aa.core.Actor; public class World extends Actor public World() public void world() System.out.println( World!); （调用一个函数 world）内部定位固定、可移动的 Actor UAN: Universal Actor Name uan:/128.174.245.49:37 LAN: Location-based Actor Name lan:/128.174.244.147/128.174.245.49:37 这是一种值得学习的整体通信架构。 Team Of Programmers Thread实现的 MapReduce /by qinxiaoyuceclipse.org package org.ceclipse.cswords.mapreduce; import java.util.Iterator; import java.util.List; public abstract class MapReducer extends Thread protected List output=null; protected Iterator input=null; protected int received=0; public MapReducer(Iterator in,List out) this.input=in; this.output=out; public void run() int i=0;I v=null; while(this.input.hasNext() v=input.next(); Reducer r=this.map(v); r.index=i; r.start(); i+; public abstract Reducer map(I in); public abstract void finished(); public abstract class Reducer extends Thread protected I input=null; protected int index=0; public Reducer(I in) this.input=in; public void run() output.set(this.index, this.reduce(this.input); received+; if(received=output.size()finished(); public abstract O reduce(I in); 对简单运算效率低，但对复杂运算效果明显计算 40个斐波那契数的效率对比（有波动）：该算法： 3656MS For each： 5562MS Team Of Programmers 演员模型的 Map-Reduce (Erlang) -module(pmap). -export(pmap/2). pmap(F, L) - S = self(), Pids = lists:map(fun(I) - spawn(fun() - do_fun(S, F, I) end) end, L), gather(Pids). gather(FirstPID|OtherPIDs) - receive FirstPID, Result - Result|gather(OtherPIDs) end; gather() - . do_fun(Parent, F, I) - Parent ! self(), (catch F(I). -module(fib). -export(fib/1). fib(0) - 0; fib(1) - 1; fib(N) when N 1 - fib(N-1) + fib(N-2). Eshell L = lists:seq(0,35). Eshell lists:map(fun(X) - fib:fib(X) end, L). Eshell pmap:pmap(fun(X) - fib:fib(X) end, L). （计算斐波那契数列）注：左侧代码可以复用 pmap负责分发进程并建立 map存储 pid；每个进程中 do_fun向父进程发出自己的进程号和目标函数计算结果（ Reduce）； gather负责通过递归将每个进程号替换为子进程发出的函数计算结果。 Team Of Programmers 实验和观摩总结技术横向比较结果重点 Hadoop相关技术目前是最成熟的，使用 PIG很方便实现大数据处理；但有框架过重之嫌，代码移植、嵌入现有解决方案都很麻烦；在并非针对数据的并发中语义不明确； FP演员模型： Erlang表现优秀，处理并行、通信的语法简洁、明确，但整体语法晦涩；没有第二个类似的虚拟机，其它宿主语言还不成熟； Java原生方案演员模型 Kilim：调度器没有真正实现多核并行，仅仅实现了消息机制和纤程 (fiber)，无助于多核计算；易于实现代码、逻辑的移植，但并没有打破线程的思想，无助于设计； AA方案：架构清晰， Java风格明显，可以媲美 Erlang，效率未知，但还没有其上的开发模式，演员模型带来的差异性明显，需尝试进一步封装；线程 Map-Reduce方案：效率差，高并发的重量级应用可行。易行整体方案超多核虚拟机 +单机并行方案（演员模型？） Hadoop全套方案从海量存储、海量数据处理（搜索引擎？）开始需封装足够的算法（提供商？） Team Of Programmers 眼下存在的问题如何封装并行调度算法面向对象的语义问题函数语言的语法问题（以 Erlang为例）去除 spawn、 receive关键词去除模式匹配等方面的晦涩语法领域脚本语言的局限性 PIG LATIN和云数据库上的 SQL局限于面向存储的调度新的设计模式会是怎样的面向对象的子集消息机制面向对象的特例面向演员面向对象的补充小问题自然界是否真的存在无穷且无序的集合？ Round Robin如何最优实现“贪心”的目标？其他问题？ Team Of Programmers 在不久的将来所需要的一种高级语言支持并行处理，封装并行调度算法仅支持函数式编程没有代词、变量 Java/C风格与流行高级语言无缝接合相同编译、运行环境一种设计模式横向与面向对象无缝接合不支持面向过程的算法实现可以通过简单的面向过程因素更好地取代仅面向过程的算法实现一种架构与云计算基础架构的松耦合通过采用 REST风格的服务协议 (HTTP)实现大云 Team Of Programmers 参考资料 The Actor Architecture by Myeong-Wuk Jang Actor-based Programming for Scalable Concurrent Systems by Gul Agha Reflections on Middleware, Meta-Architectures, and Adaptive Distributed Monitoring by Gul Agha Cloud Computing Lecture by Jimmy Lin PIG: web-scale processing by Christopher Olston and many others Kilim: Isolation-typed actors for Java by Sriram Srinivasan MapReduce: Simplified Data Processing on Large Clusters by Jeffrey Dean Sanjay Ghemawat 等 Team Of Programmers 3Q! Contact me: qinxiaoyuceclipse.org

展开阅读全文

幻灯片1-SOURCEFORGE.ppt

最新文档