OSKI: An Automatically Tuned Library of Sparse Matrix Kernels

OSKI: A Library of Automatically Tuned Sparse Matrix Kernels
Richard Vuduc (LLNL), James Demmel, Katherine Yelick
Berkeley Benchmarking and OPtimization (BeBOP) Project
EECS Department, University of California, Berkeley
SIAM CSE, February 12, 2005

OSKI: Optimized Sparse Kernel Interface
- Sparse kernels tuned for the user's matrix & machine
  - Hides complexity of run-time tuning
  - Low-level BLAS-style functionality: sparse matrix-vector multiply (SpMV), triangular solve (TrSV)
  - Includes fast locality-aware kernels: A^T*A*x
  - Initial target: cache-based superscalar uniprocessors
- Faster than standard implementations
  - Up to 4x faster SpMV, 1.8x TrSV, 4x A^T*A*x
- For "advanced" users & solver library writers
  - Available as stand-alone open-source library (pre-release)
  - PETSc extension in progress
- Written in C (can call from Fortran)

Motivation: The Difficulty of Tuning
- n = 21216
- nnz = 1.5 M
- kernel: SpMV
- Source: NASA structural analysis problem
- 8x8 dense substructure

Speedups on Itanium 2: The Need for Search
[Figure: SpMV performance on raefsky3, in Mflop/s. Marked points: Reference; Best: 4x2.]

How OSKI Tunes (Overview)
[Diagram, two phases]
- Library Install-Time (offline)
  1. Build for target architecture (produces generated code variants)
  2. Benchmark (produces benchmark data)
- Application Run-Time
  - Inputs: matrix, workload from program monitoring, history, benchmark data, heuristic models
  1. Evaluate models
  2. Select data structure & code
  - To user: matrix handle for kernel calls
- Extensibility: advanced users may write & dynamically add "code variants" and "heuristic models" to the system.

Cost of Tuning
- Non-trivial run-time tuning cost: up to 40 mat-vecs
  - Dominated by conversion time
- Design point: user calls "tune" routine explicitly
  - Exposes cost
  - Tuning time limited using estimated workload
    - Provided by user or inferred by library
- User may save tuning results
  - To apply on future runs with similar matrix
  - Stored in "human-readable" format

How to Call OSKI: Basic Usage
- May gradually migrate existing apps
  - Step 1: "Wrap" existing data structures
  - Step 2: Make BLAS-like kernel calls

Existing application code:

    int* ptr = ..., *ind = ...; double* val = ...;  /* Matrix, in CSR format */
    double* x = ..., *y = ...;                      /* Let x and y be two dense vectors */

    /* Compute y = beta*y + alpha*A*x, 500 times */
    for (i = 0; i < 500; i++)
        my_matmult(ptr, ind, val, alpha, x, beta, y);

The same code after Step 1 and Step 2:

    int* ptr = ..., *ind = ...; double* val = ...;  /* Matrix, in CSR format */
    double* x = ..., *y = ...;                      /* Let x and y be two dense vectors */

    /* Step 1: Create OSKI wrappers around this data */
    oski_matrix_t A_tunable = oski_CreateMatCSR(ptr, ind, val, num_rows, num_cols,
                                                SHARE_INPUTMAT, ...);
    oski_vecview_t x_view = oski_CreateVecView(x, num_cols, UNIT_STRIDE);
    oski_vecview_t y_view = oski_CreateVecView(y, num_rows, UNIT_STRIDE);

    /* Compute y = beta*y + alpha*A*x, 500 times */
    for (i = 0; i < 500; i++)
        oski_MatMult(A_tunable, OP_NORMAL, alpha, x_view, beta, y_view);  /* Step 2 */

How to Call OSKI: Tune with Explicit Hints
- User calls "tune" routine
- May provide explicit tuning hints (OPTIONAL)

    oski_matrix_t A_tunable = oski_CreateMatCSR(...);
    /* ... */

    /* Tell OSKI we will call SpMV 500 times (workload hint) */
    oski_SetHintMatMult(A_tunable, OP_NORMAL, alpha, x_view, beta, y_view, 500);
    /* Tell OSKI we think the matrix has 8x8 blocks (structural hint) */
    oski_SetHint(A_tunable, HINT_SINGLE_BLOCKSIZE, 8, 8);

    oski_TuneMat(A_tunable);  /* Ask OSKI to tune */

    for (i = 0; i < 500; i++)
        oski_MatMult(A_tunable, OP_NORMAL, alpha, x_view, beta, y_view);

How the User Calls OSKI: Implicit Tuning
- Ask library to infer workload
  - Library profiles all kernel calls
  - May periodically re-tune

    oski_matrix_t A_tunable = oski_CreateMatCSR(...);
    /* ... */

    for (i = 0; i < 500; i++)
        oski_MatMult(A_tunable, OP_NORMAL, alpha, x_view, beta, y_view);
    oski_TuneMat(A_tunable);  /* Ask OSKI to tune */

Additional Features
- Embedded scripting language for selecting customized, complex transformations
  - Mechanism to save/restore transformations

    # In file "my_xform.txt"
    # Compute Afast = P*A*P^T using Pinar's reordering algorithm
    A_fast, P = reorder_TSP(InputMat);
    # Split Afast = A1 + A2, where A1 in 2x2 block format, A2 in CSR
    A1, A2 = A_fast.extract_blocks(2, 2);
    return transpose(P)*(A1+A2)*P;

    /* In "my_app.c" */
    fp = fopen("my_xform.txt", "rt");
    fgets(buffer, BUFSIZE, fp);
    oski_ApplyMatTransform(A_tunable, buffer);
    oski_MatMult(A_tunable, ...);

Additional Features
- GNU AutoTools (autoconf) based install
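
The Basic Usage slides call an application routine, my_matmult, on a matrix "in CSR format" but never show it. For readers who have not seen such a kernel, here is a minimal sketch of an untuned CSR SpMV of the kind OSKI is meant to replace. The function name csr_matmult, its argument order, and the 0-based indexing are assumptions made for illustration; they are taken neither from the slides nor from the OSKI API.

```c
/* Hypothetical reference kernel (not part of OSKI):
 * computes y <- beta*y + alpha*A*x for a matrix A stored in CSR format.
 *
 * Assumed layout (0-based indices):
 *   ptr[i] .. ptr[i+1]-1  index the nonzeros of row i (ptr has num_rows+1 entries)
 *   ind[k]                column index of the k-th stored nonzero
 *   val[k]                value of the k-th stored nonzero
 */
void csr_matmult(int num_rows, const int *ptr, const int *ind,
                 const double *val, double alpha, const double *x,
                 double beta, double *y)
{
    for (int i = 0; i < num_rows; i++) {
        double sum = 0.0;
        /* Dot product of sparse row i with the dense vector x. */
        for (int k = ptr[i]; k < ptr[i + 1]; k++)
            sum += val[k] * x[ind[k]];
        y[i] = beta * y[i] + alpha * sum;
    }
}
```

Each nonzero costs one indirect load, x[ind[k]], plus one multiply-add. That access pattern is the one a blocked format (such as the 4x2 blocking named on the Itanium 2 slide) tries to improve, by storing one column index per small dense block instead of one per nonzero.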