基于BURG算法的谱估计研究及其MATLAB实现毕业设计论文

资源描述

毕业设计（论文）题目：基于BURG算法的谱估计研究及其MATLAB实现目录一、毕业设计（论文）开题报告二、毕业设计（论文）外文资料翻译及原文三、学生“毕业论文（论文）计划、进度、检查及落实表”四、实习鉴定表XX大学XX学院毕业设计（论文）开题报告题目：基于BURG算法的谱估计研究及其MATLAB实现机电系电子信息工程专业学号：学生姓名：指导教师：（职称：讲师）（职称：） XXXX年XX月X日课题来源功率谱估计在近30年中获得了飞速发展。涉及到信号与系统、随机信号分析、概率统计、随机过程、矩阵代数等一系列学科，广泛应用于雷达、声纳、通信、地质、勘探、军事、天文、生物医学工程等众多领域。科学依据（包括课题的科学意义；国内外研究概况、水平和发展趋势；应用前景等）实际中，数字信号的功率谱只能用所得的有限次记录的有限长数据来予以估计，这就产生了功率谱估计这一研究领域。功率谱的估计大致可分为经典功率谱估计和现代功率谱估计，针对经典谱估计的分辨率低和方差性能不好等问题提出了现代谱估计，AR模型谱估计就是现代谱估计常用的方法之一。随着信号处理理论和电子技术的迅猛发展，谱估计分析在已经在国内外各个领域中得到广泛应用。例如在现代军事领域中，人们通过分析雷达信号，根据回波信号的功率谱密度、谱峰的宽度、高度与位置，可以确定运动目标的位置、辐射强度和运动速度。在引信系统中，通过对探测器接收到的信号进行谱分析，可以确定弹目之间的交会速度、方向，从而提高引信的打击精度和毁伤效率。现代谱分析受算法的复杂度高和集成电路处理速度的影响，因此在引信系统中的应用会受到很大的限制。但随着电子技术的大力发展，特别是DSP技术的不断发展，现代谱分析的方法在引信系统中的应用逐渐成为可能，并且极大地提高了无线电引信引战配合性能。研究内容熟悉谱估计的发展历程；熟练掌握经典谱估计方法：直接法和间接法、它们之间的关系、估计质量以及估计性能比较；熟练掌握现代谱估计方法：信号建模、AR模型参数求解的Levinson-Durbin算法和BURG算法，阶数的确定方法和原则，稳定性以及对信号建模的讨论；熟悉熟练使用MATLAB仿真。针对一个具体的随机信号，分别采用经典谱估计和现代谱估计方法估计出其功率谱，对经典谱估计和现代谱估计方法谱估计的分辨率和方差性能作一个综合评价；熟练使用MATLAB提供的图形用户界面（GUI）工具。拟采取的研究方法、技术路线、实验方案及可行性分析熟悉经典谱估计和现代谱估计相关的理论知识，并掌握MATLAB的使用方法，达到熟练使用并融会贯通。应用有关理论知识在MATLAB上进行仿真实验，然后对实验结果进行对比和分析。最后总结归纳实验结果，提出自己的观点。研究计划及预期成果研究计划：2009年10月12日-2009年12月25日：按照任务书要求查阅论文相关参考资料，并填写毕业设计开题报告书。2010年1月11日-2010年3月5日：进行毕业实习，并填写毕业实习报告。2010年3月8日-2010年3月14日：按照要求修改毕业设计开题报告。2010年3月15日-2010年3月28日：学习并翻译一篇与毕业设计相关的英文材料。2010年3月29日-2010年4月11日：MATLAB程序设计与MATLAB程序调试。2010年4月12日-2010年4月18日：GUI设计。2010年4月19日-2009年5月21日：毕业论文撰写和修改工作。预期成果：了解谱估计的发展进程，掌握各种算法，并熟练掌握MATLAB的基本操作。训练自己综合运用所学知识的能力，培养自己独立分析和解决问题的能力，启发自己奋发向上的精神。特色或创新之处由于经典谱估计中将数据工作区外的未知数据假设为零，相当于数据加窗，导致其分辨率降低。另外，功率谱密度原始定义中既无求均值又无求极限，所以使得经典谱估计方法的方差性能比较差，尤其是在数据纪录很短的情况下，这些问题更为突出。而现代谱估计则不再简单地将观察区外的未知假设为而零，而是先将信号的观测数据估计模型参数，按照求模型输出功率的方法估计信号功率谱，回避了数据观测区以外的数据假设问题，因而其谱的分辨率得到提高。可以看出现代谱估计性能优于经典谱估计。已具备的条件和尚需解决的问题已具备的条件：可通过书籍和网络了解功率谱估计相关的理论知识，并已学习过MATLAB程序设计。尚需解决的问题：运用MATLAB进行仿真，并分析结果。指导教师意见该生查阅了大量的相关资料，设计方案可行，同意开题。指导教师签名：年月日教研室（学科组、研究所）意见教研室主任签名：年月日系意见主管领导签名：年月日英文原文Parametric spectral estimation on a single FPGAABSTRACTParametric, model based, spectral estimation techniques can offer increased frequency resolution over conventional short-term fast Fourier transform methods, overcoming limitations caused by the windowing of sampled, time domain, input data. However, parametric techniques are significantly more computationally demanding than the Fourier based methods and require a wider range of arithmetic functionality; for example, operations such as division and square-root are often necessary. These arithmetic processes exhibit communication bottleneck and their hardware implementation can be inefficient when used in conjunction with multipliers. A programmable, bit-serial, multiplier/divider, which overcomes the bottleneck problems by using a data interleaving scheme, is introduced in this paper. This interleaved processor is used to show how the parametric Modified Covariance spectral estimator can be efficiently routed on a field programmable gate array for real-time applications.1. INTRODUCTIONDue to its ease of hardware and software implementation the shortterm fast Fourier transform(STFFT)is widely used for spectral estimation and is known as the conventional method. However, the technique has drawbacks in terms of spectral resolution and accuracy caused by the finite length of the input data sequence used. Windowing of input data causes spectral broadening and Gibbs phenomenon of spectral leakage can mask the weaker frequency components of the true power spectral density(PSD)1. These unwanted effects can be reduced by using longer data sequence lengths, so that the transformed signal becomes a better representation of the infinite data sequence, but in real life this usually is not feasible as the characteristics of the input data may change with time. Over short periods of time the data signals can often be assumed to exhibit wide-sense stationarity, where the signal characteristics are assumed approximately constant but the spectral resolution is therefore limited. In attempts to improve the PSD estimation, windowing functions, Bartlett or Hanning for example, can be used to reduce side-lobe levels but these lower spectral resolution by broadening the main lobe of the PSD2.Model based, parametric spectral estimation techniques can alternatively be used, where the unrealistic assumption that data is zero outside the window of interest is dropped1. Either knowledge of the underlying process or reasonable assumptions about the nature of the unobserved data are used to improve frequency resolution over the conventional approaches. The computational burden of such processors is however much higher than the STFFT and arithmetic functions such as division and square-root often become necessary. In the division and square-root non-restoring algorithms there is an inherent dependency that the result bits must be computed in a most significant bit(MSB)first manner, with the computation of a bit directly dependent upon the result of the previous one3. This interdependency makes it difficult to efficiently realize such arithmetic functions in hardware, and implementations are usually much slower than other basic functions such as multiplication, addition and subtraction. Communication bottlenecks can therefore easily occur in systolic arrays where different types of processors are interconnected.The difficulties with hardware implementation of parametric spectral estimators have led to a preference of software implementation on homogeneous DSP networks4. However, high levels of processing capacity have not been fully reflected in system throughput since the increased communication incurred as a result of parallelism is constrained by communication bus performance. This restricts the range of problems that can be computed in realtime and the software approach may sometimes be inadequate for real-time spectral estimation.In this paper, hardware implementation of a parametric spectral estimator is addressed. A bit-serial processor capable of division and inner product step computation is developed by combining separate processors for these functions. The design uses a high level of pipelining so that division can be computed at a high rate and multiplication is performed on a MSB first data stream, eliminating the bottleneck problem. The high level of pipelining allows many independent computations to be performed simultaneously or interleaved. The use of the interleaving scheme is demonstrated by implementing the design of a Modified Covariance type of parametric spectral estimator, to produce a field programmable gate array(FPGA)based system for the spectral analysis of Doppler signals from ultrasonic blood flow detectors.2. MODIFIED COVARIANCE SPECTRAL ESTIMATIONThe model order p=4 Modified Covariance(MC)spectral estimator, proven to be optimally cost efficient for the blood flow application where mean velocity and flow disturbance are of interest5, involves solving the following linear system of covariance matrix equations: (1) where each element is obtained from: (2) for a window of length N data samples. The filter parameter estimates are obtained by solution of the linear system(1), using the Cholesky, forward elimination and back substitution algorithms. The signal white noise variance estimate, is calculated as: (3) and the power spectral density(PSD), , is obtained from: (4) Hence, the MC spectral estimator may be partitioned onto four different programming modules:CMR-calculation of the elements of the covariance matrix and right-hand side vector, 5N multiply accumulates taking into account matrix symmetry.Cholesky-solution of the linear system of equations, 6 divisions and 10 inner step products for non-square-root Cholesky, 4 divisions and 12 inner step products for solving triangular systems.WNV-calculation of the white noise variance, 4 multiply accumulates.PSD-computation of the power spectral density, 4N inner step products for a zero padded DFT, N multiplications to find absolute value of DFT and N/2 divisions for the PSD.The number of samples, over the fixed time duration window of 10ms, is required to be either 64, 128, 256 or 512 depending on Doppler signal conditions. Implementation of the algorithm in Matlab software proved to be in excess of a factor of times too slow for real-time operation and that a performance of up to 13.5 MFLOPS/s is required4. Execution times of MC algorithm implementation using various topologies of Texas Instruments TMS320C40 DSPs with T8 transputers as routers have also fallen short of the real-time requirements, where processing time is over 150ms too long in the worst case46. Use of a single DSP in a PC hosted system has been shown sufficient for the smaller N but the specification of N=512 could not be achieved4, thus prompting consideration of the hardware approach.3. BIT-SERIAL INTERLEAVED PROCESSORStudy of word-parallel systolic implementations of the MC method has shown the method to provide more than adequate throughput for the specified real-time blood flow application but the cost of such a system is very high in terms of arithmetic units, communication burden and control complexity7. For example, a systolic array processor for non-square root Cholesky decomposition8 requires 13 processing elements(PEs), each PE having 2 to 6 ports of either m(single precision)or 2m(double precision)lines, and control is necessary to reverse data streams before back substitution. An alternative way to approach the hardware design involves consideration of bit-serial processing techniques.The nature of multiplication algorithms normally involve the computation of least significant bits(LSBs)first and bit-serial multipliers reflect this in their output ordering. Conversely, division algorithms such as non-restoring are MSB first in nature9. Computation of each quotient bit can be performed from m controlled add subtract(CAS)operations, the decision on whether to add or subtract being taken given the result of the previous bit computed(except on the first operation where the signs of the input operands are used to decide). Allowing carries to ripple through therefore leads to a propagation delay greater than m CAS cells. In a bit-serial multiplier, the delay between successive bits being output is likely to be around a single full adder(FA)delay, leading to a maximum clock frequency approximately m times higher and a communication bottleneck with the divider. The clockrate of the divider can be increased to a similar maximum rate as the multiplier by pipelining the carries in each individual CAS stage. However, this means that each output bit is then available only once in every m clock cycles. There is also the problem that data streams must be reversed between multipliers and dividers. One possibility is to use registers and extra control logic to reorder the bit stream from the divider but the operation time is still limited.The efficiency of the divider with the pipelined carry can be greatly improved by using the redundant slots between the output of successive bits to perform other separate divisions. The bit-serial/word-parallel divider shown in3allows m+1 individual divisions to be performed simultaneously or interleaved. This decreases the mean division operation time to achieve similar performance to a bit-serial multiplier but there is still the problem of data stream matching when interfacing such devices. One way to tackle this problem is to redesign the multiplier so that it works on a MSB first data stream, rather than storing and reordering the divider outputs which increases latency and control requirement10. MSB first multiplication, first demonstrated by McCanny et al.11, shows it is possible to perform multiplication on positive numbers by summing partial products(PPs)in reverse order to the norm. This also requires inclusion of an MSB first addition unit to ensure that output carries from the PPs are added into the final product. Larsson-Edefors and Marnane12, extend the concept of MSB first multiplication to the twos complement number system and show bit-serial architectures for this application. In order to match the divider bit-streams exactly to the multiplier bit-streams it is then just a matter of inserting extra delays along the FA sum pipeline so that the addition of PPs from a number of different multiplications can be performed simultaneously as shown by Bellis et al13.Study of the bit-serial interleaved divider and multiplier reveals that both architectures show a large degree of similarity. Both work in load/operational phases; the loading networks for the divisor and multiplier both consist of m+1 delay feedback SISO registers and the FA sum/carry pipelines are alike. Both designs also require MSB first, half adder(HA)cell, addition stages; the divider requires m PEs, for 1s complement error correction which occurs for negative dividends, and the multiplier requires m-1 HA PEs to add the output carries from the PPs. Therefore, it is possible to combine the two designs to make a programmable bitserial device which allows m+1 computations to be simultaneously interleaved, as shown in figure 1.The processor has two mode selection inputs DIVi and SUBi, which control four modes of operation or where and are both double precision. Ldi is the load/operational mode select signal for the storage of and over the first m(m+1) clock cycles. Ldi switches into operational mode over the next m(m+1) clock cycles where the remaining data is input and the bulk of the computation is performed in the FA array. All control signals are fully pipelined similarly to the data, allowing the shortest possible block pipeline period of 2m(m+1) clock cycles and continuous input/output of data(i.e. while one block set of m+1 computations are being output, the next block set may be loaded in). The pipeline also allows independent functionality between each of the separate interleaves and on the same interleave a division may immediately follow an inner step product computation and vice-versa.4. INTERLEAVED PROCESSOR BASED MODIFIED COVARIANCE SYSTEMCost-benet analysis on systolic array implementation of the CMR and Cholesky sections of the MC spectral estimator shows that a 12 bit xed point word-length is sufcient for these computations7. Using the bit-serial processor with a 12 bit word-length results in the capacity for interleaving 13 computations.On interleaves 0 to 4 the CMR multiplications are performed over N consecutive block sets, such that the products are produced on interleave and blockset . A bit-serial systolic array provides the correct input data sequencing from consecutive Doppler signal samples and a separate MSB first double precision accumulator, whose architecture is similar to that of HA section in figure1, computes the covariance matrix elements, which are then stored in RAM. The system for computing the CMR calculation is shown in figure 2.The entire Cholesky, forward elimination, back substitution and WNV computations are performed on interleave 5 on the system shown in figure 3. Here division and inner product step computation are necessary. Once the covariance matrix elements are stored in the dual port RAM after block set N the Cholesky decomposition can commence on interleave 5 while in parallel the CMR computation on the next set of data can be processed on interleaves 0 to 4. A ROM block controls the addressing of the dual port RAM for retrieval of stored data to go onto the processor inputs and storage of the processor results. To achieve good dynamic resolution for the low word-length used, a systolic array scaling module is included between the RAM and the processor, whose scaling factors are also produced by the ROM controller along with the mode control. Overall timing in the system is controlled by three counters, qi(range 0 to 12),qb(range 0 to 23) and qw(range 0 to N)corresponding to the interleaves, bit-position and input word.A zero padded point DFT is computed on interleaves 6, 7, 8 and 9. This is basically amatrix vector multiplication and is computed by using the processor in inner product step mode. The system for this section consists of a ROM to provide storage of the twiddle factor matrix , another ROM to control the addressing of the twiddle factors for a particular qw and 4 registers which continuously recirculate the filter parameter results()from the Cholesky decomposition stage. On interleave 6 the real and imaginary parts of the first set of products are alternately formed. Using a single flip-flop delay the results of these computations are then fed back into the input of the interleaved processor to be added to the products and the DFT is built up in this way. The dynamicrange of the PSD computation is quite high compared to the rest of the system, therefore, at this stage a floating point representation of the DFT results is taken using a systolic based conversion circuit. PIPO registers are used to store the 6 bit exponents of the real and imaginary parts of the DFT, whose squares are computed on interleave 10. On interleave 11 the absolute value of the DFT is computed. The maximum of each pair of real and imaginary results from interleave 10 is fed to the input while the other value is piped into the to be appropriately scaled by the difference in the two squared exponents appearing on the input. The PSD is then computed on interleave 12, involving N/2 divisions of the WNV formed on interleave 5 with the absolute values from interleave 11. The exponents of the PSD are then easily derived from the exponents of the DFT results.5. CONCLUSIONThis paper has proposed a bit-serial interleaved processor which can be programmed for use in division or inner product step computations. The interleaving idea was introduced in order to perform bit-serial division at the same high clock rate as multiplication without resorting to carry look-ahead schemes to remove the communication bottleneck. The result is a high throughput processor which is cost efcient in terms of VLSI implementation, since communication between PEs in the linear array is localised and control is very simple. An application in parametric spectral estimation, namely implementation of theModied Covariance spectral estimator, which makes full use of the interleaving scheme, was described. This system has been programmed and simulated using VHDL. Synthesis was targeted to exploit the resources of a Xilinx XC4036EX-2 FPGA. This type of FPGA has dual port RAM capability, where a 16x1 bit dual port RAM can be implemented in a single congurable logic block (CLB). A dual port RAM cell is an area efcient method to implement a 13 or 14 bit SISO register, as used in the interleaving process. Such registers would otherwise have to be implemented using the pairs of ipops in each CLB, i.e.7 CLBs. CLBs can also be congured as ROM blocks which are useful for generating the address signals in the Cholesky and PSD modules, and for storage of the DFT twiddle factors. The p

展开阅读全文

基于BURG算法的谱估计研究及其MATLAB实现毕业设计论文

最新文档