Optimization of the DWT Lifting Scheme on a VLIW Processor
FERRETTI, MARCO
2006-01-01
Abstract
This paper describes a new approach to implementing the 5/3 integer Lifting Scheme for the wavelet transform on a VLIW CPU core, with the goal of improving computational performance in terms of cycles and memory accesses. The lifting scheme is part of the most recent standard for image coding (JPEG2000), for which a highly optimized software implementation is mandatory on embedded processor systems. We use one such processor as a reference to highlight the requirements on VLIW architectures that offer a limited form of instruction-level parallelism and a fixed ratio of memory to general-purpose instructions within a long word. We show that a careful analysis of the data access pattern typical of the lifting scheme reduces both data misses and execution time, measured in clock cycles, by over 60% with respect to a straightforward implementation.
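For readers unfamiliar with the transform the abstract refers to, the following is a minimal sketch of one level of the one-dimensional reversible 5/3 integer lifting scheme in its textbook form: a predict step followed by an update step, using integer arithmetic only. It corresponds to the "straightforward implementation" used as a baseline, not to the optimized VLIW code described in the paper. The function names `lift53_forward` and `lift53_inverse`, the even-length restriction, and the mirror handling of the borders are assumptions made for this illustration.

```python
def lift53_forward(x):
    """One level of 1-D forward 5/3 integer lifting (illustrative sketch).

    Assumes an even-length list of integers; borders are handled by
    mirroring the signal (whole-sample symmetric extension).
    Returns (s, d): low-pass and high-pass coefficients.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch assumes an even length of at least 2")
    half = n // 2

    # Predict step: odd sample minus the floored mean of its even neighbours.
    d = []
    for i in range(half):
        left = x[2 * i]
        # Right border: x[n] is mirrored onto x[n - 2].
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        d.append(x[2 * i + 1] - ((left + right) >> 1))

    # Update step: even sample plus a rounded quarter of neighbouring details.
    s = []
    for i in range(half):
        # Left border: d[-1] is mirrored onto d[0].
        prev = d[i - 1] if i > 0 else d[0]
        s.append(x[2 * i] + ((prev + d[i] + 2) >> 2))
    return s, d


def lift53_inverse(s, d):
    """Undo lift53_forward by running the same steps in reverse order."""
    half = len(s)
    x = [0] * (2 * half)

    # Undo the update step to recover the even samples.
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - ((prev + d[i] + 2) >> 2)

    # Undo the predict step to recover the odd samples.
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if i + 1 < half else x[2 * half - 2]
        x[2 * i + 1] = d[i] + ((left + right) >> 1)
    return x


if __name__ == "__main__":
    samples = [12, 15, 9, 7, 30, 28, 2, 5]
    low, high = lift53_forward(samples)
    print(low, high)
    print(lift53_inverse(low, high) == samples)
```

Each output coefficient reads three neighbouring inputs, and in a two-dimensional transform the same steps are applied along rows and then along columns. That access pattern is the kind of behaviour whose analysis the abstract credits for the reduction in data misses.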