How can I use OpenCL to optimize my decoder algorithm?

Hello,
I have a decoder algorithm that is already implemented in VS, and now I want to optimize its performance on an AMD APU. My questions are:
a. Do I need to use OpenCL heterogeneous programming to optimize my algorithm?
b. If I use OpenCL, how should I get started?
c. Which aspects of my algorithm can I optimize, and with what methods?

Thanks!

No idea about this algorithm…
:frowning: 8)