Problem with data transfer between global memory and private memory

When I run a program that contains a large loop transferring data between private variables and global memory, performance is very poor. If I comment out the code that copies the value of a variable declared in the main function body (private space) to global memory, the program runs very fast. I tried using local memory as an intermediary between the two, but it did not seem to help. What better method can I use to hide the latency of data transfers between the global and private memory spaces?
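To make the pattern concrete, here is a minimal sketch in plain C that emulates a single work-item on the host. It is not my actual kernel: the names `work_item`, `global_out`, `gid`, `n_iter` and `do_copy` are made up for illustration, and the accumulation in the loop is only a stand-in for the real computation. Setting `do_copy` to 0 corresponds to commenting out the copy.

```c
#include <stddef.h>

/* Host-side emulation of ONE work-item (hypothetical names, not the real kernel).
 * global_out stands in for a buffer in global memory,
 * priv stands in for a private variable declared in the function body. */
void work_item(float *global_out, size_t gid, int n_iter, int do_copy)
{
    float priv = 0.0f;                 /* private variable */

    for (int i = 0; i < n_iter; ++i) {
        priv += (float)i;              /* placeholder for the real work */

        if (do_copy) {
            /* The copy in question: private value written to
             * global memory on every iteration of the loop. */
            global_out[gid] = priv;
        }
    }
}
```

With `do_copy` set to 1 the loop writes to `global_out` on every iteration, which is the slow case I described; with `do_copy` set to 0 the buffer is never touched, which is the fast case.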