Sorting a big array on the GPU

Given a big array (the length is an 8-digit number, i.e. tens of millions of elements) containing only the values 0 and 1, how can I separate the elements into two classes quickly and efficiently with a parallel algorithm? And what if the elements of the array were a vector type (float8, for example)?
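
To make the question more concrete, here is a rough sequential sketch in C++ of one possible direction, not a tested GPU implementation. It assumes that for plain 0/1 values, sorting reduces to counting the ones and rewriting the array, and that for larger elements each one carries a 0/1 key, so a stable split can be built from an exclusive prefix sum followed by a scatter. The `Item` struct and the names `sort_bits` and `split_by_key` are hypothetical, made up for illustration. The comments mark which loops would presumably map to a parallel reduction, a parallel scan, and a one-thread-per-element kernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical element type: a 0/1 key plus a float8-like payload.
struct Item {
    std::uint8_t key;   // assumed to be 0 or 1
    float payload[8];
};

// Case 1: the array holds only 0 and 1.
// Counting the ones is enough; the sorted array is then fully determined.
void sort_bits(std::vector<std::uint8_t>& a) {
    std::size_t ones = 0;
    for (std::uint8_t v : a) {
        ones += v;                       // on a GPU: a parallel reduction
    }
    const std::size_t zeros = a.size() - ones;
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = (i < zeros) ? 0 : 1;      // on a GPU: one thread per element
    }
}

// Case 2: elements carry a payload, so they must actually be moved.
// Stable split: all key-0 items first, then all key-1 items,
// each group keeping its original relative order.
std::vector<Item> split_by_key(const std::vector<Item>& in) {
    const std::size_t n = in.size();

    // Exclusive prefix sum of the "key == 0" flags.
    // On a GPU this loop would be a parallel scan.
    std::vector<std::size_t> zeros_before(n);
    std::size_t running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        zeros_before[i] = running;
        running += (in[i].key == 0) ? 1 : 0;
    }
    const std::size_t total_zeros = running;

    // Scatter: every element computes its destination independently,
    // so on a GPU this is again one thread per element.
    std::vector<Item> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t ones_before = i - zeros_before[i];
        const std::size_t dest = (in[i].key == 0)
                                     ? zeros_before[i]
                                     : total_zeros + ones_before;
        out[dest] = in[i];
    }
    return out;
}
```

Is a reduction plus a scan-and-scatter like this a reasonable way to do it on the GPU, or is there a faster approach for arrays of this size?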