I see your point on chaining post-processing effects.
As to the main thrust of your question, I don’t think it will matter on mobile hardware. FBOs are an abstraction. The cost of the abstraction itself is borne mostly by the CPU, whereas the cost of the operation it represents is paid on the GPU.
When NVIDIA recommends that you only change image attachments, that’s primarily because of how their hardware works. Notice that they specifically say that the new attachment should have the same image format as the old one. Presumably their hardware, and therefore the driver code that runs on the CPU to program it, has some explicit dependency on this.
You can see this as well in NV_command_list, where they allow changing render targets in the middle of a command list. But they require that an entire rendering pass use the same image format, so you’re only allowed to change targets if the format doesn’t change.
Those kinds of things are likely to be hardware-specific. Maybe NVIDIA’s GPUs have ways to change where a texture comes from without having to incur a full pipeline stall and clear caches. Or whatever.
For tile-based deferred GPUs, all that is irrelevant next to the gigantic pipeline stall that must be incurred when you change a render target into a texture. However you do this, whether it’s changing FBO attachments or the entire FBO itself, you’re still going to pay a serious price when you try to read from that texture. And that price will overshadow pretty much everything else.
That’s not to say that desktop-style GPUs have no pipeline stall to pay when you read from a previously written texture. But it’s not nearly as large, so the CPU cost of the FBO change is more likely to matter there.
So generally speaking, on tile-based hardware the specific style of the change won’t matter. It’ll hurt a lot either way.
But ultimately, I don’t think there’s much practical, comparative experience out there on mobile GPUs. If you think it could help performance, you could always profile it.
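If you do want to profile it, something like the following rough sketch could serve as a starting point. It is only a generic timing helper, not anything specific to a GL implementation: `median_frame_time_us` and the `frame` callable are hypothetical names, and the sketch assumes each callable issues one variant’s GL commands (swap attachments, or bind a different FBO) and then waits for the GPU, for example by ending with `glFinish()`. Without that wait you would only measure CPU submission time.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs `frame` `runs` times and returns the median wall-clock time in
// microseconds. `frame` is a hypothetical callable supplied by the caller;
// it should render one frame with one variant and block until the GPU is
// done, otherwise only CPU-side submission cost is captured.
double median_frame_time_us(const std::function<void()>& frame, int runs) {
    assert(runs > 0);
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        frame();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }
    // The median is less sensitive than the mean to an occasional slow frame.
    const auto mid =
        samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

You would call it once per variant with the same scene and compare the two medians on the actual target device, since results from one GPU are unlikely to carry over to another.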
Do you suggest FXAA is not optimal for mobile hardware then?
Optimal relative to what? Just as on desktops, it’s probably faster than MSAA. But the advantage will be smaller than on desktop hardware, because switching from writing a render target to reading it as a texture costs more.
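To make the write-to-read switch a bit more concrete, here is a toy model that just counts how many passes in a chain sample an image rendered by an earlier pass. It is purely illustrative: `Pass` and `count_write_to_read_switches` are made-up names, the image ids are arbitrary labels, and it says nothing about how expensive each switch is on a given GPU.

```cpp
#include <set>
#include <vector>

// Hypothetical description of one render pass: the image it renders into
// and the images it samples as textures. Ids are arbitrary labels.
struct Pass {
    int writes;
    std::vector<int> reads;
};

// Counts passes that sample at least one image rendered by an earlier pass.
// Each such pass implies a write-to-read switch, whether the render target
// was changed by binding another FBO or by swapping attachments.
int count_write_to_read_switches(const std::vector<Pass>& chain) {
    std::set<int> written;
    int switches = 0;
    for (const Pass& pass : chain) {
        for (int image : pass.reads) {
            if (written.count(image) != 0) {
                ++switches;
                break;  // count each pass at most once
            }
        }
        written.insert(pass.writes);
    }
    return switches;
}
```

In this model, a scene pass into image 1 followed by an FXAA pass that reads image 1 counts one switch, while a longer post-processing chain counts one per dependent pass. The count is the same whichever way you change render targets, which is the point.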