Well, I've done a fair bit of hacking and am already quite familiar with the code. I have to congratulate you on very well-written and understandable code.

I just finished writing the benchmark code and will post the analysis on the wiki tonight.
So far I can draw a few conclusions (tested on a 2-year-old single-core laptop):
1) It takes about 25 milliseconds on average to render a frame (decode, YUV conversion and blit), which gives a potential of decoding about 40 frames per second (1000 ms / 25 ms).
2) Some frames take more time than others.
3) Pre-caching a few frames in advance could solve most frame-dropping problems.
4) Moving the YUV conversion code to a shader can significantly boost performance.
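A minimal sketch of how the figures in points 1 and 2 relate: time each frame (decode + YUV conversion + blit), then compare the average against the worst case. The potential frame rate is 1000 divided by the average frame time in milliseconds, while the worst case shows how far the slow frames overshoot that budget. The names `timeMs`, `FrameStats` and `summarize` are hypothetical and not taken from the actual benchmark code.

```cpp
#include <algorithm>
#include <chrono>
#include <numeric>
#include <vector>

// Runs `work` once and returns how long it took, in milliseconds.
template <typename F>
double timeMs(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Hypothetical summary of per-frame render times.
struct FrameStats {
    double averageMs;    // mean time per frame
    double worstMs;      // slowest single frame
    double potentialFps; // frames per second the average time would allow
};

// Summarizes a non-empty list of per-frame timings in milliseconds.
FrameStats summarize(const std::vector<double>& frameTimesMs) {
    double total = std::accumulate(frameTimesMs.begin(), frameTimesMs.end(), 0.0);
    double average = total / static_cast<double>(frameTimesMs.size());
    double worst = *std::max_element(frameTimesMs.begin(), frameTimesMs.end());
    return {average, worst, 1000.0 / average};
}
```

For example, timings of 20, 25 and 30 ms average to 25 ms (40 fps potential), but the 30 ms frame alone would miss a 40 fps deadline.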
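The pre-caching idea in point 3 could look roughly like this: a small fixed-capacity queue of frames decoded ahead of playback, so that a slow frame is absorbed by frames that were decoded early. This is a sketch under assumptions; `DecodedFrame`, `FrameCache` and the decoder callback are hypothetical stand-ins for whatever the plugin's decode step returns. Members use the 'm' prefix to match Ogre's convention.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical decoded frame: converted pixels plus its presentation time.
struct DecodedFrame {
    double presentationTimeSec;
    std::vector<unsigned char> rgb;
};

// Fixed-capacity queue of frames decoded ahead of playback.
class FrameCache {
public:
    // Returns the next decoded frame, or std::nullopt at end of stream.
    using Decoder = std::function<std::optional<DecodedFrame>()>;

    FrameCache(std::size_t capacity, Decoder decode)
        : mCapacity(capacity), mDecode(std::move(decode)) {}

    // Decodes frames until the cache is full or the stream ends.
    void fill() {
        while (mFrames.size() < mCapacity) {
            std::optional<DecodedFrame> frame = mDecode();
            if (!frame) break; // end of stream
            mFrames.push_back(std::move(*frame));
        }
    }

    // Pops the oldest frame if it is due at `nowSec`; otherwise returns nothing.
    std::optional<DecodedFrame> next(double nowSec) {
        if (mFrames.empty() || mFrames.front().presentationTimeSec > nowSec)
            return std::nullopt;
        DecodedFrame frame = std::move(mFrames.front());
        mFrames.pop_front();
        return frame;
    }

    std::size_t size() const { return mFrames.size(); }

private:
    std::size_t mCapacity;
    Decoder mDecode;
    std::deque<DecodedFrame> mFrames;
};
```

On a single core there is no decoding thread here: the idea is to call `fill()` during frames that finish under budget, so the queue has spare frames ready when an expensive one comes along.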
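For point 4, the per-pixel work is small and independent for every pixel, which is why it maps well to a fragment shader. The CPU version below shows that math; it assumes limited-range BT.601 coefficients, which may not be exactly what the plugin's current conversion code uses. `clampByte` and `yuvToRgb` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Rounds and clamps a channel value into the 0..255 byte range.
inline std::uint8_t clampByte(float v) {
    return static_cast<std::uint8_t>(std::clamp(v + 0.5f, 0.0f, 255.0f));
}

// Converts one limited-range BT.601 Y'CbCr sample (assumed) to RGB.
std::array<std::uint8_t, 3> yuvToRgb(std::uint8_t y, std::uint8_t u, std::uint8_t v) {
    float c = 1.164f * (static_cast<float>(y) - 16.0f); // scaled luma
    float d = static_cast<float>(u) - 128.0f;           // centered Cb
    float e = static_cast<float>(v) - 128.0f;           // centered Cr
    return {clampByte(c + 1.596f * e),
            clampByte(c - 0.391f * d - 0.813f * e),
            clampByte(c + 2.018f * d)};
}
```

In a shader the same three expressions run once per fragment on the GPU, with the Y, U and V planes uploaded as textures, instead of looping over every pixel on the CPU.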
I have a few ideas that I'm going to implement and would like your opinion on:
1) All your member variables begin with 'm_' while Ogre's begin with 'm', which makes the coding convention inconsistent. I'll rename all of them to the 'm' convention.