This paper presents the design, implementation, and performance evaluation of an ARMv8 NEON-optimized codec for the MX Player 1130 media pipeline. We describe codec algorithm selection, NEON vectorization strategies, memory layout and alignment, multithreading and synchronization on big.LITTLE systems, power and thermal considerations, fallback compatibility, and benchmarking methodology. Compared with a scalar baseline, the NEON implementation improves latency and throughput; we also report the accompanying changes in CPU utilization and the resulting energy trade-offs.