A technical evaluation of multilingual, context-aware AI guardrails, analyzing how English and Farsi responses are scored under identical policies. The findings surface scoring gaps, reasoning issues, and consistency challenges in humanitarian deployments.