multi-modal AI workflows using video and images