A method encodes a video as video objects. For each candidate object, a quantizer parameter and a skip parameter that jointly minimize an average total distortion in the video are determined while satisfying predetermined constraints. The average total distortion includes the spatial distortion of coded objects and the spatial and temporal distortion of uncoded objects. The candidate objects are then encoded as the coded objects using the quantizer parameter and the skip parameter, and skipped as the uncoded objects according to the skip parameter.
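The joint parameter selection described above can be illustrated with a minimal sketch. It rests on several assumptions that are not in the source. The "predetermined constraints" are taken to be a single total bit-rate budget. Each object has hypothetical models `spatial_dist(qp)`, `temporal_dist(skip)` and `rate(qp)`. A skip parameter `s` is read as "code one instance, then skip `s`". A skipped (uncoded) instance is charged the spatial distortion of the coded instance it reuses plus a temporal term. The search is plain exhaustive enumeration, chosen for clarity rather than as the patented optimization procedure. All function and field names are illustrative.

```python
from itertools import product

# Hypothetical per-object models (assumptions for illustration only):
#   spatial_dist(qp):   spatial distortion of the object when coded at quantizer qp
#   temporal_dist(s):   extra temporal distortion of an uncoded instance when
#                       s instances are skipped between coded ones
#   rate(qp):           bits needed to code one instance at quantizer qp


def average_distortion(obj, qp, skip):
    """Average per-instance distortion of one object.

    One instance in every (skip + 1) is coded and carries spatial distortion
    only; the other `skip` instances are uncoded and carry the reused spatial
    distortion plus a temporal distortion term.
    """
    ds = obj["spatial_dist"](qp)
    dt = obj["temporal_dist"](skip)
    return (ds + skip * (ds + dt)) / (skip + 1)


def average_rate(obj, qp, skip):
    """Average bits per instance: only coded instances consume bits."""
    return obj["rate"](qp) / (skip + 1)


def choose_parameters(objects, qps, skips, rate_budget):
    """Exhaustively pick one (qp, skip) pair per object that minimizes the
    summed average distortion subject to total average rate <= rate_budget."""
    options = list(product(qps, skips))
    best, best_dist = None, float("inf")
    for choice in product(options, repeat=len(objects)):
        pairs = list(zip(objects, choice))
        total_rate = sum(average_rate(o, qp, s) for o, (qp, s) in pairs)
        if total_rate > rate_budget:
            continue  # violates the (assumed) rate constraint
        total_dist = sum(average_distortion(o, qp, s) for o, (qp, s) in pairs)
        if total_dist < best_dist:
            best, best_dist = choice, total_dist
    return best, best_dist


# Toy example: a nearly static background (small temporal distortion when
# skipped) and a moving foreground (large temporal distortion when skipped).
background = {
    "spatial_dist": lambda qp: 2.0 * qp,
    "temporal_dist": lambda s: 1.0 * s,
    "rate": lambda qp: 1000.0 / qp,
}
foreground = {
    "spatial_dist": lambda qp: 2.0 * qp,
    "temporal_dist": lambda s: 20.0 * s,
    "rate": lambda qp: 1000.0 / qp,
}

best, best_dist = choose_parameters(
    [background, foreground], qps=[4, 8, 16], skips=[0, 1, 2], rate_budget=200.0
)
print(best, best_dist)  # ((8, 1), (8, 0)) 32.5
```

With these toy numbers the search skips every other background instance but codes every foreground instance at the same quantizer. This matches the intuition that skipping is cheap in distortion only where temporal distortion is small. A practical encoder would replace the exhaustive search, whose cost grows exponentially with the number of objects, with a more efficient optimizer.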