Studies on perception and cognition require sound methods that allow us to disentangle the basic sensory processing of physical stimulus properties from the cognitive processing of stimulus meaning. Like image scrambling, the scrambling of auditory signals aims to create stimulus instances that are unrecognizable but have comparable low-level features. In the present study, we generated scrambled versions of short vocalizations taken from the Montreal Affective Voices database (Belin et al., Behav Res Methods, 40(2):531–539, 2008) by applying four different scrambling methods (frequency scrambling, phase scrambling, and two time-scrambling transformations). Sixty participants rated the original stimuli and their scrambled versions for how apparent a human voice was, and for the gender and the valence of the expression; if no human voice was detected, they instead rated the valence of their own subjective response to the stimulus. Human-likeness ratings were reduced for all scrambled versions relative to the original stimuli, although to a lesser extent for phase-scrambled versions of neutral bursts. For phase-scrambled neutral bursts, valence ratings were equivalent to those of the original neutral bursts. All other scrambled versions were rated as slightly unpleasant, indicating that they should be used with caution because they may be aversive.
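To illustrate what "comparable low-level features" can mean for the phase-scrambling method mentioned above, the following is a minimal sketch of one common whole-signal variant, not necessarily the procedure used in this study. It assumes a mono waveform stored as a NumPy array; the function name `phase_scramble` and its `rng` parameter are hypothetical. The sketch keeps the magnitude spectrum of the signal and replaces every phase with a uniform random value. This destroys the temporal structure that makes a vocalization recognizable while leaving the long-term spectrum and overall energy unchanged. The frequency- and time-scrambling variants, as well as any frame-wise (short-time) processing, are not shown.

```python
import numpy as np


def phase_scramble(signal, rng=None):
    """Return a phase-scrambled copy of a 1-D real-valued signal.

    Hypothetical sketch: the magnitude spectrum is preserved and each
    phase is replaced with a uniform random value in [-pi, pi).
    """
    rng = np.random.default_rng(rng)  # accepts None, an int seed, or a Generator
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)
    magnitude = np.abs(spectrum)
    random_phase = rng.uniform(-np.pi, np.pi, size=spectrum.shape)
    # The DC bin (and the Nyquist bin for even lengths) must stay real for
    # the inverse transform to be consistent; keep their original phase,
    # which also leaves the signal mean unchanged.
    random_phase[0] = np.angle(spectrum[0])
    if x.size % 2 == 0:
        random_phase[-1] = np.angle(spectrum[-1])
    scrambled = magnitude * np.exp(1j * random_phase)
    # n=x.size restores the original length for both even and odd inputs.
    return np.fft.irfft(scrambled, n=x.size)


if __name__ == "__main__":
    noise = np.random.default_rng(0).standard_normal(1000)
    out = phase_scramble(noise, rng=1)
    # Same magnitude spectrum, different waveform.
    print(np.allclose(np.abs(np.fft.rfft(out)), np.abs(np.fft.rfft(noise))))
```

Scrambling the whole signal at once keeps the overall spectrum exactly but also spreads energy evenly over the full duration. Frame-wise variants trade some spectral exactness for a closer match of the original temporal envelope.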