The goal of this study was to explore the effects of different spatial sound configurations on visual attention and cognitive effort in an immersive environment. To that end, different groups of participants were exposed to the same immersive video under different soundtrack conditions: mono, stereo, 5.1 and 7.4.1. Each sound condition was a different artistic adaptation of the same soundtrack. While viewing the video, participants wore an eye-tracking device and were asked to perform a counting task. Gaze direction and pupil dilation metrics were obtained as measures of attention and cognitive effort, respectively. Results show that the 5.1 and 7.4.1 conditions were associated with a wider distribution of visual attention, with participants spending more time gazing at task-irrelevant areas of the screen. The mono condition led to the most concentrated attention on the task-relevant area. Overall, the wider the spatial sound configuration, the wider the gaze distribution. The 7.4.1 and 5.1 conditions were also associated with larger pupil dilations than the mono and stereo conditions, suggesting that these conditions may increase cognitive demand and, therefore, task difficulty. We conclude that sound design should be carefully planned to prevent visual distraction. More enveloping spatialized sound may lead to more distraction and greater difficulty in following audiovisual content than less distributed sound. We propose that sound spatialization and soundtrack design should be adapted to the audiovisual content and the task at hand, varying in immersiveness accordingly.