Moving images of a talker's face carry much information for speech understanding. Interpretation of this information is known as lip-reading, which people use effectively when hearing speech sounds, especially under difficult listening conditions. Such information should therefore be considered carefully in the development of advanced multi-modal communication systems. Indeed, talker movies have been applied effectively to tasks such as voice activity detection (VAD) and automatic speech recognition (ASR). We have been examining in particular which parts of the region around a talker's mouth contribute most to speech understanding. In this study, we performed audio-visual speech intelligibility tests and investigated the relationship between speech intelligibility and the contribution of the areas around the talker's mouth. As stimuli, nonsense tri-syllable speech sounds were combined with three kinds of moving images of a talker's face: the original face, the neighborhood of the lips (the mouth area extracted from the original face), and audio only (without video). In the neighborhood-of-the-lips condition, the size of the extracted area around the mouth was varied as a parameter. The presented nonsense tri-syllables included all possible vowel-consonant combinations in Japanese. The generated audio-visual stimuli were presented with speech-spectrum noise to the participants, all of whom had normal hearing and normal or corrected-to-normal vision. Results showed that the intelligibility scores of several phonemes (/n/, /h/, /m/, /w/, /d/, /b/, /p/) increased when visual information was added. Moreover, no significant difference was found between the score for the original-face condition and that for the neighborhood-of-the-lips condition. This result suggests that the mouth area alone provides sufficient visual information for speech intelligibility.