Browsing by Subject "Speech perception"
Now showing 1 - 17 of 17
Item: Audiovisual integration for perception of speech produced by nonnative speakers (2013-08)
Yi, Han-Gyol; Chandrasekaran, Bharath; Smiljanic, Rajka, 1967-
Speech often occurs in challenging listening environments, such as masking noise. Visual cues have been found to enhance speech intelligibility in noise. Although the facilitatory role of audiovisual integration for perception of speech has been established in native speech, it is relatively unclear whether it also holds true for speech produced by nonnative speakers. Native listeners were presented with English sentences produced by native English and native Korean speakers. The sentences were presented in either audio-only or audiovisual conditions. Korean speakers were rated as more accented in the audiovisual than in the audio-only condition. Visual cues enhanced speech intelligibility in noise for native English speech but less so for nonnative speech. Reduced intelligibility of audiovisual nonnative speech was associated with implicit Asian-Foreign association, suggesting that listener-related factors partially influence the efficiency of audiovisual integration for perception of speech produced by nonnative speakers.

Item: Cross-language speech perception in context : advantages for recent language learners and variation across language-specific acoustic cues (2016-05)
Blanco, Cynthia Patricia; Smiljanic, Rajka, 1967-; Bannard, Colin; Meier, Richard P; Quinto-Pozos, David; Echols, Catharine H; Chandrasekaran, Bharath
This dissertation explores the relationship between language experience and sensitivity to language-specific segmental cues by comparing cross-language speech perception in monolingual English listeners and Spanish-English bilinguals. The three studies in this project use a novel language categorization task to test language-segment associations in listeners' first and second languages. Listener sensitivity is compared at two stages of development and across a variety of language backgrounds.
These studies provide a more complete analysis of listeners' language-specific phonological categories than offered in previous work by using word-length stimuli to evaluate segments in phonological contexts and by testing speech perception in listeners' first language as well as their second language. The inclusion of bilingual children also allows connections to be drawn between previous work on infants' perception of segments and the sensitivities of bilingual adults. In three experiments, participants categorized nonce words containing different classes of English- and Spanish-specific sounds as sounding more English-like or Spanish-like; target segments were either a phonemic cue, a cue for which there is no analogous sound in the other language, or a phonetic cue, a cue for which English and Spanish share the category but for which each language varies in its phonetic implementation. The results reveal a largely consistent categorization pattern across target segments. Listeners from all groups succeeded and struggled with the same subsets of language-specific segments. The same pattern of results held in a task where more time was given to make categorization decisions. Interestingly, for some segments, namely the English phonemic cues, the late bilinguals were significantly more accurate than monolingual and early bilingual listeners. There were few differences in the sensitivity of monolinguals and early bilinguals to language-specific cues, suggesting that the early bilinguals' exposure to Spanish did not fundamentally change their representations of English phonology, but neither did their proficiency in Spanish give them an advantage over monolinguals. The comparison of adult listeners with children indicates that the Spanish-speaking children who grow to be early bilingual adults categorize segments more accurately than monolinguals – a pattern that is neutralized in the adult results.
These findings suggest that variation in listener sensitivity to language-specific cues is largely driven by inherent differences in the salience of the segments themselves. Listener language experience modulates the salience of some of these sounds, and these differences in cross-language speech perception may reflect how recently a language was learned and under what circumstances.

Item: Emphasis and pharyngeals in Palestinian Arabic : an experimental analysis of their acoustic, perceptual, and long-distance effects (2020-05-06)
Faircloth, Laura Rose; Crowhurst, Megan Jane; Myers, Scott; Smiljanic, Rajka; Meier, Richard; Watson, Janet
Arabic has a phonemic contrast between plain coronal obstruents and emphatic coronal obstruents, which have a secondary [+back] feature with a debated uvular or pharyngeal constriction. These consonants are known to affect F1 and F2 in adjacent low /a/, but the effects on other vowels, the role of these cues in perception, and the long-distance acoustic effects have not been studied. A production study of emphatic consonants (Experiment 1) in Palestinian Arabic compared F1 and F2 of the vowels /a: i: u:/ following plain /s/, emphatic /sˤ/, and pharyngeal /ħ/. In comparison to vowels adjacent to plain coronals, F1 was higher adjacent to pharyngeal /ħ/ and lower adjacent to emphatic /sˤ/ at the onset, but this effect decreased at the midpoint and offset. F2 was lower adjacent to emphatic /sˤ/ than adjacent to plain /s/, and this effect was consistent at the onset, midpoint, and offset. The effects of emphatic consonants were greater in low /a/ than in high /i u/. A perception experiment (Experiment 2) explored the role of these acoustic correlates in the identification of plain /s/ and emphatic /sˤ/ before low /a:/ and high front /i:/, where stimuli had a frication segment from /s/ or /sˤ/ and F1 and F2 values varied.
Listeners used F2 lowering as a cue to emphatic consonants, but they were also able to rely on slight differences in F1 and the frication to improve their identification overall. A second production experiment (Experiment 3) examined the long-distance effects of emphatic and pharyngeal consonants. Speakers produced F2 lowering in all emphatic environments compared to a plain control, regardless of directionality or locality. Speakers only produced localized F1 raising with pharyngeal consonants in immediately adjacent vowels. These experiments suggest that emphasis is uvularization in Palestinian Arabic, which causes F1 and F2 lowering in adjacent and non-adjacent vowels in comparison to vowels in plain environments, and that listeners use these cues to identify emphatic consonants. Pharyngeal /ħ/ raised F1 only briefly, suggesting that pharyngeals do not have the same phonological effects as emphatic consonants in this dialect.

Item: Environment- and listener-oriented speaking style adaptations across the lifespan (2014-08)
Gilbert, Rachael Celia; Smiljanic, Rajka, 1967-
This dissertation examines how age affects the ability to produce intelligibility-enhancing speaking style adaptations in response to environment-related difficulties (noise-adapted speech) and in response to listeners' perceptual difficulties (clear speech). Materials consisted of conversational and clear speech sentences produced in quiet and in response to noise by children (11-13 years), young adults (18-29 years), and older adults (60-84 years). Acoustic measures of global, segmental, and voice characteristics were obtained. Young adult listeners participated in word-recognition-in-noise and perceived-age tasks. The study also examined relative talker intelligibility as well as the relationship between the acoustic measurements and intelligibility results. Several age-related differences in speaking style adaptation strategies were found.
Children increased mean F0 and F1 more than adults in response to noise, and exhibited greater changes to voice quality when producing clear speech (increased HNR, decreased shimmer). Older adults lengthened pause duration more in clear speech compared to younger talkers. Word-recognition-in-noise results revealed no age-related differences in the intelligibility of conversational speech. Noise-adapted and clear speech modifications increased intelligibility for all talker groups. However, the acoustic changes implemented by children when producing noise-adapted and clear speech were less efficient in enhancing intelligibility compared to those of the young adult talkers. Children were also less intelligible than older adults for speech produced in quiet. Results confirmed that the talkers formed three perceptually distinct age groups. Correlation analyses revealed that relative talker intelligibility was consistent for conversational and clear speech in quiet. However, relative talker intelligibility was more variable with the inclusion of additional speaking style adaptations. Energy in the 1-3 kHz range, speaking rate, and vowel and pause durations all emerged as significant acoustic-phonetic predictors of intelligibility. This is the first study to investigate how clear speech and noise-adapted speech benefits interact with each other across multiple talker groups. The findings enhance our understanding of intelligibility variation across the lifespan and have implications for a number of applied realms, from audiologic rehabilitation to speech synthesis.

Item: How auditory discontinuities and linguistic experience affect the perception of speech and non-speech in English- and Spanish-speaking listeners (2005)
Hay, Jessica Sari Fleming; Diehl, Randy L.
Speech perception results from a complex interplay between the operating characteristics of the auditory system (i.e., auditory discontinuities) and linguistic experience.
Research in human infants and animals, and research using tone-onset-time (TOT) stimuli, a type of non-speech analogue of voice-onset-time (VOT) stimuli, has suggested that there is an underlying auditory basis for the perception of stop consonants based on a threshold for detecting temporal onset asynchronies in the vicinity of +20 ms. Languages, however, differ in their reliance on temporal onset asynchrony-based auditory discontinuities in their [voice] categories. This dissertation sought to examine whether long-term linguistic experience with different [voice] categories (i.e., English or Spanish) affects the perception of non-speech stimuli that are analogous in their acoustic timing characteristics. This research was also designed to investigate the joint effects of linguistic experience and auditory mechanisms on phoneme structure and category learning. Three cross-linguistic studies were designed to look at (1) the production and perception of VOT and the perception of TOT, (2) the effects of stimulus range on the perception of VOT, and (3) the effects of auditory discontinuities on non-speech category learnability. Results indicate that linguistic experience does affect the perception of non-speech stimuli, at least in certain circumstances. Thus, there is some commonality in the processes used to discriminate between non-speech sounds and those used to discriminate between speech sounds. Additionally, auditory discontinuities were found to influence both phoneme structure and category learning. It is suggested that English- and Spanish-speaking listeners use different cues to discriminate their [voice] categories. Results also suggest that there are perceptual asymmetries between the positive and the negative onset asynchrony-based auditory discontinuities.
The relationships between auditory discontinuities, linguistic experience, discriminability, phoneme category structure, and learnability are discussed.

Item: Informational masking of multi talker babble in English vowel identification for Spanish-English bilinguals (2016-05)
Estrella, Alexandra; Liu, C. (Chang), Ph.D.; Chandrasekaran, Bharath
Speech perception studies with bilinguals have demonstrated that bilinguals perform comparably to native speakers in quiet listening conditions. However, when listening conditions include different types of noise and different SNRs, bilinguals tested in their L2 have more difficulty and perform worse than native speakers. With Spanish-English bilinguals becoming a large part of the U.S. population, the present study investigated their speech perception abilities using English vowels in different quiet and noise conditions. The participants were grouped by their age of acquisition of English in order to determine whether the amount of exposure to the language affected their overall performance. In addition, the amount of informational masking was evaluated using comparisons between the babble and temporally modulated noise conditions. Results indicated that the later bilinguals experienced more difficulties throughout the different conditions when compared to the simultaneous and early bilinguals, but significance was only reached for a few of the conditions. Additionally, there were no major effects of informational masking.

Item: Measuring phonetic convergence : segmental and suprasegmental speech adaptations during native and non-native talker interactions (2013-12)
Rao, Gayatree Nandan; Diehl, Randy L.; Smiljanic, Rajka, 1967-
Phonetic convergence (PC) is speech-specific accommodation characterized by an increase in similarity in a dyad's speech patterns due to an interaction. Previous research has demonstrated that PC occurs in dyads during various interactive tasks (e.g.
map completion and picture matching) and in cross-linguistic conditions (e.g. dyads who speak the same or different native language) (Pardo, 2006; Kim et al., 2011). Studies suggest that speakers who are closer in linguistic distance (i.e. share the same native language) are more likely to converge than speakers who are farther apart (i.e. speak different native languages) (Kim et al., 2011). However, interdialectal conditions, where speakers use different national dialects of the same language, have been studied to a far lesser extent (Babel, 2010). Similarly, studies have examined both segmental and suprasegmental features that are susceptible to PC, but rhythm has not been studied extensively (Krivokapic, 2013; Rao et al., 2011). Though initial studies postulated that PC is the result of either automatic or social processes, more current research suggests that a combination of both kinds of processes may better account for PC (Goldinger, 1997; Shepard et al., 2001; Babel, 2009a). The current dissertation uses novel measures, Interlocutor Similarity and EMS + centroid, to implicate global properties of vowels and rhythm, respectively, as acoustic correlates of PC. Moreover, it finds that speakers showed both convergence and divergence in vowels and rhythm as moderated by their language background. Close interactions between native speakers of American English (AE) resulted in convergence, whereas interdialectal interactions (between AE and Indian English speakers) and mixed-language interactions (between native speakers of AE and non-native speakers of AE who are native speakers of Spanish) resulted in both convergence and divergence.
The results from this study may shed light on how speakers attenuate the highly variable nature of speech by adapting speech patterns to aid intelligibility and information sharing (Shepard et al., 2001), and suggest that this attenuation is moderated by social demands such as identity and cultural distinctiveness.

Item: Memory for speech of varying intelligibility : effects of perception and production of clear speech on recall and recognition memory for native and non-native listeners and talkers (2020-07-15)
Keerstock, Sandie; Smiljanic, Rajka, 1967-; Myers, Scott P; Crowhurst, Megan; Quinto-Pozos, David; Shafiro, Valeriy
This dissertation examines the effects of signal-related articulatory-acoustic enhancements in the form of clear speech on signal-independent processes and integration of information in memory. In a series of five experimental studies, this dissertation investigates the effect of clear speech production and perception on recognition memory and recall for native and non-native listeners and talkers. Two perception studies in Chapter 2 examined the effect of clear speech on within-modal (i.e., audio-audio) or cross-modal (i.e., audio-text) sentence recognition memory for native and non-native listeners. A perception study in Chapter 3 tested the effect of clear speech on recall, a more complex memory task, for native and non-native listeners. Finally, two production studies in Chapter 4 investigated the effect of producing clear speech on recognition memory and recall for native and non-native talkers. Key findings from this dissertation were that clear speech improved within- and cross-modal recognition memory and recall for native and non-native listeners but impaired recognition memory and recall for native and non-native talkers. These seemingly disparate findings in perception and production are discussed in the light of models that appeal to 'effort' and cognitive load as detrimental to memory.
This dissertation provides novel theoretical insights into how lower-level acoustic-phonetic enhancements interact with higher-level memory processes in first- and second-language speech perception and production. The results from this dissertation have practical implications in a variety of environments where retention of spoken information is essential, such as classrooms and hospitals.

Item: Modulation of neural responses to naturalistic speech production and perception (2020-12-10)
Kurteff, Garret; Hamilton, Liberty
Speech production is under-studied compared to speech perception, largely due to complications in data collection caused by articulation. In electroencephalography (EEG), these complications manifest as electromyographic (EMG) activity originating from the muscles that control articulation (Chen et al. 2019). This is unfortunate because EEG is well suited for studying the rapid temporal changes in speech production. In addition, the few EEG studies of speech production are limited to the single-word level, which limits the generalizability of their findings to how speech is used in everyday contexts. In this thesis I present an EEG study of the differences between speech production and perception using sentence-level naturalistic stimuli. Participants overtly produced sentences from the MOCHA-TIMIT corpus (Wrench 1999) and then listened to playback of themselves producing the sentences. Perception trials were split into predictable and unpredictable trials. Predictable trials consisted of playback of the sentence just produced, while unpredictable trials consisted of playback of a randomly selected previously produced sentence. In this thesis, two contrasts are compared: (1) overt production of sentences versus passive listening to sentences, and (2) passive listening to predictable sentences versus passive listening to unpredictable sentences. Canonical correlation analysis (CCA) was used to remove EMG artifact from the recorded EEG.
To demonstrate removal of EMG and preservation of neural responses in CCA-corrected EEG data, event-related potential (ERP) analysis was performed on neural responses to perception stimuli, inter-trial click tones, and activity recorded from auxiliary facial EMG electrodes. These ERP analyses revealed a reduction in amplitude for production trials and facial EMG activity after CCA artifact correction, and a preservation of early auditory responses to inter-trial click tones, suggesting that EMG was successfully removed while neural responses were preserved. After validation of EMG removal, perception and production trials were compared using ERP analysis. Responses to produced sentences had reduced amplitude compared with perceived sentences, which is consistent with previous research on speaker-induced suppression. Differences in stimulus predictability during speech perception also affected response amplitude; however, this effect was weaker than the difference observed between perception and production trials. Multivariate temporal receptive field modeling was used to examine phonological tuning in perception and production. Models demonstrated that speaker-induced suppression does not reflect a change in neural encoding of phonological features but instead a generalized reduction in response amplitude during speech production. Understanding the differences between speech perception and production in a naturalistic context has implications for developing brain-computer interfaces and understanding the neural basis of communication disorders such as apraxia of speech and stuttering.
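The CCA-based EMG removal mentioned in this abstract can be sketched in a few lines. This is a minimal illustration of one common variant of the technique (canonically correlating the recording with a one-sample-delayed copy of itself, so the canonical correlations equal the components' lag-1 autocorrelations, and discarding low-autocorrelation, EMG-like components); the threshold and single-sample delay are illustrative assumptions, not the thesis's actual pipeline.

```python
import numpy as np

def cca_emg_removal(eeg, corr_thresh=0.9):
    """Remove EMG-like components from EEG using canonical correlation analysis.

    Canonically correlates the mean-centered recording with a one-sample-
    delayed copy of itself; the canonical correlations are then the lag-1
    autocorrelations of the components. Broadband EMG has low autocorrelation,
    so components below `corr_thresh` are zeroed before back-projection.

    eeg: array of shape (n_samples, n_channels). Returns the cleaned,
    mean-centered signal with the same shape (last sample duplicated).
    """
    X = eeg[:-1] - eeg[:-1].mean(axis=0)
    Y = eeg[1:] - eeg[1:].mean(axis=0)

    # Whiten each dataset via its SVD (assumes full channel rank).
    Ux, Sx, Vxt = np.linalg.svd(X, full_matrices=False)
    Uy, Sy, Vyt = np.linalg.svd(Y, full_matrices=False)

    # CCA reduces to an SVD of the whitened cross-covariance.
    U, corrs, _ = np.linalg.svd(Ux.T @ Uy)

    # Unmixing matrix and canonical components of X.
    Wx = Vxt.T @ np.diag(1.0 / Sx) @ U      # (channels, components)
    comps = X @ Wx                          # (samples, components)

    # Zero out components with low autocorrelation (EMG-like).
    comps[:, corrs < corr_thresh] = 0.0

    # Back-project the retained components to channel space.
    cleaned = comps @ np.linalg.pinv(Wx)
    return np.vstack([cleaned, cleaned[-1:]])
```

The key idea is that broadband muscle activity decorrelates quickly from sample to sample, while neural rhythms are smoother, so thresholding the canonical correlations separates the two.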
This thesis also serves as a proof of concept for studying sentence-level speech production using EEG by demonstrating an effective way of removing EMG artifact while preserving the integrity of neural responses.

Item: Neural speech tracking in quiet and noisy listening environments using naturalistic stimuli (2020-05-10)
Desai, Maansi; Hamilton, Liberty
In noisy situations, speech may be masked by conflicting acoustics, including background noise from the environment or other competing talkers. The process of listening to one stream of sounds while ignoring background noise is referred to as the "cocktail party problem," but its physiological basis remains poorly understood. In this study, we used electroencephalography (EEG) to measure neural responses to a continuous, controlled clean speech stimulus versus speech in naturalistic noise in 17 participants with typical hearing. We employed linear encoding models to assess the degree of neural tracking of specific speech features. These models predict neural activity recorded with EEG from specific acoustic or linguistic features in the speech stimulus over time. The aims of this project were the following: 1) assess the fidelity of neural tracking of speech features using a highly uncontrolled and naturalistic stimulus containing speech-in-noise alongside a clean speech condition; 2) characterize neural responses to acoustic features such as the speech envelope and pitch, along with linguistic features such as phonological features, in both speech-in-noise and speech-alone stimuli; 3) use a cross-prediction analysis to predict the neural responses to a speech-in-noise condition from a clean speech condition, and vice versa. The first two analyses seek to understand which speech features drive brain responses measured from the scalp. The purpose of the third analysis is to understand whether the predictions from our encoding model generalize to different types of stimuli.
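A linear encoding model of the kind this abstract describes can be sketched as a time-lagged ridge regression that predicts each EEG channel from stimulus features. The lag range, regularization strength, and synthetic features below are illustrative assumptions, not the study's actual parameters.

```python
import numpy as np

def lagged_design(stim, lags):
    """Time-lagged design matrix: column blocks are the stimulus features
    shifted by each lag (zero-padded), plus an intercept column."""
    n, f = stim.shape
    X = np.zeros((n, len(lags) * f))
    for i, lag in enumerate(lags):
        X[lag:, i * f:(i + 1) * f] = stim[:n - lag]
    return np.hstack([np.ones((n, 1)), X])

def fit_encoding_model(stim, eeg, lags=range(30), alpha=1.0):
    """Ridge-regression encoding model (a simple temporal receptive field):
    predicts each EEG channel from time-lagged stimulus features.

    stim: (n_samples, n_features); eeg: (n_samples, n_channels).
    Returns weights of shape (1 + n_lags * n_features, n_channels).
    """
    X = lagged_design(stim, lags)
    penalty = alpha * np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(X.T @ X + penalty, X.T @ eeg)

def predict_eeg(stim, weights, lags=range(30)):
    """Predicted neural response for a (possibly held-out) stimulus."""
    return lagged_design(stim, lags) @ weights
```

Cross-prediction in the spirit of the third aim amounts to fitting the weights on one condition's data and calling `predict_eeg` on the other condition's stimulus features, then correlating predictions with the held-out EEG.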
Our results demonstrated that model performance was more robust for the phonological features compared to the acoustic envelope in clean speech conditions, but combining acoustic and phonological features aided tracking of speech in the noisy condition. Our ability to predict neural activity in response to speech sounds was higher when those sounds occurred without background noise. Finally, we predicted responses to the clean speech stimuli based on responses to the noisy speech stimuli, and vice versa. These results have implications for identifying which speech features could be used to build a brain-machine interface or a cognitive hearing aid to identify and separate speech from noise.

Item: Perception of vowel quality in the F2/F3 plane (2002)
Molis, Michelle Renee; Diehl, Randy L.

Item: Perceptions of variation in second-generation Montrealers' speech : methods for remote ethnolinguistic research (2022-07-21)
Adams, Tracey Gail; Bullock, Barbara E.; Villeneuve, Anne-José; Remysen, Wim; Blyth, Carl; Epps, Patience
This dissertation assesses if and how ethnicity plays a role in speech perception amongst French speakers in Montreal, Quebec. Is ethnolinguistic variation present, and is it noticeable to Montrealers? In so doing, this work highlights the conflicting nature of two bodies of work: ethnographic and cultural studies research on immigrant communities in Montreal, and sociolinguistic research on the region. The former underscores the importance of ethnic and cultural heritage in second-generation speakers' self-presentation and speech, while the latter assumes that these same speakers have uniformly assimilated to a regional norm. For this dissertation, I aimed to collect and analyze data to better adjudicate between these hypotheses.
As such, I created a new corpus featuring women from the three largest ethnic/cultural communities in Montreal (Haitian, North African, and Quebecker), and experimented with techniques for running an exploratory perceptual experiment remotely. This study speaks to (i) methods of recruitment for remote sociolinguistic interviews, (ii) methods of conducting experiments online, (iii) techniques used in the free classification approach to perception tasks (Clopper & Pisoni, 2007), (iv) how second-generation Montrealers' speech is perceived, and (v) why disciplines contradict each other with regard to these communities.

Item: Phonetic training for learners of Arabic (2013-08)
Burnham, Kevin Robert; Al-Batal, Mahmoud
This dissertation assesses a new technique intended to improve Arabic learning outcomes by enhancing the ability of learners to perceive a phoneme contrast in Arabic that is notoriously difficult for native speakers of English. Adopting a process approach to foreign language listening comprehension pedagogy, we identify and isolate an important listening subskill, phonemic identification, and develop a methodology for improving that skill. An online training system is implemented that is based on known principles of speech perception and second language speech learning and has previously been used to improve phonemic perception in a laboratory setting. An empirical study investigating the efficacy of the training methodology was conducted with 24 second- and third-year students of Arabic in several intensive Arabic programs at American universities. The contrast under investigation was the Arabic pharyngeal (/ħ/) versus laryngeal (/h/) voiceless fricative. Training participants completed 100 training modules, each consisting of a 24-item minimal pair test featuring the /ħ/-/h/ contrast in word-initial position, for a total of 2400 training trials over 4 weeks.
The training website design was based on the high variability training protocol (Logan, Lively & Pisoni, 1991). The experiment finds significantly greater improvement (F(1,22) = 8.89, p = .007, η² = .288) on a minimal pair test contrasting /ħ/ and /h/ for a group that received approximately 5 hours of phonetic training (n = 10) compared to a control group (n = 14) with no training. Critically, these perceptual improvements were measured with stimuli that were not part of the training set, suggesting language learning and not just stimulus learning. Qualitative data from participants suggested that these perceptual gains were not restricted to the simple minimal pair task, but carried over to listening activities and perhaps even pronunciation. The dissertation concludes with a discussion of phonemic perception and foreign language instruction and the implementation of phonetic training within an Arabic curriculum.

Item: Recognition memory in noise for speech of varying intelligibility (2013-05)
Gilbert, Rachael Celia; Smiljanic, Rajka, 1967-
This study investigated the extent to which noise impacts speech processing of sentences that vary in intelligibility for normal-hearing young adults. Intelligibility and recognition memory in noise were examined for conversational and clear speech sentences recorded in quiet (QS) and in response to environmental noise, i.e., noise-adapted speech (NAS). Results showed that 1) increased intelligibility through conversational-to-clear speech modifications led to improved recognition memory, and 2) NAS presented a more naturalistic speech adaptation to noise compared to QS, leading to more accurate word recognition and better sentence recall. These results demonstrate that acoustic-phonetic modifications implemented in listener-oriented speech enhance speech processing beyond word recognition.
The results are in line with the effortfulness hypothesis (McCoy et al., 2005), which states that speech perception in challenging listening environments requires additional processing resources that might otherwise be available for encoding speech in memory. This resource reallocation may be offset by speaking style adaptations on the part of the talker. In addition to enhanced intelligibility, a substantial improvement in recognition memory can be achieved through speaker adaptations to the environment and to the listener in adverse conditions.

Item: Speech perception in noise with formant enhancement for older listeners (2017-05)
Guan, Jingjing; Liu, C. (Chang), Ph.D.; Champlin, Craig; Smiljanic, Rajka; Campbell, Julia
Degraded speech intelligibility in background noise is a common complaint of listeners with hearing loss. The purpose of the current study is to investigate the effect of spectral enhancement of the second formant frequency (F2) on speech identification in noise for older listeners with hearing loss (HI) and with normal hearing (NH). This study also aims to explore whether F2 enhancement improves speech perception in noise across languages such as American English and Mandarin Chinese. Target words (e.g., color and digit) were selected and presented based on the paradigm of the coordinate response measure (CRM) corpus in its English and Chinese versions. Speech recognition thresholds with original and F2-enhanced speech in two-talker and six-talker babble were examined for English and Chinese listeners with NH and HI. As expected, listeners with NH performed better on speech perception in noise than listeners with HI in almost all listening conditions. More importantly, thresholds of both NH and HI groups improved for enhanced speech signals across languages, primarily in two-talker babble, but not in six-talker babble.
Compared with the NH group, listeners with HI showed significantly greater benefits in the most challenging conditions (e.g., low signal-to-noise ratios). F2-enhancement benefits did not significantly correlate with memory abilities. Moreover, the speech intelligibility index (SII) model accounted for the F2-enhancement benefits for the two language groups in multi-talker babble. That is, the perceptual improvement was mainly associated with the greater availability of acoustic cues in speech with F2 enhancement in the two-talker babble, but not in the six-talker babble. Overall, speech sounds with F2 enhancement may improve listeners' speech perception in noise across different languages, possibly due to a greater amount of speech information being available.

Item: The neural representation of simultaneous speech and music (2019-06-19)
Lowery, Mary; Hamilton, Liberty
Research on neural processing of speech in the presence of other sounds has mostly been limited to studies of the cocktail party problem, in which a target speech signal is superimposed on other speech. The processing of speech in combination with other types of sound, meanwhile, has received little research attention and is poorly understood. In the current study, electroencephalography (EEG) was used to measure listeners' neural responses to stimuli consisting of overlapping segments of speech and different musical instruments. Presented here is a preliminary analysis of these data that indicates differential neural representation of the component sounds in these mixtures.
Possible explanations for this result are discussed, as well as potential future analyses of the data.

Item: The role of L2 experience in L1 phonotactic restructuring in sequential bilinguals (2018-10-10)
Alcorn, Steven Michael; Toribio, Almeida Jacqueline, 1963-; Smiljanic, Rajka, 1967-; Bullock, Barbara; Kelm, Orlando
Languages differ in their phonotactic constraints, that is, the numbers, types, and combinations of sounds that can occur together in different parts of a syllable. In Brazilian Portuguese, for example, no stop consonant may occur at the end of a syllable, and any syllable in violation of this constraint is repaired by inserting an epenthetic /i/. Similarly, word-initial /sC/ clusters are disallowed in Spanish and are modified with a prothetic /e/ before the cluster. These repair processes have been shown to occur in both production and perception. Previous work on L2-L1 cross-linguistic influence in speech has focused primarily on segmental effects. To address the issue of the L2-L1 effect on phonotactics, I examine production and perception of "illegal" consonant clusters in two populations of bilingual speakers: L1 Brazilian Portuguese/L2 English and L1 Spanish/L2 English. Both language pairings exemplify phonotactic constraint mismatches because English allows both syllable-final stops and word-initial /sC/ clusters. Previous work has shown that the phonotactics of both languages are active during speech perception and processing for early bilinguals. If the same is true for late bilinguals, then their L1 performance should show an effect of the less restrictive L2 English system compared to monolingual listeners. Two pairs of studies were conducted, one for each language combination: perception was assessed via a forced-choice non-word identification task, and production data were elicited with a sentence reading task.
The results showed that whereas monolinguals perceptually repaired illegal consonant sequences with an illusory vowel in a vowel detection task, bilinguals were more faithful to the acoustic signal. In production, greater experience with English not only led to target-like production of these sequences in L2 English, but also predicted lower rates of epenthesis in L1 readings. For the Portuguese/English subjects, perception and production in both languages were correlated, with higher accuracy in perception predicting less frequent epenthesis in production. These findings provide novel evidence of the interaction between the L2 and L1 in sequential bilinguals who began learning their L2 in adolescence or adulthood, thus extending previous findings on simultaneous/early L2 learners. The results suggest that even sequential bilinguals can acquire novel phonotactic constraints in an L2, and that this new knowledge modulates L1 performance. The results further suggest that the relationship between the production and perception modalities in bilinguals is not straightforward and may be modulated by language dominance. Implications for models of bilingual speech perception are discussed and directions for future research are suggested.