
Data acquisition followed procedures similar to those used in previous work12,38,106.
Participants
Thirty-four patients (17 female) were implanted with subdural ECoG grids (Integra or PMT) with 4-mm centre-to-centre electrode spacing and 1.17-mm-diameter contacts (recording tens to hundreds of thousands of neurons107,108) as part of their neurosurgical treatment at either University of California, San Francisco (UCSF) or Huashan Hospital. All participants gave informed written consent to participate in the study before experimental testing. Most electrode grids were placed over the lateral surface of a single hemisphere and were centred around the temporal lobe, extending to adjacent cortical areas. The precise location of electrode placement was determined by clinical assessment.
For all participants, age, gender, language background and recorded hemisphere are reported in Extended Data Tables 1–4, where each table corresponds to one participant group: English monolingual, Spanish and Mandarin monolingual, Spanish–English bilingual and others with diverse language backgrounds. The participants included in each set of analyses are listed in Extended Data Table 5. ECoG recordings with the four Mandarin speakers were conducted at Huashan Hospital while patients underwent awake language mapping as part of their surgical brain-tumour treatment. ECoG recordings with all other participants were conducted at UCSF while patients underwent clinical monitoring for seizure activity as part of their surgical treatment for intractable epilepsy. Most participants in this dataset had epileptic seizure foci that were located in deep medial structures (for example, the insula) or the far anterior temporal lobe, outside the main regions of interest in this study, such as the STG. All participants included in this study reported normal hearing and spoken language abilities.
Participant consent
All protocols in the current study were approved by the UCSF Committee on Human Research and by the Huashan Hospital Institutional Review Board of Fudan University. Participants gave informed written consent to take part in the experiments and for their data to be analysed. Informed consent of non-English-speaking participants at UCSF was acquired using a medically certified interpreter platform (Language-Line Solutions), and communication with research staff was facilitated by either in-person or video-call-based interpreters who were fluent in the participant’s native language.
Language questionnaire
All participants were asked to self-report speech comprehension proficiency, age of acquisition and frequency of use for all languages that they were familiar with. Participants who self-identified as Spanish–English bilingual were asked to complete a comprehensive language questionnaire by means of an online Qualtrics survey103.
Neural data acquisition
ECoG signals were recorded with a multichannel PZ5 amplifier connected to an RZ2 digital signal acquisition system (Tucker-Davis Technologies (TDT)) with a sampling rate of 3 kHz. During stimulus presentation, the audio signal was recorded in the TDT circuit and was therefore time-aligned with the ECoG signal. The audio stimulus was also captured with a microphone, and this signal was likewise recorded in the TDT circuit to ensure accurate time alignment.
Data preprocessing
Offline preprocessing of the data included downsampling to 400 Hz, notch-filtering of line noise (at 60 Hz, 120 Hz and 180 Hz for recordings at UCSF and 50 Hz, 100 Hz and 150 Hz for recordings at Huashan Hospital), extracting the analytic amplitude in the high-gamma frequency range (70–150 Hz, HFA) and excluding channels with extended interictal spiking or otherwise noisy activity through manual inspection. The Hilbert transform was used to extract HFA using eight band-pass Gaussian filters with logarithmically increasing centre frequencies (70–150 Hz) and semilogarithmically increasing bandwidths. High-gamma amplitude was calculated as in previous work12,106 as the first principal component of the signal in each electrode across all eight high-gamma bands, using principal components analysis (PCA). Last, the HFA was downsampled to 100 Hz and Z-scored relative to the mean and s.d. of the neural data in the experimental block. The HFA for each sentence was normalized relative to the 0.5 s of silent prestimulus baseline. All subsequent analyses were based on the resulting neural time series.
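As an illustration of the last steps of this pipeline, the following is a minimal Python sketch rather than the MATLAB code used in the study. It rests on several assumptions that the text does not specify: that the eight centre frequencies are evenly spaced on a logarithmic axis with both 70 Hz and 150 Hz included, that the Z-score uses the population s.d., and that baseline normalization means subtracting the mean of the 0.5-s prestimulus window. All function names are hypothetical.

```python
import math
import statistics


def log_spaced_centres(f_lo=70.0, f_hi=150.0, n_bands=8):
    """Centre frequencies evenly spaced on a log axis.

    Assumption: both end points (70 and 150 Hz) are included.
    """
    ratio = (f_hi / f_lo) ** (1.0 / (n_bands - 1))
    return [f_lo * ratio ** k for k in range(n_bands)]


def zscore(block):
    """Z-score one block of HFA samples using the block's own mean and s.d.

    Assumption: population s.d. (the text does not say which estimator).
    """
    mu = statistics.fmean(block)
    sd = statistics.pstdev(block)
    return [(x - mu) / sd for x in block]


def baseline_correct(trial, fs=100, baseline_s=0.5):
    """Subtract the mean of the prestimulus window from one sentence trial.

    Assumption: the trial starts with baseline_s seconds of silence and
    normalization is a simple mean subtraction.
    """
    n_base = int(round(baseline_s * fs))
    base = statistics.fmean(trial[:n_base])
    return [x - base for x in trial]
```

At the 100-Hz rate of the final HFA signal, the 0.5-s baseline corresponds to the first 50 samples of each trial under these assumptions.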
Electrode localization
For anatomical localization, electrode locations were extracted from postimplantation computed tomography scans, coregistered to the patients’ structural magnetic resonance imaging and superimposed on three-dimensional reconstructions of the patients’ cortical surfaces using an in-house imaging pipeline validated in previous work109. FreeSurfer (https://surfer.nmr.mgh.harvard.edu/) was used to create a three-dimensional model of each participant’s pial surfaces, run automatic parcellation to obtain individual anatomical labels and warp the individual participant surfaces into the cvs_avg35_inMNI152 average template.
Stimuli and procedure
All participants passively listened to roughly 30 min of speech in a language they knew (Spanish, English or Mandarin Chinese) and one other language (either Spanish or English). Speech stimuli consisted of a selection of 239 unique Spanish sentences from the DIMEx corpus30,110, spoken by a variety of native Mexican-Spanish speakers; 499 unique English sentences from the TIMIT corpus, spoken by a variety of American English speakers31; and 58 unique Mandarin Chinese paragraphs from the ASCCD corpus from the Chinese Linguistic Data Consortium (www.chineseldc.org/), spoken by a variety of Mandarin Chinese speakers. Although the total duration of stimulus presentation was roughly equal across languages, statistical tests that were conducted at the sentence level across corpora accounted for the differing number of sentences.
We selected these specific speech corpora, originally designed to test speech recognition systems, to meet the following requirements: (1) comprehensively span the phonemic inventories of the languages in question; (2) provide validated phonemic, phonetic and word-level annotations; and (3) include a diverse and wide range of speakers with varying accents. These corpora were not intended to span specific semantic or syntactic structures in the languages or to capture long-range dependencies that extended across multiple sentences. Therefore we did not ask research questions about these specific speech representations.
The contents of each speech corpus were split across five blocks, where each block was roughly 5–7 minutes in duration. In English and Spanish speech stimuli, four blocks contained unique sentences, and a single block contained ten repetitions of ten sentences. In the Mandarin Chinese speech stimuli, four paragraphs were repeated six times, and repetitions were intermixed with unique paragraphs across blocks. Repeated sentences were used for validation of TRF models (see details below). Sentences were presented with an intertrial interval of 0.4–0.5 s in English and Mandarin and 0.8 s in Spanish. Spanish sentences were on average 4.77 s long (range: 2.5–8.03 s), English sentences were on average 3.05 s long (range: 1.99–3.60 s), and Mandarin Chinese sentences were on average 3.16 s long (range: 1.17–11.76 s). Although speech blocks of different languages could be intermixed in the same recording session, each 5- to 7-min block of speech consisted of only one language.
Speech stimuli were presented at a comfortable ambient loudness (about 70 dB) through free-field speakers placed roughly 80 cm in front of the patient’s head. All speech stimuli were presented in the experiment using custom-written MATLAB R2016b scripts (MathWorks, www.mathworks.com). Participants were asked to listen to the stimuli attentively and were free to keep their eyes open or closed during the stimulus presentation. We performed all subsequent data analysis using custom-written MATLAB R2024b scripts.
Electrode selection
Subsequent analyses included all speech-responsive electrodes: electrodes for which at least a contiguous 0.1 s of the neural response (10 samples, 100-Hz sampling rate) time-aligned to the first or last 0.5 s of the spoken sentences was significantly different from that of the prespeech silent baseline. To test for significance, we used a one-way ANOVA F-test at each time point, Bonferroni corrected for multiple comparisons using a threshold of P < 0.01. We included the neural response aligned to the last 0.5 s of each sentence to ensure that we did not exclude electrodes for which there was a significant response time aligned to the end of sentences but not the beginning.
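The contiguity criterion can be sketched in a few lines of Python. This is an illustrative sketch, not the study's MATLAB implementation: it assumes that the per-time-point P values from the F-test are already available and that the Bonferroni correction divides the 0.01 threshold by the number of time points tested, which the text does not state explicitly. The function name and parameters are hypothetical.

```python
def has_significant_run(p_values, alpha=0.01, n_comparisons=None, min_run=10):
    """Return True if at least min_run consecutive time points are significant.

    p_values: one P value per time point (100-Hz samples, so 10 samples = 0.1 s).
    Assumption: Bonferroni correction over n_comparisons, which defaults to
    the number of time points tested.
    """
    if n_comparisons is None:
        n_comparisons = len(p_values)
    threshold = alpha / n_comparisons
    run = 0
    for p in p_values:
        # Extend the current run of significant samples, or reset it.
        run = run + 1 if p < threshold else 0
        if run >= min_run:
            return True
    return False
```

Under this sketch, an electrode would count as speech-responsive if the function returns True for either the sentence-onset window or the sentence-offset window.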
Feature TRF analysis
HFA responses to speech corpora were predicted using standard linear TRF models with various sets of speech features37,111,112. Many of the speech features used were extracted from pre-existing corrected acoustic, phonetic, phonemic and word-level annotations included with these speech corpora. In the feature TRF (F-TRF) models described below, HFA neural responses recorded from a single electrode were predicted as a linear combination of speech features that occurred at most 0.6 s before the current time point:
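$$\mathrm{HFA}(t)=\sum_{f=1}^{F}\sum_{\tau=0}^{T}\beta(\tau,f)\,X(f,t-\tau)+\varepsilon(t)$$
where X(f, t − τ) is the value of speech feature f at lag τ before time t, β(τ, f) is the corresponding regression weight, T = 0.6 s is the maximum lag and ε(t) is the residual.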
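The linear prediction described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' fitting code: the weights are taken as given rather than estimated, samples before the start of the recording are treated as zero, and the lag window is assumed to include both lag 0 and the 0.6-s lag (61 taps at 100 Hz). The names `trf_predict` and `n_lags` are hypothetical.

```python
def n_lags(max_lag_s=0.6, fs=100):
    """Number of lag taps, assuming lags 0..max_lag_s inclusive."""
    return int(round(max_lag_s * fs)) + 1


def trf_predict(features, weights, intercept=0.0):
    """Predict a response as a lagged linear combination of features.

    features: list of F feature time series, each a list of length T.
    weights:  list of F weight vectors; weights[f][lag] multiplies the
              value of feature f that occurred `lag` samples earlier.
    Assumption: feature values before t = 0 are treated as zero.
    """
    n_time = len(features[0])
    pred = []
    for t in range(n_time):
        total = intercept
        for x_f, b_f in zip(features, weights):
            for lag, b in enumerate(b_f):
                if t - lag >= 0:
                    total += b * x_f[t - lag]
        pred.append(total)
    return pred
```

For example, a single feature with one unit impulse produces a prediction that simply traces out that feature's weight vector, starting at the impulse time.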
First published on: www.nature.com
Publication date: 2025-11-19 02:00:00
Author: Ilina Bhaya-Grossman