Questions Generated by Japanese Students of English

Janusz Buda


Most conversational encounters begin with a question. Teaching students to frame clear, concise and relevant questions is a key element in courses of spoken English, especially at the basic and intermediate levels.

The present study examines a corpus of over eight thousand questions generated by students of English at Japanese universities and discusses salient patterns of usage and error.

Common error patterns were direct translation of incompatible Japanese expressions, a weak command of modal verbs and gerund forms, and rote memorisation of inappropriate words and collocations found in Japanese textbooks and dictionaries.

It is hypothesised that mental disassociation may have an adverse effect on English language acquisition among Japanese students.


The data for the present study was drawn from freshman and sophomore courses of spoken English taught by the author at three universities in Japan: Waseda University, Tsuda College, and Yokohama National University.

As preparation for each lesson, students were assigned a discussion topic and asked to submit ten or more starter questions, i.e., questions that would stimulate a dialogue or discussion. Five of these questions were informal personal questions that provided an easy entry to the topic; the remaining five were more general questions designed to open up the discussion. In addition to the questions, students were asked to submit a small sample of relevant data or information that would fuel the discussion. The submitted starter questions were checked and corrected by the instructor, then returned to the students before the lesson.

Before 2002, student assignments were submitted on paper and were handwritten, typed, or word-processed. From 2002 students were encouraged to submit assignments by email. In 2003, after a successful six-month trial period, the use of an online course management system (CMS) was extended to all courses and students were required to submit their assignments via the CMS (Buda, 2006).

The CMS used was Moodle, a free open-source system created by the Australian developer Martin Dougiamas. Accessing the Journal module of the CMS with a standard Web browser, students typed their questions and data into a large text box which could be viewed by the instructor. The instructor then typed comments, corrections, or requests for clarification into a smaller text box underneath. Students were encouraged to access the CMS periodically and check for instructor feedback. On the basis of this feedback students could then amend and improve their questions and data.

Whether received by email or via the Journal module, student submissions were archived for future reference. The email submissions (2002–2003) were saved as text files and the Journal submissions (2003–2006) were stored in a MySQL database. At the conclusion of the 2006 Japanese academic year the total number of archived questions was 9,178.

Creation of the Corpus

These questions were then processed and organised into a corpus for further analysis. This processing of the raw data consisted of several stages.

Reduction of the Raw Data to Plain Text

Students were asked to input their starter questions into the Journal module as simple numbered lists (1. 2. 3. …). Other than this, there were no specific formatting requirements. Despite repeated cautions and explanations, many students either numbered the questions in some other way or chose not to number them at all. Other students used HTML lists formatted internally by Moodle, or lists formatted externally in word processors, then converted into HTML and pasted into the Journal. To compound matters, many students were unaware of the differences between English and Japanese word-processing, and submitted lists that were a mixture of one-byte ASCII alphanumerics and two-byte Shift-JIS or other Japanese encodings.

The first stage in the generation of the corpus was to strip all HTML tags from the raw data. This and most subsequent re-formatting was done using various functions of the BBEdit text editor.
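The study performed this step with BBEdit, and the exact search patterns used are not recorded here. Purely as an illustration, a minimal Python sketch of tag stripping might look as follows. The names TagStripper and strip_html, and the set of tags treated as line breaks, are assumptions introduced for this example. Note also that this sketch decodes character references such as &amp; in the same pass, whereas the study translated HTML entities at a later cleaning stage.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect only the text content of an HTML fragment (illustrative)."""

    # Tags assumed to start a new line of text; not taken from the study.
    BLOCK_TAGS = {"li", "p", "br", "div"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Begin a new line before each block-level element.
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw):
    """Return the plain text of `raw`, one non-empty line per block element."""
    parser = TagStripper()
    parser.feed(raw)
    parser.close()  # flush any buffered text
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    return "\n".join(line for line in lines if line)
```

Inline tags such as <b> are simply dropped, so a question that was partly emphasised by the student remains on a single line.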

Separation of Questions and Data

The plain text data was transferred to a FileMaker database in which each student submission could be displayed as a separate record. In addition to an identifying number, each record contained ten starter questions plus a small amount of researched data, usually interesting statistics relating to the topic under discussion. The next stage in the processing was the removal of this topic data.

Only two instructions were given to students with regard to the topic data: that it should be simple and that it should be in English. No instructions were given with regard to format.

Approximately half the students re-typed data they had found in newspapers, magazines, or on the Web. The other half transferred Web material directly, usually in the form of charts, graphs or tables. This inconsistency in data format made it impractical to use regular expressions to identify the data sections of the submissions, and this task had to be performed manually.

As each submission was displayed within its FileMaker record, the data section was identified visually and then deleted. The starter questions still being in blocks of ten, the number of records that had to be processed in this manner was approximately 900.

Cleaning of Text

The starter questions, minus the topic data, were exported as a standard tab-delimited text file and the text cleaned using BBEdit. Control characters were stripped, HTML entities were translated, and two-byte encodings were converted into ASCII equivalents.
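The three cleaning operations named above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not a record of the BBEdit procedure actually used: the function name clean_text is hypothetical, and Unicode NFKC normalisation is substituted here as one convenient way of folding full-width Japanese forms into their ASCII equivalents. NFKC also alters some other characters (an ellipsis character becomes three periods, for example), so its output would need checking against the raw data.

```python
import html
import re
import unicodedata

# C0 control characters except tab and newline, plus DEL.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def clean_text(text):
    """Translate entities, fold full-width forms to ASCII, drop control chars."""
    text = html.unescape(text)                  # "&amp;" becomes "&"
    text = unicodedata.normalize("NFKC", text)  # full-width letters become ASCII
    return _CONTROL_CHARS.sub("", text)
```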

Punctuation was corrected and standardised, by far the most common problem being a lack of spaces after commas, periods and other punctuation marks, most probably the result of unconscious adoption of Japanese-language keyboard input. Another problem was the insertion of spaces before such punctuation marks. That the latter error is not an artifact of Japanese word-processing habits is evidenced by samples of handwritten English submitted in English, in which some students leave spaces of up to one centimetre between the end of a word and the comma or period following.
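Both spacing problems lend themselves to simple pattern substitution. The following Python sketch is illustrative only; the function name fix_punctuation_spacing and the two patterns are assumptions rather than the expressions used in the study. As written it would also split abbreviations such as "e.g." and Web addresses, which would need an exception list in practice.

```python
import re


def fix_punctuation_spacing(text):
    """Remove spaces before , . ? ! ; : and add one space after them."""
    # "apples ,oranges" -> "apples,oranges": drop whitespace before the mark.
    text = re.sub(r"[ \t]+([,.?!;:])", r"\1", text)
    # "apples,oranges" -> "apples, oranges": add a space when a letter follows.
    # Digits are left alone so that figures such as 3.5 are not split.
    text = re.sub(r"([,.?!;:])(?=[A-Za-z])", r"\1 ", text)
    return text
```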

Linking of Records to Related Data

The cleaned text, still in tab-delimited format, was then exported to a spreadsheet application and the record-identifying numbers were used to look up data from related tables downloaded from the main MySQL database. When this process was completed, each tab-delimited record contained the following information:

  1. One text block of ten starter questions.
  2. Submission identifier.
  3. Journal identifier.
  4. Topic identifier.
  5. Student identifier.
  6. Gender identifier.
  7. Foreign student identifier.
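The lookup described above amounts to joining each text block to related tables by identifier. A minimal Python sketch of the idea follows. The table layouts, field names, and sample values are entirely hypothetical, since the structure of the database tables is not described in this paper; only the seven resulting fields correspond to the list above.

```python
# Hypothetical lookup tables keyed by identifier. In the study the related
# data came from tables downloaded from the MySQL database; the layout and
# values shown here are invented for illustration.
submissions = {101: {"journal_id": 7, "student_id": 42}}
journals = {7: {"topic_id": 3}}
students = {42: {"gender": "F", "foreign": False}}


def link_record(submission_id, text_block):
    """Attach the six identifiers to one text block of starter questions."""
    sub = submissions[submission_id]
    student = students[sub["student_id"]]
    return {
        "questions": text_block,
        "submission_id": submission_id,
        "journal_id": sub["journal_id"],
        "topic_id": journals[sub["journal_id"]]["topic_id"],
        "student_id": sub["student_id"],
        "gender": student["gender"],
        "foreign": student["foreign"],
    }
```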

Fragmentation of Records

The linked data was exported from the spreadsheet application and once again returned to tab-delimited text format. Regular expressions were then used to search for and select each individual question, separate it from the text block, and add to it the relevant identifying information.

At the end of this process each record contained one question and the six items of identifying data listed above.

A comprehensive check of the data file was then performed and any questions that had not been identified correctly by the regular expressions were reformatted manually.

The identifying data was used to check inconsistencies such as skipped questions, e.g., 1. 2. 3. 5. 6. …, and to confirm that such missing questions had not been deleted inadvertently in previous formatting stages.

Finally, list prefixes were stripped.
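The fragmentation, the check for skipped numbers, and the stripping of list prefixes can be sketched together in Python. The regular expression and the function name split_questions are assumptions made for this illustration; the expressions actually used are not given in this paper. Blocks that were not numbered at all yield no records under this sketch, which corresponds to the cases that had to be reformatted manually.

```python
import re

# A list prefix such as "1." or "10)" at the start of a line (assumed format).
# The lookahead keeps a figure like "3.5" from being read as a prefix.
_PREFIX = re.compile(r"^\s*(\d+)[.)](?!\d)\s*", re.MULTILINE)


def split_questions(block, ids):
    """Split a numbered text block into one record per question.

    `ids` is a dict of identifying data copied onto every record.
    Returns (records, missing_numbers).
    """
    matches = list(_PREFIX.finditer(block))
    records, seen = [], []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(block)
        # Text between this prefix and the next, with the prefix itself removed
        # and internal line breaks collapsed to single spaces.
        question = " ".join(block[m.end():end].split())
        if question:
            seen.append(int(m.group(1)))
            records.append({"question": question, **ids})
    # Numbers skipped in the sequence, e.g. 4 in "1. 2. 3. 5. 6."
    missing = sorted(set(range(1, max(seen, default=0) + 1)) - set(seen))
    return records, missing
```

A non-empty list of missing numbers flags a submission for manual comparison with the raw data, to confirm that the gap was present in the original and was not introduced during formatting.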

Addition of Other Records

The processing of data from the CMS having been completed, the last stage in the creation of the corpus was the addition of questions submitted by email. Most of these questions were from the second stage of data collection, in which students were asked to submit assignments by email, and the first six months of the third stage, in which use of the CMS was not mandatory.

The email submissions required significantly less formatting than their CMS counterparts, as topic data was usually submitted separately in the form of attached files.

Email submissions were cleaned, formatted, and added to the corpus with the following identifiers:

  1. Submission identifier.
  2. Course identifier.
  3. Topic identifier.

Spelling Check

At the beginning of the text-cleaning process, the intention was to facilitate future text searches by correcting spelling mistakes and standardising US/UK orthography. During the early stages of text cleaning, however, it became clear that some of the spelling mistakes were related to other errors or ambiguities in the questions. It was decided not to correct spelling mistakes, but a spelling check was performed to locate run-on typing mistakes such as:

Whatis the important when I study economics?
Do you want towork in advertising company?

At the end of this process, the corpus contained 8,674 questions, 504 having been discarded for reasons of duplication (two or more identical or near-identical questions submitted by the same student), indecipherable encoding, or incompleteness. Questions in the last category were probably the result of students postponing completion of a question until they had referred to a dictionary or textbook for help, then forgetting the existence of the incomplete question at the time of submission. Examples of the first category are:

	What vegetable do you like?
	What fruits do you like?
	What vegetable don't you like?
	What fruits don't you like?

Limitations of the Corpus

The resulting corpus of 8,674 questions provides a valuable resource for a study of conversational English questions generated by Japanese students.

Several limitations should, however, be noted. The questions were not collected for the purpose of subsequent statistical analysis, hence no attempt was made to control the sampling or formatting. The questions represent the final versions of submissions. Some questions incorporate suggestions offered by the instructor. Other questions required no correction: the final version is identical to the first. Yet other questions required correction but students chose to ignore the suggestions offered by the instructor. In some cases this was because students did not bother to check for instructor feedback after submitting an assignment. In other cases students did check the feedback but seemed incapable of identifying the differences between original and corrected versions, a phenomenon that is discussed in a later section of this paper.

Instructor feedback consisted of:

Because of these limitations, few direct extrapolations can be made from the corpus. It cannot be said, for example, that since 24% of the questions in the corpus contain a specific pattern, that pattern is characteristic of 24% of questions formed by Japanese students of English. The corpus can, however, furnish multiple examples and numerous variations of typical question patterns generated by students of English in Japan.

In the following section several such patterns will be illustrated with examples from the corpus.


General Observations

Over 60% of the questions in the corpus begin with What (37.9%), Do (13.5%), or How (10.4%). Of the standard interrogatives, What is the most common, followed by How, Which (5.5%), Why (5.1%), and When (3.2%). Where and Who are the least common, accounting for only 1.5% and 0.5% of questions respectively.

Initial word     No.    % of corpus
What           3,288    37.9%
Do             1,167    13.5%
How              901    10.4%
Which            479     5.5%
Why              446     5.1%
Have             347     4.0%
When             279     3.2%
If               239     2.8%
Are              182     2.1%
Is               178     2.1%
Where            134     1.5%
Please           104     1.2%
Can               94     1.1%
Did               75     0.9%
Should            49     0.6%
Who               45     0.5%
Does              41     0.5%
Will              36     0.4%
Could             17     0.2%
Would             14     0.2%
Although          13     0.1%
Was               11     0.1%
Were               7     0.1%
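A frequency table of this kind can be produced by counting the first word of each question and dividing by the size of the corpus. The Python sketch below is illustrative; the function name initial_word_table is hypothetical, and the decision to count a contraction such as What's under What is an assumption about how the figures were grouped.

```python
import re
from collections import Counter

# Leading alphabetic run of a question, so that "What's" counts as "What"
# (an assumption about how contractions were grouped).
_FIRST_WORD = re.compile(r"[^A-Za-z]*([A-Za-z]+)")


def initial_word_table(questions):
    """Return (word, count, percent of all questions) rows, most frequent first."""
    firsts = Counter()
    for q in questions:
        m = _FIRST_WORD.match(q)
        if m:
            firsts[m.group(1).capitalize()] += 1
    total = len(questions)
    return [(w, n, round(100 * n / total, 1)) for w, n in firsts.most_common()]
```

With the figures reported here, 3,288 What questions out of 8,674 gives 100 × 3288 / 8674 ≈ 37.9%.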


The What + <verb> construct accounts for 75% of What questions; What + <noun> for 25%.

In the What + <verb> construct, the most common combinations were What + do (35.5%) and What + is (23.9%).

In the What + <noun> construct, What + kind + of was predominant (259 of 833 = 31.1%). The frequency of other common nouns was linked to the assignment topics.

Opening          No.    % of What questions
What do        1,166    35.5%
What is          785    23.9%
What kind        259     7.9%
What are         121     3.7%
What should       90     2.7%
What's            70     2.1%
What did          62     1.9%
What was          44     1.3%
What will         34     1.0%


13.5% of questions in the corpus begin with Do. The variants Does and Did (41 and 75 questions respectively) account for an additional 1.3%. In the overwhelming majority (93.7%) of Do questions, Do is followed by you. The following table lists the most common verbs in this construct.

Opening          No.    %
Do you think     307    27.0%
Do you have      235    20.7%
Do you like      107     9.4%
Do you want       96     8.5%
Do you know       67     5.9%


10.4% of questions in the corpus begin with How and then continue as follows:

Opening          No.    % of How questions
How many         235    26.1%
How do           181    20.1%
How much         113    12.5%
How often         87     9.7%
How long          84     9.3%
How can           28     3.1%
How did           25     2.8%

The 20.1% figure for How + do requires comment. In the first two or three assignments a significant number of students attempted to frame opinion questions in the form How do you think + <noun phrase> or How do you think + <that clause>, as in:

How do you think the necessity of study?
How do you think that the freeter increase?

A variation of this was How do you feel:

How do you feel we can't eat a “gyuudonn”?
How do you feel foreigners eat a sushi?
How do you feel an African's hunger?

In addition to individual feedback via the CMS Journal module, repeated explanations were given in class and students were encouraged to use the What do you think about + <noun/gerund> construct.

These suggestions were, for the most part, adopted. The student penchant for think + <noun phrase> constructs is, however, reflected in the persistence of hybrid forms such as:

What do you think the English education in Japan?


The occurrence of modal auxiliary verbs in the present corpus was as follows:

Modal      No.    % of corpus    % of modals
should     288    3.3%           29.0%
will       254    2.9%           25.6%
can        205    2.4%           20.6%
would      172    2.0%           17.3%
could       40    0.5%            4.0%
must        16    0.2%            1.6%
may         13    0.1%            1.3%
shall        5    0.1%            0.5%
might        0    0.0%            0.0%
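The two percentage columns in this table use different denominators: the first is the share of all 8,674 questions in the corpus, the second the share of all modal occurrences listed. The short Python check below recomputes both from the counts given. It is offered only as a worked verification of the arithmetic, and it assumes that each count can be divided directly by the number of questions.

```python
CORPUS_SIZE = 8674  # questions in the corpus, as stated in this paper

# Counts copied from the table of modal auxiliary verbs.
modal_counts = {"should": 288, "will": 254, "can": 205, "would": 172,
                "could": 40, "must": 16, "may": 13, "shall": 5, "might": 0}

modal_total = sum(modal_counts.values())  # total modal occurrences listed

for modal, n in modal_counts.items():
    of_corpus = 100 * n / CORPUS_SIZE   # first percentage column
    of_modals = 100 * n / modal_total   # second percentage column
    print(f"{modal:<7}{n:>5}{of_corpus:>7.1f}%{of_modals:>7.1f}%")
```

The listed counts sum to 993, so that should, for example, accounts for 288 / 993 ≈ 29.0% of modal occurrences but only 288 / 8674 ≈ 3.3% of the corpus.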

These figures differ strikingly from equivalent figures in other corpora, such as Jean Claude Viel’s study of Modal Auxiliary Verbs in E.S.T. (Viel, 2002):

Modal      No.    %        per 1K
can        940    37.5%    4.7
would      457    18.2%    2.3
will       429    17.1%    2.1
may        286    11.4%    1.4
should     153     6.1%    0.8
must       194     7.7%    1.0
shall       20     0.8%    0.1

It should be noted, however, that Viel’s corpus is compiled for the most part from examples of written English, whereas the present corpus is limited to conversational questions.

The predominance of should is probably related to topic content. Of the ten questions students were asked to submit for each assignment, five were supposed to be general questions. In this context, use of should constructs to elicit opinions on actions and situations is understandable.


The frequency of modal verbs in the present corpus is partly related to the use of conditionals. The corpus contains 354 questions using if. Exactly half (177) used the modals can/could, should or will/would. None used the modals must, might, or shall.

Although 119 questions used modals indicative of hypothetical conditionals (could, should, would), most of these were a consequence of instructor feedback. As with the How do you think that construct mentioned earlier, individual feedback had to be reinforced with repeated in-class clarification of if constructs.

Despite the feedback and clarification, if constructs remained an intractable difficulty for students. Irrespective of intended meaning, a majority of students chose a simple if + <simple present>, what + <simple present> pattern, as in:

If you are copy writer, what copy do you write?
If we haven't got food imports, what do you do?
If you make your homepage, what do you want to list?

Or, unsure of the difference between real and hypothetical conditionals, students generated hybrid patterns such as:

If wheat disappears, which food would Westerners eat?
If you have advertised yourself as prime minister, how would you do?


Many of the ambiguous sentences in the corpus would appear to be the result of attempts to translate Japanese phrases into English, with or without the help of a dictionary.

For example, the corpus contains 66 examples of constructs using care. From the context it can be assumed that the intention was to find an equivalent for the Japanese expressions ki o tsukeru or chūi suru.

What do you care with customer?
When you attend the class, what do you care about it?
Do you take care of your meal?

Variants using attention were also found:

What should we pay attention to traveling?
What do you pay attention about your health?

Examples of constructs combining care/attention with point(s) appear to confirm the link to a Japanese expression:

What is your care points when you see advertisements?



In this preliminary study, no attempt was made to evaluate and categorise questions according to topic relevance, grammatical accuracy, or semantic integrity. For such an evaluation to be meaningful, it would have to identify and synthesise a large number of tangential factors — a task for which the author is unqualified.

However, when the questions in the corpus are examined from the limited viewpoint of communication, it is clear that most native or near-native speakers of English would find them awkward, unnatural, ambiguous, or incomprehensible.

The above analysis provides a glimpse of some of the most common patterns and mistakes made by students, but attempting to identify the cause of these choices or mistakes has proved far from easy.

Difficulties with articles or number are to be expected among students whose mother tongue either does not utilise or does not require such elements. However, most questions in the corpus contain more than one error of vocabulary, grammar or collocation, rendering it difficult if not impossible to assess whether one mistake triggered, and was therefore linked to, the others, or whether each was generated independently.

Asking the students themselves — in class, via email, or through the CMS — how and why they composed a particular question proved singularly fruitless. Some students (a tiny minority), alerted by such queries to the existence of problems in their questions, were able to recognise the problems and correct them without further guidance, but without being able to explain the cause of the original mistake or ambiguity.

The only exception to this almost complete absence of feedback came from students who maintained that they had found the relevant word or phrase in a Japanese-English dictionary. Investigation of the source invariably revealed that students had chosen the first item in a list of entries without giving consideration to subsequent entries or, indeed, examples of usage. [Note: In a significant number of cases, the references in the Japanese-English dictionaries proved to be at fault, exhibiting clear indications that they were taken from sources written by non-native speakers of English.]

Even in the absence of substantive feedback from students, most ESL instructors familiar with the Japanese language would be able to identify many of the dubious questions as direct translations from Japanese or influenced by Japanese usage. This tendency by students to think in Japanese and then translate the internalised questions by dividing them into words and phrases, looking up the English equivalents in a dictionary, and then re-combining them into a question, is probably the prime cause of some of the most grotesque examples in the corpus.

Careless use of dictionaries, especially compact electronic dictionaries that display only a few lines of text, thereby discouraging users from scrolling through multiple screens (Buda, 1991, 1993), and unthinking direct translation from Japanese are certainly two significant causes for the broken English generated by Japanese students. It must not be forgotten, however, that the students creating such questions or sentences have received at least seven or eight years of systematic English education at the secondary and tertiary levels, and should have acquired some sense of what is lexically, grammatically, or semantically acceptable in English.

That many — perhaps a majority — have not done so would indicate prima facie that the English education system in Japan has not succeeded in familiarising students with standard communicative English.

Japanese English

It has been noted earlier that, to a native speaker of English, most of the questions in the corpus are flawed. In a future study, it is planned to ask a representative sample of experienced English instructors to evaluate the questions in the corpus for acceptability using, if possible, a control group of native speakers unfamiliar with the Japanese language and Japanese usage of English. Without the data from such a controlled study, a quantification of the level and characteristics of Japanese English must rely on two problematic measures: the results of objective standard tests of English and subjective evaluation by ESL instructors for whom English is the mother tongue or who possess native-speaker fluency in the language.

Objective Tests

For many years, scores obtained in international tests of English such as TOEFL have placed Japan at or near the bottom of international rankings. These rankings have generated controversy and debate. Some educators, both Japanese and non-Japanese, have seized on these rankings as evidence of the poor quality of English language education in Japan. Others have countered by pointing out that the number of students taking the TOEFL and similar exams in Japan far exceeds comparable numbers in other countries, thereby making any direct comparison statistically invalid (Reedy, 2000).

The case for cautious handling of international comparisons is a strong one, but the data from Educational Testing Service and organisations administering other, less widespread, international tests of English do seem to indicate that, putting aside aggregate scores, separate assessment of the four skills of listening, speaking, reading and writing does not support the widespread contention held by Japanese that, whilst their listening and speaking ability may be poor, they are strong at reading and writing. This mistaken perception may have a prosaic basis: a short conversation or interview with a native speaker, or a real-time test of spoken English such as PhonePass (Buda, 2005), will soon reveal weaknesses in the first two skills, but objective measurement of reading speed and comprehension is not part of the average Japanese secondary school English curriculum, resulting in a lack of evaluatory feedback. In this respect, adoption of an objective measure of reading difficulty such as the Lexile Framework could prove of great value in the improvement of reading skills by matching individual ability with suitable reading materials (Schnick & Knickelbine, 2000). In a future study it is hoped to examine the possibility of adapting the Lexile Framework to English education in Japan.

An objective system of measuring Japanese writing ability may never be possible, and the mistakes made by Japanese students of English do not exhibit the characteristics of a consistent and definable variety of English. The prospects for improvement in this area are bleak, as demonstrated by the conservative and unoriginal nature of most Japanese writing textbooks. Although such textbooks provide sound instruction in the principles of correct writing, the preponderance of mechanical and unimaginative exercises would indicate that the teachers of this subject at junior-high and high-school level lack confidence in their ability to evaluate extended passages of original written English and make suggestions for improvement.

Linguistic Dissonance

Viewed objectively and subjectively, the corpus as a whole indicates a distinct proclivity for Japanese students of English to favour certain question forms and use them incorrectly. Several reasons for this tendency may be proposed.

The most obvious is the unconscious application of Japanese structures to English. Another is the incautious use of faux amis. This assertion may come as a surprise to readers unfamiliar with the eclectic nature of the Japanese language, which has enthusiastically adopted and often adapted words from a wide variety of languages including Chinese, English, French and German. Historically, this adoption was stimulated by the importation of foreign learning, but in more recent times cultural intercourse would appear to be the driving factor.

Many of these adopted words are used with only a minimal or fragile understanding of their original meaning, and the relatively short life-cycle of many of these adopted words would underline their origins in the cross-fertilisation of rapidly-changing popular cultures. One example would be the popularity, in the period 2002–2005, of the English word ‘boom’, used mistakenly in questions such as:

What is your boom now?

Which, interpreted loosely, is probably equivalent to

What are you interested in these days?

More recently, the word ‘staff’ has enjoyed wide popularity, probably as a result of appearing on the T-shirts or arm-bands of employees at concerts and sports events. This word has been picked up by a new generation of Japanese students of English and appears in constructs such as:

How many staffs are there in your part-time job?
I am a staff in a restaurant.

The prolific use of loan words in Japanese is one of the characteristics of the language, and patterns of use and misuse have been identified and discussed by generations of Japanese and foreign scholars (Ishiwata, 1983; Hirowatari, 2009; et al.). Such loan words serve many purposes: to express concepts or identify objects for which no Japanese equivalent exists, to create group-specific jargon that serves to bond members of that group, or simply to display familiarity with foreign languages. Whatever the purpose, once a foreign word has assumed the status of a loan word, it becomes a valid, albeit often ephemeral, element of Japanese vocabulary. It is only when attempts are made to transplant these words back into English that serious misunderstanding results.

Although no attempt was made to ascertain the linguistic background of the students generating the questions in this study, as far as the author is aware, none were bilingual in Japanese and English. It is possible that a few were bilingual in Japanese and Korean or Chinese.

Recent studies in bilingualism do, however, offer intriguing insights into some of the linguistic interference evident in the corpus questions. Hernandez, Li, and MacWhinney (2005) hypothesise that, whereas infant bilinguals are able to separate their two languages completely, with growth and development children begin to recognise semantic similarities, some of which facilitate further language acquisition, while others may interfere with this process.

Areas of Weakness

The corpus shows that Japanese students are particularly weak in the use of articles and in maintaining number agreement between subjects and verbs. As noted earlier, this tendency is not surprising. What is surprising is its resilience, not even Japanese who would normally be classified as fluent in English being immune. Of particular note is the inability of both beginners and experts to maintain gender consistency. In, for example, a discussion of a female subject, many Japanese speakers will suddenly switch to ‘he-him-his’, then revert to ‘she-her’ without exhibiting any sign of noticing the anomaly. As with the previous two weaknesses, gender inconsistency is evident across the full range of English-language abilities and can be observed not only among students but also among Japanese teachers of English.

A puzzling phenomenon links several of the weaknesses discussed thus far: the inability of many Japanese users of English to differentiate between correct and incorrect English and consequently to notice mistakes made by themselves or others. In an earlier section of this paper, it was noted that, in preparation for discussion classes, many students submitted incorrect questions in spite of suggestions (or even corrections) by the instructor. This lack of reaction is more clearly evident in assignments for English writing classes. Even when presented with supplementary materials listing incorrect sentences generated by members of the class, and even after having been informed that each sentence contained at least one serious mistake (in some cases as many as four or five), a majority of students will find approximately 10%–20% of the mistakes, though not necessarily the most serious, whilst a significant minority will find none, even after intensive reference to electronic dictionaries.

If a parallel may be drawn to music, such ‘tone deafness’ to irregularities in English is difficult to explain. A firm command of basic English grammar and a working vocabulary of several thousand English words — the averred goal of secondary school English education in Japan — should equip students at the university level with the linguistic skills to generate acceptable English questions and spot mistakes made by themselves and others.

English Education in Japan 1

English education in Japanese elementary, junior high and high schools is conducted in accordance with guidelines laid down by the Ministry of Education, Culture, Sports, Science and Technology, formerly known as the Ministry of Education (Kitao, Kitao, Nozawa, & Yamamoto, 1985). For convenience, the older and simpler nomenclature will be used in this paper. In 1980, in spite of widespread protests from English language teachers and English education experts throughout Japan (Kitao, 1982), the Ministry decreed that English education in junior high schools should be reduced from four classes per week to three. Other guidelines limited the number of English words to be taught, leaving significant gaps between the vocabulary acquired in junior high school and that required in high school and later for university entrance examinations. Japanese publishers rushed to revise their English textbooks in accordance with the guidelines by deleting major sections, replacing them with photographs, illustrations and ultra-wide margins.

Although most schools, especially those in the private sector, attempted to alleviate the negative effects of the new guidelines with supplementary instruction, English instructors at the tertiary level noticed a drastic lessening of English competency among the first batch of university entrants to have completed their secondary English education under the new Ministry guidelines. Many universities were forced to lower admission requirements or consider setting up remedial English courses.

Subsequent periodic revisions of the Ministry of Education guidelines (MEXT, 2003) have resulted in corresponding revisions of textbooks. From the subjective point of view of a university instructor, each of these periodic revisions appears to have effected changes in the English used by students.

The current study made no attempt to compare questions in the corpus with model sentences in junior- and high-school textbooks and, as noted previously, students seemed unable to recall the material contained in those books, but the cyclical appearance and disappearance of certain patterns would indicate a correlation to the revisions mentioned above. These revisions, it should be repeated, follow Ministry of Education guidelines and receive Ministry approval.

To give one example, for a period of approximately 4–5 years in the early 2000s, many students favoured the use of tag questions in which long declarative statements were followed by brief interrogatives:

Many Japanese schools forbid students to work part-time work at all. How did your school do?

Instead of the more natural:

Did your high school prohibit part-time jobs?

Another generation of secondary school students displayed a penchant for sentences beginning with among, as in:

Among Japanese singers, who do you like the best?

avoiding the simpler and more natural:

Who's your favorite Japanese singer?

It is highly unlikely that hundreds of thousands of students simultaneously developed a preference for tag questions or an aversion to the use of favourite; more likely that corresponding constructs were included in one or more best-selling English textbooks and then drummed into students’ memories. This correlation between textbooks and student question preferences merits further investigation.

English Education in Japan 2

English education in Japan relies heavily on following a carefully structured curriculum emphasising a progression from basic to advanced and simple to difficult, each of these levels defined, at least in outline and principle, by the Ministry of Education. Conversely, foreign textbooks of English that give precedence to common communicative patterns, irrespective of grammatical complexity, and which seek to inculcate the four essential skills in an integrated manner, have been adopted by only a small number of schools, almost all outside the Ministry-controlled system.

Although the current Ministry of Education guidelines pay lip service to the acquisition of communicative skills, especially those required to promote Japanese culture to the outside world (Kubota, 2002), this approach is incompatible with the traditional step-by-step approach still prevalent in Japanese schools.

Some extremely common and essential conversational gambits are surprisingly difficult to define grammatically and are thus not included in Ministry-approved textbooks. Over three decades, the author has been asked to write sentences, questions, dialogues and essays for Japanese textbooks. Use of the most natural expression for a specific situation has often been rejected at the editorial stage with the explanation that the vocabulary or grammar in that expression would not be introduced to students until a later stage of their English education. Until then, it would be necessary to replace the most natural expression with an unwieldy substitute cobbled together from previously-acquired vocabulary and grammar. In many cases, constraints of time result in the more natural expression never being introduced to students.

It would not be an exaggeration to state that the questions generated by Japanese students of English included in the present corpus exhibit marked evidence of incomplete patterns learned under difficult conditions in junior high and high schools, and never unlearned through intensive exposure to natural spoken or written English.

Second-Language Acquisition

If we set aside theories of a universal proto-language, the origins of second- or multi-language acquisition are probably contemporaneous with the birth of language itself, an acquisition stimulated by migrations during the hunter-gatherer period of pre-history, and by cultural and mercantile exchange during the later transition to agricultural and urban civilisations. One of the earliest accounts of difficulties in language acquisition is perhaps the first language proficiency test in history: that of Judges 12:5–6, which describes the disastrous consequences of incorrect pronunciation.

Although countless generations of polyglots must have experienced the full gamut of second-language acquisition problems, it was not until the emergence of linguistics and psychology as scientific disciplines that the first attempts were made to identify and explain in a systematic manner the most common errors and the strategies used to deal with them. Schegloff, Jefferson, and Sacks’ seminal The Preference for Self-Correction in the Organization of Repair in Conversation (1977) was followed by Willem J. M. Levelt’s Monitoring and Self-Repair in Speech (1983), which remains required reading for many undergraduate psycholinguistics courses and has stimulated research into self-repair among Japanese students of English (Nagano, 1997).

Linguists and psychologists have hypothesised numerous related phenomena such as structural priming (Bock & Griffin, 2000; Kaschak & Borreggine, 2008), monitoring (Kormos, 2000), source-monitoring (Mitchell & Johnson, 2009) and phonological competition (Barker, 2001). Such studies have expanded to encompass difficulties in English-language acquisition experienced by students with linguistically-unrelated mother tongues (Nakano et al., 2005; Liu, 2009).

Studies of Language Acquisition

Earlier in this study, reference was made to the inability of most students to recall where, when and how they had encountered and absorbed a particular question pattern. Reference was also made to the inability to identify or even recognise the errors made. This lack of feedback is not altogether unexpected: most mental processes, language being one of them, take place at a subconscious level and analyses of the end results of such processes are unable to take into account all contributory variables. Hence the tendency for studies in this field to rely on extremely small numbers of subjects and strictly-controlled replicable environments that bear little resemblance to those in which everyday language activities take place.

One of the thorniest aspects of the field of Cognitive Science is the search for manipulable and measurable variables. Surely this problem plagues all research endeavors to one degree or another, but the study of cognitive processes presents a particularly difficult case. First and foremost, the objects of our inquiry are not available for direct observation. Mental representations of information about the world and the processes that manipulate this information cannot be directly observed … Not only are mental processes not directly observable, they also have the misfortune of residing within people. And unlike atoms or photons, people are under no obligation to sit around and allow us to study them. And even if they may agree to do so on occasion, people have that pesky attribute of consciousness which can cause them to become distracted, bored, willfully deceptive, or even with the best of intentions go about performing a given task in a different way than they would if they were not in an experimental setting (Barker, 2001, pp. 13–14).

Barker’s comments about ‘manipulable and measurable’ variables are of particular relevance to the current study, in which the questions that compose the corpus were created in an environment completely different from that in which they were used. It is not unreasonable to assume that most, if not all, of the questions were written by students working unsupervised and alone, without the vocalisation, auditory noise, time constraints, peer pressure, or other factors that accompany the generation of questions in dialogues or discussions.

The frustration felt by many researchers in this field has been articulated by John F. Schumaker:

As a psychologist, I have often wondered why my chosen profession had not made more obvious contributions to our embarrassingly small pool of knowledge about human behavior. Unfortunately, what we see is a once promising field of study that remains tangled in a barren wasteland of contrived and recycled theories about human thought, feeling, and action.… The fact remains that the disconcerting numbers of inconsistent and mutually incompatible theories on any one topic are irritating, if not downright exasperating, to the dedicated student of the human condition (1990, pp. 1–2).

Significant advances in other disciplines do, however, offer ways of observing mental processes. Neuroscience has come a long way from Wilder Penfield’s revolutionary yet highly intrusive mapping of brain areas with electrical probes inserted into the living brain (Rose, 1976; Blakemore, 1977; Kandel, Schwartz, & Jessell, 2000). Sixty years on, cognitive neuroscientists can avail themselves of numerous non-intrusive tools, each generation of which produces more precise and diverse information.

…newly developed imaging techniques now allow us to watch the brain in action. Computed tomography (CT) scans and magnetic resonance imaging (MRI) provide anatomical images of the brain, while positron emission tomography (PET) scans show the chemical and metabolic activities of the brain’s tissues. Functional MRI is a recent innovation that enables us to see the mind at work. It shows what happens in the brain when a person is listening to music, reading, speaking, or thinking (Winston & Wilson, 2004, pp. 146–147).

Computer-assisted electroencephalography (EEG) offers high temporal resolution of electrical brain activity whilst functional MRI (fMRI) provides high spatial resolution of cerebral activity during the performance of specific mental tasks (Mitchell & Johnson, 2009). Such research has, hitherto, focused on mental activities of immediate medical or social concern, such as memory loss in Alzheimer’s disease (Becker & Overman, 2002), memory alteration and manipulation in witness testimony (Bergström, Anderson, Buda, Simons, & Richardson-Klavehn, 2010), plagiarism (Stark, Perfect, & Newstead, 2005), and reality monitoring (Simons, Henson, Gilbert, & Fletcher, 2008; Buda, Fornito, Bergström, & Simons, 2010). Many of the discoveries in this field offer objective data for hypothesising some of the processes taking place in language learning dysfunctions. Unfortunately none of them, singly or collectively, has come close to offering a convincing general theory of second language acquisition. Moreover, to the slippery variables that so often confound traditional linguistic and psychological experimentation is added the exasperating complexity and interrelatedness of the human nervous system:

…though the neural substrate that allows us to acquire language is innate, we learn the sound pattern, words, and syntax of particular languages. Nor are the mental operations carried out by our brains compartmentalized in the manner proposed by most linguists and many cognitive scientists. The correct model for the functional organization of the human brain is not that offered by “modular” theorists such as Steven Pinker (1994, 1998) — a set of petty bureaucrats each of which controls a behavior and won’t have anything to do with one another. The neural bases of human language are intertwined with other aspects of cognition, motor control, and emotion.

Neither the anatomy nor the physiology of the FLS [functional language system] can be specified with certainty given our current limited knowledge (Lieberman, 2000, p. 2).

Gallistel and King (2009) have taken an even more pessimistic view of current studies into learning and memory:

…learning is the extraction from experience of behaviorally useful information, while memory is the mechanism by which information is carried forward in time in a computationally accessible form. On this view, it is hard to see how one could have a single hypothesis about the physical basis of both learning and memory. The physical basis for one could not be the physical basis for the other. One could know with certainty what the mechanism was that carried information forward in time in a computationally accessible form, but have no idea what the mechanism was that extracted a particular piece of information from some class of experience (p. 279).

Although the synaptic plasticity model first suggested by Hebb in 1949 has been confirmed by decades of experimental research, Gallistel and King maintain that its relevance to memory mechanisms remains unproved: 

If a memory mechanism is understood to be a mechanism that carries information forward in time in a computationally accessible form, then the first and most basic property that a proposed mechanism must possess is the ability to carry information. The synaptic plasticity hypothesis in any of its historically recognizable forms fails this first test. There is no way to use this mechanism in order to encode the values of variables in a computationally accessible form. That is why whenever the need arises in neural network modeling to carry values forward in time—and the need arises almost everywhere—recourse is had to reverberating activity loops. That is also why one can search in vain through the vast literature on the neurobiological mechanisms of memory for any discussion of the coding question. The question “How could one encode a number using changes in synaptic conductances?” has, so far as we know, never even been posed. And yet, if our characterization of the nature of memory is correct, then this is the very first question that should arise whenever suggestions are entertained about the physical identity of memory in the brain (p. 279).

If we replace ‘encode a number’ with ‘encode a word’, the relevance of Gallistel and King’s observations to language acquisition and retrieval becomes depressingly clear.

Obstacles to Language Retention

This discussion of the question patterns found in the present corpus ends with a highly speculative suggestion — one presented with much trepidation. Although the following remarks apply to written English, they may provide a hint or clue to an additional factor in some of the mistakes made by students of English when forming conversational questions.

The author has noticed that many university students, when copying material written on the blackboard or whiteboard, or when transcribing English text from handouts or textbooks, are unable to do so accurately. Words or phrases are cut, added, or altered, and spelling mistakes abound. Why should, for example, a simple word such as ‘message’ be copied as ‘massage’, ‘for any inconvenience’ as ‘for inconvenience’, and names such as ‘John Smith’ as ‘Jhon Smith’, even though the original is only a few centimetres away from the target? Some students are even incapable of writing their own name correctly (e.g. ‘jiro Tnka’). One reason may lie in the kind of internal transformation mentioned in the overview to this discussion. English words are read, stored in short-term (‘working’) memory, then transferred to the written copy.

In immediate serial recall, performance level is greater for lists of phonologically dissimilar stimuli, as compared to similar. This suggests that verbal information held in the STS [short-term store] is coded phonologically. The presence of the effect with both auditory and visual input indicates that written material also gains access to the phonological STS when immediate retention is required (Vallar & Papagno, 2002, p. 251).

It is likely that some kind of transformation takes place whilst the words are being stored in, and then retrieved from, short-term memory. In the case of ‘message-massage’ the transformation is probably related to Japanese pronunciation in a phonological loop (Baddeley, 1999). The omission of ‘any’ in the second example may be related to internal repetition of an unfamiliar phrase causing impairment in short-term storage, as posited in Papagno and Vallar’s study of the phonological similarity effect (1992). The author is unable to offer any explanation for the inability of some students to differentiate between upper- and lower-case letters and write their own name correctly.

A similar internal transformation phenomenon can be observed in oral acquisition. During Web-based listening lessons conducted by the author in CALL rooms (Chen, Belkada, & Okamoto, 2004), no matter how many times a student replays a recorded word or phrase, and no matter how many times the instructor attempts to help by repeating the word clearly and slowly, the light-bulb moment of sudden comprehension occurs only when the student switches to an internalised use of the Japanese katakana pronunciation and intonation of the word.

Leaving aside the deleterious effect of reliance on the katakana syllabary to represent English pronunciation, this effect accounts for only a small proportion of the transcription errors mentioned above. It is highly unlikely that Japanese students of English, as a group, suffer from a genetic predisposition to aberrations in short-term memory.

Before introducing the tentative hypothesis mentioned at the beginning of this section, it may be helpful to define the role of short-term memory in new language acquisition:

The main system involved in the immediate retention of verbal material comprises two components: a phonological STS and a rehearsal process. The STS is an input store, which provides the main retention capacity. Rehearsal is based on systems primarily concerned with speech production. Its main function is to revive the phonological memory trace, preventing its decay, and to convey visually presented material to the phonological STS. In addition to immediate retention per se, a main general role of phonological short-term memory concerns the acquisition of new phonological material, such as unfamiliar sound sequences (new words in a native or foreign language). This system is also involved in certain aspects of speech comprehension. Phonological memory has not only specific functional properties but also specific neural correlates, viz. posterior-inferior parietal and premotor frontal neural networks in the left hemisphere (Vallar & Papagno, 2002, p. 266).

This system does not, however, always function smoothly and accurately, a trait shared by many other human mental systems, which evolution appears to have blessed (or cursed) with an innate error-generating mechanism: a complex, fuzzy system of duplication and redundancy which occasionally results in aberrant decisions and actions. These aberrations are what makes us human, fallible and, most significantly, creative — a ‘Jester in the Machine’ as opposed to a ‘Ghost’ (Ryle, 1949; Koestler, 1967; Dixon, 1987).

One such fly-in-the-ointment to the dominant working-memory model proposed by Alan Baddeley (2001) is divided attention:

By focusing on the various component processes required for encoding and retrieval, we have been able to account for the asymmetric effects of DA [divided attention] on memory. DA at encoding leads to a relatively larger interference effect than DA at retrieval, and the magnitude of that effect does not depend on the material specificity of the concurrent task. At encoding, formation of the memory trace requires conscious apprehension of the material. Any concurrent task that diverts resources necessary for conscious apprehension of that material prevents it from being encoded and becoming part of a memory trace, leading to very poor memory (Fernandes & Moscovitch, 2000, p. 175).

Another potential impediment to language retention is dissociation. It is here that Schumaker’s hypothesis of the role of dissociation in mental activity affords an intriguing key to the apparent ineffectiveness of Japanese English education:

Every minute of every day, we are flooded with information inputs, the accumulated effect of which would be to overwhelm us and place a large drain on our nervous systems. There is a very important need for a means by which to escape, even temporarily, from certain situations that are deemed nonessential. In doing this, we conserve ourselves neurologically and preserve our energies for more demanding efforts. An example of dissociation for purposes of economy of effort might be “highway hypnosis.” After driving many miles with no other stimulation than white lines in the road, people fade into a mild dissociative trance. Once they return from this state, people often claim that they have no memory of long stretches of their trip. Some even express a bit of worry that they were not in control during the time of the dissociation. Of course, these people were in control as evidenced by the fact that they arrived safely and without incident, still another example of the simultaneous knowing and not knowing that is an aspect of dissociation.

There are many other times that we use dissociation in order to go into “neutral” or “automatic pilot” and thereby conserve ourselves neurologically. Boring lectures, tedious staff meetings, noisy children—these and countless other circumstances might lead people to employ dissociation in a form that does not involve subsequent suggestions to guide the dissociation in meaningful ways (Schumaker, 1995, p. 60).

This long quotation from Schumaker’s The Corruption of Reality: A Unified Theory of Religion, Hypnosis, and Psychopathology, taken out of context, could be interpreted as an encomium to dissociation as an aid to efficient multi-tasking. The opposite is true: Schumaker hypothesises that light-trance states permit the absorption of erroneous, irrational, and contradictory data and ideas, which are stored in long-term memory and which can lead to an over-reliance on the ‘automatic’ neural network as opposed to the ‘controlled’.

Schumaker is referring in particular to the adoption of irrational ideas by otherwise rational people who seem either unable or disinclined to examine the fundamental contradictions in their religious or political beliefs.

However, his postulation that certain states of mind can permit incoming data to by-pass the usual filters of rationality — the conscious process that identifies and categorises new information before deciding whether to commit it to long-term memory — is also applicable to language acquisition in certain psychological environments.

Schumaker found that his statistics courses were extremely effective at inducing dissociation in his students. It could be that the boring, mechanical, and irrelevant nature of much of Japanese English education, coupled with the frenetic memorisation of thousands of Japanese-English word pairs with little or no consideration for collocation or context (pace commercial study aids), is in part responsible for the passive acceptance of presented material and, conversely, the reluctance or inability to examine critically the English so acquired.

It is perhaps no coincidence that, in informal surveys conducted by the author, of the approximately 50% of students who stated that they disliked or hated English, more than 70% identified the first year of high school as the starting-point of their aversion.

The Future of the Corpus

Although analysis of the current corpus was carried out on material gathered up to July 2006, the collection of English questions generated by Japanese students has continued. These additional questions have not yet been incorporated into the main corpus, so no firm figure can be given for the current total. The number is, however, well over 10,000. Possible uses of the corpus are suggested in this paper, all requiring the cooperation of colleagues with skills in areas such as linguistics, psycholinguistics, statistics, cognitive psychology and neuroscience. Such cooperation is warmly invited.


Baddeley, A. D. (1999). Essentials of human memory. Hove, England: Psychology Press.

Baddeley, A. D. (2001). Is working memory still working? American Psychologist, 56, 849–864.

Barker, J. E. (2001). Semantic and phonological competition in the language production system. The University of Arizona, Tucson, AZ.

Becker, J. T., & Overman, A. A. (2002). The memory deficit in Alzheimer’s disease. In A. D. Baddeley, M. D. Kopelman, & B. A. Wilson (Eds.), The handbook of memory disorders (2nd ed., pp. 269–589). New York: J. Wiley.

Bergström, Z., Anderson, M., Buda, M., Simons, J. S., & Richardson-Klavehn, A. (2010). Concealing guilty knowledge by retrieval suppression in an ERP memory detection test. Paper presented at Human Brain Mapping 2010, Barcelona.

Blakemore, C. (1977). Mechanics of the mind. Cambridge: Cambridge University Press.

Bock, K., & Griffin, Z. M. (2000). The persistence of structural priming: Transient activation or implicit learning? Journal of Experimental Psychology: General, 129(2), 177–192.

Buda, J. (1991). Electronic network communication. Otsuma Women’s University Annual Report: Humanities and Social Sciences (XXIII), 73–90.

Buda, J. (1993). The formatting of network messages. The Cultural Review, 4, 69–94.

Buda, J. (2005). Phonepass at the Waseda University School of Commerce. Paper presented at Ordinate, Menlo Park CA.

Buda, J. (2006). Course management systems. The Cultural Review, 28, 49–73.

Buda, M., Fornito, A., Bergström, Z., & Simons, J. S. (2010). The influence of paracingulate sulcus variability on reality monitoring. Paper presented at Recognition Memory Symposium, University of Bristol.

Chen, J., Belkada, S., & Okamoto, T. (2004). How a web-based course facilitates acquisition of English for academic purposes. Language Learning & Technology, 8(2), 33–49.

Dixon, N. F. (1987). Our own worst enemy. London: J. Cape.

Fernandes, M. A., & Moscovitch, M. (2000). Divided attention and memory: Evidence of substantial interference effects at retrieval and encoding. Journal of Experimental Psychology: General, 129(2), 155–176.

Gallistel, C. R., & King, A. P. (2009). Memory and the computational brain: Why cognitive science will transform neuroscience. Chichester, West Sussex, UK: Wiley-Blackwell.

Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory. New York: Wiley.

Hernandez, A., Li, P., & MacWhinney, B. (2005). The emergence of competing modules in bilingualism. Trends in Cognitive Sciences, 9(5), 220–225.

Hirowatari, T. (2009). Machigaidarake no katakana eigo. Tokyo: Gakken Shinsho.

Ishiwata, T. (1983). Gairaigo to eigo no tanima. Tokyo: Akiyama Shoten.

Kandel, E. R., Schwartz, J. H., & Jessell, T. M. (2000). Principles of neural science (4th ed.). New York: McGraw-Hill, Health Professions Division.

Kaschak, M. P., & Borreggine, K. L. (2008). Is long-term structural priming affected by patterns of experience with individual verbs? Journal of Memory and Language, 58, 862–878.

Kensinger, E. A., & Schacter, D. (2006). Reality monitoring and memory distortion: Effects of negative, arousing content. Memory and Cognition, 34(2), 251–260.

Kitao, K. (1982). JALT participates in tenth ‘kaizenkon’. The Language Teacher, 6(2).

Kitao, S. K., Kitao, K., Nozawa, K., & Yamamoto, M. (1985). Teaching English in Japan. Retrieved 1 June, 2010, from

Kormos, J. (2000). The role of attention in monitoring second language speech production. Language Learning, 50(2), 343–384.

Kubota, R. (2002). The impact of globalization on language teaching in Japan. In D. Block & D. Cameron (Eds.), Globalization and language teaching (pp. 13–28). London: Routledge.

Levelt, W. J. M. (1983). Monitoring and self-repair in speech. Cognition, 14, 41–104.

Liu, J. (2009). Self-repair in oral production by intermediate Chinese learners of English. TESL-EJ, 13(1), 1–15.

MEXT: Ministry of Education, Culture, Sports, Science and Technology. (2003). The course of study for foreign languages. Retrieved 1 June, 2010, from

Mitchell, K. J., & Johnson, M. K. (2009). Source monitoring 15 years later: What have we learned from fMRI about the neural mechanisms of source memory? Psychological Bulletin, 135(4), 638–677.

Nagano, R. L. (1997). Self-repair of Japanese speakers of English: A preliminary comparison with a study by W. J. M. Levelt. Bulletin of Science and Humanities, 65–90.

Nakano, M., Owada, K., Oya, M., Ueda, N., Yamazaki, T., Tsutui, E., et al. (2005). Japanese English at the university level: Dysfluency analysis in spoken English. World English. Retrieved 1 June, 2010, from

Papagno, C., & Vallar, G. (1992). Phonological short-term memory and the learning of novel words: The effect of phonological similarity and item length. The Quarterly Journal of Experimental Psychology Section A, 44(1), 47–67.

Perfect, T. J., Field, I., & Jones, R. (2009). Source credibility and idea improvement have independent effects on unconscious plagiarism errors in recall and generate-new tasks. Journal of Experimental Psychology: Learning, Memory and Cognition, 35(1), 267–274.

Reedy, S. M. (2000). TOEFL scores in Japan: Much ado about nothing. Retrieved 1 June, 2010, from

Rose, S. P. R. (1976). The conscious brain (Updated ed.). New York: Vintage Books.

Schegloff, E. A., Jefferson, G., & Sacks, H. (1977). The preference for self-correction in the organization of repair in conversation. Language, 53(2), 361–382.

Schnick, T., & Knickelbine, M. (2000). The Lexile Framework: An introduction for educators. Durham, NC: MetaMetrics.

Schumaker, J. F. (1990). Wings of illusion: The origin, nature, and future of paranormal belief. Buffalo, N.Y.: Prometheus Books.

Schumaker, J. F. (1995). The corruption of reality: A unified theory of religion, hypnosis, and psychopathology. Amherst, N.Y.: Prometheus Books.

Simons, J. S., Henson, R. N. A., Gilbert, S. J., & Fletcher, P. C. (2008). Separable forms of reality monitoring supported by anterior prefrontal cortex. Journal of Cognitive Neuroscience, 20(3), 447–457.

Stark, L.-J., Perfect, T. J., & Newstead, S. E. (2005). When elaboration leads to appropriation: Unconscious plagiarism in a creative task. Memory, 13(6), 561–573.

Vallar, G., & Papagno, C. (2002). Neuropsychological impairments of verbal short-term memory. In A. D. Baddeley, M. D. Kopelman, & B. A. Wilson (Eds.), The handbook of memory disorders (2nd ed., pp. 249–271). New York: J. Wiley.

Viel, J.-C. (2002). An overview of modal auxiliary verbs in E.S.T. Retrieved 1 June, 2010, from

Winston, R. M. L., & Wilson, D. E. (2004). Human. London; New York: DK Publishing.