The SILS Learner Corpus of English

School of International Liberal Studies at Waseda University

What is the Corpus?
The SILS Learner Corpus of English is a collection of essays by students at SILS, the School of International Liberal Studies at Waseda University.

All SILS students take one to three semesters of writing in English as part of their required classes. The essays submitted for these courses are being collected into a database (the corpus) which can be studied for the purpose of understanding the development of SILS students' English writing skills and for creating textbooks and other teaching materials.

Special Features of the SILS Corpus
There are many English learner corpus projects around the world (see, for example, the list of learner corpus links on Professor Tono's page), including some created in Japan. Like these, the SILS corpus provides valuable data for researchers studying the process of learning a second language and for teachers who want to create materials which will help their students. A distinctive feature of this corpus, however, is that the students come from an exceptionally wide variety of backgrounds. Currently, the majority of SILS students are Japanese; most of them completed their education in Japan, but a sizeable group attended schools outside of Japan, some in English-speaking countries. The rest of the SILS students come from countries around the world: many from Asian countries, but also a few from Europe, North America and elsewhere. There are even a few students whose native language is English. All of these students are studying together in the same undergraduate school, taking the same writing classes, and working on the same assignments, which means that we can use the essays in this corpus to examine the effects of native language and educational background on writing skills in English.

Another interesting feature of this corpus is that we will be able to collect many essays from each individual student; in some cases, we'll have three semesters' worth of data from the same student. This means we'll be able to look at the development of students' writing over time. (Of course, the students will not be identified by name, but researchers will be able to access information on individual students' language background and educational history.)

Finally, we plan to collect both first drafts and second drafts of essays. In some cases, we'll also have the teachers' comments on the drafts. This should be a valuable resource for studying the process of revision in writing.

At this time, there is no plan to code the essays for learners' errors. Instead, we plan to use standard concordancing software such as MonoConc and WordSmith Tools to analyze the corpus data.
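
For readers unfamiliar with concordancing, the following is a minimal sketch of a keyword-in-context (KWIC) search, the basic operation such software performs. It is an illustration only, not how MonoConc or WordSmith Tools work internally; the function name `kwic`, the `width` parameter, and the sample sentences are all hypothetical and are not taken from the SILS corpus.

```python
import re

def kwic(texts, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`."""
    # Whole-word, case-insensitive match; re.escape guards special characters.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            # Up to `width` characters of context on each side of the match.
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Hypothetical sample sentences (assumption: NOT real corpus data).
sample_essays = [
    "I think that studying abroad is important. However, I think it is expensive.",
    "Many people think about their future after graduation.",
]

for line in kwic(sample_essays, "think"):
    print(line)
```

Lining up every occurrence of a word with its surrounding context in this way makes recurring patterns (for example, which words typically follow "think") easy to spot without any error coding.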

Schedule for the Project

Stage 1 - December 2004 to July 2005: Survey of other corpus projects; preliminary plans for the SILS corpus; getting approval from writing program directors; purchasing and setting up equipment and software; designing the background information questionnaires. [Completed]

Stage 2 - August 2005 to March 2006: Creating the database program; collecting and inputting student background data and essays for six classes (more than 100 students, with 3 to 4 essays per student) to test the program. [Completed]

Stage 3 - April 2006 to March 2007: Collecting and processing a full year of student essays (more than 600 students in the spring term and more than 300 in the second term, with 3 to 4 essays each). [Nearly completed]

Stage 4 - From April 2007: Continuing to collect and process student essays each year, while starting to analyze the data already collected.

For More Information
Contact Victoria Muehleisen, Assoc. Professor, School of International Liberal Studies, Waseda University.

More about Learner Corpora
(coming soon!)

Bibliography of Japanese articles on Learner Corpora
Related Links
This page was created on November 13, 2005, and last updated on February 24, 2007.