Part II: WQI Protocols
Content Analysis and Training Guidelines and Protocols
- Section 1: Overview of Content Analysis Protocols
- Section 2: Wiki Sampling
- Section 3: Content Analysis Work Flow
- Section 4: Evaluating Wiki Edit Histories
- Section 5: Ethical Concerns in Online Content Analysis and IRB Considerations
- Section 6: Overall Ratings
- Section 7: Training Protocols
- Section 8: Conclusion
Section 1: Overview of Content Analysis Protocols
The wikis that we study are extremely diverse. They are used in settings ranging from elementary school through high school, in nearly every subject area imaginable, and for a wide variety of educational purposes. They range in size and complexity from a single page with no revisions to wikis with hundreds of pages revised thousands of times. Accurately characterizing the activity on wikis is very challenging work. In this section, we present the strategies that we currently use to meet these challenges. Readers of some of our published articles may find discrepancies between the procedures described here and our published procedures; the protocols listed here reflect our most recent, most refined thinking.
In our first round of coding, researchers evaluate the demographic features of a wiki, which we treat as time-invariant. Two coders evaluate each wiki. First, they determine the wiki’s eligibility for our studies by confirming that the wiki is visible (not private, deleted, or unchanged), used in the United States, and used in K-12 settings. The same coders then determine the subject area(s), grade level(s), and hosting school(s), district(s) or other site(s) of the wiki. A third coder reconciles disagreements. We then provide this information to two additional coders who evaluate wiki quality. (We attempt to have the same people who evaluated wiki demography evaluate wiki quality, since they can often do the quality coding more quickly than someone who needs to examine a wiki for the first time. Sometimes the timing does not work out—some people work faster or have more time than others—so this is a preference rather than a rule.)
To evaluate wiki quality, we use the Wiki Quality Instrument (WQI). The WQI contains 24 dichotomous items that probe for the presence or absence of behaviors on the wiki that provide opportunities for students to develop 21st century skills. These 24 items are in five subdomains: Information Consumption (2 items), Participation (4 items), Expert Thinking (5 items), Complex Communication (7 items), and New Media Literacy (6 items). We measure wiki quality at days 1, 7, 14, 30, 60, 100, and 400. Two coders evaluate wiki quality for each wiki at each of these time points. A third coder reconciles disagreements.
To determine a composite wiki quality score, we sum the values of the 24 WQI items. To determine subdomain wiki quality scores, we sum the values of the items in each of the five subdomains. These scores are then used as outcome measures in our analyses.
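The scoring rule above can be sketched in a few lines of Python. The subdomain names and item layout here are illustrative placeholders, not the WQI's actual field names:

```python
# Item counts per WQI subdomain (2 + 4 + 5 + 7 + 6 = 24 items).
SUBDOMAINS = {
    "information_consumption": 2,
    "participation": 4,
    "expert_thinking": 5,
    "complex_communication": 7,
    "new_media_literacy": 6,
}

def score_wiki(item_codes):
    """item_codes maps subdomain name -> list of 0/1 item codes.
    Returns the composite score and the five subdomain scores."""
    subdomain_scores = {name: sum(codes) for name, codes in item_codes.items()}
    composite = sum(subdomain_scores.values())
    return composite, subdomain_scores

# Example: a wiki with one Participation item and two Complex
# Communication items present at this occasion of measurement.
codes = {
    "information_consumption": [0, 0],
    "participation": [1, 0, 0, 0],
    "expert_thinking": [0] * 5,
    "complex_communication": [1, 1, 0, 0, 0, 0, 0],
    "new_media_literacy": [0] * 6,
}
composite, sub = score_wiki(codes)
print(composite)  # 3
```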
Section 2: Wiki Sampling
We have gathered three wiki samples from PBworks.com, each using different strategies.
In our first sample, we sought to evaluate a representative sample of all wikis created from the founding of PBworks in June of 2005 through August of 2008. PBworks provided us with a list of the URLs for all the publicly-viewable, education-related wikis. We assigned each of these 179,851 wikis an ID number, and we used random.org to draw a non-repeating series of 1,799 numbers from the larger set. We then analyzed this smaller sample.
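The draw itself is a simple sample without replacement. As an illustration of the procedure (the actual draw used random.org, not Python), it amounts to:

```python
import random

population_size = 179_851  # publicly-viewable, education-related wikis
sample_size = 1_799

ids = range(1, population_size + 1)       # one ID number per wiki
sample = random.sample(ids, sample_size)  # non-repeating random draw
```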
In our second sample, we sought the ability to identify a representative sample of wikis very close to their creation date so that we could survey wiki creators soon after they started their wiki. To do this, in September of 2010, we had PBworks send us a list of all recently-created, publicly-viewable, education-related wiki URLs every two weeks. These wiki creators received an automated survey solicitation, generated by PBworks, as soon as they signed up for a wiki. We then sampled 500 wikis at random, every two weeks, for eight weeks (giving us a sample of 2,000 wikis out of approximately 20,000 publicly-viewable, education-related wikis created during this period). As soon as we received the list, we then sent each wiki a follow-up solicitation to participate in our survey.
In our third sample, we gathered a convenience sample of wiki creators who had completed our survey. All PBworks educational wiki creators from June 2010 through December of 2010 received an automated survey solicitation, which included the option to provide us with their URL so we could include them in our study. Through our surveys we received 510 URLs. Since we do not know precisely how many U.S., K-12 wikis were created during this period, it is impossible to know the response rate precisely, but it is low, probably below 10%. If the shared URLs were publicly viewable, they were included in our study. If the URLs we received were for private wikis, we sent a solicitation inviting the host to join our study by adding us to their wiki community.
Section 3: Content Analysis Work Flow
Coding thousands of wikis at multiple time points with multiple raters presents a set of complicated logistical challenges. Through experimenting with a variety of protocols and processes, we have settled on a set of strategies to manage these challenges. In this section we describe how we organize our wikis and conduct our demographic and quality analyses.
First, when we receive the “population lists” of wikis from PBworks, we assign every wiki a unique ID, which we call the DCLCID. This identification number is essential; it allows us to have multiple researchers simultaneously analyze different aspects of a wiki, and then bring these analyses together in a common dataset. After assigning DCLCIDs to wikis and drawing samples from the population, we organize lists of sampled wikis into spreadsheets for further analysis.
Organizing Coding Activities with Spreadsheets
We organize our coding activities around a large set of short Excel worksheets. Each sheet is designed so that it includes only those wikis and wiki time periods that a coder is expected to work on, only the information that she needs for the task, and only those blank columns that she needs to complete. (We experimented with having large sheets and assigning coders to work on only parts of them. For instance, we created worksheets with 500 rows and would assign a coder to complete rows 101-200. This led to errors, with coders completing the wrong rows, so we settled on a system where coders only received one sheet at a time that included exactly the amount of work they were supposed to complete.) Most of our coders are master’s students who work 8-10 hours per week, so sheets are designed to take 1-2 weeks to complete. In the sections below, we describe how we create and organize these various spreadsheets to conduct our coding exercises. Examples are available from the first author.
Wiki Coding Round 1: Demographic Coding
In the first round of demographic coding, we divide our wikis into sets of 100. We create a series of spreadsheets where each sheet contains 100 rows, each corresponding to one wiki. Each sheet has several pre-populated columns of information, such as the URL, the creation date supplied by PBworks, and the DCLCID. Each sheet also has one column header for each of the demographic questions in the WQI. While some of these items require text answers (such as a short narrative of the wiki’s purpose or the name of the wiki’s hosting school), most are dichotomous items (does the wiki serve K-5 students? 6-8? 9-12?). Therefore, we can easily validate the data of these dichotomous items by ensuring that all fields include only 1s or 0s, with no blanks or other characters. The list of items and decision rules for each item can be found in the Wiki Quality Instrument. Each sheet is given to two different coders for the initial round of demographic analysis.
We treat the demographic information for each wiki as time-invariant. Thus, we do not track the exact time at which students begin participating in a wiki created by an educator; we simply record the wiki as having student participants. This means that on our demographic coding sheets, each row corresponds to a single wiki.
Identifying demographic information on a wiki often involves a certain amount of “detective” work. In some wikis, all of the demographic information is very clearly presented. For instance, a teacher might create a wiki for her students and indicate the course the wiki is used in, the school she works at, the grade level of her students, and so forth. In these circumstances, identifying the demographic characteristics of the wiki is quite simple. In many cases, however, this analysis is much less straightforward.
For instance, we encounter a large proportion of wikis with very few changes. We still hope to understand as much as we can about these wikis, so we can compare more and less successful wiki learning environments. Therefore, we invest quite a bit of time in examining whatever snippets of content are available to discover as much as we can about each wiki. For instance, teachers often use their real names as their PBworks user IDs, so sometimes we can search for these names online and identify the school where a teacher works and the classes he or she teaches. This works better with less common names. Often, teachers and students put their email addresses on a wiki, which can help us identify where they are from and what subjects the wiki is meant to support. Sometimes teachers will have links to other online learning environments, like their blogs or class websites, that allow coders to identify users and schools. Teachers sometimes use their school initials in the wiki URL, so if we find the initials CRLS in a URL, we can search for that combination of letters and find that there is a school called Cambridge Rindge and Latin in Cambridge, Massachusetts. Whenever possible, we use multiple sources of information like this to triangulate and corroborate our findings.
For experienced research assistants, conducting the demographic coding on a spreadsheet containing 100 wikis typically takes between 5 and 12 hours. In our budgeting and time planning, we assume that demographic coding takes about 8 hours per sheet of 100 wikis, or approximately 5 minutes per wiki. Many wikis can be coded very quickly, in a minute or two. Wikis created in foreign countries or in higher education settings are not eligible for our study; these can be coded very quickly. Some wikis contain content which clearly labels the activity on the wiki, or contain almost no content at all, making identification impossible; these are also simple to code. These are balanced by wikis with many pages or with limited information that can require as many as 30 minutes to examine and explore.
Wiki Coding Round Two: Demographic Coding Reconcile
Since two coders complete each demographic coding task, we have a third rater reconcile disagreements. To reconcile the two original worksheets, we take each completed demographic sheet and copy the data into a new Excel workbook. One sheet is named CODER1, and the second sheet is named CODER2. We then create a third sheet which we use to reconcile disagreements.
For all of the columns that record the values of dichotomous items (grade level, subject, users), we use a series of IF functions in columns to identify disagreements. For instance, assume that column B records whether or not a wiki is visible, with cells coded “0” for not visible (set to private or deleted) or “1” for visible. For cell B2 on the reconcile worksheet, we use the formula IF(CODER1!B2=CODER2!B2, CODER1!B2, “X”). This formula reconciles the B2 columns from both coder sheets such that if two coders agree on a rating, the rating stands. If they disagree, it is marked with an X.
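Outside of Excel, the same rule can be expressed as a short comparison. This is a sketch of the logic, not part of our actual workflow:

```python
# Reconcile rule from the Excel IF formula: if the two coders agree,
# keep the shared rating; otherwise flag the cell "X" for the third rater.
def reconcile(coder1, coder2):
    """coder1/coder2 are parallel lists of 0/1 codes for one column."""
    return [a if a == b else "X" for a, b in zip(coder1, coder2)]

coder1 = [1, 0, 1, 1]
coder2 = [1, 1, 1, 0]
print(reconcile(coder1, coder2))  # [1, 'X', 1, 'X']
```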
For all of the columns that record text values, such as the wiki narratives and the names of hosting institutions, we simply transfer this information onto the third reconcile sheet. The narratives are not reconciled. They are there for the reference of future coders. The school, district, and/or other hosting site information is also transferred to the third reconcile sheet, and an additional set of columns are generated to be completed by the third rater to create a final reconciled list of hosting institutions.
This third reconcile sheet is generated entirely using Excel formulas, so to create a copy to send to the third rater, we copy the entirety of the third sheet (comprised of formulas) onto a fourth sheet using the “Paste Values” option, so the fourth sheet is comprised entirely of values. We then use conditional formatting to mark all Xs with a red background. This is then sent to the third rater, who is tasked with turning all of the red Xs into 1s and 0s and with determining the final list of hosting institutions.
The time required for demographic reconciling varies considerably, both with the complexity of the wikis in any particular sheet and with the degree of difference between the two raters. Typically, in a set of 100 wikis, over 50 will have perfect agreement between the two raters (mostly on wikis that do not meet criteria for the study), so the third rater is only examining about half of the wikis. Another substantial portion of wikis, typically about 33, will have only a small number of disagreements. The remaining 17 wikis will have more substantial disagreements. Experienced coders typically take approximately 3 hours to reconcile a sheet. Thus, if the initial two raters take approximately 8 hours each to code the demographic questions from the WQI, and a third rater takes approximately 3 hours to reconcile, then each set of 100 wikis takes approximately 20 worker-hours for the demographic coding. A sample of 2,000 wikis might therefore take about 400 worker-hours. In our group, with 10 researchers working about 8 hours a week at $16/hour, that would take approximately five weeks to complete and cost about $6,400.
Once reconciled, this sheet will consist of 100 rows, each representing a wiki, where each row records demographic information about the wiki. We then use the filtering option to remove from the set all wikis that are not eligible for our study (private, deleted, not U.S., not K-12). Typically, approximately 60% of wiki URLs in a random sample of publicly-viewable, education-related wikis will prove to be ineligible for our study. This filtered set of eligible wikis is saved and prepared for quality coding.
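In Excel this filtering is done with the filter option; expressed in code, with hypothetical field names for the reconciled 0/1 flags, the eligibility test amounts to:

```python
# Each reconciled row as a dict of 0/1 flags (field names illustrative).
rows = [
    {"dclcid": 101, "visible": 1, "us": 1, "k12": 1},
    {"dclcid": 102, "visible": 0, "us": 1, "k12": 1},  # private or deleted
    {"dclcid": 103, "visible": 1, "us": 0, "k12": 1},  # not U.S.
]

# Keep only wikis that are visible, U.S.-based, and K-12.
eligible = [r for r in rows if r["visible"] and r["us"] and r["k12"]]
print([r["dclcid"] for r in eligible])  # [101]
```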
Wiki Coding Round 3: Quality Coding
We believe that quality is a time-varying feature of wiki learning environments, and therefore we evaluate wikis at multiple time points. In our most recent studies, we evaluate wikis at 1, 7, 14, 30, 60, 100, and 400 days. We provide a more complete description of our reasons for choosing these time points in Part IV: Developing the WQI Protocols, but we summarize our rationale here. We know from prior research that the median lifetime of wikis in U.S. public schools is about 13 days, that on average most wiki activity happens within the first weeks after a wiki’s creation, and that on average wiki quality growth is greatest within the first week. Since capturing wiki quality growth in the first two weeks of a wiki’s lifetime is vital, we take measurements with the WQI at days 1, 7, and 14. In our most recent studies, we also have taken measurements at days 30 and 60, which represent roughly two and four times the median lifetime, and at day 100, which is approximately the duration of one semester. If we needed to reduce the number of measurements in order to save costs, we would probably eliminate the measurement at day 60 first and then at day 30. We also take measurements at day 400, since we know that many wikis experience a surge of activity after approximately one year.
To generate our quality coding sheets, we take our completed demography sheets and use a SAS routine to generate one row for each occasion of measurement for each wiki. The SAS routine takes the wiki’s creation date and adds 1 day for the day-one measure, 7 days for the day-seven measure, and so forth. Thus, if a coder is assigned to evaluate a wiki created on Feb. 1 at its day-seven occasion, she will know to evaluate all edits made through Feb. 8. Since most reconciled demography sheets contain approximately 40 eligible wikis, if we conduct an analysis that includes seven occasions of measurement, our quality sheets will typically have approximately 280 rows.
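The date arithmetic performed by the SAS routine can be sketched in Python (the creation date below is illustrative):

```python
from datetime import date, timedelta

OCCASIONS = [1, 7, 14, 30, 60, 100, 400]

def measurement_dates(created):
    """One cutoff date per occasion of measurement: creation date
    plus 1, 7, 14, ... days, mirroring the SAS routine."""
    return {day: created + timedelta(days=day) for day in OCCASIONS}

# A wiki created on Feb. 1: its day-seven cutoff falls on Feb. 8.
dates = measurement_dates(date(2011, 2, 1))
print(dates[7])  # 2011-02-08
```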
To code the quality sheets, coders first examine each wiki and determine the last day it was changed. The wiki is then coded for all occasions of measurement that capture changes in the wiki, up to day 400. For instance, if a wiki’s last change is on its 16th day, it will be coded for days 1, 7, 14, and 30. If a wiki’s last change is on its 99th day, it will be coded on days 1, 7, 14, 30, 60, and 100. If its last change is on its 645th day, it will be coded on days 1, 7, 14, 30, 60, 100, and 400. The quality coding sheets that we generate have a row for every occasion of measurement, and coders manually determine which rows are to be completed and which are to be left blank. On average, we code approximately 4 time periods per wiki. Thus, on a sheet with 40 wikis, we are likely to code approximately 160 rows.
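The rule for which rows to complete can be stated compactly: code every occasion through the first one that falls on or after the wiki's last change (capped at day 400). A sketch:

```python
OCCASIONS = [1, 7, 14, 30, 60, 100, 400]

def occasions_to_code(last_change_day):
    """Return the occasions of measurement to code for a wiki whose
    last change fell on last_change_day."""
    coded = []
    for day in OCCASIONS:
        coded.append(day)
        if day >= last_change_day:
            break  # this occasion already captures the last change
    return coded

print(occasions_to_code(16))   # [1, 7, 14, 30]
print(occasions_to_code(99))   # [1, 7, 14, 30, 60, 100]
print(occasions_to_code(645))  # [1, 7, 14, 30, 60, 100, 400]
```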
At each occasion of measurement we identify behaviors on the wiki that provide opportunities for students to develop 21st century skills that have occurred up to the occasion of measurement. Therefore, when we evaluate wiki quality at day 14, we evaluate all opportunities for students to develop 21st century skills up through day 14. This means that any of our quality items coded “1” at day 1 will also be coded “1” at all subsequent days. In our coding scheme, opportunities for 21st century skill development cannot disappear. This means that wiki quality scores are monotonic.
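The carry-forward rule can be stated (or checked) in a few lines; this is an illustration of the monotonicity constraint, not a tool we use in coding:

```python
# Opportunities for skill development cannot disappear: once an item
# is coded "1" at an occasion, it stays "1" at every later occasion.
def enforce_monotonic(codes_by_occasion):
    """codes_by_occasion: one item's 0/1 codes in chronological order."""
    carried, out = 0, []
    for code in codes_by_occasion:
        carried = max(carried, code)
        out.append(carried)
    return out

print(enforce_monotonic([0, 1, 0, 0]))  # [0, 1, 1, 1]
```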
Coders use the WQI to identify the presence or absence of 24 types of behaviors that commonly occur on wikis that provide opportunities for students to develop 21st century skills. The behaviors are detailed in the decision rules of the quality items for the WQI.
In order to be able to evaluate how wiki usage changes over time, our coders must be able to evaluate every revision to every page and file on the wiki. In the next section, we detail the strategies that our coders use to conduct this historical analysis.
In addition to coding the 24 quality items of the WQI, our research assistants subjectively rate wikis based on the degree to which they feel that the wiki provides opportunities for participation, expert thinking, complex communication, and new media literacy development. These subjective ratings are never reconciled. They are described further in Section 6: Overall Ratings.
In terms of timing, most coders can code approximately 15 occasions of measurement in one hour, on average (again, some wikis take only a few seconds, and others can take hours). Out of 100 randomly sampled wikis, approximately 40 will be eligible for our study. Each wiki will require on average 4 time periods to be coded, meaning that we typically need to code 160 rows. If coders complete approximately 15 time periods in one hour, then the typical sheet takes between 10 and 12 hours for one coder to evaluate the quality items from the WQI.
Wiki Coding Round 4: Quality Reconciling
After two coders have conducted the quality coding for each wiki, their two sheets are reconciled using the IF function in Excel as described above. All disagreements are marked with an X and highlighted in red using conditional formatting. A third rater then reconciles these quality codes, turning all Xs into 1s and 0s. Again, about half of all rows will be in perfect agreement, and the other half will require some form of reconciling. Researchers typically take approximately 8 hours to reconcile a typical sheet.
Thus, with two people coding, each requiring 12 hours, and approximately 8 hours of reconciling, a typical sheet of 40 wikis (separated out from the original 100 wikis) coded at seven time periods requires approximately 32 worker-hours to complete. Therefore, coding 800 wikis (separated out from the original 2,000) requires approximately 640 worker-hours. In our group, with 10 researchers working about 8 hours a week at $16/hour, that would take approximately eight weeks to complete and cost about $10,240. The entire process, therefore, takes a team of 10 trained researchers approximately three months to evaluate 2,000 wikis and costs approximately $17,000 at $16/hour.
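The arithmetic behind these figures can be laid out explicitly, using the rounded per-sheet times quoted above:

```python
# Back-of-the-envelope cost model for coding 2,000 sampled wikis.
WAGE = 16    # dollars per worker-hour
SHEETS = 20  # 2,000 wikis at 100 per sheet

# Demographic round: ~8 hours x 2 coders + ~3 hours reconciling,
# rounded to 20 worker-hours per sheet.
demo_hours = SHEETS * 20

# Quality round: ~12 hours x 2 coders + ~8 hours reconciling
# per sheet of ~40 eligible wikis at seven occasions.
quality_hours = SHEETS * 32

total_hours = demo_hours + quality_hours
total_cost = total_hours * WAGE
print(total_hours, total_cost)  # 1040 16640
```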
Merging Coding Data and Other Data
Once all quality coding is completed, all of the reconciled quality sheets can be merged into a dataset which includes all of the demographic and quality codes for each occasion of measurement for each eligible wiki.
This dataset is then merged with two other sources of data. In order to obtain demographic data about hosting schools, we use data from the National Center for Education Statistics (NCES) Common Core of Data (CCD). To obtain these data, a researcher first examines our records of the hosting institution for each wiki. These hosting institutions can be public schools, independent schools, districts, town libraries, district consortia (like the BOCES in New York or the Area Education Agencies in Iowa), or other institutions. For wikis hosted by public schools or districts, we obtain their NCES School ID and NCES District ID from the website http://nces.ed.gov/ccd/schoolsearch/index.asp. We then use these ID numbers to obtain data from three data files hosted at http://nces.ed.gov/ccd/ccddata.asp: the Public Elementary/Secondary School Universe Survey Data (which contains data about individual schools) and the Local Education Agency (School District) Universe Survey and Local Education Agency (School District) Finance Survey (which contain data about school districts). Using these ID numbers, we merge the demographic data about schools and districts—such as the percentage of students in a wiki’s hosting school eligible for Free and Reduced Price Lunch—into our wiki dataset.
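Conceptually, this merge is a key-based join on the NCES ID. A minimal Python sketch, with hypothetical field names and an obviously fake placeholder ID (real IDs come from the CCD files):

```python
# Wiki records carry the NCES school ID identified by the researcher.
wikis = [{"dclcid": 101, "nces_school_id": "0000001"}]  # ID is a placeholder

# School-level demographics keyed by NCES school ID, e.g. percent of
# students eligible for Free and Reduced Price Lunch (illustrative value).
schools = {"0000001": {"pct_frpl": 42.0}}

# Left join: attach school demographics to each wiki record when the
# ID is found; wikis without a match keep only their own fields.
merged = [
    {**wiki, **schools.get(wiki["nces_school_id"], {})}
    for wiki in wikis
]
print(merged[0]["pct_frpl"])  # 42.0
```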
For some of our samples, we also have teacher surveys. For a time, PBworks agreed to automatically send a survey solicitation to all creators of education-related wikis. We depend upon survey takers entering the URL of their wiki into a question in our survey to identify the wiki associated with each survey taker. We use that URL as the link between our wiki dataset and the survey data.
Outline summary of workflow
- Obtain population level data from PBworks
- Assign DCLCIDs
- Take random sample of wikis
- Use random.org to draw a non-repeating series of DCLCIDs from the population list
- Code for WQI demographic items
- Two coders evaluate demographic items
- Identify disagreements using Excel’s IF function
- Third coder reconciles disagreements
- Filter out ineligible wikis
- Code for WQI quality items
- Two coders evaluate quality items
- Two coders subjectively rate wiki quality, and these ratings are never reconciled
- Identify quality coding disagreements using Excel’s IF function
- Third coder reconciles disagreements
- Obtain school and district level demographic data
- Coder uses school and district names to identify NCES School and District ID numbers
- Using NCES ID numbers, merge wiki quality data with school and district demographic data
- Obtain teacher survey measures
- Use wiki URL’s provided by teacher survey takers to link wiki quality data with data on teacher attitudes and practices
Section 4: Evaluating Wiki Edit Histories
One of the signature features of our research approach is that we leverage the fact that wikis preserve a real time history of every revision to every page. For instance, if we view only the most recent version of a wiki page, it is impossible to precisely determine the kinds of collaborative behaviors that may be responsible for creating that page. However, if we evaluate every revision for a wiki page, we can determine how multiple contributors work together in varying ways to co-construct the content of the page. As a research team, we have developed several strategies for evaluating wikis and wiki edit histories.
When evaluating a wiki for the first time, most coders will first make a holistic evaluation of the wiki. They will use the navigation settings on the right-hand sidebar of the wiki to review the current versions of each page, to see the kinds of files that are uploaded, and to evaluate the navigational structure of the page. They then use a number of different approaches to begin to probe the historical record of each wiki.
Each wiki includes a Recent Activity link, which takes viewers to an automatically populated page that shows all of the recent activity on the wiki: new pages created, new page revisions, new comments, and new files uploaded. If wikis have very few changes, this link may provide a comprehensive list of the total history of the wiki. For wikis with many pages and changes, this link might only provide a tiny fraction of the total wiki activity. Nonetheless, it always provides a useful overview of the patterns of activity in the recent history of the wiki.
Within the PBworks navigation system, viewers can also follow the Pages and Files link to a listing of all pages created on the wiki and all files uploaded. For each page and file, there is a link to its complete revision history. In the current Graphical User Interface (GUI), this link can be found by mousing over a page or file until the More button appears, clicking that button, and then clicking the link that says x revisions, where “x” is the number of revisions to the page. Following this link, the viewer will encounter a list of page (or file) revisions, each recorded to the second.
The most efficient way to manually browse all of the revisions for a page is to use the following procedure: First, a researcher scrolls down to the first version of the page. She then right-clicks the link to this version and then chooses to Open in a New Tab. She will then have a new browser tab with the first version of the page. (It is important for coders to realize that when looking at historical page revisions, the comments are not included in this view. Comments are only viewable when looking at the present version of a page through the regular browser interface.) Next, she right-clicks on the link to the second version of the page, and she opens it in a new tab. She repeats this process for every revision. Her browser will now have a set of tabs where each tab renders a version of the wiki page in chronological order. By clicking on each tab sequentially, the coder can evaluate the changes in the wiki history.
This protocol needs to be modified if coders are evaluating a wiki up to a certain occasion of measurement. For instance, if a coder is only evaluating a wiki through its first 30 days of changes, then the coder first needs to calculate the correct date to stop evaluating changes. In our workflow process, the dates of occasions of measurement are produced along with the coding sheets for wiki quality coding. Thus if a wiki is created on Feb 14th, the coder knows that the day 1 measurement should evaluate all changes through Feb. 15, the day 7 measurement through Feb. 21, the day 14 measurement through Feb. 28, and so forth. With this information, the coder should only open new tabs for revisions that occur before the appropriate cutoff date.
This process can be quite cumbersome, so we developed our own browser interface to evaluate wiki edit histories: the Wiki Coding Tool (WCT) (http://tool.edtechresearcher.com/code/). The problem with the PBworks Pages and Files interface is that time is nested within pages. That is, the edit histories for each page are recorded under the name of each page. Comparing wikis at varying time points using this model is quite difficult. Thus, we created the WCT to organize wikis such that pages are nested within time. That is, the WCT allows a coder to select an occasion of measurement, such as day 7, and to restrict her view to only those pages and revisions that occurred up to or before the wiki’s seventh day.
The WCT uses the PBworks Application Programming Interface (API) to “rearrange” the page revisions for each wiki. A website’s API is the structured interface that programs use to query the site, in contrast to the navigation buttons and links that humans use in their browsers. To begin using the WCT, coders enter the URL for a PBworks wiki. Coders then choose an occasion of measurement for the wiki, and they are presented with a dropdown list of all pages that had been created up to that occasion of measurement. When a coder selects a page, she is presented with the most recent version of the page. There are also “forward” and “back” buttons that allow her to quickly scroll between page revisions, from the original version through the final revision before the cut-off time of the occasion of measurement. This system is considerably easier than manually opening every revision to every page.
The WCT illustrates a crucial point for the future of education research in online learning environments: online learning platforms are designed to facilitate entering content. The kinds of navigational structures that facilitate content creation are not necessarily well-suited for evaluation of content creation processes. In our case, PBworks makes it easy for multiple people to contribute to wiki pages, but it is difficult to examine those contributions over time. The development of the WCT was our attempt to resolve this problem, and it suggests to us that education researchers will need to develop expertise in the years ahead in using APIs to reorganize online learning environment data with an eye towards content analysis rather than content contribution.
The WCT has a number of limitations. Periodically, PBworks has changed its URL structure or other features of its service, and these changes have rendered the WCT unusable for periods of time while we re-program it. Also, since the WCT calls up historical versions of pages, coders still need to evaluate the entire wiki to find comments, which are not rendered on historical page revisions. We are still refining the WCT so that it can evaluate all PBworks wikis; currently it cannot evaluate private wikis, even when we have been invited to join them, and it cannot evaluate wikis with unusual characters in their page names. These are tractable problems, but they remain unresolved.
In summary, our coders use a variety of strategies to evaluate wikis and their edit histories. These strategies include browsing the navigation of each wiki, examining the recent activity pages generated automatically by PBworks, and using the PBworks Pages and Files interface and our own Wiki Coding Tool to conduct a detailed evaluation of every page of every wiki.
Section 5: Ethical Concerns in Online Content Analysis and IRB Considerations
The widespread availability of new forms of online data has created a new set of ethical challenges for educational researchers and Institutional Review Boards (IRB). What kinds of protections are necessary to ensure that participants in online learning environments can be kept safe from harm as we research those learning environments? How can researchers educate IRB staff about these new environments so that research can progress and new methods can be used while protecting research subjects? These are open questions as new methods tackle new technologies.
For most of our research methods, our IRB and subject consent protocols were quite similar to those of other forms of educational research. When we conducted interviews with faculty, we sought consent from schools and teachers. When we conducted interviews with students, we sought consent from schools, parents, and students. Our surveys solicited teacher consent. For our in-class observations, we followed our IRB protocol of requiring teacher and principal consent. Some districts required district-level consent from the central office, and some schools required parental consent for each student in a classroom.
For our analysis of publicly-viewable wikis, we were not required to gather consent. Our IRB determined that these were public Web sites, and since anyone can view and analyze them, so could we. We got some assistance from PBworks in analyzing these sites, particularly in giving us lists of the URLs of publicly-viewable, education-related wikis. They also provided us with some usage information about these wikis, such as the number of registered users, number of edits, number of new pages and so forth. Since all of this information is accessible on the Web, it would have been possible for us to devise analytic tools that would have gathered all of this information ourselves. PBworks eased our task, but they did not provide us with privileged information.
For wikis set to be privately-viewable, we had additional constraints. Our IRB required that we solicit the permission of the wiki creator, but not of all members. We only attempted to get permission from wiki creators who were teachers, so we did not attempt to solicit permission to view wikis created by students (which would have required parental consent). We also suggested that wiki creators inform participants that we would be viewing the wiki, and we asked them not to invite us to join if they believed that doing so might put someone at risk of any kind of harm. We also committed to never making any kind of change to a wiki, only passively viewing it.
As an additional precaution, as a group, we decided not to publish wiki URLs or direct text from wikis in our reporting. We knew from our content analysis that students did not always follow best practices in terms of protecting their identities online (though for the most part they did). Therefore, we decided to describe wikis and their activities, but not to quote content or share links that would bring additional scrutiny to the wiki. We are willing to share this data on request with researchers who would like to conduct additional analyses on our data under the guidance of their own IRB.
Overall, we found that most educators and students were very grateful for the opportunity to share their stories and lend their thoughts to our efforts to understand the use of wikis in K-12 settings. Our IRB at Harvard was very willing to work with us to develop protocols to explore this new domain. As technology develops, researchers and IRB staff will need to continue to collaborate to find safe, effective strategies for studying learning in online environments.
Section 6: Overall Ratings
In addition to the 24 items of the WQI, our coders also made four “overall quality” ratings for every wiki. They assessed the degree to which each wiki provides opportunities for students to participate and to develop expert thinking, complex communication, and new media literacy skills. They rated these four domains on a 7-point Likert scale. Research assistants were encouraged to use any criteria that they wanted for these four overall ratings: they could consider our 24 items, and/or they could consider any other evidence that might influence their assessment. We asked them to attempt to be internally consistent, but they were not required to have their criteria for the overall ratings cohere to any set of group norms.
We designed these ratings with two purposes. First, we hoped that these more subjective ratings would help us identify dimensions of quality that are not captured by the WQI. For instance, we can examine wikis with high ratings but low WQI scores, or high WQI scores but low ratings, to attempt to identify the causes of these discrepancies. Second, we wanted to test whether certain questions might be answered more efficiently with overall ratings rather than quality coding. For instance, if ratings correlated highly with WQI scores, then ratings might prove to be a more time-efficient method for evaluating wiki quality than the WQI.
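The correlation check mentioned above could be sketched as follows. This is a minimal illustration, not the team's actual tooling; the function name and all data values are hypothetical.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Illustrative values only: composite WQI scores and 7-point overall ratings
# for five hypothetical wikis.
wqi_scores = [3.0, 7.5, 12.0, 18.5, 22.0]
overall_ratings = [1, 2, 4, 6, 7]
print(pearson_r(wqi_scores, overall_ratings))
```

A correlation near 1.0 would suggest that the quick overall ratings could stand in for full WQI coding on some research questions; a weak correlation would suggest the ratings capture something the WQI does not.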
To date, we have not conducted any analyses using the overall ratings. The cost of gathering the ratings was very low after research assistants had done the WQI coding, so we are not concerned if it takes us some time to circle back to these data.
Section 7: Training Protocols
In order to generate and maintain high levels of interrater agreement, our training regime involves three processes: an initial orientation, practice coding of a training set of wikis, and ongoing team meetings. Our goal with this training regime is to develop a cohesive understanding of the content analysis process among our entire team of research assistants. We try to hire new research assistants only once per year, so they can work as a team all year. We also attempt to hire only research assistants with classroom teaching experience, so our assistants come to the work with a general sense of how schools and classrooms in the United States typically operate. In our most recent studies, our team has comprised approximately 10 people working 10 hours per week each.
When research assistants are brought into the team, they are given an extensive orientation to PBworks wikis and the Wiki Quality Instrument. We first ask new RAs to explore a representative set of wikis so they get a general sense of the learning environments that we study. We also ask research assistants to read the Wiki Quality Instrument. We then have a series of introductory meetings that explain the goals of our research team, review our publication history, and introduce team members to our work process. Afterwards, we provide a detailed orientation to the Wiki Quality Instrument over several additional meetings. We review each demographic and quality item’s decision rules, show examples of wikis that meet the criteria for each item, and allow experienced team members to discuss common difficulties. We also have experienced coders discuss the strategies that they use to do the detective work required by the demographic questions and to do the content analysis required by the quality items of the WQI. We then provide new research assistants with an introduction to the Wiki Coding Tool, including an online video, an online test of their ability to use the features of the Wiki Coding Tool, and finally an in-person meeting to resolve questions.
After this orientation process, coders begin to practice their skills on a training set. This training set is developed by experienced coders who independently use the WQI to evaluate the demographic and quality items and then come together to agree upon a set of correct answers. In developing the training set, we code a large number of wikis, perhaps 200, and then purposively select 50 to be included in the training set. We try to include several kinds of wikis in this set. First, we include wikis with difficult-to-find information, or information that can be found only if coders use some of the strategies we have developed for systematically analyzing wikis. Second, we include typical wikis with usage patterns commonly found within the sample. Finally, we include wikis with items that proved difficult to code and difficult for even our experienced raters to agree upon.
We then give coders the first 25 wikis to evaluate at each appropriate occasion of measurement, so a typical training set will include 75 or 100 rows. Research assistants are required to reach 85% agreement with the training set across all categories of the WQI, and to have an average composite wiki quality score that falls within 1.5 points of the agreed-upon correct average, before being allowed to begin coding new wikis. After all trainees have completed their first 25 training wikis, we conduct an analysis of the trainee scores to determine which WQI categories and which wikis have the most disagreement. We then hold a meeting where we give trainees the correct scores and share with them our analysis of disagreement. We review wikis and categories that were problematic, and we answer questions. Trainees are encouraged to go back to the original 25 wikis to review and correct their errors.
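The certification rule described above (85% item-level agreement plus a composite average within 1.5 points) can be sketched as a simple check. The function name, data shapes, and example codes below are our own illustration, not the team's actual worksheets.

```python
# Hypothetical sketch of the trainee certification rule: at least 85%
# item-level agreement with the reference codes, and an average composite
# wiki quality score within 1.5 points of the agreed-upon correct average.
def certify(trainee_codes, reference_codes, trainee_avg, reference_avg,
            min_agreement=0.85, max_avg_gap=1.5):
    shared = set(trainee_codes) & set(reference_codes)
    matches = sum(trainee_codes[k] == reference_codes[k] for k in shared)
    agreement = matches / len(shared)
    return agreement >= min_agreement and abs(trainee_avg - reference_avg) <= max_avg_gap

# Keys are (wiki_id, WQI_item) pairs; values are the assigned codes.
reference = {(1, "participation"): 2, (1, "expert_thinking"): 0,
             (2, "participation"): 1, (2, "expert_thinking"): 1}
trainee = dict(reference)  # a trainee who matches the reference on every item
print(certify(trainee, reference, trainee_avg=10.2, reference_avg=9.5))  # → True
```

Running the same check per WQI category, rather than overall, would also identify which categories drive a trainee's disagreement, which is the analysis shared at the post-training meeting.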
If trainees meet acceptable levels of agreement on the first training set, we have them start coding new wikis. If not, they are given a second training set of 25 additional wikis. If they are successful after the second set, we have them start coding new wikis. If not, by this point our group has conducted analyses of many additional wikis that we can use to create additional training sets. In our experience, we could get all of our trainees to acceptable levels of agreement within four training rounds. If we had not been successful, we would have found other research tasks for that individual or counseled him or her out of our research group.
This extensive process takes weeks, and the team that we assembled in August was not functioning at full capacity until October.
To maintain a close alignment of scores, research assistants participated in weekly meetings to discuss wikis and quality categories that were particularly difficult to code. In the early months of the year, our training discussions were mostly driven by questions that new research assistants had about wikis that were difficult to code. As the year went on and we had more data from reconciling disagreements, we could target our discussions on areas where we knew we had high levels of disagreement. We would often collectively analyze difficult wikis.
We also frequently asked research assistants to revisit the WQI. As we shifted from demographic coding to quality coding, or as we shifted from one sample of wikis to another, we required that our research assistants re-read the WQI and revisit the decision rule language that is fundamental to our ability to code in agreement.
Section 8: Conclusion
We believe that the availability of real-time data from online learning environments represents a watershed moment in education research. These data allow researchers to examine detailed records of student-teacher interactions in depth and at scale. Over the past three years, our group has put significant resources behind developing methods for content analysis in diverse learning environments, and our goal in publishing these methods online is to support other researchers who are attempting similar projects.
We welcome questions and feedback from fellow researchers and other interested parties. While we have chosen not to publish here all of the specific worksheets that we used or training sets that we developed, we are happy to share these materials with other researchers with a serious interest in those parts of our research program.
 Some wikis are created and then never viewed at all by the creator, and when a coder visits the URL of one of these wikis they receive an error message. Some wikis are created and then viewed by the creator, and our raters could view these, even though they were unchanged.
 Sample sheets are available by request from the authors. We have not posted them here since we have decided not to repost URLs of wikis from our study.
 We have experimented with developing computational tools for determining a wiki’s creation date. We have found that a small number of districts and schools have institutional wiki creation processes. In these cases, API calls to the PBworks data warehouse for the wiki creation date can return dates for when a group of wiki subdomains are named and reserved, rather than when the wiki is actually first generated. Thus we manually check each wiki creation date.
 The Recent Activity link lists edits by month and day, but not by year, which can cause confusion when wikis have not been edited for several years. A review of the page histories, described in the following paragraphs, can resolve this potential confusion.