Structuring Contingency Table Data
Before any computer analysis can begin, files containing the relevant data, appropriately structured, need to be prepared. The most common format is a simple grid: a two-dimensional table with rows indicating cases (persons, etc.; subjects in older literature) and columns indicating variables. Examples of this format include the data sheets of most statistical programs and the individual sheets of most spreadsheet programs, although proprietary programs typically obscure this simple format by storing it in binary files that require specialized software to decipher. For text files (files that consist of lines of easily readable text), one common convention is to separate columns with tab characters (another uses commas), and most proprietary programs can import and export such tab-delimited files. This makes tab-delimited files a useful format for exchanging grid-organized data.
Entering Data into ILOG: Direct Entry
There are two ways to enter data into ILOG. The first is by direct entry. When ILOG first opens, the default data display is a 2x2x2 contingency table whose cells all contain zero. By default, its three factors (dimensions) are named A, B, and C and their levels are named A1 and A2, B1 and B2, and C1 and C2.
The figure below shows the main window for ILOG when first opened, with its toolbar icons and the default data display for a 2x2x2, A by B by C table. The figure shows the B by C table for level A1. To display the B by C table for level A2, either select next or select level A2 for factor A from the drop-down list box to the left.
If you intend to analyze a 2x2x2 table, you could proceed directly to enter counts in the appropriate cells. Usually, however, you would begin by selecting Run > Define a New Table, which lets you define the number of factors, the number of levels for each, and factor and level names that reflect your data. You can then enter counts directly into the cells of the contingency table displayed on your computer screen.
The order of the factors matters. Any particular order may seem arbitrary, but factors must be listed in some order, and that order affects how tables are displayed. It should also reflect how you think about your factors. The factor you think of as prior to the others should be listed first, followed by the other factors in order, with the factor you think of as the outcome (the factor you want to explain) coming last. This ensures that tables are displayed in a way that makes sense.
For a two-dimensional table, it is conventional to think of rows as representing the antecedent (or given) factor and columns as representing the outcome (or target) factor. With more than two dimensions, a contingency table is made up of several separate two-dimensional tables. Let a, b, c, etc. represent the number of levels for factors A, B, C, etc. Then a three-dimensional table can be represented with a separate bxc tables (one for each level of A), a four-dimensional table with a times b separate cxd tables, and so forth. The rows of the two-dimensional tables ILOG displays represent the next-to-last factor and the columns represent the last factor. Making the last factor listed represent the outcome of interest ensures that the columns of the two-dimensional tables displayed on the computer screen will represent the outcome.
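The decomposition just described can be sketched in a few lines of Python. This only illustrates the indexing and is not ILOG's own code; the function name `two_way_slices` is hypothetical.

```python
from itertools import product

def two_way_slices(levels):
    """List the level combinations that index each two-way table.

    `levels` holds the level names of each factor, in factor order.
    The last two factors form the rows and columns of every two-way
    table; each combination of the earlier factors selects one table.
    """
    *leading, rows, cols = levels
    return [(combo, rows, cols) for combo in product(*leading)]

# A 2x2x2x2 table: A and B select the table, C gives rows, D gives columns.
levels = [["A1", "A2"], ["B1", "B2"], ["C1", "C2"], ["D1", "D2"]]
slices = two_way_slices(levels)
for combo, rows, cols in slices:
    print(combo, "->", len(rows), "x", len(cols), "table")
```

For the four-factor example there are a times b = 4 separate two-way tables, one for each combination of levels of A and B.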
Entering Data into ILOG: File Entry
The second way to enter data into ILOG is to read them from a tab-delimited text file by selecting File > Open an Existing Data File.
The first line of this file consists of column headings: the first is ID, the next are the names of your factors, and the last is COUNT, all separated with tabs (you can use words other than ID and COUNT if you wish). For the remaining lines, the first column contains an identifier for each line; it can be anything you want. Let N represent the number of factors; then the next N columns contain level names, one for each factor. The final column contains the count for that particular combination of level names (thus each line contains N + 1 tabs).
Several lines could contain the same level names, in which case the counts would accumulate in the designated cell (i.e., the cell indicated by that combination of level names). Thus, if your data are already entered in another program’s datasheet or a spreadsheet, a tab-delimited file exported from these programs can be imported directly into ILOG (File > Open an Existing Data File). Moreover, if you entered data directly into ILOG, they can be saved to a file (File > Save the Current Table) and that file imported into a statistical or spreadsheet program.
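To make the file layout and the accumulation rule concrete, here is a minimal Python sketch that reads such a file and sums counts per cell. It assumes the layout described above and is not how ILOG itself is implemented; the function name `read_counts` and the small two-factor sample are made up for illustration.

```python
import csv
import io
from collections import Counter

def read_counts(handle):
    """Read a tab-delimited table file and accumulate counts per cell.

    Assumes a header line, then lines holding an identifier, one level
    name per factor, and a count in the last column.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    factors = header[1:-1]            # drop the ID and COUNT headings
    cells = Counter()
    for row in reader:
        if not row:                   # skip blank lines
            continue
        key = tuple(row[1:-1])        # level names identify the cell
        cells[key] += int(row[-1])    # repeated cells accumulate
    return factors, cells

# A made-up two-factor file in which the cell (A1, B1) appears twice.
sample = (
    "ID\tA\tB\tCOUNT\n"
    "1\tA1\tB1\t3\n"
    "2\tA1\tB2\t5\n"
    "3\tA1\tB1\t4\n"
)
factors, cells = read_counts(io.StringIO(sample))
print(factors)
print(cells[("A1", "B1")])
```

Here the two lines for cell (A1, B1) contribute 3 and 4, so that cell ends with a count of 7. To read a real file, pass a handle opened with `open(path, newline="")` instead of the `io.StringIO` object.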
As an example, consider Bakeman and Brownlee’s (1982) study of object struggles in toddlers and preschool children during free play. They asked observers (working from video records) to detect possession struggles—i.e., times when one child (the holder) possessed an object and another (the taker) attempted to take it away—and to code each possession struggle on four dimensions:
- Age—whether the children were observed in the toddler or the preschool classroom,
- Dominance—whether the taker had been judged dominant to the holder,
- Prior Possession—whether the taker had had prior possession of the contested object within the previous minute, and
- Resistance—whether the holder resisted the taker’s attempt (the last three were coded yes or no).
Bakeman and Brownlee regarded Resistance as the outcome of interest, reasoning that holders would be less likely to resist if they believed the taker had a claim on the object, presumably through prior possession, or if the taker were dominant. Thus the factors are ordered Age of children, Dominance of taker, Prior Possession of taker, and Resistance of holder. The data for their study were organized as a 2x2x2x2 table, the tab-delimited version of which is shown below. We use these data subsequently to illustrate various ILOG procedures.
ID | Age | DomiT | PriorT | ResisH | COUNT |
---|---|---|---|---|---|
1 | todler | yes | yes | yes | **19** |
2 | presch | yes | yes | yes | 6 |
3 | todler | no | yes | yes | 16 |
4 | presch | no | yes | yes | 9 |
5 | todler | yes | no | yes | **42** |
6 | presch | yes | no | yes | 18 |
7 | todler | no | no | yes | 61 |
8 | presch | no | no | yes | 27 |
9 | todler | yes | yes | no | **7** |
10 | presch | yes | yes | no | 5 |
11 | todler | no | yes | no | 4 |
12 | presch | no | yes | no | 6 |
13 | todler | yes | no | no | **30** |
14 | presch | yes | no | no | 5 |
15 | todler | no | no | no | 13 |
16 | presch | no | no | no | 4 |
The first line contains names for the four factors for this 2x2x2x2, Age by Dominance by Prior Possession by Resistance contingency table. The remaining 16 data lines contain level names for each factor—these uniquely identify a cell in the table—along with a count for that particular cell. Items on each line are separated with tabs. The file could have more than 16 data lines, in which case counts for additional lines that contained the same level names would accumulate in the designated cell. In the extreme, for each cell there could be as many lines as there are counts in that cell, with each line having the same level names but a count of 1; such a file might result when exporting from a statistical program like SPSS.
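The equivalence between a file of cell counts and a file with one line per case can be checked with a short Python sketch. It uses only four of the sixteen cells from the table above, and the variable names are illustrative.

```python
from collections import Counter

# Four of the sixteen cells from the table above (toddlers, taker dominant),
# keyed by (Age, DomiT, PriorT, ResisH).
cells = {
    ("todler", "yes", "yes", "yes"): 19,
    ("todler", "yes", "no", "yes"): 42,
    ("todler", "yes", "yes", "no"): 7,
    ("todler", "yes", "no", "no"): 30,
}

# Expanded form: one record per observed struggle, each with a count of 1.
case_lines = [levels for levels, count in cells.items() for _ in range(count)]

# Accumulating the expanded records recovers the original cell counts.
rebuilt = Counter()
for levels in case_lines:
    rebuilt[levels] += 1

print(len(case_lines))
print(dict(rebuilt) == cells)
```

The expanded list has 98 records (19 + 42 + 7 + 30), and accumulating them gives back the four original counts.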
When these data are opened in ILOG (and we urge you to try it), they are displayed on the screen as shown below. The counts displayed on the screen (19, 42, 7, 30) are bolded in the data file shown above.
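If you want to see where those four counts come from without opening ILOG, the following Python sketch picks out the Prior Possession by Resistance table for toddlers with a dominant taker. The data are copied from the table above (with spaces in place of tabs, and without the ID column); the helper `two_way` is hypothetical, and listing yes before no is an assumption about level order, not documented ILOG behavior.

```python
# The 16 data lines from the table above: Age, DomiT, PriorT, ResisH, COUNT.
DATA = """\
todler yes yes yes 19
presch yes yes yes 6
todler no yes yes 16
presch no yes yes 9
todler yes no yes 42
presch yes no yes 18
todler no no yes 61
presch no no yes 27
todler yes yes no 7
presch yes yes no 5
todler no yes no 4
presch no yes no 6
todler yes no no 30
presch yes no no 5
todler no no no 13
presch no no no 4
"""

# Build a lookup from level names to the count for that cell.
cells = {}
for line in DATA.splitlines():
    *levels, count = line.split()
    cells[tuple(levels)] = int(count)

def two_way(cells, leading, rows, cols):
    """Return the rows-by-cols table for one combination of leading levels."""
    return [[cells[leading + (r, c)] for c in cols] for r in rows]

# Rows are PriorT (next-to-last factor); columns are ResisH (last factor).
table = two_way(cells, ("todler", "yes"), ["yes", "no"], ["yes", "no"])
for label, row in zip(["PriorT yes", "PriorT no"], table):
    print(label, row)
```

Read down the columns, the resulting table contains 19, 42, 7, and 30, the four counts named above.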