Running the CompTIES software via iPlant.
You can run the CompTIES code through the iPlant Collaborative interface. CompTIES is one of many "apps" offered by the iPlant Collaborative, a cyberinfrastructure project that broadly supports high-performance computing for the life sciences and social sciences. iPlant hosts the CompTIES app on its Discovery Environment platform, a richly featured web interface for storing data and running analyses. In other words, you don't need to download source code or binaries: you upload your data to the Discovery Environment's data storage, run the app from a web browser, and later download the results. This page describes how to do so.
The CompTIES app trains and tests the full TIES coupled-oscillator model of emotional interaction. Using cross-validation, it learns shared parameters for the oscillator model and then tests the resulting models.
Quick Start
In a nutshell, you will run the app like so:
- Format your data as a CSV file. Upload your file to the Data Store.
- Locate and open the app called Temporal Interpersonal Emotional Systems (TIES) inference run.
- Browse to your data file and select it as the input data.
- Choose an "observable" category name, a "distinguisher" category name, and optional "moderator" and "grouping variable" category names.
- Click "Launch Analysis" and let the tool go to work!
The input data file contains measurements and other information about participants, who are grouped into dyads (pairs of participants). The file contains both time-varying and time-invariant data.
The input data file is organized as a table of numbers stored in comma-separated values (CSV) format. (Spreadsheets such as Excel and OpenOffice can export data in this form.) The first row stores text labels, and rows 2 and below hold numeric data for the participants. Each row stores information about one participant at one moment in time, and each column stores one category of data.
The text labels in the first row describe the data categories. Each entry in rows 2 and below should generally be a number. Two column labels are mandatory: Dyad and time.
Required column labels (and meanings)
Two of the columns must have the exact labels below. The labels are case-sensitive.
Dyad — a unique dyad number, e.g., 1, 2, 3, which is an identifier shared by exactly two participants. This number is used extensively in the output.
time — an index of the time measurement, e.g., 1, 2, 3, ...
Other required columns
At least two more columns are required, but the labels are up to you:
(a distinguisher category) — one column must contain just 0 or 1 and differentiate between the two members of the dyad. For example, in a mother-daughter study this column might have row-1 label "is_mother" and rows 2 and below could store 1 for the mother, and 0 for the daughter. You must specify this label name when you run the app.
(an observable category) — at least one column contains time-varying data for the participant. For example, this might be respiration rate. It could have row-1 label "resp_rate" and rows 2 and below would store numeric measurements, probably changing on each row.
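The rules so far (case-sensitive Dyad and time columns, plus a 0/1 distinguisher that separates the two dyad members) can be checked before uploading. Below is a minimal sketch using Python's standard csv module; check_dyad_file is a hypothetical helper, not part of CompTIES. It only checks that each dyad has rows for both a 0-valued and a 1-valued member, not every modeling requirement.

```python
import csv
import io


def check_dyad_file(f, distinguisher):
    """Return a list of problems found in a CompTIES-style CSV (empty if none).

    f is any text file object; distinguisher is the 0/1 column label.
    """
    # Missing trailing fields become "NA", so short rows are treated as unusable.
    reader = csv.DictReader(f, restval="NA")
    header = reader.fieldnames or []
    problems = [f"missing column {name!r}"
                for name in ("Dyad", "time", distinguisher)
                if name not in header]  # labels are case-sensitive
    if problems:
        return problems

    members = {}  # dyad id -> set of distinguisher values seen
    for line_no, row in enumerate(reader, start=2):
        raw = row[distinguisher].strip()
        # Rows with "NA" in Dyad, time, or the distinguisher are disregarded.
        if "NA" in (row["Dyad"].strip(), row["time"].strip(), raw):
            continue
        try:
            value = float(raw)
        except ValueError:
            value = None
        if value not in (0.0, 1.0):
            problems.append(f"row {line_no}: {distinguisher} is {raw!r}, not 0 or 1")
            continue
        members.setdefault(row["Dyad"].strip(), set()).add(int(value))

    for dyad, seen in sorted(members.items()):
        if seen != {0, 1}:
            problems.append(f"dyad {dyad}: distinguisher values {sorted(seen)}, "
                            "expected both 0 and 1")
    return problems
```

Pass an open file, e.g. check_dyad_file(open("mydata.csv", newline=""), "is_mother"), where the filename and label are only examples.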
The goal of the CompTIES code is to try to infer an oscillator model that "explains" these data, i.e., a model whose oscillations tend to fit the observations with low root-mean-square (RMS) error.
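For reference, root-mean-square error has a standard definition. For one observable series $x_1, \dots, x_T$ and the corresponding oscillator outputs $\hat{x}_1, \dots, \hat{x}_T$ (notation introduced here for illustration), it is shown below; how CompTIES aggregates this quantity across participants, observables, and dyads is not specified on this page.

```latex
\mathrm{RMS} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(x_t - \hat{x}_t\right)^2}
```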
(optional: a moderator category) — if you wish to investigate the explanatory power of a time-invariant factor, your file should include one or more columns of such data. When the values are continuously varying numbers, we call them "moderators." For example, a moderator might be age: it could have the row-1 label "age_years" and a constant (perhaps rounded) value for each participant. The hypothesis is that the oscillator parameters are a linear function of the moderator (or tend to be, if the parameters are stochastic).
Moderators can describe individuals, or entire dyads.
(optional: a grouping variable category) — This is similar to a moderator, but for discrete, categorical values. If the hypothesized explanatory factor takes the form of discrete categories that describe dyads, then the category label is a "grouping variable." In other words, the dyads can be partitioned into disjoint groups, and you hypothesize that the oscillator parameters can be predicted by group identity. For example, the grouping variable might be the dyad's preferred language of interaction, possibly encoded using numeric labels (e.g., 0=Tamil, 1=Kannada, 2=Telugu, etc.), and recorded in one column. For each group identity, independent sets of oscillator parameters will be inferred, one set per group.
Like a moderator, the values of a grouping variable must be time-invariant. Note that group identity applies to dyads, not individuals. Thus both individuals in a dyad must have the same group-identity label in the grouping-variable column.
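Both constraints on a grouping variable (constant over time, and identical for the two members of a dyad) amount to one check: every row of a dyad must carry the same value. A minimal sketch with a hypothetical helper name; it compares values as text, which is a simplifying assumption.

```python
import csv
import io
from collections import defaultdict


def inconsistent_groups(f, grouping):
    """Map each dyad whose grouping-variable value is not constant to the values seen.

    A grouping value must be the same on every row of a dyad: across time
    (it is time-invariant) and across both members (it describes the dyad).
    Values are compared as text, so "1" and "1.0" count as different here.
    """
    values = defaultdict(set)
    for row in csv.DictReader(f, restval="NA"):
        dyad = row["Dyad"].strip()
        if dyad == "NA":
            continue  # rows without a dyad id are disregarded anyway
        values[dyad].add(row[grouping].strip())
    return {dyad: seen for dyad, seen in values.items() if len(seen) > 1}
```

An empty result means every dyad has a single, consistent group label.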
For more discussion of the differences between moderator and grouping-variable factors, see the section Moderator or Grouping Variable? below.
Extra columns are ignored, and thus you may store all your measurements in one file. Unreferenced columns have no effect on the model.
Every column in the file must have a row-1 label composed solely of letters, numbers, and underscore characters. Do not use spaces, punctuation, or other characters. Each label must be unique. Labels are case-sensitive: you must match upper case and lower case exactly whenever you refer to one.
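The labeling rule can be checked mechanically. This sketch assumes "letters" means ASCII letters; bad_labels is a hypothetical helper that takes the row-1 labels as a list.

```python
import re

# Letters, digits, and underscores only (ASCII letters assumed).
LABEL_RE = re.compile(r"[A-Za-z0-9_]+")


def bad_labels(header):
    """Return a description of each label that breaks the row-1 rules."""
    problems = []
    seen = set()
    for label in header:
        if not LABEL_RE.fullmatch(label):
            problems.append(f"invalid characters in label {label!r}")
        # Case-sensitive comparison: "time" and "Time" are different labels.
        if label in seen:
            problems.append(f"duplicate label {label!r}")
        seen.add(label)
    return problems
```

The first row of a file can be obtained with next(csv.reader(f)) and passed straight in.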
If necessary, you can use the special string "NA" in place of a number to indicate missing observable data. However, if "NA" is present in the Dyad, time, or distinguisher column of any row, that entire row is unusable and is disregarded.
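The "NA" rule treats the two kinds of columns differently: a missing observable is just a gap, but a missing Dyad, time, or distinguisher discards the whole row. A minimal sketch of that rule follows; usable_rows is a hypothetical helper, and representing a missing observable as None is an illustrative choice.

```python
import csv
import io

KEY_COLUMNS = ("Dyad", "time")


def usable_rows(f, distinguisher, observables):
    """Yield the rows that survive the "NA" rule, with missing observables as None.

    Rows with "NA" in Dyad, time, or the distinguisher are dropped entirely;
    an "NA" in an observable column only marks that one value as missing.
    """
    for row in csv.DictReader(f, restval="NA"):
        keys = [row[c].strip() for c in (*KEY_COLUMNS, distinguisher)]
        if "NA" in keys:
            continue
        record = {
            "Dyad": keys[0],
            "time": float(keys[1]),
            distinguisher: int(float(keys[2])),
        }
        for name in observables:
            raw = row[name].strip()
            record[name] = None if raw == "NA" else float(raw)
        yield record
```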
Input File Example
The fictional example below shows the format. Here's how it would look in LibreOffice Calc or Excel:
If you open the CSV file in a text editor, the first few lines of the above example would look something like this:
Before you launch the application, you will be prompted for several items:
- Analysis Name — this becomes part of the name of the folder created to store your output. You can accept the default value or change it to suit the way you organize your files.
- Comments — this is an optional, free-form text field for remarks about this experiment.
- Select output folder — this should be the name of one of your Data Store folders. You can accept the default value or modify it.
- Retain inputs? This checkbox copies the input data file into the output folder. This can be useful if the input file is subject to change.
The "Inputs" section of the app interface prompts you for two filenames.
- Data file input (CSV) — using the Discovery Environment interface, browse to the name of the input file.
- Configuration filename — advanced users can specify a filename here with special options for the inference engine. If omitted, the program uses a sensible set of default options.
Parameters section
The "Parameters" section of the interface asks for several category names and modeling decisions.
Observable(s) — Enter the exact label(s) of the observable data column(s); labels are case-sensitive. You may use more than one observable, with names separated by commas (e.g., blood_pressure, heart_rate). Whitespace between observable category names has no effect.
Moderator(s) — this is an optional category name, or list of names, for moderator data, which does not change with time. If you use this field, you must enter the exact name of the column label for the moderator column, including lower-case or capital letters. You may use more than one moderator, with the names separated by commas (e.g., health, wealth). Whitespace between moderator category names has no effect.
Grouping variable — this is an optional category name used to partition the dyads into two or more groups. If you use this field, you must enter the exact name of the column label. Only one grouping variable category name is accepted, but the values in this category can partition the dyads into as many sub-groups as desired.
Distinguisher category name — this is a category name for data that corresponds to an individual participant, does not change with time, and is 0 or 1 to differentiate between two members of the dyad.
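The Observable(s) and Moderator(s) fields take comma-separated column labels, where surrounding whitespace is ignored but case matters. The sketch below illustrates that rule and reports names that do not exactly match a row-1 label; both helpers are hypothetical and are not the app's own parser.

```python
def parse_names(field):
    """Split a comma-separated list of column labels, ignoring surrounding whitespace."""
    return [name.strip() for name in field.split(",") if name.strip()]


def unknown_names(field, header):
    """Return the names in field that do not exactly (case-sensitively) match a header label."""
    labels = set(header)
    return [name for name in parse_names(field) if name not in labels]
```

For example, "health, Wealth" checked against a file labeled wealth would flag Wealth, because the case differs.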
Infer baseline models? — This checkbox determines which models the CompTIES app will infer. The app will always infer the full-power TIES model (a shared-parameter coupled oscillator, using all moderators and any grouping variable). If this checkbox is checked, the app will also infer four simpler models, called baseline models, for the sake of comparison. The baseline models, in order of increasing complexity, are as follows:
- Average model — explains the observable data using a constant value (equal to the average value of the observable over time).
- Line model — explains the data by fitting a line through the observables.
- Individual coupled oscillator model — explains the data by learning a shared error variance, then choosing the oscillator that best fits the observable data used for training. This model tends to overfit.
- Shared-parameter coupled oscillator without moderator — This model disregards any moderator or grouping-variable factors. In all other respects it is the same as the full-power TIES model.
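The two simplest baselines are easy to state in code. This is a minimal sketch for one participant's observable series; the page does not say how CompTIES fits its line, so ordinary least squares is an assumption here, and the function names are hypothetical. Each function returns a predictor that can be evaluated at any time point.

```python
def average_model(times, values):
    """Predict every time point with the mean of the training observables."""
    mean = sum(values) / len(values)
    return lambda t: mean


def line_model(times, values):
    """Ordinary least-squares line through (time, observable) pairs.

    Needs at least two distinct time points, otherwise the slope is undefined.
    """
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    return lambda t: intercept + slope * t
```

Neither model can represent oscillation, which is why they serve only as floors for comparison.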
Moderator value the same for both individuals? — This checkbox determines whether the CompTIES app should assume that a moderator value is shared among both members of a dyad, or (if unchecked) that it can vary between the two individuals.
Moderator or Grouping Variable?
If you are testing the hypothesis that some known per-dyad or per-individual time-invariant factor can help explain the observed oscillations in your data, then you should use either a moderator or a grouping variable. But which one?
A grouping variable is intended for discrete values that describe categories lacking a natural sequential order. Each dyad must belong to exactly one such category. For example, if you hypothesize that the first language of the dyad members helps explain your data, you might record categorical values for each dyad indicating 1=Catalan, 2=Korean, 3=Urdu, etc. The order of these numbers is meaningless: Catalan is neither greater than nor less than Korean, just a different category. But this fact is not obvious to the TIES modeler. By indicating that this category is a grouping variable, you tell the TIES modeler to ignore the order properties of the category values.
The TIES modeler uses grouping variables to segregate the data, and then it infers separate shared-parameter coupled oscillator models for each group, where the shared parameters are learned only from dyads in the same group. All else being equal, data with fewer groups or more dyads per group will yield results with better significance. Of course, a grouping variable must define at least two groups to have any explanatory power.
Grouping-variable categories describe dyads, not individuals. Both individuals of a dyad must share the same grouping-variable category. This is not a limitation, since discrete individual differences can be combined to form dyad-level categories. For example, deaf identity of a mother-daughter dyad could be encoded as 0=(mother deaf and daughter deaf), 1=(only mother deaf), 2=(only daughter deaf), 3=(neither is deaf).
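The deaf-identity encoding above can be computed from two individual-level yes/no flags. A one-function sketch; the function and argument names are hypothetical.

```python
def deaf_group(mother_deaf: bool, daughter_deaf: bool) -> int:
    """Dyad-level code: 0=both deaf, 1=only mother, 2=only daughter, 3=neither."""
    # Each "not deaf" flag contributes a distinct power of two, so every
    # combination of the two individual flags maps to its own group code.
    return 2 * int(not mother_deaf) + int(not daughter_deaf)
```

The same pattern extends to any number of binary individual traits, at the cost of more (and smaller) groups.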
A moderator is intended for numerical values that have a meaningful natural order. Examples include age, body-mass index, and number of siblings. A moderator category might take discrete values, but the order of those values has a natural meaning. For example, if body-mass index truly helps predict good oscillator parameters, then two individuals with BMIs of 30 and 31 (all else being equal) will have oscillator parameters more similar to each other than to those of an individual with a BMI of 20.
The TIES modeler uses moderators in a linear regression model, either to determine oscillator parameters, or (if the parameters are stochastic) to determine the distributions of the oscillator parameters.
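As an illustration only, a linear moderator model for a single oscillator parameter could be written as below. The symbols ($\theta_i$ for the parameter of unit $i$, a dyad or an individual depending on the moderator level; $m_{i,k}$ for its $k$-th moderator; coefficients $\beta_k$; and a normal distribution with variance $\sigma^2$ in the stochastic case) are assumptions introduced here, not CompTIES's documented parameterization.

```latex
% Deterministic parameters: the moderators set the parameter directly.
\theta_i = \beta_0 + \sum_{k=1}^{K} \beta_k \, m_{i,k}

% Stochastic parameters: the moderators set the mean of the parameter's distribution.
\theta_i \sim \mathcal{N}\!\left(\beta_0 + \sum_{k=1}^{K} \beta_k \, m_{i,k},\ \sigma^2\right)
```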
You can choose to make moderators characterize either the individuals in your study or entire dyads. That choice affects the complexity of the regression used to set the oscillator parameters. Currently the CompTIES app requires you to make this choice once for all moderators: it does not support mixing dyad-level and individual-level moderators. This is not a serious limitation, because individual-level moderators can be reformulated as dyad-level moderators. For example, the individual-level moderator "age_years" for a mother-daughter dataset could be reformulated into two dyad-level moderators, "mother_age_years" and "daughter_age_years." NOTE TO EB/KB/JG: Is this correct? Would such a reorganization be perfectly equivalent? Is the motivation for individualized moderators purely convenience? -- AP
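The reshaping in the age_years example can be sketched as follows. to_dyad_level is a hypothetical helper, and its default role names assume the mother-daughter example with is_mother as the distinguisher. It shows only the data transformation; the resulting columns would then be copied onto every row of the dyad.

```python
def to_dyad_level(rows, distinguisher, moderator, roles=("daughter", "mother")):
    """Turn one individual-level moderator into two dyad-level columns.

    rows: dicts with "Dyad", the distinguisher, and the moderator (e.g. from
    csv.DictReader). roles[0] names the 0-valued member, roles[1] the 1-valued one.
    Returns {dyad: {"<role>_<moderator>": value, ...}}.
    """
    result = {}
    for row in rows:
        role = roles[int(float(row[distinguisher]))]
        # The moderator is time-invariant, so repeated rows simply rewrite
        # the same value.
        result.setdefault(row["Dyad"], {})[f"{role}_{moderator}"] = row[moderator]
    return result
```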
This section is obsolete in 2015 and needs to be rewritten.
The analysis creates an output folder, using the name specified at launch time. Inside are subfolders for the TIES inference results and for the baseline models used for comparison.
Results from the TIES model
A subdirectory named shared-param-CLO stores all the results of training and testing. Its errors subfolder holds the fitting-error files.
The file err-couples.txt, in the errors subfolder, contains the RMS fit error between the data and the oscillator outputs for each dyad in the input, measured when that dyad is used for testing (not training).
This file can be useful for diagnosing problems: you can see whether the data of some dyads never fit well, which might mean those data are outliers or corrupted somehow. Person-0 represents the dyad member with the zero-valued distinguisher, and Person-1 the member with the one-valued distinguisher. The file shows errors during fitting (the earliest 80% of the data) in two columns, and errors during prediction (the latest 20% of the data). A good fit will have low prediction error.
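The fitting/prediction split described above can be illustrated for one person's series. This sketch assumes a plain time-ordered cut at 80% and the standard RMS formula; the function names are hypothetical, and the exact way CompTIES computes and aggregates these errors is not documented here. The series needs at least two time points so that neither segment is empty.

```python
import math


def rms(errors):
    """Root-mean-square of a non-empty sequence of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def fit_and_prediction_rms(observed, predicted, train_fraction=0.8):
    """RMS error on the earliest 80% of time points and on the latest 20%."""
    cut = int(len(observed) * train_fraction)
    diffs = [o - p for o, p in zip(observed, predicted)]
    return rms(diffs[:cut]), rms(diffs[cut:])
```

A model that fits the first segment closely but drifts in the second shows up as a small first value and a large second value, the overfitting pattern this file is meant to reveal.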
The file err-summary.txt holds the average of the columns of err-couples.txt; that is, it shows the RMS fitting error averaged across time and across couples.
Results from baseline models
There are more subdirectories containing similarly-organized results for the three baseline models (flat average value, straight line fit, and independent coupled oscillator). The error results are found in error/err-couples.txt and error/err-summary.txt with the same interpretation as the results in shared-param-CLO (see above).
Interpretation of results
(fill in more later.) Basic story: as the baseline models get more sophisticated (average is simplest, line-fit is intermediate, independent-CLO is the most sophisticated), the fit gets better but the predictions get worse. By taking a Bayesian approach and introducing (and learning) a prior distribution over oscillator characteristics, the TIES model predicts better than any of the baseline models.