Introduction

This document describes the Triple-S XML format for survey data and variables. Triple-S and Triple-S XML are trademarks of The Triple-S Group. The specification has been adapted to be specific to the Walr Platform. It adheres to the 2.0 specification but does not support hierarchical structure. There is also a section that describes the extensions we have made to the standard, such as multilists and grid.

Background

The aim of the Triple-S standard is to define a means of transferring the key elements of entire surveys between different survey software packages across various hardware and software platforms.

Summary

A Triple-S survey is described in two text files. One, the Metadata File, contains version and general information about the survey together with definitions of the survey variables. This is used to interpret the contents of the Data File. It is recommended that the Metadata File has a file extension of 'xml' (or 'sss' for compatibility with previous versions of the standard) and the corresponding Data File has the same name but with the extension 'asc', or if comma-delimited data is used, the extension 'csv'.

The format of each of the files has been designed to enable software read/write routines to be easy to implement. To further aid the development process the files are relatively simple to read by eye.

Compatibility

Triple-S XML version 2.0 is developed from Triple-S XML version 1.2. It is designed to retain substantial compatibility with the earlier standard. The aim is that any valid Triple-S XML version 1.2 specification should require only limited, or even no, changes to become a valid XML version 2.0 specification.

The Metadata File Outline

The Metadata File is coded in XML syntax according to rules given by the associated Triple-S XML DTD (Document Type Definition). The Metadata File contents describe two aspects:

  1. The file itself in terms of version number, date and time of creation etc.
  2. The survey in terms of the survey variables, or the hierarchy in terms of the contributing Metadata Files.

The following shows an outline of the contents of the Metadata File for a survey:

<?xml version="1.0"?>
<sss version="2.0">
<date>date_text</date>
<time>time_text</time>
<origin>origin_text</origin>
<user>user_text</user>

<survey>
<name>survey_name</name>
<title>survey_title_text</title>

<record ident="record_ident">
<variable ident="variable_ident" type="variable_type">
<!-- variable_details -->
variable_details
</variable>
<!-- ... -->
<variable ident="variable_ident" type="variable_type">
<!-- variable_details -->
variable_details
</variable>
</record>
</survey>
</sss>

Note that the file starts with a declaration that it consists of XML. The rest of the file is specified in terms of elements such as <date> and <time>, some of which (such as <survey> ... </survey>) also encapsulate other elements, and some of which (such as <record ident="record_ident">) also include attributes.
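As a minimal sketch of how an importer might walk this outline, the following uses Python's standard xml.etree.ElementTree module. The sample metadata, the SAMPLE constant and the list_variables function are illustrative assumptions and are not part of the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample metadata that follows the outline above.
SAMPLE = """<?xml version="1.0"?>
<sss version="2.0">
  <survey>
    <name>SP1025</name>
    <record ident="A">
      <variable ident="1" type="quantity" use="serial">
        <name>Serial</name>
        <label>Serial number</label>
        <position start="1" finish="4"/>
        <values><range from="1" to="9999"/></values>
      </variable>
      <variable ident="2" type="single">
        <name>Q1</name>
        <label>Ever visited</label>
        <position start="5"/>
        <values>
          <value code="1">Yes</value>
          <value code="2">No</value>
        </values>
      </variable>
    </record>
  </survey>
</sss>"""


def list_variables(xml_text):
    """Return (ident, type, name) for each <variable> in the <record>."""
    root = ET.fromstring(xml_text)          # root is the <sss> element
    record = root.find("survey/record")     # the single <record> element
    return [
        (var.get("ident"), var.get("type"), var.findtext("name"))
        for var in record.findall("variable")
    ]


print(list_variables(SAMPLE))
```

Attribute values are returned as strings, so an importer would still need to convert idents and positions to integers.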

XML

Introduction

Extensible Markup Language (XML) is a set of rules for encoding documents in a machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all open standards. The design goals of XML emphasize simplicity, generality, and usability. It is a textual data format with strong support via Unicode for the languages of the world. Although the design of XML focuses on documents, it is also widely used for the representation of arbitrary data structures. The use of XML to describe the metadata has implications on the terminology used within the Triple-S standard and the rules for specifying the Metadata File.

Terminology

The characters which make up an XML document are divided into markup and content. Markup and content may be distinguished by the application of simple syntactic rules. All strings which constitute markup either begin with the character "<" and end with a ">", or begin with the character "&" and end with a ";". Strings of characters which are not markup are content.

A Tag is a markup construct that begins with "<" and ends with ">". Tags come in three flavors: start-tags, for example <section>, end-tags, for example </section>, and empty-element tags, for example <line-break />.

An Element either begins with a start-tag and ends with a matching end-tag or consists only of an empty-element tag. The characters between the start- and end-tags, if any, are the element's content and may contain further markup, including other elements, which are called child elements. An example of an element is <Greeting>Hello, world.</Greeting>.

An Attribute is a markup construct consisting of a name/value pair that exists within a start-tag or empty-element tag. An example would be <step number="3">Connect A to B.</step> where the name of the attribute is "number" and the value is "3".

The XML specification defines a document as a text that is well-formed, i.e., it satisfies a list of syntax rules provided in the XML specification. The list is fairly lengthy; some key points are:

  • It contains only properly encoded legal Unicode characters.
  • None of the special syntax characters such as "<" and "&" appear except when performing their markup-delineation roles.
  • The begin, end, and empty-element tags that delimit the elements are correctly nested, with none missing and none overlapping.
  • The element tags are case-sensitive; the beginning and end tags must match exactly.
  • There is a single "root" element that contains all the other elements.

In addition, to be a valid Triple-S Metadata File, the XML document must conform to the rules of the Triple-S DTD as defined in this standard.

Unicode

XML uses the Unicode character set. Unicode provides a consistent way of encoding multilingual plain text and brings order to a chaotic state of affairs that has made it difficult to exchange text files internationally. The design of Unicode is based on the simplicity and consistency of ASCII but goes far beyond ASCII's limited ability to encode only the Latin alphabet. The Unicode Standard provides the capacity to encode all of the characters used for the written languages of the world. To keep character coding simple and efficient, the Unicode Standard assigns each character a unique numeric value and name.

An encoding defines how characters are to be represented in a byte, word, or double-word-oriented format (i.e., in 8, 16, or 32 bits per code unit). Some encodings only permit a subset of the Unicode character set to be represented (e.g., 7-bit ASCII). Other encodings (e.g., UTF-8) can be used to represent the full Unicode character set. The default encoding for XML is UTF-8, which is a way of transforming all Unicode characters into a variable-length encoding of bytes. It has the advantages that the Unicode characters corresponding to the familiar 7-bit ASCII set have the same byte values as ASCII and that Unicode characters transformed into UTF-8 can be used with much existing software without extensive software rewrites. If any other encoding is used, then it must be specified in the initial XML declaration. The most common alternative is Latin-1 (or more precisely ISO-8859-1), in which case the initial XML declaration should be:

<?xml version="1.0" encoding="ISO-8859-1"?>

The use of the correct encoding attribute is very important, as the validity of the Metadata File relies on it only containing correctly encoded Unicode characters.

Formatting Recommendations

An XML file does not need to have any formatting, but in order to improve the readability of the Metadata File, it is recommended that:

  • The file is organized into lines using CR, LF (decimal 13, decimal 10) combinations. However, they should be avoided within elements that contain text (e.g., <title>, <label>, or <value>) where their presence could affect how the text is processed. Note that the <br/> element is provided in these situations to indicate where new line breaks should appear within the text.
  • At most one element, or element with associated attributes, appears on one line.
  • Lines are indented with space or tab characters to reflect the structure inherent in the file. An indent is applied after every element that contains other elements.

Comments

Comments may be used to annotate contents or to temporarily hide sections of the file from the XML parsing mechanism. These are standard XML comments and start with the conventional XML construct of <!-- and end with -->. Comments are optional and can appear any number of times in the Metadata File.

  • <!--comment_text--> can be used anywhere (after the initial <?xml …> declaration) to indicate parts of the Metadata File that are to be ignored.
  • A comment_text may include any text except two successive dash characters, --. For example: <!--Data collected from 12-18th June 2011-->

Triple-S Names

A Triple-S Metadata File contains a number of “names”, such as the names of the variables or the hierarchy level identifiers. In order to be generally useful these “names” have a restricted definition:

  • Names must start with a letter (A-Z or a-z) or _ (underscore) character.
  • Subsequent characters can be letters (A-Z or a-z), digits (0-9), _ (underscore), or . (period) characters.
  • Names are case sensitive (i.e. upper and lower case are different).
  • Names must be unique within their type.
  • Names must not begin, or end, with whitespace (e.g. spaces, tabs), neither should they contain embedded whitespace.
  • Names may be unlimited in length.

Although the definition of Triple-S names is described above, most systems that import Triple-S files will also have their own limits and restrictions. The most common are a maximum length, case-insensitive matching, and a more restricted set of permitted characters (e.g. for variable names). In these cases the importer may have to generate new names that conform to its own limits and restrictions.
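The character rules for names can be expressed as a regular expression. The sketch below is one possible check, not a reference implementation; NAME_RE and is_valid_name are hypothetical names, and uniqueness within a type would have to be checked separately.

```python
import re

# First character: a letter or underscore.
# Subsequent characters: letters, digits, underscores or periods.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def is_valid_name(name):
    """Check a candidate name against the character rules listed above."""
    # fullmatch rejects leading, trailing and embedded whitespace as well.
    return NAME_RE.fullmatch(name) is not None
```

Because names are case sensitive, "Q1a" and "q1a" both pass and count as different names.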

Triple-S Numbers

A Triple-S Metadata File will contain many instances of attribute values that are “numbers”. These are either integers, or real numbers which are written with a decimal point.

In order to be generally useful, all integer numbers that are used as attribute values are restricted to signed 32-bit integer values (i.e. -2147483648 to 2147483647).

This limit affects at least the ident attribute in a <variable> element, the locations within a <position> element, integral values within a <range> element, and a numeric code and score within a <value> element.

Note that many integer numbers (e.g. the ident attribute, the locations within a <position> element) are further restricted to positive integer values (i.e. 1 to 2147483647).

Real numbers must have a decimal point, plus zero or more decimal places. They are not limited in value. They may be used as the value of the score attribute in a <value> element, and as the attribute values within <range> and <value> elements for a variable of type quantity.
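A minimal sketch of the integer limits, assuming attribute values arrive as strings from an XML parser. The function name is_valid_integer and its positive_only parameter are hypothetical.

```python
import re

INT32_MIN = -2147483648
INT32_MAX = 2147483647


def is_valid_integer(text, positive_only=False):
    """Check that text is an integer within the signed 32-bit range.

    With positive_only=True the value must lie between 1 and 2147483647,
    as required for idents and <position> locations.
    """
    # Optional leading minus followed by digits; no plus sign or spaces.
    if re.fullmatch(r"-?[0-9]+", text) is None:
        return False
    value = int(text)
    low = 1 if positive_only else INT32_MIN
    return low <= value <= INT32_MAX
```

Leading zeroes are accepted here, which matches the rule that a variable_ident may be written with or without them.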

General Metadata File Elements and Attributes

This section describes the syntax and function of each of the elements and attributes used to describe the Triple-S XML Metadata File itself. The elements and attributes are shown in the order they are expected in the file.

<sss version="sss_version" [ languages="language_list" ] [ modes="mode_list" ] >

The <sss> element is always required and is used to encapsulate the entire specification document. It contains a mandatory attribute version and optional languages and modes attributes.

The version attribute is used to indicate the version of the Triple-S standard that applies to this specification.

If the sss_version is 1.1 or 1.2 then only elements and attributes from the Triple-S XML version 1.1 or 1.2 standard are used. If the sss_version is 2.0 then the definition complies with the version 2.0 standard, and new elements and attributes from the Triple-S XML version 2.0 standard may be present.

The languages attribute is used to indicate that there are some multilingual texts within the Triple-S definition and to define the language identifiers that are used for those texts.

<sss version="2.0" languages="en fr">

The modes attribute is used to indicate that there are specific texts within the Triple-S definition for interviewing and analysis.

<sss version="2.0" modes="analysis">

<date>date_text</date>

Optional. The date_text should represent the date the file was created.

<time>time_text</time>

Optional. The time_text should represent the time the file was created.

<origin>origin_text</origin>

Optional. The origin_text should describe the originating system (program and operating system).

<user>user_text</user>

Optional. The user_text should indicate the name of the user who created the file.

Following the above elements, either a <survey> or a <hierarchy> element describing the actual content of the Metadata File should appear. A <survey> element (see the section Survey Elements and Attributes) describes either a simple flat set of data, or a Data File that forms part of a hierarchy. A <hierarchy> element (see the section Hierarchy Elements and Attributes) describes the overall structure of a hierarchical set of Data Files. Note that a Triple-S Metadata File cannot contain both <survey> and <hierarchy> elements.

Survey Elements and Attributes

This section describes the syntax and function of each of the elements and attributes used to form the content of a Triple-S XML <survey> element. The elements and attributes are shown in the order they are expected in the file.

<survey>

Mandatory. Introduces details of the data for a flat survey, or of a Data File that forms part of a hierarchical survey.

<name>survey_name</name>

Optional. See the section on Triple-S Names for the definition of the survey_name. For those systems with no specific survey naming convention this element could be used to hold the filename, for example:

<name>SP1025</name>

<version>survey_version</version>

Optional. The version number of the survey, for example:

<version>3.1</version>

<title>survey_title_text</title>

Optional. The survey_title_text should represent the survey title. The title may optionally include any number of <br/> elements to indicate new line breaks. For example:

<title>Fitness Survey<br/>First wave</title>

The title may also contain language-specific and mode-specific texts. These are described in a later section on Specialised Texts.

<record ident="record_ident" 
[ href="datafile_uri" ]
[ format="record_fmt" ]
[ skip="n" ] >

Mandatory. One <record> element starts after <survey> (or any survey description elements if present). It is used to introduce the definition of the variables that are held in the Data File.

The record_ident is any single character A to Z or a to z.

The optional datafile_uri can be used to specify an explicit location for the Data File that is described by this Triple-S XML specification. Note that using an href attribute ties the specification to the Data File and may cause problems if the Metadata File and Data File are moved.

For example: <record ident="A" href="responses.asc">

The optional record_fmt can be used to declare the format of the Data File that corresponds to this specification. The default format is fixed format fields, but if specified then it must be one of:

  • csv: the data representation is comma separated values, using one field for each variable, with data values similar to the fixed format. Note that for data in csv format the position element refers to the field number.
  • fixed: the data representation is fixed format fields (the only format supported by previous versions of the Triple-S standard).

The optional skip attribute can be used to ignore one or more initial records in the Data File. This will be most useful for csv Data Files where the first line is often used as documentation (e.g. names for the columns/fields) for the succeeding values.

For example: <record ident="A" format="csv" skip="1">
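The following sketch shows one way an importer might honour the skip attribute and address csv fields by number. It assumes that Python's default csv dialect is a good enough match for the Data File, which this document does not confirm; read_csv_records and field are hypothetical names, and the sample data is invented.

```python
import csv
import io


def read_csv_records(data_text, skip=0):
    """Yield each data record as a list of fields, ignoring `skip` initial records."""
    reader = csv.reader(io.StringIO(data_text))
    for index, row in enumerate(reader):
        if index >= skip:
            yield row


def field(row, start):
    """Return the field at 1-based field number `start` (the <position> start)."""
    return row[start - 1]


# Invented Data File content: one header line, then two data records.
DATA = "serial,q1\n1,2\n2,1\n"
for record in read_csv_records(DATA, skip=1):
    print(field(record, 1), field(record, 2))
```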

The record_ident can be used in conjunction with the variable_ident (see the <variable> element later) to generate unique variable names on import.

Following the <record> element, for each variable being described there must be a <variable> element:

<variable ident="variable_ident" type="variable_type" [ use="use_type" ] 
[ format="variable_fmt" ] >

Mandatory. The variable_ident is a positive number with or without leading zeroes. Each variable_ident must be unique within the containing <record> element.

The variable_type must be one of:

  • single: categorical with one response allowed
  • multiple: categorical with any number of responses
  • quantity: numeric value (integer number or real, decimal number)
  • character: character value
  • logical: Yes/No or True/False value
  • date: variable contains a date. The date value must be stored in the YYYYMMDD basic ISO 8601 format.
  • time: variable contains a time. The time value must be stored in the HHMMSS basic ISO 8601 format.

The use_type is optional and describes the role of this variable in the survey. Only a subset of variable types may have a use attribute (as detailed below), and the use_type must be one of:

  • serial: this variable contains the serial number (or other identification field) for the case. There can be at most one serial variable and it must be either a character or a positive integer quantity. The data values must be unique and should not be missing.
  • weight: this variable contains a case weight. There can be at most one weight variable and it must be a quantity. The data values should be non-negative and not be missing.

The variable_fmt is optional and can be used to declare the format of the codes for this variable. Only variables of type single may have a format attribute. The default format for all variables of type single is numeric, but if specified for a variable then it must be one of:

  • literal: all the codes for this variable are to be treated as characters, rather than numbers. Literal codes are case-sensitive (i.e. “a” and “A” are different).
  • numeric: all the codes for this variable are to be treated as numbers.

For example:

<variable ident="10" type="single">

or:

<variable ident="1" type="quantity" use="serial">

or:

<variable ident="7" type="single" format="literal">

<name>name_text</name>

Mandatory. The name_text should represent the name of the variable in the survey. See the section on Triple-S Names for the definition of the name_text.

For example: <name>Q1a</name>

<label>label_text</label>

Mandatory. The label_text should represent the label or question text for the original variable, for example:

<label>First visited</label>

The text may optionally include any number of <br/> elements to indicate new line breaks. The label element may also contain language-specific and mode-specific texts (see later section on Specialised Texts).

<position start="start_location" 
[ finish="finish_location" ] />

Mandatory. Describes the location of the data values within the data record. The interpretation of this element depends on the format of the Data File (see the record element earlier):

For fixed format fields:

The start_location and finish_location are positive integers, which represent the character positions, with the first position in the data record being 1.

For example: <position start="21" finish="24"/>

The finish_location must be greater than or equal to the start_location. The finish attribute may be omitted if the finish_location is the same as the start_location.

The <position> element defines the part of the data record that is allocated to holding the value of the variable. The <size>, <values> and <spread> elements describe which parts of the data record are to be interpreted as the value, and what are the legal values of the variable. As a consequence the <position> element must define a length that is at least as long as that implied by the <size>, <values> and <spread> elements.

The parts of the data record defined by the <position> elements of different variables may appear in any order, may overlap each other, and do not have to describe the entire data record.
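For fixed format data, the 1-based inclusive positions translate into a string slice. A minimal sketch follows, in which extract_field is a hypothetical helper and the sample record is invented:

```python
def extract_field(record, start, finish=None):
    """Return the characters at 1-based positions start..finish inclusive."""
    if finish is None:
        finish = start  # finish may be omitted when it equals start
    if start < 1 or finish < start:
        raise ValueError("finish must be >= start, and start must be >= 1")
    # Convert 1-based inclusive positions to a 0-based half-open slice.
    return record[start - 1:finish]


record = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
print(extract_field(record, 21, 24))  # positions 21 to 24
print(extract_field(record, 5))       # single position 5
```

Since positions may overlap and need not cover the whole record, each variable is extracted independently of the others.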

For comma separated values:

The start_location is a positive integer, which represents the field number, with the first field in the data record being 1.

For example: <position start="5"/>

Since the position for a csv file refers to fields, and there is exactly one field per variable, the finish_location will always be the same as the start_location. It would therefore be usual for the finish_location to be omitted where a csv Data File is used. However, importers should not assume that this will be the case as some exports may always explicitly include it.

The fields of the data record defined by the <position> elements may appear in any order, may be used more than once, and do not have to describe the entire data record.

<filter>filter_name</filter>

Optional. The filter_name must be the name (as defined by the <name> element) of a previously defined logical variable. The value of this logical variable determines if the current variable is available for that case.

For example: <filter>EverVisited</filter>

The name of the logical variable must be unique. All variable types, including logical variables, can have one <filter> element. However, note that variables used as a serial or weight must have no missing values, hence it is inappropriate for these to have a <filter> specified.

The elements that can follow the <position> or <filter> element vary according to the variable_type :

  • single: Mandatory values element
  • multiple: Optional spread element
    Mandatory values element
  • quantity: Mandatory values element
  • character: Mandatory size element
  • logical: Nothing extra
  • date: Optional values element
  • time: Optional values element

<spread subfields="num_subfields" 
[ width="subfield_width" ] />

Optional element only used with multiple type variables. The <spread> element indicates that the data values are coded as a series of category values in consecutive subfields (rather than the default multiple format of a series of 0/1 characters).

The num_subfields attribute must be a positive integer, and denotes the number of subfields within the overall field that is defined by the <position> element. The subfield_width is also a positive integer and denotes the width of each subfield. For fixed Data Files, the <position> element must define a width of at least (num_subfields * subfield_width). The subfield_width must be large enough to hold the largest category value specified for the multiple, for example:

<spread subfields="5" width="3"/>

The width attribute may be omitted when used in conjunction with fixed data if the num_subfields exactly fills the space defined by the <position> element. In this case the subfield_width is determined by dividing the width derived from the <position> element by num_subfields. Note that for csv data the width attribute must always be specified as it cannot be determined from the overall width defined in the <position> element.
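The two representations of a multiple variable could be decoded roughly as follows. This is a sketch under two assumptions that this section does not state explicitly: that in the default 0/1 format the nth character corresponds to category code n, and that unused subfields in the spread format are left blank. The function names are hypothetical.

```python
def decode_bitstring(field):
    """Default format: return the codes whose character is '1' (assumed code n = nth character)."""
    return [code for code, ch in enumerate(field, start=1) if ch == "1"]


def decode_spread(field, subfields, width=None):
    """Spread format: return the category values held in consecutive subfields."""
    if width is None:
        # Width omitted: the subfields must exactly fill the field.
        width, remainder = divmod(len(field), subfields)
        if remainder:
            raise ValueError("field length is not a multiple of subfields")
    codes = []
    for i in range(subfields):
        chunk = field[i * width:(i + 1) * width].strip()
        if chunk:  # blank subfields are assumed to be unused
            codes.append(int(chunk))
    return codes


# Five subfields of width 3, as in <spread subfields="5" width="3"/>.
sample = "  3" + " 12" + "  7" + "   " + "   "
print(decode_spread(sample, subfields=5, width=3))
print(decode_bitstring("01010"))
```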

<values>

Mandatory for single, multiple and quantity variable types, optional for date and time variable types. The <values> element is used to define the set of legal values and optional text labels for values (e.g. categorical codes).

A <values> element contains at most one <range> element and/or one or more <value> elements. If a <range> is present then it must be the first element.

<range from="start_value" to="finish_value" />

Optional first or only element. The <range> indicates an overall range of legal values for the variable. The finish_value must be equal to or greater than the start_value. This element may be followed by any number of <value> elements each defining a particular value.

The <range> element may not be used when the attribute format="literal" is specified on the associated <variable> element.

<value code="code_value" 
[ score="score_value" ] >value_text</value>

Any number of optional elements that may be used to give labels to specific values of the variable. The value_text may optionally include any number of <br/> elements to indicate new line breaks. The value_text may also contain language-specific and mode-specific texts (see later section on Specialised Texts).

If no <range> element has been specified then there must be at least one <value> element. If a <range> element has been specified then the code_value may lie within or outside the defined start_value and finish_value. Apart from this, all code_values must be unique within each <values> element.

The optional score attribute can only be used when the variable is of type single. It allows score values to be assigned to the individual code values to be used for computing statistics such as Mean, Standard Deviation etc. The score_value must be a number, and may be positive, negative or zero, with or without a decimal point and decimal places. The omission of a score implies that records having that value code should be omitted from the base for any statistical computation for that variable.

For single variables:

The start_value, finish_value and code_value for a variable of type single depend on whether the attribute format="literal" is specified on the <variable> element. If this attribute is not present or format="numeric" is used or implied, then these codes must all be positive integers or the value zero. However when format="literal" is specified, then all code_values (even those that look like numbers) are treated as case-sensitive characters, and the <range> element cannot be used.

The <value> elements do not need to be in any order, nor need they form a complete set with every possible value code present. There is no upper limit to the number of <value> elements which may be specified within a variable definition.

For example:

<values> 
<!--3 labelled categories-->
<value code="1">Yes</value>
<value code="2">No</value>
<value code="9">Refused</value>
</values>

Or:

<variable  format="literal"> 
. . .
<values>
<!--character category codes-->
<value code="00">Never</value>
<value code="01">Once a week</value>
<value code="02">Once a month</value>
<value code="03">Less frequently</value>
<value code="X">Don’t know</value>
<value code="XX">Refused</value>
</values>

Or:

<values> 
<!--with scores-->
<value code="1" score="2">Very satisfied</value>
<value code="2" score="1">Satisfied</value>
<value code="3" score="0">Neither</value>
<value code="4" score="-1">Unsatisfied</value>
<value code="5" score="-2">Very unsatisfied</value>
<value code="9">DK/NS</value>
</values>
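To illustrate how scores might be used, the sketch below computes a mean from the scored example above and leaves out responses whose code has no score, as the score rule requires. The names score_map and mean_score and the list of responses are invented for illustration.

```python
import xml.etree.ElementTree as ET
from statistics import mean

VALUES_XML = """<values>
<value code="1" score="2">Very satisfied</value>
<value code="2" score="1">Satisfied</value>
<value code="3" score="0">Neither</value>
<value code="4" score="-1">Unsatisfied</value>
<value code="5" score="-2">Very unsatisfied</value>
<value code="9">DK/NS</value>
</values>"""


def score_map(values_element):
    """Map each code that carries a score attribute to its numeric score."""
    return {
        value.get("code"): float(value.get("score"))
        for value in values_element.findall("value")
        if value.get("score") is not None
    }


def mean_score(responses, scores):
    """Mean score over responses; codes without a score are left out of the base."""
    scored = [scores[code] for code in responses if code in scores]
    return mean(scored) if scored else None


scores = score_map(ET.fromstring(VALUES_XML))
# Invented responses: code "9" has no score, so it does not count towards the base.
print(mean_score(["1", "2", "9", "3", "2"], scores))
```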

For multiple variables:

The start_value, finish_value and code_value must all be positive integers. The <value> elements do not need to be in any order, nor need they form a complete set with every possible value code present. There is no upper limit to the number of <value> elements, which may be specified within the corresponding variable definition.

For example:

<values> 
<!--3 labelled categories-->
<value code="1">USB Memory Stick</value>
<value code="2">CD</value>
<value code="3">DVD</value>
</values>

Or:

<values> 
<!--unlabelled with two explicit categories-->
<range from="1" to="19" />
<value code="98">Don’t Know</value>
<value code="99">Refused</value>
</values>
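A check for whether a numeric code is legal under a <values> element could look like the sketch below. It treats a code as legal if it lies within the <range> or matches an explicit <value>, and it assumes integer codes only (so it does not cover format="literal"). The function name is_legal_code is hypothetical.

```python
import xml.etree.ElementTree as ET

VALUES_XML = """<values>
<range from="1" to="19" />
<value code="98">Don't Know</value>
<value code="99">Refused</value>
</values>"""


def is_legal_code(values_element, code):
    """True if the integer code is inside <range> or matches a <value> code."""
    rng = values_element.find("range")
    if rng is not None and int(rng.get("from")) <= code <= int(rng.get("to")):
        return True
    return any(
        int(value.get("code")) == code
        for value in values_element.findall("value")
    )


values = ET.fromstring(VALUES_XML)
print(is_legal_code(values, 5), is_legal_code(values, 98), is_legal_code(values, 50))
```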

For quantity variables:

The start_value, finish_value and code_value explicitly define the valid range, and implicitly define the type (i.e. integer or real), format and physical size of data for the variable. The valid range for a variable of type quantity can include positive or negative values. Negative values are identified by a single leading minus sign, '-'. Positive values are identified by the absence of a sign.

For example:

<values> 
<!--integers from 1 to 100-->
<range from="1" to="100" />
</values>

Or:

<values> 
<!--0 to 500 with 2 dp, plus 1 explicit value-->
<range from="0.00" to="500.00" />
<value code="999.99">Don’t Know</value>
</values>

For a quantity variable of type real, the number of decimal places must be the same for all these attribute values within a values element. The number of decimal places must be identical to the number of decimal places used to represent the data in the corresponding Data File.

The start_value, finish_value, code_value and score_value must contain at least one digit. The table below gives examples of correct and incorrect representations:

Value: Status

  • 1: Correct (integer)
  • 1.0: Correct (real)
  • +1.0: Incorrect - 'plus' sign not allowed
  • -1.0: Correct (real)
  • - 1.0: Incorrect - contains embedded spaces
  • 1.: Correct (real)
  • .1: Correct (real)
  • -.1: Correct (real)
  • -.: Incorrect - no numeric digits present

There is no upper or lower limit to the magnitude of the values that may be assigned to a quantity variable.
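The representation rules illustrated above (optional leading minus, no plus sign, no embedded spaces, at least one digit) can be captured in a single regular expression. This is an illustrative sketch; NUMBER_RE, is_valid_number and is_real are hypothetical names.

```python
import re

# Optional minus, then either digits with an optional decimal part,
# or a decimal point followed by at least one digit.
NUMBER_RE = re.compile(r"-?(?:[0-9]+\.?[0-9]*|\.[0-9]+)")


def is_valid_number(text):
    """Check a start_value, finish_value, code_value or score_value."""
    return NUMBER_RE.fullmatch(text) is not None


def is_real(text):
    """A valid number is real if it is written with a decimal point."""
    return is_valid_number(text) and "." in text


for candidate in ["1", "1.0", "+1.0", "-1.0", "- 1.0", "1.", ".1", "-.1", "-."]:
    print(repr(candidate), is_valid_number(candidate))
```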

For date and time variables:

The start_value, finish_value and code_value explicitly define the valid range. Note that the format for date and time variables is fixed (YYYYMMDD for dates and HHMMSS for times). The valid range for a variable of type date or time must conform to this format.

For example:

<values> 
<!--dates within 2011-->
<range from="20110101" to="20111231" />
</values>
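Date and time values in these fixed formats can be parsed with Python's datetime module. The sketch below adds an explicit length and digit check because strptime alone accepts some shorter strings; parse_date and parse_time are hypothetical helper names.

```python
from datetime import datetime


def parse_date(text):
    """Parse a YYYYMMDD value; raises ValueError if it is not a real date."""
    if len(text) != 8 or not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a YYYYMMDD value: {text!r}")
    return datetime.strptime(text, "%Y%m%d").date()


def parse_time(text):
    """Parse an HHMMSS value; raises ValueError if it is not a real time."""
    if len(text) != 6 or not (text.isascii() and text.isdigit()):
        raise ValueError(f"not an HHMMSS value: {text!r}")
    return datetime.strptime(text, "%H%M%S").time()


# Range check against <range from="20110101" to="20111231" />.
value = parse_date("20110615")
print(parse_date("20110101") <= value <= parse_date("20111231"))
```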

</values>

Mandatory if there is a <values> element. Completes the description of the valid values for the variable.

For character variables:

<size>size_specification</size>

Mandatory for character type variables. Defines the maximum number of characters in the data for the variable. The size_specification must be a positive integer; there is no defined upper limit to the size_specification.

For example:

<!--100 characters maximum-->  
<size>100</size>

Finally, for all variable types:

</variable>

Mandatory. Completes definition of the variable.

Then either the definition of another variable (introduced by another variable element), or:

</record>

Mandatory. Finishes the definition for the set of variables.

</survey>

Mandatory. Finishes the definition for the survey.

Specialised Texts

The text for a survey <title>, variable <label> or values <value> element may contain any number of specialised text elements. These are in addition to the plain or default text that may or may not be present for these elements. There are two types of specialised texts:

Languages

The use of multiple language texts within a specification must be signalled by a list of the language identifiers that are used. This is done by adding a languages="language_identifier_list" attribute on the initial <sss> element.

For example:

<sss version="2.0" languages="en-GB en-US fr">

<text xml:lang="language_ident">formatted_text</text>

Optional. The formatted_text may optionally include any number of <br/> elements to indicate new line breaks.

For example:

<value code="1">Yes
<text xml:lang="en-GB">Yes</text>
<text xml:lang="en-US">Sure</text>
<text xml:lang="fr">Oui</text>
</value>

Although there is no restriction on language_ident, the intended values of the xml:lang attribute are described in the official W3C XML version 1.0 specification as:

"The values of the attribute are language identifiers as defined by IETF (Internet Engineering Task Force) RFC 3066, Tags for the Identification of Languages (http://www.ietf.org/rfc/rfc3066.txt) or its successor on the Standards Track.

Note:

IETF RFC 3066 tags are constructed from two and three-letter language codes as defined by ISO 639 (Codes for the representation of names of languages), from two-letter country codes as defined by ISO 3166 (Codes for the representation of countries and their subdivisions – part 1 (country codes)), or from language identifiers registered with the Internet Assigned Numbers Authority, Register of Language Tags."

Modes

The use of specialised texts for interviewing and/or analysis within the specification must be signalled by a modes="mode_identifier_list" attribute on the initial <sss> element.

Two explicit modes are available: “interview” and “analysis”. In the absence of a mode specification, the appropriate text is assumed to be used in both modes.

For example:

<sss version="2.0" modes="analysis">

<text mode="mode_identifier">formatted_text</text>

Optional. The formatted_text may optionally include any number of <br/> elements to indicate new line breaks.

For example:

<label>How old are you?
<text mode="analysis">Age of respondent</text>
</label>

The standard does not support the use of embedded HTML within texts (with the one exception of <br/> elements to indicate new line breaks). Exporters should remove any HTML when generating texts, in particular for the “analysis” mode.

The language and mode attributes may be combined if the appropriate languages and modes attributes appear on the <sss> element.

For example:

<label>Age
<text xml:lang="en-GB" mode="interview">How old are you?</text>
<text xml:lang="fr" mode="interview">Quel est votre âge?</text>
<text xml:lang="en-GB" mode="analysis">Age of respondent</text>
<text xml:lang="fr" mode="analysis">Âge de répondant</text>
</label>
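The selection of a specialised text can be sketched in Python as follows. This is a minimal illustration, not part of the standard: the function names pick_text and flatten are hypothetical, and the fallback order (a text that names both language and mode is preferred over one that names only one of them, and the plain default text is used last) is an assumption, since the text above only states that a text without a mode applies to both modes.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

LABEL_XML = """<label>Age
<text xml:lang="en-GB" mode="interview">How old are you?</text>
<text xml:lang="fr" mode="interview">Quel est votre âge?</text>
<text xml:lang="en-GB" mode="analysis">Age of respondent</text>
<text xml:lang="fr" mode="analysis">Âge de répondant</text>
</label>"""


def flatten(node):
    """Return the node's own text, turning <br/> children into newlines.

    Content of other children (such as <text>) is skipped on purpose, so
    calling this on a <label> yields only its plain default text.
    """
    parts = [node.text or ""]
    for child in node:
        if child.tag == "br":
            parts.append("\n")
        parts.append(child.tail or "")
    return "".join(parts).strip()


def pick_text(element, lang=None, mode=None):
    """Pick the best <text> child for a language and mode (assumed ordering)."""
    best, best_score = None, -1
    for text in element.findall("text"):
        t_lang, t_mode = text.get(XML_LANG), text.get("mode")
        # An attribute that is absent on <text> matches any request.
        if t_lang not in (None, lang) or t_mode not in (None, mode):
            continue
        score = (t_lang is not None) + (t_mode is not None)
        if score > best_score:
            best, best_score = text, score
    return flatten(best) if best is not None else flatten(element)


label = ET.fromstring(LABEL_XML)
print(pick_text(label, "fr", "interview"))
```

Requesting a language that is not present, for example pick_text(label, "de", "interview"), falls back to the plain text "Age" under these assumptions.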

The Data File

Overview

The Data File is composed of individual records. Each record contains the responses for each of the variables in the corresponding Metadata File given by one respondent. Note there is no requirement that all data in the Data File records be described by the associated Metadata File.

The individual records can consist of either fixed format fields or comma separated values. All records in the Data File must be of the same type (i.e. all fixed or all csv).

Data is recorded in fields and arranged in the manner defined by the <position> elements of the variables in the Metadata File. The type and other definitions for the corresponding variable determine the interpretation of each field.

It is recommended that import programs ignore all parts of the data record not defined by <position> elements, including those beyond the highest location defined by a <position> element.

Basic Formatting Rules

  1. Other than the record terminator (see below), only characters in the range decimal 32 to 255 are valid (i.e. the Data File is always ISO-8859-1 regardless of the encoding used in the Metadata File).
  2. Each record is terminated by either CR/LF, LF/CR, CR or LF, where CR is the carriage return character (decimal 13) and LF is the line feed character (decimal 10). Whichever terminator is used must be employed consistently - that is the same terminator must be used throughout the file.
  3. The number of records in the file determines the number of respondents. There is no maximum number of records (and hence respondents) in the file.
  4. There is no specific end-of-file character. The end of the file is determined by its physical size.
  5. There is no maximum record length.
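As a rough sketch of these rules, the following Python function splits the raw bytes of a Data File into records. The name split_records is hypothetical. It assumes the terminator can be detected from its first occurrence, and it treats a final terminator as closing the last record rather than starting an empty one.

```python
import re
from typing import List


def split_records(raw: bytes) -> List[str]:
    """Split Data File bytes into records (illustrative sketch only)."""
    # Rule 1: the Data File is always ISO-8859-1.
    text = raw.decode("iso-8859-1")

    # Rule 2: the first terminator found is taken as the one used throughout.
    match = re.search(r"\r\n|\n\r|\r|\n", text)
    if match is None:
        return [text] if text else []
    terminator = match.group(0)

    records = text.split(terminator)
    if records and records[-1] == "":
        records.pop()  # the final terminator leaves an empty trailing piece

    # Any remaining character below decimal 32 is either an invalid
    # character or a sign that terminators were mixed within the file.
    for number, record in enumerate(records, start=1):
        if any(ord(ch) < 32 for ch in record):
            raise ValueError(
                f"record {number}: invalid character or inconsistent terminator"
            )
    return records


print(split_records(b"520001 1\r\n520002 2\r\n"))
```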

Fixed Format Files

The Data File is composed of fixed format records. If any record is shorter than the highest location defined in a <position> element then the extra columns should be treated as blank.

Data is recorded in fields of fixed length and arranged in the manner defined by the start and finish attributes of the <position> elements for the variables in the Metadata File. The type and other definitions for the corresponding variable determine the interpretation of each field.
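A minimal Python sketch of field extraction from a fixed format record is shown below. The helper names extract_field and is_missing are hypothetical. Locations are taken to be 1-based and inclusive, which matches the examples later in this document where start="21" finish="22" denotes a two-character field.

```python
def extract_field(record: str, start: int, finish: int) -> str:
    """Return the characters at locations start..finish (1-based, inclusive).

    A record shorter than the requested locations is treated as if the
    missing columns were blank.
    """
    if start < 1 or finish < start:
        raise ValueError("expected 1 <= start <= finish")
    width = finish - start + 1
    return record[start - 1:finish].ljust(width)


def is_missing(field: str) -> bool:
    """A field composed entirely of spaces represents missing data."""
    return field.strip(" ") == ""


print(repr(extract_field("abc", 2, 6)))
```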

CSV Files

The Data File is composed of records of varying length. All records should contain the same number of values which must be at least as many as the highest field defined in a <position> element.

Data is recorded in fields that generally follow the style generated by the Excel spreadsheet program. The following summarises the format of a Triple-S csv Data File:

  1. Each record is one line and may not contain embedded line-breaks (even within a quoted string in a field).
  2. Data fields are separated with commas.
  3. Leading and trailing space-characters adjacent to comma field separators are ignored.
  4. Character data fields with embedded commas must be delimited with double-quote characters.
  5. Character data fields that contain double quote characters must be surrounded by double-quotes, and the embedded double-quotes must each be represented by a pair of consecutive double quotes.
  6. Data fields with leading or trailing spaces must be delimited with double-quote characters.
  7. A data field representing a bit-style multiple which begins with "0" (zero) should always be delimited with double-quote characters.
  8. Any data field may be delimited with double quotes. The delimiters will always be discarded.
  9. The initial records in a csv file may be header records containing items such as column (field) names.
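The field rules above can be illustrated with a small hand-written splitter in Python. This is a sketch under assumptions, not a reference implementation: the name split_csv_record is hypothetical, the input is one record with its terminator already removed, and any stray characters between a closing double-quote and the next comma are simply discarded.

```python
from typing import List


def split_csv_record(line: str) -> List[str]:
    """Split one csv record into field values (illustrative sketch)."""
    fields = []
    i, n = 0, len(line)
    while True:
        # Rule 3: spaces adjacent to the comma separators are ignored.
        while i < n and line[i] == " ":
            i += 1
        if i < n and line[i] == '"':
            # Quoted field: the delimiters are discarded (rule 8) and a
            # pair of double quotes stands for one literal quote (rule 5).
            i += 1
            buf = []
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        buf.append('"')
                        i += 2
                        continue
                    i += 1
                    break
                buf.append(line[i])
                i += 1
            value = "".join(buf)
            while i < n and line[i] != ",":
                i += 1  # skip trailing spaces after the closing quote
        else:
            # Unquoted field: runs to the next comma, trailing spaces dropped.
            j = line.find(",", i)
            if j == -1:
                j = n
            value = line[i:j].rstrip(" ")
            i = j
        fields.append(value)
        if i >= n:
            break
        i += 1  # step over the comma
    return fields


print(split_csv_record(' A , " B " ,"x,y"'))
```

A hand-written splitter is used here because rule 3 also ignores spaces that follow a closing double-quote, which general-purpose csv readers do not necessarily do.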

Individual Data Items

The following sections describe the methods used to represent data for each type of variable. In all cases, a field composed entirely of space characters represents missing data for that variable.

Note: In the following tables the character “b” is used in the data record column to represent a space (blank column), and the character “x” indicates a data record column that should contain either a space or zero character.

Variables of type Single

Data for Singles may be recorded as either numeric codes or literal strings.

Numeric codes

Data is recorded as an integer number or 0 (zero) as described by the <values> element.

The data field length is derived from the <value> and <range> elements in the <values> element, and is the minimum number of characters required to represent the largest value. Thus, variables with values up to 9 have a data field one character long; variables with values up to 99 have a data field length of 2, and so on. If a particular data value requires less than the maximum for the field, it should be right justified using leading space or zero characters as padding.

For example:

Data value | Maximum in <values> element | <position> element | Data record (b=space)
7 | 9 | start="21" finish="21" | 7
7 | 9 | start="21" finish="22" | 07 or b7
7 | 99 | start="21" finish="22" | 07 or b7
7 | 99 | start="21" | illegal
7 | 99 | start="21" finish="24" | 0007 or bbb7
17 | 99 | start="21" finish="22" | 17
17 | 99 | start="21" finish="24" | 0017 or bb17
142 | 9999 | start="21" finish="24" | 0142 or b142
missing | 9999 | start="21" finish="24" | bbbb
7 | 99 | start="4" (csv format) | 7 or 07 or b7

If the data field length from each <value> or <range> element is less than that defined in the corresponding <position> element then it is assumed to be right justified within the locations defined in the <position> element. Export programs must ensure that any extra columns contain leading blanks or zeros.
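The padding rules for numeric codes might be sketched in Python as below. The function names are hypothetical, and the pad argument is an illustrative way of choosing between the space and zero padding that the rules allow.

```python
from typing import Optional


def decode_single_numeric(field: str) -> Optional[int]:
    """Decode a numeric single field; None means missing data."""
    if field.strip(" ") == "":
        return None
    # int() accepts both leading spaces and leading zeros.
    return int(field)


def encode_single_numeric(value: Optional[int], width: int, pad: str = " ") -> str:
    """Right justify a code within width columns, padding with ' ' or '0'."""
    if value is None:
        return " " * width  # missing data is always all spaces
    if pad not in (" ", "0"):
        raise ValueError("pad must be a space or a zero")
    text = str(value)
    if len(text) > width:
        raise ValueError("value does not fit in the field")
    return text.rjust(width, pad)


print(repr(encode_single_numeric(7, 4, "0")), decode_single_numeric(" 142"))
```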

Literal strings

Data is recorded as characters (even if the code is numeric) as described by the <values> element. The literal codes are case-sensitive and may contain blanks.

The data field length is derived from the <value> elements contained within the <values> element, and is the minimum number of characters required to represent the longest literal. If a particular data value requires less than the maximum for the field, it should be left justified using space characters as padding.

For example:

Data value | Maximum code length in <values> element | <position> element | Data record (b=space)
A | 1 | start="21" finish="21" | A
A | 1 | start="21" finish="22" | Ab
A | 2 | start="21" finish="22" | Ab
A | 2 | start="21" | illegal
A | 2 | start="21" finish="24" | Abbb
ZZ | 2 | start="21" finish="22" | ZZ
ZZ | 2 | start="21" finish="24" | ZZbb
missing | 4 | start="21" finish="24" | bbbb
A | 1 | start="4" (csv format) | A or "A"
A | 2 | start="4" (csv format) | A or "A" or "A "

If the data field length from the <value> element is less than that defined in the corresponding <position> element then it is assumed to be left justified within the locations defined in the <position> element. Export programs must ensure that any extra columns contain blanks.

Note that in a CSV Data File any literal value with embedded commas, or leading/trailing spaces, must be delimited with double-quote characters.
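A possible Python sketch for literal codes in a fixed format field follows. The function names are hypothetical. Because literal codes may themselves contain blanks, the decoder compares the field against the padded form of each defined code instead of stripping spaces; that lookup approach is a design choice made for this illustration.

```python
from typing import Iterable, Optional


def encode_literal(code: Optional[str], width: int) -> str:
    """Left justify a literal code within width columns, padding with spaces."""
    if code is None:
        return " " * width  # missing data
    if len(code) > width:
        raise ValueError("code does not fit in the field")
    return code.ljust(width)


def decode_literal(field: str, codes: Iterable[str]) -> Optional[str]:
    """Return the defined code stored in field, or None for missing data."""
    if field.strip(" ") == "":
        return None
    for code in codes:
        # Comparison is case-sensitive, as the literal codes are.
        if code.ljust(len(field)) == field:
            return code
    raise ValueError(f"field {field!r} does not match any defined code")


print(repr(encode_literal("A", 4)), decode_literal("ZZ  ", ["A", "ZZ"]))
```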

Variables of type Multiple

Data for Multiples may be recorded as either one character per value (bitstring format), or as a list of values (spread format).

Bitstring format

Data is recorded with one character per category of the corresponding variable. A character ‘1’ is used to signify that a category has been selected, a character ‘0’ signifies that a category is not selected. The category value refers to the relative position of the 0/1 code in the data field: thus a category value of 9 will always refer to the code in the 9th location of the data field even if some lower category values have not been defined. An import program should ignore the locations of undefined category values.

The data field length is the highest category value in the associated <value> or <range> elements. If the data field length is less than the <position> element then it is assumed to be left justified within the locations defined by the position. Export programs should ensure that any extra columns contain blanks or zeros.

Note that in a CSV Data File any data field representing a bit-style multiple which begins with "0" (zero) should always be delimited with double-quote characters.

For example:

Data value | Maximum in <values> element | <position> element | Data record (b=space, x=space or zero)
1 | 1 to 9 | start="21" finish="29" | 100000000
1 | 1,2,3 and 9 | start="21" finish="29" | 100xxxxx0
1, 3 | 1 to 12 | start="21" finish="32" | 101000000000
none | 1 to 99 | start="21" finish="120" | 000000000...0
2, 8 | 1 to 9 | start="21" finish="30" | 010000010b or 0100000100
2 | 1,2,3 and 9 | start="21" finish="24" | illegal
missing | 1 to 9 | start="21" finish="29" | bbbbbbbbb
missing | 1,2,3 and 10 | start="21" finish="30" | bbbxxxxxxb
1 | 1 to 9 | start="5" (csv format) | 100000000 or "100000000"
2, 8 | 1 to 9 | start="5" (csv format) | "010000010"
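Decoding a bitstring field could look like the following Python sketch. The function name decode_bitstring and the optional defined argument (the set of category values declared in the metadata) are hypothetical.

```python
from typing import Iterable, List, Optional


def decode_bitstring(field: str, defined: Optional[Iterable[int]] = None) -> Optional[List[int]]:
    """Return the selected category values, or None for missing data.

    The category value is the 1-based position of each '1' character.
    Locations of undefined category values are ignored when the set of
    defined values is supplied.
    """
    if field.strip(" ") == "":
        return None
    allowed = None if defined is None else set(defined)
    return [
        position
        for position, ch in enumerate(field, start=1)
        if ch == "1" and (allowed is None or position in allowed)
    ]


print(decode_bitstring("010000010 "))
```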

Spread format

Data is recorded as a series of subfields each containing one category value of the variable. The category value is recorded as an integer number as described in the associated <values> or <range> elements. The number 0 should be used to represent subfields that are not needed.

The data subfield length is the minimum number of characters required to represent the largest value in the values block. Thus variables with values up to 9 have a data subfield one character long, variables with values up to 99 have a data subfield length of 2, and so on. If any particular data value requires less than the maximum for the subfield, it should be right justified using leading space or zero characters as padding. Data values may be stored in any or all subfields.

If the data subfield length is less than the subfield defined in the <spread> element then it is assumed to be right justified within the width defined in the spread. Export programs must ensure that extra columns contain blanks or zeros within the subfields.

If the total width of the subfields is less than that defined in the <position> element then the subfields are stored consecutively left justified within the locations defined by the position. Export programs must ensure that any extra columns contain blanks or zeros.

For example:

Data value | Maximum in <values> element | <spread> element | <position> element | Data record (b=space)
1 | 1 to 9 | subfields="2" width="1" | start="21" finish="22" | 10 or 01
1 | 1, 2, 3 and 9 | subfields="2" width="1" | start="21" finish="22" | 10 or 01
1, 3 | 1 to 9 | subfields="2" width="1" | start="21" finish="22" | 13
1 | 1 to 9 | subfields="2" width="2" | start="21" finish="24" | b1b0 or b0b1 or 0100 etc.
none | 1 to 9 | subfields="2" width="1" | start="21" finish="22" | 00
2 | 1, 2, 3 and 9 | subfields="2" width="1" | start="21" finish="24" | 20bb or 02bb or 2000 etc.
1, 42 | 1 to 999 | subfields="2" width="3" | start="21" finish="26" | 001042
missing | 1 to 999 | subfields="2" width="3" | start="21" finish="26" | bbbbbb
1 | 1 to 9 | subfields="2" width="1" | start="4" (csv format) | 1 or 10 or "10" or "01"
1 | 1 to 99 | subfields="2" width="2" | start="4" (csv format) | "0100" or "0001"
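For fixed format fields, the spread layout can be decoded along the lines of this Python sketch. The function name decode_spread is hypothetical, and the sketch deliberately does not cover the csv case, where a value may be shorter than the full set of subfields.

```python
from typing import List, Optional


def decode_spread(field: str, subfields: int, width: int) -> Optional[List[int]]:
    """Return the category values held in a spread field (fixed format).

    The subfields are stored consecutively from the left of the field, so
    any columns beyond subfields * width are ignored. Zero or blank
    subfields are unused.
    """
    if field.strip(" ") == "":
        return None  # missing data
    values = []
    for index in range(subfields):
        chunk = field[index * width:(index + 1) * width].strip(" ")
        if chunk and int(chunk) != 0:
            values.append(int(chunk))
    return values


print(decode_spread("001042", subfields=2, width=3))
```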

Variables of type Quantity

Data is recorded as a number with the same number of decimal places as were used in the <values> and <range> elements in the specification of the corresponding variable. A decimal point (i.e. full stop or period, ‘.’) should always appear if one was used within the <values> element specification.

For example:

Data value | <range> element | <position> element | Data record (b=space)
7 | from="0" to="99" | start="21" finish="22" | b7 or 07
7.00 | from="0.00" to="99.99" | start="21" finish="25" | b7.00 or 07.00
-7 | from="-99" to="99" | start="21" finish="23" | b-7 or -07
7 | from="-1" to="99" | start="21" finish="22" | b7 or 07
7 | from="-1" to="99" | start="21" finish="23" | bb7 or b07 or 007
-1.00 | from="-1.00" to="99.99" | start="21" finish="26" | b-1.00
17 | from="0" to="999" | start="21" finish="22" | illegal
99 | from="0" to="50" with additional <value code="99"> | start="21" finish="22" | 99
missing | from="0" to="999" | start="21" finish="23" | bbb
7 | from="0" to="99" | start="4" (csv format) | 7 or 07 or "07"
-1.00 | from="-1.00" to="99.99" | start="4" (csv format) | -1.00 or "-1.00"

The data field length must accommodate the longest allowable value defined by the <values> and <range> elements. When calculating the physical size of data for the variable, an allowance should be made for the sign of negative values. Negative numbers are represented with a leading minus sign, '-'. No such allowance should be made for (the sign of) positive values. If a particular value can be represented in a smaller length then it is right justified in the data field and leading spaces or zeros are used as padding. For negative values the spaces should appear to the left of the '-', but leading zeros should appear to the right of the '-'.

If the data field length from the <values> element is less than that defined in the <position> element then it is assumed to be right justified within the locations defined in the position. Export programs must ensure that any extra columns contain blanks or zeros.
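The justification and sign rules for quantities can be sketched in Python with the built-in number formatting, whose zero-fill option already places leading zeros to the right of the minus sign. The function name encode_quantity and its parameters are hypothetical. The sketch only checks that the formatted value fits the field; it does not check the value against the <range> element.

```python
from typing import Optional, Union


def encode_quantity(
    value: Optional[Union[int, float]],
    width: int,
    decimals: int = 0,
    zero_pad: bool = False,
) -> str:
    """Format a quantity right justified in a fixed-width field."""
    if value is None:
        return " " * width  # missing data is all spaces
    # A '0' before the width requests sign-aware zero padding: the minus
    # sign stays leftmost and zeros fill the gap after it. Without it the
    # value is right justified with spaces to the left of the sign.
    spec = f"{'0' if zero_pad else ''}{width}.{decimals}f"
    text = format(value, spec)
    if len(text) > width:
        raise ValueError("value does not fit in the field")
    return text


print(repr(encode_quantity(-7, 3)), repr(encode_quantity(-7, 3, zero_pad=True)))
```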

Variables of type Character

Data is recorded as the original character string.

The length of the field is simply the value defined by the <size> element of the corresponding variable. If the data field length from the <size> element is less than that defined in the <position> element then it is assumed to be left justified within the locations defined in the <position> element. Import programs should then ignore any extra parts of the position field.

For example, a character variable with <size>10</size> holding the word "character" would be recorded as "character " (the nine letters followed by one padding space).

Variables of type Logical

Data is recorded such that character ‘0’ represents FALSE and character ‘1’ represents TRUE.

The length of the field is always one character. If the <position> element defines a width of more than one character then the rightmost character is used and all others should be ignored.

For example, a value of true would be represented as: 1

Variables of type Date

Data is recorded in the YYYYMMDD basic ISO 8601 format where “YYYY” is the 4 digit year, “MM” is the 2 digit month, and “DD” is the 2 digit day.

The length of the field is always 8 characters. If the <position> element defines a width of more than 8 characters then the leftmost characters are used and all others should be ignored.

For example, a value of 1st April 2011 would be represented as: 20110401

Variables of type Time

Data is recorded in the HHMMSS basic ISO 8601 format where “HH” is the 2 digit hour using the 24-hour clock, “MM” is the 2 digit minute, and “SS” is the 2 digit second.

The length of the field is always 6 characters. If the <position> element defines a width of more than 6 characters then the leftmost characters are used and all others should be ignored.

For example, a value of 4.15pm would be represented as: 161500
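The Date and Time formats described above map directly onto Python's standard datetime parsing, as in this sketch. The function names are hypothetical. Only the leftmost 8 or 6 characters of a wider field are used, as described above.

```python
from datetime import date, datetime, time
from typing import Optional


def decode_date(field: str) -> Optional[date]:
    """Decode a YYYYMMDD field; None means missing data."""
    if field.strip(" ") == "":
        return None
    return datetime.strptime(field[:8], "%Y%m%d").date()


def decode_time(field: str) -> Optional[time]:
    """Decode an HHMMSS field (24-hour clock); None means missing data."""
    if field.strip(" ") == "":
        return None
    return datetime.strptime(field[:6], "%H%M%S").time()


print(decode_date("20110401"), decode_time("161500"))
```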

Extended Triple-S

Namespace Declaration

<sss xmlns:miextsss="urn:mipro.net:miextsss:v.1.0">

Extension Elements

  • Grid:
<miextsss:grid></miextsss:grid>

Represents a Walr grid question (rn/rm) and can contain one or many Triple-S variables of type multiple or single. If all variables are singles this construction will create a single choice grid.

  • Group:
<miextsss:group></miextsss:group>

Represents a Walr question with multiple sub questions. Can contain one or many Triple-S variables and grids.

  • Multi choice question from singles:
<miextsss:multilist></miextsss:multilist>

Can contain one or many Triple-S variables of type single. This construction will be converted to a Walr multi choice question.

  • Single/multi with open rows:
<miextsss:nmopen></miextsss:nmopen>

Can contain one or many Triple-S variables of type single, multiple and character. This construction will be converted to a Walr multi choice question with open rows.

  • Numeric question with many rows:
<miextsss:numericlist></miextsss:numericlist>

Can contain one or many quantity variables. This construction will be converted to a Walr numeric question with one row for each variable.

  • Labels in constructions:
<miextsss:label></miextsss:label>

Label changes in variables are made directly in <label> elements. Constructions can have their own label texts by using this element. If a construction does not have a miextsss:label element, the label will be taken from the first variable inside the construction.

In groups, miextsss:label will be used as Walr question text (qtext) and each variable will use its label as sub question text (stext). In all other constructions, miextsss:label will be used as sub question text (stext) only and will override the label element.

Extension Attributes

  • Codes:
miextsss:code

Code changes are defined with this attribute. It can be set in value elements to override an existing Triple-S code.

<value code="3" miextsss:code="99">

The code 99 will be used in the final Walr questionnaire but code 3 will be used to collect data from Triple-S.

  • Identifiers:
miextsss:qno

Triple-S identifiers can be overridden in the Walr questionnaire by setting this attribute in variables or constructions. If not used, the Triple-S name will be used as identifier.

<variable miextsss:qno="Q1">, <miextsss:grid miextsss:qno="grid1">

  • Weights:
miextsss:isweight

Triple-S only supports one weight variable in a questionnaire. This can be overridden by miextsss:isweight="true" on one or many quantity variables.

  • Weight type:
miextsss:weighttype

Can be set to population or sample (population is the default).

  • Special numeric handling:
miextsss:flength, miextsss:fdecimals

Use these attributes to tell the converter that this quantity variable should be formatted as a Walr F question with the specified length and decimals.

<variable type="quantity" miextsss:flength="4" miextsss:fdecimals="2">

  • Internal handling of numeric types:
miextsss:type="integer|decimal|extended|datetime"

Can be set on quantity variables to specify the numeric type. Note: this is an internal attribute and should only be used when a type conversion is possible for all respondents.

Converting to decimal (double) from integer is in most cases a safe conversion, but not the other way around.

  • Excluded elements:
miextsss:excluded

Triple-S elements can be excluded from the Walr questionnaire by using this attribute in variable, construction and value elements:

<miextsss:multilist>
<miextsss:label>Multi from singles</miextsss:label>
<variable type="single">
...
<values>
<value code="1" miextsss:code="10">Brand A</value>
<value code="2" miextsss:excluded="true">Not answered</value>
</values>
</variable>
<variable type="single">
...
<values>
<value code="1" miextsss:code="20">Brand B</value>
<value code="2" miextsss:excluded="true">Not answered</value>
</values>
</variable>
</miextsss:multilist>

This will create a multi choice question with two rows, Brand A and Brand B.
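To illustrate how the miextsss:code and miextsss:excluded attributes might be read, the following Python sketch lists the rows that remain in a multilist. It is not the Walr converter: the function name multilist_rows is hypothetical, the sample is a cut-down copy of the example above wrapped in an <sss> element that carries the namespace declaration, and the elided parts of each variable are left as XML comments.

```python
import xml.etree.ElementTree as ET

NS = "urn:mipro.net:miextsss:v.1.0"

SAMPLE = """<sss xmlns:miextsss="urn:mipro.net:miextsss:v.1.0">
<miextsss:multilist>
<miextsss:label>Multi from singles</miextsss:label>
<variable type="single">
<!-- ... -->
<values>
<value code="1" miextsss:code="10">Brand A</value>
<value code="2" miextsss:excluded="true">Not answered</value>
</values>
</variable>
<variable type="single">
<!-- ... -->
<values>
<value code="1" miextsss:code="20">Brand B</value>
<value code="2" miextsss:excluded="true">Not answered</value>
</values>
</variable>
</miextsss:multilist>
</sss>"""


def multilist_rows(root):
    """Return (code, text) pairs for the non-excluded values of each multilist."""
    rows = []
    for multilist in root.iter(f"{{{NS}}}multilist"):
        for value in multilist.iter("value"):
            if value.get(f"{{{NS}}}excluded") == "true":
                continue  # excluded values do not reach the questionnaire
            # The extension code, when present, overrides the Triple-S code.
            code = value.get(f"{{{NS}}}code", value.get("code"))
            rows.append((code, (value.text or "").strip()))
    return rows


print(multilist_rows(ET.fromstring(SAMPLE)))
```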

Conversion Process - SPSS SAV

SAV files are converted to Extended Triple-S using the following rules:

  • A numeric variable is converted to a Triple-S quantity variable, unless the SAV variable has value labels, in which case it is converted to a Triple-S single variable. miextsss:type is used to specify the numeric type: integer (if all values are valid 32-bit integers) or decimal (for all other value types).
  • A string variable is converted to a Triple-S character variable, unless the SAV variable has value labels, in which case it is converted to a Triple-S single variable.
  • A Date variable is converted to a Triple-S quantity variable with miextsss:type="datetime". Data will be saved as the number of ticks that represents the date and time of each value.
  • Explicitly declared missing values are handled as blanks. For example, if code 99 represents a missing value, all occurrences of 99 in the variable data are converted to blanks (no answer).

Examples

Metadata File Example (CSV raw data)

The example defines a survey with twelve variables each demonstrating one or more v2.0 features as annotated…

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sss PUBLIC "-//triple-s//DTD Survey Interchange v2.0//EN" "http://www.triple-s.org/dtd/sss_v20.dtd">

<sss version="2.0" modes="interview analysis">
<!-- introducing modes used to specialise texts at Q2 -->

<date>14 April 2011</date>
<time>16:00</time>
<origin>SurveyProg v1.3.05</origin>
<user>User Site</user>

<survey>
<name>SP5201-1</name>
<title>Historic House Exit Survey<br/>First Wave</title>

<record ident="V" format="csv" skip="1">
<!-- csv file specification: skip first record, default name .csv -->

<!-- ... -->

</record>
</survey>
</sss>

Data File Example (CSV raw data)

The example Data File is intended to be used in conjunction with the Metadata File above.

RESPONDENT_ID,Q1.a,Q1.b,Q2,Q3,Q4,Q3.a,Q5,Q6,Q7,Q8,WT
520001,20110504,112000,0,101010001,2,Nottingham Goose Fair,51,25,1,A,1.131
520002,20110506,134300,2,"010000000",9,,2,100,0,,0.9921
520003,20110503,180500,1,110000001,1,"""Heritage"" Zone",92,999,1,C,1.0089

Interpretation

The Triple-S Metadata File applied to the above Data File should result in the following interpretations.

Variable | Respondent 1 | Respondent 2 | Respondent 3
RESPONDENT_ID | 520001 | 520002 | 520003
Date of visit | May 4th, 2011 | May 6th, 2011 | May 3rd, 2011
Time of visit | 11:20 am | 1:43 pm | 6:05 pm
Visited before | No, first visit | Visited before that | Visited before within the year
Attractions visited | Sherwood Forest, "Friar Tuck" Restaurant, Mining Museum, Other | Nottingham Castle | Sherwood Forest, Nottingham Castle, Other
Other attractions | Nottingham Goose Fair | (no reply) | "Heritage" Zone
Overall impression | Good | DK/NS | Very Good
Two favourite attractions | Mining Museum, Sherwood Forest | Nottingham Castle | Other, Nottingham Castle
Miles travelled | 25 | 100 | Not stated
Would come again | true | false | true
When come again | Within 3 months | (not asked) | More than 1 year
Case weight | 1.131 | 0.9921 | 1.0089

Note that the Metadata File example above is truncated for brevity: the <!-- ... --> comment inside the <record> element stands in for the variable definitions, which are not reproduced here.