
File reader

The file reader is an entity that reads a foreign file format and emits all relevant information as device events. The source might be an insulin pump data dump file or any other similar file with a structured data format. For now, this entity is designed to work with glucose readings (IG, BG) only, hence the parameter definitions.

Configuration

Input_File (string) - a path to the file for data extraction
Maximum_IG_Interval (time) - maximum interval between interstitial glucose readings that is accepted within a single time segment
- when an IG value arrives with a time spacing greater than this option, the current time segment is ended and a new one is started
Shutdown_After_Last (boolean) - when operating in the asynchronous mode, should we emit the Shut_Down device event after the last step?
Minimum_Required_IGs (integer) - how many interstitial glucose levels are required to even consider a given time segment
Require_BG (boolean) - if this parameter is set to true, the time segment will be considered if and only if it contains at least one blood glucose level

Supported interfaces

None.

General function

The file reader opens the source file and attempts to recognize the format using a signature approach (see below). When a pattern matches, the appropriate extractor is instantiated and the extraction process begins. The extractor first loads all values and relevant information into memory, analyzes them, organizes the values into time segments and discards unused levels (e.g. when the conditions set by Minimum_Required_IGs and Require_BG are not met). Then, it sends all extracted information through the filter chain as device events.
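The segmentation step can be illustrated with a minimal Python sketch. It is not the SmartCGMS implementation; the function name split_into_segments and the tuple-based data layout are hypothetical, and it assumes that a blood glucose level belongs to a segment when its timestamp falls between the first and the last IG reading of that segment.

```python
from datetime import datetime, timedelta

def split_into_segments(ig, bg, max_ig_interval, min_required_igs, require_bg):
    """ig, bg: lists of (datetime, value) tuples sorted by time (hypothetical layout)."""
    segments, current = [], []
    for reading in ig:
        # A gap larger than Maximum_IG_Interval ends the current segment.
        if current and reading[0] - current[-1][0] > max_ig_interval:
            segments.append(current)
            current = []
        current.append(reading)
    if current:
        segments.append(current)

    kept = []
    for seg in segments:
        # Minimum_Required_IGs: drop segments with too few IG readings.
        if len(seg) < min_required_igs:
            continue
        start, end = seg[0][0], seg[-1][0]
        # Assumption: a BG level belongs to the segment if it lies within its IG range.
        seg_bg = [r for r in bg if start <= r[0] <= end]
        # Require_BG: drop segments without any BG level.
        if require_bg and not seg_bg:
            continue
        kept.append({"ig": seg, "bg": seg_bg})
    return kept

# Example data: three readings, a 50-minute gap, then two more readings.
t0 = datetime(2022, 9, 1, 15, 50)
example_ig = [(t0 + timedelta(minutes=m), 7.0) for m in (0, 5, 10, 60, 65)]
example_bg = [(t0 + timedelta(minutes=62), 7.4)]
example_segments = split_into_segments(
    example_ig, example_bg, timedelta(minutes=15), 2, True)
```

With these example values, the first segment is discarded because it contains no blood glucose level, and only the second segment remains.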

The format recognizer and the extractor work with a rule set stored in a file, which allows new formats to be added without recompilation. A basic rule set is embedded within the file, but it can be merged with a rule set stored in an external file.

Historically, the file reader supported data file anonymization, so that the original file could be stored without sensitive information. The anonymization step is no longer needed, as SmartCGMS no longer extracts or stores such information.

We currently support the following file formats:

CareLink Pro data dump file
DiaSend export
iPro glucose monitor data file
Dexcom insulin pump data dump file
Ohio T1DM dataset

Rule set file format

The rule set is stored as an INI file. As the extractor currently supports table-structured (e.g. CSV) and hierarchical (e.g. XML) file formats, the notation for the fields takes two different forms.

For both the recognizer and the extractor, the INI file contains sections whose names serve as format identifiers. The section name must be identical in the recognizer and the extractor rule set files.

The key in every section is a cell/field locator. For the table-structured formats, it can be either a comma-separated row-column pair (e.g. 4,6), or a letter-number pair (e.g. B4, as commonly seen in spreadsheet software). For the hierarchical formats, it is a path within the tree, with each node separated by a slash (e.g. /Patient/GlucoseReadings). Optionally, when using an XML with attributes, the attribute of the given element is selected by appending the attribute name preceded by a dot (e.g. /Patient/Events/Event.DisplayTime).
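To make the three locator forms concrete, the following Python sketch parses them into a simple tuple. The function parse_locator and its return layout are hypothetical. The sketch assumes that comma-separated pairs are zero-based row and column indices, and that letter-number pairs follow the usual spreadsheet convention (column letter first, one-based row), which the text above does not state explicitly.

```python
import re

def parse_locator(locator):
    """Parse a cell/field locator into a tuple (hypothetical representation)."""
    # Hierarchical path, optionally with a ".attribute" suffix on the last node.
    if locator.startswith("/"):
        parent, _, last = locator.rpartition("/")
        node, _, attribute = last.partition(".")
        return ("path", parent + "/" + node, attribute or None)

    # Comma-separated row,column pair, e.g. "4,6" (assumed zero-based).
    match = re.fullmatch(r"(\d+),(\d+)", locator)
    if match:
        return ("cell", int(match.group(1)), int(match.group(2)))

    # Letter-number pair, e.g. "B4" (assumed spreadsheet-style, one-based).
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", locator)
    if match:
        column = 0
        for letter in match.group(1).upper():
            column = column * 26 + (ord(letter) - ord("A") + 1)
        # Convert to zero-based (row, column) to match the comma form.
        return ("cell", int(match.group(2)) - 1, column - 1)

    raise ValueError("unrecognized locator: " + locator)
```

Under these assumptions, B4 and 3,1 refer to the same cell, and /Patient/Events/Event.DisplayTime is split into the element path /Patient/Events/Event and the attribute name DisplayTime.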

A simple conditional is supported for an exact match. For example, when the value types are distinguished by an attribute, we use a special rule event-cond-header to link the selection rule with the extraction rule. An example follows:

/Patient/Values/Value.Type=event-cond-header

?event-cond-header>InterstitialGlucose=ig-header
?event-cond-header>BloodGlucose=bg-header

This first locates the /Patient/Values/Value element and reads the Type attribute using the selection rule (first line). Then, the element is evaluated using the two rules below. If the Type attribute matches the string InterstitialGlucose, the extraction rule ig-header is used. Similarly, when it matches the BloodGlucose string, the extraction rule bg-header is used.
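The evaluation described above can be sketched in Python using the standard xml.etree.ElementTree module. The sample XML document, the CONDITIONS table and the function resolve_rules are illustrative assumptions, not part of SmartCGMS; only the element path, the Type attribute and the two rule names come from the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document following the /Patient/Values/Value layout.
SAMPLE_XML = """<Patient>
  <Values>
    <Value Type="InterstitialGlucose" />
    <Value Type="BloodGlucose" />
    <Value Type="Other" />
  </Values>
</Patient>"""

# The two conditional rules from the example, expressed as a lookup table.
CONDITIONS = {
    "InterstitialGlucose": "ig-header",
    "BloodGlucose": "bg-header",
}

def resolve_rules(xml_text):
    """Return the extraction rule chosen for each Value element (None if no match)."""
    root = ET.fromstring(xml_text)  # the root element is <Patient>
    resolved = []
    for element in root.findall("./Values/Value"):
        # Selection rule: read the Type attribute of the located element.
        type_value = element.get("Type")
        # Conditional rules: exact string match selects the extraction rule.
        resolved.append(CONDITIONS.get(type_value))
    return resolved
```

In this sketch, an element whose Type matches neither condition resolves to None, so it would simply be skipped.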

Format recognizer

The format recognizer uses rules to recognize the file format. It always tries to match all the known formats, regardless of the file type.

Additional rules may be stored in a file named patterns.ini.

The key contains a cell/field locator. The value is a string that must be matched. Note that the recognizer identifies the file as the first format whose rules all match. Therefore, when specifying a new format, you should define a sufficiently comprehensive set of rules.

For example, we may recognize a format based on the device signature; therefore, a single rule suffices:

[my-new-format]
0,1=Super CGM Device

Using the format defined in the example above, the recognizer looks for the string value Super CGM Device in the cell located at row index 0 and column index 1. When this rule matches, the format my-new-format is recognized and the extraction rules for this format are used.
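The matching logic can be sketched in Python with the standard configparser and csv modules. This is only an illustration of the "first format whose rules all match" behavior. The function names are hypothetical, the sketch handles only comma-separated row,column locators, and it assumes a semicolon-delimited table.

```python
import configparser
import csv
import io

PATTERNS_INI = """[my-new-format]
0,1=Super CGM Device
"""

SAMPLE_CSV = "Device;Super CGM Device\ntime;value\n2022-09-01 15:50;7.557\n"

def load_rules(ini_text):
    """Load {format name: {locator: expected string}} from INI text."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(ini_text)
    return {name: dict(parser[name]) for name in parser.sections()}

def cell_matches(table, locator, expected):
    """Check one rule; only the 'row,column' locator form is handled here."""
    row, col = (int(part) for part in locator.split(","))
    return row < len(table) and col < len(table[row]) and table[row][col] == expected

def recognize(csv_text, rules):
    """Return the first format whose rules all match, or None."""
    table = list(csv.reader(io.StringIO(csv_text), delimiter=";"))
    for name, cell_rules in rules.items():
        if cell_rules and all(
            cell_matches(table, locator, expected)
            for locator, expected in cell_rules.items()
        ):
            return name
    return None

recognized = recognize(SAMPLE_CSV, load_rules(PATTERNS_INI))
```

Because the first matching format wins in this sketch, a format with a single loose rule listed early could shadow a more specific one, which is why a comprehensive rule set matters.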

Extractor

Additional rules may be stored in a file named format_rules.ini.

The extractor uses rules to identify data streams of useful values. It then iterates each rule over all linked cells/fields (e.g. the whole column or an element subtree).

The key contains a cell/field locator. The value is a string that identifies a known (internal) field extraction rule (see below). When a table-structured format is extracted, the locator points to the header of the column. When a hierarchical format is extracted, the locator points to the element group (elements that share a tag name within a single parent) that contains the extracted elements.

For example, suppose we have a CSV file of the above-mentioned format that contains only the time and interstitial glucose readings. It may look as follows:

Device;Super CGM Device
time;value
2022-09-01 15:50;7.557
2022-09-01 15:55;7.62
...

Then we need two extraction rules:

[my-new-format]
1,0=datetime-header
1,1=ist-header

This identifies the data streams starting with the header cells at 1,0 and 1,1 as a stream of datetime values and a stream of interstitial glucose values, respectively. The extractor then moves to the next row and starts data extraction by iterating the extraction rules over all subsequent rows.
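A minimal Python sketch of this iteration follows. It is not the actual extractor: the function extract and the rule dictionary layout are hypothetical, the datetime format string is inferred from the sample rows, and short or incomplete rows are simply skipped.

```python
import csv
import io
from datetime import datetime

SAMPLE_CSV = (
    "Device;Super CGM Device\n"
    "time;value\n"
    "2022-09-01 15:50;7.557\n"
    "2022-09-01 15:55;7.62\n"
)

# The two extraction rules from the example, keyed by (row, column).
RULES = {(1, 0): "datetime-header", (1, 1): "ist-header"}

def extract(csv_text, rules):
    """Return a list of (datetime, interstitial glucose in mmol/L) tuples."""
    table = list(csv.reader(io.StringIO(csv_text), delimiter=";"))
    # Invert the rules so each rule name maps to its header cell.
    headers = {rule: cell for cell, rule in rules.items()}
    header_row, time_col = headers["datetime-header"]
    _, ist_col = headers["ist-header"]

    readings = []
    # Data starts on the row after the header row.
    for row in table[header_row + 1:]:
        if len(row) <= max(time_col, ist_col):
            continue  # skip incomplete rows
        # Assumed datetime format, inferred from the sample data.
        when = datetime.strptime(row[time_col], "%Y-%m-%d %H:%M")
        readings.append((when, float(row[ist_col])))
    return readings

readings = extract(SAMPLE_CSV, RULES)
```

Each resulting tuple pairs a timestamp with one interstitial glucose value, which is the information a device event for that reading would need.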

Field extraction rules

The extractor needs to identify the types of data streams by matching the values to their meanings. Currently supported field extraction rules are:

date-header - date of measured value
time-header - time of measured value
datetime-header - date and time of measured value (as a single cell)
ist-header - interstitial glucose levels (mmol/L)
ist-mgdl-header - interstitial glucose levels (mg/dL)
isig-header - ISIG values
blood-raw-calibration-header - blood glucose calibration values (mmol/L)
blood-calibration-header - blood glucose calibration values (mg/dL)
blood-raw-header - blood glucose values (mmol/L)
blood-header - blood glucose values (mg/dL)
insulin-bolus-header - insulin bolus values (U)
insulin-basal-rate-header - insulin basal rate values (U/hr)
insulin-temp-basal-rate-header - temporary basal rate values (U/hr)
insulin-temp-basal-rate-datetime-header - start of temporary basal rate
insulin-temp-basal-rate-datetime-end-header - end of temporary basal rate
carbohydrates-header - carbohydrates intake values (g)
event-header - generic events, we distinguish between them using conditionals
event-datetime-header - datetime of an event
event-cond-header - event conditional, see above
physical-activity-header - physical activity intensity
physical-activity-duration-header - duration of the physical activity
skin-temperature-header - temperature of the skin
air-temperature-header - temperature of the air
heartrate-header - heartrate (BPM)
electrodermal-activity-header - electrodermal activity value
steps-header - number of steps
sleep-quality-header - sleep quality index
acceleration-magnitude-header - acceleration vector magnitude

The rules listed above are directly linked to the internally implemented rules, which match the extracted values to a signal.

There should normally be no need to define a custom field extraction rule. However, if you do need one, you must link it to an existing internal rule. This is useful mainly when the extracted field needs to be converted to an internal unit by multiplying it by a given multiplier, or when the value needs to be parsed out of a string.

The field extraction rule set supports the following parameters:

header - the parameter that links to the internal rule by its string identifier (see below)
multiplier - multiply the extracted value by this factor
stringformat - an expected format of a string, that contains the value; uses the %f placeholder to find the actual value

Additional rules may be stored in a file named format_rule_templates.ini.

To define a field extraction rule, create a new section with a name of your choice and link it to an internal rule. For example, suppose we have an insulin bolus value given in mU (instead of U) and stored in a string with additional characters that need to be stripped (e.g. "Bolus 1500 mU"). Then we need to define a new rule:

[insulin-bolus-mu-header]
header=insulin-bolus
multiplier=0.001
stringformat=Bolus %f mU
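The effect of the multiplier and stringformat parameters can be sketched in Python as follows. The helper build_parser is hypothetical, and the way the %f placeholder is turned into a regular expression is an assumption about reasonable behavior, not a description of the actual parser.

```python
import re

def build_parser(stringformat, multiplier=1.0):
    """Build a function that extracts and scales the value marked by %f."""
    # Escape the literal text and replace each %f with a numeric capture group.
    literal_parts = [re.escape(part) for part in stringformat.split("%f")]
    number = r"([-+]?\d+(?:\.\d+)?)"
    pattern = re.compile(number.join(literal_parts))

    def parse(text):
        match = pattern.fullmatch(text.strip())
        if match is None:
            return None  # the cell does not follow the expected format
        # Convert to the internal unit by applying the multiplier.
        return float(match.group(1)) * multiplier

    return parse

# Parameters taken from the insulin-bolus-mu-header rule above.
parse_bolus = build_parser("Bolus %f mU", multiplier=0.001)
bolus_units = parse_bolus("Bolus 1500 mU")  # about 1.5 U
```

A cell that does not match the expected format yields None in this sketch, so the caller can decide whether to skip the row or report an error.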

We recognize the following internal rules:

date
time
datetime
ist
isig
blood
blood-calibration
insulin-bolus
insulin-basal-rate
insulin-temp-basal-rate
insulin-temp-basal-rate-datetime
insulin-temp-basal-rate-datetime-end
carbohydrates
event
event-datetime
event-condition
physical-activity
physical-activity-duration
skin-temperature
air-temperature
heartrate
electrodermal-activity
steps
sleep-quality
acceleration-magnitude
