File reader
The file reader is an entity that reads a foreign file format and emits all relevant information as device events. The source might be an insulin pump data dump file or any other file with a similar structured data format. For now, this entity is designed to work with glucose readings (IG, BG) only, hence the parameter definitions.
Configuration
Input_File
(string) - a path to the file for data extraction
Maximum_IG_Interval
(time) - maximum interval between interstitial glucose readings that is accepted within a single time segment; when an IG value arrives with a time spacing greater than this option, the current time segment is ended and a new one is started
Shutdown_After_Last
(boolean) - when operating in the asynchronous mode, whether to emit the Shut_Down device event after the last step
Minimum_Required_IGs
(integer) - how many interstitial glucose levels are required to even consider a given time segment
Require_BG
(boolean) - if this parameter is set to true, the time segment will be considered if and only if it contains at least one blood glucose level
Supported interfaces
None.
General function
The file reader opens the source file and attempts to recognize its format using a signature approach (see below). When a pattern matches, the appropriate extractor is instantiated and the extraction process begins.
The extractor first loads all values and relevant information into memory, analyzes them, organizes the values into time segments and discards unused levels (e.g. when the conditions set by Minimum_Required_IGs and Require_BG are not met). Then, it sends all extracted information through the filter chain as device events.
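The segment filtering step can be sketched as follows. The Segment container and the keep_segment helper are hypothetical; the sketch only assumes that a kept segment must contain at least Minimum_Required_IGs IG levels and, when Require_BG is true, at least one BG level.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Hypothetical container: (timestamp, value) pairs for each signal
    ig: list = field(default_factory=list)
    bg: list = field(default_factory=list)


def keep_segment(segment, minimum_required_igs, require_bg):
    """Return True if the segment passes the configured conditions."""
    if len(segment.ig) < minimum_required_igs:
        return False  # not enough IG levels (Minimum_Required_IGs)
    if require_bg and not segment.bg:
        return False  # no BG level although Require_BG is set
    return True


segments = [
    Segment(ig=[(0, 7.5)] * 10, bg=[(0, 7.9)]),
    Segment(ig=[(0, 6.1)] * 10),                # no BG level
    Segment(ig=[(0, 5.0)] * 2, bg=[(0, 5.2)]),  # too few IG levels
]
kept = [s for s in segments if keep_segment(s, minimum_required_igs=5, require_bg=True)]
print(len(kept))  # 1
```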
The format recognizer and the extractor work with rule sets stored in files, which allows adding support for new formats without recompilation. A basic rule set is embedded within the file reader, but it can be merged with a rule set stored in an external file.
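A minimal sketch of merging an embedded rule set with external INI files, using Python's standard configparser module. The EMBEDDED_RULES string, the load_rule_set helper and the override behaviour (a key read later replaces the same key read earlier within a section) are assumptions of this sketch; how SmartCGMS resolves conflicting keys is not specified here.

```python
import configparser
import os
import tempfile

# Hypothetical stand-in for the rule set embedded in the file reader
EMBEDDED_RULES = """
[my-new-format]
0,1=Super CGM Device
"""


def load_rule_set(embedded_text, external_paths=()):
    """Load the embedded rules, then merge rules from external INI files.

    Sections with the same name are merged; a key read later replaces the
    same key read earlier. Missing external files are silently skipped.
    """
    # Only "=" separates keys from values, and key case is preserved,
    # so locators such as B4 or XML paths are kept intact
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str
    parser.read_string(embedded_text)
    parser.read(external_paths)
    return parser


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "patterns.ini")
    with open(path, "w", encoding="utf-8") as f:
        f.write("[my-new-format]\n0,1=Super CGM Device v2\n\n"
                "[other-format]\n0,0=Other CGM Device\n")
    rules = load_rule_set(EMBEDDED_RULES, [path])

print(rules.sections())  # ['my-new-format', 'other-format']
print(rules["my-new-format"]["0,1"])  # Super CGM Device v2
```

Restricting the delimiter to "=" and disabling key lowercasing matters here because configparser would otherwise turn a locator such as B4 into b4.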
Historically, the file reader supported data file anonymization, so that the original file could be stored without sensitive information. The anonymization step is no longer needed, as SmartCGMS does not extract or store such information anymore.
We currently support the following file formats:
- CareLink Pro data dump file
- DiaSend export
- iPro glucose monitor data file
- Dexcom insulin pump data dump file
- Ohio T1DM dataset
Rule set file format
The rule set file format is INI. As the extractor currently supports table-structured (e.g. CSV) and hierarchical (e.g. XML) file formats, the notation for fields takes two different forms.
For both the recognizer and the extractor, the INI file contains sections whose names serve as format identifiers. A section name must be identical in both the recognizer and the extractor rule set files.
The key in every section is a cell/field locator. For the table-structured formats, it can be either a comma-separated row-column pair (e.g. 4,6) or a letter-number pair (e.g. B4), as commonly seen in spreadsheet software. For the hierarchical formats, it is a path within the tree, with nodes separated by slashes (e.g. /Patient/GlucoseReadings). Optionally, when using an XML with attributes, an attribute of the given element is selected by appending the attribute name preceded by a dot (e.g. /Patient/Events/Event.DisplayTime).
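The locator notations above can be parsed with a few lines of Python. The parse_locator helper is hypothetical, not part of SmartCGMS. It assumes 0-based indices for row-column pairs, that B4 means column B and row 4 counted from 1 as in spreadsheet software (i.e. the same cell as 3,1), and that an attribute is introduced by the first dot after the last slash.

```python
import re


def parse_locator(locator):
    """Parse a cell/field locator (hypothetical helper).

    Returns ("table", row, column) with 0-based indices, or
    ("tree", [node, ...], attribute_or_None) for hierarchical formats.
    """
    if locator.startswith("/"):
        # Tree path; an attribute may follow the last node after a dot
        head, _, last = locator.rpartition("/")
        node, _, attribute = last.partition(".")
        nodes = [n for n in head.split("/") if n] + [node]
        return ("tree", nodes, attribute or None)
    match = re.fullmatch(r"\s*(\d+)\s*,\s*(\d+)\s*", locator)
    if match:
        # Row-column pair, already 0-based
        return ("table", int(match.group(1)), int(match.group(2)))
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", locator)
    if match and int(match.group(2)) >= 1:
        # Spreadsheet notation: letters are the column, digits the row (1-based)
        column = 0
        for letter in match.group(1).upper():
            column = column * 26 + (ord(letter) - ord("A") + 1)
        return ("table", int(match.group(2)) - 1, column - 1)
    raise ValueError(f"unrecognized locator: {locator!r}")


print(parse_locator("4,6"))  # ('table', 4, 6)
print(parse_locator("B4"))   # ('table', 3, 1)
print(parse_locator("/Patient/Events/Event.DisplayTime"))
# ('tree', ['Patient', 'Events', 'Event'], 'DisplayTime')
```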
A simple conditional is supported for exact matches. For example, when the value types are distinguished by an attribute, we use a special rule, event-cond-header, to link the selection rule with the extraction rules. An example follows:
/Patient/Values/Value.Type=event-cond-header
?event-cond-header>InterstitialGlucose=ig-header
?event-cond-header>BloodGlucose=bg-header
This first locates the /Patient/Values/Value element and reads its Type attribute using the selection rule (first line). Then, the element is evaluated using the two rules that follow. If the Type attribute matches the string InterstitialGlucose, the extraction rule ig-header is used. Similarly, when it matches the string BloodGlucose, the extraction rule bg-header is used.
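The following sketch shows one way to evaluate such a conditional on an XML document with Python's xml.etree.ElementTree. The rule dictionary mirrors the INI example above; the classify_values and conditional_rules helpers and the Level attribute in the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Rules mirroring the INI example above
RULES = {
    "/Patient/Values/Value.Type": "event-cond-header",
    "?event-cond-header>InterstitialGlucose": "ig-header",
    "?event-cond-header>BloodGlucose": "bg-header",
}


def conditional_rules(rules, cond_name):
    """Map each exact-match value of a conditional to its extraction rule."""
    prefix = "?" + cond_name + ">"
    return {key[len(prefix):]: value
            for key, value in rules.items() if key.startswith(prefix)}


def classify_values(xml_text, rules):
    """Yield (extraction rule, element) pairs selected by the conditional."""
    root = ET.fromstring(xml_text)
    for locator, rule in rules.items():
        if rule != "event-cond-header" or not locator.startswith("/"):
            continue  # only selection rules drive the conditional
        path, _, attribute = locator.partition(".")
        nodes = [n for n in path.split("/") if n]
        if not nodes or root.tag != nodes[0]:
            continue  # the first node must be the document root
        candidates = root.findall("/".join(nodes[1:])) if len(nodes) > 1 else [root]
        choices = conditional_rules(rules, rule)
        for element in candidates:
            selected = choices.get(element.get(attribute))
            if selected is not None:  # exact string match only
                yield selected, element


xml_text = """<Patient><Values>
  <Value Type="InterstitialGlucose" Level="7.6"/>
  <Value Type="BloodGlucose" Level="7.9"/>
  <Value Type="Note" Level="0"/>
</Values></Patient>"""

for rule, element in classify_values(xml_text, RULES):
    print(rule, element.get("Level"))
# ig-header 7.6
# bg-header 7.9
```

The Note element is skipped because no conditional rule matches its Type value exactly.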
Format recognizer
The format recognizer uses rules to recognize the file format. It always tries to match all known formats, regardless of the file type.
Additional rules may be stored in a file named patterns.ini.
The key contains a cell/field locator. The value is a string that must be matched. Note that the recognizer identifies the file as the first format that matches all of its rules. Therefore, when specifying a new format, you should define a sufficiently comprehensive set of rules.
For example, we may recognize a format based on the device signature; in that case, a single rule suffices:
[my-new-format]
0,1=Super CGM Device
Using the format defined in the example above, the recognizer looks for the string value Super CGM Device in the cell located at row index 0 and column index 1. When this rule matches, my-new-format is recognized and the extraction rules for this format are used.
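The "first format that matches all rules" behaviour can be sketched as below, assuming the file has already been split into rows of cells (here with the csv module and a semicolon delimiter) and that cell values are compared as exact strings. The PATTERNS dictionary and the recognize helper are illustrative only.

```python
import csv
import io

# format name -> {(row, column): expected cell text}, 0-based indices
PATTERNS = {
    "my-new-format": {(0, 1): "Super CGM Device"},
}


def recognize(rows, patterns):
    """Return the first format whose rules all match, or None."""
    for format_name, rules in patterns.items():
        if all(
            row < len(rows) and column < len(rows[row])
            and rows[row][column] == expected
            for (row, column), expected in rules.items()
        ):
            return format_name
    return None


text = "Device;Super CGM Device\ntime;value\n2022-09-01 15:50;7.557\n"
rows = list(csv.reader(io.StringIO(text), delimiter=";"))
print(recognize(rows, PATTERNS))  # my-new-format
```

In this sketch a format with an empty rule set would match any file, which illustrates why each format needs a sufficiently comprehensive set of rules.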
Extractor
Additional rules may be stored in a file named format_rules.ini.
The extractor uses rules to identify data streams of useful values. It then iterates each rule over all linked cells/fields (e.g. the whole column or an element subtree).
The key contains a cell/field locator. The value is a string that identifies a known (internal) field extraction rule (see below). When a table-structured format is extracted, we locate the header of the column. When a hierarchical format is extracted, we locate the element group (elements sharing the tag name within a single parent) that contains the extracted elements.
For example, suppose we have a CSV file of the format defined above that contains only time and interstitial glucose readings. It may look as follows:
Device;Super CGM Device
time;value
2022-09-01 15:50;7.557
2022-09-01 15:55;7.62
...
Then we need two extraction rules:
[my-new-format]
1,0=datetime-header
1,1=ist-header
This identifies the data streams starting with header cells at 1,0 and 1,1 as a stream of datetime values and a stream of interstitial glucose values, respectively. The extractor then moves to the next row and starts data extraction by iterating the extraction rule over all subsequent rows.
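A minimal sketch of this column-wise extraction for the CSV example above. The extract_streams helper, the EXTRACTION_RULES dictionary keyed by 0-based (row, column) pairs and the datetime format string are assumptions; empty cells are kept as None so that values from different columns stay aligned by row.

```python
import csv
import io
from datetime import datetime

# header cell (row, column), 0-based -> field extraction rule
EXTRACTION_RULES = {(1, 0): "datetime-header", (1, 1): "ist-header"}


def extract_streams(rows, rules):
    """Collect the cells below each header cell into a named data stream."""
    streams = {}
    for (header_row, column), rule in rules.items():
        values = []
        for row in rows[header_row + 1:]:
            cell = row[column].strip() if column < len(row) else ""
            values.append(cell or None)  # keep None to preserve row alignment
        streams[rule] = values
    return streams


text = ("Device;Super CGM Device\ntime;value\n"
        "2022-09-01 15:50;7.557\n2022-09-01 15:55;7.62\n")
rows = list(csv.reader(io.StringIO(text), delimiter=";"))
streams = extract_streams(rows, EXTRACTION_RULES)

# Pair the streams row by row, skipping rows with a missing value
records = [
    (datetime.strptime(t, "%Y-%m-%d %H:%M"), float(v))
    for t, v in zip(streams["datetime-header"], streams["ist-header"])
    if t is not None and v is not None
]
print([v for _, v in records])  # [7.557, 7.62]
```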
Field extraction rules
The extractor needs to identify the types of data streams by matching the values to their meanings. Currently supported field extraction rules are:
date-header
- date of the measured value
time-header
- time of the measured value
datetime-header
- date and time of the measured value (as a single cell)
ist-header
- interstitial glucose levels (mmol/L)
ist-mgdl-header
- interstitial glucose levels (mg/dL)
isig-header
- ISIG values
blood-raw-calibration-header
- blood glucose calibration values (mmol/L)
blood-calibration-header
- blood glucose calibration values (mg/dL)
blood-raw-header
- blood glucose values (mmol/L)
blood-header
- blood glucose values (mg/dL)
insulin-bolus-header
- insulin bolus values (U)
insulin-basal-rate-header
- insulin basal rate values (U/hr)
insulin-temp-basal-rate-header
- temporary basal rate values (U/hr)
insulin-temp-basal-rate-datetime-header
- start of the temporary basal rate
insulin-temp-basal-rate-datetime-end-header
- end of the temporary basal rate
carbohydrates-header
- carbohydrate intake values (g)
event-header
- generic events, distinguished from each other using conditionals
event-datetime-header
- date and time of an event
event-cond-header
- event conditional, see above
physical-activity-header
- physical activity intensity
physical-activity-duration-header
- duration of the physical activity
skin-temperature-header
- skin temperature
air-temperature-header
- air temperature
heartrate-header
- heart rate (BPM)
electrodermal-activity-header
- electrodermal activity values
steps-header
- number of steps
sleep-quality-header
- sleep quality index
acceleration-magnitude-header
- acceleration vector magnitude
The rules listed above are directly linked to the internally implemented rules that match the extracted values to a signal.
There should not be a need to define a custom field extraction rule. However, if you do need to define one, you must link it to an existing internal rule. This is useful essentially only when the extracted field needs to be converted to an internal unit by multiplying it by a given factor, or when the value needs to be parsed out of a string.
The field extraction rule set supports the following parameters:
header
- links the rule to an internal rule by its string identifier (see below)
multiplier
- multiplies the extracted value by this factor
stringformat
- the expected format of the string that contains the value; the %f placeholder marks the position of the actual value
Additional rules may be stored in a file named format_rule_templates.ini.
To define a field extraction rule, create a new section with a name of your choice and link it to an internal rule. For example, suppose we have an insulin bolus value given in mU (instead of U) and stored in a string with additional characters we want to parse out (e.g. "Bolus 1500 mU"). Then we need to define a new rule:
[insulin-bolus-mu-header]
header=insulin-bolus
multiplier=0.001
stringformat=Bolus %f mU
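The effect of the stringformat and multiplier parameters can be sketched with a regular expression that treats %f like the floating-point conversion in C's scanf family. The apply_template helper and the accepted number syntax (optional sign, digits, optional decimal part) are assumptions of this sketch, not the SmartCGMS parser.

```python
import re


def apply_template(raw, stringformat=None, multiplier=1.0):
    """Parse a value, optionally via a %f string format, then scale it."""
    if stringformat is None:
        return float(raw) * multiplier
    # Literal parts must match exactly; %f captures a decimal number
    before, _, after = stringformat.partition("%f")
    pattern = re.escape(before) + r"([-+]?\d+(?:\.\d+)?)" + re.escape(after)
    match = re.fullmatch(pattern, raw.strip())
    if match is None:
        raise ValueError(f"{raw!r} does not match {stringformat!r}")
    return float(match.group(1)) * multiplier


# The insulin-bolus-mu-header example: "Bolus 1500 mU" should yield 1.5 U
print(apply_template("Bolus 1500 mU", stringformat="Bolus %f mU", multiplier=0.001))
```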
We recognize the following internal rules:
date
time
datetime
ist
isig
blood
blood-calibration
insulin-bolus
insulin-basal-rate
insulin-temp-basal-rate
insulin-temp-basal-rate-datetime
insulin-temp-basal-rate-datetime-end
carbohydrates
event
event-datetime
event-condition
physical-activity
physical-activity-duration
skin-temperature
air-temperature
heartrate
electrodermal-activity
steps
sleep-quality
acceleration-magnitude