Data sources enable passive data collection from participants’ devices and wearables through automatic background processes that collect physiological, behavioral, and environmental data. Unlike Activities such as surveys and cognitive tasks, which require active participant engagement and are triggered by specific events, data sources run continuously in the background.
In this article, we explain how to add, configure, view, export, and remove data sources in your Avicenna study.
Adding Data Sources
To add a data source to your study:
- Go to your study dashboard.
- Go to the Data Sources section.
- Click on the Add New Data Source button.
- Select a data source. You’ll see a list of data sources supported by Avicenna, organized into categories such as Apple HealthKit, Contact, Digital Footprint, and more. These categories help you quickly find data sources based on the type of information they collect. Scroll through the categories and select the data source that best fits your study’s needs.
- Configure the data source.
- Mandatory vs. Optional: First, you should specify whether providing this data source is mandatory or optional for your study participants. If a data source is marked as optional, the Avicenna app allows participants to opt out of this data source within the app. Note that in most cases, participants can simply revoke the necessary permissions for Avicenna to collect the requested data source. In this case, this lack of necessary permission is reported via the Audit Trail.
- Name and Description: Then, you should choose a name and a description for your data source. These values will be shown to the participant to explain what is being collected and why. You may add more details on why your study collects certain data sources within the informed consent, but the description here can also help participants to better understand why a specific data source is needed for your study.
The example of the name and description shown to the participant:
- Click on Add to finish adding the data source.
[!note]
Localization support: If your study is available in multiple languages, names and descriptions can be translated.
View Data Sources
Once data sources are added, you can view and manage them along with their configurations here. You can sort the list by type, category, name, and mandatory status.
If you click on each data source’s menu, you can see the following options:
- Edit: To edit a data source’s configuration, simply press this button and apply your modifications.
- Go to Data Export: Pressing this button will take you to the Data Export page where you can export the data collected by this data source.
- Remove: Press this button and confirm your intent if you want to remove the data source from your study. This stops the collection of that data for your study immediately. If you also want to delete the data already collected for this data source, check the Delete the data from the data source checkbox. If you left that checkbox unchecked and later decide to delete the data, please contact Avicenna support staff.
[!note]
If any activity in your study contains at least one Proximity triggering logic in its latest version (whether published or draft), the Bluetooth Beacon data source cannot be removed. Similarly, activities containing at least one Geofencing triggering logic will prevent removal of the GPS data source.
Export Data
You can use the Researcher Dashboard to export data collected from data sources or activity responses. Both support export as CSV, and some data types offer additional formats. For example, GPS data can also be exported as KML or GPX, and contact network data can also be exported as GEXF.
To export your data, open the researcher dashboard and navigate to the Data Export page:
On this page, you start by choosing the participants whose data you want to export and the type of data to export. Depending on the type of data, Avicenna may also ask you to choose the export format. For most data sources, you can download the data as a CSV file. For GPS you can also choose KML or GPX, and for Bluetooth Beacons you can also choose GEXF.
After selecting the export format, you can choose the date range. Pressing Export starts the data export process. The export may take up to a few hours to complete, depending on the size of the data. You can always come back to this page to check the status of your data export. When the export file is ready, you will receive an email, and you can return to the Data Export page to download the file.
Note that the Data Export page will also list all Survey Response export requests, even though for exporting the survey responses you need to use the Responses page, as explained in the View Responses document.
Each data export will be available for download for up to 7 days. After that, Avicenna automatically deletes the export file. If you need to download the data again, you must create a new data export.
At the moment, Avicenna does not limit the size of the data export file. For sensor data, however, this file can be very large, especially if the export covers a long date range and many participants, and it is not uncommon for it to surpass 10 GB. If you expect your data export to be very large, please break it into multiple requests so that it generates multiple files.
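One way to plan several smaller requests is to split the overall date range into fixed-size chunks before entering them on the Data Export page. The following Python sketch only computes the date ranges; it does not talk to Avicenna. The function name `split_date_range` and the 30-day chunk size are illustrative assumptions, not Avicenna settings.

```python
from datetime import date, timedelta

def split_date_range(start, end, max_days):
    """Yield (chunk_start, chunk_end) pairs covering start..end inclusive."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    if end < start:
        raise ValueError("end must not be before start")
    chunk_start = start
    while chunk_start <= end:
        # Each chunk spans at most max_days calendar days.
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + timedelta(days=1)

# Hypothetical study period, split into chunks of at most 30 days.
chunks = list(split_date_range(date(2024, 1, 1), date(2024, 3, 31), 30))
for chunk_start, chunk_end in chunks:
    print(chunk_start, "to", chunk_end)
```

Each printed pair can then be used as the date range of one export request. Splitting by participant group instead of by date is an equally valid choice.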
Data Fields
The data fields you will find in each file depend on the type of data being exported. For survey responses, the available fields are explained in the View Responses document. For sensor-based data sources, the fields are described in the related section of the Data Sources document.
Downloading Record Counts
You can download a CSV file containing the count of the collected records in each hour-long interval grouped by data source for each participant in the study. This provides a quick overview of the data collection volume across different data sources in your study.
To do that, on the Data Export page, click on Download All Data Sources Record Counts as CSV.
The downloaded CSV file includes the following columns:
- data_source_id: The unique identifier for the data source. See the Data Source ID Reference for more details.
- user_id: The participant’s unique identifier.
- participant_type: Either “Main” or “Test”.
- device_id: The unique identifier of the device that collected the data.
- time_bin: The date and hour for which the aggregated count is reported.
- count: The number of records collected for that data source during that time bin.
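As a rough sketch of how the record counts file might be summarized, the Python snippet below sums the count column per participant and data source, keeping only “Main” participants. The column names come from the list above; the sample rows, the format of the time_bin values, and the function name `total_counts` are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows that use the documented column names.
SAMPLE = """data_source_id,user_id,participant_type,device_id,time_bin,count
15,1001,Main,dev-a,2024-05-01 09:00,12
15,1001,Main,dev-a,2024-05-01 10:00,9
25,1001,Main,dev-a,2024-05-01 09:00,3
15,1002,Test,dev-b,2024-05-01 09:00,7
"""

def total_counts(csv_file, participant_type="Main"):
    """Sum the count column per (user_id, data_source_id)."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        if row["participant_type"] != participant_type:
            continue  # skip, for example, test participants
        key = (row["user_id"], int(row["data_source_id"]))
        totals[key] += int(row["count"])
    return dict(totals)

totals = total_counts(io.StringIO(SAMPLE))
for (user_id, source_id), n in sorted(totals.items()):
    print(user_id, source_id, n)
```

For a downloaded file, replace `io.StringIO(SAMPLE)` with a file opened via `open(path, newline="")`.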
Data Source ID Reference
The following table provides a reference for mapping data source IDs to their corresponding data sources:
ID | Data Source |
---|---|
1 | Accelerometer |
2 | Ambient Temperature |
3 | Gyroscope |
4 | Gravity |
5 | Light |
6 | Linear Acceleration |
7 | Magnetic Field |
8 | Orientation |
9 | Pressure |
10 | Proximity |
11 | Relative Humidity |
13 | WiFi |
14 | Bluetooth |
15 | GPS |
16 | Battery |
19 | Ambient Audio |
20 | App Usage (Legacy) |
24 | Screen State |
25 | Pedometer |
26 | Activity Recognition |
27 | Bluetooth Beacon |
30 | Fitbit Heart Rate |
33 | Fitbit Sleep |
37 | Garmin Health |
39 | HealthKit |
42 | Garmin Health Daily |
43 | Garmin Health Heart |
44 | Garmin Health Respiration |
45 | Garmin Health Sleep Daily |
46 | Garmin Health Sleep |
47 | Fitbit Activity Summary |
48 | Garmin Health Pulse Ox |
49 | Garmin Health Stress |
50 | Garmin Health Body Composition |
51 | Garmin Health User Metrics |
52 | Weather |
53 | Fitbit Activity |
54 | Fitbit Sleep Level |
55 | Fitbit Active Zone |
58 | WHOOP Workout |
59 | WHOOP Sleep |
60 | WHOOP Recovery |
61 | Polar Exercise |
62 | Polar Sleep |
63 | Polar Continuous Heart Rate |
64 | Polar SleepWise Circadian Bedtime |
65 | Polar SleepWise Alertness |
66 | Fitbit Weight Log |
67 | SensorKit Heart Rate |
68 | SensorKit Accelerometer |
69 | SensorKit Rotation Rate |
70 | SensorKit Ambient Light |
71 | SensorKit Ambient Pressure |
72 | SensorKit Device Usage Report |
73 | SensorKit Keyboard Metrics |
74 | SensorKit Message Usage Report |
75 | SensorKit On Wrist State |
76 | SensorKit Pedometer |
77 | SensorKit Phone Usage Report |
78 | SensorKit Telephony Speech Metrics |
79 | SensorKit Siri Speech Metrics |
80 | SensorKit Visits |
81 | SensorKit Wrist Temperature |
82 | Hexoskin Shirt |
98 | HealthKit Activity |
99 | HealthKit Vital Signs |
100 | HealthKit Sleep Analysis |
103 | HealthKit State of Mind |
105 | Web Activity Tracking |
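When working with exported files, it can help to translate data source IDs into names. The sketch below copies a few rows of the table into a Python dictionary. The helper name `data_source_name` is hypothetical, and the dictionary is deliberately incomplete, so extend it with the IDs your study uses.

```python
# A few entries copied from the reference table; not the complete list.
DATA_SOURCE_NAMES = {
    1: "Accelerometer",
    15: "GPS",
    16: "Battery",
    24: "Screen State",
    25: "Pedometer",
    27: "Bluetooth Beacon",
    39: "HealthKit",
}

def data_source_name(source_id):
    """Return the name for an ID, or a placeholder if the ID is not mapped."""
    return DATA_SOURCE_NAMES.get(int(source_id), f"Unknown ({source_id})")

print(data_source_name("15"))  # IDs read from a CSV file arrive as strings
print(data_source_name(12))    # 12 does not appear in the table
```

Returning a placeholder rather than raising an error keeps a summary script running when it meets an ID that has not been added to the dictionary yet.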
Direct Database Access
While Avicenna’s data export allows you to create complex queries and export any data from your study as CSV, this will not cover all analysis cases. For more advanced use-cases, you may need to connect directly to the database.
We can provide direct database access to your team to query and work with your study data. At the moment, this feature is not automatically provided. If you need to have direct database access, please contact Avicenna Support.
Handling of Timezones
Every piece of information stored in Avicenna is time-stamped as appropriate. All time values are stored internally in UTC.
However, keep in mind that all participants’ data exported from your study will be based on the participants’ timezones. This is because presenting participants’ data in their local timezones enhances researchers’ ability to interpret and analyze the data accurately, aligning it with the study protocol and the participants’ context.
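If you combine data from participants in different timezones, you may want to bring timestamps back to a common reference such as UTC. The following is a minimal sketch, assuming the exported timestamp is a plain local time in the form YYYY-MM-DD HH:MM:SS and that you know the participant’s UTC offset. Both are assumptions, so check the actual format in your export first. The function name `to_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def to_utc(local_text, utc_offset_hours):
    """Interpret a naive local timestamp at a fixed UTC offset and return it in UTC."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_time = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S")
    return local_time.replace(tzinfo=local_zone).astimezone(timezone.utc)

# Two hypothetical participants whose records were captured at the same instant.
a = to_utc("2024-05-01 09:00:00", -4)  # participant at UTC-4
b = to_utc("2024-05-01 15:00:00", 2)   # participant at UTC+2
print(a.isoformat(), b.isoformat())
```

A fixed offset ignores daylight saving changes. For named timezones, the standard library’s `zoneinfo.ZoneInfo` can be used in place of `timezone(timedelta(...))`.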
Common Data Fields
You can access the collected data either by exporting it via the Data Export page or by querying it directly using Kibana. The data format differs by data source. For example, GPS data contains location coordinates, while Pedometer data contains the number of steps taken. Regardless of the data source, every record contains some common fields, which we explain below.
Study ID: The unique ID of the study that provided the data. Internally stored as study_id.
User ID: The unique ID of the participant who provided the data. Internally stored as user_id.
Device ID: The unique ID of the smart device that provided the data. Internally stored as device_id.
Record Time: The time this record was captured. Internally stored as record_time.
Relative Record Time: The time this record was captured, relative to the participation period’s start time, in milliseconds. For example, 3,600,000 indicates the record was captured 1 hour after the participant joined the study. Internally stored as rel_record_time. Please note that this field won’t be updated if you change a participant’s start time.
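The relationship between the record time and the relative record time can be checked with a short calculation. The sketch below mirrors the definition given here and is not Avicenna’s implementation. The function name and the example timestamps are made up.

```python
from datetime import datetime, timedelta, timezone

def rel_record_time_ms(record_time, start_time):
    """Milliseconds between the participation start and the record."""
    return (record_time - start_time) // timedelta(milliseconds=1)

# Hypothetical participation start and a record captured one hour later.
start_time = datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)
record_time = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)

rel_ms = rel_record_time_ms(record_time, start_time)
print(rel_ms)

# Going the other way: recover the record time from the relative value.
recovered = start_time + timedelta(milliseconds=rel_ms)
```

Because the stored value is not updated when a participant’s start time changes, recomputing it this way from the new start time can give a different number than the one in the export.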
Data Collection Behavior of Avicenna
Avicenna supports data collection from Android, iOS, and wearable devices.
Permissions and Setup Flow
Some data sources require specific permissions. When participants join a study that includes such sources and permissions haven’t been granted yet:
A message appears at the top of the study homepage, stating that the study setup is incomplete.
Participants must either:
- Grant the required permissions, or
- Select “Don’t have this device” (available only for wearable data sources).
This excludes that specific data source for the participant.
[!note]
Participants can later revisit the Data Sources page to update permissions. The Avicenna app also allows participants to revoke permissions at any time.
Web-Only Participation
Participants using only the web app (not the mobile app) cannot contribute data from phone-based sensors. However, wearable data can still be collected, because:
- Wearable data is pulled from OEM servers (e.g., Garmin), not directly from the device.
- Participants grant access to their account on the wearable provider’s server.
- Avicenna fetches this data at the end of every day.
As a result, wearable-related data sources can be configured via the web app.
Phone Sensor Data Collection Timing
For mobile sensors like GPS and Pedometer, Avicenna requests data from the OS once every 5 minutes.
- iOS guarantees this 5-minute interval.
- Android doesn’t guarantee it and might provide data more or less often than every 5 minutes.
Sensor Data Collection Models
1. Continuous Collection (e.g., Pedometer)
In this approach, the device’s operating system continuously collects data. The OS then provides all the collected data to the Avicenna app when Avicenna queries it from the device. For example, Android and iOS devices continuously count the participant’s steps. If a study has the Pedometer sensor enabled, the Avicenna app queries the pedometer data once every 5 minutes, but it gets the total number of steps taken since the last request. So even though the Avicenna app queries data once every 5 minutes, it collects all steps taken by the participant. Similarly, Android and iOS always check whether the screen is on or off. When the screen state changes, the OS notifies the Avicenna app, regardless of the 5-minute data query interval.
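The pedometer behavior described above can be illustrated with a small simulation. It is only a sketch of the idea that each query returns the steps taken since the previous query; the step timestamps are invented, and the function name `steps_per_query` is hypothetical.

```python
import bisect

def steps_per_query(step_times, query_interval_s, duration_s):
    """Return the step count reported at each query (steps since the previous query)."""
    step_times = sorted(step_times)
    reports = []
    previous_index = 0
    for query_time in range(query_interval_s, duration_s + 1, query_interval_s):
        # Number of steps taken up to and including this query time.
        index = bisect.bisect_right(step_times, query_time)
        reports.append(index - previous_index)
        previous_index = index
    return reports

# Seconds at which the OS registered a step (made-up values).
step_times = [10, 20, 290, 310, 620, 640, 650, 899]
reports = steps_per_query(step_times, query_interval_s=300, duration_s=900)
print(reports, sum(reports))
```

The per-query counts add up to the total number of steps, which is why a 5-minute query interval does not lose steps under this collection model.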
2. Episodic (Burst-Based) Collection (e.g., GPS)
In this approach, Avicenna asks the OS every 5 minutes to collect data for a certain period, called Burst Length. The burst length is different for different data sources. For example, GPS keeps collecting data until it reads three accurate data points in a maximum time of 60 seconds. For battery, Avicenna collects one record in each cycle. For the accelerometer, Avicenna collects data for 60 seconds.
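The GPS stopping rule described above (three accurate points or 60 seconds, whichever comes first) can be sketched as follows. This is a simulation under stated assumptions, not Avicenna’s code: what counts as an “accurate” point is not specified here, so the 20-meter threshold is a hypothetical placeholder, as are the function name and the sample readings.

```python
def simulate_gps_burst(readings, accuracy_threshold_m=20.0, needed=3, max_seconds=60.0):
    """Return (stop_reason, stop_time_s) for one burst.

    readings: (elapsed_seconds, accuracy_in_meters) tuples in time order.
    """
    accurate = 0
    for elapsed, accuracy_m in readings:
        if elapsed >= max_seconds:
            return ("timeout", max_seconds)
        if accuracy_m <= accuracy_threshold_m:  # hypothetical accuracy rule
            accurate += 1
            if accurate == needed:
                return ("enough_fixes", elapsed)
    # Readings ran out before enough accurate fixes arrived.
    return ("timeout", max_seconds)

good_signal = [(2, 50), (5, 15), (9, 10), (12, 8), (15, 5)]
poor_signal = [(10, 80), (30, 15), (50, 90), (70, 5)]
print(simulate_gps_burst(good_signal))
print(simulate_gps_burst(poor_signal))
```

With a good signal the burst ends early, and with a poor signal it runs for the full 60 seconds, so the number of GPS records per cycle can vary.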
For details on the data collection behavior of each data source, please refer to the relevant documentation page following this section.
Troubleshooting
Low or missing records
If you think some data records are missing or there are fewer data records than you expected, whether after exporting/downloading the data or by viewing the In-Operation or other plots on the Participation page:
- Check the general steps to diagnose participation issues.
- Check if the study setup (the pink banner at the top of the study’s homepage) is completed. You can check the Application State logs on Kibana to see which permissions are granted, revoked, or not granted at all. Note that some data sources might not need a specific permission. In addition, the participant should prevent the mobile app from being restricted or terminated by the operating system. Check these pages for more details:
- If you haven’t marked your data source as Mandatory, the participant might have opted out of its data collection. You can check that via the Study Data Sources field under Application State logs on Kibana.
- Check the data collection behavior of Avicenna to understand possible limitations and differences between Android and iOS, and see what you can expect in general. For each data source, check the additional details, if any, on the collection behavior under their own pages. For example, for GPS, see this section. This is especially useful if you checked the In Operation status and thought it was low; a value less than 100% does not necessarily translate to data loss.
- Check if the participant is using the web app by checking their devices. Some data sources (e.g., GPS) collect data using mobile apps only.
- Check for the “Data Collection Failed” and “Data Collection Cycle Did Not Start” audit logs.
- Check if the sensor is working properly on the device. You can test that by collecting similar data using other apps or, in the case of wearable data sources (e.g., Garmin), see if you or the participant can see any data under the corresponding accounts.
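As a starting point for the checks above, the record counts CSV from the Data Export page can be scanned for hours without any records. The sketch below assumes that time_bin values look like `2024-05-01 09:00` and treats both absent rows and zero counts as empty hours. Neither assumption is confirmed here, so adjust `TIME_FORMAT` and the logic to match your file. The function name and sample rows are made up.

```python
import csv
import io
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed format of the time_bin column

SAMPLE = """data_source_id,user_id,participant_type,device_id,time_bin,count
15,1001,Main,dev-a,2024-05-01 09:00,12
15,1001,Main,dev-a,2024-05-01 10:00,9
25,1001,Main,dev-a,2024-05-01 11:00,40
15,1001,Main,dev-a,2024-05-01 12:00,11
"""

def missing_hours(csv_file, user_id, data_source_id, start, end):
    """List the hours in [start, end) with no records for one participant and source."""
    seen = set()
    for row in csv.DictReader(csv_file):
        if (row["user_id"] == str(user_id)
                and int(row["data_source_id"]) == data_source_id
                and int(row["count"]) > 0):
            seen.add(datetime.strptime(row["time_bin"], TIME_FORMAT))
    missing = []
    hour = start
    while hour < end:
        if hour not in seen:
            missing.append(hour)
        hour += timedelta(hours=1)
    return missing

gaps = missing_hours(io.StringIO(SAMPLE), "1001", 15,
                     datetime(2024, 5, 1, 9), datetime(2024, 5, 1, 13))
print(gaps)
```

An empty hour is only a hint about where to look. As noted above, less than full coverage does not necessarily mean data was lost.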