Wearable data used to reduce sex bias in illness prediction


In recent news, researchers have introduced a general feature selection technique aimed at reducing sex-based biases in chronic illness prediction algorithms. Leveraging wearable device data from the TemPredict Study, which involved over 63,000 Oura Ring users, the study offers new insights into the role wearable technology can play in personalized health monitoring.

By addressing sex-based and other demographic biases in prediction models, this research could transform how we understand and manage chronic conditions such as diabetes and hypertension.

The TemPredict Study, conducted over a nine-month period in 2020, gathered physiological and survey data to create a robust dataset. This dataset was structured to support sex-debiasing efforts in health algorithms, providing valuable insights into chronic illness patterns across different population demographics.

The study’s findings demonstrate the potential of wearable device data in creating more inclusive and accurate health predictions, especially for groups often overlooked in traditional healthcare research.

Comprehensive data collection with the Oura Ring

Participants in the study wore the Oura Ring, a wearable health tracker that collects a range of physiological data points, including temperature, heart rate, respiratory rate, and physical activity. This data was wirelessly synced to participants’ smartphones and then securely uploaded to a cloud storage system.

The Oura Ring’s ability to continuously monitor these metrics provided researchers with an extensive source of health data, invaluable for examining chronic illness patterns over time.

Out of the 63,153 Oura Ring users who participated, a subset of 12,387 participants was identified as having at least one “usable” day, which researchers defined as a day with a minimum of four hours of non-missing temperature data during both awake and asleep states.
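The "usable day" rule can be expressed as a simple filter. The sketch below is only illustrative. It assumes a hypothetical minute-level record layout (`TempSample`), and it reads the rule as requiring at least four hours of non-missing temperature in the awake state *and* at least four hours in the asleep state. The article does not spell out that reading.

```python
from dataclasses import dataclass
from typing import Optional

MIN_MINUTES = 4 * 60  # four hours, expressed in minutes

@dataclass
class TempSample:
    """One minute-level temperature reading (hypothetical record layout)."""
    asleep: bool             # True if the wearer was asleep during this minute
    value: Optional[float]   # None marks a missing reading

def is_usable_day(samples: list[TempSample]) -> bool:
    """Assumed reading of the rule: >= 4 h of non-missing temperature
    while awake AND >= 4 h while asleep."""
    awake = sum(1 for s in samples if not s.asleep and s.value is not None)
    asleep = sum(1 for s in samples if s.asleep and s.value is not None)
    return awake >= MIN_MINUTES and asleep >= MIN_MINUTES

def usable_participants(days_by_participant: dict[str, list[list[TempSample]]]) -> set[str]:
    """IDs of participants with at least one usable day."""
    return {pid for pid, days in days_by_participant.items()
            if any(is_usable_day(day) for day in days)}
```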

This subset included individuals with and without chronic conditions, which made it possible to compare chronic illness patterns between the two groups. The researchers then narrowed the dataset further to 7,209 adults aged 40 and older. This focused the analysis on middle-aged and older adults, a group often at higher risk for chronic illnesses.

Each participant’s wearable data was analyzed across 82 attributes derived from five physiological data streams: temperature, heart rate, heart rate variability, respiratory rate, and physical activity. These features provided a rich basis for examining health patterns in individuals with and without chronic illnesses.

For instance, the researchers computed summary statistics such as the mean, standard deviation, skewness, and kurtosis to characterize the distribution of each feature. To explore relationships between paired data streams, they used partial correlation analysis. This measures the association between two streams while controlling for the influence of other variables, which improves the reliability of the results.
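The summary statistics and the partial correlation can be sketched with the standard library alone. The estimators below are assumptions, since the study's exact formulas are not given. Skewness and kurtosis use population moments, and kurtosis is reported as excess kurtosis. The partial correlation controls for a single variable `zs` using the textbook first-order formula.

```python
import math
from statistics import fmean

def moments(xs):
    """Mean, SD, skewness, and excess kurtosis (population-moment estimators;
    the study's exact estimators are an assumption here)."""
    n, m = len(xs), fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return {"mean": m, "std": math.sqrt(m2),
            "skewness": m3 / m2 ** 1.5, "kurtosis": m4 / m2 ** 2 - 3.0}

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

Suppose two streams each equal a shared driver `z` plus independent variation. They then show a raw correlation of 0.5, but their partial correlation given `z` is 0. That is the confounder-removal effect the paragraph describes.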

Feature engineering for cohort creation

The next step in the research involved feature engineering, where the team categorized data into cohorts based on three main demographic labels: reported sex, age group, and chronic condition status.

By segmenting participants into these demographic groups, the researchers could better identify patterns and potential biases in the data. This cohort-based approach facilitated comparisons across different demographic profiles, highlighting differences in physiological patterns associated with chronic illness.

To ensure statistical robustness, each cohort was required to meet minimum participation standards. Specifically, each cohort needed at least ten participants, each contributing data for at least 28 consecutive days. This threshold was designed to ensure that each group had sufficient data for meaningful analysis, avoiding the potential for skewed results from small sample sizes.
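The participation thresholds can be checked as follows. This is a minimal sketch under two assumptions. First, each participant's data is represented as a set of calendar dates with data. Second, a cohort keeps only its qualifying members and is retained only if at least ten of them qualify. The function names are hypothetical.

```python
from datetime import date, timedelta

MIN_PARTICIPANTS = 10
MIN_CONSECUTIVE_DAYS = 28

def longest_run(days: set[date]) -> int:
    """Length of the longest run of consecutive calendar days in `days`."""
    best = 0
    for d in days:
        if d - timedelta(days=1) in days:
            continue  # d is not the start of a run
        length = 1
        while d + timedelta(days=length) in days:
            length += 1
        best = max(best, length)
    return best

def eligible_cohorts(cohorts: dict[str, dict[str, set[date]]]) -> dict[str, list[str]]:
    """Keep cohorts with >= 10 participants who each have >= 28 consecutive days."""
    kept = {}
    for label, members in cohorts.items():
        qualified = [pid for pid, days in members.items()
                     if longest_run(days) >= MIN_CONSECUTIVE_DAYS]
        if len(qualified) >= MIN_PARTICIPANTS:
            kept[label] = qualified
    return kept
```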

The researchers calculated median values for each feature to create high-dimensional centroids that represented each cohort. These centroids served as a collective health profile for each group, capturing the median physiological characteristics of participants within each demographic segment.

For normalization, the researchers applied z-scoring. This rescales each feature to zero mean and unit standard deviation, so that features measured in different units contribute comparably to the distances between cohort centroids.
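Building cohort centroids might look like the sketch below. The article does not say whether z-scoring is applied to participant-level features or to the centroids themselves. This sketch assumes each feature is z-scored across all participants first, and the per-cohort median is taken afterward. Features with zero variance are mapped to 0.

```python
from statistics import fmean, median, pstdev

def zscore_columns(rows):
    """Z-score each feature (column) across all participants."""
    stats = [(fmean(col), pstdev(col)) for col in zip(*rows)]
    return [[(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
            for row in rows]

def cohort_centroids(features, cohort_of):
    """features: participant -> feature vector; cohort_of: participant -> cohort label.
    Returns cohort label -> centroid (feature-wise median of z-scored vectors)."""
    pids = list(features)
    z = dict(zip(pids, zscore_columns([features[p] for p in pids])))
    groups = {}
    for p in pids:
        groups.setdefault(cohort_of[p], []).append(z[p])
    return {label: [median(col) for col in zip(*vecs)]
            for label, vecs in groups.items()}
```

The median, rather than the mean, keeps each centroid robust to a few participants with extreme values.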

To measure similarities and differences between cohorts, the researchers used a city-block (Manhattan) distance metric combined with average linkage hierarchical clustering. This process grouped cohort centroids by their proximity in the high-dimensional feature space, revealing clusters of cohorts with similar physiological profiles.
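Average linkage clustering on city-block distances can be sketched in a few lines. The clustering repeatedly merges the two clusters whose members have the smallest mean pairwise distance. The sketch stops at a requested number of clusters, which is an illustrative choice. The cut-off the researchers used is not stated.

```python
from itertools import combinations

def cityblock(u, v):
    """City-block (Manhattan, L1) distance between two centroids."""
    return sum(abs(a - b) for a, b in zip(u, v))

def average_linkage(centroids, n_clusters):
    """Agglomerative clustering: merge the closest pair of clusters, where
    cluster distance is the mean pairwise city-block distance (average linkage)."""
    clusters = [[label] for label in centroids]

    def dist(c1, c2):
        total = sum(cityblock(centroids[a], centroids[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters
```

In practice, SciPy's `scipy.cluster.hierarchy.linkage(X, method="average", metric="cityblock")` builds the full dendrogram from the same ingredients.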

To statistically validate these clusters, the researchers used chi-square tests to examine associations between demographic variables (sex, age, and chronic condition status) and cluster membership. Cramér's V quantified the strength of each association. This analysis clarified how demographic factors relate to chronic illness indicators, and it enabled the researchers to identify patterns that might otherwise remain undetected.
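The chi-square statistic and Cramér's V can both be computed from a contingency table of demographic category against cluster. In the sketch below, rows are demographic categories and columns are clusters, a layout assumed for illustration. Cramér's V rescales the chi-square statistic to lie between 0 (no association) and 1 (perfect association).

```python
import math

def chi_square_and_cramers_v(table):
    """Pearson chi-square statistic and Cramér's V for a contingency table
    (rows: demographic category, columns: cluster)."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return chi2, math.sqrt(chi2 / (n * (k - 1)))
```

A p-value comes from the chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom. For example, `scipy.stats.chi2_contingency` provides one, though it applies Yates' continuity correction to 2×2 tables by default.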

Physiological and demographic distances

A significant focus of the study was the analysis of physiological and demographic centroid distances, which allowed researchers to map out relationships between cohorts in a high-dimensional feature space.

By calculating city-block distances between cohort centroids, researchers could identify demographic groups that were closest in terms of their physiological profiles, offering insights into how health patterns overlap across different segments of the population.

This spatial analysis revealed key relationships between demographic groups, showing, for instance, that certain age groups shared physiological characteristics despite differences in chronic condition status.

By measuring both demographic and physiological centroid distances, the researchers were able to identify pairs of cohorts that were similar in multiple dimensions, thereby underscoring the need for algorithms that can account for these overlapping characteristics in chronic illness prediction.

To evaluate the significance of these cohort proximities, researchers generated null hypothesis distributions to test the likelihood of observing these overlaps by chance. This rigorous statistical validation provided evidence that the proximities were not random, lending further support to the importance of demographic factors in chronic illness prediction.
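One way to build such a null distribution is a permutation test. This is an illustration, not the study's documented procedure. The overlap statistic below is hypothetical: the number of cohorts whose nearest demographic neighbor is also their nearest physiological neighbor. The inputs are assumed to be two distance matrices over the same cohorts. The test shuffles cohort labels on the physiological side and counts how often chance alone produces at least as much overlap.

```python
import random

def nearest(dist, i):
    """Index of the cohort closest to cohort i under distance matrix `dist`."""
    return min((j for j in range(len(dist)) if j != i), key=lambda j: dist[i][j])

def overlap(demo, phys, perm):
    """Hypothetical statistic: cohorts whose nearest demographic neighbour is also
    their nearest physiological neighbour, after relabelling `phys` by `perm`."""
    k = len(demo)
    relabelled = [[phys[perm[i]][perm[j]] for j in range(k)] for i in range(k)]
    return sum(nearest(demo, i) == nearest(relabelled, i) for i in range(k))

def permutation_test(demo, phys, n_perm=10_000, seed=0):
    """Observed overlap and its empirical p-value under a label-shuffling null."""
    rng = random.Random(seed)
    identity = list(range(len(demo)))
    observed = overlap(demo, phys, identity)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # break the link between demographic and physiological labels
        if overlap(demo, phys, perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)  # add-one keeps p above zero
```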

The findings highlight the role of age, sex, and chronic condition status as critical variables that influence wearable device data, a consideration that can inform the design of more accurate and inclusive health algorithms.

Statistical analyses and feature separability

In the final phase of the study, the researchers focused on feature separability to better understand how physiological data differed across demographic groups, with a particular emphasis on sex-based disparities.

The team used the Common Language Effect Size (CLES), the probability that a randomly chosen value from one group exceeds a randomly chosen value from the other, to quantify differences in physiological features between sex-based cohorts. They then transformed the results into Adjusted CLES values so that features could be compared on a common scale. This revealed distinct physiological patterns that could inform sex-specific health predictions.
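CLES can be computed directly by comparing every pair of values, with ties counted as one half. The article does not define the adjustment. The `adjusted_cles` function below is therefore an assumption. It folds CLES around 0.5 so that a feature's separability is counted regardless of which sex has the higher values, giving a range from 0.5 (no separation) to 1 (complete separation).

```python
def cles(xs, ys):
    """Common Language Effect Size: P(random x > random y), ties counted as 1/2."""
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))

def adjusted_cles(xs, ys):
    """Assumed adjustment: direction-free separability in [0.5, 1]."""
    c = cles(xs, ys)
    return max(c, 1.0 - c)
```

The pairwise loop is O(n·m). For large cohorts, the same value can be obtained from the Mann-Whitney U statistic as U / (n·m).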

The researchers applied one-sided Wilcoxon signed-rank tests to determine the statistical significance of the adjusted CLES values across various feature categories. These tests highlighted physiological metrics that consistently varied between male and female participants, underscoring the need to address sex-based bias in wearable device data analysis.
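A one-sided Wilcoxon signed-rank test compares paired values. The pairing shown here is hypothetical, because the article does not state what the adjusted CLES values were paired against. One possibility is each feature's observed value versus a baseline computed with shuffled sex labels. The sketch computes the exact p-value for H1 "x tends to exceed y". It drops zero differences and gives tied absolute differences their average rank.

```python
def signed_rank_greater(x, y):
    """Exact one-sided Wilcoxon signed-rank test of H1: x - y is shifted above 0.
    Returns (W+, p-value). Zero differences are dropped; ties get average ranks."""
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank for tied |d|
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    # Exact null: each rank's sign is + or - with probability 1/2.
    # Doubling the ranks keeps tied half-integer ranks as integer keys.
    counts = {0: 1}
    for r in ranks:
        r2 = round(2 * r)
        new = dict(counts)  # negative sign: sum unchanged
        for s, c in counts.items():
            new[s + r2] = new.get(s + r2, 0) + c
        counts = new
    target = round(2 * w_plus)
    p = sum(c for s, c in counts.items() if s >= target) / 2 ** n
    return w_plus, p
```

With SciPy available, `scipy.stats.wilcoxon(x, y, alternative="greater")` performs the same one-sided test.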

By identifying features with significant differences, the researchers set the stage for refining chronic illness prediction algorithms to account for demographic nuances, potentially reducing sex-based biases in health technology.

To explore these physiological distinctions further, the researchers grouped features into categories based either on their inherent characteristics or on interactions between different physiological parameters.

For instance, they grouped heart rate features with features based on the root mean square of successive differences (RMSSD), a heart rate variability measure, because of the established functional relationship between the two. This allowed a more detailed examination of underlying health processes.
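Both quantities derive from the same inter-beat intervals, which is why they are functionally related. RMSSD is the square root of the mean squared difference between consecutive intervals. The sketch assumes intervals are given in milliseconds, and it computes mean heart rate from the mean interval. How the Oura Ring derives these values internally is not described in the source.

```python
import math

def rmssd(ibi_ms):
    """Root mean square of successive differences of inter-beat intervals (ms)."""
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def mean_heart_rate(ibi_ms):
    """Mean heart rate (beats per minute) from the mean inter-beat interval (ms)."""
    return 60_000 / (sum(ibi_ms) / len(ibi_ms))
```

For intervals of 800, 810, 790, and 800 ms, the mean heart rate is 75 bpm and RMSSD is √200 ≈ 14.1 ms.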

Conclusions and future implications

The TemPredict Study’s innovative approach to wearable data analysis has set a new standard for addressing demographic biases in health technology. By integrating advanced feature selection techniques with demographic-specific cohort analysis, researchers demonstrated the potential of wearable devices like the Oura Ring to contribute to more personalized and inclusive health predictions.

This study underscores the importance of demographic considerations in chronic illness prediction, particularly in the context of wearable technology, which is becoming increasingly prevalent in health monitoring.

As the wearable device industry continues to grow, the methods and findings from this study may inspire further research and innovation. By prioritizing demographic diversity in data analysis and algorithm design, future studies can help ensure that wearable technology serves as a tool for everyone, addressing the unique health needs of diverse populations.

This research not only enhances our understanding of chronic illness prediction but also paves the way for a more inclusive and effective future in digital health solutions.
