Developing a Linear Model for Robomussel Body Temperatures

Background

Rocky intertidal communities are uniquely positioned to respond to anthropogenic climate change. Species in the intertidal are exposed to both air and ocean temperatures each day, with the relative timing depending on local tidal patterns. Body temperatures therefore vary with several physical variables, including air temperature, ocean temperature, and wind chill, as well as other factors such as fog, radiation, and wave splash. This complexity is heightened by the body's ability to mediate its response to temperature change. In humans, body temperature is not very sensitive to changes in air temperature. Smaller species, like those in the intertidal, are more sensitive and responsive to changes in ambient temperature.

Measuring the body temperatures of species is normally a time- and labor-intensive task that yields relatively limited data. The Helmuth lab developed and has deployed biomimics (“robomussels”) that continuously record temperatures within ~2 °C of the body temperatures of living mussels. Mussel and robomussel body temperatures differ significantly from the temperatures measured by unmodified loggers or nearby weather stations, so they provide information on physiological thermal stress that is not detectable with commonly used methods.

Robomussel data from Cape Mendocino, Oregon, were used to develop a predictive model based on physical data. Oregon tides are unique in that low tides coincide with maximum daily air temperatures, making this location particularly susceptible to heatwaves and climate change. Average, minimum, and maximum daily robomussel temperatures were evaluated separately. The parameters evaluated were on-site ocean temperature, remote-sensed 2 m air temperature (from ERA5), remote-sensed 2 m dewpoint temperature (from ERA5), remote-sensed 10 m wind speed (from ERA5), and remote-sensed sea surface temperature (from the Optimum Interpolation Sea Surface Temperature dataset, OISST). Wind speed and air temperature were used to calculate a wind chill factor, which was included as a parameter. All remote-sensed data were downloaded as NetCDF files, which had to be converted to pandas data frames before they could be combined with the robomussel data frame. Temperatures were converted to Celsius and wind speeds to kilometers per hour for use in the wind chill calculation.
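The unit conversions and the wind chill step can be sketched as follows. The source does not say which wind chill formula was used; this sketch assumes the standard North American wind chill index in its metric form, which takes air temperature in degrees Celsius and wind speed in km/h, and assumes the ERA5 inputs arrive in kelvin and metres per second. The function names are hypothetical. Note that this index is defined for cold, windy conditions (roughly air temperatures at or below 10 °C and wind above about 4.8 km/h), so applying it to warmer days is an extrapolation.

```python
def kelvin_to_celsius(t_k):
    """Convert a temperature from kelvin to degrees Celsius."""
    return t_k - 273.15


def ms_to_kmh(v_ms):
    """Convert a wind speed from metres per second to kilometres per hour."""
    return v_ms * 3.6


def wind_chill_celsius(t_c, v_kmh):
    """Wind chill index (deg C) from air temperature (deg C) and wind speed (km/h).

    Assumed formula: the metric form of the North American wind chill index.
    """
    v_pow = v_kmh ** 0.16
    return 13.12 + 0.6215 * t_c - 11.37 * v_pow + 0.3965 * t_c * v_pow


# Illustrative values only (not project data): -10 deg C air, 20 km/h wind.
t_c = kelvin_to_celsius(263.15)
v_kmh = ms_to_kmh(20.0 / 3.6)
wc = wind_chill_celsius(t_c, v_kmh)

# The source defines windchilltemp as the difference between wind chill and t2m.
windchilltemp = wc - t_c
print(round(wc, 2), round(windchilltemp, 2))
```

Because the functions use only arithmetic operators, they should also work element-wise on pandas Series or NumPy arrays, for example on the t2m and u10 columns of a data frame.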

Robomussels allow for low-labor, continuous monitoring of mussel body temperatures. However, many sites have limited robomussel data, especially prior to the mid-2000s. Maintenance of the robomussels also leads to large data gaps, which limit researchers' ability to assess how mussel body temperatures have changed and to predict how a continually warming climate will affect them. Remote-sensed data has the potential to fill these gaps going back at least 30 years, if it can be used reliably to model mussel body temperatures.

Project Goal

The goal of this project was to use remote-sensed physical data to develop a linear model of robomussel body temperatures. Initially, remote-sensed sea surface temperature was used in the model. However, after comparison against on-site sea surface temperature loggers, the remote-sensed data was deemed too divergent from the real conditions at the site: a linear model using remote-sensed sea surface temperature did not accurately predict on-site sea surface temperatures (coefficients of determination of about -2). On-site sea surface temperature data is still much more widely available than robomussel temperature data, so using it in the model in place of the remote-sensed data still improves overall knowledge of robomussel body temperatures.
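A negative coefficient of determination can look surprising, so it helps to write the definition out. Assuming the conventional definition (the one used by common scoring functions such as scikit-learn's r2_score, which is an assumption about the tooling here), with y_i the observed on-site value, \hat{y}_i the model prediction, and \bar{y} the mean of the observed values:

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
\qquad\Longrightarrow\qquad
R^2 \approx -2
\iff
\sum_i (y_i - \hat{y}_i)^2 \approx 3 \sum_i (y_i - \bar{y})^2
```

Under this definition, a value near -2 means the model's squared error on the evaluation data is roughly three times that of simply predicting the observed mean every day.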

The Model

A major drawback of combining different sources of physical data in one model is collinearity. Air temperature, dewpoint temperature, and ocean temperature are all expected to be collinear to some degree. To assess this, the variance_inflation_factor (VIF) function from the statsmodels package was used. This method requires a constant in either the first or the last column, which was added to the data frame using add_constant from the statsmodels package. Wind chill was computed from wind speed and air temperature as an intermediate step, and the windchilltemp parameter was then derived from it. Analyzing the parameters using the VIF gives the following results, where t2m = mean daily air temperature, mean = mean daily on-site sea surface temperature, u10 = mean daily wind speed, d2m = mean daily dewpoint temperature, Temperature = average daily robomussel temperature, and windchilltemp = difference between wind chill and t2m:

A VIF of 1 indicates that an independent variable is not correlated with any of the other variables. A VIF between 1 and 5 indicates moderate correlation that does not necessarily require corrective measures. VIFs greater than 5 represent critical levels of multicollinearity, at which coefficients and p-values are unreliable. In this case, t2m and d2m both have unacceptable VIFs. To remedy this, t2m was removed from the data frame because it had a low coefficient after modeling; with t2m removed, the VIF values became much more acceptable.
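These thresholds are easier to interpret with the definition in hand: for predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The sketch below computes this directly with NumPy on synthetic data. The values, the seed, and the helper name vif are illustrative assumptions, not the project's data or code; statsmodels' variance_inflation_factor should give matching numbers once a constant column has been added.

```python
import numpy as np


def vif(X):
    """Return the variance inflation factor of each column of X.

    Rows are observations (days); columns are predictors.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Synthetic stand-ins for the daily predictors (not real measurements).
rng = np.random.default_rng(0)
n = 500
t2m = rng.normal(12.0, 3.0, n)              # air temperature
d2m = t2m - 2.0 + rng.normal(0.0, 0.5, n)   # dewpoint tracks air temperature closely
u10 = rng.normal(5.0, 2.0, n)               # wind speed, independent of both

vif_all = vif(np.column_stack([t2m, d2m, u10]))
vif_reduced = vif(np.column_stack([d2m, u10]))  # same check after dropping t2m

print("with t2m:   ", np.round(vif_all, 2))
print("without t2m:", np.round(vif_reduced, 2))
```

In this synthetic setup the two strongly related temperature columns inflate each other's VIF well past 5, and dropping one of them brings the remaining values back near 1, which mirrors the remedy described above.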

Minimum, average, and maximum daily robomussel temperatures were the outcome variables used in the model. Because the robomussel data is daily and the remote-sensed data is hourly, the remote-sensed values were grouped by day. Different combinations of daily maximum, minimum, and median values of the parameters were used in the model. A standard scaler was used to transform the data so that coefficients could be compared. Ridge regression, which further corrects for multicollinearity, was used as the linear model, with a test size of 30%.
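The step of matching hourly remote-sensed values to daily robomussel values can be sketched with pandas as below. The hourly series, the dates, and the robomussel column name Temperature_min are synthetic or hypothetical; only the pattern (group hourly rows by calendar day, compute several daily statistics, join on the date) is meant to reflect the description above.

```python
import numpy as np
import pandas as pd

# Synthetic hourly stand-in for the remote-sensed data: three days, 72 hours.
hour_index = pd.Timestamp("2020-07-01") + pd.to_timedelta(np.arange(72), unit="h")
hourly = pd.DataFrame(
    {
        "d2m": 10.0 + (np.arange(72) % 24) * 0.1,
        "t2m": 12.0 + 4.0 * np.sin(np.arange(72) * 2 * np.pi / 24),
    },
    index=hour_index,
)

# Group hourly rows by calendar day and compute several daily statistics.
daily = hourly.groupby(hourly.index.floor("D")).agg(["min", "median", "mean", "max"])

# Flatten the (variable, statistic) column pairs into names like "d2m_max".
daily.columns = ["_".join(col) for col in daily.columns]

# Hypothetical daily robomussel frame indexed by date.
robo = pd.DataFrame(
    {"Temperature_min": [11.0, 11.5, 10.8]},
    index=pd.to_datetime(["2020-07-01", "2020-07-02", "2020-07-03"]),
)

# Keep only the days present in both sources.
merged = robo.join(daily, how="inner")
print(merged[["Temperature_min", "d2m_max", "t2m_mean"]])
```

Computing all the daily statistics up front makes it straightforward to try different combinations of minimum, median, and maximum predictors without re-aggregating the hourly data.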

Results

The model performed best when predicting minimum daily robomussel temperatures. Using only maximum daily dewpoint temperature and mean daily on-site sea surface temperature as predictors gave a coefficient of determination of 0.67, coefficients of 0.55566499 for mean sea surface temperature and 0.33200815 for maximum dewpoint temperature, and a model mean squared error of 1.02.

Plot of y_pred vs y_test for minimum robomussel temperatures

Average and maximum daily robomussel temperatures were not successfully modeled. The coefficients of determination were negative in both cases, although less negative for average robomussel temperatures. Varying the exact parameters, from minimum values of wind speed to maximum values of air temperature, did not significantly improve model performance. Higher regularization (alpha) values slightly improved performance, but only to the extent that the coefficient of determination reached 0.0 for average mussel body temperature. No linear model other than ridge regression was tried because of the collinearity in the dataset: even if another model performed better, its coefficients would not necessarily be reliable.
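The effect of the alpha value can be illustrated with a small, self-contained sketch. It uses NumPy only and synthetic data, so it does not reproduce any result reported here; the helper names, seed, and coefficients are hypothetical. It standardizes the predictors using training statistics, holds out 30% of the rows, and fits ridge regression in closed form with an unpenalized intercept. In scikit-learn terms this roughly corresponds to StandardScaler, train_test_split with test_size=0.3, and Ridge, though whether those exact classes were used in the project is an assumption.

```python
import numpy as np


def standardize(train, test):
    """Scale both sets with the training mean and standard deviation."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    return (train - mu) / sd, (test - mu) / sd


def ridge_fit(X, y, alpha):
    """Closed-form ridge: minimise ||y - b0 - X b||^2 + alpha * ||b||^2."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef


def r2_and_mse(y_true, y_pred):
    """Coefficient of determination and mean squared error."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, ss_res / y_true.size


# Synthetic, deliberately collinear predictors (not project data).
rng = np.random.default_rng(1)
n = 400
sst = rng.normal(11.0, 1.5, n)
dew = 0.6 * sst + rng.normal(3.0, 1.0, n)
y = 0.7 * sst + 0.3 * dew + rng.normal(0.0, 0.5, n)
X = np.column_stack([sst, dew])

# Random 70/30 train/test split.
order = rng.permutation(n)
n_test = round(0.3 * n)
test_idx, train_idx = order[:n_test], order[n_test:]
X_train, X_test = standardize(X[train_idx], X[test_idx])
y_train, y_test = y[train_idx], y[test_idx]

results = {}
for alpha in (0.1, 1.0, 10.0, 100.0):
    coef, intercept = ridge_fit(X_train, y_train, alpha)
    r2, mse = r2_and_mse(y_test, X_test @ coef + intercept)
    results[alpha] = (coef, r2, mse)
    print(f"alpha={alpha:>6}: coef={np.round(coef, 3)}, R^2={r2:.2f}, MSE={mse:.2f}")
```

Larger alpha values shrink the standardized coefficients toward zero, which trades some bias for stability. Shrinkage alone cannot recover signal that the predictors do not carry, which is consistent with the coefficient of determination only rising to about zero for the average temperatures.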

Plot of y_pred vs y_test for average robomussel temperatures
Plot of y_pred vs y_test for max robomussel temperatures

Although both the average and maximum robomussel plots show a relationship between y_pred and y_test, this relationship does not help the model predict robomussel body temperatures.

Conclusions

Minimum robomussel body temperatures were successfully modeled, likely because they would be expected to be similar to sea surface temperatures; in actuality, however, they were most correlated with maximum dewpoint temperatures. Average and maximum robomussel temperatures could not be modeled, likely because a radiative analysis is needed. Including longwave radiation, specific heat capacity, and mussel body mass, converted to a temperature reading, might fill this gap. Remote-sensed data offers continuous, worldwide coverage but is not reliable for small-scale measurements. This drawback is evident even in the attempt to model on-site sea surface temperatures using the OISST dataset. Although the OISST dataset is the highest-resolution sea surface dataset available, and combines remote-sensed data with buoys and other marine observations, it is not a good approximation of actual intertidal temperatures. That said, the relative success in modeling minimum mussel body temperatures may still be useful to scientists. Although maximum or average temperatures would give the best picture of physiological stress in the intertidal, minimum temperatures can still reveal trends over time and potentially point towards sub-lethal heat stresses.