2.1. Data
The approach that is presented in this study has been implemented and tested on actual CyGNSS L1 data, which are the uncalibrated delay-Doppler maps,
, power measurements at different delay, and Doppler shifts obtained with the receiver on-board CyGNSS low earth orbiters at 1 or 0.5 s integration. The data are obtained from CYGNSS Level 1 Science Data Record Version 2.1 netCDF files, ID PODAAC-CYGNS-L1X21, variable name
raw_counts ‘DDM bin raw counts’ [
40]. Further details regarding the DDMs and their geophysical content can be obtained in e.g., [
2,
3].
The study covers the periods 9–29 September 2018 and 26 August to 6 September 2019, during hurricane seasons. Only observables that pass different quality flags that are provided in the data set are considered. The study focuses on the capability of this method to retrieve high winds, so it only includes tracks for which some samples present winds above 20 m/s, according to the European Centre for Medium-range Weather Forecast/Copernicus Climate Change Service (ECMWF/C3S) ERA5 reanalysis (ECMWF Reanalysis 5th Generation) [
41]. We also filter out of the analysis tracks of data shorter than 600 samples.
Table 1 compiles the list of selection criteria, and
Section 2.2 describes the reasons.
Table 2 compiles the main tropical cyclones for which valid CyGNSS data have been found during the analyzed periods, thus included in this study. We consider that a CyGNSS track covers a given cyclone when it reaches the area within the 34 kn (17.5 m/s) wind radii maximum extent, as provided in the International Best Track Archive for Climate Stewardship (IBTrACS) products�[
42], available at
https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r00/. This source provides four values of the 34�kn radii, one for each cardinal direction. We�have averaged them in order to provide an approximate indication of whether a CyGNSS track reaches the cyclone area or�not.
The methodology is based on a variational retrieval with respect to a background wind field. The ECMWF/C3S ERA5 reanalysis wind fields [
41], at 25 km and 1 h spatio-temporal resolution, have been used, interpolating to the location of the GNSS-R specular point.
For validation purposes, CyGNSS tracks are co-located with other spaceborne sensors: ASCAT-A/B C-band wind scatterometer measurements e.g., [
43], and SMOS e.g., [
44] and SMAP e.g., [
45] L-band radiometric wind retrievals. We have used the products, as provided at PODAAC (
https://podaac-tools.jpl.nasa.gov/drive/files/allData/ascat/), IFREMER (
ftp://eftp1.ifremer.fr) and REMSS (
ftp://ftp.remss.com), respectively. Measurements are considered to be co-located when occurring within a grid cell of 0.25
0.25
, and with a maximum time delay of 1 h.
Table 3 summarizes the number of L1 observations co-located with other spaceborne wind speed measurements, after filtering out by quality and selection criteria (
Table 1).
For illustration purposes, two wind speed images that were retrieved from Sentinel-1A and -1B C-band SARs have also been used. They have been obtained from
https://cyclobs.ifremer.fr e.g., [
46].
Finally, the rain rates, as provided in the Integrated Multi-satellitE Retrievals for GPM (IMERG) products, have been used to assess potential residual rain-induced effects. The IMERG final run datasets, GPM_3IMERGHH V06, provide multi-satellite precipitation estimation from passive microwave (PMW) sensors combined with zenith-angle-corrected, inter-calibrated merged geostationary infrared (IR) observations, and adjusted with surface precipitation data [
47]. The half-hourly precipitation estimates have a resolution of 0.1
0.1
and they are available over the globe from
https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGHH_06.
2.2. Retrieval Approach
The variational approach to data assimilation is usually formulated as a minimization problem of the cost function
where
denotes the measured observables;
x the model state variables;
the background values of the model state variables; and,
H denotes a forward operator in order to reproduce the observables based on the model state. The matrices
B,
E, and
F are covariance matrices, describing the assumed uncertainties in the background model, the measurements, and the forward operator, respectively. For a given type of observables (e.g., CyGNSS level-1 data), the operator
H and the measurements and operator covariances must be developed. An additional term can also be included, in order to constrain the smoothness and the dynamics (Laplacian, divergence, and vorticity) [
31]. In this exercise, where the observables are not assimilated into a model yet, only the second term will be minimized, the measurement covariance matrix is assumed to be the identity, the forward model covariance is not used, and the wind speed is the sole parameter of the state under consideration,
x, becoming our retrieved parameter.
Table A1 in
Appendix A compiles and explains the symbols that are used in this Section, how they map to CyGNSS and NWP variables, and the particularities of our implementation.
The observables are based on level-1 CyGNSS delay Doppler Maps (DDM), normalized with the floor noise. After normalization, only the peak value is used in the retrievals, in the form of signal peak to noise: . The range of values in the processed data set is 0.1 to 152, with a mean value of 2.7 and 50% of the values between 1.2 and 2.8. The spatial resolution (pixel size) that is associated to this peak signal to noise ratio is better than ∼20 km, with a spatial sampling of ∼7 km [∼3.5 km], a given by the displacement of the specular point in the one second [half second] interval between subsequent observables.
The forward operator
H is an implementation of the GNSS bistatic radar equation [
2] as an open source package, called ‘wavpy’ [
48]. ‘Wavpy’ is a library for GNSS-R data analysis and modelling that follows an object oriented approach, with tools to cover multiple aspects of GNSS-R, ranging from just computing the reflectivity parameters for an specific type of surface and incidence angle, to a complete simulation of a GNSS-R scenario. It is coded in C++ and Fortran90 but also includes a python envelop so it can function as a python package. The ’wavpy’ classes that are used in this implementation can generate simulated DDM based on a set of input parameters in order to account for the geometry, the instrumental performance (noise figures, transmitted powers, antenna patterns, etc), and surface properties (permittivities, roughness/wind). The flexibility of ‘wavpy’ enables using, as input, either the wind speed, the wind vector, isotropic or anisotropic mean square slopes of the surface, parameters of an analytical wave spectrum, or a discretized wave spectrum. For simplicity, this study has used wind speed as single surface roughness input and unknown to be retrieved. The geometry and the instrumental values (e.g sampling, DDM resolution, antenna gain pattern) are provided by the metadata and CyGNSS mission documentation [
40]. Because the current implementation of ‘wavpy’ only uses the diffuse scattering component, we expect that wind speeds below 5 m/s might not be properly modelled. This could be avoided if the coherent component modelled in [
49] was added to ‘wavpy’. Besides the intrinsic uncertainties, our model implementation has two additional deficiencies. First, we do not take the actual values of the ratio of the gain to noise system temperature into account. Second, we do not use actual values of the transmitted power and antenna pattern. Note that these are multiplicative deficiencies of much longer scale variability than the fluctuations expected in the wind fields. The mispointing of the antenna pattern due to changes in the platform’s attitude is not considered, and we have assessed that it also presents slow-varying effects along the track.
To reduce this deficiency, the first step before the variational retrieval is to calibrate the observed power, so it statistically aligns with the model. In our implementation, this is done track-wise as a pre-processing step, but it could be included in the variational approach to be simultaneously solved to the retrievals (e.g., in a data assimilation scheme). The calibrated power is then defined as
where
is a slowly varying function of the longitude of the specular point,
, and
p is obtained as the weighted polynomial fit of the ratios
r
with
and the weights equal to 1 when the wind in the background model lays between 5 and 25 m/s, and zero otherwise. This range is selected in order to avoid the limitations in our forward model (low winds) and the background model (high winds). Both first and second order polynomials have been tested, with better results being obtained with the linear fit (the only results shown hereafter). In this study, the a-priori wind fields,
, have been extracted from ECWMF C3S ERA5 re-analysis [
41]. This step is applied to each CyGNSS track independently. Furthermore, to guarantee that the slowly varying calibration function
does not absorb actual wind fluctuations, only tracks longer than 600 samples are used in the analysis (a few thousand km length). We also define a quality parameter for each track, as the ratio between the number of samples in the range 5–25 m/s to the total number of the samples in the track (fraction of samples used for the alignment). Tracks for which the alignment used less than 20% of the samples are disregarded.
Table 1 compiles the criteria for CyGNSS data selection. After calibration, the calibrated observables used for the retrievals (thus, our
in Equation (
1)) are generated, as in (
2).
The forward model is linearized around the a-priori wind field
. If
, with
small, then
The minimum of the second term of Equation (
1), when identity covariances are assumed, reduces to the
. If
, then it can be written as the inverse model
where
is the numerical derivative of the forward operator around the wind speed provided by the background model
.
A step-wise summary of the retrieval algorithm, with the particularities of our implementation is:
To reject data according to the criteria in
Table 1.
To compute the signal peak to noise levels from the raw_counts DDM variables in CyGNSS data.
To run the simulated values of the DDM, using ’wavpy’ fed with the ERA5 wind speed values interpolated to the CyGNSS observations: .
To compute the individual ratios r between modelled and observed S.
To fit a polynomial (linear) fit to .
To calibrate the observables multiplying with their corresponding polynomial value: .
To compute the numerical derivative of the model around the ERA5 wind speed value, .
To retrieve the wind speed that minimizes the simplified version of Equation (
1), following Equation (
5):
.
It is known that NWP models tend to underestimate the high winds e.g., [
34,
35]. The question is whether this calibration and variational scheme, which is tied to a NWP model, prevents the retrievals to sense high winds, dragged by the a-priori NWP fields used for calibration and variational approaches. We are interested in this particular question and, for this reason, we have only analyzed CyGNSS tracks for which the selected background NWP model (ERA5) presents some winds above 20 m/s.