Basic Examples
Take random times and corresponding values for the function f(t)=sin(t)+cos(3t)/2:
Subtract the mean of the data values and plot the resulting time series:
Plot the periodogram computed from this unevenly spaced set of measurements:
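The quantity being plotted can be sketched directly from built-in functions. This is a minimal sketch, assuming a Deeming-style definition (squared magnitude of the sum of values times complex exponentials) with a 1/N normalization. The helper name deemingPower, the random seed, the sample count, and the time range are illustrative assumptions, not taken from the original example or from the repository function.

```wolfram
(* Hypothetical helper, not the repository function:
   direct Deeming-style power at angular frequency w, normalized by 1/N. *)
deemingPower[times_, values_, w_?NumericQ] :=
  Abs[Total[values*Exp[I*w*times]]]^2/Length[times];

(* Assumed synthetic data: 200 random times on [0, 100]. *)
SeedRandom[1234];
times = Sort[RandomReal[{0, 100}, 200]];
values = Sin[times] + Cos[3*times]/2;
centered = values - Mean[values];

(* Scan a grid of angular frequencies and plot the power. *)
freqs = Range[0.05, 4, 0.005];
power = deemingPower[times, centered, #] & /@ freqs;
ListLinePlot[Transpose[{freqs, power}], PlotRange -> All,
  AxesLabel -> {"angular frequency", "power"}]
```

For this signal one would expect spikes near angular frequencies 1 and 3, with the one near 1 being larger because of its larger amplitude.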
Compute the dominant frequency (the location of the larger spike):
Compute the period from this frequency:
Plot the time series “folded” by this period (times are computed modulo the computed period) to see the approximately sinusoidal variation:
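Folding amounts to replacing each time by its remainder modulo the period. The following is a minimal sketch on assumed synthetic data; the period is set to 2π by hand here (the exact period of sin(t)+cos(3t)/2) rather than estimated, and the seed and sample count are illustrative.

```wolfram
(* Assumed synthetic data: 200 random times on [0, 100]. *)
SeedRandom[1234];
times = Sort[RandomReal[{0, 100}, 200]];
values = Sin[times] + Cos[3*times]/2;

(* Period assumed known for this sketch; in practice use the estimate 2 Pi/frequency. *)
period = 2*Pi;

(* Fold: each time is reduced modulo the period. *)
folded = Transpose[{Mod[times, period], values}];
ListPlot[folded, AxesLabel -> {"time mod period", "value"}]
```

With a good period estimate the folded points trace a single smooth curve; a poor estimate smears them into a band.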
The string-length approach also shows this periodicity:
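A string-length measure can be sketched as follows: fold the times by a trial period, order the values by folded time, and add up the jumps between neighbors. This sketch assumes one common variant (sum of squared successive differences); the helper name stringLength, the trial-period grid, and the synthetic data are illustrative assumptions and may differ from what the repository function computes.

```wolfram
(* Hypothetical helper: sum of squared jumps between values that are
   adjacent after folding the times by trial period p. *)
stringLength[times_, values_, p_?NumericQ] :=
  Total[Differences[values[[Ordering[Mod[times, p]]]]]^2];

(* Assumed synthetic data: 200 random times on [0, 100]. *)
SeedRandom[1234];
times = Sort[RandomReal[{0, 100}, 200]];
values = Sin[times] + Cos[3*times]/2;

(* Scan trial periods; the best candidate minimizes the string length. *)
trialPeriods = Range[1, 10, 0.01];
lengths = stringLength[times, values, #] & /@ trialPeriods;
bestPeriod = trialPeriods[[First[Ordering[lengths, 1]]]]
ListLinePlot[Transpose[{trialPeriods, lengths}], PlotRange -> All,
  AxesLabel -> {"trial period", "string length"}]
```

For this signal the deepest dip would be expected near 2π, since only at the true period do neighboring folded points have nearly equal values.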
One can get a fairly good estimate of the frequencies present using far fewer samples, so long as they are irregularly spaced in time. We take 20 random points from the original set above:
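Drawing such a subset can be sketched with RandomSample. The data below is assumed synthetic data standing in for the original set, and the seed is arbitrary.

```wolfram
(* Assumed synthetic stand-in for the original data set. *)
SeedRandom[1234];
times = Sort[RandomReal[{0, 100}, 200]];
values = Sin[times] + Cos[3*times]/2;

(* Pick 20 {time, value} pairs without replacement, then restore time order. *)
subset = SortBy[RandomSample[Transpose[{times, values}], 20], First];
{subTimes, subValues} = Transpose[subset];
```

Sampling pairs rather than times and values separately keeps each value attached to its own time.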
Plot the periodogram from this subset:
Estimate the period for the dominant frequency:
Scope
We use a light curve time series from the Wolfram Demonstration “CepheidVariableStarLightCurve”:
Subtract the mean of the data values and plot the resulting time series:
Plot the periodogram computed from this unevenly spaced set of measurements:
Compute the main frequency as the location of the larger spike:
Compute the period from this frequency:
Plot the time series “folded” by this period (times are computed modulo the computed period) to see the approximately sinusoidal variation in the light curve:
Plot the Lomb-Scargle periodogram for the same data:
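For comparison, the classical Lomb-Scargle power can be sketched from its textbook formula, in which a time offset τ makes the sine and cosine terms orthogonal on the sample times. This is a minimal sketch on assumed synthetic data (the light curve itself is not reproduced here); the helper name lombScargle and the unnormalized form are assumptions, and the repository function may scale its output differently.

```wolfram
(* Hypothetical helper: unnormalized Lomb-Scargle power at angular frequency w (w != 0). *)
lombScargle[times_, values_, w_?NumericQ] :=
  Module[{tau, c, s},
    (* Offset tau satisfies Tan[2 w tau] == Total[Sin[2 w t]]/Total[Cos[2 w t]]. *)
    tau = ArcTan[Total[Cos[2*w*times]], Total[Sin[2*w*times]]]/(2*w);
    c = Cos[w*(times - tau)];
    s = Sin[w*(times - tau)];
    (Total[values*c]^2/Total[c^2] + Total[values*s]^2/Total[s^2])/2];

(* Assumed synthetic data, mean-centered before use. *)
SeedRandom[1234];
times = Sort[RandomReal[{0, 100}, 200]];
values = Sin[times] + Cos[3*times]/2;
centered = values - Mean[values];

freqs = Range[0.05, 4, 0.005];
ListLinePlot[Transpose[{freqs, lombScargle[times, centered, #] & /@ freqs}],
  PlotRange -> All, AxesLabel -> {"angular frequency", "power"}]
```

In this form the power at w equals half the reduction in the sum of squares obtained by fitting a sinusoid of frequency w, which is why its peaks tend to line up with those of the Deeming periodogram.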
The computed main frequency and period agree with the values from the Deeming periodogram:
Plot the string-length values for the same data:
Compute the best estimate for the period based on the minimum value attained by the Lafler-Kinman computation for this data:
A periodogram can often detect multiple frequencies provided they are well separated, even if they are not commensurate (that is, there is no actual “period” for the data). We extract values from such a function with frequencies at :
Plot the periodogram for this data:
We get one estimated frequency around midway between 1 and :
Another is approximately π:
A third is close to :
We can get rough estimates with far fewer sampled values:
Peaks of the periodogram give estimates of the frequencies:
Properties & Relations
When the data is equally spaced on the horizontal axis, the irregular periodogram is essentially the squared magnitude of the discrete Fourier transform; we modify a previous example to demonstrate this:
Subtract the mean of the data values and plot the resulting time series:
Plot the periodogram computed from this evenly spaced set of measurements:
The squared magnitude of the Fourier transform gives a similar result once we properly scale as frequencies on the horizontal axis:
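The scaling can be made concrete. With n samples at spacing Δt, the k-th Fourier bin corresponds to angular frequency 2πk/(nΔt). The sketch below assumes a Deeming-style sum with 1/N normalization (the hypothetical helper deemingPower) and the default FourierParameters; n and Δt are arbitrary illustrative values.

```wolfram
(* Hypothetical helper: Deeming-style power with 1/N normalization. *)
deemingPower[times_, values_, w_?NumericQ] :=
  Abs[Total[values*Exp[I*w*times]]]^2/Length[times];

(* Evenly spaced samples; n and dt are illustrative choices. *)
n = 64; dt = 0.25;
times = dt*Range[0, n - 1];
values = Sin[times] + Cos[3*times]/2;
centered = values - Mean[values];

(* Angular frequencies matching the discrete Fourier bins. *)
freqs = 2*Pi*Range[0, n - 1]/(n*dt);
direct = deemingPower[times, centered, #] & /@ freqs;
viaFourier = Abs[Fourier[centered]]^2;

(* Largest discrepancy between the two computations. *)
Max[Abs[direct - viaFourier]]
```

Under these assumptions the two lists agree up to floating-point roundoff, because the default Fourier already carries a 1/√n factor.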
In the Wolfram Language this can also be computed using PeriodogramArray:
Yet another way to obtain these absolute values is with the Fourier transform of the set of values convolved with itself:
The Lafler-Kinman string-lengths-based measure can be made to more closely resemble an ordinary periodogram using reciprocals. We use a basic example:
Now show the Lafler-Kinman plot again using this data:
We go from time to frequency using the formula f=2π/t, and likewise go from small to large on the vertical axis by taking reciprocals:
Generate and plot a list of {foldtime,magnitude} pairs:
Now do the transformation and replot:
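The transformation itself is a one-line mapping on the pairs. The sketch below assumes a string-length measure based on squared successive differences (the hypothetical helper stringLength) and synthetic data; toFrequency is likewise an illustrative name.

```wolfram
(* Hypothetical helper: string length after folding by trial period p. *)
stringLength[times_, values_, p_?NumericQ] :=
  Total[Differences[values[[Ordering[Mod[times, p]]]]]^2];

(* Assumed synthetic data: 200 random times on [0, 100]. *)
SeedRandom[1234];
times = Sort[RandomReal[{0, 100}, 200]];
values = Sin[times] + Cos[3*times]/2;

(* {foldtime, magnitude} pairs over a grid of fold times. *)
foldTimes = Range[1.5, 10, 0.01];
pairs = {#, stringLength[times, values, #]} & /@ foldTimes;

(* Map fold time t to angular frequency 2 Pi/t and magnitude m to 1/m. *)
toFrequency[{t_, m_}] := {2*Pi/t, 1/m};
ListLinePlot[SortBy[toFrequency /@ pairs, First], PlotRange -> All,
  AxesLabel -> {"angular frequency", "1/string length"}]
```

Taking reciprocals turns the dips of the string-length curve into spikes, so the result can be read like a periodogram.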
We see that the main spike coincides with that of the Deeming periodogram. We also see a small bump in the Deeming periodogram near frequency 3. For the Lafler-Kinman version this corresponds to a fold time near 2.1:
Compute many values in the vicinity of this folding time:
Applications
Sunspot activity cycles
Sunspot activity follows an approximately 11-year cycle. Download a century of monthly averaged values from the Wolfram Data Repository:
Show the plot for this time series:
Obtain the times and mean-centered values:
Plot the Deeming periodogram for this time series:
Compute the maximum frequency and the corresponding estimated period in months:
Plot the Lomb-Scargle periodogram for this time series:
Compute the maximum frequency and the corresponding estimated period in months:
Plot a list of string-lengths vs. folding period for this time series:
Compute the estimated period from the low point position:
All three estimates are in the ballpark of 10.5 years. We plot the time series epoch-folded by the second and third period estimates, as well as by the conventional 11 years:
Another Cepheid variable star analysis
Download and process times and values of the light curve for the variable star known as cep2308:
Show a plot of measured amplitudes over time:
ListPlot[Transpose[{times2308, values2308}], AxesLabel -> {"time (days)", "amplitude fluctuation"}, ImageSize -> 400]
Plot the Deeming periodogram and compute the corresponding estimated period:
Plot the Lomb-Scargle periodogram and compute the corresponding estimated period:
Plot the Lafler-Kinman folded epoch total string lengths and compute the corresponding estimated period:
The period estimates all agree to several places. We plot the time series folded by epoch using the second one:
Searching for periodicity in a chromosome
Coding sections of genes are well known to exhibit a periodicity of 3 base pairs (bp). Often there are other periodicities. We show a method for finding them, using chromosome 12 from Saccharomyces cerevisiae (baker’s yeast) as an example:
Literature on the subject reports a periodicity in the 10-11 range for tetramers of the form A*T*, where the star denotes zero or more occurrences of the preceding nucleotide (some specific reported values are 10, 10.2, 10.4, and 10.5). Split the chromosome into segments of 1000 bp, and locate positions of the 4-mers “AAAA”, “AAAT”, “AATT”, “ATTT” and “TTTT”:
While there are various ways to turn a gene sequence into a numerical sequence, we instead employ the irregular periodogram on common gaps between the positions we found:
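The position-finding and gap-counting steps can be sketched on a toy string. This sketch assumes that "gaps" means all pairwise differences between match positions within a segment; the original computation may define them differently. The helper names kmerStarts and pairwiseGaps and the toy sequence are illustrative, not yeast data.

```wolfram
kmers = {"AAAA", "AAAT", "AATT", "ATTT", "TTTT"};

(* Hypothetical helper: start positions of all (possibly overlapping) matches. *)
kmerStarts[segment_String] :=
  Union[First /@ StringPosition[segment, kmers, Overlaps -> True]];

(* Hypothetical helper: all pairwise gaps between positions (an assumption). *)
pairwiseGaps[positions_List] := Abs[Subtract @@@ Subsets[positions, {2}]];

(* Made-up toy segment, for illustration only. *)
toySegment = "GCAAAATTTTGCGCAATTCG";
starts = kmerStarts[toySegment]

(* {gap size, number of occurrences}, ordered by gap size. *)
gapCounts = SortBy[Tally[pairwiseGaps[starts]], First]
```

On real data each 1000 bp segment would be processed this way and the tallies merged across segments before further filtering.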
We discard smaller counts if there are more than 500. We further decimate by removing all gap counts that are smaller than at least one of their 19 subsequent neighbors:
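The decimation rule can be sketched as a filter that keeps an entry only if its count is at least as large as each of the next k counts. The helper name keepDominant and the toy pairs are illustrative assumptions; the text above corresponds to k = 19.

```wolfram
(* Hypothetical helper: drop every {gap, count} pair whose count is smaller
   than at least one of the counts of its k subsequent neighbors. *)
keepDominant[pairs_List, k_Integer] :=
  Module[{counts = pairs[[All, 2]]},
    Pick[pairs,
      Table[counts[[i]] >= Max[Take[Drop[counts, i], UpTo[k]]],
        {i, Length[counts]}]]];

(* Made-up toy {gap, count} pairs, filtered with a window of 2. *)
toyPairs = {{1, 5}, {2, 3}, {3, 7}, {4, 2}, {5, 2}, {6, 1}};
keepDominant[toyPairs, 2]
```

Entries near the end of the list have fewer than k successors and are compared only against those that exist; the last entry is always kept.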
Plot number of gaps as a function of gap size:
Treat gap lengths as the “time” variable and gap counts as the dependent variable:
We already know of a periodicity of 3 and seek larger ones, so we plot up to a frequency range that will only give periods of at least 4:
Isolate the large frequency near 0.6 to find a plausible periodicity value:
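The frequency bounds follow from the angular-frequency convention used throughout, period = 2π/frequency. A minimal sketch of the arithmetic, with illustrative helper names:

```wolfram
(* Conversions under the angular-frequency convention. *)
periodToFrequency[p_] := 2*Pi/p;
frequencyToPeriod[w_] := 2*Pi/w;

N[periodToFrequency[4]]     (* upper end of the plotted frequency range *)
N[periodToFrequency[10.5]]  (* where a period of 10.5 bp would appear *)
frequencyToPeriod[0.6]      (* period implied by a peak at exactly 0.6 *)
```

Periods of at least 4 correspond to frequencies up to π/2 (about 1.571), and a peak at 0.6 corresponds to a period of about 10.47, inside the 10-11 range.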
Homing in near 0.6 in the plot will show that there is a double peak, with the slightly larger one giving rise to a periodicity less than 10. This may be due to an interaction between the periodicity of 3 and the larger one. We can attempt to curtail this effect by repeating the periodogram computations on data that has had gaps that are multiples of 3 removed:
Home in near 0.6:
Recompute the period estimate:
This is in line with values reported in the literature. A string-length plot and optimization gives a value in the same vicinity:
Different settings give slightly different estimates; for example, with sublengths set to 5000 and upper set to 2000 the estimate is around 10.28. So the method here is viable, but only to low precision.