Function Repository Resource:

# IrregularPeriodogram

Compute a periodogram for data from unevenly spaced intervals

Contributed by: Daniel Lichtblau
 ResourceFunction["IrregularPeriodogram"][w,times,vals] computes the irregular periodogram at frequency w for the given set of times and vals. ResourceFunction["IrregularPeriodogram"][w,mat] computes the irregular periodogram for the set of time and value pairs in the two-column matrix mat.

## Details and Options

ResourceFunction["IrregularPeriodogram"] accepts a Method option.
Allowed Method values are "Deeming", "LombScargle", "LaflerKinman" and "StringLength", where the last two are equivalent.
The Deeming periodogram for times {t1,,tn} and corresponding values {v1,,vn} is given by .
Unlike the classical case, the Lomb-Scargle periodogram here does not normalize by the variance of the data values.
Strictly speaking, the "LaflerKinman" setting will not give a periodogram computation. Instead of computing a value based on a given frequency, it computes directly based on a putative period p. Small values indicate a short total length when folding by p along the time axis (that is, sorting the times modulo p and approximating the total length between consecutive time-folded points).

## Examples

### Basic Examples (10)

Take random times and corresponding values for the function f(t)=sin(t)+cos(3t)/2:

 In[1]:=

Subtract the mean of the data values and plot the resulting time series:

 In[2]:=
 Out[2]=

Plot the periodogram computed from this unevenly spaced set of measurements:

 In[3]:=
 Out[3]=

Compute the larger frequency (the location of the larger spike):

 In[4]:=
 Out[4]=

Compute the period from this frequency:

 In[5]:=
 Out[5]=

Plot the time series "folded" by this period (times are computed modulo the computed period) to see the approximately sinusoidal variation in the light curve:

 In[6]:=
 In[7]:=
 Out[7]=

The string-length approach also shows this periodicity:

 In[8]:=
 Out[8]=

One can get a fairly good estimate of the frequencies present using far fewer samples, so long as they are irregularly spaced in time. Take 20 random points from the preceding original set:

 In[9]:=
 Out[9]=

Plot the periodogram from this subset:

 In[10]:=
 Out[10]=

Estimate the period for the larger frequency:

 In[11]:=
 Out[11]=

### Scope (2)

Use a light curve time series from the Wolfram Demonstration Cepheid Variable Star Light Curve Analysis:

 In[12]:=

Subtract the mean of the data values and plot the resulting time series:

 In[13]:=
 Out[13]=

Plot the periodogram computed from this unevenly spaced set of measurements:

 In[14]:=
 Out[14]=

Compute the main frequency as the location of the larger spike:

 In[15]:=
 Out[15]=

Compute the period from this frequency:

 In[16]:=
 Out[16]=

Plot the time series "folded" by this period (times are computed modulo the computed period) to see the approximately sinusoidal variation in the light curve:

 In[17]:=
 Out[17]=

Plot the Lomb-Scargle periodogram for the same data:

 In[18]:=
 Out[18]=

The computed main frequency and period agree with the values from the Deeming periodogram:

 In[19]:=
 Out[19]=

Plot the string-length values for the same data:

 In[20]:=
 Out[20]=

Compute the best estimate for the period based on the minimum value attained by the Lafler-Kinman computation for this data:

 In[21]:=
 Out[21]=

A periodogram can often detect multiple frequencies provided they are well separated, even if they are not commensurate (that is, there is no actual “period” for the data). Extract values from such a function with frequencies at :

 In[22]:=
 Out[22]=

Plot the periodogram for this data:

 In[23]:=
 Out[23]=

There is one estimated frequency around midway between 1 and :

 In[24]:=
 Out[24]=

Another is approximately π:

 In[25]:=
 Out[25]=

A third is close to :

 In[26]:=
 Out[26]=

Rough estimates can be made with far fewer sampled values:

 In[27]:=
 Out[27]=

Peaks of the periodogram give estimates of the frequencies:

 In[28]:=
 Out[28]=

### Properties & Relations (2)

When the data is equally spaced on the horizontal axis, the irregular periodogram is essentially the square of the Fourier transform. Modify a previous example to demonstrate this:

 In[29]:=

Subtract the mean of the data values and plot the resulting time series:

 In[30]:=
 Out[30]=

Plot the periodogram computed from this (evenly-) spaced set of measurements:

 In[31]:=
 Out[31]=

The square of the Fourier transform gives a similar result once frequencies are correctly rescaled on the horizontal axis:

 In[32]:=
 Out[32]=

In the Wolfram Language, this can also be computed using PeriodogramArray:

 In[33]:=
 Out[33]=

Yet another way to obtain these absolute values is with the Fourier transform of the set of values convolved with itself:

 In[34]:=
 Out[34]=

The Lafler-Kinman string-lengths-based measure can be made to more closely resemble an ordinary periodogram using reciprocals. Use a basic example:

 In[35]:=
 Out[35]=

Now again show the Lafler-Kinman plot using this data:

 In[36]:=
 Out[36]=

It is possible to go from time to frequency using the formula f=2π/t, and likewise go from small to large on the vertical axis by taking reciprocals:

 In[37]:=

Generate and plot a list of {foldtime,magnitude} pairs:

 In[38]:=
 Out[38]=

Now do the transformation and replot:

 In[39]:=
 Out[39]=

The main spike coincides with that of the Deeming periodogram. There is a small bump in the Deeming periodogram near frequency 3. For the Lafler-Kinman version, this corresponds to a fold time near 2.1:

 In[40]:=
 Out[40]=

Compute a lot of values in the vicinity of this folding time:

 In[41]:=
 Out[41]=

### Applications (43)

#### Sunspot activity cycles (10)

Sunspot activity follows an approximately 11-year cycle. Download a century of monthly averaged values from the Wolfram Data Repository:

 In[42]:=

Show the plot for this time series:

 In[43]:=
 Out[43]=

Obtain the times and mean-centered values:

 In[44]:=

Plot the Deeming periodogram for this time series:

 In[45]:=
 Out[45]=

Compute the maximum frequency and the corresponding estimated period in months:

 In[46]:=
 Out[46]=

Plot the Lomb-Scargle periodogram for this time series:

 In[47]:=
 Out[47]=

Compute the maximum frequency and the corresponding estimated period in months:

 In[48]:=
 Out[48]=

Plot a list of string lengths versus folding period for this time series:

 In[49]:=
 Out[49]=

Compute the estimated period from the low point position:

 In[50]:=
 Out[50]=

All three estimates are in the ballpark of 10.5 years. Plot the time series epoch-folded by the second and third period estimates, as well as by the conventional 11 years:

 In[51]:=
 Out[51]=

#### Another cepheid variable star analysis (6)

Download and process times and values of the light curve for the variable star known as cep2308:

 In[52]:=

Show a plot of measured amplitudes over time:

 In[53]:=
 Out[53]=

Plot the Deeming periodogram and compute the corresponding estimated period:

 In[54]:=
 Out[54]=
 In[55]:=
 Out[55]=

Plot the Lomb-Scargle periodogram and compute the corresponding estimated period:

 In[56]:=
 Out[56]=
 In[57]:=
 Out[57]=

Plot the Lafler-Kinman folded epoch total string lengths and compute the corresponding estimated period:

 In[58]:=
 Out[58]=
 In[59]:=
 Out[59]=

The period estimates all agree to several places. Plot the time series folded by epoch using the second one:

 In[60]:=
 Out[60]=

#### Searching for periodicity in a chromosome (12)

Coding sections of genes are well known to exhibit a periodicity of three base pairs (bp). Often there are other periodicities. There is a method for finding them, using chromosome 12 from Saccharomyces cerevisiae (baker’s yeast) as an example:

 In[61]:=

Literature on the subject states that it has a periodicity in the 10–11 range when looking at tetramers of the form A*T*, where the star here denotes zero or more occurrences of the preceding nucleotide (some specific values reported are 10, 10.2, 10.4 and 10.5). Split the chromosome into segments of 1000 bp and locate positions of the 4-mers "AAAA", "AAAT", "AATT", "ATTT" and "TTTT":

 In[62]:=

While there are various ways to turn a gene sequence into a numerical sequence, here the irregular periodogram is instead employed on common gaps between the positions found:

 In[63]:=

Smaller counts are discarded if there are more than 500. They are further decimated by removing all gap counts that are smaller than at least one of their 19 subsequent neighbors:

 In[64]:=

Plot the number of gaps as a function of gap size:

 In[65]:=
 Out[65]=

Treat gap lengths as the "time" variable and gap counts as the dependent variable:

 In[66]:=

A periodicity of 3 is already known and larger ones are sought, so plot up to a frequency range that will only give periods of at least 4:

 In[67]:=
 Out[67]=

Isolate the large frequency near 0.6 to find a plausible periodicity value:

 In[68]:=
 Out[68]=

Homing in near 0.6 in the plot will show that there is a double peak, with the slightly larger one giving rise to a periodicity less than 10. This may be due to an interaction between the periodicity of 3 and the larger one. An attempt can be made to curtail this effect by repeating the periodogram computations on data that has had gaps that are multiples of 3 removed:

 In[69]:=
 Out[69]=

Home in near 0.6:

 In[70]:=
 Out[70]=

Recompute the period estimate:

 In[71]:=
 Out[71]=

This is in line with values reported in the literature. A string-length plot and optimization gives a value in the same vicinity:

 In[72]:=
 Out[72]=
 In[73]:=
 Out[73]=

Different settings will give slightly different estimates; for example, with sublengths set to 5000 and upper set to 2000, the estimate is around 10.28. So the method here is viable but only to low precision.

#### Breaking a Vigenère cipher (15)

The Vigenère cipher is attacked from the Rosetta Code text example:

 In[74]:=

Split the text into trigrams and find the positions of each, retaining all that appear at least twice:

 In[75]:=
 Out[75]=

Find distances separating each given triad, and also retain the number of times a given distance appears (these lists will be used to assess the length of the key phrase):

 In[76]:=
 Out[76]=

Use a periodogram to guess the length of the key phrase:

 In[77]:=
 Out[77]=

Find the period corresponding to the highest peak:

 In[78]:=
 Out[78]=

Find the period corresponding to the highest peak at lower frequency:

 In[79]:=
 Out[79]=

The key length guess of 14 is used because, in the worst-case scenario, it simply means a key phrase that is repeated is derived. First, list the most frequent letters in English in order of descending frequencies:

 In[80]:=

A "zero" value is also needed for character codes, so as to translate "A" to have the value 1:

 In[81]:=

A scoring heuristic is required to guess what the Caesar cipher shift is for each of the 14 letter subsets. It is based on numbers and positions of the most frequent letters in each subset that would, given a particular shift, map to letters that actually have the highest frequencies in English:

 In[82]:=

Code is also required to translate (decode) characters given a shift value:

 In[83]:=

Break the coded text into 14 distinct subsets; each will be treated as a Caesar cipher encoding. Find the most commonly appearing letters in each subset in order of descending frequency:

 In[84]:=
 Out[84]=

Find the top-scoring shift translation for each of the 14 cases:

 In[85]:=
 Out[85]=

Compute the corresponding shifts:

 In[86]:=
 Out[86]=

Use the shifts found to translate the key (this is not actually necessary for translating the encoded message):

 In[87]:=
 Out[87]=

Translate the encoded message:

 In[88]:=
 Out[88]=

## Version History

• 3.0.0 – 07 August 2020
• 2.0.0 – 11 November 2019
• 1.0.0 – 22 October 2019