Wolfram Language Paclet Repository

Community-contributed installable additions to the Wolfram Language

Primary Navigation

    • Cloud & Deployment
    • Core Language & Structure
    • Data Manipulation & Analysis
    • Engineering Data & Computation
    • External Interfaces & Connections
    • Financial Data & Computation
    • Geographic Data & Computation
    • Geometry
    • Graphs & Networks
    • Higher Mathematical Computation
    • Images
    • Knowledge Representation & Natural Language
    • Machine Learning
    • Notebook Documents & Presentation
    • Scientific and Medical Data & Computation
    • Social, Cultural & Linguistic Data
    • Strings & Text
    • Symbolic & Numeric Computation
    • System Operation & Setup
    • Time-Related Computation
    • User Interface Construction
    • Visualization & Graphics
    • Random Paclet
    • Alphabetical List
  • Using Paclets
    • Get Started
    • Download Definition Notebook
  • Learn More about Wolfram Language

DataReshapers

Guides

  • Data reshaping functions

Tech Notes

  • Data transformation workflows
  • Long form data transformation
  • Wide form data transformation

Symbols

  • CrossTabulate
  • CrossTabulationMatrixQ
  • CrossTensorate
  • CrossTensorateSplit
  • DatasetToMatrix
  • GridTableForm
  • LongFormDataset
  • PivotLonger
  • RecordsSummary
  • RecordsToLongFormDataset
  • RecordsToWideFormDataset
  • SeparateColumn
  • ToAutomaticKeysAssociation
  • TypeOfDataToBeReshaped
  • WideFormDataset
Long form data transformation
Introduction
Long form applications
What kind of data?
Long form vs sparse array
What workflows considered?
References
Long form in more detail
​
Introduction
In this notebook we consider in more detail the (tabular) data transformation operations Long form.
Remark: This notebook is an expanded version of the notebook used in the Wolfram U session [AAv2].
Remark: We use several Wolfram Function Repository (WFR) functions:
ExampleDataset
and
RandomTabularDataset
.
Load the paclet
In[34]:=
Needs["AntonAntonov`DataReshapers`"]
What kind of data?
We concentrate on using data frames (datasets).
More precisely tabular data and simple collections of tabular data.
I.e. in WL -- datasets or lists or associations of datasets.
What workflows considered?
Here is a flow chart that shows the targeted workflows:
In[35]:=
plWorkflows=ImageCrop@Import["https://github.com/antononcube/ConversationalAgents/raw/master/ConceptualDiagrams/Tabular-data-transformation-workflows.jpg"]
Out[35]=
Long form in more detail
Remark: This “Illustrating example 2” from the tech note
Data transformation workflows
. (Also, session [AAv1].)
In[1]:=

Column names as data

Take a dataset with time series specified through separate columns and convert into easier to query dataset:
Get the Lake Mead levels data
In[36]:=
ExampleData[{"Statistics","LakeMeadLevels"},"LongDescription"]
Out[36]=
Elevation of Lake Mead at Hoover Dam, in feet, in years 1935-2009. The observation on January of 1935 was missing.
Get the Lake Mead levels dataset
In[37]:=
dsLakeData=ResourceFunction["ExampleDataset"][{"Statistics","LakeMeadLevels"}]
Out[37]=
Year
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
1935
0.0
708.7
701.7
752.4
806.6
909.1
928.4
925.9
920.8
1936
907.9
908.4
906.9
922.2
982.4
1015.5
1020.4
1024.4
1024.6
1937
1022.2
1026.2
1031.0
1044.6
1078.7
1096.6
1102.8
1099.6
1097.6
1938
1095.0
1094.85
1100.2
1109.2
1134.4
1165.6
1173.5
1171.95
1173.8
1939
1165.4
1157.4
1158.2
1162.9
1176.2
1183.45
1181.3
1177.35
1178.95
1940
1166.9
1166.0
1164.65
1164.85
1175.35
1182.15
1179.8
1176.1
1174.45
1941
1167.1
1167.55
1170.35
1175.85
1201.25
1215.55
1220.4
1216.5
1209.85
1942
1184.8
1176.75
1171.25
1182.9
1195.5
1213.15
1213.2
1210.4
1205.0
1943
1182.45
1179.6
1177.0
1180.6
1188.75
1199.4
1201.55
1199.65
1195.7
1944
1170.9
1164.2
1158.4
1158.1
1173.2
1194.85
1199.55
1193.9
1187.1
1945
1161.25
1155.75
1149.6
1147.4
1161.3
1174.55
1180.7
1177.95
1177.95
1946
1157.55
1151.65
1147.45
1148.2
1155.15
1163.95
1163.3
1161.25
1157.65
1947
1142.6
1133.1
1135.38
1134.49
1151.93
1169.64
1177.18
1180.13
1178.02
1948
1164.52
1158.8
1154.51
1158.77
1175.95
1192.15
1190.32
1186.95
1180.82
1949
1158.04
1150.99
1147.17
1148.28
1164.51
1189.07
1196.61
1193.17
1186.86
1950
1163.4
1156.7
1151.75
1152.13
1158.57
1173.95
1177.15
1173.36
1168.77
1951
1151.47
1168.02
1144.07
1141.34
1146.16
1163.53
1168.19
1167.0
1163.75
1952
1149.06
1143.0
1133.99
1141.96
1171.53
1198.06
1199.88
1195.74
1190.0
1953
1162.56
1157.12
1152.42
1148.4
1147.33
1165.11
1165.87
1163.38
1157.67
1954
1141.44
1138.22
1134.08
1130.26
1130.76
1129.88
1126.25
1120.82
1115.39
rows 1–20 of
75
columns 1–10 of
13
Convert to long form
In[38]:=
dsLong=
LongFormDataset
[dsLakeData,"Year","VariablesTo""Month","ValuesTo""Elevation"]
Out[38]=
Year
Month
Elevation
1935
Apr
752.4
1935
Aug
925.9
1935
Dec
908.4
1935
Feb
708.7
1935
Jan
0.0
1935
Jul
928.4
1935
Jun
909.1
1935
Mar
701.7
1935
May
806.6
1935
Nov
908.3
1935
Oct
914.9
1935
Sep
920.8
1936
Apr
922.2
1936
Aug
1024.4
1936
Dec
1023.5
1936
Feb
908.4
1936
Jan
907.9
1936
Jul
1020.4
1936
Jun
1015.5
1936
Mar
906.9
rows 1–20 of
900
Modify month from string value to integer value
In[39]:=
dsLong=dsLong[All,Join[#,"Month"DateList[{#Month,{"MonthShort"}}]〚2〛]&]
Out[39]=
Year
Month
Elevation
1935
4
752.4
1935
8
925.9
1935
12
908.4
1935
2
708.7
1935
1
0.0
1935
7
928.4
1935
6
909.1
1935
3
701.7
1935
5
806.6
1935
11
908.3
1935
10
914.9
1935
9
920.8
1936
4
922.2
1936
8
1024.4
1936
12
1023.5
1936
2
908.4
1936
1
907.9
1936
7
1020.4
1936
6
1015.5
1936
3
906.9
rows 1–20 of
900
Group by year and show short form of the result
For each year find the corresponding time series (of lake levels)
Show a sample of the time series found above

Calendar time series

The previous time series are relative in time. What if we want to do calendar time series?
First we have to modify the month column to be "ObservationTime":
Split the long form dataset by year and make time series with the columns "ObservationTime" and "Elevation":
Plot the obtained time series:

Alternative solution without long form

(This sub-section was not discussed in Wolfram U recording of the “Q&A Introduction” session. It is given in this notebook in order to make a more convenient reference.)
We can apply the Split-Transform-Combine pattern directly without using long form:
Note that:
◼
  • In the transformation above we implicitly rely on the sorted order of the month name columns.
  • ◼
  • Using Rest and First in the third argument given to GroupBy corresponds to the transformations to get the long form.
  • Long form applications
    Many applications of Long form transformation come from the fact that metadata is converted to data
    In[6]:=

    Combinations of the heterogenous data

    Using long forms makes easier the programmatic manipulation of heterogenous data.
    Get datasets with different number of rows and columns that correspond to items and variables of different kinds:
    Convert all datasets into long form datasets:
    For each long form dataset change the automatic key column to include dataset's name:
    Join all long form datasets into one dataset and show a sample:
    Long form vs sparse array
    Sparse matrices (sparse arrays) have representation very similar to that of long form. Here is a random sparse matrix:
    Generate random (sparse) matrix
    Matrix form of the random matrix
    Matrix plot of the random matrix
    Compare the array rules of that sparse matrix with its long form representation.
    Here are the array rules:
    Here is tabulation of the array rules:
    Here is the long form representation:
    Sort the long form dataset:
    Cross tabulate the long form
    Matrix form of the corresponding sparse matrix:
    Here is comparison of the representations of sparse arrays vs long form:
    References

    © 2025 Wolfram. All rights reserved.

    • Legal & Privacy Policy
    • Contact Us
    • WolframAlpha.com
    • WolframCloud.com