Preprocessing data
The package offers preprocessing procedures.
Downsampling data: by grouping data back blocks of fixed size (resolution) and applying the mean function to each block.
Normalization X,Y,Z coordinates: normalizing the coordinates to the range [0,1] by subtracting the minimum and dividing by the maximum.
Standardization of V values: standardizing the V values by subtracting the mean and dividing by the standard deviation.
Downsampling
The downsampling procedure is implemented as taking the average of a block of given resolution. Can be easily customized.
Normalization of XYZ and standardization of V
Normalizing the XYZ
coordinates ensures that all the coordinates are within a similar range, which can help prevent one coordinate from dominating the kriging interpolation process. It is especially useful when the XYZ
coordinates have different scales or units.
Standardizing the V
values by centering them around 0 and giving them equal variances can help ensure that the kriging algorithm is not influenced by the absolute magnitude of V
. This can be beneficial if the V
values have a large range or if you want to focus more on the relative differences between V
values rather than their absolute values.
By normalizing the XYZ coordinates and standardizing the V
values, you align the scales of the different variables and make them more comparable, which can improve the performance of the kriging algorithm.
[1]:
from py3dinterpolations.core.griddata import GridData
from py3dinterpolations.modelling.preprocessor import Preprocessor
from py3dinterpolations.plotting.plotting import plot_downsampling
import pandas as pd
[2]:
df = pd.read_csv(
"../../../tests/fixtures/griddata_default_colnames.csv",
)
df.tail()
[2]:
ID | X | Y | Z | V | |
---|---|---|---|---|---|
278 | ID00 | 15.194 | 0.0 | 12.0 | 9.047969 |
279 | ID00 | 15.194 | 0.0 | 10.0 | 10.077271 |
280 | ID00 | 15.194 | 0.0 | 8.0 | 20.082454 |
281 | ID00 | 15.194 | 0.0 | 6.0 | 19.042223 |
282 | ID00 | 15.194 | 0.0 | 4.0 | 12.889411 |
[3]:
gd = GridData(df)
gd.data.head()
[3]:
V | ||||
---|---|---|---|---|
ID | X | Y | Z | |
ID30 | 62.163 | 14.336 | 20.0 | 7.523950 |
18.0 | 7.504403 | |||
16.0 | 12.431670 | |||
14.0 | 12.653931 | |||
12.0 | 17.956143 |
[4]:
len(gd.data)
[4]:
283
Preprocessor
class
The Preprocessor
class is used to preprocess the data. It can be used to downsample the data, normalize the XYZ
coordinates, and standardize the V
values.
This class will return a new GridData
object with the preprocessed data. The original GridData
object will not be modified.
The new GridData
object will have the argument preprocessor_params
that contains the parameters used for preprocessing. This can be used to reverse the preprocessing.
[5]:
preproc_gd = Preprocessor(
gd,
normalize_xyz=True,
standardize_v=True,
downsampling_res=5
).preprocess()
[6]:
preproc_gd.data.head()
[6]:
V | ||||
---|---|---|---|---|
ID | X | Y | Z | |
ID30 | 0.347923 | 0.194902 | 1.00 | -0.530115 |
0.75 | -0.313169 | |||
0.50 | -0.188019 | |||
0.25 | -0.403830 | |||
ID29 | 0.164617 | 0.711821 | 1.00 | -0.535986 |
[7]:
len(preproc_gd.data)
[7]:
127
Plot comparison of data before and after downsampling
This method allows to plot the griddata before and after the downsampling. the plot allows to assess how the downsampling affects the data, by smoothing out peaks.
[8]:
len(preproc_gd.data)
[8]:
127
[9]:
plot_downsampling(gd, preproc_gd)
[9]: