Preprocessing data

The package offers three preprocessing procedures:

  • Downsampling data: grouping the data into blocks of fixed size (the resolution) and applying the mean function to each block.

  • Normalization of X, Y, Z coordinates: rescaling the coordinates to the range [0,1] by subtracting the minimum and dividing by the range (maximum minus minimum).

  • Standardization of V values: standardizing the V values by subtracting the mean and dividing by the standard deviation.

Downsampling

Downsampling is implemented by averaging the values within each block of the given resolution. The procedure can be easily customized.
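The block-averaging idea can be sketched with plain pandas. This is an illustration only, not the package's implementation: the function name `downsample_by_blocks`, the `agg` parameter, and the choice to form blocks along Z within each (ID, X, Y) group are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def downsample_by_blocks(df, resolution, agg="mean"):
    """Aggregate V over Z blocks of height `resolution` (hypothetical helper)."""
    out = df.copy()
    # Assumption: blocks are aligned to multiples of `resolution` along Z.
    # Each sample is assigned to the lower edge of the block it falls in.
    out["Z"] = np.floor(out["Z"] / resolution) * resolution
    # One row per block, holding the aggregated V of the samples inside it.
    return out.groupby(["ID", "X", "Y", "Z"], as_index=False)["V"].agg(agg)


samples = pd.DataFrame(
    {
        "ID": ["A"] * 4,
        "X": [0.0] * 4,
        "Y": [0.0] * 4,
        "Z": [4.0, 6.0, 8.0, 10.0],
        "V": [1.0, 3.0, 5.0, 7.0],
    }
)
blocks = downsample_by_blocks(samples, resolution=4)
print(blocks)
```

Passing a different aggregation name, such as `agg="median"`, is one way such a procedure could be customized in this sketch.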

Normalization of XYZ and standardization of V

Normalizing the XYZ coordinates ensures that all the coordinates are within a similar range, which can help prevent one coordinate from dominating the kriging interpolation process. It is especially useful when the XYZ coordinates have different scales or units.
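As a minimal sketch of min-max normalization, the snippet below rescales each coordinate column independently and keeps the parameters needed to undo the transform. The helper name `normalize_xyz` and the layout of the returned parameters are assumptions for illustration; they are not taken from the package.

```python
import pandas as pd


def normalize_xyz(df, columns=("X", "Y", "Z")):
    """Rescale each coordinate column to [0, 1] (hypothetical helper)."""
    out = df.copy()
    params = {}
    for col in columns:
        lo, hi = out[col].min(), out[col].max()
        params[col] = {"min": lo, "max": hi}
        # Subtract the minimum, then divide by the range (max - min).
        out[col] = (out[col] - lo) / (hi - lo)
    return out, params


coords = pd.DataFrame(
    {
        "X": [0.0, 50.0, 100.0],
        "Y": [10.0, 15.0, 20.0],
        "Z": [4.0, 12.0, 20.0],
    }
)
scaled, norm_params = normalize_xyz(coords)
print(scaled)
```

Note that a column with a single constant value has zero range, so this sketch would divide by zero for it; a real implementation needs to handle that case.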

Standardizing the V values, that is, centering them around 0 and scaling them to unit variance, can help ensure that the kriging algorithm is not influenced by the absolute magnitude of V. This can be beneficial if the V values have a large range or if you want to focus on the relative differences between V values rather than their absolute values.
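Standardization can be sketched in a few lines. The helper name `standardize_v` is hypothetical, and the use of the sample standard deviation (the pandas default, `ddof=1`) is an assumption; the package may compute the standard deviation differently.

```python
import pandas as pd


def standardize_v(values):
    """Center values on 0 and scale by the standard deviation (hypothetical helper)."""
    mean = values.mean()
    # Assumption: sample standard deviation, which is the pandas default (ddof=1).
    std = values.std()
    return (values - mean) / std, {"mean": mean, "std": std}


v = pd.Series([2.0, 4.0, 6.0])
v_standardized, std_params = standardize_v(v)
print(v_standardized.tolist())
```

Returning the mean and standard deviation alongside the transformed values makes it possible to map results back to the original units later.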

By normalizing the XYZ coordinates and standardizing the V values, you align the scales of the different variables and make them more comparable, which can improve the performance of the kriging algorithm.

[1]:
from py3dinterpolations.core.griddata import GridData
from py3dinterpolations.modelling.preprocessor import Preprocessor
from py3dinterpolations.plotting.plotting import plot_downsampling

import pandas as pd
[2]:
df = pd.read_csv(
    "../../../tests/fixtures/griddata_default_colnames.csv",
)
df.tail()
[2]:
       ID       X    Y     Z          V
278  ID00  15.194  0.0  12.0   9.047969
279  ID00  15.194  0.0  10.0  10.077271
280  ID00  15.194  0.0   8.0  20.082454
281  ID00  15.194  0.0   6.0  19.042223
282  ID00  15.194  0.0   4.0  12.889411
[3]:
gd = GridData(df)
gd.data.head()
[3]:
                                     V
ID    X       Y       Z
ID30  62.163  14.336  20.0   7.523950
                      18.0   7.504403
                      16.0  12.431670
                      14.0  12.653931
                      12.0  17.956143
[4]:
len(gd.data)
[4]:
283

Preprocessor class

The Preprocessor class is used to preprocess the data. It can be used to downsample the data, normalize the XYZ coordinates, and standardize the V values.

This class will return a new GridData object with the preprocessed data. The original GridData object will not be modified.

The new GridData object carries preprocessor_params, which holds the parameters used for preprocessing. These can be used to reverse the preprocessing.
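Reversing the preprocessing amounts to inverting each transform with the stored parameters. The sketch below shows the arithmetic only. The dictionary layout and the helper names `denormalize` and `destandardize` are hypothetical; the actual structure of preprocessor_params may differ and should be checked against the package.

```python
import pandas as pd

# Hypothetical parameter layout, used here only to illustrate the arithmetic.
example_params = {
    "normalization": {"Z": {"min": 4.0, "max": 20.0}},
    "standardization": {"mean": 12.0, "std": 5.0},
}


def denormalize(values, axis_params):
    """Invert min-max normalization: multiply by the range, then add the minimum."""
    return values * (axis_params["max"] - axis_params["min"]) + axis_params["min"]


def destandardize(values, v_params):
    """Invert standardization: multiply by the std, then add the mean."""
    return values * v_params["std"] + v_params["mean"]


z_scaled = pd.Series([0.0, 0.5, 1.0])
v_scaled = pd.Series([-1.0, 0.0, 1.0])

z_original = denormalize(z_scaled, example_params["normalization"]["Z"])
v_original = destandardize(v_scaled, example_params["standardization"])
print(z_original.tolist(), v_original.tolist())
```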

[5]:
preproc_gd = Preprocessor(
    gd,
    normalize_xyz=True,
    standardize_v=True,
    downsampling_res=5
).preprocess()
[6]:
preproc_gd.data.head()
[6]:
                                        V
ID    X         Y         Z
ID30  0.347923  0.194902  1.00  -0.530115
                          0.75  -0.313169
                          0.50  -0.188019
                          0.25  -0.403830
ID29  0.164617  0.711821  1.00  -0.535986
[7]:
len(preproc_gd.data)
[7]:
127

Plot comparison of data before and after downsampling

The plot_downsampling function plots the grid data before and after downsampling. The plot helps assess how downsampling affects the data, notably by smoothing out peaks.

[8]:
len(preproc_gd.data)
[8]:
127
[9]:
plot_downsampling(gd, preproc_gd)
[9]:
../_images/examples_preprocessing_11_0.png