Preprocessing data

The package offers the following preprocessing procedures:

Downsampling data: grouping the data into blocks of fixed size (resolution) and applying the mean function to each block.
Normalization of X, Y, Z coordinates: scaling the coordinates to the range [0, 1] by subtracting the minimum and dividing by the range (maximum minus minimum).
Standardization of V values: subtracting the mean from the V values and dividing by the standard deviation.
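The normalization and standardization formulas above can be sketched with pandas on a flat DataFrame with columns X, Y, Z and V. This is a minimal illustration under that assumption, not the package's implementation; the helpers `normalize_xyz` and `standardize_v` are hypothetical names.

```python
import pandas as pd


def normalize_xyz(df: pd.DataFrame) -> pd.DataFrame:
    """Min-max scale X, Y, Z to [0, 1] (hypothetical helper, not the package API)."""
    out = df.copy()
    for col in ["X", "Y", "Z"]:
        cmin, cmax = out[col].min(), out[col].max()
        # subtract the minimum, then divide by the range (max - min);
        # a constant column would cause a division by zero here
        out[col] = (out[col] - cmin) / (cmax - cmin)
    return out


def standardize_v(df: pd.DataFrame) -> pd.DataFrame:
    """Center V on 0 and scale it to unit standard deviation (hypothetical helper)."""
    out = df.copy()
    # pandas' std() is the sample standard deviation (ddof=1)
    out["V"] = (out["V"] - out["V"].mean()) / out["V"].std()
    return out


sample = pd.DataFrame(
    {
        "X": [0.0, 50.0, 100.0],
        "Y": [10.0, 12.0, 14.0],
        "Z": [0.0, 10.0, 20.0],
        "V": [5.0, 10.0, 15.0],
    }
)
scaled = standardize_v(normalize_xyz(sample))
print(scaled)
```

The package may use a different standard deviation convention (population vs. sample); the choice only changes the scale of V slightly for small datasets.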

Downsampling

The downsampling procedure averages the values within each block of the given resolution. It can easily be customized.
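Block averaging can be sketched as follows. This assumes each point is assigned to the block whose lower corner is the coordinate floored to a multiple of the resolution; the package may define blocks differently (for example by their centers), and `downsample` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def downsample(df: pd.DataFrame, res: float) -> pd.DataFrame:
    """Average V over cubic blocks of side `res` (hypothetical helper).

    Each coordinate is snapped to the lower corner of its block,
    i.e. floor(coord / res) * res; this snapping rule is an assumption.
    """
    snapped = df.copy()
    for col in ["X", "Y", "Z"]:
        snapped[col] = np.floor(snapped[col] / res) * res
    # points sharing the same block corner are grouped and V is averaged
    return snapped.groupby(["X", "Y", "Z"], as_index=False)["V"].mean()


points = pd.DataFrame(
    {
        "X": [0.0, 0.0, 0.0, 0.0],
        "Y": [0.0, 0.0, 0.0, 0.0],
        "Z": [1.0, 3.0, 6.0, 8.0],
        "V": [10.0, 20.0, 30.0, 50.0],
    }
)
result = downsample(points, res=5)
print(result)
```

Swapping `.mean()` for another aggregation such as `.median()` is one way such a procedure could be customized.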

Normalization of XYZ and standardization of V

Normalizing the XYZ coordinates ensures that all the coordinates are within a similar range, which can help prevent one coordinate from dominating the kriging interpolation process. It is especially useful when the XYZ coordinates have different scales or units.
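The effect on distances can be illustrated with two points given as (X, Z). The extents below (X over 0 to 1000, Z over 0 to 20) are hypothetical; the sketch computes how much each axis contributes to the squared Euclidean distance before and after min-max normalization.

```python
import numpy as np

# Hypothetical extents: X spans 0-1000 horizontally, Z spans 0-20 in depth.
mins = np.array([0.0, 0.0])
maxs = np.array([1000.0, 20.0])

# Two sample points given as (X, Z).
a = np.array([0.0, 2.0])
b = np.array([100.0, 20.0])


def axis_shares(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Fraction of the squared Euclidean distance contributed by each axis."""
    sq = (p - q) ** 2
    return sq / sq.sum()


raw_shares = axis_shares(a, b)
norm_shares = axis_shares((a - mins) / (maxs - mins), (b - mins) / (maxs - mins))
print(raw_shares)   # X dominates in raw units
print(norm_shares)  # Z dominates once each axis is scaled to its extent
```

In raw units the X separation accounts for most of the distance, while after normalization the Z separation (90% of its extent) outweighs the X separation (10% of its extent). Normalization also changes the ratio between horizontal and vertical distances, which matters when the data are anisotropic.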

Standardizing the V values by centering them around 0 and scaling them to unit variance can help ensure that the kriging algorithm is not influenced by the absolute magnitude of V. This can be beneficial if the V values have a large range or if you want to focus on the relative differences between V values rather than their absolute values.

By normalizing the XYZ coordinates and standardizing the V values, you align the scales of the different variables and make them more comparable, which can improve the performance of the kriging algorithm.

[1]:

from py3dinterpolations.core.griddata import GridData
from py3dinterpolations.modelling.preprocessor import Preprocessor
from py3dinterpolations.plotting.plotting import plot_downsampling

import pandas as pd

[2]:

df = pd.read_csv(
    "../../../tests/fixtures/griddata_default_colnames.csv",
)
df.tail()

[2]:

	ID	X	Z	V
278	ID00	15.194	12.0	9.047969
279	ID00	15.194	10.0	10.077271
280	ID00	15.194	8.0	20.082454
281	ID00	15.194	6.0	19.042223
282	ID00	15.194	4.0	12.889411

[3]:

gd = GridData(df)
gd.data.head()

[3]:

ID	X	Y	Z	V
ID30	62.163	14.336	20.0	7.523950
ID30	62.163	14.336	18.0	7.504403
ID30	62.163	14.336	16.0	12.431670
ID30	62.163	14.336	14.0	12.653931
ID30	62.163	14.336	12.0	17.956143

[4]:

len(gd.data)

`Preprocessor` class

The Preprocessor class is used to preprocess the data. It can be used to downsample the data, normalize the XYZ coordinates, and standardize the V values.

This class will return a new GridData object with the preprocessed data. The original GridData object will not be modified.

The new GridData object has a `preprocessor_params` attribute that contains the parameters used for preprocessing. These parameters can be used to reverse the preprocessing.
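Reversing the preprocessing amounts to inverting the two formulas. The sketch below assumes a hypothetical parameter layout (per-axis minimum and maximum, plus the mean and standard deviation of V); the actual contents of `preprocessor_params` may differ and should be inspected before relying on this. `reverse_preprocessing` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical layout of the stored parameters; not the package's actual format.
params = {
    "normalization": {"X": (0.0, 100.0), "Y": (10.0, 14.0), "Z": (0.0, 20.0)},  # (min, max)
    "standardization": (10.0, 5.0),  # (mean, std) of V
}


def reverse_preprocessing(df: pd.DataFrame, params: dict) -> pd.DataFrame:
    """Undo min-max normalization of X, Y, Z and standardization of V."""
    out = df.copy()
    for col, (cmin, cmax) in params["normalization"].items():
        # invert x_norm = (x - min) / (max - min)
        out[col] = out[col] * (cmax - cmin) + cmin
    mean, std = params["standardization"]
    # invert v_std = (v - mean) / std
    out["V"] = out["V"] * std + mean
    return out


scaled = pd.DataFrame(
    {
        "X": [0.0, 0.5, 1.0],
        "Y": [0.0, 0.5, 1.0],
        "Z": [0.0, 0.5, 1.0],
        "V": [-1.0, 0.0, 1.0],
    }
)
restored = reverse_preprocessing(scaled, params)
print(restored)
```

Downsampling cannot be undone this way: averaging discards the individual values inside each block, so only the coordinate and value scaling is reversible.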

[5]:

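preproc_gd = Preprocessor(
    gd,
    normalize_xyz=True,    # scale X, Y, Z to [0, 1]
    standardize_v=True,    # center V on 0 with unit variance
    downsampling_res=5,    # block size used for downsampling
).preprocess()             # returns a new GridData; gd is not modified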

[6]:

preproc_gd.data.head()

[6]:

ID	X	Y	Z	V
ID30	0.347923	0.194902	1.00	-0.530115
ID30	0.347923	0.194902	0.75	-0.313169
ID30	0.347923	0.194902	0.50	-0.188019
ID30	0.347923	0.194902	0.25	-0.403830
ID29	0.164617	0.711821	1.00	-0.535986

[7]:

len(preproc_gd.data)

Plot comparison of data before and after downsampling

The `plot_downsampling` function plots the grid data before and after downsampling. The plot shows how downsampling affects the data, for example by smoothing out peaks.
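Why block averaging smooths out peaks can be seen with a one-dimensional stand-in: a hypothetical series of V values along a single borehole with one sharp peak, averaged in consecutive blocks of three samples.

```python
import numpy as np

# Hypothetical V values along one borehole, with a single sharp peak.
v = np.array([8.0, 9.0, 40.0, 10.0, 9.0, 8.0])

# Average consecutive blocks of 3 samples (a 1-D stand-in for block averaging).
block_means = v.reshape(-1, 3).mean(axis=1)
print(v.max(), block_means.max())  # 40.0 19.0
```

The peak of 40 is averaged with its two neighbours, so the largest block mean is 19; a block mean can never exceed the maximum of the values it averages.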

[9]:

plot_downsampling(gd, preproc_gd)

[9]:

[Figure: grid data before and after downsampling, as plotted by plot_downsampling]