pyprecag.processing

pyprecag.processing.block_grid(in_shapefilename, pixel_size, out_rasterfilename, out_vesperfilename, nodata_val=-9999, snap=True, overwrite=False)[source]

Convert a polygon boundary to a 0,1 raster and generate a VESPER compatible list of coordinates for kriging.

Args:

  • in_shapefilename (str) – Input polygon shapefile
  • pixel_size (float) – The required output pixel size
  • out_rasterfilename (str) – Filename of the raster Tiff that will be created
  • out_vesperfilename (str) – The output vesper file
  • nodata_val (int) – an integer to use as nodata
  • snap (bool) – Snap Extent to a factor of the Pixel size
  • overwrite (bool) – if true overwrite existing file

Requirements: Input shapefile should be in a projected coordinate system.

Notes:

Define the pixel_size and nodata value of the new raster; see http://stackoverflow.com/questions/2220749/rasterizing-a-gdal-layer

This works, but the extent of the output differs from that of an arcpy generated raster. See post: https://gis.stackexchange.com/q/139336
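
The effect of the snap option can be sketched as follows. This is a minimal illustration that assumes snapping expands each edge of the extent outwards to the nearest multiple of the pixel size; the helper names snap_extent and grid_shape are hypothetical and are not part of pyprecag.

```python
import math


def snap_extent(xmin, ymin, xmax, ymax, pixel_size):
    """Expand an extent outwards so every edge is a multiple of pixel_size."""
    return (
        math.floor(xmin / pixel_size) * pixel_size,
        math.floor(ymin / pixel_size) * pixel_size,
        math.ceil(xmax / pixel_size) * pixel_size,
        math.ceil(ymax / pixel_size) * pixel_size,
    )


def grid_shape(extent, pixel_size):
    """Number of rows and columns needed to cover the snapped extent."""
    xmin, ymin, xmax, ymax = extent
    rows = round((ymax - ymin) / pixel_size)
    cols = round((xmax - xmin) / pixel_size)
    return rows, cols


# Hypothetical polygon bounds in a projected coordinate system (metres).
extent = snap_extent(10.3, 20.7, 35.2, 41.1, pixel_size=2.5)
rows, cols = grid_shape(extent, pixel_size=2.5)
```

Snapping in this way means rasters created for neighbouring blocks with the same pixel size share cell boundaries.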

pyprecag.processing.calc_indices_for_block(image_file, pixel_size, band_map, out_folder, indices=[], image_epsg=0, image_nodata=None, polygon_shapefile=None, groupby=None, out_epsg=0)[source]
Calculate indices for a multi band image then resample to a specified pixel size and block grid extent for each shapefile polygon.

Use this tool to create single band images, one for each index and shapefile polygon combination. A group-by column may be used to dissolve multiple polygons belonging to an individual block.

If a polygon shapefile is not specified, polygons will be created from the image’s mask.

The polygon shapefile will be re-projected to match the image file.

A block grid will be created for each feature for the nominated pixel size and used as the base for analysis.

“image_epsg” and “image_nodata” can be used to set the coordinate system and image nodata
values when they are not present within the image file.

The output filename will consist of the selected feature and calculated index.

The processing steps to achieve this are:
  • Reproject image to new coordinate system if required.
  • Calculate indices
  • Dissolve polygons optionally using the group-by column
Loop through each polygon feature and:
  • Create Block Grid
  • Clip image to polygon
  • Resample and fit to the block grid at the nominated pixel size using the Average resampling technique
  • Identify holes and fill if necessary
  • Smooth image using a 5x5 pixel moving average (focal_statistics)
Parameters:
  • image_file (str) – the input image file
  • pixel_size (int) – the pixel size used for resampling
  • band_map (pyprecag.bandops.BandMapping) – A dictionary matching band numbers to band type (e.g. Red, Green, Blue etc.)
  • out_folder (str) – The output folder for the created images.
  • indices (List[str]) – The list of indices to calculate.
  • image_epsg (int) – epsg number of the image to be used when missing in image
  • image_nodata (int) – nodata value of the image to be used when missing in image
  • polygon_shapefile (str) – a polygon shapefile used to cut up an image.
  • groupby (str) – the column/field to use to group multiple features.
  • out_epsg (int) – The epsg number representing the coordinate system of the output images.
Returns:

the list of created images.

Return type:

List[str]
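
The "Calculate indices" step can be illustrated with a single index. The sketch below assumes NDVI as the example, since this page does not list which indices are supported, and it works on plain nested lists rather than image bands; the function name ndvi and the sample values are hypothetical.

```python
def ndvi(nir, red, nodata=-9999.0):
    """Per-pixel (NIR - Red) / (NIR + Red), passing nodata through."""
    out = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            if n == nodata or r == nodata or (n + r) == 0:
                # Keep nodata where either band is missing or the ratio is undefined.
                row.append(nodata)
            else:
                row.append((n - r) / (n + r))
        out.append(row)
    return out


# Hypothetical reflectance values for the bands mapped as NIR and Red.
nir_band = [[0.6, 0.8], [-9999.0, 0.5]]
red_band = [[0.2, 0.2], [0.1, 0.5]]
result = ndvi(nir_band, red_band)
```

In the tool itself the band_map argument supplies the link between band numbers and band types, so an index such as this one can find the bands it needs.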

pyprecag.processing.clean_trim_points(points_geodataframe, points_crs, process_column, output_csvfile, boundary_polyfile=None, out_keep_shapefile=None, out_removed_shapefile=None, remove_zeros=True, stdevs=3, iterative=True, thin_dist_m=1.0)[source]

Clean and/or Trim a points dataframe.

Preparation includes:
  • Clip data to polygon.
  • Move data to a Projected Coordinate system.
  • Remove values where process_column is less than or equal to zero.
  • Calculate the normalised value for process_column (number of StDev).
  • Iteratively trim outliers based on the normalised process_column.
  • Remove points closer than a set distance (thin_dist_m).
Parameters:
  • points_geodataframe (geopandas.geodataframe.GeoDataFrame) – The input points geodataframe
  • points_crs (pyprecag_crs.crs) – The Spatial Reference System of the points_geodataframe
  • process_column (str) – The column to normalise, trim and clean.
  • output_csvfile (str) – The Trimmed & Cleaned output CSV file
  • out_keep_shapefile (str) – Optionally save the Trimmed & Cleaned to a shapefile
  • out_removed_shapefile (str) – Optionally save a shapefile containing features removed while cleaning/filtering. A column called filter will be added showing the reason a point was removed.
  • boundary_polyfile (str) – Optionally a polygon used to Clip the points.
  • remove_zeros (bool) – Optionally remove values where process_column is less than or equal to zero prior to removing outliers.
  • stdevs (int) – The number of standard deviations used to trim outliers
  • iterative (bool) – Optionally Iteratively Trim outliers based on Normalised Data column
  • thin_dist_m (float) – A distance in metres representing the minimum allowed distance between points. Points less than this distance will be removed.
Returns:

The geodataframe representing the cleaned/trimmed data file. New columns included in the output are:
  • Eastings, Northings – The point coordinates in a projected coordinate system
  • EN_Epsg – The epsg number of the projected coordinate system
  • Nrm_ – The normalised data column calculation

pyprecag_crs.crs: The pyprecag CRS object of the points dataframe.

Return type:

geopandas.geodataframe.GeoDataFrame
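
The zero removal and iterative outlier trimming steps can be sketched on a plain list of values. This is an illustration only: it assumes the normalised value is (value - mean) / standard deviation using the sample standard deviation, and the function name clean_and_trim and the reason labels are hypothetical.

```python
import statistics


def clean_and_trim(values, stdevs=3, remove_zeros=True, iterative=True):
    """Return (kept, removed) where removed holds (value, reason) pairs."""
    kept = list(values)
    removed = []  # the reason labels here are hypothetical

    if remove_zeros:
        removed += [(v, "zero or negative") for v in kept if v <= 0]
        kept = [v for v in kept if v > 0]

    while len(kept) > 1:
        mean = statistics.mean(kept)
        sd = statistics.stdev(kept)
        if sd == 0:
            break
        # Normalised value = number of standard deviations from the mean.
        outliers = [v for v in kept if abs((v - mean) / sd) > stdevs]
        if not outliers:
            break
        removed += [(v, "outlier") for v in outliers]
        kept = [v for v in kept if abs((v - mean) / sd) <= stdevs]
        if not iterative:
            break  # a single trimming pass only
    return kept, removed


# Hypothetical yield values containing one spike, one zero and one negative.
yields = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100, 0, -5]
kept, removed = clean_and_trim(yields, stdevs=2)
```

Trimming iteratively matters because a large outlier inflates the standard deviation; once it is removed, the statistics are recalculated and the remaining values are tested against a tighter spread.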

pyprecag.processing.create_points_along_line(lines_geodataframe, lines_crs, distance_between_points, offset_distance, out_epsg=0, out_points_shapefile=None, out_lines_shapefile=None)[source]

Add points along a line using a specified distance and create left/right parallel points offset by a distance.

If the lines are in a geographic coordinate system they will be re-projected to a projected coordinate system.

All touching lines will be treated as one. MultiPart geometry will be converted to single part geometry. The first and last points will be offset from start/end of the line evenly. Attributes from the input lines will be lost.

lines_crs is used to ensure that the correct wkt definition is maintained when using geopandas.

Parameters:
  • lines_geodataframe (geopandas.geodataframe.GeoDataFrame) – A Geopandas dataframe containing Lines
  • lines_crs (pyprecag.crs.crs) – The detailed coordinate system
  • distance_between_points (int) – The separation distance between points.
  • offset_distance (int) – The distance between the Strip point and parallel point.
  • out_epsg (int) – Optionally specify the epsg number for the output coordinate system. This should be a projected coordinate system.
  • out_points_shapefile (str) – Optionally specify shapefile path and filename used to save the points to. If a path is not supplied, it will save the file to TEMPDIR by default.
  • out_lines_shapefile (str) – Optionally specify shapefile path and filename used to save the lines to. If a path is not supplied, it will save the file to TEMPDIR by default.
Returns:

The geodataframe containing the created points.

pyprecag.crs.crs: The coordinate system of both the points and lines geodataframe.

geopandas.geodataframe.GeoDataFrame: The geodataframe containing the created lines.

Return type:

geopandas.geodataframe.GeoDataFrame
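
The placement of points can be sketched in plain Python. The example below assumes a polyline without duplicate vertices, splits the leftover length evenly between the two ends, and offsets the parallel points perpendicular to the local direction of travel; points_along_line is a hypothetical helper, not the library's implementation.

```python
import math


def points_along_line(coords, spacing, offset):
    """Return (centre, left, right) point triples spaced along a polyline."""
    seg_lengths = [
        math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(coords, coords[1:])
    ]
    total = sum(seg_lengths)
    n_gaps = int(total // spacing)
    # Split the leftover length evenly between the start and end of the line.
    start = (total - n_gaps * spacing) / 2.0

    results = []
    for i in range(n_gaps + 1):
        target = start + i * spacing
        travelled = 0.0
        for idx, length in enumerate(seg_lengths):
            if target <= travelled + length or idx == len(seg_lengths) - 1:
                (x0, y0), (x1, y1) = coords[idx], coords[idx + 1]
                t = (target - travelled) / length
                ux, uy = (x1 - x0) / length, (y1 - y0) / length  # unit direction
                cx, cy = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
                left = (cx - uy * offset, cy + ux * offset)
                right = (cx + uy * offset, cy - ux * offset)
                results.append(((cx, cy), left, right))
                break
            travelled += length
    return results


# A 10 m line with 4 m spacing leaves 2 m over, so each end is inset by 1 m.
straight = points_along_line([(0, 0), (10, 0)], spacing=4, offset=2)
corner = points_along_line([(0, 0), (4, 0), (4, 4)], spacing=4, offset=1)
```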

pyprecag.processing.create_polygon_from_point_trail(points_geodataframe, points_crs, out_filename, thin_dist_m=1.0, aggregate_dist_m=25, buffer_dist_m=10, shrink_dist_m=3)[source]

Create a polygon from a Point Trail created from a file containing GPS coordinates.

The point order should be sorted by increasing time sequence to ensure the ‘dot-to-dot’ occurs correctly.

The workflow is as follows:
Points -> Lines -> Buffer Out (expand) -> Buffer In (shrink)

For efficiency, points will be thinned. This will remove points that are closer together than thin_dist_m. The resulting points will be connected to form lines.

Line ends are detected where the distance between points is greater than the aggregate distance. This will occur at a turning point or an interruption (e.g. a creek) where GPS point collection is stopped. Typically this distance is slightly greater than the row/swath width.

Buffering is used to convert to polygon and remove the gap between rows/swaths. The buffer distance is usually half the row/swath width.

Buffering with a negative value is used to remove excess area on the outside of the polygon. This shrink distance is usually around 7 m less than the buffer distance.

Parameters:
  • points_geodataframe (geopandas.geodataframe.GeoDataFrame) – Input points vector geodataframe
  • points_crs (pyprecag_crs.crs) – The Projected Spatial Reference System of the points_geodataframe
  • out_filename (str) – Output polygon file. This should be the same format as the input.
  • thin_dist_m (float) – The minimum distance in metres between points to be used to thin the points dataset.
  • aggregate_dist_m (int) – A number representing the maximum distance between points. This is used to detect a line end. Typically this is slightly larger than the row/swath width.
  • buffer_dist_m (int) – The Buffer distance in metres. Typically half the swath or row width.
  • shrink_dist_m (int) – The shrink distance in metres. Typically about 7 m less than the buffer distance.
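
The first two stages of the workflow (thinning, then detecting line ends) can be sketched without any GIS library. The sketch assumes a simple sequential thinning rule, where a point is kept only if it is at least thin_dist from the last kept point; the function names are hypothetical and the buffering stages are not shown.

```python
import math


def _distance(a, b):
    return math.hypot(b[0] - a[0], b[1] - a[1])


def thin_points(points, thin_dist):
    """Keep a point only if it is at least thin_dist from the last kept point."""
    kept = [points[0]]
    for point in points[1:]:
        if _distance(kept[-1], point) >= thin_dist:
            kept.append(point)
    return kept


def split_into_lines(points, aggregate_dist):
    """Start a new line wherever the gap between points exceeds aggregate_dist."""
    lines = [[points[0]]]
    for previous, current in zip(points, points[1:]):
        if _distance(previous, current) > aggregate_dist:
            lines.append([current])
        else:
            lines[-1].append(current)
    return lines


# Hypothetical GPS trail in metres, sorted by time, with a 36 m gap in the middle.
trail = [(0, 0), (0.5, 0), (2, 0), (4, 0), (40, 0), (42, 0)]
thinned = thin_points(trail, thin_dist=1.0)
lines = split_into_lines(thinned, aggregate_dist=25)
```

Each resulting list of points would then be joined into a line before the expand and shrink buffers are applied.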
pyprecag.processing.extract_pixel_statistics_for_points(points_geodataframe, points_crs, rasterfiles, output_csvfile, function_list=[<function nanmean>], size_list=[3])[source]

Extract statistics from a list of rasters at set locations.

All raster files in the list should be of the same pixel size. While multi-band raster files are supported as an input, statistics will only be calculated and extracted for the first band.

Statistics are calculated on pixel values with a square neighbourhood and saved to a CSV file.

Pixels assigned with Nodata are converted to np.nan and excluded from the calculations.

All original columns are included in the output csv file in addition to columns containing the values for each raster -> size -> statistic combination. The column names can either be specified or derived via the out_colname argument in raster_ops.focal_statistics.

A size of 1 can be used to extract exact pixel values, in which case no statistics are calculated.

Parameters:
  • points_geodataframe (geopandas.geodataframe.GeoDataFrame) – The input points geodataframe of locations to extract statistics.
  • points_crs (pyprecag_crs.crs) – The Spatial Reference System of the points_geodataframe
  • rasterfiles (List[str]) – the list of paths & file names for the input rasters
  • output_csvfile (str) – the path and filename of the output CSV.
  • function_list (List[function]) – A list of statistical functions to apply to the raster. These can include numpy functions like np.nanmean or custom ones like pixel_count
  • size_list (List[int]) – The list of neighbourhood sizes used to apply statistical filtering.
Returns:

The dataframe of the points and calculated statistics.

pyprecag_crs.crs: The pyprecag CRS object of the points dataframe.

Return type:

geopandas.geodataframe.GeoDataFrame
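
The neighbourhood calculation can be sketched for a single location. The example assumes the point has already been converted to a row and column, uses a plain nested list in place of a raster band, and skips nodata pixels in the same spirit as np.nanmean; window_mean is a hypothetical helper.

```python
def window_mean(grid, row, col, size=3, nodata=-9999):
    """Mean of the valid pixels in a size x size window centred on (row, col)."""
    half = size // 2
    values = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if inside and grid[r][c] != nodata:
                values.append(grid[r][c])
    # None signals that every pixel in the window was nodata.
    return sum(values) / len(values) if values else None


# Hypothetical 3 x 3 raster with a nodata pixel in the centre.
raster = [
    [1, 2, 3],
    [4, -9999, 6],
    [7, 8, 9],
]
mean_3x3 = window_mean(raster, row=1, col=1, size=3)
exact_value = window_mean(raster, row=0, col=2, size=1)
all_nodata = window_mean(raster, row=1, col=1, size=1)
```

With a size of 1 the window contains only the pixel itself, which is why that setting returns the exact pixel value.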

pyprecag.processing.kmeans_clustering(raster_files, output_tif, n_clusters=3, max_iterations=500)[source]

Create zones with k-means clustering from multiple raster files as described in

The input raster files should all:
  • have the same pixel size
  • be in the same coordinate system
  • overlap

Only the first band of each raster will be used.

The output TIFF image extent will be the minimum overlapping extent of the input images. Each image will be resampled to a fixed coordinate to ensure pixels between images align.

Parameters:
  • raster_files (List[str]) – The list of input raster files
  • output_tif (str) – The output TIFF file
  • n_clusters (int) – The number of clusters/zones to create.
  • max_iterations (int) – Maximum number of iterations for k-means algorithm in a single run.
Returns:

A dataframe containing cluster statistics for each image.

Return type:

pandas.core.frame.DataFrame
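
The clustering idea can be sketched with a small, self-contained k-means loop. This is not the routine pyprecag uses, which is not described on this page; it assumes each sample holds one value per input raster for a single pixel, and it uses a deterministic choice of starting centroids so the result is repeatable. It requires Python 3.8 or later for math.dist.

```python
import math


def kmeans(samples, n_clusters=3, max_iterations=500):
    """Assign each sample to the nearest centroid until the labels stop changing."""
    # Deterministic start: evenly spaced samples become the initial centroids.
    step = len(samples) // n_clusters
    centroids = [tuple(samples[i * step]) for i in range(n_clusters)]
    labels = [None] * len(samples)

    for _ in range(max_iterations):
        new_labels = [
            min(range(n_clusters), key=lambda k: math.dist(s, centroids[k]))
            for s in samples
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for k in range(n_clusters):
            members = [s for s, lab in zip(samples, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical pixels: (value in raster 1, value in raster 2).
pixels = [(1, 1), (1.2, 0.9), (0.8, 1.1), (10, 10), (10.2, 9.8), (9.9, 10.1)]
labels, centroids = kmeans(pixels, n_clusters=2)
```

Each label corresponds to a zone, and writing the labels back to the pixel positions gives the zone raster.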

pyprecag.processing.multi_block_bands_processing(image_file, pixel_size, out_folder, band_nums=[], image_epsg=0, image_nodata=0, polygon_shapefile=None, groupby=None)[source]
Derive multiple resampled image bands matching the specified pixel size and block grid extent
for each shapefile polygon.

Use this tool to create individual band images for each polygon within a shapefile. A group-by column may be used to dissolve multiple polygons belonging to an individual block. Fitting rasters to a base block (grid) allows easier, more accurate multi-layered analysis, as required in Precision Agriculture.

The processing steps to achieve this are:
  • Dissolve polygons optionally using the group-by column
Loop through each polygon feature and:
  • Create Block Grid
  • Clip image to polygon
  • Resample and fit to the block grid at the nominated pixel size using the Average resampling technique
  • Identify holes and fill if necessary
  • Smooth image using a 5x5 pixel moving average (focal_statistics)

If a polygon shapefile is not specified, polygons will be created from the image’s mask.

The polygon shapefile will be re-projected to match the image file.

A block grid will be created for each feature for the nominated pixel size and used as the base for analysis.

“image_epsg” and “image_nodata” can be used to set the coordinate system and image nodata values
when they are not present within the image file.

The output filename will consist of the selected band number or the value of a band’s rasterio custom name tag

Parameters:
  • image_file (str) – An input image
  • pixel_size (int, float) – The desired output pixel size in metres.
  • out_folder (str) – The output folder for the created images.
  • band_nums (List[int]) – a list of band numbers to process. If empty all bands will be used
  • image_epsg (int) – the epsg number for the image. Only used if not provided by the image.
  • image_nodata (int) – the nodata value for the image. Only used if not provided by the image.
  • polygon_shapefile (str) – a polygon shapefile used to cut up an image.
  • groupby (str) – the column/field to use to group multiple features.
Returns:

a list of created files.

Return type:

List[str]
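
The smoothing step can be sketched as a moving average over a nested list. The handling of edges and nodata shown here (windows truncated at the raster edge, nodata pixels skipped and left unchanged) is an assumption and may differ from focal_statistics; the function name smooth is hypothetical.

```python
def smooth(grid, size=5, nodata=-9999):
    """Replace each valid pixel with the mean of the valid pixels around it."""
    half = size // 2
    n_rows, n_cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] == nodata:
                continue  # nodata pixels are left untouched
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - half), min(n_rows, r + half + 1))
                for cc in range(max(0, c - half), min(n_cols, c + half + 1))
                if grid[rr][cc] != nodata
            ]
            out[r][c] = sum(window) / len(window)
    return out


# Hypothetical raster with a spike in the centre and one nodata pixel.
image = [
    [1, 1, 1],
    [1, 10, 1],
    [1, 1, -9999],
]
smoothed = smooth(image, size=3)
```

A 3x3 window is used here only to keep the example small; the tool's smoothing step uses a 5x5 window.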

pyprecag.processing.persistor_all_years(raster_files, output_tif, greater_than, target_percentage)[source]

Determine the performance persistence of yield across multiple years as described in Bramley and Hamilton (2005).

The “Target over all years” method assigns a value to each pixel to indicate the number of instances (in the raster list) in which that pixel was either less than or greater than the mean (+/- a nominated percentage) of that raster.

All input rasters MUST overlap and have the same coordinate system and pixel size.

If a path is omitted from output_tif it will be created in your temp folder.

References

Bramley RGV, Hamilton RP (2005) Understanding variability in winegrape production systems 1. Within vineyard variation in yield over several vintages. Australian Journal Of Grape And Wine Research 10, 32-45. doi:10.1111/j.1755-0238.2004.tb00006.x.

Parameters:
  • raster_files (List[str]) – List of rasters to use as inputs
  • output_tif (str) – Output TIF file
  • greater_than (bool) – if true test above (gt) the mean
  • target_percentage (int) – the percent variation either above/below the mean. This should be an integer between -50 and 50.
Returns:

The output tif name

Return type:

str
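
The counting behind the "Target over all years" method can be sketched on nested lists. The sketch assumes the threshold for each raster is its mean multiplied by (1 + target_percentage / 100), which is one reading of "the mean (+/- a nominated percentage)"; the function names are hypothetical.

```python
def raster_mean(raster, nodata):
    values = [v for row in raster for v in row if v != nodata]
    return sum(values) / len(values)


def target_over_all_years(rasters, greater_than=True, target_percentage=10,
                          nodata=-9999):
    """Count, per pixel, the rasters in which the pixel passes the threshold."""
    n_rows, n_cols = len(rasters[0]), len(rasters[0][0])
    counts = [[0] * n_cols for _ in range(n_rows)]
    for raster in rasters:
        # Assumed threshold: the raster mean shifted by the nominated percentage.
        threshold = raster_mean(raster, nodata) * (1 + target_percentage / 100)
        for r in range(n_rows):
            for c in range(n_cols):
                value = raster[r][c]
                if value == nodata:
                    continue
                hit = value > threshold if greater_than else value < threshold
                counts[r][c] += int(hit)
    return counts


# Three hypothetical yield rasters (one per year) on the same 2 x 2 grid.
years = [
    [[1, 2], [3, 4]],
    [[4, 3], [2, 1]],
    [[1, 1], [1, 5]],
]
above = target_over_all_years(years, greater_than=True, target_percentage=10)
below = target_over_all_years(years, greater_than=False, target_percentage=-10)
```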

pyprecag.processing.persistor_target_probability(upper_raster_files, upper_percentage, upper_probability, lower_raster_files, lower_percentage, lower_probability, output_tif)[source]
Determine the probability of a performance being exceeded or not being met with an upper and
lower limit as described in Bramley and Hamilton (2005).

The “Target probability” method builds on the target over all years method, in that it includes an upper range (i.e. cells with a given frequency of values that are above the mean +/- a given percentage) and a lower range (i.e. cells with a given frequency of values that are below the mean +/- a given percentage).

A value is assigned to each pixel which indicates whether the performance in that pixel over a given proportion of years is:

  a) Greater than the mean plus or minus the nominated percentage (value = 1)
  b) Less than the mean plus or minus the nominated percentage (value = -1)

The remaining pixels which do not fall into category a) or b) are given a value of 0.

All input rasters MUST overlap and have the same coordinate system and pixel size.

If a path is omitted from output_tif it will be created in your temp folder.

References

Bramley RGV, Hamilton RP (2005) Understanding variability in winegrape production systems 1. Within vineyard variation in yield over several vintages. Australian Journal Of Grape And Wine Research 10, 32-45. doi:10.1111/j.1755-0238.2004.tb00006.x.

Parameters:
  • upper_raster_files (List[str]) – List of rasters to use for the analysis of the UPPER category
  • upper_percentage (int) – the percent variation either above/below the mean to apply to the UPPER raster category.
  • upper_probability (int) – the probability percentage to apply to the UPPER category
  • lower_raster_files (List[str]) – List of rasters to use for the analysis of the LOWER category
  • lower_percentage (int) – the percent variation either above/below the mean to apply to the LOWER raster category.
  • lower_probability (int) – the probability percentage to apply to the LOWER category
  • output_tif (str) – Output TIF file
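
The classification into 1, -1 and 0 can be sketched as follows. Several details are assumptions made for illustration: the threshold is the raster mean multiplied by (1 + percentage / 100), a pixel qualifies when the fraction of rasters passing the threshold is at least the probability percentage, the upper category takes precedence if both apply, and nodata handling is omitted. The function names are hypothetical.

```python
def exceedance_fraction(rasters, percentage, above):
    """Fraction of rasters in which each pixel is above (or below) the threshold."""
    n_rows, n_cols = len(rasters[0]), len(rasters[0][0])
    counts = [[0] * n_cols for _ in range(n_rows)]
    for raster in rasters:
        values = [v for row in raster for v in row]
        threshold = sum(values) / len(values) * (1 + percentage / 100)
        for r in range(n_rows):
            for c in range(n_cols):
                v = raster[r][c]
                counts[r][c] += int(v > threshold if above else v < threshold)
    return [[n / len(rasters) for n in row] for row in counts]


def target_probability(upper_rasters, upper_percentage, upper_probability,
                       lower_rasters, lower_percentage, lower_probability):
    upper = exceedance_fraction(upper_rasters, upper_percentage, above=True)
    lower = exceedance_fraction(lower_rasters, lower_percentage, above=False)
    result = []
    for up_row, low_row in zip(upper, lower):
        row = []
        for up, low in zip(up_row, low_row):
            if up >= upper_probability / 100:
                row.append(1)    # category a)
            elif low >= lower_probability / 100:
                row.append(-1)   # category b)
            else:
                row.append(0)
        result.append(row)
    return result


# Three hypothetical yield rasters used for both the upper and lower categories.
years = [
    [[1, 2], [3, 4]],
    [[4, 3], [2, 1]],
    [[1, 1], [1, 5]],
]
classes = target_probability(years, 10, 60, years, -10, 60)
```
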
pyprecag.processing.random_pixel_selection(raster, raster_crs, num_points, out_shapefile=None)[source]

Select randomly distributed valid data pixels from a raster file and convert to points representing the center of the pixel. There is an option to save to shapefile if required.

Note

When opening a raster using RasterIO the coordinate system is imported from the Proj4 string and converted to a crs_wkt. This means that the crs_wkt recorded against the opened dataset does not always equal the crs_wkt of the original raster file. To remedy this, use pyprecag_crs.getCRSfromRasterFile to create a crs object.

Parameters:
  • raster (rasterio.io.DatasetReader) – Opened raster file via rasterio.open(os.path.normpath())
  • num_points (int) – The number of random sample points to select.
  • raster_crs (pyprecag_crs.crs) – The Spatial Reference System for the raster file
  • out_shapefile (str) – Optional. The path and name of a shapefile used to save the points.
Returns:

A dataframe containing the selected pixels as points.

pyprecag_crs.crs: The pyprecag CRS object of the points dataframe.

Return type:

geopandas.geodataframe.GeoDataFrame
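
The selection and the conversion from pixel to point can be sketched with the standard library. The example assumes a north-up raster described by the coordinates of its top-left corner and a square pixel size; random_pixel_centres and its arguments are hypothetical and stand in for the opened rasterio dataset.

```python
import random


def random_pixel_centres(grid, origin_x, origin_y, pixel_size, num_points,
                         nodata=-9999, seed=None):
    """Pick random valid pixels and return the coordinates of their centres."""
    valid = [
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value != nodata
    ]
    rng = random.Random(seed)
    chosen = rng.sample(valid, min(num_points, len(valid)))
    # The origin is the top-left corner, so rows count downwards from origin_y.
    return [
        (origin_x + (c + 0.5) * pixel_size, origin_y - (r + 0.5) * pixel_size)
        for r, c in chosen
    ]


# Hypothetical 2 x 2 raster with 10 m pixels and one nodata pixel.
raster = [[1, -9999], [3, 4]]
all_points = random_pixel_centres(raster, 100.0, 200.0, 10.0, num_points=3)
two_points = random_pixel_centres(raster, 100.0, 200.0, 10.0, num_points=2, seed=1)
```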

pyprecag.processing.resample_bands_to_block(image_file, pixel_size, out_folder, band_nums=[], image_epsg=0, image_nodata=None, polygon_shapefile=None, groupby=None, out_epsg=0)[source]
Derive multiple resampled image bands matching the specified pixel size and block grid extent
for each shapefile polygon.

Use this tool to create individual band images for each polygon within a shapefile. A group-by column may be used to dissolve multiple polygons belonging to an individual block. Fitting rasters to a base block (grid) allows easier, more accurate multi-layered analysis, as required in Precision Agriculture.

The processing steps to achieve this are:
  • Reproject image to nominated coordinate system
  • Dissolve polygons optionally using the groupby column
Loop through each polygon feature and:
  • Create Block Grid
  • Clip image to polygon
  • Resample and fit to the block grid at the nominated pixel size using the Average resampling technique
  • Identify holes and fill if necessary
  • Smooth image using a 5x5 pixel moving average (focal_statistics)

If a polygon shapefile is not specified, polygons will be created from the image’s mask.

The polygon shapefile will be re-projected to match the image file.

A block grid will be created for each feature for the nominated pixel size and used as the base for analysis.

“image_epsg” and “image_nodata” can be used to set the coordinate system and image nodata values
when they are not present within the image file.

The output filename will consist of the selected band number or the value of a band’s rasterio custom name tag

Parameters:
  • image_file (str) – An input image
  • pixel_size (int, float) – The desired output pixel size in metres.
  • out_folder (str) – The output folder for the created images.
  • band_nums (List[int]) – a list of band numbers to process. If empty all bands will be used
  • image_epsg (int) – epsg number of the image to be used when missing in image
  • image_nodata (int) – nodata value of the image to be used when missing in image
  • polygon_shapefile (str) – a polygon shapefile used to cut up an image.
  • groupby (str) – the column/field to use to group multiple features.
  • out_epsg (int) – The epsg number representing the output coordinate system.
Returns:

a list of created files.

Return type:

List[str]
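
Average resampling can be sketched for the simple case where the output pixel is a whole multiple of the input pixel. This is an illustration on nested lists, not the resampling routine the tool calls; it assumes nodata pixels are left out of each average, and resample_average is a hypothetical name.

```python
def resample_average(grid, factor, nodata=-9999):
    """Aggregate factor x factor blocks of pixels into one averaged pixel."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, n_rows, factor):
        out_row = []
        for c0 in range(0, n_cols, factor):
            block = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, n_rows))
                for c in range(c0, min(c0 + factor, n_cols))
                if grid[r][c] != nodata
            ]
            # A block made up entirely of nodata stays nodata.
            out_row.append(sum(block) / len(block) if block else nodata)
        out.append(out_row)
    return out


# Hypothetical 4 x 4 band at 1 m pixels, resampled to 2 m pixels.
band = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, -9999],
]
coarse = resample_average(band, factor=2)
```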

pyprecag.processing.ttest_analysis(points_geodataframe, points_crs, values_raster, out_folder, zone_raster='', control_raster='', size=5, create_graph=False)[source]

Run a moving window t-test analysis for a strip trial as described in Lawes and Bramley (2012).

The points must be in the format produced by the create_points_along_line tool. All input rasters must be of the same coordinate system and pixel size and overlap with the points.

Output statistics include:
  • controls_mean – row by row mean of the control columns
  • treat_diff – row by row difference between the treatment and controls_mean columns
  • av_treat_diff – calculate mean of values using a moving window using the treat_diff column
  • p_value – calculate p_value using a moving window using treatment and controls_mean columns
  • RI – Response Index using the treatment and controls_mean columns
Output Files include:
For each line and strip combination:
  • png Map showing orientation of the line (start and ends)
  • png set of graphs
  • CSV file of derived statistics.
Reference:
Lawes RA, Bramley RGV. 2012. A Simple Method for the Analysis of On-Farm Strip Trials.
Agronomy Journal 104, 371-377.
Parameters:
  • points_geodataframe (geopandas.geodataframe.GeoDataFrame) – points derived using create_points_along_line
  • points_crs (pyprecag.crs.crs) – the coordinate system for the points.
  • values_raster (str) – a kriged raster containing the treatment values
  • out_folder (str) – folder for output files.
  • zone_raster (str) – a raster containing zones.
  • control_raster (str) – a kriged raster of
  • size (int) – the size used to calculate the moving window statistics.
  • create_graph (bool) –
Returns:

dataframe containing output statistics

Return type:

pandas.core.frame.DataFrame
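
The first three output statistics can be sketched with plain lists. The example assumes a centred moving window that is truncated at the ends of the strip, which may differ from the tool's behaviour; the p_value and RI columns are not shown because their exact calculation is not described here. The function names are hypothetical.

```python
def moving_mean(values, size):
    """Centred moving-window mean, with the window truncated at both ends."""
    half = size // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def strip_statistics(treatment, controls, size=5):
    """Return controls_mean, treat_diff and av_treat_diff for one strip."""
    # Row by row mean across the control columns.
    controls_mean = [sum(row) / len(row) for row in zip(*controls)]
    treat_diff = [t - c for t, c in zip(treatment, controls_mean)]
    av_treat_diff = moving_mean(treat_diff, size)
    return controls_mean, treat_diff, av_treat_diff


# Hypothetical values sampled at five points along a treatment strip
# and its two parallel control strips.
treatment = [5, 6, 7, 8, 9]
controls = [[4, 4, 4, 4, 4], [2, 4, 6, 4, 2]]
controls_mean, treat_diff, av_treat_diff = strip_statistics(
    treatment, controls, size=3
)
```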