data.StratifiedSampler

class data.StratifiedSampler(*args: Any, **kwargs: Any)

A custom sampler that performs stratified sampling based on a partition criterion.

Note: Choose num_bins small enough that few bins end up empty.
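To illustrate why a large num_bins can leave many bins empty, the sketch below counts empty bins for a small set of values. It assumes equal-width bins spanning the minimum and maximum value; this page does not state how the sampler forms its bins, and count_empty_bins is a hypothetical helper, not part of the library.

```python
def count_empty_bins(values, num_bins):
    """Count equal-width bins between min(values) and max(values) that hold no value.

    Assumption: equal-width binning. The sampler's actual binning may differ.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return sum(1 for c in counts if c == 0)


values = [0.0, 0.1, 0.2, 0.3, 5.0, 9.9, 10.0]
few = count_empty_bins(values, 2)    # coarse binning
many = count_empty_bins(values, 20)  # fine binning
print(few, many)
```

With these seven values, two bins are both occupied, while twenty bins leave most of the range without any data points.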

Parameters:
  • data_source – The data source to be sampled from.

  • partition_criterion – A callable that takes the data source and returns a list of values used for partitioning.

  • num_samples – The total number of samples to be drawn from the data source.

  • num_bins – The number of bins to divide the partitioned values into. Defaults to 10.

  • replacement – Whether to sample with replacement. Defaults to True.

  • verbose – Whether to print verbose output during sampling. Defaults to True.
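
The parameters above can be read as the following minimal sketch of stratified index sampling. It is not the library's implementation. It assumes equal-width bins, an equal share of num_samples per non-empty bin, and a hypothetical seed argument for reproducibility. The function name stratified_indices is also hypothetical, and verbose is omitted.

```python
import random
from collections import defaultdict


def stratified_indices(data_source, partition_criterion, num_samples,
                       num_bins=10, replacement=True, seed=None):
    """Return indices into data_source, drawn evenly across value bins.

    Assumptions (not confirmed by the documentation): equal-width bins,
    equal allocation per non-empty bin, remainder samples dropped.
    """
    rng = random.Random(seed)
    values = partition_criterion(data_source)
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / num_bins or 1.0

    # Group item indices by the bin their partition value falls into.
    bins = defaultdict(list)
    for i, v in enumerate(values):
        bins[min(int((v - lo) / width), num_bins - 1)].append(i)

    groups = [bins[k] for k in sorted(bins)]  # only non-empty bins
    per_bin = num_samples // len(groups)

    picked = []
    for group in groups:
        if replacement:
            picked.extend(rng.choices(group, k=per_bin))
        else:
            # Without replacement, a bin cannot yield more items than it holds.
            picked.extend(rng.sample(group, min(per_bin, len(group))))
    rng.shuffle(picked)
    return picked


data = list(range(100))
criterion = lambda source: [float(x) for x in source]
indices = stratified_indices(data, criterion, num_samples=20, num_bins=5, seed=0)
unique = stratified_indices(data, criterion, num_samples=20, num_bins=5,
                            replacement=False, seed=0)
print(len(indices), len(set(unique)))
```

In this sketch, empty bins contribute nothing, so the per-bin share grows as more bins are empty; this is one reason to keep num_bins modest.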