buscarpy package

Module contents

buscarpy.calculate_h0(labels_: ndarray[tuple[int], dtype[int64]] | List[int] | Series, N: int, recall_target: float = 0.95, bias: float = 1) → float | None

Calculates a p-score for our null hypothesis h0, that we have missed our recall target recall_target.

Parameters:
  • labels_ (list|np.array|pd.Series) – An ordered sequence of 1s and 0s representing, in the order in which they were screened, relevant and irrelevant documents respectively.

  • N (int) – The total number of documents from which you want to find the relevant examples. The size of the haystack.

  • recall_target (float) – The proportion of truly relevant documents you want to find, defaults to 0.95

  • bias (float) – The assumed likelihood of drawing a random relevant document over the likelihood of drawing a random irrelevant document. The higher this is, the better our ML has worked. When this is different to 1, we calculate the p-score using biased urns.

Returns:

A p-score for our null hypothesis. We can reject the null hypothesis (and stop screening) if p is below 1 minus our confidence level, for example p < 0.05 at a confidence level of 0.95.

Return type:

float | None
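
As a rough illustration of the test described above, the sketch below covers only the unbiased case (bias = 1) using a plain hypergeometric distribution and the standard library. The names hypergeom_cdf and p_missed_target are hypothetical. Treating each trailing run of screened documents as a random sample from the documents that remained at that point, and taking the smallest p over all such runs, is an assumption about the method and not a transcription of buscarpy's code.

```python
from math import comb, floor


def hypergeom_cdf(k, M, K, n):
    """P(X <= k) when drawing n items from M, of which K are relevant."""
    favourable = sum(comb(K, i) * comb(M - K, n - i) for i in range(k + 1))
    return favourable / comb(M, n)


def p_missed_target(labels, N, recall_target=0.95):
    """Hypothetical, unbiased sketch of a p-score for 'recall < recall_target'."""
    labels = list(labels)
    n_seen, r_seen = len(labels), sum(labels)
    # Smallest total number of relevant documents for which recall is below target
    k_tar = floor(r_seen / recall_target) + 1
    missed = k_tar - r_seen
    if missed > N - n_seen:
        return 0.0  # not enough unseen documents for the target to be missed
    best, k = 1.0, 0
    for n in range(1, n_seen + 1):
        k += labels[-n]                   # relevant documents in the last n draws
        urn_size = N - (n_seen - n)       # documents left before those n draws
        urn_relevant = missed + k         # relevant documents in that urn under h0
        best = min(best, hypergeom_cdf(k, urn_size, urn_relevant, n))
    return best


p = p_missed_target([1, 1, 0, 0, 0, 0, 0, 0], N=10, recall_target=0.9)
print(p)  # 0.25
```

In this toy run, six irrelevant documents in a row after the last relevant one give p = 0.25, which is not low enough to stop at a confidence level of 0.95.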

buscarpy.generate_dataset(N: int = 20000, prevalence: float = 0.01, bias: float = 10, random_seed: int | None = None) → DataFrame

Generate a dataset resembling the kind created through machine learning prioritised screening.

Parameters:
  • N (int) – The number of documents returned by the query

  • prevalence (float) – The proportion of those documents which are relevant

  • bias (float) – The likelihood of drawing a random relevant document over the likelihood of drawing a random irrelevant document. The higher this is, the better our ML has worked.

  • random_seed (int|None) – A random seed. Set this to ensure the same sequence of documents is drawn each time the code is run.

Returns:

A dataframe with N rows, of which prevalence * N are relevant. The column relevant is made up of 1s and 0s, where 1 represents a relevant document and 0 an irrelevant one.

Return type:

pd.DataFrame
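
To make the role of bias concrete, here is a minimal standard-library sketch of a biased draw without replacement, in which every remaining relevant document is bias times as likely to be picked next as every remaining irrelevant one. The function name biased_screening_order is hypothetical, it returns a plain list where the documented function returns a DataFrame, and the sampling scheme is an assumption about what "bias" means here, not buscarpy's implementation.

```python
import random


def biased_screening_order(N=20000, prevalence=0.01, bias=10, random_seed=None):
    """Hypothetical sketch: a 0/1 screening order drawn from a biased urn."""
    rng = random.Random(random_seed)
    relevant_left = round(N * prevalence)
    irrelevant_left = N - relevant_left
    labels = []
    while relevant_left + irrelevant_left > 0:
        # Each relevant document carries weight `bias`, each irrelevant one weight 1
        weight_relevant = bias * relevant_left
        p_relevant = weight_relevant / (weight_relevant + irrelevant_left)
        if rng.random() < p_relevant:
            labels.append(1)
            relevant_left -= 1
        else:
            labels.append(0)
            irrelevant_left -= 1
    return labels


labels = biased_screening_order(N=1000, prevalence=0.05, bias=10, random_seed=42)
print(len(labels), sum(labels))  # total documents, relevant documents
```

With bias above 1 the relevant documents tend to cluster near the start of the list, which is the pattern that prioritised screening is meant to produce. The list plays the role of the relevant column in the returned dataframe.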

buscarpy.recall_frontier(labels_: ndarray[tuple[int], dtype[int64]] | List[int] | Series, N: int, bias: float = 1, plot: bool = True) → dict

Calculates a p-score for our null hypothesis h0, that we have missed our recall target recall_target, across a range of recall_targets.

Parameters:
  • labels_ (list|np.array|pd.Series) – An ordered sequence of 1s and 0s representing, in the order in which they were screened, relevant and irrelevant documents respectively.

  • N (int) – The total number of documents from which you want to find the relevant examples. The size of the haystack.

  • bias (float) – The assumed likelihood of drawing a random relevant document over the likelihood of drawing a random irrelevant document. The higher this is, the better our ML has worked. When this is different to 1, we calculate the p-score using biased urns.

  • plot (bool) – Whether to draw a plot

Returns:

A dictionary containing a list of recall targets (recall_target) alongside a list of p-scores (p).

Return type:

dict
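
The sweep over recall targets can be pictured as a simple loop. The sketch below is an assumption about the general shape of the computation, not the library's code: recall_frontier_sketch is a hypothetical name, the default grid of targets from 0.01 to 0.99 is invented for illustration, and p_func stands for any function with the calling pattern p_func(labels, N, recall_target=...).

```python
def recall_frontier_sketch(labels, N, p_func, recall_targets=None):
    """Hypothetical sketch: evaluate a p-score function over many recall targets."""
    if recall_targets is None:
        # Assumed grid of targets; the grid buscarpy uses is not documented here
        recall_targets = [t / 100 for t in range(1, 100)]
    result = {"recall_target": [], "p": []}
    for target in recall_targets:
        result["recall_target"].append(target)
        result["p"].append(p_func(labels, N, recall_target=target))
    return result


# Stand-in p-score function, used only to show the shape of the output
frontier = recall_frontier_sketch(
    [1, 0, 0], 10, lambda labels, N, recall_target: recall_target / 2,
    recall_targets=[0.5, 0.75],
)
print(frontier)
```

Going by the signature documented above, buscarpy.calculate_h0 fits the p_func calling pattern and could be passed in place of the stand-in lambda.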

buscarpy.retrospective_h0(labels_: ndarray[tuple[int], dtype[int64]] | List[int] | Series, N: int, recall_target: float = 0.95, bias: float = 1, batch_size: int = 1000, confidence_level: float = 0.95, plot: bool = True) → dict

Calculates a p-score for our null hypothesis h0, that we have missed our recall target recall_target, after every batch_size documents.

Parameters:
  • labels_ (list|np.array|pd.Series) – An ordered sequence of 1s and 0s representing, in the order in which they were screened, relevant and irrelevant documents respectively.

  • N (int) – The total number of documents from which you want to find the relevant examples. The size of the haystack.

  • recall_target (float) – The proportion of truly relevant documents you want to find, defaults to 0.95

  • bias (float) – The assumed likelihood of drawing a random relevant document over the likelihood of drawing a random irrelevant document. The higher this is, the better our ML has worked. When this is different to 1, we calculate the p-score using biased urns.

  • batch_size (int) – The size of the batches for which we will calculate our stopping criteria. Smaller batches = greater granularity = more computation time.

  • confidence_level (float) – The p-score is calculated batch by batch until p falls below 1 - confidence_level

  • plot (bool) – Whether to draw a plot

Returns:

A dictionary containing a list of batch sizes (batch_sizes) alongside a list of p-scores (p).

Return type:

dict
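
The batch-wise evaluation and the stopping rule tied to confidence_level can be sketched as follows. This is a hedged illustration under stated assumptions: retrospective_sketch is a hypothetical name, recording the cumulative number of screened documents under batch_sizes is a guess at the output's meaning, and p_func again stands for any function callable as p_func(labels, N, recall_target=...).

```python
def retrospective_sketch(labels, N, p_func, recall_target=0.95,
                         batch_size=1000, confidence_level=0.95):
    """Hypothetical sketch: recompute a p-score after each batch until it is low enough."""
    labels = list(labels)
    result = {"batch_sizes": [], "p": []}
    for end in range(batch_size, len(labels) + batch_size, batch_size):
        end = min(end, len(labels))  # the final batch may be shorter
        p = p_func(labels[:end], N, recall_target=recall_target)
        result["batch_sizes"].append(end)
        result["p"].append(p)
        # Stop as soon as the null hypothesis can be rejected
        if p is not None and p < 1 - confidence_level:
            break
    return result


# Stand-in p-score that shrinks as more documents are screened
history = retrospective_sketch(
    [0] * 20, 100, lambda labels, N, recall_target: 1 / len(labels),
    batch_size=4, confidence_level=0.9,
)
print(history["batch_sizes"])  # [4, 8, 12]
```

Smaller values of batch_size call p_func more often, which is the trade-off between granularity and computation time noted in the parameter description.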