buscarpy package
Module contents
- buscarpy.calculate_h0(labels_: ndarray[tuple[int], dtype[int64]] | List[int] | Series, N: int, recall_target: float = 0.95, bias: float = 1) → float | None
Calculates a p-score for our null hypothesis h0, namely that we have missed our recall target recall_target (i.e. that the proportion of relevant documents found so far is below it).
- Parameters:
labels_ (list|np.array|pd.Series) – An ordered sequence of 1s and 0s representing relevant and irrelevant documents respectively, in the order in which they were screened.
N (int) – The total number of documents from which you want to find the relevant examples. The size of the haystack.
recall_target (float) – The proportion of truly relevant documents you want to find, defaults to 0.95
bias (float) – The assumed ratio of the likelihood of drawing a random relevant document to the likelihood of drawing a random irrelevant document. The higher this is, the better our ML prioritisation has worked. When this differs from 1, we calculate the p-score using biased urns.
- Returns:
A p-score for our null hypothesis. We can reject the null hypothesis (and stop screening) if p is below 1 minus our confidence level, e.g. below 0.05 for a 95% confidence level.
- Return type:
float | None
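The unbiased case (bias = 1) can be illustrated with a hypergeometric test. This is a minimal sketch, not buscarpy's code, and it rests on several assumptions: following the BUSCAR idea, every run of the most recently screened documents is treated as a candidate random sample; h0 assumes the smallest total number of relevant documents that would put recall below recall_target; and the reported p-score is the minimum over those samples. The helpers hypergeom_cdf and h0_p_score are hypothetical names, and exact big-integer arithmetic is used for self-containedness rather than speed.

```python
import math
from typing import Sequence


def hypergeom_cdf(k: int, urn: int, red: int, draws: int) -> float:
    """P(X <= k) for X ~ Hypergeometric(urn size, red balls in urn, draws)."""
    total = math.comb(urn, draws)
    # math.comb returns 0 when asked for more items than exist, so impossible terms vanish
    hits = sum(math.comb(red, i) * math.comb(urn - red, draws - i) for i in range(k + 1))
    return hits / total


def h0_p_score(labels: Sequence[int], N: int, recall_target: float = 0.95) -> float:
    """Hypothetical unbiased-urn p-score for h0 (recall below recall_target).

    A sketch under stated assumptions, not buscarpy.calculate_h0.
    """
    labels = [int(x) for x in labels]
    n_seen = len(labels)
    r_seen = sum(labels)
    # Smallest total number of relevant documents for which r_seen / total < recall_target
    k_tar = math.floor(r_seen / recall_target) + 1
    missed = k_tar - r_seen  # relevant documents still unscreened under h0
    if missed > N - n_seen:
        return 0.0  # h0 is impossible: too few unscreened documents left to hide them
    p_min = 1.0
    k = 0
    # Assumption: treat the last n screened documents as a random sample, for every n
    for n, label in enumerate(reversed(labels), start=1):
        k += label  # relevant documents among the last n screened
        urn = N - n_seen + n  # documents still unscreened when this sample began
        red = missed + k  # relevant documents in that urn under h0
        p_min = min(p_min, hypergeom_cdf(k, urn, red, n))
    return p_min
```

With 50 relevant documents followed by 200 irrelevant ones and N=1000, this sketch gives a p-score of about 0.49, so h0 cannot be rejected; if the haystack held only N=260 documents, the same run of 200 irrelevant documents would give a p-score far below 0.05. For bias other than 1 the hypergeometric CDF would be replaced by a biased-urn distribution (SciPy offers scipy.stats.nchypergeom_wallenius, for example); which biased-urn model buscarpy uses is not stated here.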
- buscarpy.generate_dataset(N: int = 20000, prevalence: float = 0.01, bias: float = 10, random_seed: int | None = None) → DataFrame
Generate a dataset resembling the kind created through machine learning prioritised screening.
- Parameters:
N (int) – The number of documents returned by the query
prevalence (float) – The proportion of those documents which are relevant
bias (float) – The ratio of the likelihood of drawing a random relevant document to the likelihood of drawing a random irrelevant document. The higher this is, the better our ML prioritisation has worked.
random_seed (int|None) – A random seed. Set this to ensure the same sequence of documents is drawn each time the code is run.
- Returns:
A DataFrame with N rows, of which prevalence * N are relevant. The column relevant is made up of 1s and 0s, where 1 represents a relevant and 0 an irrelevant document.
- Return type:
pd.DataFrame
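One way to produce such a sequence is to simulate screening as draws without replacement from a biased urn in which each remaining relevant document is bias times as likely to be drawn as each remaining irrelevant one (a Wallenius-style urn). This is an assumption about how generate_dataset behaves, not its actual code: the hypothetical simulate_screening_order returns a plain list of 1s and 0s instead of a pandas DataFrame, and rounds prevalence * N to the nearest integer.

```python
import random
from typing import List, Optional


def simulate_screening_order(
    N: int = 20000,
    prevalence: float = 0.01,
    bias: float = 10,
    random_seed: Optional[int] = None,
) -> List[int]:
    """Hypothetical sketch: screening order drawn from a biased urn (1 = relevant)."""
    rng = random.Random(random_seed)  # seeded generator makes the order reproducible
    relevant_left = round(N * prevalence)  # assumption: round to the nearest integer
    irrelevant_left = N - relevant_left
    order: List[int] = []
    while relevant_left + irrelevant_left > 0:
        # Each relevant document carries weight `bias`, each irrelevant one weight 1
        weight_relevant = bias * relevant_left
        p_relevant = weight_relevant / (weight_relevant + irrelevant_left)
        if rng.random() < p_relevant:
            order.append(1)
            relevant_left -= 1
        else:
            order.append(0)
            irrelevant_left -= 1
    return order
```

With bias=1 every remaining document is equally likely to be drawn, so the order is random; as bias grows, relevant documents cluster increasingly at the start of the sequence.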
- buscarpy.recall_frontier(labels_: ndarray[tuple[int], dtype[int64]] | List[int] | Series, N: int, bias: float = 1, plot: bool = True) → dict
Calculates a p-score for our null hypothesis h0, that we have missed our recall target, for each of a range of recall targets.
- Parameters:
labels_ (list|np.array|pd.Series) – An ordered sequence of 1s and 0s representing relevant and irrelevant documents respectively, in the order in which they were screened.
N (int) – The total number of documents from which you want to find the relevant examples. The size of the haystack.
bias (float) – The assumed ratio of the likelihood of drawing a random relevant document to the likelihood of drawing a random irrelevant document. The higher this is, the better our ML prioritisation has worked. When this differs from 1, we calculate the p-score using biased urns.
plot (bool) – Whether to plot the results.
- Returns:
A dictionary with the list of recall targets under the key recall_target, alongside the corresponding list of p-scores under the key p.
- Return type:
dict
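To see how the p-score behaves across recall targets, here is a minimal sketch using a deliberately simplified test: it treats only the trailing run of irrelevant documents (those screened since the last relevant one) as the random sample and computes the closed-form probability of finding no relevant document in it. The simplification and the helper names p_zero_tail and frontier_sketch are hypothetical; only the output keys recall_target and p mirror the dictionary described above.

```python
import math
from typing import Dict, List, Sequence


def p_zero_tail(labels: Sequence[int], N: int, recall_target: float) -> float:
    """Simplified p-score that only uses the trailing run of 0s (hypothetical)."""
    labels = [int(x) for x in labels]
    r_seen = sum(labels)
    tail = 0  # documents screened since the last relevant one
    for label in reversed(labels):
        if label:
            break
        tail += 1
    missed = math.floor(r_seen / recall_target) + 1 - r_seen  # relevant docs missed under h0
    unseen = N - len(labels)
    if missed > unseen:
        return 0.0  # h0 impossible: not enough unscreened documents
    urn = unseen + tail  # documents unscreened when the trailing run began
    # P(no relevant document in `tail` draws from `urn` documents, `missed` of them relevant)
    p = 1.0
    for i in range(missed):
        p *= (urn - tail - i) / (urn - i)
    return p


def frontier_sketch(labels: Sequence[int], N: int, targets: Sequence[float]) -> Dict[str, List[float]]:
    """One p-score per recall target, using the simplified test above."""
    return {
        "recall_target": list(targets),
        "p": [p_zero_tail(labels, N, t) for t in targets],
    }
```

Lower recall targets imply more missed relevant documents under h0, which makes a long run of irrelevant documents less likely, so p rises with the target. With 50 relevant documents followed by 200 irrelevant ones and N=1000, this sketch rejects h0 at 95% confidence for a recall target of 0.8 but not for 0.9.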
- buscarpy.retrospective_h0(labels_: ndarray[tuple[int], dtype[int64]] | List[int] | Series, N: int, recall_target: float = 0.95, bias: float = 1, batch_size: int = 1000, confidence_level: float = 0.95, plot: bool = True) → dict
Calculates a p-score for our null hypothesis h0, that we have missed our recall target recall_target, after every batch_size screened documents.
- Parameters:
labels_ (list|np.array|pd.Series) – An ordered sequence of 1s and 0s representing relevant and irrelevant documents respectively, in the order in which they were screened.
N (int) – The total number of documents from which you want to find the relevant examples. The size of the haystack.
recall_target (float) – The proportion of truly relevant documents you want to find, defaults to 0.95
bias (float) – The assumed ratio of the likelihood of drawing a random relevant document to the likelihood of drawing a random irrelevant document. The higher this is, the better our ML prioritisation has worked. When this differs from 1, we calculate the p-score using biased urns.
batch_size (int) – The size of the batches for which we calculate our stopping criterion. Smaller batches give finer granularity but take more computation time.
confidence_level (float) – The p-score is calculated batch by batch until p is smaller than 1 - confidence_level. Defaults to 0.95.
plot (bool) – Whether to plot the results.
- Returns:
A dictionary with a list of batch sizes under the key batch_sizes, alongside the corresponding list of p-scores under the key p.
- Return type:
dict
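The batching logic can be sketched independently of how each p-score is computed. In the hypothetical retrospective_sketch below, score is any callable mapping the labels screened so far and N to a p-score (for example a wrapper around buscarpy.calculate_h0, bearing in mind that it may return None). Checkpoints fall every batch_size documents, the final partial batch is also scored (an assumption), and the loop stops at the first checkpoint where p < 1 - confidence_level. Reading the documented batch_sizes key as the number of documents screened at each checkpoint is likewise an assumption.

```python
from typing import Callable, Dict, Sequence


def retrospective_sketch(
    labels: Sequence[int],
    N: int,
    score: Callable[[Sequence[int], int], float],
    batch_size: int = 1000,
    confidence_level: float = 0.95,
) -> Dict[str, list]:
    """Hypothetical sketch: score h0 at each batch checkpoint until it is rejected."""
    result: Dict[str, list] = {"batch_sizes": [], "p": []}
    n_total = len(labels)
    if n_total == 0:
        return result
    checkpoints = list(range(batch_size, n_total + 1, batch_size))
    if not checkpoints or checkpoints[-1] != n_total:
        checkpoints.append(n_total)  # assumption: also score the final partial batch
    for n in checkpoints:
        p = score(labels[:n], N)  # p-score using only the first n screened documents
        result["batch_sizes"].append(n)
        result["p"].append(p)
        if p < 1 - confidence_level:
            break  # h0 rejected: screening could have stopped at this checkpoint
    return result
```

Keeping the scorer pluggable separates the retrospective question (at which batch would we have stopped?) from the choice of statistical test, so the same loop works for biased and unbiased urns alike.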