Eval

This module contains classes and functions for optimising and evaluating the performance of the FACSIMILE model.

eval

Classes:

  • FACSIMILEOptimiser

    Optimise the alpha values for each target.

Functions:

  • calculate_score

    Calculate the score accounting for the number of included items and minimum r2.

  • evaluate_facsimile

    Evaluate the item reduction model for a given set of alphas.

FACSIMILEOptimiser

FACSIMILEOptimiser(n_iter: int = 100, fit_intercept: bool = True, n_jobs: int = 1, seed: int = 42, alpha_dist_scaling: float = 1, additional_metrics: Optional[Dict[str, callable]] = None)

The procedure estimates a "score" for each set of alpha values which balances accuracy (R^2) with parsimony (number of items included). The score is defined as the minimum R^2 value across the target variables, multiplied by 1 minus the number of included items divided by the total number of items. This ensures that it selects a model with a good fit, but also with a small number of items.
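
For example (illustrative values): if the minimum R^2 across targets is 0.6 and 10 of 50 items are included, the score is 0.6 * (1 - 10/50) = 0.6 * 0.8 = 0.48.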

It also returns R^2 values for each target, the minimum R^2 value, the number of included items, and the alpha values for each target.

Parameters:

  • n_iter

    (int, default: 100 ) –

    Number of iterations to run. Defaults to 100.

  • fit_intercept

    (bool, default: True ) –

    Whether to fit an intercept. Defaults to True.

  • n_jobs

    (int, default: 1 ) –

    Number of jobs to run in parallel. Defaults to 1.

  • seed

    (int, default: 42 ) –

    Random seed. Defaults to 42.

  • alpha_dist_scaling

    (float, default: 1 ) –

    Scaling factor for the distribution of alpha (regularisation parameter) values. By default, alpha values are sampled from a beta distribution that is skewed towards zero. This parameter allows this distribution to be scaled, which may be more appropriate for certain datasets. Defaults to 1.

  • additional_metrics

    (Optional[Dict[str, callable]], default: None ) –

    Dictionary of additional metrics to calculate, in addition to the penalised score and R^2. These should be supplied as functions that take the true and predicted values as arguments and return a single value. Defaults to None.

Methods:

Source code in facsimile/eval.py
def __init__(
    self,
    n_iter: int = 100,
    fit_intercept: bool = True,
    n_jobs: int = 1,
    seed: int = 42,
    alpha_dist_scaling: float = 1,
    additional_metrics: Optional[Dict[str, callable]] = None,
) -> None:
    """
    Optimise the alpha values for each target.

    The procedure estimates a "score" for each set of alpha values which
    balances accuracy (R^2) with parsimony (number of items included). The
    score is defined as the minimum R^2 value across the target variables,
    multiplied by 1 minus the number of included items divided by the total
    number of items. This ensures that it selects a model with a good fit,
    but also with a small number of items.

    It also returns R^2 values for each target, the minimum R^2 value, the
    number of included items, and the alpha values for each target.

    Args:
        n_iter (int, optional): Number of iterations to run. Defaults to
            `100`.
        fit_intercept (bool, optional): Whether to fit an intercept.
            Defaults to `True`.
        n_jobs (int, optional): Number of jobs to run in parallel. Defaults
            to `1`.
        seed (int, optional): Random seed. Defaults to `42`.
        alpha_dist_scaling (float, optional): Scaling factor for the
            distribution of alpha (regularisation parameter) values. By
            default, alpha values are sampled from a beta distribution that
            is skewed towards zero. This parameter allows this distribution
            to be scaled, which may be more appropriate for certain
            datasets. Defaults to `1`.
        additional_metrics (Optional[Dict[str, callable]], optional):
            Dictionary of additional metrics to calculate, in addition to
            the penalised score and R^2. These should be supplied as
            functions that take the true and predicted values as arguments
            and return a single value. Defaults to `None`.
    """

    self.n_iter = n_iter
    self.fit_intercept = fit_intercept
    self.n_jobs = n_jobs
    self.seed = seed
    self.alpha_dist_scaling = alpha_dist_scaling

    # Check additional metrics are callable
    if additional_metrics is not None:
        for metric in additional_metrics.values():
            assert callable(
                metric
            ), "Additional metrics must be callable."

    self.additional_metrics = additional_metrics

fit

fit(X_train: Union[DataFrame, ArrayLike], y_train: Union[DataFrame, ArrayLike], X_val: Union[DataFrame, ArrayLike], y_val: Union[DataFrame, ArrayLike], target_names: Tuple[str] = None, progress_bar: bool = True) -> None

Optimise the alpha values for each target.

The results of the procedure are stored in the results_ attribute as a dataframe. Columns are: Run number, R^2 for each target, minimum R^2, score, number of included items, alpha values for each target.

If other metrics are provided, these are also stored in the dataframe. The minimum value for each metric across the Y variables is also stored.

Parameters:

  • X_train
    (Union[DataFrame, ArrayLike]) –

    Item responses for training.

  • y_train
    (Union[DataFrame, ArrayLike]) –

    Target scores for training.

  • X_val
    (Union[DataFrame, ArrayLike]) –

    Item responses for validation.

  • y_val
    (Union[DataFrame, ArrayLike]) –

    Target scores for validation.

  • target_names
    (Tuple[str], default: None ) –

    Names of the target variables. Defaults to None.

  • progress_bar
    (bool, default: True ) –

    Whether to show a progress bar when fitting. Defaults to True.

Source code in facsimile/eval.py
def fit(
    self,
    X_train: Union[pd.DataFrame, ArrayLike],
    y_train: Union[pd.DataFrame, ArrayLike],
    X_val: Union[pd.DataFrame, ArrayLike],
    y_val: Union[pd.DataFrame, ArrayLike],
    target_names: Tuple[str] = None,
    progress_bar: bool = True,
) -> None:
    """
    Optimise the alpha values for each target.

    The results of the procedure are stored in the `results_` attribute as
    a dataframe. Columns are: Run number, R^2 for each target, minimum R^2,
    score, number of included items, alpha values for each target.

    If other metrics are provided, these are also stored in the dataframe.
    The minimum value for each metric across the Y variables is also
    stored.

    Args:
        X_train (Union[pd.DataFrame, ArrayLike]): Item responses for
            training.
        y_train (Union[pd.DataFrame, ArrayLike]): Target scores for
            training.
        X_val (Union[pd.DataFrame, ArrayLike]): Item responses for
            validation.
        y_val (Union[pd.DataFrame, ArrayLike]): Target scores for
            validation.
        target_names (Tuple[str], optional): Names of the target variables.
            Defaults to `None`.
        progress_bar (bool, optional): Whether to show a progress bar
            when fitting. Defaults to `True`.
    """

    # Set up RNG
    rng = np.random.default_rng(self.seed)

    # Get number of targets
    n_targets = y_train.shape[1]

    # Check target names are correct length
    if target_names is not None:
        assert (
            len(target_names) == n_targets
        ), "Number of target names must equal number of targets"
    else:
        target_names = [
            "Variable {}".format(i + 1) for i in range(n_targets)
        ]

    # Set up alphas
    alphas = (
        rng.beta(1, 3, size=(self.n_iter, n_targets))
        * self.alpha_dist_scaling
    )

    # Use partial to set up the function with the data
    evaluate_facsimile_with_data = partial(
        evaluate_facsimile,
        X_train,
        y_train,
        X_val,
        y_val,
        fit_intercept=self.fit_intercept,
        additional_metrics=self.additional_metrics,
    )

    if self.n_jobs == 1:
        results = []
        if progress_bar:
            for i in tqdm(alphas, desc="Evaluation"):
                results.append(evaluate_facsimile_with_data(i))
        else:
            for i in alphas:
                results.append(evaluate_facsimile_with_data(i))
    else:
        if progress_bar:
            with tqdm_joblib(
                tqdm(desc="Evaluation", total=self.n_iter)
            ):
                results = Parallel(n_jobs=self.n_jobs)(
                    delayed(evaluate_facsimile_with_data)(i)
                    for i in alphas
                )
        else:
            results = Parallel(n_jobs=self.n_jobs)(
                delayed(evaluate_facsimile_with_data)(i)
                for i in alphas
            )

    # Extract results
    scores = np.array([i[0]["score"] for i in results])
    r2s = np.stack([i[0]["r2"] for i in results])
    n_items = np.stack([i[1] for i in results])

    # Store results in a dataframe
    output_df = {
        "run": range(self.n_iter),
    }

    # Get additional metrics
    if self.additional_metrics is not None:
        for metric_name in self.additional_metrics.keys():
            metrics = np.stack([i[0][metric_name] for i in results])
            for i in range(n_targets):
                output_df[
                    metric_name + "_" + target_names[i]
                ] = metrics[:, i]

    # Add R2s for each target
    for i in range(n_targets):
        output_df["r2_" + target_names[i]] = r2s[:, i]

    # Add minimum R2
    output_df["min_r2"] = r2s.min(axis=1)

    # Add maximum R2
    output_df["max_r2"] = r2s.max(axis=1)

    # Add minimum and maximum for other metrics
    if self.additional_metrics is not None:
        for metric_name in self.additional_metrics.keys():
            output_df["min_" + metric_name] = np.min(
                np.stack([i[0][metric_name] for i in results]), axis=1
            )
            output_df["max_" + metric_name] = np.max(
                np.stack([i[0][metric_name] for i in results]), axis=1
            )

    # Add scores
    output_df["scores"] = scores

    # Add number of items
    output_df["n_items"] = n_items

    # Add alpha values
    for i in range(n_targets):
        output_df["alpha_" + target_names[i]] = alphas[:, i]

    output_df = pd.DataFrame(output_df)

    self.results_ = output_df
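
A minimal usage sketch (the import path, synthetic data, and 400/100 train/validation split are illustrative assumptions, not part of the library):

import numpy as np
import pandas as pd

from facsimile.eval import FACSIMILEOptimiser  # assumed import path

# Illustrative synthetic data: 500 respondents, 30 items, 2 target scores
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(500, 30)),
    columns=[f"item_{i}" for i in range(30)],
)
y = pd.DataFrame(
    {
        "Anxiety": X.iloc[:, :10].sum(axis=1),
        "Depression": X.iloc[:, 10:20].sum(axis=1),
    }
)

opt = FACSIMILEOptimiser(n_iter=50, seed=42)
opt.fit(
    X.iloc[:400], y.iloc[:400],  # training split
    X.iloc[400:], y.iloc[400:],  # validation split
    target_names=("Anxiety", "Depression"),
)
print(opt.results_.head())  # one row per sampled set of alpha values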

get_best_classifier

get_best_classifier(metric: str = 'scores', highest_best: bool = True) -> FACSIMILE

Get the best classifier based on the optimisation results, i.e. the classifier with the highest score (balancing R^2 against number of included items).

Parameters:

  • metric
    (str, default: 'scores' ) –

    Metric to use to select the best classifier. Defaults to 'scores'.

  • highest_best
    (bool, default: True ) –

    Whether higher values of the metric are better. Defaults to True.

Returns:

  • FACSIMILE ( FACSIMILE ) –

    Best classifier.

Source code in facsimile/eval.py
def get_best_classifier(
    self, metric: str = "scores", highest_best: bool = True
) -> FACSIMILE:
    """
    Get the best classifier based on the optimisation results, i.e. the
    classifier with the highest score (balancing R^2 against number of
    included items).

    Args:
        metric (str, optional): Metric to use to select the best
            classifier. Defaults to `'scores'`.
        highest_best (bool, optional): Whether higher values of the metric
            are better. Defaults to `True`.

    Returns:
        FACSIMILE: Best classifier.
    """

    if not hasattr(self, "results_"):
        raise ValueError(
            "Optimisation results not available. Please run fit() first."
        )

    if metric not in self.results_.columns:
        raise ValueError(
            "Metric not available. Please choose from: {}".format(
                ", ".join(self.results_.columns)
            )
        )

    # Get index of best classifier
    if highest_best:
        best_idx = self.results_[metric].argmax()
    else:
        best_idx = self.results_[metric].argmin()

    # Print out information about this classifier
    print("Best classifier:")
    print(
        r"Minimum R^2: {value}".format(
            value=self.results_.iloc[best_idx]["min_r2"]
        )
    )
    print(
        r"Number of included items: {value}".format(
            value=self.results_.iloc[best_idx]["n_items"]
        )
    )

    # Get alpha values for best classifier
    best_alphas = self.results_.iloc[best_idx][
        [i for i in self.results_.columns if i.startswith("alpha")]
    ].values

    # Set up model
    clf = FACSIMILE(alphas=best_alphas)

    return clf
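
Continuing the sketch above, the returned FACSIMILE model is configured with the best alpha values but has not yet been fitted, so it must be fitted before prediction (variable names are illustrative):

best = opt.get_best_classifier()
best.fit(X.iloc[:400], y.iloc[:400])  # refit with the selected alpha values
preds = best.predict(X.iloc[400:])    # predictions for the validation split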

get_best_classifier_max_items

get_best_classifier_max_items(max_items: int = 100, metric: str = 'scores', highest_best: bool = True) -> FACSIMILE

Get the best classifier based on the optimisation results, subject to a maximum number of items being included. For example, if max_items == 100, the best classifier with 100 or fewer items will be returned.

Parameters:

  • max_items
    (int, default: 100 ) –

    Maximum number of items. Defaults to 100.

  • metric
    (str, default: 'scores' ) –

    Metric to use to select the best classifier. Defaults to 'scores'.

  • highest_best
    (bool, default: True ) –

    Whether higher values of the metric are better. Defaults to True.

Returns:

  • FACSIMILE ( FACSIMILE ) –

    Best classifier.

Source code in facsimile/eval.py
def get_best_classifier_max_items(
    self,
    max_items: int = 100,
    metric: str = "scores",
    highest_best: bool = True,
) -> FACSIMILE:
    """
    Get the best classifier based on the optimisation results, subject to a
    maximum number of items being included. For example, if `max_items ==
    100`, the best classifier with `100` or fewer items will be returned.

    Args:
        max_items (int, optional): Maximum number of items. Defaults to
            `100`.
        metric (str, optional): Metric to use to select the best
            classifier. Defaults to `'scores'`.
        highest_best (bool, optional): Whether higher values of the metric
            are better. Defaults to `True`.

    Returns:
        FACSIMILE: Best classifier.
    """

    if not hasattr(self, "results_"):
        raise ValueError(
            "Optimisation results not available. Please run fit() first."
        )

    if metric not in self.results_.columns:
        raise ValueError(
            "Metric not available. Please choose from: {}".format(
                ", ".join(self.results_.columns)
            )
        )

    # Get index of best classifier among runs with at most `max_items` items
    results_subset = self.results_[
        self.results_["n_items"] <= max_items
    ]
    if highest_best:
        best_idx = results_subset[metric].argmax()
    else:
        best_idx = results_subset[metric].argmin()

    # Get alpha values for best classifier
    best_alphas = results_subset.iloc[best_idx][
        [i for i in results_subset.columns if i.startswith("alpha")]
    ].values

    # Print out information about this classifier
    print("Best classifier:")
    print(
        r"Minimum R^2: {value}".format(
            value=results_subset.iloc[best_idx]["min_r2"]
        )
    )
    print(
        r"Number of included items: {value}".format(
            value=results_subset.iloc[best_idx]["n_items"]
        )
    )

    # Set up model
    clf = FACSIMILE(alphas=best_alphas)

    return clf

get_best_classifier_n_items

get_best_classifier_n_items(n_items: int = 100, metric: str = 'scores', highest_best: bool = True) -> FACSIMILE

Get the best classifier based on the optimisation results with a specific number of items. For example, if n_items == 100, the best classifier with exactly 100 items will be returned.

NOTE: The optimisation procedure is stochastic, so it is possible that there may not be a classifier with exactly the number of items specified. In this case, an error will be raised.

Parameters:

  • n_items
    (int, default: 100 ) –

    Number of items. Defaults to 100.

  • metric
    (str, default: 'scores' ) –

    Metric to use to select the best classifier. Defaults to 'scores'.

  • highest_best
    (bool, default: True ) –

    Whether higher values of the metric are better. Defaults to True.

Returns:

  • FACSIMILE ( FACSIMILE ) –

    Best classifier.

Source code in facsimile/eval.py
def get_best_classifier_n_items(
    self,
    n_items: int = 100,
    metric: str = "scores",
    highest_best: bool = True,
) -> FACSIMILE:
    """
    Get the best classifier based on the optimisation results with a
    specific number of items. For example, if `n_items == 100`, the best
    classifier with exactly `100` items will be returned.

    > **NOTE**: The optimisation procedure is stochastic, so it is possible
    that there may not be a classifier with exactly the number of items
    specified. In this case, an error will be raised.

    Args:
        n_items (int, optional): Number of items. Defaults to `100`.
        metric (str, optional): Metric to use to select the best
            classifier. Defaults to `'scores'`.
        highest_best (bool, optional): Whether higher values of the metric
            are better. Defaults to `True`.

    Returns:
        FACSIMILE: Best classifier.
    """

    if not hasattr(self, "results_"):
        raise ValueError(
            "Optimisation results not available. Please run fit() first."
        )

    if metric not in self.results_.columns:
        raise ValueError(
            "Metric not available. Please choose from: {}".format(
                ", ".join(self.results_.columns)
            )
        )

    if n_items not in self.results_["n_items"].values:
        # Get the closest number of items
        closest_n_items = self.results_["n_items"].values[
            np.argmin(
                np.abs(self.results_["n_items"].values - n_items)
            )
        ]
        raise ValueError(
            f"No classifier with exactly {n_items} items. Closest "
            f"number of items is {closest_n_items}."
        )

    # Get index of best classifier among runs with exactly `n_items` items
    results_subset = self.results_[self.results_["n_items"] == n_items]
    if highest_best:
        best_idx = results_subset[metric].argmax()
    else:
        best_idx = results_subset[metric].argmin()

    # Get alpha values for best classifier
    best_alphas = results_subset.iloc[best_idx][
        [i for i in results_subset.columns if i.startswith("alpha")]
    ].values

    # Set up model
    clf = FACSIMILE(alphas=best_alphas)

    return clf
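
For example (illustrative item counts), selection can be restricted to runs that used at most, or exactly, 30 items:

clf_small = opt.get_best_classifier_max_items(max_items=30)

# Raises a ValueError if no sampled run produced exactly 30 items
clf_exact = opt.get_best_classifier_n_items(n_items=30)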

get_classifier_by_metric

get_classifier_by_metric(metric_threshold: float, metric: str = 'min_r2', n_items_metric: str = 'n_items', highest_best: bool = True) -> FACSIMILE

Get the classifier with the lowest number of items, subject to a threshold on a specified metric. Allows flexibility in whether the highest or lowest value of the metric is considered better.

Parameters:

  • metric_threshold
    (float) –

    Threshold for the provided metric: the minimum acceptable value if highest_best is True, or the maximum acceptable value otherwise.

  • metric
    (str, default: 'min_r2' ) –

    Metric to filter classifiers. Defaults to 'min_r2'.

  • n_items_metric
    (str, default: 'n_items' ) –

    Metric to determine the lowest number of items. Defaults to 'n_items'.

  • highest_best
    (bool, default: True ) –

    Whether higher values of the metric are better. Defaults to True.

Returns:

  • FACSIMILE ( FACSIMILE ) –

    Classifier with the lowest number of items that satisfies the metric threshold.

Source code in facsimile/eval.py
def get_classifier_by_metric(
    self,
    metric_threshold: float,
    metric: str = "min_r2",
    n_items_metric: str = "n_items",
    highest_best: bool = True,
) -> FACSIMILE:
    """
    Get the classifier with the lowest number of items, subject to a threshold
    on a specified metric. Allows flexibility in whether the highest or lowest
    value of the metric is considered better.

    Args:
        metric_threshold (float): Threshold for the provided metric: the
            minimum acceptable value if `highest_best` is True, or the
            maximum acceptable value otherwise.
        metric (str, optional): Metric to filter classifiers. Defaults to 'min_r2'.
        n_items_metric (str, optional): Metric to determine the lowest number of items.
            Defaults to 'n_items'.
        highest_best (bool, optional): Whether higher values of the metric are better.
            Defaults to `True`.

    Returns:
        FACSIMILE: Classifier with the lowest number of items that satisfies the metric threshold.
    """

    if not hasattr(self, "results_"):
        raise ValueError(
            "Optimisation results not available. Please run fit() first."
        )

    if (
        metric not in self.results_.columns
        or n_items_metric not in self.results_.columns
    ):
        raise ValueError(
            "Metric not available. Please choose from: {}".format(
                ", ".join(self.results_.columns)
            )
        )

    # Filter based on the given metric threshold
    if highest_best:
        results_subset = self.results_[
            self.results_[metric] >= metric_threshold
        ]
    else:
        results_subset = self.results_[
            self.results_[metric] <= metric_threshold
        ]

    # Check if the filtered subset is empty
    if results_subset.empty:
        raise ValueError(
            f"No classifiers meet the {metric} threshold of {metric_threshold}."
        )

    # Get index of classifier with the lowest n_items (or other n_items_metric)
    best_idx = results_subset[n_items_metric].argmin()

    # Get alpha values for the selected classifier
    best_alphas = results_subset.iloc[best_idx][
        [i for i in results_subset.columns if i.startswith("alpha")]
    ].values

    # Print out information about this classifier
    print(
        "Classifier with the lowest {n_metric}:".format(
            n_metric=n_items_metric
        )
    )
    print(
        r"{metric}: {value}".format(
            metric=metric, value=results_subset.iloc[best_idx][metric]
        )
    )
    print(
        r"Number of included items: {value}".format(
            value=results_subset.iloc[best_idx][n_items_metric]
        )
    )

    # Set up model
    clf = FACSIMILE(alphas=best_alphas)

    return clf
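
For example (illustrative threshold), this selects the run with the fewest items whose minimum R^2 across targets is still at least 0.8:

clf = opt.get_classifier_by_metric(metric_threshold=0.8, metric="min_r2")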

plot_results

plot_results(degree: Optional[int] = 3, figsize: Tuple[int, int] = (10, 6), cmap: Optional[str] = None, remove_duplicates: Optional[bool] = False, show_legend: Optional[bool] = True, scatter_kws: Optional[Dict] = None, line_kws: Optional[Dict] = None, figure_kws: Optional[Dict] = None, ax: Optional[Axes] = None) -> None

Plots the results of the optimisation procedure, showing the R^2 values for each target variable as a function of the number of items included.

Parameters:

  • degree
    (Optional[int], default: 3 ) –

    The degree of the polynomial for regression fitting. If None, no line is fitted or plotted. Defaults to 3.

  • figsize
    (Tuple[int, int], default: (10, 6) ) –

    The size of the figure to be plotted. Defaults to (10,6).

  • cmap
    (Optional[str], default: None ) –

    The name of a colormap to generate colors for scatter points and lines. If None, uses the Matplotlib default color cycle. Defaults to None.

  • remove_duplicates
    (Optional[bool], default: False ) –

    Whether to remove duplicate values of the number of items. Defaults to False.

  • show_legend
    (Optional[bool], default: True ) –

    Whether to show the legend. Defaults to True.

  • scatter_kws
    (Optional[Dict], default: None ) –

    Additional keyword arguments for plt.scatter. Defaults to None.

  • line_kws
    (Optional[Dict], default: None ) –

    Additional keyword arguments for plt.plot. Defaults to None.

  • figure_kws
    (Optional[Dict], default: None ) –

    Additional keyword arguments for plt.figure. Defaults to None.

  • ax
    (Optional[Axes], default: None ) –

    An optional axis object to plot on. If None, a new figure and axis will be created. Defaults to None.

Returns:

  • None ( None ) –

    Displays the plot.

Source code in facsimile/eval.py
def plot_results(
    self,
    degree: Optional[int] = 3,
    figsize: Tuple[int, int] = (10, 6),
    cmap: Optional[str] = None,
    remove_duplicates: Optional[bool] = False,
    show_legend: Optional[bool] = True,
    scatter_kws: Optional[Dict] = None,
    line_kws: Optional[Dict] = None,
    figure_kws: Optional[Dict] = None,
    ax: Optional[plt.Axes] = None,
) -> None:
    """
    Plots the results of the optimisation procedure, showing the R^2 values
    for each target variable as a function of the number of items included.

    Args:
        degree (Optional[int], optional): The degree of the polynomial for
            regression fitting. If `None`, no line is fitted or plotted.
            Defaults to `3`.
        figsize (Tuple[int, int], optional): The size of the figure
            to be plotted. Defaults to `(10,6)`.
        cmap (Optional[str], optional): The name of a colormap to generate
            colors for scatter points and lines. If `None`, uses the
            Matplotlib default color cycle. Defaults to `None`.
        remove_duplicates (Optional[bool], optional): Whether to remove
            duplicate values of the number of items. Defaults to `False`.
        show_legend (Optional[bool], optional): Whether to show the legend.
            Defaults to `True`.
        scatter_kws (Optional[Dict], optional): Additional keyword
            arguments for `plt.scatter`. Defaults to `None`.
        line_kws (Optional[Dict], optional): Additional keyword arguments
            for `plt.plot`. Defaults to `None`.
        figure_kws (Optional[Dict], optional): Additional keyword arguments
            for `plt.figure`. Defaults to `None`.
        ax (Optional[plt.Axes], optional): An optional axis object to plot on.
            If None, a new figure and axis will be created. Defaults to `None`.

    Returns:
        None: Displays the plot.
    """
    df = self.results_.copy()

    # Remove duplicates if specified
    if remove_duplicates:
        df = df.drop_duplicates(subset="n_items")

    # Set up default keyword arguments
    scatter_kws = (
        {"alpha": 0.6} if scatter_kws is None else scatter_kws
    )
    line_kws = {} if line_kws is None else line_kws
    figure_kws = {} if figure_kws is None else figure_kws

    # Create a new figure if no axis is provided
    created_fig = ax is None
    if created_fig:
        fig, ax = plt.subplots(figsize=figsize, **figure_kws)

    # Inferring Y variables from DataFrame columns
    y_vars = [col for col in df.columns if col.startswith("r2_")]

    # Getting the colormap if provided
    if cmap:
        colors = cm.get_cmap(cmap, len(y_vars))

    for i, y_var in enumerate(y_vars):
        color = colors(i) if cmap else None
        # Scatter plot for each Y variable
        ax.scatter(
            df["n_items"],
            df[y_var],
            label=f'{y_var.split("r2_")[1]}',
            color=color,
            **scatter_kws,
        )

        if degree is not None:
            # Fit the model
            p = Polynomial.fit(df["n_items"], df[y_var], degree)

            # Plot the regression line for each Y variable
            x = np.linspace(
                df["n_items"].min(), df["n_items"].max(), 400
            )
            y = p(x)
            ax.plot(x, y, linewidth=2, color=color, **line_kws)

    # Labeling the plot
    ax.set_xlabel("Number of items")
    ax.set_ylabel(r"$R^2$")
    if show_legend:
        legend = ax.legend()

        # Update alpha for legend handles
        for lh in legend.legend_handles:
            lh.set_alpha(1)  # Set alpha to 1

    plt.tight_layout()
    if created_fig:
        # Only show figures created inside this method, not user-supplied axes
        plt.show()
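
For example (illustrative settings), plotting onto a user-supplied axis with a quadratic trend line per target:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 5))
opt.plot_results(degree=2, cmap="viridis", ax=ax)
plt.show()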

calculate_score

calculate_score(r2: Union[ndarray, list], n_included_items: int, n_features: int) -> float

Calculate the score accounting for the number of included items and minimum r2.

Parameters:

  • r2

    (Union[ndarray, list]) –

    Array or list of r2 values.

  • n_included_items

    (int) –

    Number of included items in the classifier.

  • n_features

    (int) –

    Number of features in the training data.

Returns:

  • float ( float ) –

    Calculated score.

Source code in facsimile/eval.py
def calculate_score(
    r2: Union[np.ndarray, list], n_included_items: int, n_features: int
) -> float:
    """
    Calculate the score accounting for the number of included items and minimum
    r2.

    Args:
        r2 (Union[np.ndarray, list]): Array or list of r2 values.
        n_included_items (int): Number of included items in the classifier.
        n_features (int): Number of features in the training data.

    Returns:
        float: Calculated score.
    """
    r2_array = np.array(r2)
    score = np.min(r2_array) * (1 - n_included_items / n_features)
    return score
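
For example (illustrative values), two targets with R^2 of 0.85 and 0.70 and 20 items retained out of 100 give:

score = calculate_score([0.85, 0.70], n_included_items=20, n_features=100)
# min(0.85, 0.70) * (1 - 20 / 100) = 0.70 * 0.8 = 0.56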

evaluate_facsimile

evaluate_facsimile(X_train: Union[DataFrame, ArrayLike], y_train: Union[DataFrame, ArrayLike], X_val: Union[DataFrame, ArrayLike], y_val: Union[DataFrame, ArrayLike], alphas: Tuple[float], fit_intercept: bool = True, additional_metrics: Optional[Dict[str, callable]] = None) -> Tuple[Dict[str, Union[float, ndarray]], int]

Evaluate the item reduction model for a given set of alphas.

The overall score is defined as the minimum R^2 value across the target variables, multiplied by 1 minus the number of included items divided by the total number of items. This ensures that it selects a model with a good fit, but also with a small number of items.

Parameters:

  • X_train

    (Union[DataFrame, ArrayLike]) –

    Item responses for training.

  • y_train

    (Union[DataFrame, ArrayLike]) –

    Target scores for training.

  • X_val

    (Union[DataFrame, ArrayLike]) –

    Item responses for validation.

  • y_val

    (Union[DataFrame, ArrayLike]) –

    Target scores for validation.

  • alphas

    (Tuple[float]) –

    Alpha values for the targets.

  • fit_intercept

    (bool, default: True ) –

    Whether to fit an intercept. Defaults to True.

  • additional_metrics

    (Optional[Dict[str, callable]], default: None ) –

    Dictionary of additional metrics to calculate. These should be supplied as functions that take the true and predicted values as arguments and return a single value. Defaults to None.

Returns:

  • Tuple[Dict[str, Union[float, ndarray]], int]

    Tuple containing a dictionary of metrics (the score, per-target R^2 values, and any additional metrics) and the number of included items.

Source code in facsimile/eval.py
def evaluate_facsimile(
    X_train: Union[pd.DataFrame, ArrayLike],
    y_train: Union[pd.DataFrame, ArrayLike],
    X_val: Union[pd.DataFrame, ArrayLike],
    y_val: Union[pd.DataFrame, ArrayLike],
    alphas: Tuple[float],
    fit_intercept: bool = True,
    additional_metrics: Optional[Dict[str, callable]] = None,
) -> Tuple[Dict[str, Union[float, np.ndarray]], int]:
    """
    Evaluate the item reduction model for a given set of alphas.

    The overall score is defined as the minimum R^2 value across the target
    variables, multiplied by 1 minus the number of included items divided by
    the total number of items. This ensures that it selects a model with a good
    fit, but also with a small number of items.

    Args:
        X_train (Union[pd.DataFrame, ArrayLike]): Item responses for
            training.
        y_train (Union[pd.DataFrame, ArrayLike]): Target scores for
            training.
        X_val (Union[pd.DataFrame, ArrayLike]): Item responses for
            validation.
        y_val (Union[pd.DataFrame, ArrayLike]): Target scores for
            validation.
        alphas (Tuple[float]): Alpha values for the targets.
        fit_intercept (bool, optional): Whether to fit an intercept.
            Defaults to `True`.
        additional_metrics (Optional[Dict[str, callable]], optional):
            Dictionary of additional metrics to calculate. These should be
            supplied as functions that take the true and predicted values as
            arguments and return a single value. Defaults to `None`.

    Returns:
        Tuple[Dict[str, Union[float, np.ndarray]], int]: Tuple containing a
            dictionary of metrics (the score, per-target R^2 values, and any
            additional metrics) and the number of included items.
    """

    # Set up model
    clf = FACSIMILE(alphas=alphas, fit_intercept=fit_intercept)

    # Dictionary to store metrics
    metrics = {}

    # Fit and predict
    try:
        clf.fit(X_train, y_train)

        pred_val = clf.predict(X_val)

        # Get R2 for each variable
        r2 = r2_score(y_val, pred_val, multioutput="raw_values")

        # Add r2 to metrics
        metrics["r2"] = r2

        # Get other metrics
        if additional_metrics is not None:
            for metric_name, metric_func in additional_metrics.items():
                metric_value = metric_func(y_val, pred_val)
                metrics[metric_name] = metric_value

        # Store number included items
        n_items = clf.n_included_items

        # Get score accounting for n_included_items and minimum r2
        score = calculate_score(r2, clf.n_included_items, X_train.shape[1])

        # Add score to metrics
        metrics["score"] = score

    except Exception as e:
        print("WARNING: Fitting failed. Error:")
        print(e)
        n_items = np.nan
        # Return NaN for every expected metric so downstream aggregation
        # still finds all columns
        metric_names = ["score", "r2"]
        if additional_metrics is not None:
            metric_names += list(additional_metrics.keys())
        metrics = {k: np.nan for k in metric_names}

    return metrics, n_items
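
A minimal standalone sketch (this function is normally called internally by FACSIMILEOptimiser.fit; the alpha values are illustrative and the synthetic X and y from the earlier sketch are reused):

metrics, n_items = evaluate_facsimile(
    X.iloc[:400], y.iloc[:400],
    X.iloc[400:], y.iloc[400:],
    alphas=(0.05, 0.10),  # one alpha per target, illustrative values
)
print(metrics["score"], metrics["r2"], n_items)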