API reference

discophon.benchmark

Run the DiscoPhon benchmark on your predictions.

Compute the scores for all languages and splits for which units or features have been extracted.

benchmark_discovery

benchmark_discovery(
    path_dataset: str | Path,
    path_units: str | Path,
    *,
    kind: Literal["many-to-one", "one-to-one"],
    step_units: int = STEP_UNITS,
) -> DataFrame

Benchmark phoneme discovery. Evaluate all languages and splits with available units.

The units should be saved in the directory path_units, in JSONL files named units-{code}-{split}.jsonl with keys file (str) and units (list[int]).

Parameters:

  • path_dataset (str | Path) –

    Path to the DiscoPhon dataset

  • path_units (str | Path) –

    Path to the directory with the predicted units

  • kind (Literal['many-to-one', 'one-to-one']) –

    Kind of assignment. If it is many-to-one, the number of units is set to the default (DEFAULT_N_UNITS). Otherwise, it is set to the number of phonemes plus one.

  • step_units (int, default: STEP_UNITS ) –

    Step between consecutive units (in ms).

Returns:

  • DataFrame

    DataFrame with the results
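
As a rough sketch of the expected input layout, the snippet below writes one units file using only the standard library. The helper name `write_units`, the language code, the split, and the file identifiers are made-up placeholders; only the file naming pattern and the two keys come from the description above.

```python
import json
from pathlib import Path


def write_units(path_units: Path, code: str, split: str, units: dict[str, list[int]]) -> Path:
    """Write units-{code}-{split}.jsonl with one JSON object per utterance (hypothetical helper)."""
    path_units.mkdir(parents=True, exist_ok=True)
    target = path_units / f"units-{code}-{split}.jsonl"
    with target.open("w", encoding="utf-8") as f:
        for file_id, file_units in units.items():
            # Each line holds exactly the two documented keys: file and units
            f.write(json.dumps({"file": file_id, "units": file_units}) + "\n")
    return target
```

The returned path can then sit in the directory passed as path_units.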

benchmark_abx_discrete

benchmark_abx_discrete(
    path_dataset: str | Path,
    path_units: str | Path,
    *,
    kind: Literal["triphone", "phoneme"] = "triphone",
    step_units: int = STEP_UNITS,
) -> DataFrame

ABX on all discrete units available.

The units should be saved in the directory path_units, in JSONL files named units-{code}-{split}.jsonl with keys file (str) and units (list[int]).

Parameters:

  • path_dataset (str | Path) –

    Path to the DiscoPhon dataset

  • path_units (str | Path) –

    Path to the directory with the predicted units

  • kind (Literal['triphone', 'phoneme'], default: 'triphone' ) –

    Kind of representations to use for ABX computation.

  • step_units (int, default: STEP_UNITS ) –

    Step between consecutive units (in ms).

Returns:

  • DataFrame

    DataFrame with the results

benchmark_abx_continuous

benchmark_abx_continuous(
    path_dataset: str | Path,
    path_features: str | Path,
    *,
    kind: Literal["triphone", "phoneme"] = "triphone",
    step_units: int = STEP_UNITS,
) -> DataFrame

ABX on all continuous features available.

The features should be saved in the directory path_features, in subfolders path_features/{code}/{split}.

Parameters:

  • path_dataset (str | Path) –

    Path to the DiscoPhon dataset

  • path_features (str | Path) –

    Path to the directory with the extracted features

  • kind (Literal['triphone', 'phoneme'], default: 'triphone' ) –

    Kind of representations to use for ABX computation.

  • step_units (int, default: STEP_UNITS ) –

    Step between consecutive features (in ms). The feature frequency (in Hz) will be set to 1_000 // step_units.

Returns:

  • DataFrame

    DataFrame with the results
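
The relation between the step and the feature frequency is plain integer division. The helper below is only an illustration (its name is not part of the package) and shows what the default step of 20 ms gives.

```python
def frequency_from_step(step_in_ms: int) -> int:
    """Feature frequency in Hz for a step in ms, using integer division (hypothetical helper)."""
    if step_in_ms <= 0:
        raise ValueError("step_in_ms must be a positive number of milliseconds")
    return 1_000 // step_in_ms


print(frequency_from_step(20))  # 50
print(frequency_from_step(10))  # 100
```

Because the division is an integer one, a step that does not divide 1000 evenly is truncated (a 30 ms step gives 33 Hz).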

discophon.evaluate

DiscoPhon evaluation module.

coocurrence_matrix

coocurrence_matrix(
    units: Units,
    phones: Phones,
    *,
    n_units: int,
    n_phonemes: int | None = None,
    step_units: int = STEP_UNITS,
    step_phones: int = STEP_PHONES,
    language: str | Language | None = None,
) -> DataArray

Build the 2D co-occurrence matrix of shape (n_phonemes, n_units) as a DataArray.

Parameters:

  • units (Units) –

    Predicted discrete units

  • phones (Phones) –

    Gold phone annotations

  • n_units (int) –

    Number of distinct discrete units in the evaluated system

  • n_phonemes (int | None, default: None ) –

    Number of phonemes in the language under consideration. Either use this argument or language.

  • step_units (int, default: STEP_UNITS ) –

    Step between consecutive units (in ms)

  • step_phones (int, default: STEP_PHONES ) –

    Step between consecutive phones (in ms)

  • language (str | Language | None, default: None ) –

    Evaluated language. Used to infer the number of phonemes if n_phonemes is not set. Do not set both at the same time.

Returns:

  • DataArray

    2D array for which the element (i, j) is the number of times the unit j has appeared where the underlying phoneme is i. The phonemes are sorted by frequency.
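
To make the counting concrete, here is a minimal pure-Python sketch of such a matrix. It is not the package's implementation: it assumes step_units is a multiple of step_phones, aligns the two sequences by repeating each unit, and returns nested lists instead of a DataArray. The function name is hypothetical.

```python
from collections import Counter


def cooccurrence_counts(
    units: dict[str, list[int]],
    phones: dict[str, list[str]],
    *,
    n_units: int,
    step_units: int = 20,
    step_phones: int = 10,
) -> tuple[list[str], list[list[int]]]:
    """Count, frame by frame, how often each unit overlaps each phoneme.

    Assumption: each unit is repeated step_units // step_phones times so that
    units and phones share the same frame rate.
    """
    if step_units % step_phones != 0:
        raise ValueError("this sketch only handles step_units as a multiple of step_phones")
    repeat = step_units // step_phones
    counts = Counter()
    for file_id, file_units in units.items():
        upsampled = [u for u in file_units for _ in range(repeat)]
        # zip stops at the shorter sequence, dropping trailing unmatched frames
        for phone, unit in zip(phones[file_id], upsampled):
            counts[phone, unit] += 1
    totals = Counter()
    for (phone, _), count in counts.items():
        totals[phone] += count
    # Rows go from the most to the least frequent phoneme; ties are broken alphabetically
    labels = sorted(totals, key=lambda p: (-totals[p], p))
    matrix = [[counts[phone, unit] for unit in range(n_units)] for phone in labels]
    return labels, matrix
```

With units `{"utt": [0, 1, 1]}` and phones `{"utt": ["a", "a", "b", "b", "b"]}`, the row for "b" comes first because it covers three frames against two for "a".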

phone_assignments

phone_assignments(
    units: Units, coocurrence: DataArray, *, kind: Literal["many-to-one", "one-to-one"]
) -> Phones

Compute the assigned sequences of phones from the units, the co-occurrence matrix, and the kind of assignment.

Parameters:

  • units (Units) –

    Predicted discrete units

  • coocurrence (DataArray) –

    Co-occurrence matrix between units and the underlying phones, computed with coocurrence_matrix

  • kind (Literal['many-to-one', 'one-to-one']) –

    Kind of assignment.

Returns:

  • Phones

    Assigned phones with this kind of mapping
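
A many-to-one assignment is commonly read as "each unit takes the phoneme it co-occurs with most often". The sketch below illustrates that reading on a matrix given as nested lists with row labels; both helper names are hypothetical, and the one-to-one case, which needs a matching constraint, is not shown.

```python
def many_to_one_mapping(labels: list[str], matrix: list[list[int]]) -> dict[int, str]:
    """Map each unit (column) to the phoneme (row) with the highest count."""
    mapping = {}
    for unit in range(len(matrix[0])):
        column = [row[unit] for row in matrix]
        # max returns the first best row, so a unit that was never observed
        # (all-zero column) falls back to the first row
        best_row = max(range(len(labels)), key=column.__getitem__)
        mapping[unit] = labels[best_row]
    return mapping


def assign_phones(units: dict[str, list[int]], mapping: dict[int, str]) -> dict[str, list[str]]:
    """Replace every unit by its assigned phoneme, file by file."""
    return {file_id: [mapping[u] for u in file_units] for file_id, file_units in units.items()}
```

Several units may end up on the same phoneme, which is what distinguishes this mapping from a one-to-one assignment.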

phoneme_discovery

phoneme_discovery(
    units: Units,
    phones: Phones,
    *,
    kind: Literal["many-to-one", "one-to-one"],
    n_units: int,
    n_phonemes: int | None = None,
    step_units: int = STEP_UNITS,
    step_phones: int = STEP_PHONES,
    language: str | Language | None = None,
) -> PhonemeDiscoveryEvaluation

Full evaluation of phoneme discovery: PNMI, PER, F1 and R-value boundary detection.

Parameters:

  • units (Units) –

    Predicted discrete units

  • phones (Phones) –

    Gold phone annotations

  • kind (Literal['many-to-one', 'one-to-one']) –

    Kind of assignment

  • n_units (int) –

    Number of distinct discrete units in the evaluated system

  • n_phonemes (int | None, default: None ) –

    Number of phonemes in the language under consideration. Either use this argument or language.

  • step_units (int, default: STEP_UNITS ) –

    Step between consecutive units (in ms)

  • step_phones (int, default: STEP_PHONES ) –

    Step between consecutive phones (in ms)

  • language (str | Language | None, default: None ) –

    Evaluated language. Used to infer the number of phonemes if n_phonemes is not set. Do not set both at the same time.

Returns:

  • PhonemeDiscoveryEvaluation

    Phoneme discovery results in a dictionary with keys "pnmi", "per", "f1", and "r_val".

pnmi

pnmi(coocurrence: DataArray) -> float

Compute PNMI.

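Parameters:

  • coocurrence (DataArray) –

    Co-occurrence matrix between units and the underlying phones, computed with coocurrence_matrix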

Returns:

  • float

    Phone-normalized mutual information (between 0 and 1)
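
PNMI is usually defined as the mutual information between phones and units divided by the entropy of the phones. The sketch below applies that definition to a count matrix given as nested lists (rows are phonemes, columns are units); it is an illustration under that assumption, not the package's code, and the function name is hypothetical.

```python
import math


def pnmi_from_counts(matrix: list[list[int]]) -> float:
    """Mutual information between phones and units, normalized by the phone entropy."""
    total = sum(sum(row) for row in matrix)
    p_phone = [sum(row) / total for row in matrix]
    p_unit = [sum(row[j] for row in matrix) / total for j in range(len(matrix[0]))]
    entropy = -sum(p * math.log(p) for p in p_phone if p > 0)
    if entropy == 0:
        raise ValueError("phone entropy is zero: PNMI is undefined with a single phoneme")
    mutual_information = 0.0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute nothing to the sum
                p_joint = count / total
                mutual_information += p_joint * math.log(p_joint / (p_phone[i] * p_unit[j]))
    return mutual_information / entropy
```

A matrix where each unit always falls on a single phoneme and vice versa gives a value of 1, while a matrix where units are independent of phones gives 0.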

phone_error_rate

phone_error_rate(
    predicted_phones_from_units: Phones, gold_phones: Phones, *, n_jobs: int = -1
) -> float

Phone error rate.

Total edit distances divided by the total length of the target annotations.

Parameters:

  • predicted_phones_from_units (Phones) –

    Predicted phones obtained with phone_assignments

  • gold_phones (Phones) –

    Gold phone annotations

  • n_jobs (int, default: -1 ) –

    The maximum number of concurrently running jobs, passed to joblib.Parallel

Returns:

  • float

    Phone error rate. Multiply it by 100 to get a percentage.
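
The definition above can be sketched with a plain Levenshtein distance. This illustration assumes the sequences are already reduced to one symbol per segment (whether and how frame-level repetitions are collapsed is not stated here), runs sequentially instead of using n_jobs, and uses hypothetical function names.

```python
def edit_distance(predicted: list[str], gold: list[str]) -> int:
    """Levenshtein distance with unit cost for substitutions, insertions and deletions."""
    previous = list(range(len(gold) + 1))
    for i, p in enumerate(predicted, start=1):
        current = [i]
        for j, g in enumerate(gold, start=1):
            current.append(min(
                previous[j] + 1,             # drop p from the prediction
                current[j - 1] + 1,          # insert g into the prediction
                previous[j - 1] + (p != g),  # substitute p by g, free if they match
            ))
        previous = current
    return previous[-1]


def per(predicted: dict[str, list[str]], gold: dict[str, list[str]]) -> float:
    """Total edit distance divided by the total length of the gold sequences."""
    total_distance = sum(edit_distance(predicted[f], gold[f]) for f in gold)
    total_length = sum(len(sequence) for sequence in gold.values())
    return total_distance / total_length
```

Summing distances and lengths before dividing weights long utterances more than short ones, unlike an average of per-file error rates.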

phone_segmentation

phone_segmentation(
    predicted_phones_from_units: Phones,
    gold_phones: Phones,
    *,
    margin_in_ms: int = 20,
    step_units: int = STEP_UNITS,
    step_phones: int = STEP_PHONES,
) -> SegmentationEvaluation

Phone segmentation evaluation.

Parameters:

  • predicted_phones_from_units (Phones) –

    Predicted phones obtained with phone_assignments

  • gold_phones (Phones) –

    Gold phone annotations

  • margin_in_ms (int, default: 20 ) –

    Left and right margin around each gold boundary (in ms). Predicted boundaries that fall in the resulting windows are considered correct. If two windows overlap, they are cut at the midpoint.

  • step_units (int, default: STEP_UNITS ) –

    Step between consecutive units (in ms)

  • step_phones (int, default: STEP_PHONES ) –

    Step between consecutive phones (in ms)

Returns:

  • SegmentationEvaluation

    Instance of a dataclass containing the segmentation results in attributes recall, precision, f1, os, and r_val. Use its describe method to get a summary of the segmentation evaluation.
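
For reference, the usual definitions of these boundary scores can be written from three counts: predicted boundaries that hit a gold window, predicted boundaries, and gold boundaries. The sketch below uses the standard over-segmentation and R-value formulas and returns a plain dictionary; it assumes the hits were already counted with the margin rule, and the function name is hypothetical.

```python
import math


def segmentation_scores(n_hits: int, n_predicted: int, n_gold: int) -> dict[str, float]:
    """Precision, recall, F1, over-segmentation and R-value from boundary counts."""
    if n_predicted <= 0 or n_gold <= 0:
        raise ValueError("both n_predicted and n_gold must be positive")
    precision = n_hits / n_predicted
    recall = n_hits / n_gold
    f1 = 2 * precision * recall / (precision + recall) if n_hits else 0.0
    # Over-segmentation: recall / precision - 1, written so that it stays defined without hits
    over_segmentation = n_predicted / n_gold - 1
    r1 = math.sqrt((1 - recall) ** 2 + over_segmentation**2)
    r2 = (-over_segmentation + recall - 1) / math.sqrt(2)
    r_val = 1 - (abs(r1) + abs(r2)) / 2
    return {"precision": precision, "recall": recall, "f1": f1, "os": over_segmentation, "r_val": r_val}
```

The R-value penalizes over-segmentation, so predicting many extra boundaries lowers it even when recall is high.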

discophon.abx

ABX discriminability.

This part of the evaluation lives in a separate module because it is optional and takes more time to compute. If you want to use it, install fastabx either with pip install discophon[abx] or pip install fastabx.

discrete_abx

discrete_abx(
    path_item: str | Path,
    path_units: str | Path,
    *,
    frequency: int,
    kind: Literal["triphone", "phoneme"] = "triphone",
) -> TriphoneABX | PhonemeABX

ABX on discrete units.

Parameters:

  • path_item (str | Path) –

    Path to the ABX item file

  • path_units (str | Path) –

    Path to the predicted units: JSONL file with keys file (str) and units (list[int]).

  • frequency (int) –

    Feature frequency in Hz. For a step of step_units ms, as used in other functions, it is 1_000 // step_units.

  • kind (Literal['triphone', 'phoneme'], default: 'triphone' ) –

    Kind of representations to consider. If phoneme, we also compute the ABX in the "any" context condition, in addition to the "within" context condition.

Returns:

  • TriphoneABX | PhonemeABX

    Dictionary of ABX discriminabilities with keys "within_speaker" and "across_speaker" if kind is "triphone", and with keys "within_speaker_within_context", "across_speaker_within_context", "within_speaker_any_context", and "across_speaker_any_context" if kind is "phoneme".

continuous_abx

continuous_abx(
    path_item: str | Path,
    path_features: str | Path,
    *,
    frequency: int,
    kind: Literal["triphone", "phoneme"] = "triphone",
) -> TriphoneABX | PhonemeABX

ABX on continuous representations.

Parameters:

  • path_item (str | Path) –

    Path to the ABX item file

  • path_features (str | Path) –

    Path to the extracted features: folder of .pt files with names corresponding to the file ids.

  • frequency (int) –

    Feature frequency in Hz. For a step of step_units ms, as used in other functions, it is 1_000 // step_units.

  • kind (Literal['triphone', 'phoneme'], default: 'triphone' ) –

    Kind of representations to consider. If phoneme, we also compute the ABX in the "any" context condition, in addition to the "within" context condition.

Returns:

  • TriphoneABX | PhonemeABX

    Dictionary of ABX discriminabilities with keys "within_speaker" and "across_speaker" if kind is "triphone", and with keys "within_speaker_within_context", "across_speaker_within_context", "within_speaker_any_context", and "across_speaker_any_context" if kind is "phoneme".

discophon.prepare

Download and prepare the DiscoPhon benchmark dataset.

download_benchmark

download_benchmark(path_dataset: str | Path) -> None

Download and extract the DiscoPhon dataset.

Parameters:

  • path_dataset (str | Path) –

    Target path to the DiscoPhon dataset.

prepare_commonvoice_datasets

prepare_commonvoice_datasets(path_dataset: str | Path, language: str) -> None

Prepare the Common Voice datasets needed for DiscoPhon by resampling and copying the audio files.

The specific Common Voice data should exist in path_dataset/raw: the audio files are expected to be in path_dataset/raw/${cv_code}/clips where cv_code is the Common Voice specific language code of language.

Parameters:

  • path_dataset (str | Path) –

    Path to the DiscoPhon dataset.

  • language (str) –

    Name of the language of the Common Voice dataset under consideration. Its ISO 639-3 code or its Common Voice code is also accepted.

discophon.data

Data loading and writing utilities.

STEP_PHONES module-attribute

STEP_PHONES = 10

Constant step in ms between consecutive phone annotations. Override it in function parameters only if you use new annotations built differently.

STEP_UNITS module-attribute

STEP_UNITS = 20

Default step in ms between consecutive units. Corresponds to a 50 Hz model. Override it with the step_units parameter of the functions that accept it.

DEFAULT_N_UNITS module-attribute

DEFAULT_N_UNITS = 256

Default number of distinct units in the many-to-one evaluation.

Units

Units = dict[str, list[int]]

Type of the discrete units: dictionary mapping file identifiers to lists of integers.

Phones

Phones = dict[str, list[str]]

Type of the gold or predicted phones: dictionary mapping file identifiers to lists of strings.

read_gold_annotations

read_gold_annotations(source: str | Path, *, step_in_ms: int = STEP_PHONES) -> Phones

Read the gold annotations and return a mapping from file names to lists of phones.

There will be one phone every step_in_ms ms (10 ms by default).

Parameters:

  • source (str | Path) –

    Path to the annotations file

  • step_in_ms (int, default: STEP_PHONES ) –

    Step between consecutive phones (in ms)

Returns:

  • Phones

    Mapping between file ids and phones

read_submitted_units

read_submitted_units(source: str | Path) -> Units

Read the units from a JSONL file. Each entry must only have the fields file (str) and units (list[int]).

Parameters:

  • source (str | Path) –

    Path to the units file

Returns:

  • Units

    Mapping between file ids and units
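
To illustrate the expected file content, a standard-library reader for this layout could look like the sketch below. It is not the package's implementation; the function name and the error message are hypothetical, and the strict key check simply mirrors the requirement stated above.

```python
import json
from pathlib import Path


def read_units_jsonl(source: Path) -> dict[str, list[int]]:
    """Read a JSONL units file into a mapping from file ids to units."""
    units = {}
    with source.open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            entry = json.loads(line)
            if set(entry) != {"file", "units"}:
                raise ValueError(f"line {line_number}: expected only the keys 'file' and 'units'")
            units[entry["file"]] = entry["units"]
    return units
```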

discophon.languages

Language dataclass

Language(name: str, iso_639_3: str, split: Literal['dev', 'test'], n_phonemes: int)

The underlying representation of a language.

Parameters:

  • name (str) –

    Name of the language.

  • iso_639_3 (str) –

    Its ISO 639-3 code.

  • split (Literal['dev', 'test']) –

    Which split it belongs to in the benchmark.

  • n_phonemes (int) –

    The number of phoneme categories considered.

phonemes property

phonemes: list[str]

The phonemes of this language.

get_language

get_language(n: str | Language) -> Language

Return the language corresponding to this string.

discophon.baselines

Baseline finetuning.

finetune_hubert

finetune_hubert(
    name: str,
    project: str,
    workdir: Path,
    checkpoint: Path,
    manifest: str,
    *,
    n_clusters: int,
    target_layer: int,
) -> None

Finetune HuBERT on DiscoPhon data with the default configuration.

Parameters:

  • name (str) –

    Name of the run

  • project (str) –

    Wandb project

  • workdir (Path) –

    Path to workdir

  • checkpoint (Path) –

    Path to pretrained checkpoint

  • manifest (str) –

    Path to the manifest

  • n_clusters (int) –

    Number of clusters

  • target_layer (int) –

    Target layer

finetune_spidr

finetune_spidr(
    name: str, project: str, workdir: Path, checkpoint: Path, manifest: str
) -> None

Finetune SpidR on DiscoPhon data with the default configuration.

Parameters:

  • name (str) –

    Run name

  • project (str) –

    Run project

  • workdir (Path) –

    Working directory for checkpoints and Wandb logs

  • checkpoint (Path) –

    Path to the pretrained checkpoint

  • manifest (str) –

    Path to the manifest