API reference¶

discophon.benchmark ¶

Run the DiscoPhon benchmark on your predictions.

Compute the scores for all languages and splits for which units or features have been extracted.

benchmark_discovery ¶

benchmark_discovery(
    path_dataset: str | Path,
    path_units: str | Path,
    *,
    kind: Literal["many-to-one", "one-to-one"],
    step_units: int = STEP_UNITS,
) -> DataFrame

Benchmark phoneme discovery. Evaluate all languages and splits with available units.

The units should be saved in the directory path_units, in JSONL files named units-{code}-{split}.jsonl with keys file (str) and units (list[int]).

Parameters:

path_dataset (str | Path) –

Path to the DiscoPhon dataset
path_units (str | Path) –

Path to the directory with the predicted units
kind (Literal['many-to-one', 'one-to-one']) –

Kind of assignment. If it is many-to-one, the number of units is set to the default (DEFAULT_N_UNITS). Otherwise, it is set to the number of phonemes plus one.
step_units (int, default: STEP_UNITS ) –

Step between consecutive units (in ms).

Returns:

DataFrame –

DataFrame with the results
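
The file layout expected in path_units can be produced with the standard library alone. The sketch below is illustrative, not part of discophon: the helper name write_units, the file ids, and the placeholder code "xxx" are hypothetical, and it assumes that {code} stands for the language code.

```python
import json
import tempfile
from pathlib import Path


def write_units(path_units: Path, code: str, split: str, units: dict[str, list[int]]) -> Path:
    """Write one JSON object per line, with the keys `file` and `units`."""
    path_units.mkdir(parents=True, exist_ok=True)
    target = path_units / f"units-{code}-{split}.jsonl"
    with target.open("w", encoding="utf-8") as f:
        for file_id, sequence in units.items():
            f.write(json.dumps({"file": file_id, "units": sequence}) + "\n")
    return target


# Hypothetical file ids and unit sequences, for illustration only.
example_units = {"utt_0001": [12, 12, 7, 7, 7, 3], "utt_0002": [5, 5, 41]}
with tempfile.TemporaryDirectory() as tmp:
    written = write_units(Path(tmp), "xxx", "dev", example_units)
    written_name = written.name
    written_lines = written.read_text(encoding="utf-8").splitlines()
```

Each line of the resulting file is an independent JSON object, so files can be written incrementally, one utterance at a time.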

benchmark_abx_discrete ¶

benchmark_abx_discrete(
    path_dataset: str | Path,
    path_units: str | Path,
    *,
    kind: Literal["triphone", "phoneme"] = "triphone",
    step_units: int = STEP_UNITS,
) -> DataFrame

ABX on all discrete units available.

The units should be saved in the directory path_units, in JSONL files named units-{code}-{split}.jsonl with keys file (str) and units (list[int]).

Parameters:

path_dataset (str | Path) –

Path to the DiscoPhon dataset
path_units (str | Path) –

Path to the directory with the predicted units
kind (Literal['triphone', 'phoneme'], default: 'triphone' ) –

Kind of representations to use for ABX computation.
step_units (int, default: STEP_UNITS ) –

Step between consecutive units (in ms).

Returns:

DataFrame –

DataFrame with the results

benchmark_abx_continuous ¶

benchmark_abx_continuous(
    path_dataset: str | Path,
    path_features: str | Path,
    *,
    kind: Literal["triphone", "phoneme"] = "triphone",
    step_units: int = STEP_UNITS,
) -> DataFrame

ABX on all continuous features available.

The features should be saved in the directory path_features, in subfolders path_features/{code}/{split}.

Parameters:

path_dataset (str | Path) –

Path to the DiscoPhon dataset
path_features (str | Path) –

Path to the directory with the extracted features
kind (Literal['triphone', 'phoneme'], default: 'triphone' ) –

Kind of representations to use for ABX computation.
step_units (int, default: STEP_UNITS ) –

Step between consecutive features (in ms). The feature frequency will be set to 1_000 // step_units

Returns:

DataFrame –

DataFrame with the results
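
The relation between step_units and the feature frequency stated above is a plain integer division. A minimal sketch, where step_to_frequency is a hypothetical helper name and the constant is redefined locally so that the snippet stands alone:

```python
STEP_UNITS = 20  # default step in ms, as documented in discophon.data


def step_to_frequency(step_in_ms: int) -> int:
    """Frame rate in Hz for a given step in ms, using integer division."""
    if step_in_ms <= 0:
        raise ValueError("step_in_ms must be positive")
    return 1_000 // step_in_ms


frequency_default = step_to_frequency(STEP_UNITS)
frequency_40ms = step_to_frequency(40)
```

Because the division is an integer division, a step that does not divide 1000 evenly is truncated (for example, 30 ms gives 33 Hz).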

discophon.evaluate ¶

DiscoPhon evaluation module.

coocurrence_matrix ¶

coocurrence_matrix(
    units: Units,
    phones: Phones,
    *,
    n_units: int,
    n_phonemes: int | None = None,
    step_units: int = STEP_UNITS,
    step_phones: int = STEP_PHONES,
    language: str | Language | None = None,
) -> DataArray

Build the 2D co-occurrence matrix of shape (n_phonemes, n_units) as a DataArray.

Parameters:

units (Units) –

Predicted discrete units
phones (Phones) –

Gold phone annotations
n_units (int) –

Number of distinct discrete units in the evaluated system
n_phonemes (int | None, default: None ) –

Number of phonemes in the language under consideration. Either use this argument or language.
step_units (int, default: STEP_UNITS ) –

Step between consecutive units (in ms)
step_phones (int, default: STEP_PHONES ) –

Step between consecutive phones (in ms)
language (str | Language | None, default: None ) –

Evaluated language. Used to infer the number of phonemes if n_phonemes is not set. Do not set both at the same time.

Returns:

DataArray –

2D array for which the element (i, j) is the number of times the unit j has appeared where the underlying phoneme is i. The phonemes are sorted by frequency.
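
To make the counting concrete, here is a pure-Python sketch of such a matrix. It is not the library's implementation: it assumes that step_units is a multiple of step_phones, that each unit is repeated to cover the phone frames it spans, and that rows are ordered by decreasing phoneme frequency. The function name cooccurrence_counts is hypothetical.

```python
from collections import Counter


def cooccurrence_counts(units, phones, *, n_units, step_units=20, step_phones=10):
    """Count how often unit j overlaps phoneme i, frame by frame.

    Assumption: step_units is a multiple of step_phones. Trailing frames
    without a counterpart in the other sequence are ignored.
    """
    repeat = step_units // step_phones
    pairs = Counter()
    phone_totals = Counter()
    for file_id, unit_seq in units.items():
        # Repeat each unit so both sequences share the phone frame rate
        upsampled = [u for u in unit_seq for _ in range(repeat)]
        for phone, unit in zip(phones[file_id], upsampled):
            pairs[phone, unit] += 1
            phone_totals[phone] += 1
    # Rows follow the phonemes sorted by decreasing frequency (assumption)
    labels = [p for p, _ in phone_totals.most_common()]
    matrix = [[pairs[p, u] for u in range(n_units)] for p in labels]
    return labels, matrix


# Hypothetical toy data: 3 units at 20 ms, 6 phone frames at 10 ms
toy_units = {"utt": [0, 0, 1]}
toy_phones = {"utt": ["a", "a", "a", "a", "b", "b"]}
labels, matrix = cooccurrence_counts(toy_units, toy_phones, n_units=2)
```

Here matrix[i][j] counts the frames where unit j was predicted while the gold phoneme was labels[i].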

phone_assignments ¶

phone_assignments(
    units: Units, coocurrence: DataArray, *, kind: Literal["many-to-one", "one-to-one"]
) -> Phones

Compute the assigned sequences of phones from the units, the co-occurrence matrix, and the kind of assignment.

Parameters:

units (Units) –

Predicted discrete units
coocurrence (DataArray) –

Co-occurrence matrix between units and the underlying phones, computed with coocurrence_matrix
kind (Literal['many-to-one', 'one-to-one']) –

Kind of assignment.

Returns:

Phones –

Assigned phones with this kind of mapping
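
A many-to-one assignment can be sketched as follows, assuming that each unit is mapped to the phoneme it co-occurs with most often. The helper names and the toy matrix are hypothetical, and the matrix is a plain list of lists rather than a DataArray. The one-to-one case additionally requires that no two units share a phoneme and is not covered here.

```python
def many_to_one_mapping(labels, matrix):
    """Map each unit index to the phoneme it co-occurs with most often."""
    n_units = len(matrix[0])
    mapping = {}
    for j in range(n_units):
        column = [row[j] for row in matrix]
        # On ties (including unused units), the first phoneme wins
        best_row = max(range(len(labels)), key=column.__getitem__)
        mapping[j] = labels[best_row]
    return mapping


def assign_phones(units, mapping):
    """Replace every unit by its assigned phoneme."""
    return {file_id: [mapping[u] for u in seq] for file_id, seq in units.items()}


# Hypothetical toy matrix: rows are phonemes "a" and "b", columns are units 0..2
toy_labels = ["a", "b"]
toy_matrix = [[4, 1, 0], [0, 3, 2]]
mapping = many_to_one_mapping(toy_labels, toy_matrix)
assigned = assign_phones({"utt": [0, 1, 2, 2]}, mapping)
```

Several units may map to the same phoneme, which is why the many-to-one setting can use far more units than there are phonemes.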

phoneme_discovery ¶

phoneme_discovery(
    units: Units,
    phones: Phones,
    *,
    kind: Literal["many-to-one", "one-to-one"],
    n_units: int,
    n_phonemes: int | None = None,
    step_units: int = STEP_UNITS,
    step_phones: int = STEP_PHONES,
    language: str | Language | None = None,
) -> PhonemeDiscoveryEvaluation

Full evaluation of phoneme discovery: PNMI, PER, F1 and R-value boundary detection.

Parameters:

units (Units) –

Predicted discrete units
phones (Phones) –

Gold phone annotations
kind (Literal['many-to-one', 'one-to-one']) –

Kind of assignment
n_units (int) –

Number of distinct discrete units in the evaluated system
n_phonemes (int | None, default: None ) –

Number of phonemes in the language under consideration. Either use this argument or language.
step_units (int, default: STEP_UNITS ) –

Step between consecutive units (in ms)
step_phones (int, default: STEP_PHONES ) –

Step between consecutive phones (in ms)
language (str | Language | None, default: None ) –

Evaluated language. Used to infer the number of phonemes if n_phonemes is not set. Do not set both at the same time.

Returns:

PhonemeDiscoveryEvaluation –

Phoneme discovery results in a dictionary with keys "pnmi", "per", "f1", and "r_val".

pnmi ¶

pnmi(coocurrence: DataArray) -> float

Compute PNMI.

Parameters:

coocurrence (DataArray) –

Co-occurrence matrix between units and the underlying phones, computed with coocurrence_matrix

Returns:

float –

Phone-normalized mutual information (between 0 and 1)
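
Phone-normalized mutual information is commonly defined as the mutual information between phones and units divided by the entropy of the phones. The sketch below follows that definition on a plain list-of-lists count matrix. It assumes this is the definition used here, and pnmi_from_counts is a hypothetical name.

```python
import math


def pnmi_from_counts(matrix):
    """I(phone; unit) / H(phone), from a (n_phonemes, n_units) count matrix.

    Assumes at least two phonemes have non-zero counts, so that H(phone) > 0.
    """
    total = sum(sum(row) for row in matrix)
    p_phone = [sum(row) / total for row in matrix]
    p_unit = [sum(row[j] for row in matrix) / total for j in range(len(matrix[0]))]
    mutual_info = 0.0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if count > 0:  # zero cells contribute nothing
                p_joint = count / total
                mutual_info += p_joint * math.log(p_joint / (p_phone[i] * p_unit[j]))
    entropy = -sum(p * math.log(p) for p in p_phone if p > 0)
    return mutual_info / entropy


# Units that identify the phoneme exactly
pnmi_perfect = pnmi_from_counts([[4, 0], [0, 2]])
# Units that are statistically independent of the phoneme
pnmi_independent = pnmi_from_counts([[2, 2], [1, 1]])
```

The two toy matrices illustrate the bounds: units that determine the phoneme give a value of 1, and units independent of the phoneme give 0.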

phone_error_rate ¶

phone_error_rate(
    predicted_phones_from_units: Phones, gold_phones: Phones, *, n_jobs: int = -1
) -> float

Phone error rate.

Total edit distances divided by the total length of the target annotations.

Parameters:

predicted_phones_from_units (Phones) –

Predicted phones obtained with phone_assignments
gold_phones (Phones) –

Gold phone annotations
n_jobs (int, default: -1 ) –

The maximum number of concurrently running jobs, passed to joblib.Parallel

Returns:

float –

Phone error rate. Multiply it by 100 to get a percentage.
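
The definition above (summed edit distances over summed target lengths) can be sketched in pure Python. This is an illustration under assumptions: it compares the sequences exactly as given, it runs sequentially instead of using joblib, and the names edit_distance and phone_error_rate_sketch are hypothetical.

```python
def edit_distance(pred, gold):
    """Levenshtein distance between two sequences (unit costs)."""
    previous = list(range(len(gold) + 1))
    for i, p in enumerate(pred, start=1):
        current = [i]
        for j, g in enumerate(gold, start=1):
            current.append(
                min(
                    previous[j] + 1,  # p has no counterpart in gold
                    current[j - 1] + 1,  # g has no counterpart in pred
                    previous[j - 1] + (p != g),  # match or substitution
                )
            )
        previous = current
    return previous[-1]


def phone_error_rate_sketch(predicted, gold):
    """Total edit distance divided by the total length of the gold sequences."""
    total_distance = sum(edit_distance(predicted[k], gold[k]) for k in gold)
    total_length = sum(len(gold[k]) for k in gold)
    return total_distance / total_length


# Hypothetical toy sequences: one phone is missing in the second file
toy_predicted = {"a": ["k", "a", "t"], "b": ["d", "o"]}
toy_gold = {"a": ["k", "a", "t"], "b": ["d", "o", "g"]}
per = phone_error_rate_sketch(toy_predicted, toy_gold)
```

Summing before dividing weights every gold phone equally, so long utterances count for more than short ones.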

phone_segmentation ¶

phone_segmentation(
    predicted_phones_from_units: Phones,
    gold_phones: Phones,
    *,
    margin_in_ms: int = 20,
    step_units: int = STEP_UNITS,
    step_phones: int = STEP_PHONES,
) -> SegmentationEvaluation

Phone segmentation evaluation.

Parameters:

predicted_phones_from_units (Phones) –

Predicted phones obtained with phone_assignments
gold_phones (Phones) –

Gold phone annotations
margin_in_ms (int, default: 20 ) –

Left and right margin around each gold boundary (in ms). Predicted boundaries that fall within the resulting windows are considered correct. If two windows overlap, they are cut at the midpoint.
step_units (int, default: STEP_UNITS ) –

Step between consecutive units (in ms)
step_phones (int, default: STEP_PHONES ) –

Step between consecutive phones (in ms)

Returns:

SegmentationEvaluation –

Instance of a dataclass containing the segmentation results in attributes recall, precision, f1, os, and r_val. Use its describe method to get a summary of the segmentation evaluation.
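
Once predicted boundaries have been matched to gold boundaries, the reported scores follow from three counts. The sketch below uses the usual definitions of precision, recall, F1, over-segmentation (OS), and R-value. It assumes these are the definitions used here, does not reproduce the margin-based matching, and segmentation_scores is a hypothetical name.

```python
import math


def segmentation_scores(n_hits: int, n_predicted: int, n_gold: int) -> dict[str, float]:
    """Boundary detection scores from hit, predicted, and gold boundary counts."""
    precision = n_hits / n_predicted
    recall = n_hits / n_gold
    f1 = 2 * precision * recall / (precision + recall) if n_hits else 0.0
    # Over-segmentation: relative excess of predicted boundaries
    over_segmentation = n_predicted / n_gold - 1
    # R-value: distance to the ideal point (recall = 1, OS = 0)
    r1 = math.sqrt((1 - recall) ** 2 + over_segmentation**2)
    r2 = (-over_segmentation + recall - 1) / math.sqrt(2)
    r_val = 1 - (abs(r1) + abs(r2)) / 2
    return {"precision": precision, "recall": recall, "f1": f1, "os": over_segmentation, "r_val": r_val}


scores_perfect = segmentation_scores(n_hits=10, n_predicted=10, n_gold=10)
scores_partial = segmentation_scores(n_hits=8, n_predicted=10, n_gold=10)
```

Unlike F1, the R-value penalizes over-segmentation explicitly, so a system cannot raise its score simply by predicting more boundaries.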

discophon.abx ¶

ABX discriminability.

We split this part of the evaluation in a separate module because it's optional and takes more time to compute. If you want to use it, install fastabx either with pip install discophon[abx] or pip install fastabx.

discrete_abx ¶

discrete_abx(
    path_item: str | Path,
    path_units: str | Path,
    *,
    frequency: int,
    kind: Literal["triphone", "phoneme"] = "triphone",
) -> TriphoneABX | PhonemeABX

ABX on discrete units.

Parameters:

path_item (str | Path) –

Path to the ABX item file
path_units (str | Path) –

Path to the predicted units: JSONL file with keys file (str) and units (list[int]).
frequency (int) –

Feature frequency in Hz. It is the inverse of the step_units parameter used in other functions (1_000 // step_units, with step_units in ms).
kind (Literal['triphone', 'phoneme'], default: 'triphone' ) –

Kind of representations to consider. If phoneme, we also compute the ABX in the "any" context condition, in addition to the "within" context condition.

Returns:

TriphoneABX | PhonemeABX –

Dictionary of ABX discriminabilities with keys "within_speaker" and "across_speaker" if kind is "triphone", and with keys "within_speaker_within_context", "across_speaker_within_context", "within_speaker_any_context", and "across_speaker_any_context" if kind is "phoneme".

continuous_abx ¶

continuous_abx(
    path_item: str | Path,
    path_features: str | Path,
    *,
    frequency: int,
    kind: Literal["triphone", "phoneme"] = "triphone",
) -> TriphoneABX | PhonemeABX

ABX on continuous representations.

Parameters:

path_item (str | Path) –

Path to the ABX item file
path_features (str | Path) –

Path to the extracted features: folder of .pt files with names corresponding to the file ids.
frequency (int) –

Feature frequency in Hz. It is the inverse of the step_units parameter used in other functions (1_000 // step_units, with step_units in ms).
kind (Literal['triphone', 'phoneme'], default: 'triphone' ) –

Kind of representations to consider. If phoneme, we also compute the ABX in the "any" context condition, in addition to the "within" context condition.

Returns:

TriphoneABX | PhonemeABX –

Dictionary of ABX discriminabilities with keys "within_speaker" and "across_speaker" if kind is "triphone", and with keys "within_speaker_within_context", "across_speaker_within_context", "within_speaker_any_context", and "across_speaker_any_context" if kind is "phoneme".

discophon.prepare ¶

Download and prepare the DiscoPhon benchmark dataset.

download_benchmark ¶

download_benchmark(path_dataset: str | Path) -> None

Download and extract the DiscoPhon dataset.

Parameters:

path_dataset (str | Path) –

Target path to the DiscoPhon dataset.

prepare_commonvoice_datasets ¶

prepare_commonvoice_datasets(path_dataset: str | Path, language: str) -> None

Prepare the Common Voice datasets needed for DiscoPhon by resampling and copying the audio files.

The specific Common Voice data should exist in path_dataset/raw: the audio files are expected to be in path_dataset/raw/${cv_code}/clips where cv_code is the Common Voice specific language code of language.

Parameters:

path_dataset (str | Path) –

Path to the DiscoPhon dataset.
language (str) –

Name of the language of the Common Voice dataset under consideration. The ISO 639-3 code or the Common Voice code is also accepted.

discophon.data ¶

Data loading and writing utilities.

STEP_PHONES `module-attribute` ¶

STEP_PHONES = 10

Constant step in ms between consecutive phone annotations. Override it in function parameters only if you use new annotations built differently.

STEP_UNITS `module-attribute` ¶

STEP_UNITS = 20

Default step in ms between consecutive units. Corresponds to a 50 Hz model. Can be overridden through the step_units parameter of the functions that accept it.

DEFAULT_N_UNITS `module-attribute` ¶

DEFAULT_N_UNITS = 256

Default number of distinct units in the many-to-one evaluation.

Units ¶

Units = dict[str, list[int]]

Type of the discrete units: dictionary mapping file identifiers to lists of integers.

Phones ¶

Phones = dict[str, list[str]]

Type of the gold or predicted phones: dictionary mapping file identifiers to lists of strings.

read_gold_annotations ¶

read_gold_annotations(source: str | Path, *, step_in_ms: int = STEP_PHONES) -> Phones

Read the gold annotations and return a mapping from file names to lists of phonemes.

There will be one phone every step_in_ms milliseconds (10 ms by default).

Parameters:

source (str | Path) –

Path to the annotations file

Returns:

Phones –

Mapping between file ids and phones

read_submitted_units ¶

read_submitted_units(source: str | Path) -> Units

Read the units from a JSONL file. Must only have fields named file (str) and units (list[int]).

Parameters:

source (str | Path) –

Path to the units file

Returns:

Units –

Mapping between file ids and units

discophon.languages ¶

Language `dataclass` ¶

Language(name: str, iso_639_3: str, split: Literal['dev', 'test'], n_phonemes: int)

The underlying representation of a language.

Parameters:

name (str) –

Name of the language.
iso_639_3 (str) –

Its ISO 639-3 code.
split (Literal['dev', 'test']) –

Which split it belongs to in the benchmark.
n_phonemes (int) –

The number of phoneme categories considered.

phonemes `property` ¶

phonemes: list[str]

The phonemes of this language.

get_language ¶

get_language(n: str | Language) -> Language

Return the language corresponding to this string.

discophon.baselines ¶

Baseline finetuning.

finetune_hubert ¶

finetune_hubert(
    name: str,
    project: str,
    workdir: Path,
    checkpoint: Path,
    manifest: str,
    *,
    n_clusters: int,
    target_layer: int,
) -> None

Finetune HuBERT on DiscoPhon data with the default configuration.

Parameters:

name (str) –

Name of the run
project (str) –

Wandb project
workdir (Path) –

Path to workdir
checkpoint (Path) –

Path to pretrained checkpoint
manifest (str) –

Path to the manifest
n_clusters (int) –

Number of clusters
target_layer (int) –

Target layer

finetune_spidr ¶

finetune_spidr(
    name: str, project: str, workdir: Path, checkpoint: Path, manifest: str
) -> None

Finetune SpidR on DiscoPhon data with the default configuration.

Parameters:

name (str) –

Run name
project (str) –

Run project
workdir (Path) –

Working directory for checkpoints and Wandb logs
checkpoint (Path) –

Path to the pretrained checkpoint
manifest (str) –

Path to the manifest
