API reference
discophon.benchmark
Run the DiscoPhon benchmark on your predictions.
Compute the scores for all languages and splits for which units or features have been extracted.
benchmark_discovery
benchmark_discovery(
path_dataset: str | Path,
path_units: str | Path,
*,
kind: Literal["many-to-one", "one-to-one"],
step_units: int = STEP_UNITS,
) -> DataFrame
Benchmark phoneme discovery. Evaluate all languages and splits with available units.
The units should be saved in the directory path_units, in JSONL files
named units-{code}-{split}.jsonl with keys file (str) and units (list[int]).
Parameters:
- path_dataset (str | Path) – Path to the DiscoPhon dataset
- path_units (str | Path) – Path to the directory with the predicted units
- kind (Literal['many-to-one', 'one-to-one']) – Kind of assignment. If it is many-to-one, the number of units is set to the default (DEFAULT_N_UNITS). Otherwise, it is set to the number of phonemes plus one.
- step_units (int, default: STEP_UNITS) – Step between consecutive units (in ms).
Returns:
- DataFrame – DataFrame with the results
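The expected on-disk layout of the units can be illustrated with a minimal sketch. The helper `write_units`, the file identifiers, and the unit values are hypothetical; only the filename pattern and the `file` / `units` keys come from the description above.

```python
import json
import tempfile
from pathlib import Path


def write_units(path_units: Path, code: str, split: str, units: dict[str, list[int]]) -> Path:
    """Write one JSON object per utterance, with keys "file" and "units"."""
    path_units.mkdir(parents=True, exist_ok=True)
    path = path_units / f"units-{code}-{split}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for file_id, sequence in units.items():
            f.write(json.dumps({"file": file_id, "units": sequence}) + "\n")
    return path


# Hypothetical file identifiers and unit sequences, for illustration only.
example = {"utt_0001": [4, 4, 17, 17, 17, 3], "utt_0002": [9, 9, 2]}
with tempfile.TemporaryDirectory() as tmp:
    written = write_units(Path(tmp), "deu", "dev", example)
    written_name = written.name
    lines = written.read_text(encoding="utf-8").splitlines()
```

A directory populated this way, with one file per (language, split) pair, is what `path_units` is expected to point to.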
benchmark_abx_discrete
benchmark_abx_discrete(
path_dataset: str | Path,
path_units: str | Path,
*,
kind: Literal["triphone", "phoneme"] = "triphone",
step_units: int = STEP_UNITS,
) -> DataFrame
ABX on all discrete units available.
The units should be saved in the directory path_units, in JSONL files
named units-{code}-{split}.jsonl with keys file (str) and units (list[int]).
Parameters:
- path_dataset (str | Path) – Path to the DiscoPhon dataset
- path_units (str | Path) – Path to the directory with the predicted units
- kind (Literal['triphone', 'phoneme'], default: 'triphone') – Kind of representations to use for ABX computation.
- step_units (int, default: STEP_UNITS) – Step between consecutive units (in ms).
Returns:
- DataFrame – DataFrame with the results
benchmark_abx_continuous
benchmark_abx_continuous(
path_dataset: str | Path,
path_features: str | Path,
*,
kind: Literal["triphone", "phoneme"] = "triphone",
step_units: int = STEP_UNITS,
) -> DataFrame
ABX on all continuous features available.
The features should be saved in the directory path_features, in subfolders path_features/{code}/{split}.
Parameters:
- path_dataset (str | Path) – Path to the DiscoPhon dataset
- path_features (str | Path) – Path to the directory with the extracted features
- kind (Literal['triphone', 'phoneme'], default: 'triphone') – Kind of representations to use for ABX computation.
- step_units (int, default: STEP_UNITS) – Step between consecutive features (in ms). The feature frequency will be set to 1_000 // step_units.
Returns:
- DataFrame – DataFrame with the results
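The relation between the step and the feature frequency stated above can be written out as a short sketch. The function name `feature_frequency` is hypothetical; the formula is the one given in the parameter description.

```python
def feature_frequency(step_units: int) -> int:
    """Feature frequency in Hz for a step given in milliseconds."""
    if step_units <= 0:
        raise ValueError("step_units must be a positive number of milliseconds")
    # Integer division, as in the documented expression 1_000 // step_units.
    return 1_000 // step_units


frequency_20ms = feature_frequency(20)
frequency_10ms = feature_frequency(10)
frequency_40ms = feature_frequency(40)
```

Because the division is an integer division, a step that does not divide 1000 evenly gives a truncated frequency.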
discophon.evaluate
DiscoPhon evaluation module.
coocurrence_matrix
coocurrence_matrix(
units: Units,
phones: Phones,
*,
n_units: int,
n_phonemes: int | None = None,
step_units: int = STEP_UNITS,
step_phones: int = STEP_PHONES,
language: str | Language | None = None,
) -> DataArray
Build the 2D co-occurrence matrix of shape (n_phonemes, n_units) as a DataArray.
Parameters:
- units (Units) – Predicted discrete units
- phones (Phones) – Gold phone annotations
- n_units (int) – Number of distinct discrete units in the evaluated system
- n_phonemes (int | None, default: None) – Number of phonemes in the language under consideration. Either use this argument or language.
- step_units (int, default: STEP_UNITS) – Step between consecutive units (in ms)
- step_phones (int, default: STEP_PHONES) – Step between consecutive phones (in ms)
- language (str | Language | None, default: None) – Evaluated language. Used to infer the number of phonemes if n_phonemes is not set. Do not set both at the same time.
Returns:
- DataArray – 2D array for which the element (i, j) is the number of times the unit j has appeared where the underlying phoneme is i. The phonemes are sorted by frequency.
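The counting behind such a matrix can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper `cooccurrence_counts` is hypothetical, the step values (20 ms for units, 10 ms for phones) are assumed, and each unit frame is paired with the phone frame that starts at the same time.

```python
from collections import Counter


def cooccurrence_counts(
    units: dict[str, list[int]],
    phones: dict[str, list[str]],
    *,
    step_units: int = 20,  # assumed value, in ms
    step_phones: int = 10,  # assumed value, in ms
) -> Counter:
    """Count (phoneme, unit) pairs over all files."""
    counts: Counter = Counter()
    for file_id, unit_seq in units.items():
        phone_seq = phones[file_id]
        for t, unit in enumerate(unit_seq):
            # Assumption: pair the unit frame with the phone frame starting at the same time.
            idx = t * step_units // step_phones
            if idx < len(phone_seq):
                counts[(phone_seq[idx], unit)] += 1
    return counts


# Hypothetical data: three unit frames (20 ms each) over six phone frames (10 ms each).
units = {"utt": [5, 5, 7]}
phones = {"utt": ["a", "a", "a", "a", "t", "t"]}
counts = cooccurrence_counts(units, phones)
```

The entry for `("a", 5)` plays the role of element (i, j) in the matrix: the number of frames where unit 5 appeared while the underlying phoneme was "a".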
phone_assignments
phone_assignments(
units: Units, coocurrence: DataArray, *, kind: Literal["many-to-one", "one-to-one"]
) -> Phones
Compute the assigned sequences of phones from units, the co-occurrence matrix, and the kind of assignment.
Parameters:
- units (Units) – Predicted discrete units
- coocurrence (DataArray) – Co-occurrence matrix between units and the underlying phones, computed with coocurrence_matrix
- kind (Literal['many-to-one', 'one-to-one']) – Kind of assignment.
Returns:
- Phones – Assigned phones with this kind of mapping
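A many-to-one assignment can be sketched as follows, assuming that each unit is mapped to the phoneme it co-occurs with most often. That rule is an assumption about the library's behavior, and the helpers `many_to_one` and `assign` are hypothetical. A one-to-one assignment would additionally require each phoneme to be used by at most one unit, which is not shown here.

```python
def many_to_one(counts: dict[tuple[str, int], int]) -> dict[int, str]:
    """Map each unit to the phoneme it co-occurs with most often."""
    best: dict[int, tuple[int, str]] = {}
    for (phoneme, unit), n in counts.items():
        if unit not in best or n > best[unit][0]:
            best[unit] = (n, phoneme)
    return {unit: phoneme for unit, (_, phoneme) in best.items()}


def assign(units: dict[str, list[int]], mapping: dict[int, str]) -> dict[str, list[str]]:
    """Replace every unit by its assigned phoneme."""
    return {file_id: [mapping[u] for u in seq] for file_id, seq in units.items()}


# Hypothetical co-occurrence counts keyed by (phoneme, unit).
counts = {("a", 5): 2, ("t", 5): 1, ("t", 7): 3, ("a", 9): 4}
mapping = many_to_one(counts)
assigned = assign({"utt": [5, 9, 7]}, mapping)
```

Units 5 and 9 both map to "a" in this example, which is what distinguishes a many-to-one mapping from a one-to-one mapping.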
phoneme_discovery
phoneme_discovery(
units: Units,
phones: Phones,
*,
kind: Literal["many-to-one", "one-to-one"],
n_units: int,
n_phonemes: int | None = None,
step_units: int = STEP_UNITS,
step_phones: int = STEP_PHONES,
language: str | Language | None = None,
) -> PhonemeDiscoveryEvaluation
Full evaluation of phoneme discovery: PNMI, PER, F1 and R-value boundary detection.
Parameters:
- units (Units) – Predicted discrete units
- phones (Phones) – Gold phone annotations
- kind (Literal['many-to-one', 'one-to-one']) – Kind of assignment
- n_units (int) – Number of distinct discrete units in the evaluated system
- n_phonemes (int | None, default: None) – Number of phonemes in the language under consideration. Either use this argument or language.
- step_units (int, default: STEP_UNITS) – Step between consecutive units (in ms)
- step_phones (int, default: STEP_PHONES) – Step between consecutive phones (in ms)
- language (str | Language | None, default: None) – Evaluated language. Used to infer the number of phonemes if n_phonemes is not set. Do not set both at the same time.
Returns:
- PhonemeDiscoveryEvaluation – Phoneme discovery results in a dictionary with keys "pnmi", "per", "f1", and "r_val".
pnmi
Compute PNMI.
Parameters:
- coocurrence (DataArray) – Co-occurrence matrix between units and the underlying phones, computed with coocurrence_matrix
Returns:
- float – Phone-normalized mutual information (between 0 and 1)
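As a sketch of what this quantity measures, the usual definition of phone-normalized mutual information is the mutual information between phones and units divided by the entropy of the phones. The function below applies that definition to raw counts; the name `pnmi_from_counts` and the dictionary input are hypothetical, and the library may compute the score differently.

```python
import math


def pnmi_from_counts(counts: dict[tuple[str, int], int]) -> float:
    """Mutual information between phones and units, normalized by the phone entropy."""
    total = sum(counts.values())
    p_phone: dict[str, float] = {}
    p_unit: dict[int, float] = {}
    for (phoneme, unit), n in counts.items():
        p_phone[phoneme] = p_phone.get(phoneme, 0.0) + n / total
        p_unit[unit] = p_unit.get(unit, 0.0) + n / total
    mutual_information = sum(
        (n / total) * math.log((n / total) / (p_phone[phoneme] * p_unit[unit]))
        for (phoneme, unit), n in counts.items()
        if n > 0
    )
    entropy = -sum(p * math.log(p) for p in p_phone.values() if p > 0)
    if entropy == 0:
        raise ValueError("PNMI is undefined when only one phoneme is observed")
    return mutual_information / entropy


# Units that identify the phoneme perfectly, and units that carry no information.
perfect = pnmi_from_counts({("a", 0): 5, ("t", 1): 5})
uninformative = pnmi_from_counts({("a", 0): 1, ("a", 1): 1, ("t", 0): 1, ("t", 1): 1})
```

The two extreme cases give the two ends of the documented range: a score of 1 when each unit determines the phoneme, and 0 when units and phonemes are independent.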
phone_error_rate
phone_error_rate(
predicted_phones_from_units: Phones, gold_phones: Phones, *, n_jobs: int = -1
) -> float
Phone error rate.
Total edit distances divided by the total length of the target annotations.
Parameters:
- predicted_phones_from_units (Phones) – Predicted phones obtained with phone_assignments
- gold_phones (Phones) – Gold phone annotations
- n_jobs (int, default: -1) – The maximum number of concurrently running jobs, passed to joblib.Parallel
Returns:
- float – Phone error rate. Multiply it by 100 to get a percentage.
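The definition above (total edit distance divided by total target length) can be sketched with a plain Levenshtein distance. The helper names are hypothetical, and the sketch assumes the sequences are already in the form the score is computed on, since any preprocessing done by the library (for example collapsing repeated frames) is not specified here.

```python
def edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Levenshtein distance with unit costs for insertion, deletion and substitution."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_symbol in enumerate(reference, start=1):
        current = [i]
        for j, hyp_symbol in enumerate(hypothesis, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + (ref_symbol != hyp_symbol),  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def per_sketch(predicted: dict[str, list[str]], gold: dict[str, list[str]]) -> float:
    """Total edit distance divided by the total length of the gold sequences."""
    total_distance = sum(edit_distance(gold[f], predicted[f]) for f in gold)
    total_length = sum(len(seq) for seq in gold.values())
    return total_distance / total_length


# Hypothetical sequences: one exact match, one with a substitution and a deletion.
gold = {"a": ["k", "a", "t"], "b": ["d", "o", "g", "s"]}
predicted = {"a": ["k", "a", "t"], "b": ["d", "o", "k"]}
rate = per_sketch(predicted, gold)
```

Here two edits over seven gold phones give a rate of 2/7, or about 28.6 after multiplying by 100.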
phone_segmentation
phone_segmentation(
predicted_phones_from_units: Phones,
gold_phones: Phones,
*,
margin_in_ms: int = 20,
step_units: int = STEP_UNITS,
step_phones: int = STEP_PHONES,
) -> SegmentationEvaluation
Phone segmentation evaluation.
Parameters:
- predicted_phones_from_units (Phones) – Predicted phones obtained with phone_assignments
- gold_phones (Phones) – Gold phone annotations
- margin_in_ms (int, default: 20) – Left and right margin around each gold boundary (in ms). Predicted boundaries that fall in the resulting windows are considered correct. If two windows overlap, they are cut at the midpoint.
- step_units (int, default: STEP_UNITS) – Step between consecutive units (in ms)
- step_phones (int, default: STEP_PHONES) – Step between consecutive phones (in ms)
Returns:
- SegmentationEvaluation – Instance of a dataclass containing the segmentation results in attributes recall, precision, f1, os, and r_val. Use its describe method to get a summary of the segmentation evaluation.
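For orientation, the scores listed above can be derived from boundary counts as in the sketch below. It uses the definitions of over-segmentation and R-value that are common in the segmentation literature; whether the library computes them in exactly this way is an assumption, and the function `segmentation_scores` is hypothetical. The sketch starts from counts and does not cover the margin-based matching of boundaries.

```python
import math


def segmentation_scores(n_hit: int, n_predicted: int, n_gold: int) -> dict[str, float]:
    """Boundary scores from hit counts; assumes n_hit > 0."""
    precision = n_hit / n_predicted
    recall = n_hit / n_gold
    f1 = 2 * precision * recall / (precision + recall)
    over_segmentation = recall / precision - 1
    # R-value: distance to the ideal point (recall = 1, over-segmentation = 0).
    r1 = math.sqrt((1 - recall) ** 2 + over_segmentation**2)
    r2 = (-over_segmentation + recall - 1) / math.sqrt(2)
    r_val = 1 - (abs(r1) + abs(r2)) / 2
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "os": over_segmentation,
        "r_val": r_val,
    }


# Hypothetical counts: 8 correct boundaries out of 10 predicted and 10 gold.
scores = segmentation_scores(n_hit=8, n_predicted=10, n_gold=10)
perfect = segmentation_scores(n_hit=10, n_predicted=10, n_gold=10)
```

Unlike F1, the R-value penalizes over-segmentation explicitly, so a system that predicts many spurious boundaries cannot reach a high score through recall alone.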
discophon.abx
ABX discriminability.
We split this part of the evaluation into a separate module because it is optional
and takes more time to compute. To use it, install fastabx with either
pip install discophon[abx] or pip install fastabx.
discrete_abx
discrete_abx(
path_item: str | Path,
path_units: str | Path,
*,
frequency: int,
kind: Literal["triphone", "phoneme"] = "triphone",
) -> TriphoneABX | PhonemeABX
ABX on discrete units.
Parameters:
- path_item (str | Path) – Path to the ABX item file
- path_units (str | Path) – Path to the predicted units: JSONL file with keys file (str) and units (list[int]).
- frequency (int) – Feature frequency in Hz. It is the inverse of the step_units parameter used in other functions.
- kind (Literal['triphone', 'phoneme'], default: 'triphone') – Kind of representations to consider. If phoneme, we also compute the ABX in the "any" context condition, in addition to the "within" context condition.
Returns:
- TriphoneABX | PhonemeABX – Dictionary of ABX discriminabilities with keys "within_speaker" and "across_speaker" if kind is "triphone", and with keys "within_speaker_within_context", "across_speaker_within_context", "within_speaker_any_context", and "across_speaker_any_context" otherwise.
continuous_abx
continuous_abx(
path_item: str | Path,
path_features: str | Path,
*,
frequency: int,
kind: Literal["triphone", "phoneme"] = "triphone",
) -> TriphoneABX | PhonemeABX
ABX on continuous representations.
Parameters:
- path_item (str | Path) – Path to the ABX item file
- path_features (str | Path) – Path to the extracted features: folder of .pt files with names corresponding to the file ids.
- frequency (int) – Feature frequency in Hz. It is the inverse of the step_units parameter used in other functions.
- kind (Literal['triphone', 'phoneme'], default: 'triphone') – Kind of representations to consider. If phoneme, we also compute the ABX in the "any" context condition, in addition to the "within" context condition.
Returns:
- TriphoneABX | PhonemeABX – Dictionary of ABX discriminabilities with keys "within_speaker" and "across_speaker" if kind is "triphone", and with keys "within_speaker_within_context", "across_speaker_within_context", "within_speaker_any_context", and "across_speaker_any_context" otherwise.
discophon.prepare
Download and prepare the DiscoPhon benchmark dataset.
download_benchmark
prepare_commonvoice_datasets
Prepare the Common Voice datasets needed for DiscoPhon by resampling and copying the audio files.
The specific Common Voice data should exist in path_dataset/raw: the audio files are expected to be
in path_dataset/raw/${cv_code}/clips where cv_code is the Common Voice specific language code of language.
Parameters:
discophon.data
Data loading and writing utilities.
STEP_PHONES
module-attribute
Constant step in ms between consecutive phone annotations. Override it in function parameters only if you use new annotations built differently.
STEP_UNITS
module-attribute
Default step in ms between consecutive units. Corresponds to a 50 Hz model (one unit every 20 ms). Can be overridden through the step_units parameter of the evaluation functions.
DEFAULT_N_UNITS
module-attribute
Default number of distinct units in the many-to-one evaluation.
Units
Type of the discrete units: dictionary mapping file identifiers to lists of integers.
Phones
Type of the gold or predicted phones: dictionary mapping file identifiers to lists of strings.
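The two types can be mirrored with local aliases, as in the sketch below. The aliases are written out here for illustration and are not imported from the library. The helper `parse_units` is hypothetical and assumes the JSONL layout with keys `file` and `units` that the benchmark functions describe.

```python
import json

# Local aliases mirroring the documented types (not the library's own objects).
Units = dict[str, list[int]]
Phones = dict[str, list[str]]


def parse_units(jsonl_text: str) -> Units:
    """Build a Units mapping from JSONL text with keys "file" and "units"."""
    units: Units = {}
    for line in jsonl_text.splitlines():
        if line.strip():
            entry = json.loads(line)
            units[entry["file"]] = entry["units"]
    return units


# Hypothetical file identifiers and values.
text = '{"file": "utt_0001", "units": [4, 4, 17]}\n{"file": "utt_0002", "units": [9, 2]}\n'
units = parse_units(text)
phones: Phones = {"utt_0001": ["a", "a", "t"], "utt_0002": ["k", "o"]}
```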
units_filename
Filename for the predicted units of a (language, split) pair.
alignment_filename
Filename for the gold phone alignment of a (language, split) pair.
item_filename
Filename for the ABX item file of a (language, split, kind) triple.
manifest_filename
Filename for the audio manifest of a (language, split) pair.
discophon.languages
Language
dataclass
The underlying representation of a language.
Parameters:
get_language
Resolve a language identifier to its Language record.
The input is matched case-insensitively against any of the following identifiers:
- the English name (e.g. "German", "Mandarin Chinese");
- the ISO 639-3 code (e.g. "deu", "cmn");
- for languages available on Common Voice, the Common Voice locale code (e.g. "sw", "zh-CN").
A few languages also accept additional aliases: "mandarin" and "chinese" both
resolve to Mandarin Chinese. Passing an existing Language
instance returns it unchanged.
Parameters:
- n (str | Language) – Language identifier (name, ISO 639-3 code, Common Voice locale, alias) or an already-resolved Language instance.
Returns:
Raises:
- ValueError – If n does not match any known identifier.
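The matching rules can be illustrated with a toy resolver. This is not the library's implementation: the table `_IDENTIFIERS` and the function `resolve` are hypothetical, the table contains only the examples quoted above, and the sketch returns an ISO 639-3 code rather than a Language record.

```python
# Toy table built only from the examples quoted above; not the library's data.
_IDENTIFIERS = {
    "german": "deu",
    "deu": "deu",
    "mandarin chinese": "cmn",
    "cmn": "cmn",
    "zh-cn": "cmn",
    "mandarin": "cmn",
    "chinese": "cmn",
}


def resolve(identifier: str) -> str:
    """Resolve a name, code, locale or alias to an ISO 639-3 code, ignoring case."""
    try:
        return _IDENTIFIERS[identifier.lower()]
    except KeyError:
        raise ValueError(f"Unknown language identifier: {identifier!r}") from None


resolved = [resolve(name) for name in ("German", "DEU", "zh-CN", "Mandarin", "chinese")]
```

An identifier that is missing from the table raises ValueError, matching the documented error behavior.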
discophon.baselines
Baseline finetuning.
extract_hubert_continuous_features
extract_hubert_continuous_features(
path_dataset: str | Path,
path_features: str | Path,
language: str,
split: Literal["dev", "test", "train-10min", "train-1h", "train-10h"],
pretrained_model_name_or_path: str | Path,
*,
layers: int | Iterable[int] | None = None,
) -> None
Extract HuBERT continuous features for all utterances of a DiscoPhon split.
For each requested layer, the features are saved as PyTorch tensors at
path_features / {layer} / {iso_639_3} / {split} / {fileid}.pt.
Parameters:
- path_dataset (str | Path) – Path to the DiscoPhon dataset.
- path_features (str | Path) – Output directory under which per-layer feature tensors are written.
- language (str) – Language identifier resolved by get_language, either name or ISO 639-3 code.
- split (Literal['dev', 'test', 'train-10min', 'train-1h', 'train-10h']) – Dataset split to process.
- pretrained_model_name_or_path (str | Path) – HuBERT checkpoint or HuggingFace model identifier.
- layers (int | Iterable[int] | None, default: None) – Layers to extract. If None, all encoder layers are used.
extract_hubert_discrete_units
extract_hubert_discrete_units(
path_dataset: str | Path,
path_units: str | Path,
language: str,
split: Literal["dev", "test", "train-10min", "train-1h", "train-10h"],
pretrained_model_name_or_path: str | Path,
kmeans_by_layer: dict[int, MiniBatchKMeans],
*,
layers: int | Iterable[int] | None = None,
) -> None
Extract HuBERT discrete units for all utterances of a DiscoPhon split.
For each requested layer, the units are written to a JSONL file at
path_units / {layer} / units-{iso_639_3}-{split}.jsonl, with one entry per
utterance with keys file (str) and units (list[int]).
Parameters:
- path_dataset (str | Path) – Path to the DiscoPhon dataset.
- path_units (str | Path) – Output path used as a template. Its parent directory and filename stem determine where the per-layer JSONL files are written.
- language (str) – Language identifier resolved by get_language, either name or ISO 639-3 code.
- split (Literal['dev', 'test', 'train-10min', 'train-1h', 'train-10h']) – Dataset split to process.
- pretrained_model_name_or_path (str | Path) – HuBERT checkpoint or HuggingFace model identifier.
- kmeans_by_layer (dict[int, MiniBatchKMeans]) – Mapping from 1-based layer index to the K-means model used to quantize that layer.
- layers (int | Iterable[int] | None, default: None) – Layers to extract. If None, all encoder layers are used. Only layers present in both layers and kmeans_by_layer are written.
finetune_hubert
finetune_hubert(
name: str,
project: str,
workdir: Path,
checkpoint: Path,
manifest: str,
*,
n_clusters: int,
target_layer: int,
) -> None
Finetune HuBERT on DiscoPhon data with the default configuration.
Parameters:
extract_spidr_continuous_features
extract_spidr_continuous_features(
path_dataset: str | Path,
path_features: str | Path,
language: str,
split: Literal["dev", "test", "train-10min", "train-1h", "train-10h"],
checkpoint: str | Path,
*,
layers: int | Iterable[int] | None = None,
) -> None
Extract SpidR continuous features for all utterances of a DiscoPhon split.
For each requested layer, the features are saved as PyTorch tensors at
path_features / {layer} / {iso_639_3} / {split} / {fileid}.pt.
Parameters:
- path_dataset (str | Path) – Path to the DiscoPhon dataset.
- path_features (str | Path) – Output directory under which per-layer feature tensors are written.
- language (str) – Language identifier resolved by get_language, either name or ISO 639-3 code.
- split (Literal['dev', 'test', 'train-10min', 'train-1h', 'train-10h']) – Dataset split to process.
- checkpoint (str | Path) – Path to the SpidR checkpoint.
- layers (int | Iterable[int] | None, default: None) – Layers to extract. If None, all student layers are used.
extract_spidr_discrete_units
extract_spidr_discrete_units(
path_dataset: str | Path,
path_units: str | Path,
language: str,
split: Literal["dev", "test", "train-10min", "train-1h", "train-10h"],
checkpoint: str | Path,
*,
layers: int | Iterable[int] | None = None,
) -> None
Extract SpidR discrete units for all utterances of a DiscoPhon split.
The units are taken from the codebooks of the student network (one codebook per quantized layer).
For each requested layer, the units are written to a JSONL file at
path_units / {layer} / units-{iso_639_3}-{split}.jsonl, with one entry per
utterance with keys file (str) and units (list[int]).
Parameters:
- path_dataset (str | Path) – Path to the DiscoPhon dataset.
- path_units (str | Path) – Output path used as a template. Its parent directory and filename stem determine where the per-layer JSONL files are written.
- language (str) – Language identifier resolved by get_language, either name or ISO 639-3 code.
- split (Literal['dev', 'test', 'train-10min', 'train-1h', 'train-10h']) – Dataset split to process.
- checkpoint (str | Path) – Path to the SpidR checkpoint.
- layers (int | Iterable[int] | None, default: None) – Layers to extract. If None, all layers with a codebook are used.