Baselines
Usage
The baselines and pretraining datasets are available on Hugging Face.
Baselines:
- SpidR MMS-ulab: https://huggingface.co/coml/spidr-mmsulab
- SpidR VP-20: https://huggingface.co/coml/spidr-vp20
- HuBERT MMS-ulab: https://huggingface.co/coml/hubert-base-mmsulab
- HuBERT VP-20: https://huggingface.co/coml/hubert-base-vp20
Datasets:
- MMS-ulab segmented dataset: https://huggingface.co/datasets/coml/mmsulab
- VP-20 dataset: https://huggingface.co/datasets/coml/vp20
Models

First install the spidr and minimal_hubert libraries.
SpidR checkpoints

```python
import joblib
from spidr.models import SpidR
from torch.hub import load_state_dict_from_url
from torchcodec.decoders import AudioDecoder

state_dict = load_state_dict_from_url("https://huggingface.co/coml/spidr-vp20/resolve/main/final.pt")
model = SpidR().eval()
model.load_state_dict(state_dict)
wav = AudioDecoder("/path/to/file.wav").get_all_samples().data

# Training loss
mask = ...  # Set up your boolean mask
loss, _ = model(wav, mask=mask)

# Continuous representations
codebook_predictions = model.get_codebooks(wav)  # Log-probs from prediction heads
hidden_states = model.get_intermediate_outputs(wav)  # Hidden Transformer states

# Discrete units
layer = 6  # Target layer

# From prediction heads
units_from_heads = codebook_predictions[layer - 1].argmax(-1)

# From intermediate representations, using K-means
kmeans = joblib.load("/path/to/kmeans.joblib")
units_from_interm = kmeans.predict(hidden_states[layer - 1])
```
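Setting up the boolean mask is left to the user. Below is a minimal pure-Python sketch of HuBERT-style span masking; the `mask_prob` and `mask_length` values are illustrative defaults, not necessarily the ones used for these baselines:

```python
import random

def span_mask(num_frames: int, mask_prob: float = 0.065,
              mask_length: int = 10, seed: int = 0) -> list[bool]:
    """Sample start frames with probability `mask_prob`, then mask
    `mask_length` consecutive frames from each start (spans may overlap)."""
    rng = random.Random(seed)
    mask = [False] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            for i in range(start, min(start + mask_length, num_frames)):
                mask[i] = True
    return mask

mask = span_mask(200)
```

Convert the result to a boolean tensor with a batch dimension (e.g. `torch.tensor(mask).unsqueeze(0)`) before passing it as `mask=`.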
HuBERT checkpoints

- With Transformers (check out their documentation for details).
- With minimal_hubert:

```python
from minimal_hubert import HuBERT, HuBERTPretrain

model = HuBERT.from_pretrained("coml/hubert-base-vp20")
model_from_pretraining = HuBERTPretrain.from_pretrained(
    "https://huggingface.co/coml/hubert-base-vp20/resolve/main/it2.pt"
)

# Training loss
loss, _ = model_from_pretraining(wav, mask=mask)

# Intermediate Transformer representations (same convention as in fairseq).
# Use this method if you want to get discrete units using K-means models
# that were trained in this project or in projects that used fairseq.
feats = model.get_intermediate_outputs(wav)

# Same as HF transformers, s3prl and torchaudio:
# representations are taken at the end of the Transformer layer block
# instead of just before the residual.
feats_after_residual = model.get_intermediate_outputs(wav, before_residual=False)
```
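For the Transformers route, a minimal sketch, assuming the Hub repos are loadable with `transformers`' `HubertModel` (check the model cards for the exact layout):

```python
import torch
from transformers import HubertModel

def load_hubert(repo: str = "coml/hubert-base-vp20") -> HubertModel:
    # Downloads the checkpoint from the Hugging Face Hub.
    return HubertModel.from_pretrained(repo).eval()

def hidden_states(model: HubertModel, wav: torch.Tensor):
    # wav: (batch, samples) 16 kHz waveform.
    with torch.no_grad():
        out = model(wav, output_hidden_states=True)
    return out.hidden_states  # tuple of per-layer representations
```

Note that transformers takes representations at the end of each Transformer block, i.e. the `before_residual=False` convention described above.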
Datasets

We redistribute both pretraining datasets on the Hugging Face Hub. You can access them directly if you have datasets installed:

```python
from datasets import load_dataset

mmsulab = load_dataset("coml/mmsulab")
vp20 = load_dataset("coml/vp20")
```

Check out their READMEs for more details on their structure and how they were built.
Replication

SpidR pretraining

- Create the following TOML config file, cfg.toml:

```toml
[data]
manifest = "./manifests/train_manifest.jsonl"

[validation]
val.manifest = "./manifests/val_manifest.jsonl"

[run]
workdir = "./workdir"
wandb_mode = "offline"  # (1)!
wandb_project = "discophon"
wandb_name = "spidr-pretraining"
model_type = "spidr"

[run.slurm_validation]
nodes = 1
gpus_per_node = 1
qos = "qos_gpu-dev"
time = 60
cpus_per_task = 10
constraint = "v100-32g"
```

1. Set to `online` if your cluster has internet access and you want to log to Weights & Biases.

Adapt the paths and SLURM parameters to your setup.

- Launch pretraining with:

```shell
python -m spidr ./cfg.toml \
    --nodes 4 \
    --gpus-per-node 4 \
    --cpus-per-task 24 \
    --time 1200 \
    --constraint h100 \
    --dump ./dump
```

Again, adapt the SLURM parameters to your setup. This specific command will launch one job on 4 nodes with 4 H100 GPUs each, for 20 hours. The --dump argument specifies the directory where the submitit output is dumped.

- You're done! Checkpoints will be available in ./workdir/discophon/spidr-pretraining.
HuBERT pretraining

Check out minimal_hubert's README for easy pretraining. It involves multiple steps, but the pretraining part is very similar to SpidR's.
Finetuning

Use the CLI utility:

```
❯ python -m discophon.baselines --help
usage: python -m discophon.baselines [-h] [--n-clusters N_CLUSTERS] [--layer LAYER]
                                     {hubert,spidr} name project workdir checkpoint manifest

Baseline finetuning of HuBERT or SpidR

positional arguments:
  {hubert,spidr}        Model architecture
  name                  Run name
  project               Run project
  workdir               Working directory for checkpoints and Wandb logs
  checkpoint            Path to pretrained checkpoint
  manifest              Manifest file for finetuning

options:
  -h, --help            show this help message and exit
  --n-clusters N_CLUSTERS
                        Number of target clusters for HuBERT finetuning
  --layer LAYER         Target layer for HuBERT finetuning used to train the K-means
```
Alternatively, use the finetune_spidr and finetune_hubert functions.
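For example, a hypothetical invocation finetuning a pretrained HuBERT (the paths, run name, and hyperparameter values are placeholders; pick the ones matching your setup):

```shell
python -m discophon.baselines hubert my-run discophon ./workdir \
    /path/to/checkpoint.pt ./manifests/train_manifest.jsonl \
    --n-clusters 500 --layer 6
```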
Discrete units
Coming soon!