Language Processing Tools
Open-source tools for processing language data can be challenging to set up and use in a shared environment. This page describes some of the tools that are available on the VACC.
Meta's Llama-4 multimodal large language model
Instructions for configuring your account to use the Llama-4 models installed on the VACC, along with a number of example job scripts for running the models on various GPU combinations, are available in our llama-4-setup GitLab repository.
OpenAI's Whisper via noScribe
noScribe is a tool for transcribing audio that has been installed on the VACC as a module. It has a graphical interface, leverages OpenAI's open-source Whisper model, and provides diarization (speaker identification).
To use the tool, start an Open OnDemand single-GPU Desktop session (all GPUs are supported except the RTX6000). Once on the desktop, open a Terminal and run the command $ module load noscribe, followed by $ noscribe.
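The two commands above can also be wrapped in a small shell function. This is only a minimal sketch: the function name launch_noscribe is hypothetical and is not something provided on the VACC, and the check for the module command is an added convenience that assumes the module system is only available inside a VACC session.

```shell
# Hypothetical helper (not provided by the VACC): wraps the two documented commands.
launch_noscribe() {
    # Assumption: the `module` command exists only inside a VACC session,
    # so stop early with a hint if it cannot be found.
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; run this in a Terminal on the Open OnDemand desktop" >&2
        return 1
    fi
    # The two commands from the instructions above, run in order.
    module load noscribe && noscribe
}
```

After pasting the function into the desktop Terminal, typing launch_noscribe should behave the same as entering the two commands by hand.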
A guide to using noScribe is available here.