
Language Processing Tools


Open-source tools for processing language data can be difficult to set up and use in a shared computing environment. This page describes several such tools that are available on the VACC.

Meta's Llama-4 multimodal large language model

Instructions for configuring your account to use the Llama-4 models installed on the VACC are available in our llama-4-setup GitLab repository. The repository also contains several example job scripts that run the models on various GPU combinations.

OpenAI's Whisper via noScribe

noScribe, an audio transcription tool, is installed on the VACC as a module. It has a graphical interface, uses OpenAI's open-source Whisper model for transcription, and provides diarization (speaker identification).

To use noScribe, start an Open OnDemand single-GPU Desktop session (any GPU type except RTX6000 is supported). Once the desktop appears, open a Terminal and run the following two commands:

$ module load noscribe
$ noscribe
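If you launch noScribe often, you may find a small pre-launch check useful. The following is a minimal sketch, not a VACC-provided tool: the function names noscribe_gpu_ok and launch_noscribe are hypothetical, and the patterns used to recognize an RTX6000 are an assumption about how nvidia-smi reports that card's name. It reads the first visible GPU's name with nvidia-smi, refuses to continue on an RTX6000, and otherwise runs the same two commands given above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-launch helpers for noScribe on an Open OnDemand Desktop.

# Succeed unless the GPU name looks like an RTX 6000, the one GPU type this
# page lists as unsupported. The name patterns are an assumption about how
# nvidia-smi reports that card; adjust them to what your session shows.
noscribe_gpu_ok() {
  case "$1" in
    *"RTX 6000"* | *"RTX6000"*) return 1 ;;
    *) return 0 ;;
  esac
}

# Query the first visible GPU, check it, then load and start noScribe.
launch_noscribe() {
  local gpu
  gpu=$(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null | head -n 1)
  if [ -z "$gpu" ]; then
    echo "No GPU is visible in this session." >&2
    return 1
  fi
  if ! noscribe_gpu_ok "$gpu"; then
    echo "noScribe is not supported on $gpu; start a session on another GPU type." >&2
    return 1
  fi
  # Same commands as in the instructions above.
  module load noscribe && noscribe
}
```

Because the module command is normally defined only in interactive shells, source the file in your Terminal (for example, source noscribe_check.sh, where the file name is hypothetical) and then run launch_noscribe, rather than executing the file as a standalone script.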

A guide to using noScribe is available here.