Integration of the Dataverse with Galaxy

Galaxy users can now connect to a Dataverse as a repository source, browse and search datasets directly from the Upload dialog, import files into histories and utilize them for scientific analyses.

A common bottleneck in reproducible research is moving data between analysis platforms and FAIR repositories: downloading files locally, re-uploading them to Galaxy, and later repeating the process to publish results back into a repository. This post walks through an example, the Barcelona Supercomputing Center (BSC) Dataverse, and shows how to use it from within Galaxy. Public datasets in the Dataverse become directly accessible through Galaxy's Dataverse file source plugin. To demonstrate private data repositories, we use Harvard Dataverse.

With Galaxy’s Dataverse integration, you can connect a Dataverse instance (here: BSC Dataverse) as a file source and then:

  • Browse and search Dataverse datasets/collections from Galaxy
  • Import files directly into a Galaxy history (as normal datasets)

This keeps your analyses reproducible while reducing manual “download/upload loops”.

Preconfigured Dataverses in Galaxy

Note: You can either use the preconfigured Dataverse repositories or create your own. To list the preconfigured ones, click Upload in the activity bar, then Choose from repository, and type Dataverse in the search bar. This is sufficient if you only want to reuse existing datasets from a preconfigured Dataverse: you can import data from it directly into your Galaxy history.

Preconfigured Dataverses in Galaxy:

Galaxy: preconfigured

Configure your own Dataverse in Galaxy

Get your BSC Dataverse API token

To access private content and to enable exports from Galaxy to Dataverse, you need an API token from your Dataverse account. First create an account on BSC Dataverse, then create an API token.

BSC Dataverse account page showing the API Token tab and token validity
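Before pasting the token into Galaxy, it can be worth confirming that the Dataverse instance accepts it. The following is a minimal sketch, assuming the instance exposes the standard Dataverse native API (the `/api/users/:me` endpoint and the `X-Dataverse-key` header); the function names `token_check_request` and `whoami` are made up for illustration and are not part of Galaxy or Dataverse.

```python
import json
import urllib.request


def token_check_request(base_url: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a request for the account that owns api_token."""
    url = base_url.rstrip("/") + "/api/users/:me"
    # Dataverse reads the API token from the X-Dataverse-key header.
    return urllib.request.Request(url, headers={"X-Dataverse-key": api_token})


def whoami(base_url: str, api_token: str) -> dict:
    """Send the request and return the decoded JSON reply.

    A rejected token is expected to surface as urllib.error.HTTPError.
    """
    request = token_check_request(base_url, api_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `whoami("https://dataverse.bsc.es/", "<your token>")` should return a JSON document describing your account if the token is valid. Keep the token out of scripts you share, for example by reading it from an environment variable.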

Create your own BSC Dataverse repository in Galaxy

Galaxy stores external repositories under User Preferences → My Repositories. This lets you reuse the same repository across uploads, tools, and workflows.

1) Create a new repository

Go to User Preferences → My Repositories → Create. From the repository options, choose Dataverse.

Galaxy: Create new repository source options, including Dataverse

2) Configure the Dataverse connection

Fill in the Dataverse settings:

  • Name: a label for your repository (e.g., BSC Dataverse)
  • Dataverse instance endpoint: https://dataverse.bsc.es/
  • Allow Galaxy to export data to Dataverse?: set to Yes if you want to upload from Galaxy to Dataverse
  • Publication Name: creator name used in dataset metadata
  • API Token: paste the token from your Dataverse account

Galaxy: Dataverse file source configuration form (endpoint, export toggle, publication name, API token)
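If you keep these settings in notes or a script before entering them in the form, a small sanity check can catch typos early. This is only a sketch: the class `DataverseSourceSettings` and its field names are hypothetical mirrors of the form fields above, not Galaxy's internal schema, and the rules it enforces are assumptions drawn from this post (a full endpoint URL, and a token when export is enabled).

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class DataverseSourceSettings:
    """Hypothetical mirror of the form fields (not Galaxy's own data model)."""

    name: str
    endpoint: str
    allow_export: bool
    publication_name: str
    api_token: str = ""

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means nothing was flagged."""
        issues = []
        parsed = urlparse(self.endpoint)
        if parsed.scheme != "https" or not parsed.netloc:
            issues.append("endpoint should be a full https:// URL")
        if not self.name.strip():
            issues.append("name must not be empty")
        # Assumption from the post: exporting to Dataverse needs an API token.
        if self.allow_export and not self.api_token:
            issues.append("export needs an API token")
        return issues
```

For example, `DataverseSourceSettings("BSC Dataverse", "https://dataverse.bsc.es/", False, "Jane Doe").problems()` returns an empty list, while omitting the `https://` prefix or enabling export without a token produces a message for each issue.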

3) Confirm the repository was created

After saving, your new entry appears in My Repositories.

Galaxy: My Repositories list showing a created BSC Dataverse source

Import datasets from BSC Dataverse into a Galaxy history

Step-by-step: browse/search and import a file

1) Open “Galaxy Upload” and choose “Choose from repository”

From the Galaxy Upload dialog, click Choose from repository and select your configured Dataverse source (e.g., BSC Dataverse).

Galaxy: repository picker showing available file sources including BSC Dataverse

2) Browse Dataverse collections/datasets

You can navigate folders/collections and list available datasets.

Galaxy: browsing the BSC Dataverse hierarchy from within the upload repository browser

3) Select dataset(s) and import

If you are looking for scans of medieval texts, for example, search for "handwritten" and click Select on the dataset "A Dataset for Handwritten Text Recognition in Medieval Notarial Charters Written on Parchment".

Galaxy: selecting a dataset/collection within BSC Dataverse from the repository browser
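The same search can be reproduced outside Galaxy, which is handy for checking what a query returns before importing. The sketch below assumes the instance exposes the standard Dataverse Search API (`/api/search` with the `q`, `type`, and `per_page` parameters). It is not a description of how Galaxy's plugin is implemented, and the helper names `search_url` and `dataset_hits` are made up for illustration.

```python
from urllib.parse import urlencode


def search_url(base_url: str, query: str, per_page: int = 10) -> str:
    """Build a Search API URL that is restricted to datasets."""
    params = {"q": query, "type": "dataset", "per_page": per_page}
    return base_url.rstrip("/") + "/api/search?" + urlencode(params)


def dataset_hits(payload: dict) -> list[tuple[str, str]]:
    """Extract (title, persistent identifier) pairs from a decoded search reply."""
    items = payload.get("data", {}).get("items", [])
    return [(item["name"], item.get("global_id", "")) for item in items]


# Hand-written reply in the shape the Search API uses; the DOI is a placeholder.
sample_reply = {
    "status": "OK",
    "data": {
        "total_count": 1,
        "items": [
            {"name": "Example handwritten dataset", "type": "dataset",
             "global_id": "doi:10.0000/EXAMPLE"},
        ],
    },
}

print(search_url("https://dataverse.bsc.es/", "handwritten"))
print(dataset_hits(sample_reply))
```

Fetching the printed URL (for example with `urllib.request.urlopen`) and passing the decoded JSON to `dataset_hits` lists the matching datasets by title and persistent identifier.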

Import from Dataverse into Galaxy history

1) Add files to the upload queue

Select the files to import into your Galaxy history and add them to the Galaxy upload queue. Once they turn green, you can start working with them.

Galaxy: Upload dialog showing a file queued for upload to BSC Dataverse
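To see which files a dataset contains without going through the Upload dialog, you can ask Dataverse directly. The sketch below assumes the standard Dataverse native API (`/api/datasets/:persistentId/` for metadata and `/api/access/datafile/{id}` for downloads) and is independent of Galaxy's own plugin. The helper names and the DOI in the sample are placeholders, not values taken from BSC Dataverse.

```python
from urllib.parse import urlencode


def dataset_metadata_url(base_url: str, persistent_id: str) -> str:
    """URL that returns a dataset's metadata, including its file list."""
    query = urlencode({"persistentId": persistent_id})
    return base_url.rstrip("/") + "/api/datasets/:persistentId/?" + query


def download_urls(base_url: str, dataset_payload: dict) -> dict[str, str]:
    """Map each file name in the latest dataset version to its download URL."""
    base = base_url.rstrip("/")
    files = dataset_payload["data"]["latestVersion"]["files"]
    return {
        entry["dataFile"]["filename"]: f"{base}/api/access/datafile/{entry['dataFile']['id']}"
        for entry in files
    }


# Hand-written reply in the shape the native API uses; the ids are invented.
sample_dataset = {
    "status": "OK",
    "data": {
        "latestVersion": {
            "files": [
                {"dataFile": {"id": 101, "filename": "charter_001.png"}},
                {"dataFile": {"id": 102, "filename": "labels.csv"}},
            ]
        }
    },
}

print(dataset_metadata_url("https://dataverse.bsc.es/", "doi:10.0000/EXAMPLE"))
print(download_urls("https://dataverse.bsc.es/", sample_dataset))
```

For private datasets, the same requests would also need the `X-Dataverse-key` header carrying your API token.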

2) Verify the dataset is available in your Galaxy history

Once imported (when it turns green), the file appears as a normal Galaxy dataset, usable as a tool input, as a workflow input, or for sharing. To inspect it, click the eye icon on the right side of the dataset.

Galaxy: history panel showing the imported dataset

Configuring a private Dataverse

For working with private datasets, we will use Harvard Dataverse. We will repeat the same steps as we did for creating and configuring BSC Dataverse.

1) Create an account on Harvard Dataverse and acquire an API token.

Harvard Dataverse: API

2) Create a private Dataverse on Harvard Dataverse

Harvard Dataverse: created

3) Add datasets to the newly created Dataverse

Harvard Dataverse: add dataset
Harvard Dataverse: add dataset form
Harvard Dataverse: added dataset

4) Browse the Harvard Dataverse from within Galaxy's file uploader

Harvard Dataverse: added dataset

5) Select the newly created "Training data" repository from the uploader and import the underlying dataset into Galaxy history

Harvard Dataverse: added dataset

Datasets imported from public and private Dataverses can be used directly in any suitable Galaxy tool or workflow, or shared with others.

Significance for reproducible Galaxy tools and workflows

Connecting Dataverse as a direct repository source in Galaxy makes it easier to operationalize FAIR data management:

  • Direct repository access from the Galaxy Upload dialog (browse and search).
  • Less manual file handling (fewer downloads/re-uploads).
  • Clear provenance: imported files become explicit history items.
  • Sharing and reusability: one repository configuration can power many analyses and workflows.
  • Easier sharing of your own data: after you have analysed your material in Galaxy, you can push it to Dataverse.