Integration of the Dataverse with Galaxy
Galaxy users can now connect to a Dataverse as a repository source, browse and search datasets directly from the Upload dialog, import files into histories and utilize them for scientific analyses.
Integration of Dataverse with Galaxy
A common bottleneck in reproducible research is moving data between analysis platforms and FAIR repositories: downloading files locally, re-uploading them to Galaxy, and later repeating the process to publish results back into a repository. We will walk through an example, the Barcelona Supercomputing Center (BSC) Dataverse, and show how to use it from within Galaxy. Public datasets in the Dataverse become directly accessible through Galaxy's Dataverse file source plugin. To show how private data repositories work, we will use Harvard Dataverse.
With Galaxy’s Dataverse integration, you can connect a Dataverse instance (here: the BSC Dataverse) as a file source and then:
- Browse and search dataverse datasets/collections from Galaxy
- Import files directly into a Galaxy history (as normal datasets)
This keeps your analyses reproducible while reducing manual “download/upload loops”.
Pre-configured Dataverses in Galaxy
Note: You can either use the pre-configured Dataverse repositories or create your own. To find the pre-configured ones, click Upload in your activity bar, then Choose from repository, and type Dataverse in the search bar. This is sufficient if you want to reuse existing datasets from a pre-configured Dataverse: you can upload data from it directly to your Galaxy history.
Preconfigured Dataverses in Galaxy:
Configure your own Dataverse in Galaxy
Get your BSC Dataverse API token
To access private content and to enable exports from Galaxy to Dataverse, you’ll need an API token from your Dataverse account. First, create an account on BSC Dataverse. Then create an API token.
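For readers curious about what the token is used for, here is a minimal sketch of how a client could attach it to a call against Dataverse's native API. It assumes the `X-Dataverse-key` request header and the `/api/users/:me` endpoint of the native API; the function names are hypothetical, and this is not a description of how Galaxy's plugin is implemented. The request is only built, not sent.

```python
from urllib.request import Request

# Assumption: the BSC instance named in this post is the target.
BASE_URL = "https://dataverse.bsc.es"


def auth_headers(api_token: str) -> dict:
    """Return the header that carries the API token (hypothetical helper)."""
    if not api_token:
        raise ValueError("API token must not be empty")
    return {"X-Dataverse-key": api_token}


def whoami_request(api_token: str, base_url: str = BASE_URL) -> Request:
    """Build, without sending, a request for the account tied to the token."""
    url = base_url.rstrip("/") + "/api/users/:me"
    return Request(url, headers=auth_headers(api_token))
```

Passing the returned object to `urllib.request.urlopen` would perform the call; a successful response suggests the token is valid for that instance.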
Create your own BSC Dataverse repository in Galaxy
Galaxy stores external repositories under User Preferences → My Repositories. This lets you reuse the same repository across uploads, tools, and workflows.
1) Create a new repository
Go to User Preferences → My Repositories → Create. From the repository options, choose Dataverse.
2) Configure the dataverse connection
Fill in the Dataverse settings:
- Name: a label for your repository (e.g., BSC Dataverse)
- Dataverse instance endpoint: https://dataverse.bsc.es/
- Allow Galaxy to export data to Dataverse?: set to Yes if you want to upload from Galaxy to Dataverse
- Publication Name: creator name used in dataset metadata
- API Token: paste the token from your dataverse account
3) Confirm the repository was created
After saving, your new entry appears in My Repositories.
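The settings from step 2 can be pictured as a small record with a few sanity checks. The sketch below is purely illustrative: the class and field names are hypothetical and do not reflect Galaxy's internal schema. It only mirrors the form fields listed above and the rule that exporting requires an API token.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class DataverseRepositorySettings:
    """Hypothetical mirror of the form fields; not Galaxy's own data model."""
    name: str
    endpoint: str
    allow_export: bool
    publication_name: str
    api_token: str

    def problems(self) -> list:
        """Return a list of human-readable issues; empty means plausible."""
        issues = []
        parsed = urlparse(self.endpoint)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            issues.append("endpoint must be an http(s) URL")
        if not self.name.strip():
            issues.append("name must not be empty")
        if self.allow_export and not self.api_token:
            issues.append("exporting to Dataverse needs an API token")
        return issues
```

Checking values this way before pasting them into the form can catch a mistyped endpoint early, but the form itself remains the source of truth.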
Import datasets from BSC Dataverse into a Galaxy history
Step-by-step: browse/search and import a file
1) Open “Galaxy Upload” and choose “Choose from repository”
From the Galaxy Upload dialog, click Choose from repository and select your configured dataverse source (e.g., BSC Dataverse).
2) Browse Dataverse collections/datasets
You can navigate folders/collections and list available datasets.
3) Select dataset(s) and import
If you are, for example, looking for scans of medieval texts, search for "handwritten" and select the dataset "A Dataset for Handwritten Text Recognition in Medieval Notarial Charters Written on Parchment" by clicking Select.
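The same kind of search can be sketched outside Galaxy with Dataverse's Search API. The example below assumes the `/api/search` endpoint with its `q`, `type`, and `per_page` parameters, and a response whose hits sit under `data.items` with `name` and `global_id` keys. The helper names are hypothetical, and nothing here claims to be what the Galaxy Upload dialog does internally.

```python
from urllib.parse import urlencode


def search_url(base_url: str, query: str, per_page: int = 10) -> str:
    """Build a Search API URL restricted to datasets (hypothetical helper)."""
    params = {"q": query, "type": "dataset", "per_page": per_page}
    return base_url.rstrip("/") + "/api/search?" + urlencode(params)


def dataset_hits(payload: dict) -> list:
    """Extract (title, persistent id) pairs from a decoded search response."""
    items = payload.get("data", {}).get("items", [])
    return [(item.get("name"), item.get("global_id")) for item in items]
```

For the search term used above, `search_url("https://dataverse.bsc.es/", "handwritten")` yields a URL that can be opened in a browser or fetched with any HTTP client; `dataset_hits` then reduces the JSON body to titles and identifiers.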
Export from Dataverse to Galaxy history
1) Add files to the upload queue
Select the files to be uploaded to Galaxy history and add the file(s) to the Galaxy upload queue. Once they turn green you can start working with them.
2) Verify the dataset is available in your Galaxy history
Once imported (when it turns green), the file appears as a normal Galaxy dataset (usable as tool input, workflow input, or for sharing). You can inspect it by clicking the eye icon on the right side of the dataset.
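To cross-check what was imported, the files of a Dataverse dataset can also be listed through the native API. This is a sketch under assumptions: it uses the `/api/datasets/:persistentId/` and `/api/access/datafile/{id}` endpoints, expects the file list under `data.latestVersion.files`, and uses hypothetical helper names and a placeholder DOI.

```python
from urllib.parse import urlencode


def dataset_url(base_url: str, persistent_id: str) -> str:
    """Build the URL that returns a dataset's metadata as JSON."""
    query = urlencode({"persistentId": persistent_id})
    return f"{base_url.rstrip('/')}/api/datasets/:persistentId/?{query}"


def file_download_urls(base_url: str, payload: dict) -> dict:
    """Map each filename in a decoded dataset response to its download URL."""
    files = payload.get("data", {}).get("latestVersion", {}).get("files", [])
    root = base_url.rstrip("/")
    return {
        entry["dataFile"]["filename"]:
            f"{root}/api/access/datafile/{entry['dataFile']['id']}"
        for entry in files
    }
```

Comparing the filenames returned here with the items in the Galaxy history is a quick way to confirm that nothing was missed during import.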
Configuring a private dataverse
For working with private datasets, we will use Harvard Dataverse. We will repeat the same steps as we did for creating and configuring BSC Dataverse.
1) Create an account on Harvard Dataverse and acquire an API token.
2) Create a private Dataverse on Harvard Dataverse
3) Add datasets to the newly created Dataverse
4) Browse the Harvard Dataverse from within Galaxy's file uploader
5) Select the newly created "Training data" repository from the uploader and import the underlying dataset into Galaxy history
The datasets imported from public and private dataverses can be directly used in any suitable Galaxy tool or workflow or for sharing.
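For a private collection, browsing only works with a valid token. A rough sketch of the equivalent API call follows, assuming the `/api/dataverses/{alias}/contents` endpoint and the `X-Dataverse-key` header. The alias `training-data` and the helper names are hypothetical, and the response shape used in `split_contents` (items carrying a `type` of `dataverse` or `dataset`) is an assumption to verify against your instance.

```python
from urllib.request import Request


def contents_request(base_url: str, alias: str, api_token: str) -> Request:
    """Build, without sending, an authenticated 'list contents' request."""
    url = f"{base_url.rstrip('/')}/api/dataverses/{alias}/contents"
    return Request(url, headers={"X-Dataverse-key": api_token})


def split_contents(payload: dict) -> tuple:
    """Separate sub-collection titles from dataset ids in a decoded response."""
    collections, datasets = [], []
    for item in payload.get("data", []):
        if item.get("type") == "dataverse":
            collections.append(item.get("title"))
        elif item.get("type") == "dataset":
            datasets.append(item.get("id"))
    return collections, datasets
```

Without the token header, an unpublished collection would typically not be readable, which is why the token has to be configured in Galaxy before private data can be browsed.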
Significance for reproducible Galaxy tools and workflows
Connecting Dataverse as a direct repository source in Galaxy makes it easier to operationalize FAIR data management:
- Direct repository access from the Galaxy Upload dialog (browse and search).
- Less manual file handling (fewer downloads/re-uploads).
- Clear provenance: imported files become explicit history items.
- Sharing and reusability: one repository configuration can power many analyses and workflows.
- Easier sharing of your own data: after you have analysed your material in Galaxy, you can easily push it to Dataverse.