Message "This version has no files of this type" upon refresh but valid file exists

I refreshed two workflows that were previously registered in Dockstore. When I selected the new version (release tag v0.28.4-beta), the Files tab said “This version has no files of this type”. But the files do exist on GitHub at the link from the Info tab, and I validated them with womtool right before refreshing. I tried refreshing again, and waiting an hour and refreshing again, but the error persists. Can you help me identify the problem?

The two workflows are:

  1. EvidenceQC.wdl on Dockstore (refer to the Info tab for GitHub link). Interestingly, Dockstore also shows no valid file for EvidenceQC.wdl starting from the previous release tag, v0.28.3-beta, but there were no changes to EvidenceQC.wdl or any of its imports between v0.28.2-beta (valid) and v0.28.3-beta.
  2. GenotypeBatch.wdl on Dockstore (refer to the Info tab for GitHub link).

Hello,

We’re still trying to figure out exactly what’s going on, but here’s something you can do to sync them in the meantime:

  1. Go to the accounts page and copy your Dockstore token to your clipboard.
  2. Go to the refresh workflow version section of our API.
  3. Click on the lock icon, and paste the token you copied in step 1.
  4. Click Try it Out, then specify the following parameters:
     - workflowid = 16910 (the id of your workflow)
     - version = v0.28.4-beta
     - hardRefresh = true
  5. Click the Execute button.

The UI makes the same call, but with hardRefresh = false. In my local tests, calling it with hardRefresh = true seems to show the right data after it completes.
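If clicking through the API page gets tedious, the same request can be scripted. This is only a minimal sketch: the endpoint path /workflows/{workflowId}/refresh/{version}, the GET method, and the bearer-token header are assumptions that should be confirmed against the refresh workflow version section of the API page, and build_refresh_request is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

API_BASE = "https://dockstore.org/api"


def build_refresh_request(workflow_id, version, token, hard_refresh=True):
    """Build (but do not send) a refresh request for a single version."""
    # ASSUMPTION: path layout and HTTP method; check the API page before relying on them.
    path = f"/workflows/{workflow_id}/refresh/{quote(version, safe='')}"
    query = urlencode({"hardRefresh": str(hard_refresh).lower()})
    return Request(
        f"{API_BASE}{path}?{query}",
        headers={
            # ASSUMPTION: the Dockstore token is accepted as a bearer token.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


req = build_refresh_request(16910, "v0.28.4-beta", token="<your Dockstore token>")
print(req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

The request is only built here, not sent, so the URL can be inspected before a real token is used.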

We’ll follow up later with more, but I wanted to unblock you for now.

Charles

Hi,

Right after I posted the previous message, a query I was running against our logs came back and revealed the issue: you ran into a GitHub rate limit.

When you click the Refresh button for a workflow, we iterate through every branch and tag in the workflow, reading the primary descriptor, then all of its imports, one file at a time, using a GitHub token that was granted to us when you logged into Dockstore.

You have 169 branches and tags in that repo. For each branch, we’re reading the primary descriptor, resolving/reading imports, running through WomTool, etc. It can be slow to process the whole workflow, and there are a lot of API requests going on.

My best guess is that you may have clicked the refresh button while one refresh was running and inadvertently compounded the problem by causing the GitHub API rate limit to be reached, which prevented us from reading certain files.

We should provide better feedback, but we’re also trying to deprecate the way your workflow is registered.

Here are my suggestions going forward with the way the workflow is registered now:

  1. If possible, only refresh the version you know changed. Go to the versions table, and select Refresh Version from the Action menu for the version you changed. I know this can be limiting; it doesn’t catch new GitHub branches/tags, and it’s a pain if multiple branches/tags have been updated. But do it if you know only one branch/tag has been updated in GitHub.
  2. If you refresh the workflow as a whole, be patient. Give it a long time to complete.

The ultimate/ideal solution is to add a .dockstore.yml to your workflow. See for example gatk. Doing it this way is better because:

  1. Updates happen automatically; you don’t have to click the refresh button at all in Dockstore.
  2. We only update the version for the branch/tag that was updated/added/deleted, so it’s more performant than updating the whole workflow.

Charles

Thank you for your prompt responses! We are working towards switching over to using a .dockstore.yml for this reason but want to be able to update our existing workflows in the meantime.

Your workaround with the API was successful for EvidenceQC - thank you! How can I find the workflow ID number for GenotypeBatch as well? I looked through the workflow page and also tried the getWorkflowByAlias API call, but I guess the inputs I tried for “alias” weren’t right.

I did notice that after refreshing EvidenceQC with the API call, the WDL version column is blank for v0.28.4-beta (it is defined as version 1.0 in the first line of the WDL). Is that expected? Do you anticipate any issues related to that?

Since you identified a GitHub rate limit error, I also tried refreshing GenotypeBatch again normally in case it would work better on a new day. It said that the refresh completed successfully, but v0.28.4-beta is still blank. Would you expect the GitHub rate limit error to occur when refreshing just one workflow one time? I did get an error message saying I had hit the GitHub rate limit one time on Oct 23, but all other times the refresh was reported to be successful (but the file was not found for v0.28.4-beta) - is that still consistent with a rate limit error as the underlying issue, or do you think something else could be going on as well?

Hello,

How can I find the workflow ID number for GenotypeBatch as well?

The id for GenotypeBatch is 18569. To get it, I just opened the browser developer tools and looked for one of the API calls the UI is making. You could also use the API, assuming you have jq installed: curl 'https://dockstore.org/api/workflows/path/workflow/github.com%2Fbroadinstitute%2Fgatk-sv%2F11-GenotypeBatch/published?subclass=BIOWORKFLOW&versionName=v0.28.4-beta' -H 'Accept: application/json' | jq '.id'
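For anyone without curl and jq handy, the same lookup can be sketched in Python. The URL is the one from the curl command above; the helper names published_workflow_url and fetch_workflow_id are hypothetical, and the only detail taken on trust is that the JSON response has a top-level id field (which is what the jq filter reads).

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def published_workflow_url(full_path, version_name, subclass="BIOWORKFLOW"):
    """Build the lookup URL; the workflow path must be percent-encoded, slashes included."""
    encoded_path = quote(full_path, safe="")
    query = urlencode({"subclass": subclass, "versionName": version_name})
    return (
        "https://dockstore.org/api/workflows/path/workflow/"
        f"{encoded_path}/published?{query}"
    )


def fetch_workflow_id(full_path, version_name):
    """Send the request and return the numeric workflow id (needs network access)."""
    req = Request(
        published_workflow_url(full_path, version_name),
        headers={"Accept": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)["id"]


url = published_workflow_url(
    "github.com/broadinstitute/gatk-sv/11-GenotypeBatch", "v0.28.4-beta"
)
print(url)
# fetch_workflow_id("github.com/broadinstitute/gatk-sv/11-GenotypeBatch", "v0.28.4-beta")
```

The slashes in the workflow path have to be encoded as %2F, which is why quote is called with safe="".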

I did notice that after refreshing EvidenceQC with the API call, the WDL version column is blank for v0.28.4-beta (it is defined as version 1.0 in the first line of the WDL). Is that expected? Do you anticipate any issues related to that?

I don’t anticipate any issues, other than it not showing up correctly on the versions table; the important thing is that it is marked valid, which it is – we don’t let you export invalid versions to Terra and other platforms. I logged an issue for it.

Would you expect the GitHub rate limit error to occur when refreshing just one workflow one time?

Probably not for this repo. The rate limit is 5,000 API calls per hour. You have 169 tags and branches. Let’s say 10 API calls to refresh each version; you should be good. We did run into issues back in the day with other Broad repos where they had over 700 tags and branches as I recall, and lots of imports.
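To put rough numbers on that estimate, here is a back-of-the-envelope sketch. The 5,000 calls per hour and the 169 and 700 version counts come from this thread; the figure of 10 calls per version is only the guess made above, and the real number depends on how many imports each descriptor has.

```python
GITHUB_RATE_LIMIT_PER_HOUR = 5_000  # limit quoted in this thread


def estimated_calls(versions, calls_per_version, overlapping_refreshes=1):
    """Rough API-call count for refreshing every branch and tag of a workflow."""
    return versions * calls_per_version * overlapping_refreshes


def max_calls_per_version(versions, limit=GITHUB_RATE_LIMIT_PER_HOUR):
    """Largest per-version call count that still fits within the hourly limit."""
    return limit // versions


one_refresh = estimated_calls(169, 10)       # a single full refresh of this repo
three_refreshes = estimated_calls(169, 10, 3)  # refreshes stacked within the same hour
large_repo = estimated_calls(700, 10)        # the older 700-version case

print(one_refresh, one_refresh <= GITHUB_RATE_LIMIT_PER_HOUR)
print(three_refreshes, three_refreshes <= GITHUB_RATE_LIMIT_PER_HOUR)
print(large_repo, large_repo <= GITHUB_RATE_LIMIT_PER_HOUR)
print(max_calls_per_version(169), max_calls_per_version(700))
```

Under these assumptions a single refresh of 169 versions stays well inside the limit, while repeated refreshes in the same hour, or a much larger repo, can exceed it.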

Since you identified a GitHub rate limit error, I also tried refreshing GenotypeBatch again normally in case it would work better on a new day. It said that the refresh completed successfully, but v0.28.4-beta is still blank.

I think if you call the API with hardRefresh=true, it will work. When you do it via the UI, hardRefresh is set to false. When we update a version, we store the commit we processed; if an update is requested again and the commit is unchanged, we skip it. If hardRefresh is true, we process it anyway. So the problem is that we had stored the version with no content, because fetching the content failed. Our error handling for rate limit errors isn’t ideal. But we’ve been trying to encourage users to move towards the .dockstore.yml model, where we haven’t run into rate limit errors, both because GitHub apps have a higher rate limit and because the processing is more efficient.
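The behavior described here can be modeled in a few lines. This is a simplified illustration, not Dockstore's actual code: StoredVersion, RateLimitError, refresh_version, and the commit value abc123 are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class RateLimitError(Exception):
    """Stand-in for a GitHub rate-limit failure (hypothetical name)."""


@dataclass
class StoredVersion:
    name: str
    commit: Optional[str] = None
    files: List[str] = field(default_factory=list)


def refresh_version(stored, head_commit, fetch_files, hard_refresh=False):
    # A soft refresh skips any version whose commit was already processed.
    if not hard_refresh and stored.commit == head_commit:
        return "skipped"
    try:
        files = fetch_files()
    except RateLimitError:
        files = []  # the fetch failed, but the version is still saved below
    stored.commit = head_commit
    stored.files = files
    return "updated"


def rate_limited():
    raise RateLimitError("API rate limit exceeded")


def healthy():
    return ["GenotypeBatch.wdl"]


version = StoredVersion("v0.28.4-beta")
first = refresh_version(version, "abc123", rate_limited)   # stores an empty version
second = refresh_version(version, "abc123", healthy)       # soft refresh: same commit
files_after_soft = list(version.files)
third = refresh_version(version, "abc123", healthy, hard_refresh=True)
print(first, second, files_after_soft, third, version.files)
```

In this model the second, soft refresh reports success without touching the version, which matches a refresh that "completed successfully" while the Files tab stayed empty; only the hard refresh re-reads the files.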

I logged another issue to follow up on this.

Thanks!

Charles

Thank you for your help and explanations! I was able to refresh GenotypeBatch with the API as well, so all set for now, and will move the .dockstore.yml setup higher on our priorities list to avoid future issues.
