Evaluate ClinVar Test API #6

Open
korikuzma opened this issue May 1, 2024 · 16 comments
Labels
Epic requirement A requirement for the project

Comments

@korikuzma
Collaborator

korikuzma commented May 1, 2024

GitHub repo: https://github.com/ncbi/clinvar/tree/master/submission_api_schema
Test endpoint: https://submit.ncbi.nlm.nih.gov/apitest/v1/submissions
Submission API documentation: https://www.ncbi.nlm.nih.gov/clinvar/docs/api_http/
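A minimal sketch of what a call to the test endpoint might look like, using only the Python standard library. It builds the request but does not send it. The `SP-API-KEY` header, the `actions`/`AddData`/`targetDb` wrapper, and the `dry-run=true` query parameter are assumptions based on my reading of the Submission API documentation linked above and have not been verified here. `build_submission_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Test endpoint from this issue.
TEST_URL = "https://submit.ncbi.nlm.nih.gov/apitest/v1/submissions"


def build_submission_request(
    api_key: str, content: dict, dry_run: bool = False
) -> urllib.request.Request:
    """Build (but do not send) a POST to the ClinVar test submission endpoint.

    ASSUMPTIONS (check against the API docs): the key goes in an
    "SP-API-KEY" header, the payload is wrapped in an AddData action
    targeting "clinvar", and "?dry-run=true" validates without submitting.
    """
    url = TEST_URL + ("?dry-run=true" if dry_run else "")
    body = {
        "actions": [
            {
                "type": "AddData",
                "targetDb": "clinvar",
                "data": {"content": content},
            }
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "SP-API-KEY": api_key},
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(req)`. A successful submission should return a submission ID that we can store for later status checks, but the exact response shape still needs to be confirmed against the test instance once we have an API key.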

@ahwagner said:

  • Focus should be on what the submission and response look like, keeping in mind how we will track submissions and reference them for status and revision requests.
  • We also want to check that the data appears in the test instance as expected following submission.
  • If time allows, we should try sending malformed data to see what failed validation looks like.
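On the tracking point, one possible shape for the bookkeeping is a small index from our own record IDs to the submission IDs the API returns, with a status history we can consult when making status or revision requests. This is a hypothetical, in-memory sketch. `SubmissionTracker`, `SubmissionRecord`, and the `SUB...` ID format are illustrative assumptions, and a real version would need to persist this data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubmissionRecord:
    """Hypothetical link between one of our records and a ClinVar submission."""

    local_id: str  # e.g. a CIViC AID or VarCat assertion ID (illustrative)
    submission_id: str  # ID returned by the submission API (assumed format)
    status: str = "submitted"
    history: List[str] = field(default_factory=list)  # previous statuses


class SubmissionTracker:
    """Minimal in-memory index; nothing is persisted."""

    def __init__(self) -> None:
        self._by_local: Dict[str, SubmissionRecord] = {}

    def record(self, local_id: str, submission_id: str) -> SubmissionRecord:
        rec = SubmissionRecord(local_id, submission_id)
        self._by_local[local_id] = rec
        return rec

    def update_status(self, local_id: str, status: str) -> SubmissionRecord:
        rec = self._by_local[local_id]
        rec.history.append(rec.status)  # keep the old status before overwriting
        rec.status = status
        return rec

    def submission_for(self, local_id: str) -> str:
        return self._by_local[local_id].submission_id
```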

Additional notes:

  • Test submissions get loaded to a pre-production server, so they'll never go public in ClinVar.
  • Submissions are not purged on this server, but the processed data (variants, conditions, etc.) is refreshed.
  • Status reports, errors, etc. can still be retrieved even after the data on this pre-production server is refreshed.

@korikuzma added the Epic and requirement labels on May 1, 2024
@korikuzma self-assigned this on May 1, 2024
@korikuzma
Collaborator Author

korikuzma commented May 3, 2024

"May be the name of a specific drug, e.g. ruxolitinib, or a drug class, e.g. JAK inhibitors. Multiple terms are allowed only to represent combination therapies, e.g. cisplatin and vinorelbine for non-small cell lung cancer. Separate multiple terms with a semi-colon."

https://github.com/ncbi/clinvar/blob/master/submission_api_schema/submission_apitest_schema.json#L530-L533

For drugForTherapeuticAssertion in clinicalImpactClassification: it looks like you can only represent a single therapy or a single combination therapy. In GKS we have a TherapeuticSubstituteGroup, which we use in CIViC, and it has no equivalent here.
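To make the limitation concrete: per the schema description quoted above, the field is a single string, and multiple terms are semicolon-separated only for a combination therapy. A sketch of serializing one therapy with a hypothetical helper; the schema text doesn't say whether a space is expected after the semicolon, so this assumes none.

```python
from typing import Iterable


def drug_for_therapeutic_assertion(drug_names: Iterable[str]) -> str:
    """Serialize ONE therapy (single drug or combination) to the schema's string form.

    There is no way to express "any one of these drugs" (a substitute group)
    in this string; that would need separate records.
    """
    names = [name.strip() for name in drug_names if name.strip()]
    if not names:
        raise ValueError("at least one drug name is required")
    if any(";" in name for name in names):
        # A ';' inside a name would be misread as a term separator.
        raise ValueError("drug names must not contain ';'")
    return ";".join(names)
```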

@korikuzma
Collaborator Author

On a related note, you are only required to provide the name of the drug. However, in other places, such as disease/gene, the schema lets you provide either database identifiers (an array of items) or a name as a string. I'm curious why they didn't follow a similar approach here.
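For contrast, the disease/gene pattern is an either/or choice between identifiers and a name. Here is a hypothetical helper showing that shape. The `db`/`id`/`name` keys are placeholders, not the schema's actual property names. Identifiers are coerced to strings because the test API rejected a bare integer in one of our dry runs (`5233 is not of type 'string'`).

```python
from typing import Iterable, Optional, Tuple, Union


def entity_reference(
    db_ids: Optional[Iterable[Tuple[str, Union[str, int]]]] = None,
    name: Optional[str] = None,
) -> Union[list, dict]:
    """Build an entity reference from EITHER database identifiers OR a name.

    Key names are placeholders; check the real schema before use.
    """
    ids = list(db_ids or [])
    if bool(ids) == bool(name):
        raise ValueError("provide exactly one of db_ids or name")
    if ids:
        # Array of identifier objects, with IDs serialized as strings.
        return [{"db": db, "id": str(ident)} for db, ident in ids]
    return {"name": name}
```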

@wesleygoar
Collaborator

That's odd that they didn't apply a consistent approach everywhere.

@korikuzma
Collaborator Author

It also appears that the spreadsheets allow more information to be submitted. For instance, the spreadsheets have a "Variant - more" section with information such as Variation identifiers, Alternate designations, and URL, but I cannot seem to find this information in the apitest jsonschema.

@jsstevenson
Collaborator

Is this because "Apparently, the NCBI ClinVar server has no tight coupling to the JSON schemas and these schemas are mostly for informative purposes."?

@korikuzma
Collaborator Author

Ugh

@korikuzma
Collaborator Author

I was just hoping I was blind

@ahwagner
Member

ahwagner commented May 3, 2024

> For drugForTherapeuticAssertion in clinicalImpactClassification: it looks like you can only represent a single therapy or combination therapy. In GKS we have a TherapeuticSubstituteGroup, which we use in CIViC.
>
> On a related note, you are only required to provide the name of the drug. However, in other places such as providing disease/gene it allows you to provide database identifiers (array of items) OR name as a string. I'm curious why they didn't follow a similar approach here.

We should compile a list of questions like this and send along to our ClinVar contacts for clarification.

@korikuzma
Collaborator Author

@ahwagner sounds good. I wanted to run this by y'all before doing so.

@korikuzma
Collaborator Author

I don't think there's a way to submit our evidence / evidence line data (using ClinGen/CGC/VICC SOP onco codes) at the moment. I'm only seeing a citation field and it's pretty limited.

@ahwagner had said "The VCI folks have ended up encoding pathogenicity codes as free-text string comments"

@korikuzma
Collaborator Author

By the way, I have not been testing the API directly yet. I have been manually creating submission requests for a CIViC EID and a VarCat assertion to use as test fixtures. I sent a request for a service account this morning. Once it's approved, we'll get our API key and be able to test the submission API.

@ahwagner
Member

ahwagner commented May 3, 2024

> I don't think there's a way to submit our evidence / evidence line data (using ClinGen/CGC/VICC SOP onco codes) at the moment. I'm only seeing a citation field and it's pretty limited.
>
> @ahwagner had said "The VCI folks have ended up encoding pathogenicity codes as free-text string comments"

To be clear, the VCI folks flatten their data structure this way to accommodate the ClinVar model; natively, they encode a rich evidence/provenance structure similar to VarCat. Here's an example record from ClinGen/VCI in ClinVar:
[Image: example ClinGen/VCI record in ClinVar]
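If we end up following the VCI approach, flattening our evidence lines into a free-text comment might look roughly like the sketch below. The `(code, strength, note)` tuple layout and the output format are hypothetical. Any codes passed in are just ClinGen/CGC/VICC-style labels, and nothing here claims which codes apply to any variant.

```python
from typing import Iterable, Tuple


def flatten_evidence_codes(codes: Iterable[Tuple[str, str, str]]) -> str:
    """Render (code, strength, note) tuples as one free-text comment string.

    The output format is illustrative only; ClinVar just sees a string.
    """
    parts = []
    for code, strength, note in codes:
        part = f"{code} ({strength})"
        if note:
            part += f": {note}"
        parts.append(part)
    if not parts:
        return ""
    return "Evidence codes met: " + "; ".join(parts)
```

The trade-off is that only the code labels and notes survive. Which evidence supports which line, and its provenance, is lost in the flattened string.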

@korikuzma
Collaborator Author

> For drugForTherapeuticAssertion in clinicalImpactClassification: it looks like you can only represent a single therapy or combination therapy. In GKS we have a TherapeuticSubstituteGroup, which we use in CIViC.

How the CIViC team handles this: "We just turn substitutes into separate AIDs"
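A sketch of that CIViC-style workaround: expand a substitute group into one record per alternative therapy, each with its own drug string. The assertion dict layout is hypothetical; only the `clinicalImpactClassification` and `drugForTherapeuticAssertion` names come from the schema discussion above. Each therapy is given as a list of drug names, where more than one name means a combination.

```python
import copy
from typing import List


def split_substitutes(assertion: dict, substitutes: List[List[str]]) -> List[dict]:
    """Return one copy of `assertion` per therapy in the substitute group."""
    records = []
    for therapy in substitutes:
        rec = copy.deepcopy(assertion)  # leave the original untouched
        impact = rec.setdefault("clinicalImpactClassification", {})
        # Combination therapies use ';' between drug names (per the schema text).
        impact["drugForTherapeuticAssertion"] = ";".join(therapy)
        records.append(rec)
    return records
```

ClinVar then sees N unrelated records, so the grouping would have to be tracked on our side.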

@korikuzma
Collaborator Author

> If time allows, we should try sending malformed data to see what failed validation looks like.

As I have been performing dry runs on the test API, I've been testing validation both by accident and on purpose. Here are some examples of the output (I haven't been saving the malformed input data):

 'errors': [{'message': "'recordStatus' is a required property",
   'code': None,
   'identifier': None},
  {'message': "Unevaluated properties are not allowed ('description', 'direction', 'id', 'isReportedIn', 'predicate', 'qualifiers', 'specifiedBy', 'strength', 'therapeutic', 'tumorType', 'type', 'variant' were unexpected)",
   'code': None,
   'identifier': None}]}

{'message': 'Validation failed, see errors for detailed description',
 'errors': [{'message': "5233 is not of type 'string'", 'code': None, 'identifier': None}]}

{'message': 'Validation failed, see errors for detailed description',
 'errors': [{'message': "{'db': 'PubMed', 'id': '25265492'} is not of type 'array'", 'code': None, 'identifier': None},
  {'message': "{'gene': [{'id': 2778}]} is not valid under any of the given schemas", 'code': None, 'identifier': None},
  {'message': "'Tier I - strong' is not one of ['Tier I - Strong', 'Tier II - Potential', 'Tier III - Unknown', 'Tier IV - Benign/Likely benign']", 'code': None, 'identifier': None}]}
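Based only on the outputs above, failed validation comes back as a dict with a top-level `message` and an `errors` list whose items carry `message`, `code`, and `identifier`. All `code`/`identifier` values were `None` in these runs. A small hypothetical helper for turning that into readable lines, e.g. for logging dry runs:

```python
from typing import List


def summarize_validation_errors(response: dict) -> List[str]:
    """Flatten a failed-validation response into one line per error."""
    lines = []
    for err in response.get("errors") or []:
        # code/identifier were always None in our dry runs; include them if set.
        tags = [str(v) for v in (err.get("code"), err.get("identifier")) if v is not None]
        message = err.get("message", "")
        lines.append(f"[{' '.join(tags)}] {message}" if tags else message)
    return lines
```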

@korikuzma
Collaborator Author

korikuzma commented May 10, 2024

> We also want to check that the data appears in the test instance as expected following submission.

> @wesleygoar to review first and once approved, we'll send along to @ahwagner for secondary review

@ahwagner, since the test instance doesn't appear to let you see the data, I just added you as a reviewer for #8 and #9 (@wesleygoar approved both) so you can review the data we would submit. Once they're approved, I can submit to the test instance.

@korikuzma
Collaborator Author

#8, #9, and #11 will close this.


4 participants