DOI Organizer: make it easier to identify offending item + what's wrong with the item #9827

Closed
bram-atmire opened this issue Sep 16, 2024 · 2 comments · Fixed by #9834
Labels: bug, error handling (How errors are handled and/or logged), identifier: DOIs (Related to integration with DOIs)

Comments

@bram-atmire
Member

Describe the bug

When the DOI Organizer errors out, a typical error signature looks like:

2024-09-13 06:10:45,583 WARN  unknown unknown org.dspace.identifier.doi.DataCiteConnector @ While reserving the DOI doi:10.13025/14703, we got a http status code 422 and the message "DOI 10.13025/14703: This element is not expected. Expected is one of ( {http://datacite.org/schema/kernel-4}creators, {http://datacite.org/schema/kernel-4}titles, {http://datacite.org/schema/kernel-4}publisher, {http://datacite.org/schema/kernel-4}publicationYear, {http://datacite.org/schema/kernel-4}resourceType, {http://datacite.org/schema/kernel-4}subjects, {http://datacite.org/schema/kernel-4}contributors, {http://datacite.org/schema/kernel-4}dates, {http://datacite.org/schema/kernel-4}language, {http://datacite.org/schema/kernel-4}alternateIdentifiers ). at line 4, column 0".
2024-09-13 06:10:47,664 ERROR unknown unknown org.dspace.identifier.doi.DOIOrganiser @ It wasn't possible to update this identifier:  doi:10.13025/14703 Exceptions code:  BAD_ANSWER
org.dspace.identifier.doi.DOIIdentifierException: Unable to parse an answer from DataCite API. Please have a look into DSpace logs.
	at org.dspace.identifier.doi.DataCiteConnector.reserveDOI(DataCiteConnector.java:467) ~[dspace-api-7.6.jar:7.6]
	at org.dspace.identifier.doi.DataCiteConnector.updateMetadata(DataCiteConnector.java:538) ~[dspace-api-7.6.jar:7.6]
	at org.dspace.identifier.doi.DOIOrganiser.update(DOIOrganiser.java:571) [dspace-api-7.6.jar:7.6]
	at org.dspace.identifier.doi.DOIOrganiser.runCLI(DOIOrganiser.java:271) [dspace-api-7.6.jar:7.6]
	at org.dspace.identifier.doi.DOIOrganiser.main(DOIOrganiser.java:103) [dspace-api-7.6.jar:7.6]

There are two problems with this:

  1. If the DOI is not yet on a live item, it is hard to identify the offending item from doi:10.13025/14703 alone, because that requires a lookup in the DOI table. It would be great if the item UUID were logged alongside the error.

  2. The error message doesn't make clear what is wrong with the item's metadata. Causes we have seen include null or empty metadata values for a specific field and the same DOI being present twice, but there are other cases as well.

It would be really helpful if the error statement made clear which metadata field or value is causing the problem.
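
To illustrate the first point, here is a minimal sketch of a log message that names the item as well as the DOI. `DoiErrorMessage` and `describe` are hypothetical names for illustration, not existing DSpace classes or methods, and the sketch assumes the caller can already resolve the item UUID that the DOI row points to.

```java
import java.util.UUID;

/** Hypothetical helper (not part of DSpace): builds a log line that names the item too. */
final class DoiErrorMessage {
    private DoiErrorMessage() { }

    /**
     * @param doi       the DOI as logged today, e.g. "doi:10.13025/14703"
     * @param itemId    UUID of the object the DOI belongs to, or null if none is linked
     * @param errorCode the exception code name, e.g. "BAD_ANSWER"
     */
    static String describe(String doi, UUID itemId, String errorCode) {
        // Say explicitly when no object is linked, rather than printing "null".
        String item = (itemId == null) ? "unknown (no object linked to this DOI)" : itemId.toString();
        return String.format(
                "It wasn't possible to update this identifier: %s (item uuid: %s). Exception code: %s",
                doi, item, errorCode);
    }
}
```

With the UUID in the message, the item could be found directly instead of through a lookup in the DOI table.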

To Reproduce

Steps to reproduce the behavior:

  1. Make sure DOI registration is configured and active for new items
  2. Submit an item with an empty value for a field such as dc.contributor.author, or put the same DOI value in two separate instances of dc.identifier.uri
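
The two cases in the steps above (an empty value, and the same value appearing twice in one field) could be detected before anything is sent to the registrar. The sketch below is only an illustration: `MetadataPreflight` is a hypothetical class, and it assumes the item's metadata has already been flattened into a map from field name to values, which is not how DSpace represents it internally.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical pre-flight check (not part of DSpace) for the two cases named in this issue. */
final class MetadataPreflight {
    private MetadataPreflight() { }

    /** Returns one human-readable line per problem, each naming the offending field. */
    static List<String> findProblems(Map<String, List<String>> metadata) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            String field = entry.getKey();
            Set<String> seen = new HashSet<>();
            for (String value : entry.getValue()) {
                if (value == null || value.trim().isEmpty()) {
                    // Case 1: null or empty metadata value.
                    problems.add(field + ": empty value");
                } else if (!seen.add(value.trim())) {
                    // Case 2: the same value (e.g. the DOI) present twice in one field.
                    problems.add(field + ": duplicate value \"" + value.trim() + "\"");
                }
            }
        }
        return problems;
    }
}
```

Logging the returned lines next to the DOI would point straight at the field to fix. Other causes of a 422 response would not be caught by a check like this.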

Expected behavior

The offending item and offending metadata field should be clear from the log, so that the errors are easily resolved.

@bram-atmire added the bug and needs triage labels Sep 16, 2024
@github-project-automation bot moved this to 🆕 Triage in DSpace Backlog Sep 16, 2024
@tdonohue added the identifier: DOIs, error handling, and help wanted labels and removed the needs triage label Sep 16, 2024
@mwoodiupui
Member

The cited error response is mostly the error message that the registrar's service got from some schema-driven XML parser. We shouldn't depend on knowing or guessing what parser they use today, so it is risky to do more than to display the document we sent together with their response.

We could acquire a copy of the schema that we intend to follow in our crosswalk, and parse the crosswalk output with it to check validity before sending. That might give us a chance to display better information about any syntactic problems.
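
A rough sketch of what that pre-send check might look like, using the JDK's standard `javax.xml.validation` API. `CrosswalkValidator` is a hypothetical class name, and passing the schema and document as strings is an assumption made to keep the example self-contained; a real check would load the schema copy we ship and validate the actual crosswalk output.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical pre-send check (not part of DSpace): validates XML against an XSD. */
final class CrosswalkValidator {
    private CrosswalkValidator() { }

    /** Returns one line per validation problem; an empty list means the document is valid. */
    static List<String> validate(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Schema could not be compiled", e);
        }
        final List<String> problems = new ArrayList<>();
        Validator validator = schema.newValidator();
        // Collect errors instead of stopping at the first one.
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { problems.add(describe("warning", e)); }
            @Override public void error(SAXParseException e) { problems.add(describe("error", e)); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            problems.add(describe("fatal", e));   // e.g. malformed XML
        } catch (SAXException | IOException e) {
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    private static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                + ": " + e.getMessage();
    }
}
```

The messages would then come from a parser we choose ourselves rather than from whatever the registrar happens to run, though they are still schema-level messages and do not name the DSpace metadata field.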

@mwoodiupui
Member

Second thought: rather than generating a schema-driven parser just to check for problems, simply log the metadata document as part of the error message and let people use their preferred tools to examine it.
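
Something along these lines, as a sketch only. `DoiFailureReport` is a hypothetical helper, not the code in DataCiteConnector. It numbers the lines of the document we sent, on the assumption that the registrar's "at line 4, column 0" refers to that same document, so the position can be matched by eye.

```java
/** Hypothetical helper (not part of DSpace): error message that includes the document sent. */
final class DoiFailureReport {
    private DoiFailureReport() { }

    static String build(String doi, int status, String response, String sentXml) {
        StringBuilder sb = new StringBuilder();
        sb.append("While reserving the DOI ").append(doi)
          .append(", we got a http status code ").append(status)
          .append(" and the message \"").append(response).append("\".\n")
          .append("Metadata document sent:\n");
        // Prefix each line with its 1-based number so parser positions are easy to find.
        String[] lines = sentXml.split("\\R", -1);
        for (int i = 0; i < lines.length; i++) {
            sb.append(String.format("%4d| %s\n", i + 1, lines[i]));
        }
        return sb.toString();
    }
}
```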

@tdonohue moved this from 📋 To Do to 🏗 In Progress in DSpace 8.x and 7.6.x Maintenance Sep 19, 2024
@github-project-automation bot moved this from 🏗 In Progress to ✅ Done in DSpace 8.x and 7.6.x Maintenance Dec 17, 2024
@tdonohue added this to the 7.6.3 milestone Dec 17, 2024
@tdonohue removed the help wanted label Dec 17, 2024
3 participants