validate-references

pjt222
Updated 2 days ago

About

This skill validates BibTeX bibliography files by checking required fields, resolving DOIs via CrossRef, and testing URL accessibility. It flags problems such as duplicates, missing abstracts, and formatting inconsistencies. Use it to audit .bib files before submission, after merging sources, or as a CI check under version control.

Quick Install

Claude Code

Recommended (primary):
npx skills add pjt222/agent-almanac -a claude-code
Plugin command (alternative):
/plugin add https://github.com/pjt222/agent-almanac
Git clone (alternative):
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/validate-references

Copy and paste the command into Claude Code to install this skill.

Documentation

Validate References

Check BibTeX bibliography entries for completeness, accuracy, and consistency. This skill covers verifying required fields per entry type, resolving DOIs via the CrossRef API, checking URL accessibility, detecting duplicate entries, and producing a structured validation report that flags issues by severity. It ensures that .bib files are publication-ready before rendering.

When to Use

  • Preparing a manuscript bibliography for journal submission
  • Auditing a shared .bib file for quality before a project milestone
  • After merging bibliographies from multiple sources
  • When citations render incorrectly and you need to diagnose .bib issues
  • As a CI check on .bib files in version-controlled projects

Inputs

  • Required: Path to a .bib file
  • Optional: Validation level (basic, standard, strict; default: standard)
  • Optional: Whether to check DOI resolution online (default: TRUE)
  • Optional: Whether to check URL accessibility (default: TRUE)
  • Optional: Output report path (default: prints to console)
  • Optional: CrossRef API email for polite pool (recommended for large files)
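
The inputs above could be gathered into one configuration object before the procedure runs. The following is a minimal sketch under that assumption; the function name `validation_config` and its argument names are illustrative and do not come from the skill itself.

```r
# Hypothetical helper: collects the documented inputs with their defaults.
validation_config <- function(bib_path,
                              level = c("standard", "basic", "strict"),
                              check_dois = TRUE,
                              check_urls = TRUE,
                              report_path = NULL,
                              crossref_email = NULL) {
  level <- match.arg(level)  # the first choice ("standard") is the default
  stopifnot(
    is.character(bib_path), length(bib_path) == 1L,
    is.logical(check_dois), length(check_dois) == 1L,
    is.logical(check_urls), length(check_urls) == 1L
  )
  list(
    bib_path = bib_path,
    level = level,
    check_dois = check_dois,
    check_urls = check_urls,
    report_path = report_path,        # NULL means print to console
    crossref_email = crossref_email   # NULL means no polite-pool contact
  )
}

cfg <- validation_config("references.bib", level = "strict", check_urls = FALSE)
```

Using `match.arg()` rejects an unknown validation level early, before any network checks start.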

Procedure

Step 1: Install and Load Required Packages

required_packages <- c("RefManageR", "httr2", "curl")
missing <- required_packages[!vapply(required_packages, requireNamespace,
                                     logical(1), quietly = TRUE)]
if (length(missing) > 0) install.packages(missing)

library(RefManageR)

Got: All packages load without errors.

If fail: If httr2 is unavailable, install it with install.packages("httr2"). For systems without curl headers: sudo apt install libcurl4-openssl-dev.

Step 2: Parse and Inventory the Bibliography

bib <- RefManageR::ReadBib("references.bib", check = FALSE)
message(sprintf("Loaded %d entries from references.bib", length(bib)))

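# Inventory entry types. The bibtype attribute is stored on each underlying
# list element, so iterate over unclass(bib) rather than the BibEntry itself.
entry_types <- vapply(unclass(bib), function(x) tolower(attr(x, "bibtype")), character(1))
type_counts <- sort(table(entry_types), decreasing = TRUE)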
message("Entry types:")
for (type in names(type_counts)) {
  message(sprintf("  %s: %d", type, type_counts[[type]]))
}

Got: Summary of entry types (article, book, inproceedings, etc.) and total count matching the number of @type{ blocks in the file.

If fail: Parsing errors indicate malformed BibTeX. Check for unmatched braces, missing commas between fields, or invalid UTF-8 characters.
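
When parsing fails, a quick brace count can narrow down where the file is malformed. The sketch below uses base R only; `brace_balance` and `first_unbalanced_line` are hypothetical helper names, and the check deliberately ignores escaped braces and braces inside comments.

```r
# Hypothetical helper: net brace balance of BibTeX source text (0 = balanced).
brace_balance <- function(text) {
  chars <- strsplit(paste(text, collapse = "\n"), "", fixed = TRUE)[[1]]
  sum(chars == "{") - sum(chars == "}")
}

# Hypothetical helper: first line at which closing braces outnumber opening
# ones, or NA if that never happens.
first_unbalanced_line <- function(lines) {
  opens  <- lengths(regmatches(lines, gregexpr("{", lines, fixed = TRUE)))
  closes <- lengths(regmatches(lines, gregexpr("}", lines, fixed = TRUE)))
  bad <- which(cumsum(opens - closes) < 0)
  if (length(bad) > 0) bad[1] else NA_integer_
}

# Made-up sample: the title field is missing its closing brace.
sample_lines <- c(
  "@article{smith2020,",
  "  title = {An Example Title,",
  "  year = {2020}",
  "}"
)
balance <- brace_balance(sample_lines)  # positive: an opening brace was never closed

# On a real file: brace_balance(readLines("references.bib"))
```

A positive balance points to a missing closing brace, a negative one to a stray closing brace.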

Step 3: Validate Required Fields per Entry Type

# BibTeX required fields by entry type
required_fields <- list(
  article       = c("author", "title", "journal", "year"),
  book          = c("author", "title", "publisher", "year"),
  inproceedings = c("author", "title", "booktitle", "year"),
  incollection  = c("author", "title", "booktitle", "publisher", "year"),
  phdthesis     = c("author", "title", "school", "year"),
  mastersthesis = c("author", "title", "school", "year"),
  techreport    = c("author", "title", "institution", "year"),
  misc          = c("author", "title", "year"),
  unpublished   = c("author", "title", "note")
)

validate_fields <- function(bib) {
  issues <- list()
  keys <- names(bib)
  # Work on the plain list so fields and the bibtype attribute are read
  # directly instead of going through BibEntry's key-based [[ method
  entries <- unclass(bib)
  for (i in seq_along(entries)) {
    key <- keys[i]
    entry <- entries[[i]]
    entry_type <- tolower(attr(entry, "bibtype"))
    req <- required_fields[[entry_type]]
    if (is.null(req)) {
      issues[[length(issues) + 1]] <- list(
        key = key, severity = "warning",
        message = sprintf("Unknown entry type: %s", entry_type)
      )
      next
    }
    for (field in req) {
      value <- entry[[field]]
      # Multi-author fields convert to several strings, so reduce with any()
      if (is.null(value) || !any(nzchar(trimws(as.character(value))))) {
        issues[[length(issues) + 1]] <- list(
          key = key, severity = "error",
          message = sprintf("Missing required field: %s (type: %s)", field, entry_type)
        )
      }
    }
  }
  issues
}

field_issues <- validate_fields(bib)
message(sprintf("Field validation: %d issues found", length(field_issues)))

Got: A list of issues where required fields are missing. Zero issues for a well-maintained bibliography.

If fail: This step runs locally and should not fail. If it does, check that the .bib file parsed correctly in Step 2.

Step 4: Resolve and Validate DOIs

validate_dois <- function(bib, email = NULL) {
  issues <- list()

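  # Set polite API headers. CrossRef reads the contact address from a
  # "mailto:" entry in the User-Agent (or a mailto query parameter),
  # not from a separate "mailto" header.
  user_agent <- "R-bibliography-validator/1.0"
  if (!is.null(email)) {
    user_agent <- sprintf("%s (mailto:%s)", user_agent, email)
  }
  headers <- list(`User-Agent` = user_agent)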

  for (i in seq_along(bib)) {
    key <- names(bib)[i]
    doi <- bib[[i]]$doi
    if (is.null(doi) || !nzchar(doi)) {
      issues[[length(issues) + 1]] <- list(
        key = key, severity = "info",
        message = "No DOI present"
      )
      next
    }

    # Normalize DOI
    doi <- gsub("^https?://doi\\.org/", "", doi)
    doi <- gsub("^doi:", "", doi, ignore.case = TRUE)
    doi <- trimws(doi)

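    # Resolve via CrossRef
    tryCatch({
      resp <- httr2::request(sprintf("https://api.crossref.org/works/%s", doi)) |>
        httr2::req_headers(!!!headers) |>
        httr2::req_timeout(10) |>
        # Return 4xx/5xx responses instead of throwing, so the status check below runs
        httr2::req_error(is_error = function(resp) FALSE) |>
        httr2::req_perform()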

      if (httr2::resp_status(resp) != 200) {
        issues[[length(issues) + 1]] <- list(
          key = key, severity = "error",
          message = sprintf("DOI does not resolve: %s (HTTP %d)", doi,
                            httr2::resp_status(resp))
        )
      }
    }, error = function(e) {
      issues[[length(issues) + 1]] <<- list(
        key = key, severity = "warning",
        message = sprintf("DOI check failed for %s: %s", doi, e$message)
      )
    })

    Sys.sleep(0.5)  # Rate limiting
  }
  issues
}

# Only run online checks if requested (replace the placeholder address with your own)
check_dois <- TRUE
doi_issues <- if (check_dois) validate_dois(bib, email = "you@example.com") else list()
message(sprintf("DOI validation: %d issues found", length(doi_issues)))

Got: Each DOI resolves successfully (HTTP 200 from CrossRef). Entries without DOIs are flagged as informational.

If fail: Network errors or rate limiting produce warnings rather than hard failures. Set the email parameter for higher rate limits from CrossRef's polite pool.
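
The normalization inside Step 4 can also be pulled out and paired with an offline syntax check, so obviously malformed DOIs are caught without a network call. This is a sketch under stated assumptions: `normalize_doi` and `looks_like_doi` are hypothetical names, handling of the older `dx.doi.org` prefix is an addition, and the pattern is a loose heuristic rather than a full DOI grammar.

```r
# Hypothetical helper: strip resolver prefixes and lowercase for comparison
# (DOIs are case-insensitive).
normalize_doi <- function(doi) {
  doi <- trimws(doi)
  doi <- sub("^https?://(dx\\.)?doi\\.org/", "", doi, ignore.case = TRUE)
  doi <- sub("^doi:[[:space:]]*", "", doi, ignore.case = TRUE)
  tolower(trimws(doi))
}

# Loose syntactic check: "10.", a 4-9 digit registrant code, "/", then a suffix.
looks_like_doi <- function(doi) {
  grepl("^10\\.[0-9]{4,9}/[^[:space:]]+$", doi)
}

# Made-up identifiers for illustration only.
raw <- c("https://doi.org/10.1234/ABC.5", " doi:10.1234/abc.5 ", "not-a-doi")
clean <- normalize_doi(raw)
valid <- looks_like_doi(clean)
```

Both helpers are vectorized, so they can run over every entry's `doi` field in one call before the slower online loop.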

Step 5: Check URL Accessibility

validate_urls <- function(bib) {
  issues <- list()

  for (i in seq_along(bib)) {
    key <- names(bib)[i]
    url <- bib[[i]]$url

    if (is.null(url) || !nzchar(url)) next

    tryCatch({
      resp <- httr2::request(url) |>
        httr2::req_method("HEAD") |>
        httr2::req_timeout(10) |>
        httr2::req_error(is_error = function(resp) FALSE) |>
        httr2::req_perform()

      status <- httr2::resp_status(resp)
      if (status >= 400) {
        issues[[length(issues) + 1]] <- list(
          key = key, severity = "warning",
          message = sprintf("URL returned HTTP %d: %s", status, url)
        )
      }
    }, error = function(e) {
      issues[[length(issues) + 1]] <<- list(
        key = key, severity = "warning",
        message = sprintf("URL unreachable: %s (%s)", url, e$message)
      )
    })

    Sys.sleep(0.3)
  }
  issues
}

url_issues <- validate_urls(bib)
message(sprintf("URL validation: %d issues found", length(url_issues)))

Got: All URLs return HTTP 200 (or 301/302 redirects). Broken links flagged.

If fail: Some servers block HEAD requests. Retry with GET for failed HEAD checks. Timeout errors are common for slow academic servers.
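
The HEAD-then-GET retry described above can be sketched as follows. The names `fetch_status`, `check_url_status`, and `stub_fetch` are hypothetical; the fetcher is passed in as an argument so the fallback logic can be exercised offline with a stub.

```r
# Default fetcher: returns the HTTP status, or NA on a network-level error.
fetch_status <- function(url, method) {
  tryCatch({
    resp <- httr2::request(url) |>
      httr2::req_method(method) |>
      httr2::req_timeout(10) |>
      httr2::req_error(is_error = function(resp) FALSE) |>
      httr2::req_perform()
    httr2::resp_status(resp)
  }, error = function(e) NA_integer_)
}

# Hypothetical helper: try HEAD first, fall back to GET when HEAD fails.
check_url_status <- function(url, fetch = fetch_status) {
  status <- fetch(url, "HEAD")
  if (is.na(status) || status >= 400) {
    status <- fetch(url, "GET")
  }
  status
}

# Offline demonstration: a stub server that rejects HEAD (405) but serves GET.
stub_fetch <- function(url, method) if (method == "HEAD") 405L else 200L
status <- check_url_status("https://example.org/paper", fetch = stub_fetch)
```

Only URLs whose final status is still NA or at least 400 after the GET attempt would then be reported as warnings.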

Step 6: Detect Duplicate Entries

detect_duplicates <- function(bib) {
  issues <- list()

  # Check for duplicate DOIs
  dois <- vapply(bib, function(x) {
    d <- x$doi
    if (is.null(d)) NA_character_ else tolower(trimws(d))
  }, character(1))

  doi_table <- table(dois[!is.na(dois)])
  dup_dois <- names(doi_table[doi_table > 1])
  for (d in dup_dois) {
    keys <- names(bib)[which(dois == d)]
    issues[[length(issues) + 1]] <- list(
      key = paste(keys, collapse = ", "), severity = "error",
      message = sprintf("Duplicate DOI %s in entries: %s", d,
                        paste(keys, collapse = ", "))
    )
  }

  # Check for duplicate titles (exact match after lowercasing and stripping punctuation)
  titles <- vapply(bib, function(x) {
    t <- x$title
    if (is.null(t)) NA_character_ else gsub("[^a-z0-9 ]", "", tolower(t))
  }, character(1))

  seen <- character(0)
  for (i in seq_along(titles)) {
    if (is.na(titles[i])) next
    for (j in seen) {
      if (identical(titles[i], titles[as.integer(j)])) {
        issues[[length(issues) + 1]] <- list(
          key = sprintf("%s, %s", names(bib)[as.integer(j)], names(bib)[i]),
          severity = "warning",
          message = sprintf("Possible duplicate titles: '%s'",
                            substr(bib[[i]]$title, 1, 60))
        )
      }
    }
    seen <- c(seen, as.character(i))
  }

  issues
}

dup_issues <- detect_duplicates(bib)
message(sprintf("Duplicate detection: %d issues found", length(dup_issues)))

Got: Zero duplicates for a clean bibliography. Any detected duplicates are flagged with the specific entry keys involved.
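
The title check in this step compares normalized titles for equality, so near-duplicates that differ by a typo slip through. One possible extension, shown here as a sketch and not as part of the skill, uses edit distance from base R's `utils::adist`. The helper names and the 0.1 threshold are assumptions.

```r
normalize_title <- function(title) {
  title <- gsub("[^a-z0-9 ]", "", tolower(title))
  trimws(gsub(" +", " ", title))
}

# Hypothetical helper: index pairs whose normalized titles differ by at most
# max_rel_dist (edit distance divided by the longer title's length).
similar_title_pairs <- function(titles, max_rel_dist = 0.1) {
  norm <- normalize_title(titles)
  if (length(norm) < 2) return(data.frame(i = integer(0), j = integer(0)))
  d <- utils::adist(norm)
  longest <- outer(nchar(norm), nchar(norm), pmax)
  rel <- d / pmax(longest, 1)
  # upper.tri() keeps each pair once and drops self-comparisons
  hits <- which(rel <= max_rel_dist & upper.tri(rel), arr.ind = TRUE)
  data.frame(i = hits[, "row"], j = hits[, "col"], row.names = NULL)
}

# Made-up titles: the second is a typo variant of the first.
titles <- c(
  "Deep Learning for Citation Analysis",
  "Deep learning for citation analysys",
  "A Survey of Graph Databases"
)
pairs <- similar_title_pairs(titles)
```

The distance matrix grows quadratically with the number of entries, so for very large bibliographies it may be worth comparing only entries that share a year or first author.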

Step 7: Generate Validation Report

generate_report <- function(all_issues, bib, output_file = NULL) {
  errors   <- Filter(function(x) x$severity == "error", all_issues)
  warnings <- Filter(function(x) x$severity == "warning", all_issues)
  infos    <- Filter(function(x) x$severity == "info", all_issues)

  lines <- c(
    "# Bibliography Validation Report",
    "",
    sprintf("**File**: references.bib"),
    sprintf("**Entries**: %d", length(bib)),
    sprintf("**Date**: %s", Sys.Date()),
    "",
    sprintf("## Summary: %d errors, %d warnings, %d info",
            length(errors), length(warnings), length(infos)),
    ""
  )

  if (length(errors) > 0) {
    lines <- c(lines, "## Errors", "")
    for (issue in errors) {
      lines <- c(lines, sprintf("- **[%s]** %s", issue$key, issue$message))
    }
    lines <- c(lines, "")
  }

  if (length(warnings) > 0) {
    lines <- c(lines, "## Warnings", "")
    for (issue in warnings) {
      lines <- c(lines, sprintf("- **[%s]** %s", issue$key, issue$message))
    }
    lines <- c(lines, "")
  }

  report_text <- paste(lines, collapse = "\n")

  if (!is.null(output_file)) {
    writeLines(report_text, output_file)
    message(sprintf("Report written to %s", output_file))
  }

  cat(report_text)
  invisible(all_issues)
}

all_issues <- c(field_issues, doi_issues, url_issues, dup_issues)
generate_report(all_issues, bib, output_file = "validation-report.md")

Got: A structured markdown report listing all issues grouped by severity.
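
For the CI use case, the issue list can be turned into a process exit status so that a pipeline fails when errors are present. A minimal sketch, assuming the issue structure used in the steps above; `ci_exit_status` is a hypothetical helper and the sample issues are made up.

```r
# Hypothetical helper: 1 if any issue has severity "error", otherwise 0.
ci_exit_status <- function(all_issues) {
  severities <- vapply(all_issues, function(x) x$severity, character(1))
  if (any(severities == "error")) 1L else 0L
}

# Made-up issues in the same shape the validators produce.
example_issues <- list(
  list(key = "smith2020", severity = "warning", message = "URL returned HTTP 404"),
  list(key = "doe2019",   severity = "error",   message = "Missing required field: year")
)
exit_code <- ci_exit_status(example_issues)

# At the end of a CI script:
# quit(save = "no", status = ci_exit_status(all_issues))
```

Warnings and informational notes leave the status at 0, which matches the rule that only errors block a publication-ready bibliography.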

Validation

  • All entries have required fields for their type (no errors in field check)
  • All DOIs resolve to valid CrossRef records
  • No duplicate DOIs exist in the bibliography
  • All URLs are accessible (HTTP 200 or redirect)
  • Validation report generated without R errors
  • Zero errors in report for a publication-ready bibliography

Pitfalls

  • DOI format inconsistency: DOIs may appear as 10.1234/..., https://doi.org/10.1234/..., or doi:10.1234/.... Normalize before comparing
  • CrossRef rate limiting: Unauthenticated requests are rate limited (historically ~50/second; the current value is reported in the X-Rate-Limit-Limit and X-Rate-Limit-Interval response headers). Always use the email parameter to join the polite pool
  • Transient URL failures: Academic servers occasionally time out. Retry failed URLs once before flagging them as broken
  • Entry type variations: BibLaTeX uses @online where BibTeX uses @misc. The validator should handle both
  • False positive duplicates: Generic titles like "Introduction" or "Methods" trigger title matching. Review flagged duplicates manually
  • Missing DOIs for older works: Pre-2000 publications often lack DOIs. Flag as informational, not as errors
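
The entry type pitfall could be handled by mapping BibLaTeX types onto the BibTeX type whose required fields apply before looking them up. The alias table and the name `resolve_entry_type` below are assumptions for illustration. The sketch also does not address BibLaTeX entries that use `date` in place of `year`.

```r
# Assumed alias table: BibLaTeX type -> BibTeX type used for the field check.
type_aliases <- c(online = "misc", electronic = "misc", www = "misc")

# Hypothetical helper: resolve an entry type through the alias table.
resolve_entry_type <- function(entry_type, aliases = type_aliases) {
  entry_type <- tolower(entry_type)
  if (entry_type %in% names(aliases)) aliases[[entry_type]] else entry_type
}

resolved <- vapply(c("Online", "article", "www"), resolve_entry_type, character(1))
```

Looking up the required fields with the resolved type, rather than the raw one, keeps @online entries from being reported as an unknown entry type.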

Related Skills

  • manage-bibliography - fix issues found by this validator (dedup, add fields)
  • format-citations - format validated entries into styled citations
  • ../reporting/format-apa-report - APA reports require complete, validated references
  • ../r-packages/write-vignette - vignettes with citations need valid .bib entries

GitHub Repository

pjt222/agent-almanac
Path: i18n/caveman-lite/skills/validate-references
