Meeting 2019-01-28

Attendees

  • Brian
  • Edgar
  • Garhan
  • Rob Gardner

Derek, Shawn, and John were out due to the OSG Operations F2F.

Completed

  • Ilija, Derek: Alarms are working! (And they corresponded to a real outage too!). We can watch these over the next few weeks to determine whether they are set to the appropriate sensitivity.
  • Derek: Deployed RSV replacement to testing.
  • Derek: Recreated ps-collector setup in testing.
  • Derek Refactored the perfSonar collectors so we can change the ES prefix. Doing some other minor cleanup while there (e.g., adding ACKs to prevent data loss).
  • Derek: Created parallel data pipeline, with parallel ES indexes to store the data at Nebraska.

Action items

  • Ilija, RobG: Tidy up alerts emails - rename from “ATLAS” to “SAND-CI” if possible, remove links to the ATLAS ADC Twiki page.
  • Derek: Test RSV replacements and verify data. Suggested test procedure (from 14 January): - Start testing by polling sites with capable hardware: USCMS or ATLAS - Concerned about overloading pS instances on marginal hardware - Validation tools don’t exist yet, done manually with Kibana queries - Compare overlapping time ranges to confirm record counts match

Existing items

  • Derek - Put together weekly summary email of Condor transfer data
  • Monitoring (John)
    • Still need to update SAND monitoring draft outlining Nagios probes for project services (both Nebraska and other sites), mostly based on architecture document
    • How to send check_mk status info from Nebraska to OSG perfSonar ETF?
      • LiveStatus API?
  • Shawn: RabbitMQ authentication
    • Shawn discussed the possibility of a plugin functionality with the pS developers, and they seemed open to the idea.
    • Keeps them out of the business of storing credentials
    • pS team agrees it’s a reasonable feature request
    • Lead developer for this psconfig request is out of the office until second week in January
  • Shawn: Expanding web presence
    • Shawn will create github PR and Derek will review
    • Logo, both text and graphical
    • Pointers to existing OSG network docs
  • Shawn: Complete documentation for each service component in architecture document
  • Derek/John/MarianZ - Documentation for pS collectors administration
    • Collectors running on host under UNL T2 puppet
    • Hosts are maintained by Marian Z.
    • Get documentation into OSG or SAND folders
    • How to restart docker containers? Repo locations?
    • 14 January update: nothing to report.
    • 28 January update: nothing to report.

Updated: