Meeting 2018-12-17

Attendees

  • Brian
  • Derek
  • Shawn
  • John

Completed

  • DONE Brian: Set up group for alerts at sand-ci.org

Action items

  • Derek - “Audit” the RabbitMQ alerts with new alerts mailing group
  • Derek - Put together weekly summary email of Condor transfer data
  • Brian - Research how to include TCP flow statistics for XRootD
  • Ilya and Nebraska - Improve UNL Elasticsearch monitoring; alert when the data rate is lower than expected
  • Derek - Find the source of the collectors and determine who has access to restart them
  • Derek - Document how to replay records
  • Shawn - Wants alerts when a pS instance stops reporting
  • Derek - Start email about acknowledgements for ps-collectors!
  • Focus day on RSV overhaul on Tuesday, 12/18.
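
One possible shape for the Elasticsearch data-rate alert in the action items, as a minimal sketch only. The function name, the tolerance factor, and the example numbers are all hypothetical; the notes do not define what rate is "expected" or how the document count would be obtained.

```python
from dataclasses import dataclass


@dataclass
class RateCheck:
    """Result of comparing an observed ingest rate with the expected rate."""
    observed_per_min: float
    expected_per_min: float
    alert: bool


def check_data_rate(doc_count: int, window_minutes: float,
                    expected_per_min: float, tolerance: float = 0.5) -> RateCheck:
    """Flag an alert when the observed rate falls below tolerance * expected.

    Assumption: doc_count is the number of records indexed during the window,
    obtained by whatever query the monitoring ends up using.
    """
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    observed = doc_count / window_minutes
    return RateCheck(observed, expected_per_min,
                     observed < tolerance * expected_per_min)


if __name__ == "__main__":
    # 1200 documents in 60 minutes is 20/min, below half of an expected 100/min.
    result = check_data_rate(doc_count=1200, window_minutes=60, expected_per_min=100)
    print(result.alert)
```

A relative tolerance is used here rather than a fixed floor so the same check could be reused for feeds with different normal volumes; that is a design guess, not something decided in the meeting.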

Existing items

  • John: Created spreadsheet outlining Nagios probes for project services (both Nebraska and other sites)
    • Initial draft SAND monitoring spreadsheet saved into docs folder
    • Brian emphasized knowing the alert destinations (email addresses, etc.)
    • Brian created the alerts at sand-ci.org group for notifications
    • Will follow up with Chicago and Michigan to fill in gaps
  • Edgar: Will contact John Hicks regarding setup of a pS mesh config for getting I2 data and NRP data into archive
  • Test RabbitMQ functionality by manually configuring an endpoint
    • There might be differences from how the RSV probes collect data. Selection of topics?
    • Need to ensure data format is the same in RSV and MQ endpoints
  • Shawn: RabbitMQ authentication
    • Shawn discussed the possibility of plugin functionality with the pS developers, and they seemed open to the idea.
    • Keeps them out of the business of storing credentials
    • pS team agrees it’s a reasonable feature request
    • Lead developer for this psconfig request is out of the office until second week in January
  • Monitoring
    • The pipelines and the contents of the pipelines
    • Nebraska will be responsible for the “plumbing” monitoring?
      • MQ status
      • Confirmation of flow with test messages
    • Shawn has experience with monitoring the data quality
    • John: Come up with Nagios probes list for infrastructure
  • Derek: Will look at the RSV code and estimate the development effort
  • Expanding web presence
    • Shawn: Create GitHub PR and Derek will review
    • Logo, both text and graphical
    • Pointers to existing OSG network docs
  • Complete documentation for each service component in architecture document
  • How to send check_mk status info from Nebraska to OSG perfSONAR ETF?
    • LiveStatus API?
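
For the item on keeping the data format the same in RSV and MQ endpoints, a minimal sketch of a field-level comparison between one record from each path. The record fields shown are invented for illustration; the notes do not describe the actual record schema.

```python
def diff_record_format(rsv_record: dict, mq_record: dict) -> dict:
    """Report fields that differ between an RSV record and an MQ record.

    Compares top-level keys and the Python type of each shared value;
    nested structures are not inspected.
    """
    rsv_keys, mq_keys = set(rsv_record), set(mq_record)
    shared = rsv_keys & mq_keys
    return {
        "only_in_rsv": sorted(rsv_keys - mq_keys),
        "only_in_mq": sorted(mq_keys - rsv_keys),
        "type_mismatch": sorted(
            k for k in shared
            if type(rsv_record[k]) is not type(mq_record[k])
        ),
    }


if __name__ == "__main__":
    # Hypothetical records; real field names would come from the collectors.
    rsv = {"source": "host-a", "ts": 1545004800, "value": 1.5}
    mq = {"source": "host-a", "ts": "1545004800", "topic": "example"}
    print(diff_record_format(rsv, mq))
```

An empty result in all three lists would suggest the two paths agree at the top level, though it would not prove the values themselves match.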

Updated: