Attendees
Completed
- DONE Brian: Set up group for alerts at sand-ci.org
Action items
- Derek - “Audit” the RabbitMQ alerts with new alerts mailing group
- Derek - Put together weekly summary email of Condor transfer data
- Brian - Research on how to include TCP flow statistics for XRootD
- Ilya and Nebraska - Improve UNL ElasticSearch monitoring, alert when data rate is less than expected
- Derek - Find the source of the collectors and determine who has access to restart them
- Derek - Document how to replay records
- Shawn - Wants alerts when a pS instance stops reporting
- Derek - Start email about acknowledgements for ps-collectors!
- Focus day on RSV overhaul on Tuesday, 12/18.
Existing items
- John: Created spreadsheet outlining Nagios probes for project services (both Nebraska and other sites)
- Initial draft SAND monitoring spreadsheet saved into docs folder
- Brian emphasized knowing the alert destinations (email addresses, etc.)
- Brian created alerts at sand-ci.org group for notifications
- Will follow up with Chicago and Michigan to fill in gaps
- Edgar: Will contact John Hicks regarding setup of a pS mesh config for getting I2 data and NRP data into archive
- Test RabbitMQ functionality by manually configuring an endpoint
- There might be issues compared with how the RSV probes collect data.
Selection of topics?
- Need to ensure data format is the same in RSV and MQ endpoints
- Shawn: RabbitMQ authentication
- Shawn discussed the possibility of plugin functionality with the pS developers, and they seemed open to the idea.
- Keeps them out of the business of storing credentials
- pS team agrees it’s a reasonable feature request
- Lead developer for this psconfig request is out of the office until the second week of January
- Monitoring
- The pipelines and the contents of the pipelines
- Nebraska will be responsible for the “plumbing” monitoring?
- MQ status
- Confirmation of flow with test messages
- Shawn has experience with monitoring the data quality
- John: Come up with Nagios probes list for infrastructure
- Derek: Will look at the RSV code and estimate the development effort
- Expanding web presence
- Shawn: Create GitHub PR and Derek will review
- Logo, both text and graphical
- Pointers to existing OSG network docs
- Complete documentation for each service component in architecture document
- How to send check_mk status info from Nebraska to OSG perfSONAR ETF?
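One possible way to check the point above about keeping the data format the same in the RSV and MQ endpoints is to compare the field names and value types of a sample record from each path. The sketch below assumes records are available as plain dictionaries; the field names are invented for illustration and are not the real record layout.

```python
def schema_of(record):
    """Map each top-level field name to the name of its value type."""
    return {key: type(value).__name__ for key, value in record.items()}


def diff_schemas(rsv_record, mq_record):
    """Report fields that differ between an RSV record and an MQ record."""
    rsv, mq = schema_of(rsv_record), schema_of(mq_record)
    return {
        "only_in_rsv": sorted(rsv.keys() - mq.keys()),
        "only_in_mq": sorted(mq.keys() - rsv.keys()),
        "type_mismatch": sorted(
            key for key in rsv.keys() & mq.keys() if rsv[key] != mq[key]
        ),
    }


# Hypothetical sample records; real field names will differ.
rsv_sample = {"source": "host-a", "dest": "host-b", "throughput": 9.4, "ts": 1544500000}
mq_sample = {"source": "host-a", "dest": "host-b", "throughput": "9.4", "timestamp": 1544500000}
report = diff_schemas(rsv_sample, mq_sample)
print(report)
```

An empty report for a representative set of samples would suggest the two paths agree at the top level; nested fields would need a recursive comparison.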