Meeting 2019-01-28
Attendees
- Brian
- Edgar
- Garhan
- Rob Gardner
Derek, Shawn, and John were out due to the OSG Operations F2F.
Completed
- Ilija, Derek: Alarms are working, and they corresponded to a real outage too! We can watch them over the next few weeks to determine whether the sensitivity is set appropriately.
- Derek: Deployed RSV replacement to testing.
- Derek: Recreated ps-collector setup in testing.
- Derek: Refactored the perfSONAR collectors so we can change the ES prefix. Also did some minor cleanup while there (e.g., adding ACKs to prevent data loss).
- Derek: Created parallel data pipeline, with parallel ES indexes to store the data at Nebraska.
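The ACK cleanup in the collector refactor above follows the usual at-least-once pattern: a message is acknowledged only after it has been stored, so a failure before the store leads to redelivery rather than loss. A minimal sketch of that idea, using an in-memory stand-in for the broker; `InMemoryBroker`, `consume`, and `store` are hypothetical names for illustration, not the collectors' actual code.

```python
from collections import deque


class InMemoryBroker:
    """Hypothetical stand-in for a message broker with manual ACKs."""

    def __init__(self, messages):
        self.ready = deque(messages)  # waiting for delivery
        self.unacked = {}             # delivered but not yet acknowledged
        self._next_tag = 0

    def get(self):
        """Deliver the next message, or return None if the queue is empty."""
        if not self.ready:
            return None
        self._next_tag += 1
        body = self.ready.popleft()
        self.unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        """Consumer confirms the message was handled; the broker forgets it."""
        del self.unacked[tag]

    def requeue_unacked(self):
        """Put unacknowledged messages back, as after a consumer disconnect."""
        self.ready.extend(self.unacked.values())
        self.unacked.clear()


def consume(broker, store):
    """Store each message, ACKing only after the store call succeeds."""
    while (item := broker.get()) is not None:
        tag, body = item
        store(body)      # may raise; the message then stays unacked
        broker.ack(tag)  # reached only after a successful store
```

If `store` raises partway through, the failed message is still held in `unacked` and can be requeued and retried. Redelivery order is not guaranteed, and a message may be stored twice if the failure happens between the store and the ACK.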
Action items
- Ilija, RobG: Tidy up alert emails: rename from “ATLAS” to “SAND-CI” if possible, and remove links to the ATLAS ADC Twiki page.
- Derek: Test RSV replacements and verify data. Suggested test procedure (from 14 January):
- Start testing by polling sites with capable hardware: USCMS or ATLAS
- Concerned about overloading pS instances on marginal hardware
- Validation tools don’t exist yet, done manually with Kibana queries
- Compare overlapping time ranges to confirm record counts match
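The last step of the suggested procedure, comparing record counts over overlapping time ranges, could be scripted once timestamps are exported from each pipeline. A minimal sketch, assuming each pipeline's records are available as a list of integer epoch-second timestamps; the function names and the one-hour bucket size are illustrative, not existing validation tooling.

```python
from collections import Counter


def bucket_counts(timestamps, start, end, bucket_seconds=3600):
    """Count records per bucket for timestamps in [start, end)."""
    counts = Counter()
    for ts in timestamps:
        if start <= ts < end:
            counts[(ts - start) // bucket_seconds] += 1
    return counts


def compare_pipelines(old_ts, new_ts, bucket_seconds=3600):
    """Return {bucket_start: (old_count, new_count)} for buckets that differ.

    Only the time range covered by both pipelines is compared.
    """
    if not old_ts or not new_ts:
        return {}
    start = max(min(old_ts), min(new_ts))
    end = min(max(old_ts), max(new_ts)) + 1  # +1 so the last record is included
    old = bucket_counts(old_ts, start, end, bucket_seconds)
    new = bucket_counts(new_ts, start, end, bucket_seconds)
    return {
        start + b * bucket_seconds: (old[b], new[b])
        for b in sorted(old.keys() | new.keys())
        if old[b] != new[b]
    }
```

An empty result means the counts match in every bucket of the overlap; any entry points to a time window worth inspecting by hand in Kibana.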
Existing items
- Derek: Put together a weekly summary email of Condor transfer data
- Monitoring (John)
- Still need to update SAND monitoring draft outlining Nagios probes for project services (both Nebraska and other sites), mostly based on architecture document
- How to send check_mk status info from Nebraska to OSG perfSONAR ETF? LiveStatus API?
- Shawn: RabbitMQ authentication
- Shawn discussed the possibility of a plugin functionality with the pS developers, and they seemed open to the idea.
- Keeps them out of the business of storing credentials
- pS team agrees it’s a reasonable feature request
- Lead developer for this psconfig request is out of the office until the second week in January
- Shawn: Expanding web presence
- Shawn will create a GitHub PR and Derek will review
- Logo, both text and graphical
- Pointers to existing OSG network docs
- Shawn: Complete documentation for each service component in architecture document
- Derek, John, MarianZ: Documentation for pS collector administration
- Collectors running on host under UNL T2 puppet
- Hosts are maintained by Marian Z.
- Get documentation into OSG or SAND folders
- How to restart docker containers? Repo locations?
- 14 January update: nothing to report.
- 28 January update: nothing to report.