Open Science Grid: OSG--Troublesum (02:57 PM, 07/31/2006) This report contains data from 35 tickets.

| Ticket Number | Title | Date Submitted | Last Edit Date | Assignees | Summary of Current Status |
|---|---|---|---|---|---|
| Destination VO Support Center: CSC | |||||
| NTU_HEP is failing site_verify tests. Site is gray on GridCat, but no response has been received from the administrator upon repeated followups. Sent mail to Exec. Board (5/18) asking how to proceed. Another mail sent to the EB (5/24). GOC attempts to call the site admin have been fruitless. | |||||
| Destination VO Support Center: Fermilab | |||||
| Resource is failing a critical site_verify test: running gatekeeper. They don't have the manpower to diagnose it at this point. | |||||
| GOC email forwarding planning due to upcoming changes; moving off of Paintbird (osg-goc-l@grnoc.iu.edu). Waiting for response from Burt. | |||||
| Destination VO Support Center: GRASE | |||||
| Resource is failing a critical site_verify test: host reachable. It may be several weeks. Moving the CE node soon to a new host and will be doing a full install. | |||||
| Destination VO Support Center: nanoHUB | |||||
| nanoHUB jobs were failing to run at several sites. Errors were non-descriptive (Globus error 47, Globus error 17, etc.). John Rosheck put together a list of informative error messages and the GOC sent mailings to each Support Center asking for followups. A lot of followups occurred and more sites now support nanoHUB. Mail was sent to the customer on 5/10 asking if this is satisfactory. He is still hoping for more support. This may be an issue for the EB or the policy group: some sites say they support nanoHUB but don't. Leigh and Rob advise contacting the sites that say they support nanoHUB and asking them either to do so or to update their site information. | |||||
| Destination VO Support Center: OSG-GOC | |||||
| Leigh updated the ticket, "A superfluous check for the ownership of the "vomses" files by the program which is used to generate proxies, for clients using voms-proxy-init. This error needs to be addressed in the voms_api.c code." Ticket addressed w/VDT #1670. | |||||
| File system reconfiguration. See Description for latest details. Question now is how to set up the NFS-lite approach with per-job (i.e., per-batch-slot) home areas under PBS Pro, and on which OSG VO accounts this is desired and permissible. VDT can't assist. Directed to ENG. | |||||
| Gernot Krobath tried to install Athena on atlas.iu.edu and it aborted immediately. Gernot ran some scripts to check whether the required directories and libraries are there, whether the lcg- commands needed later for the installation job work, etc. GOC engineers said it appears that there is no edg-brokerinfo command installed on bandicoot, at least not system-wide. The user could try a different script, for which we need to contact the party who writes them (Alessandro de Salva). Rob will follow up. | |||||
| Should there be limits on the size of home directories? This was discussed at the meeting held 3/31/2006 at the LCG interop. Laurence will check on the usage of $HOME within the LCG to determine requirements/specifications. Rob will bring up this issue at the ITB meeting. | |||||
| New VO registration for GridChem. Many items in the registration were listed as TBA. Since those items haven't been completed, the GOC suggested to the EB that the registration be removed until GridChem is ready, at which time a new registration can be submitted. | |||||
| GGUS is testing their new CE and running into errors. Assigned to Engineer. | |||||
| SRM_CONFIG variable issue. Assigned to Leigh. | |||||
| nanoHUB has an application that will take 1800 hours to run. It can run on one CPU, but they want to run several hundred of the jobs. Steve Clark has been discussing this at the weekly Operations Call since 6/5 (see notes at http://osg.ivdgl.org/twiki/bin/view/Operations/). Updates will continue to be provided at the meeting. This ticket will likely remain open for a while. LIGO is accepting 90 jobs that are set to run over 200 hours. | |||||
| Missing quotes in a shell script causing errors in 0.4.1. | |||||
| CALTECH requested to pull the hostname from the HOSTNAME variable rather than configure-osg.sh since some sites may choose to use an alias to the gatekeeper. | |||||
| OPS VO registration. | |||||
| GridCat shows that some sites support WS GRAM (Web services GRAM). But do sites actually support WS GRAM? John found out that GridCat reports it is installed, but that doesn't mean it is enabled. He sent this email to Ruth. | |||||
| Resource registration. Reviewed/approved at Ops Meeting, per the SOP. Needs to be added to VORS now. | |||||
| STAR uses spam protection that requires their list to be in the TO: or FROM: field or the email gets held up. GOC notifications are sent to several addresses at once, with each address included in the BCC: field. This hides the addresses, keeping them private. Doug O. requested the tool be modified to send a separate email to each address. We also need to look at what the GRNOC's tool will do for us in this regard. Kyle and I will work together on this. | |||||
| atlas.iu.edu is ICMP Unreachable from Thu Jul 20 15:23:49 UTC 2006 until Thu Jul 20 15:35:38 UTC 2006 | |||||
| Request to query the VORS db directly. John requested this ticket. | |||||
| SDSS_TAM is failing several tests on GridCat, but when run manually, the tests succeed. Why? | |||||
| Webpage: News & Events box problems | |||||
| CERN wants an agreement with the US Tier 1 sites FNAL and BNL on dealing with the OSG. | |||||
| Destination VO Support Center: TACC | |||||
| Resource is failing a critical site_verify test: Authentication on the gatekeeper. Jason is bringing up a new cluster and it may be a couple of months (Aug) because they are quadrupling the size of their cluster and decommissioning the old one. | |||||
| Destination VO Support Center: UC CI | |||||
| Contact information updates for U. Chicago. | |||||
| Destination VO Support Center: USATLAS | |||||
| Matthew Norman reported CDF should be able to have access to both of the UChicago OSG sites. He can get in through the Teraport, but he is having some problems with the Atlas tier2 portal. Marty Dippel responded, "UC_ATLAS_MWT2 uses the old (1.0.1) GUMS server, which is incompatible with certain VOMS servers. The Teraport cluster uses the new server (1.1) which is why Matthew was able to authenticate on Teraport but not on UC_ATLAS_MWT2." Marty said they will support CDF with the 0.4.1 upgrade. | |||||
| Destination VO Support Center: USCMS | |||||
| All users of USCMS resources were supposed to have migrated into the cms VO at CERN by the end of February 2006. GUMS configs may need to be updated by replacing the USCMS VO at Fermilab w/the one at CERN. | |||||
| Destination VO Support Center: VDT | |||||
| While testing the 0.4.1 CE at UCSD, Terrence noticed the grid-monitor can be slow in killing off job-managers. He had around 600-700 jobs running through osg-gw-3.t2.ucsd.edu and around 400 job-managers running. The jobs in question are fairly short-lived (10-15 minutes) and do nothing but sleep. So what are the criteria that the grid-monitor uses for cleaning up the excess job-managers? Is there a maximum rate at which it will clean up new jobs? How long after a job is submitted does it take the grid-monitor to kill off the job-manager? See full description for full details. VDT responded. Waiting for customer followup. | |||||
| The last ticket update indicates that John Weigand is working on this. | |||||
| URLs returning errors for Burt, not for me. Awaiting action from the VDT. | |||||
| Bugs reported to the VDT. | |||||
| Several bugs were reported with GUMS by John Weigand and sent to the VDT. | |||||
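
The STAR notification ticket above asks for a separate email to each address in place of a single message with every address in the BCC: field. A minimal sketch of that idea follows, assuming Python's standard `email` library; the function name `build_individual_messages` and all addresses are hypothetical and do not describe the GOC's or the GRNOC's actual tool. It only builds the messages and leaves delivery out.

```python
from email.message import EmailMessage


def build_individual_messages(sender, recipients, subject, body):
    """Return one EmailMessage per recipient, each with its own To: header.

    Hypothetical helper: no BCC: header is set, so each recipient's address
    appears in the To: field of its own copy and in no other copy.
    """
    messages = []
    for recipient in recipients:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = recipient  # only this recipient's address is visible
        msg["Subject"] = subject
        msg.set_content(body)
        messages.append(msg)
    return messages


if __name__ == "__main__":
    # Placeholder addresses, for illustration only.
    notices = build_individual_messages(
        "goc@example.org",
        ["star-list@example.org", "site-admin@example.org"],
        "GOC notification",
        "Scheduled maintenance notice.",
    )
    for notice in notices:
        print(notice["To"])
```

Because each copy carries exactly one address in its To: field, a list filter that requires the list address in TO: or FROM: should accept it, and the other recipients' addresses stay private, which was the reason for using BCC: in the first place.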