Open Science Grid: OSG--Troublesum (02:57 PM, 07/31/2006)

This report contains data from 35 Tickets.


Ticket Number, Title, Date Submitted, Last Edit Date, Assignees, Summary of Current Status

Destination VO Support Center: CSC    

1991
Site failing critical site_verify tests - NTU_HEP
05/17/2006
07/10/2006
OSG GOC Coordinator
NTU_HEP is failing site_verify tests. The site is gray on GridCat, but
the administrator has not responded to repeated follow-ups. Mail was
sent to the Executive Board (EB) on 5/18 asking how to proceed, and
again on 5/24. GOC attempts to call the site admin have been fruitless.

Destination VO Support Center: Fermilab    

2214
Site failing critical site_verify tests - FNAL_LQCD
07/06/2006
07/24/2006
Tim Silvers
Resource is failing a critical site_verify test: running gatekeeper.
The site does not have the manpower to diagnose it at this point.

2269
Fermi/Remedy - 82585 - goc@opensciencegrid.org forwarding
07/25/2006
07/31/2006
OSG GOC Coordinator: Tim Silvers. OSG Support Centers: OSG-GOC. Individual Users: Footprints Remedy
Planning GOC email forwarding ahead of upcoming changes: moving off of
Paintbird (osg-goc-l@grnoc.iu.edu). Waiting for a response from Burt.

Destination VO Support Center: GRASE    

2157
Site failing critical site_verify tests - GRASE-ALBANY
06/21/2006
07/28/2006
Tim Silvers
Resource is failing a critical site_verify test: host reachable. A fix
may take several weeks; the site is moving the CE node to a new host
soon and will do a full install.

Destination VO Support Center: nanoHUB    

1996
nanoHUB jobs fail to run
05/17/2006
07/21/2006
John Rosheck
nanoHUB jobs were failing to run at several sites. Errors were
non-descriptive (Globus error 47, Globus error 17, etc.). John Rosheck
put together a list of informative error messages, and the GOC sent
mailings to each Support Center asking for follow-ups. Many follow-ups
occurred and more sites now support nanoHUB. Mail was sent to the
customer on 5/10 asking if this is satisfactory. He is still hoping for
more support. This may be an issue for the EB or a policy group. Some
sites say they support nanoHUB but do not. Leigh and Rob advise
contacting sites that claim to support nanoHUB and asking them either
to do so or to update their site information.

Destination VO Support Center: OSG-GOC    

1969
Fermi/Remedy - 78480 - [vdt-support #1670] voms-proxy-init has special constraints on system-wide vomses file
05/17/2006
07/31/2006
Operations Workgroup: Leigh Grundhofer. OSG GOC Coordinator: Tim Silvers. OSG Support Centers: VDT
Leigh updated the ticket, "A superfluous check for the ownership of the
"vomses" files by the program which is used to generate proxies, for
clients using voms-proxy-init. This error needs to be addressed in the
voms_api.c code." Ticket addressed with VDT #1670.

1979
Provisioning home areas on a per-batch-job basis with PBS
05/17/2006
07/10/2006
Operations Workgroup: Leigh Grundhofer
File system reconfiguration. See Description for latest details. The
question now is how to set up the NFS-lite approach with per-job (i.e.,
per-batch-slot) home areas under PBS Pro, and on which OSG VO accounts
this is desired and permissible. VDT can't assist. Directed to ENG.

1994
ATHENA 11.0.42 Install Failure
05/17/2006
06/09/2006
Operations Workgroup: Kyle Gross
Gernot Krobath tried to install Athena on atlas.iu.edu and the
installation aborted immediately. Gernot ran some scripts to check
whether the required directories and libraries are present, whether the
lcg- commands needed later by the installation job work, etc. GOC
engineers said it appears that no edg-brokerinfo command is installed
on bandicoot, at least not system-wide. The user may be able to try a
different script, which requires contacting the person who writes them
(Alessandro de Salva). Rob will follow up.

2000
Use of user home areas--limits?
05/17/2006
06/08/2006
Operations Workgroup: John Rosheck
Should there be limits on the size of home directories? This was
discussed at the LCG interop meeting held 3/31/2006. Laurence will
check on the usage of $HOME within the LCG to determine
requirements/specifications. Rob will bring up this issue at the ITB
meeting.

2001
VO registration for GridChem
05/17/2006
07/10/2006
Operations Workgroup: Leigh Grundhofer. OSG GOC Service Desk. Individual Users: Tim Silvers
New VO registration for GridChem. Many items in the registration were
listed as TBA. Since those items haven't been completed, the GOC
suggested to the EB that the registration be removed until GridChem is
ready, at which time a new registration can be submitted.

2044
GGUS Ticket ..Testing the GGUS glite CE
05/24/2006
07/07/2006
Operations Workgroup: Leigh Grundhofer. OSG Support Centers: GGUS
GGUS is testing their new CE and running into errors.
Assigned to Engineer.

2113
[vdt-support #1861] Current versions of srm-cp and OSG client?
06/09/2006
07/24/2006
Operations Workgroup: Leigh Grundhofer. OSG Support Centers: VDT
SRM_CONFIG variable issue. Assigned to Leigh.

2080
Large resource request from nanoHUB
06/01/2006
07/24/2006
OSG GOC Coordinator. OSG Support Centers: nanoHUB
nanoHUB has an application that will take 1800 hours to run. It can run
on one CPU, but they want to run several hundred of the jobs. Steve
Clark has been discussing this at the weekly Operations Call since 6/5.
See notes at
http://osg.ivdgl.org/twiki/bin/view/Operations/
Updates will continue to be provided at the meeting. This ticket will
likely remain open for a while. LIGO is accepting 90 jobs that are set
to run over 200 hours.

2089
lost quotes in osg-attributes.conf during jobmanager-lsf submission
06/03/2006
06/29/2006
Operations Workgroup: Leigh Grundhofer
Missing quotes in a shell script are causing errors in 0.4.1.

2073
Request to pull hostname from variable, not configure-osg.sh
05/30/2006
06/29/2006
Leigh Grundhofer
CALTECH requested that the hostname be pulled from the HOSTNAME
variable rather than from configure-osg.sh, since some sites may choose
to use an alias for the gatekeeper.
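
The lookup order requested in this ticket can be sketched as follows. This is a hypothetical Python illustration only, not the configure-osg.sh logic: the function name `gatekeeper_hostname` and the fallback to the local resolver are assumptions made for the example.

```python
import os
import socket


def gatekeeper_hostname(env=None):
    """Return the gatekeeper hostname, preferring the HOSTNAME variable.

    Hypothetical helper: `env` defaults to the process environment and
    can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    # An explicitly set HOSTNAME (for example, a site alias) wins.
    name = env.get("HOSTNAME", "").strip()
    if name:
        return name
    # Assumed fallback when HOSTNAME is unset or empty: the local resolver.
    return socket.getfqdn()
```

With `HOSTNAME` set to a site alias, the alias is returned unchanged; the resolver is consulted only when the variable is missing or empty.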

2090
VO Package Updates Requested
06/05/2006
06/09/2006
Leigh Grundhofer
(no data)

2154
VO registration for OPS
06/20/2006
07/17/2006
Operations Workgroup: Leigh Grundhofer
OPS VO registration.

2199
Webpage Request: Script for search queries dynamically displayed
06/29/2006
07/21/2006
Operations Workgroup. Individual Users: Marcia Teckenbrock
(no data)

2200
wsgram service showing as available and active on grid cat
06/29/2006
07/12/2006
Operations Workgroup: Leigh Grundhofer, John Rosheck
GridCat shows that some sites support WS GRAM (Web services GRAM). But
do the sites actually support WS GRAM? John found out that GridCat
reports it is installed, but that doesn't mean it is enabled. He sent
this finding by email to Ruth.

2244
STAR-Bham Resource/Service Registration
07/18/2006
07/31/2006
Operations Workgroup: John Rosheck
Resource registration. Reviewed/approved at Ops Meeting, per the SOP.
Needs to be added to VORS now.

2238
Notification tool review
07/17/2006
07/24/2006
Operations Workgroup: Kyle Gross. Individual Users: Tim Silvers
STAR uses spam protection that requires their list to be in the TO: or
FROM: field or the email gets held up. GOC notifications are sent to
several addresses at once, with each address included in the BCC:
field. This hides the addresses, keeping them private. Doug O.
requested the tool be modified to send a separate email to each
address. We also need to look at what the GRNOC's tool will do for us
in this regard. Kyle and I will work together on this.
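
A minimal sketch of the requested behavior, assuming a Python-based sender: one message is built per recipient, with that recipient in the To: field and no BCC: field, so list-based spam filters see their own address while the other recipients stay private. The function name and all addresses are hypothetical; this is not the GOC's or the GRNOC's notification tool.

```python
from email.message import EmailMessage


def build_individual_messages(sender, recipients, subject, body):
    """Build one message per recipient instead of one BCC'd message."""
    messages = []
    for rcpt in recipients:
        msg = EmailMessage()
        msg["From"] = sender
        # Each recipient sees only its own address in To:.
        msg["To"] = rcpt
        msg["Subject"] = subject
        msg.set_content(body)
        messages.append(msg)
    return messages


# Hypothetical usage with placeholder addresses.
msgs = build_individual_messages(
    "goc@example.org",
    ["star-list@example.org", "other-list@example.org"],
    "GOC notification",
    "Scheduled maintenance notice.",
)
```

Each message could then be handed to `smtplib.SMTP.send_message` in turn. The trade-off is one SMTP transaction per recipient instead of a single BCC'd message.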

2253
atlas.iu.edu is ICMP Unreachable
07/20/2006
07/31/2006
Operations Workgroup: Kyle Gross
atlas.iu.edu was ICMP Unreachable from Thu Jul 20 15:23:49 UTC 2006
until Thu Jul 20 15:35:38 UTC 2006.
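
For reference, the length of this outage can be worked out from the two timestamps. The snippet below is an illustrative calculation only; the format string is an assumption matched to the timestamps as they appear in this ticket.

```python
from datetime import datetime

# Assumed layout of the timestamps quoted in the ticket.
FMT = "%a %b %d %H:%M:%S UTC %Y"


def outage_duration(start, end):
    """Return the time between two ticket timestamps as a timedelta."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


duration = outage_duration(
    "Thu Jul 20 15:23:49 UTC 2006",
    "Thu Jul 20 15:35:38 UTC 2006",
)
print(duration)  # 0:11:49
```

The host was unreachable for 11 minutes 49 seconds.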

2251
Request to query the VORS database
07/20/2006
07/20/2006
Operations Workgroup: John Rosheck
Request to query the VORS db directly. John requested this ticket.

2257
SDSS_TAM failing GridCat, but manual test are successful
07/21/2006
07/21/2006
Operations Workgroup: John Rosheck
SDSS_TAM is failing several tests on GridCat, but when run manually
the tests succeed. Why?

2281
Webpage Request: News & Events box problems
07/28/2006
07/31/2006
Operations Workgroup: Kyle Gross
Webpage: News & Events box problems

2284
CERN agreement with the 2 US Tier-1s, BNL and FNAL
07/31/2006
07/31/2006
Operations Workgroup: Leigh Grundhofer
CERN wants an agreement with the US Tier 1 sites FNAL and BNL on
dealing with the OSG.

Destination VO Support Center: TACC    

2035
Site failing critical site_verify tests - TACC
05/23/2006
07/27/2006
OSG Support Centers: TACC. Individual Users: Tim Silvers
Resource is failing a critical site_verify test: Authentication on the
gatekeeper. Jason is bringing up a new cluster, and it may be a couple
of months (Aug) because they are quadrupling the size of their cluster
and decommissioning the old one.

Destination VO Support Center: UC CI    

2280
UC CI and UC Teraport information update
07/28/2006
07/31/2006
Operations Workgroup: Kyle Gross. Individual Users: Tim Silvers
Contact information updates for U. Chicago.

Destination VO Support Center: USATLAS    

1983
CDF Access to UC_ATLAS_MWT2
05/17/2006
07/17/2006
OSG Support Centers: USATLAS
Matthew Norman reported CDF should be able to have
access to both of the UChicago OSG sites. He can
get in through the Teraport, but he is having some
problems with the Atlas tier2 portal. Marty Dippel
responded, "UC_ATLAS_MWT2 uses the old (1.0.1) GUMS
server, which is incompatible with certain VOMS
servers. The Teraport cluster uses the new server (1.1)
which is why Matthew was able to authenticate on
Teraport but not on UC_ATLAS_MWT2." Marty said they
will support CDF with the 0.4.1 upgrade.

Destination VO Support Center: USCMS    

2248
End of life announcement - uscms VO at Fermilab - July 31, 2006
07/19/2006
07/19/2006
Tim Silvers
All users of USCMS resources were supposed to have migrated into the
cms VO at CERN by the end of February 2006. GUMS configs may need to be
updated by replacing the USCMS VO at Fermilab with the one at CERN.

Destination VO Support Center: VDT    

2031
#1674: (VDT Ticket) Grid-monitor job-manager killing speed?
05/19/2006
06/09/2006
Operations Workgroup: John Rosheck. OSG Support Centers: VDT
While testing the 0.4.1 CE at UCSD, Terrence noticed that the
grid-monitor can be slow at killing off job-managers. He had around
600-700 jobs running through osg-gw-3.t2.ucsd.edu and around 400
job-managers running. The jobs in question are fairly short-lived
(10-15 minutes) and do nothing but sleep. What criteria does the
grid-monitor use for cleaning up the excess job-managers? Is there a
maximum rate at which it will clean up new jobs? How long after a job
is submitted does it take the grid-monitor to kill off the job-manager?
See the full description for details. VDT responded. Waiting for
customer follow-up.

2002
GUMS Memory Leak v1.1.0
05/17/2006
06/09/2006
Operations Workgroup: Leigh Grundhofer. OSG Support Centers: VDT
The last ticket update indicates that John Weigand is working on this.

2112
Fermi/Remedy - 80086 - [vdt-support #1750] Some CRLs returning HTTP error 302
06/08/2006
07/19/2006
OSG GOC Coordinator: Tim Silvers. OSG Support Centers: USCMS, VDT. Individual Users: Footprints Remedy
URLs are returning errors for Burt, but not for me. Awaiting action
from the VDT.

2194
Fermi/Remedy - 81143 - [vdt-support #1901] OSG 0.4.1 / VDT 1.3.10 GUMS grid-mapfile generation - VO member retrieval
06/29/2006
07/21/2006
OSG GOC Coordinator: Tim Silvers. OSG Support Centers: VDT. Individual Users: Footprints Remedy
Bugs reported to the VDT.

2195
Fermi/Remedy - 79197 - [vdt-support #1900] GUMS gums.config validation - OSG 0.4.1 / VDT 1.3.10b
06/29/2006
07/28/2006
OSG GOC Coordinator: Tim Silvers. OSG Support Centers: VDT. Individual Users: Footprints Remedy
Several bugs were reported with GUMS by John Weigand and sent to the
VDT.