One day I noticed that some of my servers (agents) were greyed out in the Computers node of the Monitoring section. Strangely, those greyed-out computers were not even showing up in the Agent Managed node of the Administration section.
All of these greyed-out servers were logging event ID 21034:
Event Type: Warning
Event Source: OpsMgr Connector
Event Category: None
Event ID: 21034
Date: 8/12/2011
Time: 10:03:10 AM
User: N/A
Computer: SERVER-NAME
Description:
The Management Group Watch-Men has no configured parents and most monitoring tasks cannot be performed. This can happen if a management group in Active Directory does not have any server SCPs or if the agent does not have access to any server SCPs.
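If you have many agents, you may not want to open Event Viewer on each one to spot this warning. The following is a minimal Python sketch, not part of the original fix, that builds a call to the built-in Windows `wevtutil` tool to read recent 21034 events from a remote server's Operations Manager log. It assumes the log is named "Operations Manager", that you run it from a Windows machine with admin rights on the target, and `build_event_query` / `has_event` are hypothetical helper names.

```python
import subprocess
from typing import List


def build_event_query(server: str, event_id: int = 21034, count: int = 5) -> List[str]:
    """Build a wevtutil command that reads the newest `count` events with
    `event_id` from the "Operations Manager" log on a remote server."""
    return [
        "wevtutil", "qe", "Operations Manager",
        f"/q:*[System[(EventID={event_id})]]",  # XPath filter on the event ID
        f"/c:{count}",                           # return at most `count` events
        "/rd:true",                              # newest events first
        "/f:text",                               # plain-text output
        f"/r:{server}",                          # remote computer to query
    ]


def has_event(server: str, event_id: int = 21034) -> bool:
    """Run the query (Windows only); any text output means a matching event exists."""
    result = subprocess.run(build_event_query(server, event_id),
                            capture_output=True, text=True, check=True)
    return bool(result.stdout.strip())
```

For example, `[s for s in servers if has_event(s)]` would list the agents in a hypothetical `servers` list that have logged the warning.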
I usually try two things when a SCOM agent is not talking to the SCOM management server:
1. Restart the System Center Management service (HealthService) on the affected server, then watch the Operations Manager event log.
2. If step 1 fails, I do the following:
a. Stop the System Center Management (HealthService) service.
b. Open an Explorer window and go to “C:\Program Files\System Center Operations Manager 2007”. Rename (or delete) the folder named “Health Service State”.
c. Start the System Center Management service (HealthService).
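For a single agent, the two manual steps above can also be scripted. Below is a minimal Python sketch, assuming the default SCOM 2007 agent folder used in this post and that it runs elevated on the affected server itself. Instead of deleting "Health Service State", it renames the folder with a timestamp suffix so you can roll back; `plan_reset` and `run` are hypothetical names, and `run` only prints the steps unless you pass `dry_run=False`.

```python
import datetime
import ntpath
import os
import subprocess
from typing import List, Tuple

# Agent install folder as given in this post; adjust if your agent lives elsewhere.
AGENT_DIR = r"C:\Program Files\System Center Operations Manager 2007"
SERVICE = "HealthService"  # short name of the System Center Management service


def plan_reset(agent_dir: str = AGENT_DIR, stamp: str = "") -> List[Tuple[str, ...]]:
    """Return the ordered steps: stop the service, rename the state folder, start it."""
    stamp = stamp or datetime.datetime.now().strftime("%Y%m%d%H%M%S")
    state = ntpath.join(agent_dir, "Health Service State")  # Windows-style join on any OS
    return [
        ("cmd", "net", "stop", SERVICE),
        ("rename", state, state + ".old-" + stamp),
        ("cmd", "net", "start", SERVICE),
    ]


def run(plan: List[Tuple[str, ...]], dry_run: bool = True) -> None:
    """Print the steps, or execute them in order when dry_run is False."""
    for step in plan:
        if dry_run:
            print(*step)
        elif step[0] == "cmd":
            # net stop normally waits for the service to finish stopping
            subprocess.run(list(step[1:]), check=True)
        else:
            os.rename(step[1], step[2])
```

Call `run(plan_reset())` first to review the steps, then `run(plan_reset(), dry_run=False)` to apply them.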
Well... well... well. That didn’t work. So I called Microsoft support and got help on how to reconnect the disconnected agents.
On the SQL Server that hosts the Operations Manager database, open SQL Server Management Studio, browse to the OperationsManager database and open a new query window for it. If you need help, get it from your DBA.
SCRIPT A: Execute the following query to list all disconnected agents in the database.
declare @DiscoverySourceId uniqueidentifier;
set @DiscoverySourceId = dbo.fn_DiscoverySourceId_User();
SELECT TME.[TypedManagedEntityid], HS.PrincipalName
FROM MTV_HealthService HS
INNER JOIN dbo.[BaseManagedEntity] BHS with(nolock)
ON BHS.[BaseManagedEntityId] = HS.[BaseManagedEntityId]
-- get host managed computer instances
INNER JOIN dbo.[TypedManagedEntity] TME with(nolock)
ON TME.[BaseManagedEntityId] = BHS.[TopLevelHostEntityId]
AND TME.[IsDeleted] = 0
INNER JOIN dbo.[DerivedManagedTypes] DMT with(nolock)
ON DMT.[DerivedTypeId] = TME.[ManagedTypeId]
INNER JOIN dbo.[ManagedType] BT with(nolock)
ON DMT.[BaseTypeId] = BT.[ManagedTypeId]
AND BT.[TypeName] = N'Microsoft.Windows.Computer'
-- only with missing primary
LEFT OUTER JOIN dbo.Relationship HSC with(nolock)
ON HSC.[SourceEntityId] = HS.[BaseManagedEntityId]
AND HSC.[RelationshipTypeId] = dbo.fn_RelationshipTypeId_HealthServiceCommunication()
AND HSC.[IsDeleted] = 0
INNER JOIN DiscoverySourceToTypedManagedEntity DSTME with(nolock)
ON DSTME.[TypedManagedEntityId] = TME.[TypedManagedEntityId]
AND DSTME.[DiscoverySourceId] = @DiscoverySourceId
WHERE HS.[IsAgent] = 1 AND HSC.[RelationshipId] IS NULL
If you see any results, copy and paste them into Notepad so you have a record of all the disconnected agents.
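If you save those results as text, a small parser can turn them into a clean list for the next steps. This is a minimal Python sketch assuming each copied row looks like `TypedManagedEntityId<TAB>PrincipalName`, which is how rows come out when copied from the SSMS results grid; header or blank lines are skipped because their first column is not a GUID. `parse_disconnected` is a hypothetical helper name.

```python
import uuid
from typing import List, Tuple


def parse_disconnected(text: str) -> List[Tuple[str, str]]:
    """Parse 'TypedManagedEntityId<TAB>PrincipalName' rows pasted from SSMS.
    Lines whose first column is not a GUID (headers, blanks) are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.strip().split(None, 1)  # GUID, then the rest of the line
        if len(parts) != 2:
            continue
        try:
            guid = str(uuid.UUID(parts[0])).upper()  # normalise to upper case
        except ValueError:
            continue  # not a GUID, so not a data row
        rows.append((guid, parts[1].strip()))
    return rows
```

The returned GUIDs are the TypedManagedEntityId values the later cleanup steps ask for.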
Now we need to delete all the disconnected agents. First, make a backup of the OperationsManager database.
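If you prefer the command line to SSMS for the backup, one option is `sqlcmd`. The sketch below only builds the command; it assumes Windows authentication (`-E`), a hypothetical SQL Server name and backup path, and uses `COPY_ONLY` so this ad hoc backup does not disturb your regular backup chain. Note that the backup path is resolved on the SQL Server itself, not on the machine running the script.

```python
from typing import List


def build_backup_command(sql_server: str, backup_file: str,
                         database: str = "OperationsManager") -> List[str]:
    """Build a sqlcmd call that takes a copy-only full backup using Windows auth."""
    path = backup_file.replace("'", "''")  # escape quotes for the T-SQL string literal
    tsql = (f"BACKUP DATABASE [{database}] TO DISK = N'{path}' "
            "WITH COPY_ONLY, INIT")
    # -E: trusted connection, -b: return an error level on failure, -Q: run and exit
    return ["sqlcmd", "-S", sql_server, "-E", "-b", "-Q", tsql]
```

For example, `subprocess.run(build_backup_command("SQLSERVER01", r"D:\Backup\OpsMgr.bak"), check=True)` would run it, with both names being placeholders for your own.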
Execute this script to delete all disconnected agents. Note: you are on your own. I am NOT responsible for your actions.
declare @TypedManagedEntityId uniqueidentifier;
declare @DiscoverySourceId uniqueidentifier;
declare @LastErr int;
declare @TimeGenerated datetime;
set @TimeGenerated = GETUTCDATE();
set @DiscoverySourceId = dbo.fn_DiscoverySourceId_User();
DECLARE EntitiesToBeRemovedCursor CURSOR LOCAL FORWARD_ONLY READ_ONLY FOR
SELECT TME.[TypedManagedEntityid]
FROM MTV_HealthService HS
INNER JOIN dbo.[BaseManagedEntity] BHS
ON BHS.[BaseManagedEntityId] = HS.[BaseManagedEntityId]
-- get host managed computer instances
INNER JOIN dbo.[TypedManagedEntity] TME
ON TME.[BaseManagedEntityId] = BHS.[TopLevelHostEntityId]
AND TME.[IsDeleted] = 0
INNER JOIN dbo.[DerivedManagedTypes] DMT
ON DMT.[DerivedTypeId] = TME.[ManagedTypeId]
INNER JOIN dbo.[ManagedType] BT
ON DMT.[BaseTypeId] = BT.[ManagedTypeId]
AND BT.[TypeName] = N'Microsoft.Windows.Computer'
-- only with missing primary
LEFT OUTER JOIN dbo.Relationship HSC
ON HSC.[SourceEntityId] = HS.[BaseManagedEntityId]
AND HSC.[RelationshipTypeId] = dbo.fn_RelationshipTypeId_HealthServiceCommunication()
AND HSC.[IsDeleted] = 0
INNER JOIN DiscoverySourceToTypedManagedEntity DSTME
ON DSTME.[TypedManagedEntityId] = TME.[TypedManagedEntityId]
AND DSTME.[DiscoverySourceId] = @DiscoverySourceId
WHERE HS.[IsAgent] = 1 AND HSC.[RelationshipId] IS NULL;
OPEN EntitiesToBeRemovedCursor
FETCH NEXT FROM EntitiesToBeRemovedCursor INTO @TypedManagedEntityId
WHILE @@FETCH_STATUS = 0
BEGIN
BEGIN TRAN
-- Delete entity
EXEC @LastErr = [p_RemoveEntityFromDiscoverySourceScope] @TypedManagedEntityId, @DiscoverySourceId, @TimeGenerated;
IF @LastErr <> 0 GOTO Err
COMMIT TRAN
-- Get the next typedmanagedentity to delete.
FETCH NEXT FROM EntitiesToBeRemovedCursor
INTO @TypedManagedEntityId
END
CLOSE EntitiesToBeRemovedCursor
DEALLOCATE EntitiesToBeRemovedCursor
GOTO Done
Err:
ROLLBACK TRAN
GOTO Done
Done:
Execute SCRIPT A again to check whether any disconnected agents are still listed. Hopefully none are. If some are, you need to execute the following script for each of them: replace the GUID assigned to @EntityId with that server's TypedManagedEntityId from the SCRIPT A results, and run it once per remaining disconnected server.
DECLARE @EntityId uniqueidentifier;
DECLARE @TimeGenerated datetime;
-- change "GUID" to the ID of the invalid entity
SET @EntityId = '3B2F8221-9F7B-5FFD-B80D-DEEAFFB6E342';
SET @TimeGenerated = getutcdate();
BEGIN TRANSACTION
EXEC dbo.p_TypedManagedEntityDelete @EntityId, @TimeGenerated;
COMMIT TRANSACTION
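With more than a couple of leftover entities, filling in the GUID by hand gets tedious. Here is a minimal Python sketch (a hypothetical helper, not something from Microsoft support) that repeats the delete block above once per TypedManagedEntityId, validating each value as a GUID first and separating the blocks with `GO` so the `DECLARE` statements don't collide. Review the generated text before pasting it into SSMS.

```python
import uuid
from typing import Iterable

# Same statements as the manual delete script, with the GUID as a placeholder.
TEMPLATE = """DECLARE @EntityId uniqueidentifier;
DECLARE @TimeGenerated datetime;
SET @EntityId = '{guid}';
SET @TimeGenerated = getutcdate();
BEGIN TRANSACTION
EXEC dbo.p_TypedManagedEntityDelete @EntityId, @TimeGenerated;
COMMIT TRANSACTION
GO
"""


def build_delete_script(entity_ids: Iterable[str]) -> str:
    """Emit one delete block per entity ID; raises ValueError on a malformed GUID."""
    blocks = []
    for raw in entity_ids:
        guid = str(uuid.UUID(raw.strip())).upper()  # rejects anything that is not a GUID
        blocks.append(TEMPLATE.format(guid=guid))
    return "\n".join(blocks)
```

Because `GO` is a batch separator understood by SSMS and sqlcmd (not T-SQL itself), each block runs as its own batch.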
Execute SCRIPT A again to confirm the server is no longer listed as disconnected.
Check the SCOM console to confirm these servers have disappeared from the Computers node in the Monitoring section.
After fixing the database, do the following on every server that was originally disconnected:
a. Stop the System Center Management (HealthService) service.
b. Open an Explorer window and go to “C:\Program Files\System Center Operations Manager 2007”. Rename (or delete) the folder named “Health Service State”.
c. Start the System Center Management Service (HealthService).
I wrote a little VBScript to do the above on a whole list of servers. Copy and paste the following script into Notepad and save it as “FixSCOMAgent.vbs”. Create a new text file called Servers.txt in the same folder where you saved the VBScript and type the disconnected server names into it, each server name on its own line. For example:
servername1
servername2
servername3
' Script Name: FixSCOMAgent.vbs
' Description: This script stops the SCOM agent service, deletes the
' Health Service State folder and starts the SCOM agent service on
' all servers listed in servers.txt.
'
' Written by Anand Venkatachalpathy
'On Error Resume Next

' How long (in milliseconds) to wait after initiating the service stop
intSleep = 18000
' Agent Health Service State folder to be deleted (appended to \\server)
HSFolder = "\c$\Program Files\System Center Operations Manager 2007\Health Service State"
' Create a FileSystemObject
Set objFSO = CreateObject("Scripting.FileSystemObject")
' Open servers.txt for reading (1 = ForReading). The path is relative to the
' current directory, so run the script from the folder it is saved in.
Set f = objFSO.OpenTextFile("servers.txt", 1)
' Read the file one line at a time
Do While f.AtEndOfStream <> True
    strComputer = Trim(f.ReadLine)
    ' Skip blank lines
    If strComputer <> "" Then
        WScript.Echo "Fixing " & strComputer & " ..."
        ' Call the sub routine to fix the server
        RestartHealthService strComputer
    End If
Loop
f.Close
'-*-*-*-*-* End of Script -*-*-*-*-*-*

'Sub Routine: RestartHealthService
'Parameter: server name
'Description: This sub routine stops the SCOM agent service,
'deletes the Health Service State folder and starts the
'agent service.
Sub RestartHealthService(strComputer)
    ' Service name, quoted for the WQL query
    strService = "'HealthService'"
    ' Get the WMI object on the given server
    Set objWMIService = GetObject("winmgmts:" _
        & "{impersonationLevel=impersonate}!\\" _
        & strComputer & "\root\cimv2")
    ' Get the matching Win32_Service object(s)
    Set colListOfServices = objWMIService.ExecQuery _
        ("Select * from Win32_Service Where Name = " _
        & strService)
    ' Folder to delete
    strSource = "\\" & strComputer & HSFolder
    ' Get the folder object to delete
    Set fTarget = objFSO.GetFolder(strSource)
    For Each objService in colListOfServices
        WScript.Echo vbTab & "Stopping SCOM Agent Service"
        objService.StopService()
        WScript.Sleep intSleep
        WScript.Echo vbTab & "Deleting the folder: " & strSource
        fTarget.Delete
        WScript.Echo vbTab & "Starting SCOM Agent Service"
        objService.StartService()
    Next
End Sub
'-*-*-*- End of Sub Routine -*-*-*-*-*-
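If you build servers.txt by hand from the SCRIPT A results, it is easy to end up with stray spaces, blank lines, or the same server listed twice. The following Python sketch (a hypothetical helper, not part of FixSCOMAgent.vbs) shows one way to normalize the list: trim whitespace, skip blank lines, and drop case-insensitive duplicates.

```python
from typing import List


def read_servers(text: str) -> List[str]:
    """Return server names from servers.txt content: one per line, whitespace
    trimmed, blank lines skipped, case-insensitive duplicates dropped."""
    seen = set()
    servers = []
    for line in text.splitlines():
        name = line.strip()
        if name and name.lower() not in seen:
            seen.add(name.lower())
            servers.append(name)
    return servers
```

Assuming the file is saved as ANSI or UTF-8, `read_servers(open("servers.txt", encoding="utf-8-sig").read())` would read it; `utf-8-sig` also strips the byte-order mark Notepad may add.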
Now open a command prompt as administrator, go to the folder where you saved the script and run it: CScript FixSCOMAgent.vbs
After you run the script, watch the SCOM console; the servers will start showing up correctly, although it can sometimes take about 15 minutes. Just in case, check each fixed server’s event log for errors.
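Instead of refreshing the console, you could poll each fixed server until the agent service reports RUNNING, giving up after roughly the 15 minutes mentioned above. This is a minimal Python sketch assuming the standard `sc \\server query HealthService` output format; the helper names are hypothetical, and a running service does not by itself prove the agent has reconnected, so the console and event log remain the final check.

```python
import subprocess
import time
from typing import Callable, Optional


def parse_sc_state(output: str) -> Optional[str]:
    """Pull the state name (e.g. RUNNING, STOPPED) out of `sc query` output."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "STATE":
            parts = value.split()  # e.g. ['4', 'RUNNING']
            return parts[1] if len(parts) > 1 else None
    return None


def service_state(server: str, service: str = "HealthService") -> Optional[str]:
    """Query a remote service with sc.exe (Windows only) and return its state."""
    out = subprocess.run(["sc", "\\\\" + server, "query", service],
                         capture_output=True, text=True).stdout
    return parse_sc_state(out)


def wait_until(check: Callable[[], bool], timeout: float = 15 * 60,
               interval: float = 60,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` every `interval` seconds until it is True or `timeout` elapses."""
    waited = 0.0
    while True:
        if check():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

For example, `wait_until(lambda: service_state("servername1") == "RUNNING")` would wait up to 15 minutes for one server; the `sleep` parameter is there so the loop can be tested without real delays.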
Whew! Hope this blog helped you.