Reply to "Update on DCUM Technical Problems"
[quote=jsteele]Today I may have inadvertently discovered the root cause of the major problems we had back in November that caused DCUM to be down for considerable periods of time. Unfortunately, it took some outages today in order to discover the problem. The short answer is that it appears the primary switch we have -- a Cisco SG300 -- got into what I might charitably call a less than optimal state.

The longer explanation is that we've used this switch for years without a problem. When we had the issues in November, I rebooted it at some point just to see if that made any difference, but as I recall it didn't. The switch has not been rebooted since then. All this time it appeared to work and, in fact, did work to a certain point.

Today I was trying to transfer a very large file from one server to another. I noticed that the transfer was very slow -- topping out at about 500 kbps on a gig network. I started doing various tests and found that while several servers were pushing 20 mbps or so, no transfers would get anywhere near that speed. Even stranger, I found that one server's traffic was being broadcast to every other device on the switch. There are a couple of known causes for this sort of thing, but taking steps to address those issues didn't change anything.

While I watched the network, things got worse and worse and then, suddenly, some ports simply stopped passing traffic. Unfortunately, one of those ports belonged to a database cluster node, and when it stopped communicating, that node crashed. I then rebooted the switch, which, of course, caused the entire cluster to go down. When the switch came back up, several ports simply wouldn't turn on. There was a fairly recent firmware update, so I installed that. We also moved some cables around on the switch to find ports that worked. Finally, after several reboots, all ports started working. Then I had to get the db cluster working again. While that took time, there were no problems.
When I finished, I found that the slow db queries that had been the problem back in November were now fast. So, I can probably go back to a more conventional configuration of db components, but I'll wait a while for that. I assume there were bugs in the previous version of the switch firmware, and I now don't know how much to trust the switch. In the network stack of firewall, switch, and servers, it is the least expensive device, so moving to something beefier might be a good idea. [/quote]
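For a sense of how bad 500 kbps is on a gigabit link, here is a quick back-of-envelope calculation. The file size (10 GB) is a hypothetical stand-in for the "very large file" mentioned in the post, and kbps/Gbps are read as kilobits and gigabits per second:

```python
# Compare transfer time for a hypothetical 10 GB file at the
# observed ~500 kbps versus the link's nominal 1 Gbps.

FILE_BYTES = 10 * 10**9        # hypothetical 10 GB file (assumption)
FILE_BITS = FILE_BYTES * 8

observed_bps = 500 * 10**3     # ~500 kilobits/sec, as observed
nominal_bps = 10**9            # 1 gigabit/sec, nominal link speed

observed_hours = FILE_BITS / observed_bps / 3600
nominal_seconds = FILE_BITS / nominal_bps

print(f"At ~500 kbps: {observed_hours:.1f} hours")   # about 44.4 hours
print(f"At 1 Gbps:    {nominal_seconds:.0f} seconds") # 80 seconds
```

So the observed rate was roughly 2,000x below the line rate, which is why a single large-file copy made the switch problem visible at all.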