Tweenday Two (Dec 27 2007)
Message boards : Technical News : Tweenday Two (Dec 27 2007)
("Tweenday" refers to the scant few work days between the Xmas and New Year's holidays.)

> Missing files like that prompt me to make an immediate fsck on the filesystem.

Very true, except this is a filesystem on network-attached storage. The filesystem is proprietary and out of our control, so no fsck'ing, nor should there be a need for manual fsck'ing.

> Why are the bits 'in' larger than the bits 'out'?

In regard to the Cricket graphs, in/out depends on your orientation. The bytes going into the router are coming from the lab, en route to the outside world, so this is "outbound" traffic going "into" the router. Vice versa for the inbound. Basically: green = workunit downloads, blue line = result uploads, though there is some low-level Apache traffic noise mixed in there (web sites and schedulers).

- Matt

-- BOINC/SETI@home network/web/science/development person --
"Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
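To make the orientation point concrete, here is a minimal sketch of the translation from router-port direction to the lab's (and volunteers') point of view. It assumes, as the post describes, that the graphed port faces the lab; `RouterDir` and `lab_view` are hypothetical names for illustration, not part of Cricket or any SETI@home tool.

```python
from enum import Enum


class RouterDir(Enum):
    # Hypothetical labels for the two directions on the lab-facing router port.
    IN = "in"    # bytes entering the router from the lab
    OUT = "out"  # bytes leaving the router toward the lab


def lab_view(router_dir: RouterDir) -> str:
    """Translate a router-port direction into the lab's perspective.

    Assumes the graphed port is the one facing the lab.
    """
    if router_dir is RouterDir.IN:
        # Coming from the lab, heading to the outside world: outbound traffic,
        # which volunteers experience as workunit downloads (green).
        return "outbound: workunit downloads"
    # Coming from the outside world, heading into the lab: inbound traffic,
    # which volunteers experience as result uploads (blue).
    return "inbound: result uploads"


for d in RouterDir:
    print(d.value, "->", lab_view(d))
```

The same bytes are "in" or "out" depending only on which side of the router you stand on; the graph labels follow the router, not the lab.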
ID: 695137
Thanks, Matt, for the update.
ID: 695154
Maybe you could suggest to the heavy contributors that they set their queues to a longer period to cover the time you need to build indexes, which causes downtime.
ID: 695165
> Maybe you could make a suggestion to the heavy contributors that they should set their queues to a longer period for the period you need to build indexes and cause downtime.

I don't have exact stats here, but I believe most of our bandwidth is due to users of the "set it and forget it" variety; they never mess around with queues/caches. So if the "heavy" users did stock up, it wouldn't help as much as you'd think.

> Also, I thought I saw someone say something about a 60Mbit cap.

I mentioned how we seem to have a 60Mbit ceiling. That's not an enforced cap; it's due to internal disk/database/network I/O bottlenecks, which are quite dynamic and always difficult to track down. In reality, we have a 1 Gbit connection to the world via Hurricane Electric, but alas this is constrained by a 100 Mbit fiber coming out of the lab down to campus. It will take some big $$$ to upgrade that, which may happen sooner or later (as it would not only benefit us).

> But I thought I remembered that you guys have Gigabit interlinks between all the servers and probably a Gigabit or higher backbone going almost all the way to the Internet router, so that 60Mbit/s (7.5MB/s) limit is really, really weird.

We have gigabit all over our server closet (more or less; some older servers are 100Mbit going into the 1Gbit switch). And yes, you are right, it is weird. Maybe I'll take a look at that switch now that you mention it...

- Matt
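A quick sanity check of these figures: throughput along a chain of links can't exceed its slowest link, and 60 Mbit/s is 7.5 MB/s (8 bits per byte). The sketch below models the path as just three hops (server closet, lab-to-campus fiber, Hurricane Electric uplink); that three-hop model and the helper names are illustrative assumptions, not the lab's actual topology.

```python
def path_capacity_mbit(links_mbit):
    """End-to-end throughput is bounded by the slowest link on the path."""
    return min(links_mbit)


def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8


# Simplified path using the figures from the post (Mbit/s):
# gigabit server closet -> 100 Mbit lab-to-campus fiber -> 1 Gbit Hurricane Electric uplink
path = [1000, 100, 1000]
cap = path_capacity_mbit(path)  # the fiber is the narrowest link
observed = 60                   # the observed ~60 Mbit/s ceiling

print(cap, mbit_to_mbyte(cap), mbit_to_mbyte(observed), observed / cap)
# prints: 100 12.5 7.5 0.6
```

With these numbers the fiber caps the path at 100 Mbit/s (12.5 MB/s), so the ~60 Mbit/s ceiling sits at roughly 60% of the physical limit, consistent with Matt's point that the ceiling comes from internal I/O bottlenecks rather than the link itself.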
ID: 695170
Wouldn't just upgrading the NICs on both ends of the fibre do the trick? Or is it low-quality fibre?
ID: 695178
> Won't just an upgrade of NIC's on both ends of the fibre do the trick? Or is it that low quality fibre?

That's more or less true, but this fibre is under campus control (not ours), and let's just say they have a very specific way of doing things.

- Matt
ID: 695198
I think I remember when you posted the pictures of when that cable was put in!
ID: 695201
Let me get this right: you are paying for a gig but can only use a tenth of it? (And you are asking for donations?!)

Maybe you could make a suggestion to the heavy contributors that they should set their queues to a longer period for the period you need to build indexes and cause downtime.
ID: 695309
The Cricket graphs seem to be for the actual Gigabit Internet port, so maybe the bandwidth problem isn't on the Seti side but on the campus side.
ID: 695328
> ("Tweenday" referring to the scant few work days between Xmas and New Year's holidays).

Matt,

Is it possible to do the database builds in parallel with the other Tuesday maintenance tasks? That would make it much less painful. Even if you can't, why not schedule them serially with the other Tuesday tasks? If the heavy crunchers who process nothing but SETI tasks know that this will happen, they can load up the day before. For the rest of us, we'll let the backup projects run (Einstein is more than happy to gobble up my CPUs right now). As for the set-and-forget types, they won't notice, and if they do, they are not "forgetting" enough! ;-)

So, for a while, Tuesday downtime will be longer than we are used to. I'll bet you that we will get used to it.

Thanks,
Jim
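The trade-off Jim raises comes down to simple arithmetic: run back to back, the outage lasts the sum of the task durations; run concurrently, it lasts roughly as long as the longest task, provided the tasks don't compete for the same disks or database locks (a big "if" for index builds). The durations below are made up for illustration; the thread gives no real figures.

```python
def serial_downtime(task_hours):
    """Tasks run one after another: downtime is the sum of their durations."""
    return sum(task_hours)


def parallel_downtime(task_hours):
    """Tasks run at the same time: downtime is set by the longest task.

    Only valid if the tasks don't slow each other down by contending
    for the same disks, CPUs, or database locks.
    """
    return max(task_hours)


# Hypothetical durations in hours, for illustration only.
tasks = {"regular Tuesday maintenance": 3.0, "database index builds": 5.0}

print(serial_downtime(tasks.values()), parallel_downtime(tasks.values()))
# prints: 8.0 5.0
```

In practice an I/O-heavy index build running alongside other maintenance would likely stretch both tasks, so the real parallel figure would land somewhere between these two bounds.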
ID: 695358
During these rather rocky times, has anyone thought of disabling the beta project(s) that share the same servers? I think Astropulse uses the same hardware, so why not throttle that project down to free up the main project? I realize this may not solve the bandwidth issue, but it may help the processing issues.
ID: 695360
Guys,

It seems that you are looking at the wrong graph. Look at the interface packets-per-second graph: in packets per second, both graphs (in/out) are equal. (It seems that you peaked in that, not in bandwidth, or maybe in a combination of both.) Or maybe it is a limit of some firewall. What (whose) router(s)/firewall(s) do you have in line to the backbone?

I hope it helps! Bye
ID: 695460
> During these rather rocky times, has anyone thought of disabling the beta project(s) that share the same servers? I think Astropulse uses the same hardware, so why not throttle that project down to free up the main project. I realize this may not solve the bandwidth issue, but it may help the processing issues.

I don't think the minimal traffic on Beta is going to make one bit of difference to the main site's problems. The Beta server status shows only 8,185 results ready to send and 14,849 results in progress (as of 28 Dec 2007 22:21:13 UTC).

And it keeps my CPUs from making bigger demands here: units on Beta (AR=0.4xx) with V 6.00 currently take ~3:20 (or ~5:00) to crunch, whilst most units here take only <20 mins (or ~40 mins), depending on the CPU.
ID: 695461
> During these rather rocky times, has anyone thought of disabling the beta project(s) that share the same servers? I think Astropulse uses the same hardware, so why not throttle that project down to free up the main project. I realize this may not solve the bandwidth issue, but it may help the processing issues.

Judging from estimates of processing power at the third-party stats sites, S@h Beta produces something like one-third of one percent as much work as the main project. The increase in server capacity from shutting it down would therefore be pretty negligible.
ID: 695462
> Let me get this right, you are paying for a gig but can only use a tenth of it? (And you are asking for donations?!)

I don't have the exact numbers, but if I remember correctly... the gigabit from Hurricane Electric cost about a quarter of what the previous connection from Cogent cost. So, even if they waste 9/10ths of it, they are saving money. How is that bad?

Universities are highly political. Matt has hinted about that elsewhere, but the short answer is that the only way to get fiber that isn't controlled by campus is to rent space off campus. There are some other thoughts, like moving most of the servers nearer to where that fat pipe enters campus, but that brings other logistical issues as well.

More to the point, it's really easy for us (especially those of us who do internetworking for a living) to sit out here and play "armchair quarterback." I certainly have my ideas, but unless and until I volunteer to actually write the code, I won't criticize the project (or BOINC) for not doing that.
ID: 695466
Dudo: good catch, that is weird indeed. But 6000 packets a second shouldn't be that hard for a gigabit router, especially if you consider that its CPU never got over 10% usage and it's only using 1/5th of its RAM.
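Dividing the bit rate by the packet rate gives the average packet size, a quick way to judge whether a link is limited by packet count or by bytes. A minimal sketch, assuming the ~60 Mbit/s ceiling and the ~6000 packets/s figure describe the same port over the same time window (the thread doesn't state this explicitly); `avg_packet_bytes` is a hypothetical helper.

```python
def avg_packet_bytes(bits_per_s: float, packets_per_s: float) -> float:
    """Average packet size (bytes) implied by a bit rate and a packet rate."""
    return bits_per_s / 8 / packets_per_s


# Figures from the thread: ~60 Mbit/s ceiling, ~6000 packets/s.
print(avg_packet_bytes(60e6, 6000))
# prints: 1250.0
```

An average of about 1250 bytes is close to Ethernet's default 1500-byte MTU, so the traffic would be mostly large packets. At that size, 6000 packets/s is a light load for gigabit gear, which supports the point that packet rate alone shouldn't be the limit.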
ID: 695469
Would any of this explain why my client is currently unable to download WUs? It has gotten part of two and can't get one bit of three others. It has already worked through its queue of WUs. I'll bump up my queue to 7 days, but that won't help now since my tank is empty.
ID: 695471
> In the meantime, I've been trying to squeeze more juice out of our current servers. I'm kinda stumped as to why we are hitting this 60 Mbit/sec ceiling of workunit production/sending. I'm not finding any obvious I/O or network bottlenecks.

I have been watching the uploads and downloads the last few times the network has been congested. I have DSL (10 Mbit/s) or municipal WiFi (1.2 to 2.4 Mbit/s). What is occurring is that many times an upload or download starts, continues fitfully for several packets, gets almost to the end or even to 100%, and then quits without completing. This behavior is very, very common. In other words, on my machines many WUs or results are being uploaded or downloaded essentially many times before finally completing.

The way I/O and CPU priority are supposed to work is that when a program is given the CPU on I/O completion, it is assigned a higher priority than normal, and that priority is then allowed to degrade slowly back to normal. This way, if a program is I/O-bound, it can begin its I/O, be sent to the I/O wait queue, and be given the CPU again at higher than normal priority when the I/O completes, so it can do enough computation to quickly begin another I/O operation. It appears that the programs uploading or downloading WUs are not being allocated sufficient CPU time, because the delays between I/O attempts seem to be causing the connections to time out. I wonder if the priority mechanism is working on the S@H server, or if the server is attempting too many simultaneous connections.
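The boost-and-decay scheme described above can be shown as a toy model. It resembles, for example, the dynamic boost Windows gives threads on I/O completion, but the boost size, the one-step-per-tick decay, and the names `Proc`, `on_io_complete`, and `on_tick` are illustrative assumptions, not any particular kernel's values or API. Here a larger number means higher priority.

```python
from dataclasses import dataclass, field

BOOST = 5  # hypothetical boost applied when an I/O operation completes


@dataclass
class Proc:
    name: str
    base_priority: int                # larger number = higher priority in this toy model
    priority: int = field(init=False)

    def __post_init__(self) -> None:
        self.priority = self.base_priority  # start at the normal level


def on_io_complete(p: Proc) -> None:
    """Wake an I/O-bound process with a temporary priority boost,
    so it gets the CPU quickly and can issue its next I/O."""
    p.priority = p.base_priority + BOOST


def on_tick(p: Proc) -> None:
    """Each tick the process spends running, decay one step toward its base."""
    if p.priority > p.base_priority:
        p.priority -= 1


p = Proc("upload_handler", base_priority=10)
on_io_complete(p)
history = [p.priority]
for _ in range(6):
    on_tick(p)
    history.append(p.priority)
print(history)
# prints: [15, 14, 13, 12, 11, 10, 10]
```

If many such handlers are boosted at once, they all compete at the raised level, so the boost helps less the more simultaneous connections a server juggles; that is one way to read the poster's "too many simultaneous connections" hypothesis.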
ID: 695477
> (snipped)

Probably the latter, is my guess. A few months ago, when we had problems downloading, on most connections if you didn't start downloading within 22 seconds, the connection was broken and you retried later. Most of my downloads during this period go beyond the 22-second limit and may or may not download a few bytes over the next 5 minutes or so before the connection is terminated, although a few do time out at the 22-second limit.
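One reading of the behaviour described above can be written as a simple classifier for a single download attempt: no data within ~22 seconds means the connection is dropped; otherwise the transfer either finishes or stalls until it is cut off "5 mins or so" later. The 300-second cutoff, the constant names, and `classify_attempt` are hypothetical, chosen only to mirror the post's description.

```python
from typing import Optional

FIRST_BYTE_TIMEOUT_S = 22.0  # reported: no data within ~22 s -> connection broken
STALL_CUTOFF_S = 300.0       # assumption: the "5 mins or so" before stalled transfers die


def classify_attempt(first_byte_s: Optional[float], done_s: Optional[float]) -> str:
    """Classify one download attempt.

    first_byte_s: seconds after connecting when data first arrived (None = never)
    done_s: seconds after connecting when the transfer finished (None = never)
    """
    if first_byte_s is None or first_byte_s > FIRST_BYTE_TIMEOUT_S:
        return "timed out waiting for first byte; retry later"
    if done_s is not None and done_s <= STALL_CUTOFF_S:
        return "completed"
    return "started, then terminated before completing; retry later"


print(classify_attempt(None, None))
print(classify_attempt(5.0, 40.0))
print(classify_attempt(10.0, None))
```

Both failure branches end in a retry, which matches the earlier observation that many WUs are effectively transferred several times before one attempt completes.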
ID: 695489
Copyright © 2013 University of California