WebAssign. MyEconLab. PeerWise.
These are just a few of the many online homework solutions used on Purdue’s campus. Computer science students log in remotely to servers to work on and turn in homework. English majors turn in essays through Blackboard. But what do all of these solutions have in common?
They need the internet to work.
Molly Clark, a freshman in the College of Engineering, found out how important it was on Monday.
Clark was attempting to complete her Calculus II WebAssign homework when she found she couldn’t connect to Wi-Fi.
“I was kind of freaking out a little bit,” said Clark. “And I knew I had to get the assignment done so I had to figure out some way to get the homework done.”
“And that’s what led me to call my mom, who lives in Danville, to help me submit my homework. I had to call her to tell her the answers to the questions for her to submit it since she had Wi-Fi.”
Clark later resorted to using an Ethernet cable given to her by a friend, but she was still late on a few assignments.
“Because of the Wi-Fi and the Ethernet complications, I got on my chemistry assignment, but I got on an hour before it was due so I got as much as I could done.”
Information Technology at Purdue, also known as ITaP, first sounded the alarm on Jan. 16 in a tweet alerting users that Purdue’s main network, Purdue Air Link version 3.0 (also known as PAL 3.0), had connectivity troubles.
“Users are unable to connect to PAL 3.0 in many places on campus. Administrators are aware of the problem and working on a solution.”
ITaP (@PurdueIT), January 17, 2018
So what exactly happened?
Mark Sonstein, the executive director of Purdue’s IT Infrastructure Services, hasn’t slept much in the last few weeks. According to him, since the initial connectivity incident on Jan. 16, ITaP has had Cisco engineers on site to fix the authentication problem, which stemmed from an update meant to make authentication smoother.
For a student to use PAL, a few things must happen when a device connects to one of the 10,000 access points on campus. First, the user enters a student username and password. The network then sends these credentials to a server in the Purdue Telecommunications Building, which checks that the username and password match the information it has on record. Once the credentials are verified, the server allots a local IP address that the device uses to connect to the internet.
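The connection steps described above can be sketched roughly in Python. This is a toy model under stated assumptions, not ITaP’s actual system: the class name `AuthServer`, the plaintext password table, and the `10.0.0.0/29` address pool are all hypothetical and chosen only to show the order of operations (check credentials, then allot an address).

```python
from ipaddress import IPv4Network
from typing import Dict, Optional


class AuthServer:
    """Toy stand-in for a central authentication server (hypothetical)."""

    def __init__(self, accounts: Dict[str, str], subnet: str = "10.0.0.0/29") -> None:
        # Assumption: plaintext passwords keep the sketch short; real systems store hashes.
        self.accounts = accounts
        # Iterator over the usable host addresses in the made-up local subnet.
        self.free_addresses = iter(IPv4Network(subnet).hosts())

    def connect(self, username: str, password: str) -> Optional[str]:
        """Return a local IP address if the credentials match, else None."""
        if self.accounts.get(username) != password:
            return None  # credentials do not match the server's records
        address = next(self.free_addresses, None)  # None once the pool is exhausted
        return None if address is None else str(address)


server = AuthServer({"student1": "correct-horse"})
print(server.connect("student1", "correct-horse"))   # first free address in the pool
print(server.connect("student1", "wrong-password"))  # None
```

Every connection passes through the same server in this model, which is why trouble at the authentication step can keep users offline even when the access points themselves are working.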
“Over Christmas Break, we made some changes to the network,” Sonstein said. “We updated some software, some things like that which actually made things run a little smoother, but they compounded other problems that we weren’t aware of.”
One of the problems that they were aware of was that PAL was far over capacity.
“When this network was engineered four or five years ago, you were looking at a user community of 10,000 to 15,000 wireless devices,” Sonstein said. “That’s as low as I ever get now.
“I think our (all-time) peak right now is 75,000 unique devices.”
The average Purdue student would have at least one laptop and one mobile phone that need an internet connection. According to the Purdue Data Digest, there are currently 40,632 undergraduate and graduate students at Purdue, so at two devices per student, as many as 81,264 devices could be connected to Purdue’s wireless network at any given time.
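The arithmetic behind that figure is short enough to check directly. The snippet below assumes exactly two devices per student, which is the article’s working assumption rather than a measured value, and compares the result with the upper end of the original design range Sonstein cites.

```python
STUDENTS = 40_632         # enrollment figure cited from the Purdue Data Digest
DEVICES_PER_STUDENT = 2   # assumption from the text: one laptop plus one phone
DESIGN_CAPACITY = 15_000  # upper end of the design range Sonstein quotes

potential_devices = STUDENTS * DEVICES_PER_STUDENT
print(potential_devices)                              # 81264
print(round(potential_devices / DESIGN_CAPACITY, 1))  # multiple of the design figure
```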
“The only software update we have made in this entire time period, ... we made some recommended change of settings to the (authentication) server. That’s business practice changes, things like timeouts. ... Nothing to do with protocols or packet flow.”
After the software fix was applied to the server, disaster struck again.
“When I use Snapchat, I know it uses a lot of data, so I always check to see if I’m on Wi-Fi before I start,” said Noah Curran, a freshman in the College of Science. “That weekend, I noticed that my Wi-Fi on my phone wasn’t quite working.”
“I was physically at (the Telecommunications Building) when it physically did crash,” Sonstein said.
According to him, the main switch had crashed. A switch is the central box that connects all of the network traffic on campus and links it to the router that carries traffic to the outside world. The original model in use was the Catalyst 6880-X made by Cisco, a model that could handle 100-gigabit Ethernet speeds. Cisco’s website lists it as able to meet “the unique needs of a midsize campus,” but that was clearly not the case that weekend.
“(After the crash) we said okay, we’re going to put a (Cisco) 6816 switch in place,” Sonstein said. “And we’re going to move everyone on that because we know that that platform will work.”
The Catalyst 6816-X model, which retails for approximately $20,000, is an older model. Cisco announced the Catalyst 6816’s end-of-life and end-of-sale in September 2017.
“It’s not a big switch,” said Sonstein. “So we’re not sure how well it worked. ... At the beginning of the week, the 6816 was holding up, it was doing fine, and then it crashed on Friday night. So we then immediately rolled over to the 6880s, the old switches that we had rebuilt. They needed some upgrades and some updates in software and some things like that. It wasn’t a big deal, it wasn’t a full replacement — it was just some basic upgrades.”
To explain the network infrastructure, Sonstein used a highway analogy. Envision a 10-lane highway with construction on three lanes, which limits the number of usable lanes to seven. One of these lanes exits onto a single-lane access road that is meant to handle only one lane’s exit traffic. But according to Sonstein, part of the problem was that the exit lane was forced to handle all 10 lanes of traffic, causing a jam. Another vulnerability arises if the highway has a bridge that can handle only five lanes of traffic.
“When I incrementally open up one or two lanes that’s fine,” Sonstein said. “But if I suddenly open it back up to all 10 lanes, then all of a sudden that bridge becomes the bottleneck. So think of that as the same thing as authentication. You were driving down this highway and we were forcing you to collapse down to three lanes, and sometimes we were forcing you to push into eduroam because we just didn’t have enough throughput. Then all of a sudden we have 10 lanes of high-speed traffic, but my bridge (can only handle) five lanes still.”
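The point of the analogy is that end-to-end throughput is set by the narrowest stage. A minimal Python sketch, using the lane counts from Sonstein’s quote and a hypothetical helper named `bottleneck`, shows how the limiting stage shifts from the highway to the bridge once every lane reopens. Mapping the bridge to authentication follows his comparison; the numbers are illustrative, not measurements.

```python
from typing import Dict, Tuple


def bottleneck(stages: Dict[str, int]) -> Tuple[str, int]:
    """Return the stage with the smallest capacity, plus that capacity."""
    name = min(stages, key=stages.get)
    return name, stages[name]


# Lane counts taken from the analogy; they stand in for network capacity.
restricted = {"highway": 3, "bridge (authentication)": 5}
reopened = {"highway": 10, "bridge (authentication)": 5}

print(bottleneck(restricted))  # the restricted highway limits traffic
print(bottleneck(reopened))    # the bridge becomes the limit
```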
Because of the “small bridge” and a steep resurgence of users once the network came back online, the problem persisted despite the move back to the refurbished 6880 switches. After the switchover, the network “grew to size too quickly,” prompting the 6880 to crash.
“So that’s when we made the call, we’ll bring in the new equipment,” Sonstein said.
The “new equipment” Sonstein was referring to was the core piece of a $20 million life-cycle upgrade — one that was planned well in advance.
“We had already made the purchase,” Sonstein said. “That was before my time, a year ago, in June 2017 they executed the contract. We’ve already started — we have a warehouse full of equipment that’s being programmed and configured to go into our network. We’re coming up with a network plan of over the next 18 months as we’re putting the new stuff in place.”
Sonstein said the 6880 switch would have stopped functioning within a month because of the level of use.
“I would say Purdue was getting ahead of that and planning appropriately, but it all just accelerated due to the failures,” said Chris Garrison, a region manager for Cisco. “So they’ve already done their due diligence in making sure they’ve already done (what needs to be done).”
An IT war room has also been set up in the Telecommunications Building since the start of the crash, staffed by a mix of full-time ITaP and Cisco network engineers who troubleshoot problems and build a plan for the future.
“(Developer tools) were also there to help troubleshoot our problems,” Sonstein said. “We reached out to all the resources available and we got flooded with assistance.”
Looking forward at the rising enrollment rates, Sonstein sees a fine balance between fiscal responsibility and the need to upgrade.
“I want to be just ahead of the student population,” Sonstein said. “Let’s say I could even buy the box that we would need 10 years from now; it would be astronomically expensive and would sit there — bored before we need it to (do its job).”
But despite the plan moving forward, many students have been affected by the outage in ways that go beyond grades.
“I ended up using the hot spot on my phone,” said Rose Dunbar, a freshman in the College of Health and Human Sciences.
“Me and my family — over four people — share 12 gigabytes,” said Dunbar. She also cited concerns over using up the rest of her mobile data during another Wi-Fi outage. While some Boilers were able to finish their homework, others simply gave up.
“I couldn’t do the CHM 112 homework that was due that night,” said Emi Chan, a freshman in the College of Agriculture. “Even though I could access it on my phone through data, I couldn’t answer it due to the way it was formatted on my phone.”
Homework points were not the only thing lost some nights.
“Unfortunately I’ve had to stop from using my phone to use streaming services while lying in bed,” said Jay Rixie, a freshman in the College of Science. “I’ve had to cope by going to bed later and just staying on my laptop for longer periods while hardwired into the Ethernet.”
Fortunately, some students were able to use their social organizations’ wireless networks to connect and complete their assignments during the outage.
“I would have to drive all the way to the fraternity house just to use their Wi-Fi,” said Nathan Cohen, a freshman in the College of Science. “I’ve taken the precaution of finishing homework early because I don’t trust PAL 3.0 anymore.”
Students on the Purdue subreddit on Reddit.com have also noticed the outage. Examples of post titles include “F--- PAL,” “Can PAL eat my a-- right now,” “PAL has cancer,” and “PAL’s s--- itself again.”
“I think right now PAL may be the worst I have seen,” said Reddit user “Emracruel” in a post. “I can’t even hope to defend it as I sit here watching it s--- in its pants and fall out of its hamster wheel.”
Despite the harshly negative feedback, the ITaP team sees it as useful.
“Complain,” Sonstein said. “What we see is the number of users connected to PAL. What I don’t see is that you waited for 15 minutes while your thing spun before it actually connected to PAL. ... The problem I have right now is that I don’t see the problem that students have. What I see are the Reddit comments. ... We had a Reddit feed because that’s — while I wish the students would use the ITaP feedback channel — at least I had some feedback that PAL was out, people are mad.”
But despite the anger and frustration from the student body, Sonstein wants students’ feedback going forward.
“The worst thing in the world for me right now is that my users say, ‘PAL is broken right now; why should I complain again because it’s just always broken?’” Sonstein said. “Which I think is a lot of the attitude. But I want your complaints. I want to know when you guys are having problems — it’s my job to provide you access, but I can’t do it on my own.”