The G&L service team ensures streaming quality around the clock
Streams on the internet must be reliably accessible – otherwise viewers quickly turn away. That's why we offer broadcasters, online editors and event streamers a 24/7 service backed by a service level agreement. But how exactly does it work?
Saturday night, 3 a.m., and it's raining outside. Service technician Stefan Herzogenrath's smartphone rings. The display shows a warning message: the live stream of a TV channel has gone down. Stefan reaches for his laptop, logs onto the station's server and restarts the encoder. A few seconds later, the problem is solved and the stream is running again. The customer is pleased that fast troubleshooting works at night and at weekends, too.
"Internet users rarely have much patience. If a stream breaks down or isn't delivered in optimal quality, they quickly switch to other providers," Stefan tells us. "So we pay no attention to the sandman and guarantee fast, reliable troubleshooting around the clock, seven days a week."
Regular training shortens downtime
To maximise stream availability and minimise downtime, technicians need to know the industry's state-of-the-art technology inside out. Our service staff, all of whom are trained IT specialists in system integration, therefore regularly take part in product training and further technical education. One example is Akamai University, which specialises in advanced training on web performance solutions, cloud security and media delivery.
"Thanks to this specialised training, our first-level support stays highly competent and can identify and fix errors quickly," Stefan emphasises. "Often our specialists already have an idea of where the fault lies right at the start of a conversation with a customer."
Touchstream simulates viewers and tests stream availability around the clock
The fact that we solve streaming problems within minutes is no coincidence. We've built a chain of monitoring, alerting and collaboration tools that speeds up our technicians' response. A key component is Touchstream, a tool that simulates viewers and continuously retrieves our customers' streams from different locations. For example, it notices when transmission quality drops for a specific Internet Service Provider (ISP). Customers are often not even aware of such failed retrievals. If they exceed a defined threshold, Touchstream reports the problem to checkmk, one of the most popular monitoring tools among system administrators.
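The threshold idea can be illustrated with a small sketch. Assume each simulated retrieval is recorded with its location, its ISP and whether it succeeded. A group is then flagged once its share of failures exceeds a configurable limit. The record type `ProbeResult`, the function `failing_isps` and the default threshold of 0.2 are hypothetical illustrations. They are not Touchstream's or checkmk's actual data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """One simulated viewer retrieval (hypothetical record format)."""
    location: str
    isp: str
    ok: bool


def failing_isps(results, threshold=0.2):
    """Return {isp: failure_rate} for ISPs whose failure rate exceeds threshold.

    The 0.2 default is an illustrative assumption, not a real product setting.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for r in results:
        totals[r.isp] += 1
        if not r.ok:
            failures[r.isp] += 1
    # Only ISPs strictly above the threshold are reported onward.
    return {
        isp: failures[isp] / totals[isp]
        for isp in totals
        if failures[isp] / totals[isp] > threshold
    }


results = [
    ProbeResult("Cologne", "ISP-A", True),
    ProbeResult("Berlin", "ISP-A", True),
    ProbeResult("Munich", "ISP-B", False),
    ProbeResult("Hamburg", "ISP-B", True),
]
print(failing_isps(results))  # {'ISP-B': 0.5}
```

Grouping by ISP mirrors the article's example. The same counting works for any other dimension, such as location.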
"We also display checkmk as a dashboard on the large video wall in our Cologne headquarters, so almost all services are visible at a glance," Stefan explains. "If there's a problem, checkmk generates a critical alert and informs a technician."
Toolchain alerts technicians via push alert, SMS and phone call
For our technicians to start work immediately, critical alerts must be delivered securely and quickly. In the past, checkmk informed employees by e-mail. Because these messages sometimes arrived late due to network quality, we introduced Pagerduty. The tool notifies first-level technicians not only by e-mail but also by app push notification, SMS and phone call. The technicians' shift schedules are stored in the tool, so whoever is on call is notified directly. If that technician does not acknowledge the alert, the system automatically alerts the next colleague.
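The escalation logic described above can be sketched in a few lines. The sketch looks up the shift that covers the current hour and then notifies that shift's technicians in order until one acknowledges. The `Shift` type, the `escalate` function and the `acknowledges` callback are hypothetical. The callback stands in for sending a push notification, SMS or call and waiting for a confirmation. None of this reflects Pagerduty's real API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Shift:
    start_hour: int       # inclusive, 0-23
    end_hour: int         # exclusive; a shift may wrap past midnight
    on_call: List[str]    # escalation order for this shift


def covers(shift: Shift, hour: int) -> bool:
    """True if the shift is active at the given hour."""
    if shift.start_hour <= shift.end_hour:
        return shift.start_hour <= hour < shift.end_hour
    # Overnight shift, e.g. 20:00 to 08:00.
    return hour >= shift.start_hour or hour < shift.end_hour


def escalate(shifts: List[Shift], hour: int,
             acknowledges: Callable[[str], bool]) -> Optional[str]:
    """Notify on-call technicians in order; return whoever acknowledges first.

    `acknowledges(name)` is a hypothetical stand-in for "notify and wait for
    confirmation within a timeout". Returns None if nobody acknowledges.
    """
    for shift in shifts:
        if covers(shift, hour):
            for tech in shift.on_call:
                if acknowledges(tech):
                    return tech
            return None
    return None


shifts = [
    Shift(8, 20, ["tech-1", "tech-2"]),
    Shift(20, 8, ["tech-3", "tech-4"]),
]
# At 3 a.m. the night shift applies; tech-3 does not respond, tech-4 does.
print(escalate(shifts, 3, lambda name: name == "tech-4"))  # tech-4
```

A real system would add timeouts, repeat rounds and logging. The core idea is an ordered list that is walked until someone confirms.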
"We can therefore adhere to strict service level agreements with defined response times of a few minutes," says Stefan. "For special events like a major football tournament or even the Olympics, Touchstream also reports outages directly to Pagerduty so we lose even less time."
Shorter fault clearance times: we act as middleman between CDN and customers
When technicians start troubleshooting, they usually face one of two situations. If Touchstream retrievals fail in all regions, the problem typically lies at the start of the distribution chain. Most often it is the encoder, which compresses audio and video signals and converts them into a format suitable for streaming. Our technicians then access the encoder directly and inform the customer via a ticket in Zendesk, our cloud-based customer support platform. If, on the other hand, retrievals are disrupted only in certain regions, the fault often lies with the Content Delivery Network (CDN), which ensures the fast delivery of large media files. In this case, the CDN provider and our customer need to coordinate quickly.
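The rule of thumb above can be written as a small decision function. It maps per-region retrieval status to the most likely fault location. The function name `likely_fault_location` and the labels it returns are hypothetical. The sketch encodes only the heuristic described in this article, not a complete diagnosis.

```python
def likely_fault_location(region_ok):
    """Map per-region retrieval status to the likely fault location.

    region_ok: dict of region name -> True if retrievals there succeed.
    Heuristic: failures everywhere point to the start of the chain
    (usually the encoder); failures in only some regions point to the CDN.
    """
    if not region_ok:
        raise ValueError("no regions probed")
    failing = [region for region, ok in region_ok.items() if not ok]
    if not failing:
        return "ok"
    if len(failing) == len(region_ok):
        return "encoder"
    return "cdn"


print(likely_fault_location({"EU": False, "US": False}))  # encoder
print(likely_fault_location({"EU": True, "US": False}))   # cdn
```

Keeping the rule this simple matches its purpose: it tells the technician where to look first, not what the final cause is.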
"We like to be the middleman in this case, filtering information, passing it between the parties and in many cases already answering questions ourselves," says Stefan. "This way, we can shorten fault clearance times even further and optimise streaming availability."