Betfair Down / Betfair site crash

aleksandar
Posts: 26
Joined: Fri Dec 04, 2009 10:50 pm

I can see two scenarios behind these recent outages. Both assume that in such a big company, with this kind of service as its core business, there has to be a decent number of technically competent staff. But technical management could be incompetent. So:

1. Technical management is competent. BF higher management pushed this "Ireland-Gibraltar whatever migration" through even though the deployment wasn't completely ready, possibly because of some business interest (I work in an IT company, and this kind of pressure on technical and R&D departments is pretty common). If this is the issue, it can be resolved in a fairly short time (let's say weeks).

2. Technical management is incompetent but on good terms with higher management. Higher management of course knows nothing about the technical quality of the service, and you can sell them stories and romances for years. Until water starts to come into the ship. (I use my own software for automated strategies, and I have considerable knowledge of the BF API, and that product is far from a decent API. They planned to improve it: to launch version 7, as I read on their so-called "developers forum" a year ago. But still nothing has happened. There are also no longer any people on that "forum" answering serious BF API questions and issues. This tells me something is wrong in the technical department.) And some day, higher management detects water in the ship, wakes up, and there is no more room for stories and romances. If this is the issue, it can't be resolved in a short time. You can't replace technical management so easily or so fast. This can last for months, even a year.

A third scenario, that everyone is incompetent and the ship finally sinks... I can't imagine that. Not yet. Maybe you guys from England can tell us something more about Betfair and the people working there. Some of you have probably met some of those guys. Peter, probably?

P.S. Excuse my English, hope you will not miss the point.
freddy
Posts: 1132
Joined: Sun Aug 01, 2010 8:22 pm

JUST POSTED ON THE BF FORUM

On Saturday March 12, from 14:05 through 20:01, Betfair's website failed. We know how frustrating this is for our customers and offer our sincere apologies to all affected.

We are now looking at ways in which we can make up Saturday’s events to you.

We are working as hard as possible to ensure Betfair offers as reliable a site as possible. In a normal week we make at least 15 changes to the Betfair website but we have resolved not to release any new products or features for the next seven days. This should give maximum stability throughout a busy week that includes the Cheltenham Festival, cricket World Cup and Champions League football.

Below is an explanation of what went wrong and what we have done to fix the issue.

When the website failed on Saturday, our first step was to disable Betfair for all our customers on the web, API and mobile services. Once we identified the actual problem, we determined that we needed our website "available" but with betting disallowed. We recovered the site internally around 18:00 and re-enabled betting as of 20:00 once we were certain it was stable.

Here is what actually happened:

After performing certain types of website changes, an issue developed that caused our servers to temporarily slow down, processing just one thing at a time (single threading) instead of thousands of user requests in parallel. This "single threading" behaviour was introduced some time ago to protect against occasional broken pages caused by serving content while it is changing. In tech speak, our servers weren't thread-safe on certain types of content changes.

This has been an operational concern for several weeks as our traffic has reached record volumes week after week. While we had several operational protections in place to limit these types of changes during peak load, we missed an important one. Every 15 minutes, an automated process was publishing exactly the type of content that triggers the issue described above. Yesterday we hit a tipping point as the web servers reached a point where it was taking longer than 15 minutes to complete their update - essentially rendering the servers unusable.

Then in an attempt to quickly shed load, we triggered a process to disable some of the computationally intensive features on the site. Unfortunately, the way this was done triggered a complete recompile of every page on our site, for every user, in every locale. Under our normal Saturday usage, recovery took several hours.

After spotting the pattern, we've recognised this has been going on with varying impact since February 8, 2011. During periods of increased user traffic, our customers would experience this issue in the form of slow navigation or a "sticky" user experience. Yesterday was simply a tipping point, made worse by our recovery attempt.

We've fixed this problem now. We've disabled the original automated job and rebuilt it to update content safely. We've tripled the capacity of our web server farm to spread our load even more thinly. We've fixed our process for disabling features so that we won't make things worse. We've updated our operational processes and introduced a whole new raft of monitoring to spot this type of issue. We've also isolated the underlying web server issue so that we can change our content at will without triggering the switch to single-threading.

We believe these changes will bring the stability we all desire and thank you for your continued custom.

Yours faithfully,

Niall Wass – Chief Marketing & Development Officer
Tony McAlister – Chief Technology Officer
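
As a rough illustration of the failure mode the statement describes, here is a minimal Python sketch. It is not Betfair's code; the names `fraction_serialised` and `SnapshotStore` are hypothetical, and the model is deliberately simplified. The first function shows why a 15-minute job becomes fatal once each update takes 15 minutes or longer. The class shows one common way to change content safely without serialising readers: build the new content off to the side and swap a single reference.

```python
import threading


def fraction_serialised(update_minutes: float, interval_minutes: float = 15.0) -> float:
    """Share of wall-clock time spent in single-threaded mode.

    Toy model: a publish job starts every `interval_minutes` and the server
    serialises requests for `update_minutes` each time. Once an update takes
    as long as the interval, the server never leaves single-threaded mode.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return min(1.0, max(0.0, update_minutes) / interval_minutes)


class SnapshotStore:
    """Hypothetical content store that stays readable during an update.

    Readers see either the old snapshot or the new one, never a
    half-updated mix, and they never wait for a writer.
    """

    def __init__(self, content: dict):
        self._snapshot = dict(content)
        self._publish_lock = threading.Lock()  # serialises writers only

    def read(self, key: str):
        # No lock: take the current snapshot reference and read from it.
        return self._snapshot.get(key)

    def publish(self, changes: dict) -> None:
        with self._publish_lock:
            new_snapshot = dict(self._snapshot)  # copy; the old one is untouched
            new_snapshot.update(changes)
            self._snapshot = new_snapshot  # publish with one reference rebind
```

In this toy model, `fraction_serialised(3)` means the servers are serialised about a fifth of the time, while any update of 15 minutes or more gives 1.0, which matches the "tipping point" in the statement. The snapshot approach trades some extra memory during a publish for readers that are never blocked.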
Dobbin
Posts: 222
Joined: Sun Nov 01, 2009 5:46 pm

And?

How does that get people their money back while Betfair still make commission?
oddstrader
Posts: 344
Joined: Fri Apr 16, 2010 4:55 pm

Well, at least they have been open and frank, which is a positive.
Dobbin
Posts: 222
Joined: Sun Nov 01, 2009 5:46 pm

I suppose so

Maybe if there were 2 of them they may have spotted it sooner :lol: :lol:

After spotting the pattern, we've recognised this has been going on with varying impact since February 8, 2011

Only took them 5 days to notice the fault
andyfuller
Posts: 4619
Joined: Wed Mar 25, 2009 12:23 pm

To give them credit, that is the best explanation I have ever seen from them. So, credit where it is due, they have improved in this area, but oh, they have so much further to go and a hell of a lot of bridges to rebuild.

They say they are looking into ways of making it up to us - what are these going to be? Not much probably as they couldn't even be arsed to send out an e-christmas card this year!
khfcfan
Posts: 4
Joined: Sat Feb 12, 2011 5:03 pm

Dobbin wrote:I suppose so

Maybe if there were 2 of them they may have spotted it sooner :lol: :lol:

After spotting the pattern, we've recognised this has been going on with varying impact since February 8, 2011

Only took them 5 days to notice the fault

More like four and a half weeks, not days :shock:
lewismbet
Posts: 55
Joined: Thu Jul 23, 2009 11:20 am

freddy wrote:JUST POSTED ON THE BF FORUM

After spotting the pattern, we've recognised this has been going on with varying impact since February 8, 2011. During periods of increased user traffic, our customers would experience this issue in the form of slow navigation or a "sticky" user experience. Yesterday was simply a tipping point, made worse by our recovery attempt.
I'd be interested to know whether this "sticky" user experience relates to the website interface alone or, more generally, to the API connection too. I have definitely been feeling a slower connection; however, there are any number of potential causes for that.
gutuami
Posts: 1858
Joined: Wed Apr 15, 2009 4:06 pm

Freddy missed a few words from that statement on the forum:
"An apology and explanation"
https://promotions.betfair.com/explanation
I got that link today when I logged into my BF account.
LeTiss
Posts: 5489
Joined: Fri May 08, 2009 6:04 pm

In all fairness to BF, they have given an explanation and some kind of apology.

Previously, they've created enemies within their own customer base by not giving explanations or apologies for site issues.

However, I'm still not confident about the future
;)
Innertube
Posts: 215
Joined: Mon Mar 14, 2011 9:18 am

The apology would be ok if this was the first time it had happened. So many problems seem to point to a bigger issue, maybe with the way the company is being run.
superfrank
Posts: 2762
Joined: Fri Aug 14, 2009 8:28 pm

A welcome apology, explanation and admission.

Let's hope that they've turned the corner and that things improve from here.
xpaul
Posts: 32
Joined: Sat Jan 30, 2010 12:47 pm

And that's why BF will remain a monopolist for a while. Everybody cries when the site is down and talks about switching to BD, but as soon as BF is up and running everybody is right back to using BF. And the cycle will repeat the next time BF is down.
I don't blame them for acting as they act.
A small apology and everybody is more than happy.


Paul
oddstrader
Posts: 344
Joined: Fri Apr 16, 2010 4:55 pm

Good point, Paul, and I'm guilty, as you say.
xpaul
Posts: 32
Joined: Sat Jan 30, 2010 12:47 pm

We're all guilty. It's just how the world works. ;)