• CallMeAnAI@lemmy.world
    4 days ago

    How many banks didn’t work? Which ones? You have a source? Visa and MC were good all day here in the real world on the East Coast.

    Sounds like you’re just trying to exaggerate around an edge case that frankly isn’t the end of the world even if it were common for 4 hours a year.

    Why aren’t you blaming the bank for not having redundancy outside a single DC? How many banks do you know of that were out there successfully using other providers that have a higher SLO/SLA?

    • AmbitiousProcess (they/them)@piefed.social
      4 days ago

      I can see why your account is marked with two red marks on PieFed for low reputation, because man do you come off confrontational.

      How many banks didn’t work? Which ones? You have a source?

      Search engines exist. Use them before acting as if I’m making shit up.

      The list of financial institutions that had issues, as far as I can tell from industry reporting and Downdetector graphs, is Navy Federal Credit Union (~15 million members), Truist (~15 million customers), Chime (~8-9 million customers), Venmo (~60 million users), Ally Bank (~10 million customers), and Lloyds Banking Group (~30 million customers).

      Assuming no overlap, that’s nearly 140 million people who lost banking and money transfer access.
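
      The "nearly 140 million" figure can be checked with a quick sum. This is only a sketch using the approximate numbers quoted above (in millions), with Chime's ~8-9 range kept as a low and a high value; it still assumes no overlap between institutions.

```python
# Approximate user counts in millions, as quoted in the comment above.
# Each entry is (low estimate, high estimate); only Chime was given as a range.
affected_millions = {
    "Navy Federal Credit Union": (15, 15),
    "Truist": (15, 15),
    "Chime": (8, 9),
    "Venmo": (60, 60),
    "Ally Bank": (10, 10),
    "Lloyds Banking Group": (30, 30),
}

# Sum the low and high ends separately; no overlap between customers is assumed.
low_total = sum(low for low, _ in affected_millions.values())
high_total = sum(high for _, high in affected_millions.values())

print(f"{low_total}-{high_total} million")  # prints "138-139 million"
```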

      Sounds like you’re just trying to exaggerate around an edge case that frankly isn’t the end of the world even if it were common for 4 hours a year

      The outage lasted for up to 15 hours in some cases, because many AWS services came back with a backlog to work through, which took many more hours. Many services also depend on AWS in a way where AWS coming back online doesn’t instantaneously restore service. These systems are complex, and not every company that relied on them could start back up the moment the main outage was resolved, let alone while many services were still marked as impacted for hours afterward as they worked through their backlog.

      Why aren’t you blaming the bank for not having redundancy outside a single DC? How many banks do you know of that were out there successfully using other providers that have a higher SLO/SLA?

      I also blame them for not having additional redundancy. I blame both them for not having a fallback, and AWS for allowing such a major outage to happen. Shockingly, more than one party can be at fault.

    • jj4211@lemmy.world
      4 days ago

      I’m also skeptical that any payment processing networks were impacted. I would be surprised by that, but less so if people couldn’t manage their accounts online, which might have a similar effect. I’m not surprised at all if grocery stores or restaurants were significantly impacted. I know a lot of the apps were broken, and I could imagine someone used to apping everything leaving their cards at home and being unable to get lunch. There might be some aggressively “modern” establishments that are kiosk-only, and I could imagine them getting downed by an AWS outage.

      outside a single DC?

      I’m told that a lot of the companies did all the right things but still got taken down because some dependent Amazon services are tethered to that single DC and only Amazon has the power to change that.

      • CallMeAnAI@lemmy.world
        4 days ago

        I’ll wait for the final root cause but…

        We mitigated most of it by swapping to secondary DNS and completely taking out anything related to AWS DNS and services in us-east-1. If you didn’t have secondary DNS and were heavily reliant on AWS internal DNS, this might be something you experienced.
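
        For anyone unfamiliar with the idea, here is a minimal sketch of ordered DNS fallback. It is not the setup described above: the resolver callables, the `resolve_with_fallback` name, and the example hostname and address are all hypothetical stand-ins, and a real deployment would more likely change which authoritative DNS provider serves the zone than do this in application code.

```python
from typing import Callable, Iterable, List

# Hypothetical type: a resolver takes a hostname and returns IP address strings,
# raising an exception when its DNS provider cannot be reached.
Resolver = Callable[[str], List[str]]


class AllResolversFailed(Exception):
    """Raised when every configured DNS provider failed to answer."""


def resolve_with_fallback(hostname: str, resolvers: Iterable[Resolver]) -> List[str]:
    """Try each resolver in order and return the first non-empty answer."""
    failures = 0
    for resolver in resolvers:
        try:
            addresses = resolver(hostname)
        except Exception:
            # Provider is down or timed out: move on to the next one.
            failures += 1
            continue
        if addresses:
            return addresses
        # An empty answer is treated as a failure as well.
        failures += 1
    raise AllResolversFailed(f"{hostname}: {failures} provider(s) failed")


def primary_down(hostname: str) -> List[str]:
    # Stand-in for a primary provider that is not answering.
    raise TimeoutError("primary DNS provider not answering")


def secondary_ok(hostname: str) -> List[str]:
    # Stand-in for a healthy secondary provider (documentation-range address).
    return ["192.0.2.10"]


if __name__ == "__main__":
    print(resolve_with_fallback("api.example.com", [primary_down, secondary_ok]))
```

        The point of the sketch is only the ordering: the secondary is consulted when the primary fails, so a single provider outage does not stop resolution on its own.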

        • jj4211@lemmy.world
          4 days ago

          I’m not familiar with AWS myself, but they seemed to be referencing something they vaguely characterized as ‘security infrastructure’, kind of as handwaving for why they thought it made sense for it to be a single point of failure, because enabling distribution of it would somehow be insecure…

          I frankly wasn’t interested in delving deeper, because that excuse sounds pretty stupid, but I’d be trying to get details I don’t personally need about something I probably shouldn’t be arguing about. I’ve gotten burned too much by someone championing something stupid ostensibly in the name of ‘security’ to try to sign up for another one of those arguments.