SmartLink service needs to be more robust

  • Problem
  • Updated 1 month ago
I can see that there are quite a number of posts about occasions when the SmartLink service has been down for remote operation.

If a service like SmartLink is needed for users to run remote, then one would expect a design with an uptime guarantee of around 99.99%.
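To put a number on that target, here is a minimal sketch of the downtime budget that a given uptime percentage implies. The function name and the one-year period are illustrative assumptions; nothing here comes from Flex.

```python
def allowed_downtime_minutes(availability_percent: float, period_hours: float) -> float:
    """Return the downtime budget in minutes for a given availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_hours * 60.0


HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year

for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target, HOURS_PER_YEAR)
    print(f"{target}% uptime allows about {minutes:.0f} minutes of downtime per year")
```

At 99.99% the budget is under an hour per year, so a single outage lasting a day would use up many years' worth of it.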

So for the moment I would say we have a bad design, and I see this as a showstopper for promoting FLEX remote usage, at least based on the SmartLink platform.

A more self-contained solution based on VPN is probably the better option until SmartLink gets more reliable.

What is your view and/or the official statement from Flex?

Thank you in advance / Tilman SM0JZT

Tilman D Thulesius

  • 5 Posts
  • 0 Reply Likes

Posted 1 month ago


Brian Morgan VK7RR

  • 55 Posts
  • 11 Reply Likes
I do not agree. There have been very few outages with Smartlink. I venture to suggest that we have more with our service provider. 

Paul

  • 453 Posts
  • 135 Reply Likes
I have a used 6500; the price I paid included a small element (nowhere near £200) to account for v2 being already installed. WAN remote is the only reason I was attracted to v2, so I would definitely not be a satisfied customer if I had paid full price for this. I therefore agree with Tilman, but it seems to me that the weak point (the authorisation) isn't within Flex's control. IMHO it was a bad decision to use an intermediary for this, and I can't see them back-tracking at this stage.

Footnote: I default to my SoftEther VPN as I find SmartLink has nothing extra to offer and the VPN is much more dependable.

ka7gzr

  • 215 Posts
  • 36 Reply Likes

I haven't experienced any downtime with SmartLink. I do a lot of remote access with SmartLink and have found it very reliable. The few times I have experienced issues, I found the cause was not SmartLink but my local service or the third-party service I was trying to use.
I don't know the details of your failures, but I would suspect other network issues rather than SmartLink.

Winston VK7WH

  • 248 Posts
  • 42 Reply Likes
I blame the Azure service, which I think is run by Microsoft. Surely we should expect better reliability than this from Microsoft?

Paul

  • 453 Posts
  • 135 Reply Likes
My problems were nothing to do with my ISP or my LAN. In each case, log-in was made impossible by outages at Auth0, whom Flex has entrusted with the security aspects of SmartLink. In fairness, these outages have been infrequent (but prolonged), which is very frustrating when they coincide with times you need to operate remote. The latest was reported yesterday in another thread:

https://community.flexradio.com/flexr...

I'm not sure if it's fixed yet as the thread was closed abruptly.

JohnSweeney

  • 47 Posts
  • 2 Reply Likes
With an ever-increasing number of SmartLink users, Flex should publish a document that clearly states their uptime objectives and how they will deal with growth. As a small company, they are completely dependent on third-party suppliers. I would suggest Flex provide full support for VPN protocols like OpenVPN, which is very popular and has phone/tablet apps as well. Currently Flex radios are not compatible with OpenVPN, so VPN users must resort to SoftEther, which is much less user-friendly than OpenVPN. Old protocols like PPTP are no longer supported by Apple, for example.

Tilman D Thulesius

  • 5 Posts
  • 0 Reply Likes
Right. This is what I think we should be asking for: a statement on what can be expected from the service.

It is fine that it seems to work most of the time. But what is needed is an official statement on quality and process.

Again, I would also like to have an OpenVPN alternative, which would give a self-contained solution.

Michael Coslo

  • 818 Posts
  • 195 Reply Likes
The idea is very simple. The Flex service will be as reliable as the Microsoft Service. I take it that you all have expressed your dissatisfaction with Microsoft?

Is Flex responsible if the remote site has a power outage?

Kari Gustafsson SM0HRP, Elmer

  • 232 Posts
  • 20 Reply Likes
I do not agree with your statement that SmartLink is not robust enough. I have been using it extensively since its release over a year ago, including in 48-hour contests, and I have never experienced any downtime. Never. And I am quite demanding with respect to my contest needs.

As I provide quite a bit of support for Flex users, I must say that many times "downtime" issues with SmartLink are related to the router, PC, broadband network access or other non-Flex hardware problems. Sometimes the issues concern the registration of SmartLink itself. But this has nothing to do with robustness.

Of course, VPN is an alternative, but I do not see it as a simple, user-friendly alternative. I have used SoftEther myself and it works fine. But being able to work easily from several platforms (PC, smartphone, iOS) makes SmartLink the obvious choice for me, even though I consider myself quite technically oriented in this respect.

Laurens PD9X

  • 15 Posts
  • 0 Reply Likes

What is currently the status? Are connections already possible with Smartlink?

I am waiting to register my newly purchased 6400 for remote use with the Maestro. Since yesterday evening at 23:00 LT it has not been possible to reach the server, or even to log in with my SmartLink account.

Rick Wykoff

  • 8 Posts
  • 1 Reply Like
Still not up as of this time.  11:00 UTC   9/5/18   Flex is aware of the problem and hopefully it will be resolved very soon.

Michael Walker, Employee

  • 290 Posts
  • 77 Reply Likes
Official Response
Our SmartLink service relies on Microsoft Azure to manage the authentication process. Like you, we are a customer waiting for our services to be made available.

This is the update from Microsoft as of 15 minutes ago.  And, from my tests,  it looks like things are starting to come up, but it might be a few more hours for them to be at 100%.  Lots of boxes to restart here.

I just tested my iPhone on SmartSDR and it came right up.

PRELIMINARY ROOT CAUSE: A severe weather event, including lightning strikes, occurred near one of the South Central US datacenters. This resulted in a power voltage increase that impacted cooling systems. Automated datacenter procedures to ensure data and hardware integrity went into effect and critical hardware entered a structured power down process.
 
ENGINEERING STATUS: Engineers have restored access to storage resources for the majority of services, and customers should be seeing signs of recovery.  Engineers are continuing to work on any residual storage impact to fully mitigate this issue. The current mitigation workflow is outlined below:
 
1) Restore power to the South Central US datacenter (COMPLETED)
2) Recover software load balancers for Azure Storage scale units in South Central US (COMPLETED)
3) Recover impacted Azure Storage scale units in South Central US. (Mostly complete)
4) Recover the remaining Storage-dependent services in South Central US (Mostly complete)

Rick Wykoff

  • 8 Posts
  • 1 Reply Like
Thanks for the great explanation Michael. Can't prevent mother nature from doing her thing. You are always right on top of these issues!   Best 73 de N4WRW.   Rick

Varistor

  • 334 Posts
  • 73 Reply Likes
Azure has availability zones all over the world across all human inhabited continents. By having SmartLink run in more than one availability zone it can be 100% bulletproof. So the real issue here is if SL is designed to be highly available.
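As a rough illustration of the redundancy argument, the sketch below estimates the combined availability of several redundant deployments. It assumes failures are independent, which real outages often are not, and the 99.9% per-region figure is a made-up example rather than a published Azure or SmartLink number.

```python
def combined_availability(single_region: float, regions: int) -> float:
    """Availability of N redundant regions, assuming independent failures.

    The service is down only when every region is down at the same time.
    """
    return 1.0 - (1.0 - single_region) ** regions


ASSUMED_REGION_AVAILABILITY = 0.999  # hypothetical figure for illustration

for n in (1, 2, 3):
    value = combined_availability(ASSUMED_REGION_AVAILABILITY, n)
    print(f"{n} region(s): {value:.9f}")
```

Under these assumptions each extra region shrinks the expected downtime sharply, although the result approaches 100% without ever reaching it.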

Michael Coslo

  • 815 Posts
  • 195 Reply Likes
That is a Microsoft issue also, and people would then demand that Flex cure their horrible latency issue. The entire concept of radio by internet is going to expose radio users to the internet and its issues. DDoS attacks and local issues like the number of users simply will affect the service as we add layers of complexity to our hamming.

Burch - K4QXX

  • 370 Posts
  • 60 Reply Likes
I work in IT and several of the cloud services that we use were down yesterday.  It wasn't just smartlink.  Many Microsoft 365 services were down yesterday.  Sometimes stuff happens....

Burch-K4QXX

Jim Gilliam

  • 851 Posts
  • 175 Reply Likes

Ever played with the Icom 7610? Although it doesn't compare to the Flex, it has a stand-alone server. It never fails unless the radio or the LAN/WAN fails. Perhaps it is possible to change the design philosophy with the same strategy as Icom? I never use Smartlink as the Asus OpenVPN is far superior. It is a kick in the head to use FT8 remotely.

Bill -VA3WTB

  • 2887 Posts
  • 624 Reply Likes
Flex is a server.

Michael Coslo

  • 823 Posts
  • 198 Reply Likes
And if you are willing to put up with a lesser radio, get a 7610. I shook our new one down last week, and it is meh. Reinforced my decision to go with Flex. And FWIW, I have two Icoms in the shack.

Mike - VE3CKO, Elmer

  • 351 Posts
  • 133 Reply Likes
Flex is just one user of that particular data center and represents a very small percentage of users in the overall picture. If a company like Microsoft is vulnerable with all their billion-dollar resources, it is very unrealistic for ham radio users to demand more than what Flex is using. This is a hobby.
I'm sure appreciative that they decided to farm out SmartLink servers with a large data center as compared to maintaining their own server in a bathroom closet.


Paul

  • 453 Posts
  • 135 Reply Likes
Yes Mike it is just a hobby. Which is why I believe it would have been better to have simply gone for a direct connection between the server and client with no middle man. There's no need for additional  security for this amateur application, a VPN provides more than enough. Like any system, the fewer components between A & B, the more reliable it will be. IMHO Flex made the wrong decision when they opted for the current system.
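The "fewer components" point can be sketched numerically: when every component in a path must work, the availability of the path is the product of the component availabilities. The figures below are hypothetical and assume independent failures; they are not measurements of any ISP, VPN or authentication provider.

```python
from math import prod


def chain_availability(components: list[float]) -> float:
    """Availability of a path in which every component must be working."""
    return prod(components)


# Hypothetical per-component availabilities, for illustration only.
direct_path = [0.999, 0.999]           # e.g. home ISP, shack router
brokered_path = [0.999, 0.999, 0.999]  # same path plus an external auth service

print(f"direct:   {chain_availability(direct_path):.6f}")
print(f"brokered: {chain_availability(brokered_path):.6f}")
```

Each added dependency can only lower the product, so the comparison favours the shorter path unless the added component brings redundancy of its own.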

Varistor

  • 334 Posts
  • 73 Reply Likes
SmartLink is the cornerstone of the argument “Why Flex”. As such, it would make sense to design SL with far better resiliency. A single lost 6400 sale pays several times the monthly fee associated with running workloads in multiple availability zones.

Building in Azure is not the same as running a server in Microsoft’s data center. You can do that, but you are not benefiting at all from the full Azure capabilities.

I am currently working with a client that runs an app that is existential to the lives of millions of people. If the system fails people would die. The system runs in each US region as well as in Brazil and the UK. The incremental cost of running across the globe is roughly $5,000/month.

In the test environment, we run a script that randomly fails any number of components and we observe the impact. Happy to say that for the past 6 months the system has been up and running no matter what, including in the past 24 hours.
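For readers unfamiliar with this kind of test, here is a minimal sketch of random failure injection against a toy model. It is not the script described above: the two-region topology, the component names and the rule that a region needs all of its components are assumptions made purely for illustration.

```python
import random

# Hypothetical topology: a region serves traffic only if all its components are healthy.
REGIONS = {
    "region-a": ["auth", "storage", "relay"],
    "region-b": ["auth", "storage", "relay"],
}


def service_is_up(failed: set[tuple[str, str]]) -> bool:
    """The service is up if at least one region has no failed component."""
    return any(
        all((region, comp) not in failed for comp in comps)
        for region, comps in REGIONS.items()
    )


def run_trials(trials: int, max_failures: int, seed: int = 0) -> int:
    """Fail up to max_failures random components per trial; return the outage count."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    all_components = [(r, c) for r, comps in REGIONS.items() for c in comps]
    outages = 0
    for _ in range(trials):
        count = rng.randint(0, max_failures)
        failed = set(rng.sample(all_components, count))
        if not service_is_up(failed):
            outages += 1
    return outages


print("outages with at most 1 failure per trial:", run_trials(1000, 1))
print("outages with up to 3 failures per trial:", run_trials(1000, 3))
```

In this model a single failure can never cause an outage, because the other region remains intact; outages appear only once failures can hit both regions in the same trial.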

Here’s the current state of Azure, which shows that you can get away with just two regions if you want to have a resilient app:

https://azure.microsoft.com/en-us/sta...

The point is that there are viable options. What FRS decides to do is their own business decision, making a tradeoff between protecting the crown jewels (SL) and the incremental cost.

JohnSweeney

  • 47 Posts
  • 2 Reply Likes
Very well said

Mike - VE3CKO, Elmer

  • 351 Posts
  • 133 Reply Likes
Surely this was on the drawing board during the brainstorming sessions, but there were no doubt too many cons versus pros. To revert to what legacy radios are doing would be taking the flexibility out of FlexRadio. The pros of a server outweigh the cons. What you're suggesting would require much more of the radio's resources at the expense of performance and future features. One example is how to manage which radio is authorized to run which level of software. Another example is multi-client. It just isn't going to happen without external authentication.
Going out via the internet to a third-party multi-level server for authentication was the way to go forward. The downside: if a zombie apocalypse should happen and the internet is down, remote authentication would eventually be non-existent. I suspect one would have other priorities during an apocalypse besides DXing and contesting remotely.





Ken - NM9P, Elmer

  • 4007 Posts
  • 1233 Reply Likes
The option of using an individual VPN connection, with VPN software at the shack router or SoftEther running on the shack computer, was available long before SmartLink was introduced. Many of us successfully ran SSDR remotely on laptops, desktops, or iPads/iPhones before SmartLink was released.

It still remains a fallback option when SmartLink disruptions occur. I still leave mine active for cases just like this.

SmartLink's advantage is that it 1) provides an easy networking solution for the vast majority of hams who may not be proficient enough with networking to implement their own VPN, 2) provides an easy, secure authentication system to protect users from potential hacking, and 3) eliminates the requirement for VPN software to be running on a shack computer or router, and requires no additional VPN software on the Maestro, remote computer, or iOS device.

If users desire a direct connection that doesn't involve the authentication system SL provides, they are welcome to establish their own VPN connection with all the required port forwarding, etc.  It isn't that difficult, just a bit tedious to do the first time.

Ken - NM9P

Dave Spencer

  • 8 Posts
  • 0 Reply Likes
Thanks for the update Michael.
As a power engineer in a datacentre, I'm fully aware of the nightmare Microsoft must be having over this outage. It's nowhere near as simple as flicking a switch back on, or simply rebooting a few servers or the odd router. They have to be brought back up "gracefully" and likely in a particular order. And that is assuming the software hasn't been corrupted and doesn't have to be reloaded from backup, the PSUs haven't blown and don't need to be replaced, blah blah.......

Let's cut Flex some slack here. It's a Microsoft issue that's affecting a lot of customers. It's also likely that there'll be some business giants hosted there (far far bigger than Flex) and they'll be first in line to get their pound of flesh........

Bill -VA3WTB

  • 2895 Posts
  • 628 Reply Likes
But there will always be people who think you can just blow the whole system up and it should work..lol

David Holmgren

  • 42 Posts
  • 11 Reply Likes
Well all I can say about this is we're a hobby.  I would never expect 99.99.  Oh well if it's broke they'll fix it.  In the meantime call your buddies on the local simplex frequency if you need a ham fix.

Larry Williamson

  • 96 Posts
  • 10 Reply Likes
I totally agree-----this is a hobby guys! Cheyenne MT. (Norad bunker) does not require 100%! We are acting like it's a life or death situation here with our little hobby radios.

Bill -VA3WTB

  • 2883 Posts
  • 620 Reply Likes
But this is an opportunity for the EXPERTS to say Flex did it all wrong.

Paul

  • 448 Posts
  • 129 Reply Likes
I'm no expert Bill but I do consider that customer-opinion is valid whether I agree with it or not. Where better to air those opinions than here?

Michael Coslo

  • 822 Posts
  • 198 Reply Likes
Amen, Bill.  I am a little surprised that so many people have the internet stacked on top of their radio game, and expect no problems ever. That is simply not realistic. 

Things like losing the cloud service, DDOS attacks, Internet outages and slowdowns are 100 percent certain to happen. I have a fast connection. But at certain times it isn't for one reason or another. I've lost it a few times and had to reset and reboot the router. All things that would take the Flex remote or Icom's remote service down. 

We're smart people. If we incorporate an outside service like the Internet into our radio use, we should learn the limitations of the internet just like we should know how to operate our radios. 

David Holmgren

  • 42 Posts
  • 11 Reply Likes
How 'bout this one we use as amateurs to pitch the hobby: "When all else fails (ie. The Internet), there's always radio"   :< )

Laurens PD9X

  • 15 Posts
  • 0 Reply Likes
A happy user here now.

Steve K9ZW, Elmer

  • 1278 Posts
  • 656 Reply Likes

If "the net" is broke, it's broke.

Affected a lot more than SmartLink.  Microsoft lost parts of their Office 365 and other cloud services for some clients. 

As a reminder it is possible to run your shack remotely WITHOUT SmartLink.  Many did this with various VPN setups long before SmartLink.

It is of course possible to break those too - whether the individual VPN or overall connectivity.

Even with SmartLink mostly being up I've kept my Raspberry-Pi based VPN system on the ready. 

This may be a useful idea for those of us who really would be put out by other outages.   

Additionally redundant options might appeal (I know I like to have options.)

While we think of redundancy, it is a good time to think of what happens when the mains power goes down, and if we have enough "shack spares" on hand to bridge a critical failure?

I know I have been a bit lazy on sorting some of these items out. 

While a person may not need a hot-swappable shack with full backups, it might be a good idea to have a "Plan B" configuration if items go down or need to be taken offline for service.

Great thoughts Tilman SM0JZT and everyone else.  I know you made me think about how robust I have my end setup and what my fallback alternatives are.

73

Steve
K9ZW



Ken, K2KXK

  • 20 Posts
  • 6 Reply Likes
When I first experienced the outage and thought about it, I thought, well, no big deal; like some have commented, this is a hobby and a little potential inconvenience is nothing important. However, after reflecting a little more, I have slightly changed my views. Although ham radio is primarily a hobby and not a necessity, there are times when it is critical for the community. My club, for example, has a very active RACES group and is a significant component of the county emergency services plan. It occurred to me that remote access to Flex equipment could be very important in an emergency. It is something that communities around the world depend on. Just a thought that Flex may want to factor into their approach for remote access and support.

Michael Walker, Employee

  • 290 Posts
  • 77 Reply Likes
Thanks guys for all your opinions.  I know your thoughts will be reviewed.

One last thing: if your remote operation is mission critical for whatever reason, you do need to ensure that you personally limit your single points of failure. Every remote operator goes through this. In this case, having a SoftEther VPN in place might not be a bad Plan B.

Again, thanks for all your comments.

Mike 

This conversation is no longer open for comments or replies.