2.x upgrade has been a nightmare. Several problems but this is a show stopper.

  • 4
  • Question
  • Updated 2 years ago
  • Answered
I am getting the square of death in the waterfall when I access SmartSDR or the iOS app via the 2.x SmartLink.  The iOS app via SmartLink is useless unless I enable a separate VPN to the network that the radio resides on.

Everything works fine if I use the VPN.

According to the Facebook experts, the radio is having fragmentation problems, and that is why things are broken only in SmartLink mode. OK, I was remote last week and decided to wait until I was physically at the radio to test.

I ran a Wireshark/tcpdump capture and can see the fragmentation.

Here is the problem. I have jumbo frames enabled and the MTU set to 9000 along the entire path to the cable modem (I have a 1 gig symmetric connection).

According to my switch, the radio is connecting at 1000 full duplex. It is impossible to hard-set the network settings on the radio, so it is impossible to troubleshoot.

The only advice I have seen on this is that people changed a setting and things started working?

I do not run consumer gear. Any ideas on where to go?

1. I have jumbo frames enabled along the entire path.
2. MTU is set to 9000 along the entire path except the radio, which I can not verify.
3. The radio ethernet is connecting at 1000 full duplex.
4. I ran a sniffer via a spanned port on the local switch and see fragmentation on the local port. This means the radio is probably using a low MTU for some odd reason.
5. I have 1 gig symmetric with static IP space.
6. I am running a complete Ubiquiti stack; all of the firmware is at the current version.
7. Other UDP fragmentation-sensitive protocols like VoIP and L2TP work fine.
8. Everything works fine when accessing the radio via L2TP, OpenVPN, and SoftEther VPNs.
9. The radio is installed on a secure IoT LAN segment because the OS is not accessible, so its security level can not be verified.
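
For anyone repeating the capture in item 4, here is a minimal Python sketch of what "seeing fragmentation" means at the byte level. It decodes the flags and fragment offset fields of a raw IPv4 header, which are the same fields a sniffer displays. The function name and the sample header bytes are made up for illustration and are not taken from David's capture.

```python
import struct

def ipv4_fragment_info(ip_header: bytes) -> dict:
    """Decode the fragmentation fields of a raw IPv4 header (at least 20 bytes)."""
    if len(ip_header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    if ip_header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    # Bytes 6-7 hold 3 flag bits followed by a 13-bit fragment offset.
    (flags_and_offset,) = struct.unpack("!H", ip_header[6:8])
    dont_fragment = bool(flags_and_offset & 0x4000)
    more_fragments = bool(flags_and_offset & 0x2000)
    offset_bytes = (flags_and_offset & 0x1FFF) * 8  # offset is stored in 8-byte units
    return {
        "dont_fragment": dont_fragment,
        "more_fragments": more_fragments,
        "offset_bytes": offset_bytes,
        # A packet belongs to a fragmented datagram if MF is set or the offset is non-zero.
        "is_fragment": more_fragments or offset_bytes > 0,
    }

# Hypothetical header: first fragment of a larger UDP datagram (MF set, offset 0).
sample = bytes.fromhex("450005dc1c4620004011" "0000" "c0a80102" "c0a80101")
print(ipv4_fragment_info(sample))
```

A first fragment has the "more fragments" bit set and offset 0; the last fragment has the bit clear and a non-zero offset.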


Ideas?

This particular 2.x problem is a showstopper since I can have the same working functionality with a 1.x firmware.  I am also having the audio drop problem when running digital modes, but that has a workaround.

David H Hickman

  • 48 Posts
  • 4 Reply Likes
  • annoyed

Posted 2 years ago

  • 4

Mike va3mw

  • 824 Posts
  • 199 Reply Likes
David

I only know enough about jumbo frames to be dangerous, other than that everyone has to play nice in both directions through the entire chain.

If you have jumbo frames turned off totally, do you have the same problem?

I also had your problem, and I was using pfSense as my router at the radio end.  At some point, I started to see the same issue.  

As a test, I swapped out the pfSense router and replaced it with an IQrouter.  That was 3 weeks ago and it has been flawless since.  

I suspect it was due to traffic shaping on the pfSense router that caused it.  Others (like Ria) are running pfSense without issue.  For me, my upstream bandwidth was limited at 1mb/sec as that is all I could get from the ISP.

My  2 cents (and certainly not the solution).  

mike va3mw

(speaking as a user, not an employee)

David H Hickman

  • 48 Posts
  • 4 Reply Likes
I can not turn off jumbo frames since it would then be pointless to run gigabit. It would kill my network performance. Everything works fine if I access that network via external VPNs, or if I plug the radio into the local LAN segment. The problem is the new SmartLink feature.

I can not swap out the edge router since I have a dual WAN with a redundant LTE connection.  The only routers on the market that support this are Ubiquiti EdgeRouters and Mushroom Networks.  I have the FlexRadio hard-set with affinity to one side of the dual WAN only. For giggles I hooked up a consumer Asus router directly to a cable modem with the radio directly connected, and the problem was still there.

I am not running any kind of QoS. The radio has access to 1 gig up and down. The rest of the network has 2 gig up and down with an LTE backup.

David H Hickman

  • 48 Posts
  • 4 Reply Likes
That IQrouter looks interesting. I wish vendors would stop putting wifi as a package and offer traditional routers.

I took the test and I got an expected A+ 1890 Mb down and 1921 Mb up.

Mike va3mw

  • 824 Posts
  • 199 Reply Likes
For consumers, the IQrouter works better than most.  I have installed about 20 of them for home consumers and all have been very pleased.  Up here in the Great White North, having what you have for speeds would be considered royalty.

Normal for us is 15M down and 1 or 2 up.  Seriously.

Chris Tate - N6WM, Elmer

  • 975 Posts
  • 272 Reply Likes
I also have had great success with the IQ router.  It solved many of my issues on Uverse.

David H Hickman

  • 48 Posts
  • 4 Reply Likes
Just updated to .19.  problem still there.

David H Hickman

  • 48 Posts
  • 4 Reply Likes
I understand about the internet issue. The reason I live in Norman, Oklahoma is because it is about as rural as you can get in the US and still have a serious internet connection.

Since it is business grade with an SLA, I pay for it too.

Dieter

  • 5 Posts
  • 0 Reply Likes
To encourage you: in my case setting up smartlink and having the first remote qso with my hand held took me about 4 hours with the ios version and another 2 hours to add ssdr for windows 10. almost everything went smoothly as expected.
73s, DJ1YG

David H Hickman

  • 48 Posts
  • 4 Reply Likes
New update: I ran an ethernet cable from my shack to my neighbor's DSL modem. SAME EXACT PROBLEM. I could use SmartSDR on the local network, but SmartLink was having the same issue. His connection is DSL with an Asus router. I was able to connect to my work UDP VPN, and connect to my Asterisk PBX via SIP.


I am starting to think that there is something wrong with the radio.

Bill -VA3WTB

  • 3975 Posts
  • 967 Reply Likes
If the radio connects and works, it does not sound like a radio problem.
At this point, why not start a help ticket and have Flex work with you?

Michael Aust

  • 135 Posts
  • 32 Reply Likes
Sounds like a Router Issue, Port forwarding, UPnP Enable, IP  address issue LAN versus WAN, etc 
73 Mike

David H Hickman

  • 48 Posts
  • 4 Reply Likes
Tried three different routers and two different firewalls.

Three different ISPs, three different ISP technologies. The only thing consistent is the radio.

Harold Rosee

  • 151 Posts
  • 33 Reply Likes
David,

If I was troubleshooting this I would take a step back and downgrade to the previous version you were on and make sure you can get back to where you were on V1 when it was all working. Once all is working as it was I would then perform the upgrade to V2 again.

If you want you could even go to the v1.11 new release first and see if all is still working using your VPN and then go to V2.

Sometimes things get so screwed up when you do things you just need to start over. Happens to me all the time and this is the approach I take. Not specific to just this problem you have but to any problems I have. I always try to get back to where I was and try again. Sometimes solves the problem and sometimes doesn't.

Think about it. I think it's worth a try.

Harold

David H Hickman

  • 48 Posts
  • 4 Reply Likes
I attempted the downgrade this weekend and the radio bricked.  I am now on a three month deployment ( gov contractor) and will not be able to deal with it again until January. The radio light is blinking purple when power is applied. 

Mike va3mw

  • 824 Posts
  • 199 Reply Likes
All

David has done a great job of providing some data that may help to isolate this.  

However (speaking on my past IT career--not on behalf of FRS or our wonderful engineering team), this may or may not be a FlexRadio issue.  

You may find it a surprise that there is no standard when communicating over the internet.  Just a pile of comprises.  :)  

The black box of death is just a symptom of something else and engineering may be able to figure it out once they know what is causing it.  Right now, it is not that simple.  

For most (not all), SL works.  All of us have little control on the pipeline that connects one end to the other and there are a LOT of players in between.  Wifi radios, routers, switches, etc.  It only takes one not to play nice before we notice it.

With other applications like streaming TV, it may already be happening, but with those you can have a 5% packet loss and not even notice.

my 2 cents as someone who has had to deal with moving data in a hurry from my previous career.

Mike

Kevin

  • 931 Posts
  • 271 Reply Likes
This pile of compromises (you meant compromises, right?) is the same pile of compromises that runs businesses and governments 7x24 globally doing everything from deep sea exploration to deep space exploration and all possibilities in between. It transports unimaginable amounts of data per day consisting of everything from batch transfers to realtime simulations and wartime command and control. It is so basic to our infrastructure that household appliances complain if they don't have Internet access.

But the FlexRadio Systems panadapter finally brought the Internet to its knees.

Really?

Here's what I gather from all I've heard over the last couple of years. It is just my opinion of course. I'm not an employee of FRS so I'm definitely not talking for them:

1) we don't know what's wrong
2) if we don't know what's wrong it can't be wrong
3) even if we could fix it we're too busy fixing other things
4) we fix things you didn't know were broken
5) one day the Internet will catch up to us
6) it is not our fault you don't have 1 Gbps faster than the speed of light Internet access with zero drops and jumbo packets (really?)
7) it's your DPC, not mine
8) if it is bad, it is not a bug, it's a feature
9) if it is good, it's not a feature, it's a bug
10) you're using the wrong filter
11) you're using the wrong switch
12) you're using the wrong router
13) you're using the wrong cable
14) 3 wire fans are good for you
15) if it is good for a contester it is good enough for you
16) CW problems? you probably can't send good CW anyway
17) that never happened to me
18) show your support, send money
19) no sidetone on your PC? feature
20) restart DAX every 24 hours? feature
21) only half a slider working? feature
22) but we have pop-outs! bug
23) you don't have a need to know
24) you can't be trusted with bug lists
25) roadmaps? we don't need no roadmaps.
26) we're secure
26) acknowledged

Did I miss anything? You bet.

"You may find it a surprise that there is no standard when communicating over the internet.  Just a pile of comprises.  :) "

Really?

73,
Kev K4VD

David H Hickman

  • 48 Posts
  • 4 Reply Likes
FRS is the Apple of ham radio gear. It is the coolest stuff around, as long as everything works. If there is a problem, it is obviously the user's problem.

Like Apple, they keep everything closed so it can not easily be troubleshot.

Like Apple, they say they are open since they publish an API. The problem is that the API does not cover a lot of things.

Like Apple, they lock down weird things like network settings.

Unlike Apple, they charge for software upgrades.

My radio was perfectly fine on 1.x.  Now it has usability problems after the $200 upgrade.  Of course I could downgrade, but then I am still out the money.

I am giving it until the new Icom is out.

I am seriously thinking of dumping this thing while it still has value and buying the new Icom when it comes out.  Will it be a better radio? Probably not, but one thing is for sure: 25 years from now, as long as the electronics are working, the radio will still be as usable as when it came out. A 6300 will be a boat anchor unless you are willing to run a 25-year-old version of Windows in a VM or on original hardware.

I work in IT and to be honest, I really do not want to mess with IT in order to keep my radio running.

Clay N9IO

  • 631 Posts
  • 169 Reply Likes
David,
Please just open a helpdesk ticket.  Two heads.

Mike - VE3CKO, Elmer

  • 543 Posts
  • 280 Reply Likes
Gee Kevin, that's a whole laundry list of negativity. Can't imagine the time you spent compiling that list. Anyone who has worked in the internet industry knows there was, there is and there will be tons of compromises. There are no standard requirements by law of what equipment to use for every single ISP out there. Not every ISP spends the resources needed for an ideal network. Some have very good system administrators but then there are non qualified people who are administrating when they really shouldn't be. ISPs oversell their bandwidth, ports are blocked, bandwidth is shaped, compressed, limited at various times of day for various reasons. Deals are made between ISPs that compromise. Bandwidth demand is ever increasing and will continue.

Like the roads, bridges, highways and off ramps of the transportation infrastructure, blaming FlexRadio is like blaming the car rather than a poorly designed transportation system. You don't have to be a mechanic to see a bottleneck up the road; hell, they have apps for that now. However, most people surfing and streaming through buffered technology do not see the problems the internet has. That doesn't mean they are not there. This radio application will immediately reveal any lag or compromise of bandwidth integrity, which shows up as panadapter and connection issues.

I for one am appreciative of VA3MW Michael Walker's IT experience and what he brings to the table to educate and assist those who don't know what really goes on behind the curtain.

A suggestion, maybe developers can dumb it up a bit and spit out a popup window when connection is compromised to the point of disconnection. Create an algorithm that counts how many interruptions and the length of each then or whatever, if it gets to predetermined points display messages like, "Hey, man, sorry to interrupt your QSO but it appears your internet connection somewhere is having some issues".
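
The counting idea in that suggestion can be sketched in a few lines of Python. This is only an illustration of the algorithm: the class name and every threshold below are assumptions, not anything SmartSDR actually implements.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class InterruptionMonitor:
    """Track gaps between received packets and flag a degraded link."""
    gap_threshold_s: float = 0.5   # assumed: a longer gap counts as an interruption
    max_interruptions: int = 3     # assumed: warn once this many have been seen
    window_s: float = 60.0         # assumed: only count interruptions this recent
    last_rx: Optional[float] = None
    events: List[Tuple[float, float]] = field(default_factory=list)  # (time, gap length)

    def packet_received(self, now: float) -> Optional[str]:
        """Record a packet arrival; return a warning once the link looks unhealthy."""
        if self.last_rx is not None:
            gap = now - self.last_rx
            if gap > self.gap_threshold_s:
                self.events.append((now, gap))
        self.last_rx = now
        # Forget interruptions that have fallen out of the sliding window.
        self.events = [(t, g) for (t, g) in self.events if now - t <= self.window_s]
        if len(self.events) >= self.max_interruptions:
            longest = max(g for _, g in self.events)
            return (f"{len(self.events)} interruptions in the last "
                    f"{self.window_s:.0f} s (longest {longest:.1f} s); "
                    "your Internet connection may be having issues")
        return None
```

The client would call `packet_received()` with a timestamp for every display packet and show the returned message, if any, in a popup.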
(Edited)

Kevin

  • 931 Posts
  • 271 Reply Likes
Hi Mike:

That's a fairly typical response here in the community. Basically a few paragraphs of why I'm wrong and white knight treatment for FRS. Not a single mention of any issue on the list.

It took me moments to come up with that list. A few more moments and I could probably double its size. I read almost every post in this community and try to understand problems from the other guy's perspective. Not just mine. So when someone posts a problem they are having and the next five responses are "works for me, must be you" for some reason it makes me want to jump in to the conversation.

While you see this as negativity I see it more as honest (and sometimes constructive though maybe redundant) criticism of a product or a company that I truly hope has a bright future. I said it before, I love the 6500. I respect FRS. I have little love for SSDR.

While I try to have respect for others it doesn't mean that if I hear something silly I keep quiet. One of those quotes that made my jaw drop is "You may find it a surprise that there is no standard when communicating over the internet.  Just a pile of comprises.  :) " I'm not going to give you my resume or convince you I'm an authority that cannot be refuted. In this case I'd simply point to the fact that an IETF RFC has a standards track and that IP, including TCP and UDP, are well documented standards. See: https://www.rfc-editor.org/standards

Maybe if FRS communicated better: bug reports, roadmaps, newsletters, there would be less need for community members to come to its defense. Maybe they think they are dragging oldtimers kicking and screaming into the 21st century and it is a noble cause. Maybe they make things so hidden and blackbox that there really won't be too much difference between picking up a Heil microphone or picking up a Galaxy S8. 

Finally, if a car is built for the road and the roads can't handle it, the car is recalled not the roads. Car analogies (most analogies really) never stand up well. 

Drive safely.

73,
Kev K4VD

Clay N9IO

  • 631 Posts
  • 169 Reply Likes
Hi Kevin,
Please let me reason with you for a moment, seriously.  Would it have killed you to offer a positive possible solution for David rather than jump right in and attack Michael's comment?  That served no good purpose whatsoever.

Michael had stated:
"My  2 cents (and certainly not the solution).  
mike va3mw
(speaking as a user, not an employee)"

Brother, you have a funny way of showing respect for FRS.
I had a pleasant conversation with Michael on the phone once, nice helpful guy.
I don't doubt for a moment that you love your 6500.  What's not to love?
Why not lighten up a little? Life is short, and I assume we're all friends here, if for no other reason than our love of these great radios.
I seriously hope we're still friends here after my comments but I can't subscribe to this line of negativity, it's pointless other than to stir people up.  Everything is not an argument Kevin.
Not trying to give you a hard time, just hoping you will listen to reason and help make this community site a more friendly place.

Not sure why I am bothering other than a belief that this site can be better.
So here I am interjecting my opinion just as you did, forgive me.
Personally I am just a consumer, I come here for information and
occasionally attempt to offer help or encouragement if I can. Nothing more.

Take care.

Kevin

  • 931 Posts
  • 271 Reply Likes
Hi Clay:

Fair enough. I'll unkey the mic for a while.

73,
Kev K4VD

Clay N9IO

  • 631 Posts
  • 169 Reply Likes
Have a good one.

Tim - W4TME, Customer Experience Manager

  • 9198 Posts
  • 3558 Reply Likes
Michael, VA3MW wrote...

The black box of death is just a symptom of something else and engineering may be able to figure it out once they know what is causing it.


The number of bytes that comprises a frame of display data can be, and usually is, greater than 1500 bytes.  We send display data to the client in single frames for rendering.  The display data transport is UDP (VITA-49), and the maximum Ethernet frame size on the Internet is 1518 bytes, which includes a 1500-byte data payload. Since the display frame data can exceed the maximum size of a single Ethernet frame, the radio's TCP/IP stack fragments the data, meaning that the display data frame is divided up into multiple VITA-49 packets and the client host's TCP/IP stack reassembles them into a single display frame.  IP fragmentation/reassembly is an Internet Protocol (IP) process that breaks datagrams into smaller pieces (fragments), which are reassembled by the receiving host running SmartSDR.  This capability is standard and required in all TCP/IP protocol stacks as per RFCs 791, 815 and 1191.
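
To make the arithmetic concrete, here is a rough Python sketch of how an IPv4 stack would cut one UDP datagram into fragments for a given MTU. It assumes a 20-byte IPv4 header with no options and the standard 8-byte UDP header. The function name and the 4000-byte example frame are hypothetical, since the real display frame sizes are not given in this thread.

```python
IPV4_HEADER = 20   # bytes, assuming no IP options
UDP_HEADER = 8     # bytes

def plan_fragments(udp_payload_len: int, mtu: int = 1500):
    """Return (offset, data_len, more_fragments) for each IPv4 fragment
    needed to carry one UDP datagram over a link with the given MTU."""
    total = UDP_HEADER + udp_payload_len        # bytes that follow the IP header
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    fragments = []
    offset = 0
    while offset < total:
        size = min(per_fragment, total - offset)
        more = offset + size < total            # "more fragments" flag
        fragments.append((offset, size, more))
        offset += size
    return fragments

# A hypothetical 4000-byte display frame needs three fragments at MTU 1500.
for frag in plan_fragments(4000):
    print(frag)
```

Under these assumptions a UDP payload of up to 1472 bytes (1500 - 20 - 8) travels as a single unfragmented packet; one byte more and the stack emits two fragments, and losing either one means the whole datagram is lost.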

If somewhere in the connection between the radio (this includes the LAN and the LAN router), the multitude of transport providers on the Internet, and the LAN and LAN router at the client end of the connection does not allow fragmented packets to pass through so that the client host can reassemble them into a single display frame, then you will experience the frozen display and no waterfall data.  

Now, it is very unlikely that the transport providers are not in compliance with RFCs 791 and 815 so that leaves the radio and client ends (endpoints) of the connection as probable areas where the issue is occurring.  Edge routers that are used for these endpoints usually have firewall features and by their very nature block different traffic types.  With some routers, you have control over the features and in others, you have very limited control.  Some client locations prevent fragmented packets as a security feature because, in the past, hackers have used this part of the TCP/IP protocol as a DoS attack vector.  Rather than using a stateful process to inspect the packets they use brute force methods of denying all of them.  Other client locations deny UDP fragments because it represents a possible high bandwidth usage scenario that they want to prevent rather than using a stateful firewall that operates at layer 6 and 7, again, another brute force method to control data flow. If one of these features is preventing the multiple VITA-49 packet fragments from traversing the firewall/router, then the SmartSDR client is not getting the data it needs to reassemble the packet fragments and build a complete frame of display data.  

This behavior is easy to validate.  If you have this issue, start reducing the size of the SmartSDR application so that the panafall's geometry begins to get smaller.  At a certain point, it will reach a size where all of the data for a single display frame is 1500 bytes or less.  In this scenario we do not need multiple VITA-49 packets to construct a display frame; fragmentation is not required and the display will begin rendering.

When we engineer a network feature, we do so in order to achieve the most efficient transfer of data while being compatible with the greatest number of client configurations. In the case of how we transfer display frame data, we chose the method described above because it uses the least amount of CPU resources on the radio where they are finite (the fragmentation/reassembly is performed using the TCP/IP kernel mode daemon) and fragmentation is a standard component of the TCP/IP protocol stack, so it is universal. We have been using this method of rendering display data since before SmartSDR v1.0.0 was released and it has worked flawlessly.

Since a majority of customers who are using SmartLink are not globally experiencing this particular problem, meaning that they connect to their radio from a different location or Internet provider and it functions as intended, our primary design goal has been met.  For those instances where this was not the case, turning off certain firewall/router or client-based Internet security features has resolved the problems. This does not mean that every use case will experience success.  Cases where the radio or the client end of the connection (LANs) is using a variety of firewall policies (either configured or unbeknownst to the user, which is usually the case) or a configuration that would be considered advanced in nature may not operate properly.  In these cases, we will work to identify the cause of the problem, but FlexRadio's support mandate does not extend to advanced network troubleshooting, analysis or issue resolution.

Since we are aware of the situation, we have a defect listed in our bug tracker to investigate possible remedies for datapaths that are blocking VITA-49 fragments.  We have some ideas, but until we investigate and analyze the impact they would have on the radio's operational performance, we are not ready to commit to an alternate solution or a time frame.  I hope this clears up the nature of the situation at hand.

And I hope David can find a resolution to his issue as he has been provided the key information as to why his issue is occurring.  

David, if you believe the radio is at fault, then open a HelpDesk ticket and send it in for validation.  We'll put it on our LAN and I'll do the client testing myself since both networks are known good, running SmartLink daily.  I use my location for worldwide demo purposes, most recently from Germany and Japan.  If there is a problem, we'll correct it.
(Edited)

David H Hickman

  • 48 Posts
  • 4 Reply Likes
Update. I have switched my entire stack to a 1500 MTU on the client site; the switch path was set to 1518 and Layer 3 was set to 1500. Same problem.

Next I set the switch to 1522 to make sure I did not have some forgotten VLAN in there.  SAME PROBLEM.

I then changed layer 3 to a Cisco ASA I had sitting around... Guess what - SAME PROBLEM.

Then I plugged the radio directly into the ASA - SAME PROBLEM.

I then took the ASA and changed it from my primary Cox Business connection to my Google Fiber home connection.  - SAME PROBLEM.

I then took the ASA and hooked it to my Netgear LTE modem that I use as a backup path, and guess what, SAME PROBLEM.

Next I took the ethernet cable hooked to the radio and the asa and then repeated the above scenarios.

I am now convinced the problem is either in the software somewhere or the radio is broken.


I will open a ticket tomorrow. My problem is that I live on the road and honestly don't have the personal time to troubleshoot IT crap.

This problem gave me the justification to rework my stack at home and make damn sure everything is current.    I am at wits end.

I really do like this radio.  I want to step up someday, but things like this make me want to pull my hair out.
(Edited)

David H Hickman

  • 48 Posts
  • 4 Reply Likes
BTW, thanks for info on the protocol.  As a suggestion put this stuff somewhere. There are hams that want this kind of info.  Once I have this working, I will  show my solution.

Kevin Hogg

  • 25 Posts
  • 4 Reply Likes

I have a similar problem. It does not work on Brighthouse/Spectrum networks, and I have read complaints on here of similar problems from other amateurs using different networks.  I get around it by using a Verizon MiFi, which is far from ideal due to poor signal and additional cost. Flex Radio Systems (FRS) blames the ISP; the ISP says it is not their fault, they are blocking nothing, so it is FRS's fault.

In my mind it's like designing an app which works on AT&T but not T-Mobile. That would appear to be a design constraint or a Project/Quality Management issue, in that important aspects which should have been captured when assessing the technological environment were missed and not incorporated into the design.

However there are other aspects of smartlink which are fantastic, I love the ease of connectivity, that is a great job!

I think FRS should consider the limitation and implement a design change asap. In the interim it is only right that consumers are made aware and FRS should put this limitation on their sales documentation i.e. that smartlink software does not work on all networks (perhaps with a list of networks it works on and those it doesn't work on) so that consumers can make an informed decision.


(Edited)

Steve - N5AC, VP Engineering / CTO

  • 1057 Posts
  • 1097 Reply Likes
Official Response
Tim's comment provides our most current information on this problem.  The incidence is low -- we didn't see this at all during testing of v2.0 which lasted several months with something like 50 people testing in a large number of environments, ISPs, etc.  My personal guess (which is probably wrong) is that it is an issue in a core open source router protocol implementation that rejects fragmented UDP packets.  This source then got passed around like a virus.  I have virtually no data to support this -- it's just a guess.

The unfortunate thing is that the fix we've seen solve the problem has typically been to switch a setting or for the customer to replace their router.  I don't want anyone to have to do either of these things if it's something we can fix.  Our alternative to using the fragmented protocol is to write our own fragmenter/assembler on either end and not rely on this part of the UDP/IP protocol.  We have an issue for our engineering team to go investigate this and see if we can fix it (I suspect we can).  My preference would be a fix in our software that doesn't rely on this part of the protocol or checks to see if we can and, if not, ratchets down to our own fragmenter/assembler.  We'll get to this as soon as we can.  If you are having the problem, please enter a help desk ticket and let Tim know, and when we believe we have a solution I'd like to send it to you to verify (because we have not seen this problem except in one case where we couldn't get a Maestro to play remotely).
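
For readers wondering what an application-level fragmenter/assembler could look like, here is a minimal Python sketch. It is not FlexRadio's design: the 8-byte chunk header layout, the 1400-byte chunk size and all names are assumptions chosen so that each datagram stays under a 1500-byte MTU and IP never has to fragment it.

```python
import struct
from typing import Dict, List, Optional

HEADER = struct.Struct("!IHH")   # frame id, chunk index, chunk count (hypothetical layout)
MAX_CHUNK = 1400                 # assumed payload size that fits under a 1500-byte MTU

def split_frame(frame_id: int, data: bytes, max_chunk: int = MAX_CHUNK) -> List[bytes]:
    """Split one display frame into self-describing datagram payloads."""
    chunks = [data[i:i + max_chunk] for i in range(0, len(data), max_chunk)] or [b""]
    return [HEADER.pack(frame_id, idx, len(chunks)) + chunk
            for idx, chunk in enumerate(chunks)]

class Reassembler:
    """Collect chunks per frame id and return the frame once all have arrived."""

    def __init__(self) -> None:
        self._pending: Dict[int, Dict[int, bytes]] = {}

    def add(self, packet: bytes) -> Optional[bytes]:
        frame_id, idx, count = HEADER.unpack_from(packet)
        parts = self._pending.setdefault(frame_id, {})
        parts[idx] = packet[HEADER.size:]       # arrival order does not matter
        if len(parts) < count:
            return None                         # still waiting for more chunks
        del self._pending[frame_id]
        return b"".join(parts[i] for i in range(count))
```

Each chunk would be sent as its own UDP datagram. A real implementation would also need to expire incomplete frames and validate the header fields, which this sketch leaves out.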

Peter K1PGV, Elmer

  • 553 Posts
  • 322 Reply Likes
Would it not help to use Path MTU Discovery (PMTUD) to determine the largest packet that can travel the circuit between the source and destination? That would avoid the necessity of intermediate systems doing fragmentation, no?
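
As a rough illustration of the idea, the Python sketch below searches for the largest packet size that gets through. Real PMTUD (RFC 1191) sets the Don't Fragment bit and reacts to ICMP "fragmentation needed" replies; here that machinery is replaced by a hypothetical `probe` callback, and the 576 and 1500 bounds are assumed defaults.

```python
from typing import Callable

def find_path_mtu(probe: Callable[[int], bool], low: int = 576, high: int = 1500) -> int:
    """Binary-search the largest packet size for which probe(size) succeeds.

    probe is a hypothetical callback that sends a packet of `size` bytes with
    Don't Fragment set and reports whether it reached the far end.
    """
    if not probe(low):
        raise RuntimeError(f"even {low}-byte packets do not get through")
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always makes progress
        if probe(mid):
            low = mid                 # mid fits; try larger
        else:
            high = mid - 1            # mid is too big; try smaller
    return low

# Simulated path whose narrowest link has a 1492-byte MTU.
print(find_path_mtu(lambda size: size <= 1492))  # prints 1492
```

The sender could then size its own packets to the discovered value so that no device along the path has to fragment them.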

Just a suggestion,

Peter
K1PGV

Steve - N5AC, VP Engineering / CTO

  • 1057 Posts
  • 1097 Reply Likes
Not sure -- I'll add this to the issue and we can look into it.  From what I understand, the protocol stack splits the packet according to the MTU it sees and this is what causes the issue.  The intermediate nodes don't like the fragmented UDP protocol and discard the packets, even though it's a well established standard.  I believe it was created to free the programmer from fretting about the size of a datagram by handling the split automatically based on MTU.  We can, of course, just create packets inside the MTU ourselves and not rely on this protocol.

Incidentally, I just realized I think I owe you an email on another topic.  I'll send you something later today.
