
Flex Lockup on 1.10.16


Comments

  • Don, VE2HJ
    Don, VE2HJ Member ✭✭
    edited June 2017

    Here is Eric's answer to my question on that point:

    "To clarify, there are two separate shutdown measures that the radio takes. If the radio firmware is up and running properly, a single short press of the power button should initiate a shutdown within 5-10 sec. Otherwise, there is a 60 sec timer in the power control chip that will shut things down if the firmware does not respond.

    The latter is what indicates the firmware was not running properly. Holding the power button down for ~4 seconds performs the same shutdown as the end of the 60 sec timer."
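    The two paths in that quote can be modelled as a small decision function. This is a hypothetical sketch, not FlexRadio's firmware: the function, class and parameter names are invented for illustration, and only the 60-second timer and the ~4-second long press come from the quote. What it captures is the diagnostic point: if a short press only takes effect at the 60-second mark, the firmware never answered.

```python
from dataclasses import dataclass
from typing import Optional

WATCHDOG_TIMEOUT_S = 60.0  # power-control chip timer, per the quote
LONG_PRESS_S = 4.0         # "~4 seconds" hold, per the quote


@dataclass
class ShutdownResult:
    path: str     # "orderly" (firmware did it) or "forced" (power chip did it)
    at_s: float   # seconds after the button press when power is removed


def power_button_shutdown(press_s: float,
                          firmware_done_s: Optional[float]) -> ShutdownResult:
    """Hypothetical model of the two shutdown paths described in the quote.

    press_s: how long the power button is held down.
    firmware_done_s: when healthy firmware finishes its own shutdown
        (None if the firmware is hung and never responds).
    """
    if press_s >= LONG_PRESS_S:
        # A long hold triggers the same hard shutdown as an expired timer.
        return ShutdownResult("forced", LONG_PRESS_S)
    if firmware_done_s is not None and firmware_done_s < WATCHDOG_TIMEOUT_S:
        # Firmware responded to the short press and shut down cleanly.
        return ShutdownResult("orderly", firmware_done_s)
    # Firmware never answered: the power-control chip's timer cuts power.
    return ShutdownResult("forced", WATCHDOG_TIMEOUT_S)


# Short press, healthy firmware: orderly shutdown within the 5-10 s window.
print(power_button_shutdown(0.5, 7.0))
# Short press, hung firmware: nothing happens until the 60 s timer fires.
print(power_button_shutdown(0.5, None))
```

    In this model a "forced" result after a short press is the signature of a hung firmware; a long press simply skips the wait.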

  • Roy Laufer
    Roy Laufer Member ✭✭
    edited June 2017
    Here are the results of a little experimentation with my 6700.

    I reflashed to the latest 1.10.16, reset to factory default, and started my 6700 up in the default 20 meter panadapter. It ran crash free for almost a day. I then added a few more HF panadapters on 10M, 17M, and 40M. 

    It ran crash-free for another half a day.

    I then configured the XVTR tab for my DEMI LDPA and opened a 2M panadapter and populated it with 5 slices on five different repeater frequencies. There were 3 other panadapters open - 10M, 20M, and 40M. NO profiles were imported, or even saved. No transmissions were made - receive only.

    About two hours later it locked up and crashed!

    (Your mileage might vary.)

    73,
    Roy AC2GS 
  • Eric Gruff
    Eric Gruff Member ✭✭
    edited September 2019
    I have (at least in my case with Flex 6700 and latest SmartSDR version) confirmed that the Dimension time sync program was the cause of my problems. I've never had an issue even with many other programs in use (multiple instances of JT programs, HRD, etc.), and was using SPTimeSync successfully. Two weeks ago, I was on a business trip to Europe, and operated remotely for the entire week with no lockups, etc. The radio was still fine when I got home. Then, I installed Dimension for time synchronization since it's automatic, and experienced a bunch of lockups when I was home. The radio looked fine (display had no errors and the usual info on it), but SmartSDR had lost connection, and the only way to fix it was to hold the power button until it powered down (occasionally with an error message), and then restart the radio.

    I went on vacation a few days ago, and by the time I arrived at my destination, the radio was offline. Fortunately, our housesitter was able to reboot the radio for me, and before that, I uninstalled Dimension. Everything has been great for two days now, and I'm still connected.

    So, I realize this is simply empirical evidence, but it may help those who are still having issues, at least if the cause is Dimension. It may not be the "fault" of the program; perhaps it is associated with the changing PC time? A big thanks to posters in this thread who suggested it as a possible cause - it led me to remove Dimension, which appears to have fixed my problem.

    73 de Eric NC6K


  • Bill W2PKY
    Bill W2PKY Member ✭✭
    edited June 2017
    Eric-
    Interesting comment about timekeeping. I was having crashes on 1.10.16 and noticed my computer time was maintained by Windows Net Time. I learned that Meinberg would not work if Windows Net Time was active, so I turned off Windows time correction and Meinberg took over keeping my computer right on time. I use "Time.IS" in a browser to show the timing error, which is typically < 0.01 seconds. Since switching over to Meinberg, I have had no crashes.
    Question for FRS: Is there any communication between the radio and SSDR that is timestamped? If so, what happens if the timestamp goes backwards?
  • Roy Laufer
    Roy Laufer Member ✭✭
    edited June 2017
    Well, I am open to new ideas.

    I read about the "Dimension time sync" interaction and decided to test the hypothesis for my particular problem. I uninstalled Dimension and switched back to the latest SSDR firmware, with my profiles installed and my five 2-meter FM repeaters on the panadapter...

    Five hours later, my 6700 crashed and locked up!

    I'm back (once again) to my last stable firmware, with the knowledge that, at least for me, this firmware really has a problem with the "magic" involved in getting 2M operation out of the 6700.

    It would have been nice if it was just all Dimension's fault, but that has not been the case for me.

    Your mileage might vary.
  • Jim Runge
    Jim Runge Member ✭✭
    edited June 2017
    No Dimension running here, and I have had lockups with the tone on two 6500s and a 6700, each just running one slice on SSB.
  • DaveC
    DaveC Member ✭✭
    edited June 2017
    Rick,
    The point is, FRS has not actually been able to reliably reproduce the issue. I am suggesting that stressing the radio might help make the problem more frequent.

    As an engineer, I know the hardest problems to find, let alone solve, are the ones that happen once in a blue moon.

    The problem with real-time systems is that it is very difficult to distinguish a FW problem from a HW problem. They both act the same.

    This is the type of problem I would love to work on. But I am not part of FRS so all I can do is suggest things to try. 

    Dave KB1WOD


  • Rick W7YP
    Rick W7YP Member ✭✭
    edited June 2017
    Well, DAX crashed during the night and took my PC down with it. The 6700 appeared to be alive because a single, brief push immediately put it into "Shutting down." When I rebooted the PC, it crashed again while DAX was loading. I went into safe mode and uninstalled 1.10.15 BETA, leaving FlexVSP still installed. The system booted normally after that. I will reinstall 1.10.16.174 and try not to think about how much money I've paid for all of these delightful experiences.
  • Rick W7YP
    Rick W7YP Member ✭✭
    edited June 2017
    I hear you, Dave.  I'm an old EE who's done decades of embedded systems development, most of it designing data networking equipment.  These are hard problems to solve.

    What I'm kicking against isn't the long delay in resolving this; instead, it's the apparent lack of commitment to fixing it right now.  All that's being offered are excuses that it's something on the user's end, whether it's supposedly high latencies in the user's network/PC or something else going on in the PC.

    I'm sorry, but this is a client-server architecture and NOTHING which happens on the client side should ever cause the server (radio) side to crash (segmentation fault), overwriting profiles and who knows what else.  NOTHING!

    When that does happen, YOU HAVE A PROBLEM IN THE SERVER (RADIO).

    I totally agree with you that they're not adequately stressing the systems in their labs and that's why they're not seeing the crashes while dozens of us are.  I know for a fact that they do not have any systems in their lab with an SPE Expert 1.3K-FA and DEMI 2m LDPA connected to the radio's two USB ports, as I do.  I offered to help, at least with testing of SSDR 2.0 alpha/beta, but was flatly told that "the alpha testing is closed."

    Who, when faced with problems like these, doesn't make an exception to do testing that they cannot do themselves?  I've helped many other teams in the past and they jumped at my offer.

    But not FRS!

    I didn't pay nearly 10 grand for my station so that I could spend much of my time grabbing my ankles while repeating "Thou shalt not anger the FRS gods."

  • Tim - W4TME
    Tim - W4TME Administrator, FlexRadio Employee admin
    edited June 2017
    Question for FRS: Is there any communication between the radio and SSDR that is timestamped? If so, what happens if the timestamp goes backwards?

    No. All timestamping is relative, not absolute, so a clock that is not synced will not make any difference. And the things being timed are the network quality metrics, nothing critical to the operation of the radio.
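    A minimal sketch of why relative timing is immune to a PC clock being stepped. It assumes nothing about how SmartSDR actually measures network quality; `round_trip_ms` and `FakeClocks` are hypothetical names for illustration. Python's `time.monotonic()` is documented as a clock that cannot go backwards and is not affected by system clock updates, whereas a wall-clock difference can come out negative when a time-sync tool steps the clock back.

```python
import time
from typing import Callable


def round_trip_ms(send_and_wait: Callable[[], None],
                  clock: Callable[[], float] = time.monotonic) -> float:
    """Time one request/response using a relative (monotonic) clock by default."""
    start = clock()
    send_and_wait()
    return (clock() - start) * 1000.0


class FakeClocks:
    """Hypothetical test double: a wall clock that a time-sync tool can step,
    and a monotonic clock that only ever moves forward."""

    def __init__(self) -> None:
        self.wall = 1_000.0
        self.mono = 50.0

    def elapse(self, seconds: float, wall_step: float = 0.0) -> None:
        # Real time passes for both clocks; only the wall clock gets "corrected".
        self.mono += seconds
        self.wall += seconds + wall_step


clocks = FakeClocks()
wall_start, mono_start = clocks.wall, clocks.mono
# 50 ms pass while a sync tool steps the PC clock back by 2 s.
clocks.elapse(0.050, wall_step=-2.0)
wall_elapsed = clocks.wall - wall_start   # negative: a nonsense "latency"
mono_elapsed = clocks.mono - mono_start   # still about 50 ms
print(f"wall: {wall_elapsed:.3f} s, monotonic: {mono_elapsed:.3f} s")
```

    Because only differences between two readings of the same forward-only clock are used, the absolute time of day (synced or not) never enters the result.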
  • Rick W7YP
    Rick W7YP Member ✭✭
    edited June 2017
    Sure sounds like a "crash" to me.  I think what Tim was trying to say is that a "crash" to them is a case of the radio going down due to a segmentation fault.

    I hope this discussion isn't going to degrade into an argument over semantics. Even if there are different failure scenarios, which there likely are, the radio has gone down and the effect on the consumer is the same.

    These must be resolved soon, with every step taken to resolve them, even if it means reaching out more proactively to users like us who are seeing this regularly. They clearly haven't succeeded in recreating the operating environments that we have; otherwise, they'd be seeing these lockups in their lab. I know for a fact that they don't have my setup in their lab, which consists of a 6700 with both an SPE Expert 1.3K-FA and a DEMI 2m LDPA connected to the radio's USB ports.

    If this isn't a firmware problem, then it's a hardware problem and covered under warranty.  If this is a hardware problem and the incident rate is as low as they insist it to be, then try swapping out the radios of 2 or 3 people who see this problem regularly and see if it goes away.  If it does, then take the hit and do warranty replacements.  In the end, the cost of that could easily be much less than the sales hit for failing to act quickly.

    The longer they delay, the hotter everyone gets and the more the word spreads that you might want to avoid the Signature Series for a while.  

    I am supposed to give a presentation on my 6700 to our local ham club in August.  I'm sorely tempted to give a FULLY honest presentation.

  • Bill W2PKY
    Bill W2PKY Member ✭✭
    edited June 2017
    Tim-
    Thanks for the clarification, so looks like we can exclude time keeping software from causing the radio to crash. 
  • Rick W7YP
    Rick W7YP Member ✭✭
    edited June 2017
    I find that very interesting, Roy, as all of my crashes have occurred since I updated to 1.10.16 while also adding the DEMI 2m LDPA and an SPE Expert 1.3K-FA into the picture. I have run for 4 days with that setup, but only once. Most of the time the radio would crash in less than 24 hours.

    As far as I've been able to determine, since the LDPA interface is via a BIT cable, the presence of that interface is not likely to be the source of the problem.  The fact that we're running several slice receivers on 2m and in FM mode is likely stressing the radio differently than is the case with most of the radios in FRS' lab.

    I don't know how many, if any, of their alpha/beta testers have an LDPA in their setup.  I did get confirmation that FRS' lab doesn't have a radio with my setup.  I've offered to help do some late testing of the SSDR 2.0 alpha/beta, but was flatly turned down.

    That certainly doesn't instill confidence in the level of support that we can count on.
  • Rick W7YP
    Rick W7YP Member ✭✭
    edited June 2017
    No Dimension here, either; however, I also have the DEMI LDPA and run a panadapter with several slices covering our local VHF repeaters. I never saw the crash before I added the LDPA, but that may simply be because I was running 1.9 before I added the LDPA and had to upgrade to 1.10 to get the necessary support for it.

    I suspect that few, if any, of the radios in FRS' lab are stress-testing 2m FM mode, which is one thing that's unique about 2m support versus SSB and CW in HF modes.

    In any event, I can't stress this enough:  This is a CLIENT-SERVER architecture and what happens on the client side SHOULD NEVER CAUSE THE SERVER (RADIO) TO CRASH!

    It's unfair to blame Dimension, network latencies, or anything else OUTSIDE of the radio for what is clearly happening INSIDE the radio.

  • Roy Laufer
    Roy Laufer Member ✭✭
    edited June 2017
    I have the exact same setup - 6700, DEMI LDPA, Expert 1.3K-FA!
    (I doubt that our mutual problems are merely coincidental.)

    I doubt that it is the USB signals (my DEMI is the older one without USB controls, only the 1.3K-FA has a USB connection to my 6700).

    FRS used a few technical "tricks" to get their DACs at 2M frequencies. There has been a "bug" on their list regarding 2M slices not showing a signal. I regularly select an input other than XVTR and then re-select XVTR, to get my 2M reception back. This has been documented from a very early version of SSDR and there is no cure in sight.

    I think the firmware has to be very careful in how it handles the 2M slice, and if it isn't, the whole thing crashes. Since few people use their 6700 as an all-in-one, there are relatively few complaints.

    There are probably other ways to crash SSDR, but using 2M FM repeater receivers seems to be my particular way.

    Perhaps with this new data FRS will borrow someone's DEMI LDPA and fix this bug?
  • Frederic HB9CQK
    Frederic HB9CQK Member ✭✭
    edited June 2017
    Hi Rick, I could not agree more with you that the client should never be able to crash the radio. I think FRS support is not playing a blame game here. My understanding is that they are desperately looking for a "reliable" trigger that causes the radios to crash, so they can find a way to fix it on the radio side. My "long button press recovery" crashes ALL happened without any USB connections to the radio, but with many active slices using JTDX with my 6700. I used several different time sync programs; it does not make a difference. I use IP ports to control the slices, no virtual COM ports. For the last 10 days or so I have not had a crash, but I am using the radio differently: my shack is 33C (do not ask me for F, it is too warm for me!), so I use the WLAN and run SINGLE slices with JTDX and my MacBook. Only a few crash-free days, however, does not mean that this really makes a difference. I am not surprised that the beta gave you trouble. For me it was worse than the release version.
    73, Frédéric, HB9CQK
  • Frederic HB9CQK
    Frederic HB9CQK Member ✭✭
    edited June 2017
    Oh, I forgot: I never used 2m with my 6700!
  • Mike VE3CKO
    Mike VE3CKO Member ✭✭✭
    edited June 2017
    My 6700 is still crashing occasionally and I have not yet removed Dimension. Will try that. I use BKTimeSync by IZ2BKT.
  • KS0CW
    KS0CW Member ✭✭
    edited June 2017

    FWIW... whatever is causing the instability under 1.10.16 doesn't appear to rear its head under 1.9.13 (I have rolled between the two versions on two clients), or at least not @ my QTH.

  • Eric Gruff
    Eric Gruff Member ✭✭
    edited June 2017
    Just wanted to add that I don't have the GPS option, nor was I using 2 M/transverter. I also want to be clear that I am NOT blaming Dimension, but another day has passed successfully with no crashes, which tells me that something about the combination of my setup with Dimension is likely the root cause of my crashes. 

    I should also note that when my radio crashed under the circumstances I described, there was no sound (I have heard that high-pitched tone in the past with a 6300 I used to have). Just trying to provide as much info as I can to help everyone solve the issue.
  • DaveC
    DaveC Member ✭✭
    edited May 2020
    I don't think peripheral concerns are the cause of any crashes per se.


    Here is my system:

    6300 directly connected to the PC; no other devices connected to the computer's Ethernet port.
    Dell OptiPlex 740 running Windows 10 (internet connection through a USB WiFi dongle and wireless router).
    Running in CW mode.
    One slice open.
    No DAX.

    Can't get much simpler. 

    Since the 6300 has different hardware than the 6500 and 6700, I am less likely to point the finger at the hardware.

    My gut feeling is that this is an asynchronous timing issue in the software, such as a missed flag or a missed interrupt, something along those lines.

    At my company, the problem closest to the customer gets the most attention. In other words, "All hands on deck!" for customer problems, and everything else is put on hold.


    I will say one more thing:

    A good reputation costs money; a bad one comes for free.

    Dave KB1WOD

  • Rick W7YP
    Rick W7YP Member ✭✭
    edited June 2017
    Kevin, 1.9.13 seems to be stable for those that have rolled back to it.  That's what I was running without any crashes.  Then I added the DEMI 2m LDPA and SPE Expert 1.3K-FA to the picture, both connected to the 6700 via its 2 USB ports.  Support for those accessories required an update to 1.10, and that's when things began to unravel for me.
  • Rick W7YP
    Rick W7YP Member ✭✭
    edited June 2017
    Eric, are you running Windows or IOS?  If Windows, what version and release?
  • Rick W7YP
    Rick W7YP Member ✭✭
    edited June 2017
    Dave, I agree with you 100% that peripheral concerns aren't the cause of the problem.  Some things on the client side may change the timing of events within the radio, which ultimately precipitates the crash within the radio.  A number of things can produce that timing difference, easily leading to the belief that this NTP routine or that GPS time sync routine, or network latency or whatever is the 'trigger.'

    But correlation doesn't always equal causation, and your very simple setup shows that quite conclusively. It's clear that this can happen very randomly, and it's quite possible that certain radios are more susceptible to it due to timing differences within each of them.

    I think you're on the right track about this being a timing issue or what we frequently referred to as a "race condition."  Based upon the fact that the 'true' crash, as Tim would call it, is due to a segmentation fault, the problem is either causing an attempt to execute code in an invalid memory space or code with a bad data pointer is attempting a read or write to invalid memory space.

    My decades of experience doing embedded systems has educated my "gut" to suspect that this is the latter case, as I've seen far more of them than the former.  The former occurs when attempting to "jump through a pointer" to a firmware routine when the pointer hasn't been properly initialized or got corrupted.

    The latter often occurs in firmware/software that uses lots of string or circular buffers, where "put" and "take" pointers are being updated and used by asynchronously executing code.  I can guarantee that these radios use lots of these kinds of buffers.  Unless the buffer pointer manipulation routines are written with bullet-proof control over accessing of the buffers' control structures, sooner or later two asynchronous events occur which cause pointer corruption and BOOM, down she goes.

    These can be quite difficult to find but I've found some in the past just by doing a very close scrutiny of the buffer access control routines and associated hardware support (e.g., "semaphores.")

    Rick, W7YP
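    To make the "put"/"take" pointer point concrete, here is a minimal sketch of a circular buffer shared by two threads, where the put index, take index and count are only read or changed while holding one lock, so no thread can ever see a half-updated control structure. It illustrates the general technique Rick describes; it is not code from the radio, and the class and function names are hypothetical. It is written in Python for readability; in C firmware the same role would typically be played by a mutex, a semaphore, or briefly masked interrupts around the index updates.

```python
import threading


class RingBuffer:
    """Fixed-size circular buffer shared by producer and consumer threads."""

    def __init__(self, capacity: int) -> None:
        self._buf = [None] * capacity
        self._capacity = capacity
        self._put = 0    # next slot to write
        self._take = 0   # next slot to read
        self._count = 0  # items currently stored
        self._lock = threading.Lock()
        # Both conditions share the same lock that guards the indices.
        self._not_empty = threading.Condition(self._lock)
        self._not_full = threading.Condition(self._lock)

    def put(self, item) -> None:
        with self._not_full:  # acquires the shared lock
            while self._count == self._capacity:
                self._not_full.wait()
            self._buf[self._put] = item
            self._put = (self._put + 1) % self._capacity
            self._count += 1
            self._not_empty.notify()

    def take(self):
        with self._not_empty:  # acquires the shared lock
            while self._count == 0:
                self._not_empty.wait()
            item = self._buf[self._take]
            self._buf[self._take] = None
            self._take = (self._take + 1) % self._capacity
            self._count -= 1
            self._not_full.notify()
            return item


def run_producer_consumer(n: int = 10_000, capacity: int = 64) -> list:
    """Push 0..n-1 through the buffer from one thread, read them in another."""
    rb = RingBuffer(capacity)
    received = []

    def producer() -> None:
        for i in range(n):
            rb.put(i)

    def consumer() -> None:
        for _ in range(n):
            received.append(rb.take())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

    The design choice is that the two indices and the count change together as one step under the lock. A single-producer/single-consumer buffer can also be made lock-free, but only with carefully ordered atomic index updates; getting that ordering wrong produces exactly the rare, timing-dependent pointer corruption discussed above.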
  • Eric Gruff
    Eric Gruff Member ✭✭
    edited June 2017
    Windows 10 Home v1607 release 14383.1198
  • k3Tim
    k3Tim Member ✭✭✭
    edited June 2017
    Any modern OS worth its salt would 'catch' a memory pointer dereference to nonexistent memory or other bad actions such as divide by zero. I was debugging an Android-based camera a couple of weeks back and a div/0 was caught, logged and the stack dump (i.e., core dump for the OT'ers) unwound to show exactly the line of source code causing the fault.
    BUT as Tim mentioned, the logging program gets locked out also. W/o a log, it's all guesswork. A useful tool in this case is the Lauterbach debugger that uses h/w regs on the CPU along with JTAG to log the full information even after the fact. I've seen some talented engineers work with these to find strange bugs. I'm convinced one of these in the hands of a sharp person could become a modern version of "Have Gun - Will Travel", as they are expensive and not easily learned.

    Another useful tool is a good source code analyzer.

    ---
    now for some info to shed some light on the problem (doubtful)
    A 6500 is in use at this station and about 4 full lockups have been observed so far. The radio runs 24/7, mainly SWL and some CW. To test DAX interaction, all 4 slices were activated with DAX and some FLDigi / WSJT. I even turned on a couple of DAX IQ channels with nothing attached. There was no lockup with this test case after several days. Not very scientific, but it seems that plugging cables into the switch the radio runs on would eventually cause a lockup; however, since that might take several hours, the link doesn't seem very plausible. The cable switching was to put Maestro on the switch and remove it. Sometimes the lockup was on the Win-10 machine and sometimes when using Maestro. Twice, lockups happened late at night when there was no activity. Since placing the 6500, PC and Maestro on an independent switch and leaving the cables alone, no lockups have been observed. This switch is very old and was made for a home theater setup.

    Putting it all in perspective, this doesn't even qualify as a minor nuisance here, a minute or two to reboot and the station is back on the air.

    The 6500 has been converted to 'man pack' portable operation in preparation for Field Day (along with Maestro). Will post some images if anyone would like.

    Regards,

    _..--
    TiM

    k3Tim now portable 6

  • EA4GLI
    EA4GLI Member ✭✭✭
    edited June 2017
    I have the 6700, 1.3K, 2m DEMI amp and 70cm DEMI XVTR. I use both USB ports: one has the DEMI, the other a hub for the two 1.3K CATs and one ThumbDV for D-STAR.

    No freezes.

    I use windows 7 64bit.

    Anyone with Windows 7 suffering the issue? Just adding my 2 cents about a setup that doesn't suffer the problem. I use 1.10.16.
  • Rick W7YP
    Rick W7YP Member ✭✭
    edited June 2017
    Considering its relatively low cost, FRS should buy one or two of the LDPAs for their lab.
  • Norm - W7CK
    Norm - W7CK Member ✭✭
    edited June 2017
    As far as I know, the 6700 is the only rig that can use the LDPA on native 2m.  The 6700 is now 5 years old and I doubt there is much interest in fixing the problems.  I also doubt there will be another Flex rig in the near future with 2m built in.

    I've pretty much given up using 2m FM on my 6700. When other panadapters are open at the same time, it has difficulty changing bands on multiple panadapters while a 2m panadapter is open. The lack of a good all-mode squelch, constant bugs showing up when changing bands, and no repeater tone suppression on received signals pretty much keep me off the repeaters. I do use it for 2m SSB, though there isn't much activity down there.

    I have to count myself as lucky. I've only had a few lockups on my 6700 where I was forced to do a hard reset. Due to the COM (in-use) issue that has been around for months and is still not fixed, I am still using an older version of SmartSDR.

    I love my 6700, but my patience is starting to wear thin.  When I purchased my rig, I knew the software was not quite complete.  I had no idea I would still be plagued with these types of problems 5 years later.

    I have already solved remote operation by setting up a VPN, so I really don't need that functionality. Bugs have actually kept me from using my rig remotely; instead, I've been using an IC-7100 for remote. I just want the basic radio to work as it should and to include the basic functions nearly every other radio built in the last 20 years has. The bells and whistles should come after basic functionality and major bug fixes.

    Just one software developer (or 2 hrs per day) dedicated to cleanup and bug fixes should have been able to take care of most of these issues and would have gone a long way toward keeping customers happier. Bug fixes should have been released in between normal software updates. Instead, we've had to resort to several workarounds. 5 years is too long.

    While I'll most likely keep my 6700 for many more years,  I almost hate to admit that I'm now looking for a 2nd SDR.   Next time, I won't be sold on what the rig will be able to do in the future.


  • mikeatthebeach .
    mikeatthebeach . Member ✭✭
    edited June 2017
    Will think twice before spending any more $$ on any Flex products, with so many bugs and lockups on my Flex 6700.
    73 mike
