Service Hooks messages stuck in queue or processing state - tfs

After upgrading to TFS 2018 Update 2 (the problem still persists with Azure DevOps 2019), some service hook messages are not sent correctly but get stuck in the queue with status Queued or Processing (see the image). Messages in the Processing state appear to be delivered correctly to the receiver; the ones in the Queued state are not. Service hooks work correctly for 1-2 days after a restart, then become increasingly unreliable over the following days. The receiver side is fine; it has been tested many times.
This happens on all service hooks (some are HTTP webhooks, others post to Teams), in 2 of our 3 collections. On the 3rd collection, it works without issues. Disabling/enabling or recreating the hook doesn't help. Is there any way to debug service hooks, or some log to look at?
Also reported here. Received several patches from MS but with the same result. Installed Azure DevOps 2019 with the same results.
Any help would be appreciated.

The answer is that this is a bug on the Microsoft/TFS side. After several months of communication with Microsoft and trying several patches for this problem, the right patch finally arrived, and the problem is solved.
The patch is not public at the moment, so if you are experiencing a similar problem, ask Microsoft support.
Edit: The problem is also fixed in the official release 2019.0.1. See Release notes

Related

MQTT Subscribe / OTA Update Deep Sleep / ESP32 / FreeRTOS

The goal is to receive messages over MQTT in an IoT device that comes out of deep sleep periodically. The exact same considerations exist for OTA update as for any other parameter update. In my case, ultimately, I want to use this for both.
Progress
It runs
The device wakes for about 15 seconds. If, during that time, I publish a bunch of messages to the relevant topic, the messages arrive successfully. Inside the AWS console I can publish to:
$aws/things/<device-name>/shadow/update/delta
{
"state":{
"desired":{
"output":true
}
}
}
And the delta callback function runs for 'output'. Great but no practical use to anyone.
IoT Job
I created a custom AWS IoT job in the console in an effort to overcome the problem. My thinking was that it might retain the message to ensure delivery. I've been running the job for the past half hour but so far nothing has come through. It had a 20 timeout but is still stuck in queued, not even in progress yet... So, there is clearly a flaw in this approach.
AWS CLI test
Just for completeness, I've attempted to fire off the MQTT message from the command line. It has the benefit that you can specify the QoS, (in theory) ensuring that it gets delivered at least once.
aws iot-data publish --topic "$aws/things/<device-name>/shadow/update/delta" --qos 1 --payload file://Downloads/outputTrue.json --cli-binary-format raw-in-base64-out
But oddly this didn't seem to work at all. Subscribing in the console test client, I didn't see the message arrive at the broker.
AWS IoT Core does not support retained messages, see here.
The MQTT specification provides a provision for the publisher to request that the broker retain the last message sent to a topic and send it to all future topic subscribers. AWS IoT doesn't support retained messages. If a request is made to retain messages, the connection is disconnected.
As the wake-ups are periodic, a possible approach would be for the device to publish its next wake-up slot to a separate topic that your backend listens on. Your backend then publishes the desired information to your device topic once the slot opens.
Of course this approach is quite fragile concerning latency and network stability.
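A minimal sketch of the backend side of that idea, assuming the device announces its next wake-up time as a UTC timestamp and stays awake for about 15 seconds (the figure from the question). The function names are hypothetical and the actual MQTT publish is left out:

```python
from datetime import datetime, timedelta, timezone

def slot_is_open(now, slot_start, slot_length_s=15):
    """Return True if `now` falls inside the device's announced awake window."""
    return slot_start <= now < slot_start + timedelta(seconds=slot_length_s)

def seconds_until_slot(now, slot_start):
    """How long the backend should wait before publishing (0 if the slot has started)."""
    return max(0.0, (slot_start - now).total_seconds())

# Example: the device announced that it will wake at 12:00:00 UTC,
# and the backend checks ten seconds before that.
announced = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 1, 11, 59, 50, tzinfo=timezone.utc)
wait_s = seconds_until_slot(now, announced)
```

Clock drift on the device and network latency both eat into the window, which is the fragility mentioned above.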
Time to share the answer I found from piecing together numerous posts and reaching out to the very helpful AWS support team. This link is the one that really covers it:
https://docs.aws.amazon.com/iot/latest/developerguide/jobs-devices.html#jobs-workflow-device-online
My summarised pseudo code is :
1. init() & connect() to mqtt as before.
2. Subscribe to the following topics & create callback function for each:
a. Get pending.
b. Notify next.
c. Get next.
d. Update rejected.
e. Update accepted.
3. Create Publish topics:
a. Get pending.
b. Get Next.
4. Pending topics = optional. But necessary to handle many tasks and select between them.
5. Call aws_iot_jobs_describe() to publish a request for the next job. It links up to the notify next callback (somehow).
6. In the callback, grab job document, execute job & report Success / Failure.
7. Done.
There is a helpful example in esp-aws-iot/samples/linux/jobs-samples/jobs_sample.c. You need to copy some of the constants over from the sample aws_iot_config.h.
Once you've done all of this, you are able to use AWS Jobs to manage your OTA roll out, which was the original intent.
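For orientation, here is how I read the subscribe and publish topics from steps 2 and 3 against the AWS IoT Jobs MQTT topic layout. This is a sketch in Python rather than the C SDK, the helper names are hypothetical, and the exact set of reply topics you subscribe to may differ from what the C sample does:

```python
import json

def jobs_topics(thing_name, job_id="+"):
    """Build the Jobs topic names for one thing; "+" is the MQTT single-level wildcard."""
    base = f"$aws/things/{thing_name}/jobs"
    return {
        "subscribe": {
            "get_pending_accepted": f"{base}/get/accepted",
            "notify_next": f"{base}/notify-next",
            "get_next_accepted": f"{base}/$next/get/accepted",
            "update_rejected": f"{base}/{job_id}/update/rejected",
            "update_accepted": f"{base}/{job_id}/update/accepted",
        },
        "publish": {
            "get_pending": f"{base}/get",
            "get_next": f"{base}/$next/get",
        },
    }

def job_status_payload(succeeded):
    """Payload published to .../jobs/<jobId>/update once the job document has been executed."""
    return json.dumps({"status": "SUCCEEDED" if succeeded else "FAILED"})

topics = jobs_topics("my-thing")  # "my-thing" is a placeholder thing name
```

Because a queued job stays queued until the device asks for it, this fits a device that is only awake for a few seconds at a time.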

How to schedule an on-premise Azure DevOps build to run every 5 minutes?

Never mind the rationale, I have a case where a build needs to run every 5 minutes. On-premise installation does not support schedules in the YAML.
So, how do we do it? I can probably use the REST API, but that sucks, because it seems I either create a one-off script or a script that handles only very simple types of schedules. Building a reusable solution that could be used in general for other builds seems to be involved. So, instead of concentrating on my business, I need to go sideways and cover for the deficiencies of the on-premise version of Azure DevOps.
I wonder if there is a better way.
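For reference, the one-off REST route would look roughly like the sketch below: build a "queue build" request and let an external scheduler (cron, Windows Task Scheduler) fire it every 5 minutes. The server URL, project name, definition id and token are placeholders, and api-version 5.0 is an assumption that has to match what your server supports:

```python
import base64
import json
import urllib.request

def build_queue_request(collection_url, project, definition_id, pat, api_version="5.0"):
    """Prepare (but do not send) a POST that queues one run of a build definition."""
    url = (f"{collection_url.rstrip('/')}/{project}"
           f"/_apis/build/builds?api-version={api_version}")
    body = json.dumps({"definition": {"id": definition_id}}).encode("utf-8")
    # A personal access token is sent as HTTP Basic auth with an empty user name.
    token = base64.b64encode(f":{pat}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )

# Placeholder values; urllib.request.urlopen(req) would actually send the request.
req = build_queue_request("https://tfs.example.local/DefaultCollection",
                          "MyProject", 42, "example-pat")
```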
I understand your concern. However, this is not currently supported by the on-premise TFS server.
The UI for defining time-based build triggers isn't flexible enough. It only supports fixed times on days of the week.
As you pointed out in the comment, running a build every 5 minutes would require creating 288 schedules, which is tedious.
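The 288 figure is just the number of 5-minute slots in a day, one fixed-time schedule per slot. As a quick check:

```python
MINUTES_PER_DAY = 24 * 60
INTERVAL_MINUTES = 5

# One fixed-time schedule is needed per 5-minute slot.
runs_per_day = MINUTES_PER_DAY // INTERVAL_MINUTES  # 288

# The start times that would each have to be entered in the UI.
start_times = [f"{m // 60:02d}:{m % 60:02d}"
               for m in range(0, MINUTES_PER_DAY, INTERVAL_MINUTES)]
```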
Actually, this has already been raised as a user voice suggestion:
Scheduled builds - More flexible timing configuration
https://developercommunity.visualstudio.com/idea/365630/scheduled-builds-more-flexible-timing-configuratio.html
Several people have commented on it and echoed the request. After going through the marketplace, I haven't found an appropriate workaround. Sorry for any inconvenience. You could monitor the status of the user voice item above.

TFS 2015 Intermittent connection issues

For the last few weeks our office has experienced intermittent and fairly crippling connection issues to our on-premises TFS 2015 (with Update 3). When it occurs, Visual Studio basically becomes useless, and web view pages of TFS either don't load, load only the header toolbars, or take several minutes and eventually open. Queries take minutes to run, and so on. Then suddenly all will be well and it works perfectly again.
There are no errors shown when this happens, either to the user or in the TFS application event logs. The system is not overloaded on any resources. I have tried various things like rebooting (obviously!), iisreset, clearing the cache on the app tier, and who knows what else at this point.
Are there any other logs I could be looking at or things I could try to diagnose?
Worth noting: Users have all recently migrated to a new domain but the TFS servers are still on the old domain. However we had migrated in my office long before these issues occurred. Other offices who connect in have only recently migrated into new domain.
System setup is VMware 6.0, TFS app tier with separate SQL data tier and analysis database.
According to your description: There are no errors shown when this happens, either to the user or in the TFS application event logs.
I agree with Daniel in the comment: this kind of issue is probably not related to the TFS server side. If the server were always slow, it would be a performance issue. However, you said that suddenly all will be well and it works perfectly again, so the most likely cause is a network-related issue. I suggest your team use a network analyzer tool to troubleshoot it.
For example, double check the DNS-related area. Your TFS server is on one domain and some users are on another. First make sure the domains trust each other, then check whether the slow performance is being caused by an authentication issue.
One possible fix is to have users log in using the full domain name. For example, if they are currently logging in with DEV\MyUserAccount, they should instead log in with DEV.COM\MyUserAccount.
It has something to do with how the TFS server looks up accounts when a short domain name is used. It prepends the name to each of the DNS suffixes, which ends up creating invalid names and causes delays while none of them resolve to a valid domain.
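To illustrate the kind of expansion being described, here is a deliberately simplified model, not the actual Windows or TFS lookup logic. The suffix list is made up:

```python
def candidate_names(domain, suffixes):
    """Names a lookup might try for `domain`, given a DNS suffix search list.

    Simplification: a name that already contains a dot is tried as-is,
    while a short name is combined with every suffix in turn.
    """
    if "." in domain:
        return [domain]
    return [f"{domain}.{suffix}" for suffix in suffixes]

# Hypothetical suffix search list on the server.
suffixes = ["corp.example.com", "old.example.com"]

short = candidate_names("DEV", suffixes)     # one attempt per suffix
full = candidate_names("DEV.COM", suffixes)  # a single direct attempt
```

Each bogus candidate has to fail before the next one is tried, which is where the delay would come from.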
Besides, about the performance issue, you could also take a look at this great answer from jessehouwing.

Recurring job in Hangfire works intermittently

I have 3 websites configured in IIS which use the same application pool. Each uses the same code base (the database is different for each client) and executes a Hangfire recurring job each day. For 2 of the websites I don't have any problems, but for one of the websites the job does not run every day. Since the job starts immediately when a user accesses the website, this makes me think that the application pool is suspended and is "awakened" when the user accesses the website.
I have already implemented the instructions at http://docs.hangfire.io/en/latest/deployment-to-production/making-aspnet-app-always-running.html so that the application is always running. As I mentioned, it works fine for the other 2; it is just 1 website where it does not always work. Has anybody else encountered such things before? Or is Hangfire showing signs of instability, where the same code runs perfectly fine for 2 sites and intermittently for 1?
Thanks
I asked this question on the Hangfire forum and someone suggested that the server itself did not have enough resources to run everything and would force-sleep inactive apps even when told not to in the config. Although there was nothing to support or contradict this, I thought it was the case, as the problem mostly occurred on weekends. What I am doing now is pinging the application every hour so the application pool remains active. This mechanism is incorporated within the website and is also scheduled through Hangfire. This has solved the problem and I have not had a single failure since. See https://discuss.hangfire.io/t/recurring-job-does-not-run-sometimes/1860 for further details.
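The ping itself is nothing more than an HTTP request to the site. My version lives inside the website and is scheduled through Hangfire; the sketch below is an external stand-in in Python (not what I actually run) that an hourly scheduled task could call, with a placeholder URL:

```python
import urllib.request

def ping(url, opener=urllib.request.urlopen, timeout=10):
    """Request `url` once; return True on a 2xx/3xx response, False on any error."""
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses,
        # as are plain socket timeouts and connection errors.
        return False

# Example call with a placeholder URL, scheduled once per hour:
# ping("https://client-site.example.com/")
```

The `opener` parameter only exists so the function can be exercised without a network connection.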

.NET 1.1 Windows service doesn't start due to leap year

We have a few backend Windows services written in .NET Framework 1.1. On December 31st 2008, around 5:00 pm (EST), we stopped these services to run some year-end reports. After the reports were run, we tried to start the services, and every attempt gave the error "Service did not start in a timely fashion". We tried everything that Google came up with: service packs, WinDbg, GFlags, etc.
Finally we called in Microsoft support around 10:30 pm (we had a premium support contract with Microsoft) and they started collecting all kinds of server data. Around 12:05 am (EST) the services started successfully with no issues. We hadn't done anything different or special to get them started. The whole team was stumped as to what was happening, and equally glad that it was working.
The conclusion is that the Microsoft support team thinks this could very well have something to do with the .NET Framework 1.1 class System.ServiceProcess.ServiceBase being unable to handle leap years. They haven't confirmed it yet and are still investigating.
I will keep this posted as and when I have updates from MSFT support. I was bracing for a dreadful start to 2009, but gladly it all ended fine.
I wonder if the bug is related to the one that causes Zune 30GBs to lock up on 31 December in leap years: it tests for day>365 in one place and day>366 in another and as a result when day=366 it falls through the second test and goes in to an infinite loop.
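For anyone who hasn't seen it, the Zune loop has roughly the shape below. This is my reconstruction in Python, not the original driver code; the start year is just a parameter, and an iteration cap stands in for the real infinite loop so the stall can be demonstrated:

```python
def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def year_from_days_buggy(days, year=1980, max_iter=10_000):
    """Convert a day count to (year, day_of_year); stalls on day 366 of a leap year."""
    for _ in range(max_iter):
        if days <= 365:
            return year, days
        if is_leap(year):
            if days > 366:
                days -= 366
                year += 1
            # days == 366: neither days nor year changes, so the loop never ends.
        else:
            days -= 365
            year += 1
    raise RuntimeError("no progress: stuck on day 366 of a leap year")

def year_from_days_fixed(days, year=1980):
    """Same conversion, comparing against the actual length of each year."""
    while True:
        length = 366 if is_leap(year) else 365
        if days <= length:
            return year, days
        days -= length
        year += 1
```

The point is that the stall only happens on that one day, which would match a problem that disappears by itself shortly after midnight.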

Resources