I have a SageMaker endpoint with two production variants (for A/B testing). After testing, one of the variants gets zero traffic and the other gets 100% of the traffic. This results in no data being captured for the first variant. When the model explainability job runs, it looks for captured data for the zero-traffic variant as well, which causes it to fail.
Is there a way to run model explainability monitoring job with an endpoint having two production variants?
Model Monitor works with endpoints that have a single variant. If you are looking to do a shadow deployment, I would refer you to this blog.
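If the A/B test is over and the losing variant is no longer needed, one workaround is to switch the endpoint to a single-variant config before the monitoring schedule runs. A minimal boto3 sketch, assuming the winning model is kept; all names below are placeholders:

import boto3

sm = boto3.client("sagemaker")

# New config that keeps only the winning variant (remember to carry over your
# DataCaptureConfig here as well, since it lives on the endpoint config).
sm.create_endpoint_config(
    EndpointConfigName="my-endpoint-config-single-variant",
    ProductionVariants=[{
        "VariantName": "WinnerVariant",
        "ModelName": "winning-model",
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 1.0,
    }],
)
sm.update_endpoint(
    EndpointName="my-endpoint",
    EndpointConfigName="my-endpoint-config-single-variant",
)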
I have a Dockerized Django application which has a number of cron jobs that need to be executed.
Right now I'm running it with the package Supercronic (which is recommended for running cron jobs inside containers). This will be deployed on two servers for redundancy purposes, i.e. if one goes down, the other one needs to take over and execute the cron jobs.
However, the issue is that without any configuration this will result in duplicate cron-jobs being executed, one for each server. I've read that you can set up something called a "lease" for the cron-jobs to retrieve, to avoid duplicates from different servers, but I haven't found any instructions on how to set this up.
Can someone maybe point me in the right direction here?
If you are running Supercronic on two different instances, Supercronic doesn't know whether the job has already been triggered elsewhere; it's up to the application to handle the consistency.
You can do it in many ways: either controlling the state with file or DB entries, or any better mechanism that lets your Docker application check the status before it starts executing the actual process. A rough sketch of the DB-entry approach follows.
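For example, since the application already has a database, each job run could claim a per-period "lease" row and only execute if it wins the insert. This is only a sketch, assuming PostgreSQL behind Django; the table, job names, and run_job helper are hypothetical:

# Assumed table:
#   CREATE TABLE cron_lease (job_name text, period text, PRIMARY KEY (job_name, period));
from django.db import connection

def try_claim(job_name: str, period: str) -> bool:
    """Return True only on the server that wins the INSERT for this period."""
    with connection.cursor() as cur:
        cur.execute(
            "INSERT INTO cron_lease (job_name, period) VALUES (%s, %s) ON CONFLICT DO NOTHING",
            [job_name, period],
        )
        return cur.rowcount == 1  # 1 = this server claimed the lease, 0 = the other server did

def run_if_leader(job_name: str, period: str) -> None:
    # Called from the supercronic entry (e.g. via a management command); both servers
    # call it, but only the one that claims the lease actually runs the job.
    if try_claim(job_name, period):
        run_job(job_name)  # hypothetical: your real job entry point

The period would be something like the scheduled timestamp, so each scheduled run gets exactly one row and old rows can be pruned later.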
I have inherited a system that consists of a couple daemons that asynchronously process messages. I am trying to find a clean way to introduce integration testing into this system with minimal impact/risk on the existing programs. Here is a very simplified overview of their responsibilities:
Process 1 polls a queue for messages, and inserts a row into a DB for each one it dequeues.
Process 2 polls the DB for rows inserted by Process 1, does some calculations, and then deposits a file into a directory on the host and sends an email.
These processes are quite old and complex, and I am strongly inclined to avoid modifying them in any way. What I would like to do is put each of them in a container, and also stand up the dependencies (queue, DB, mail server) in other containers. This part is straightforward, but what I'm unsure about is the best way to orchestrate these tests. Since these processes consume and generate output asynchronously I will need to poll or wait for the expected outcome (mail sent, file created).
Normally I would just write a series of tests in a single test suite of my language of choice (Java, Go, etc), and make the setUp / tearDown hooks responsible for resetting the environment to the desired state. But because these processes have a lot of internal state I am afraid I cannot successfully "clean up" properly after each distinct test. This would be a problem if, for example, one test failed to generate the desired output in a specific period of time so I marked it as failed, but a subsequent test falsely got marked as passed because the original test case actually did output something (albeit much slower than anticipated) that was mistakenly attributed to the subsequent test. For these reasons I feel I need to recreate the world between each test.
In order to do this the only options I can see are:
Use a shell script to actually run my tests -- having it bring up the containers, execute a single test file, and then terminate my containers for each test.
Follow my usual pattern of setUp / tearDown in my existing test framework, but call out to Docker to terminate and start up the containers between each test (roughly as sketched below).
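To illustrate the second option, here is a minimal sketch of the recreate-the-world and poll-for-output pattern, assuming docker compose; the compose file, output directory, and publish_test_message helper are hypothetical, not a ready implementation:

import subprocess, time, unittest
from pathlib import Path

COMPOSE = ["docker", "compose", "-f", "docker-compose.test.yml"]  # hypothetical file
OUTBOX = Path("/tmp/it-output")  # host directory mounted into Process 2's container

def wait_for(predicate, timeout=60, interval=1.0):
    # Poll until the asynchronous outcome appears or the timeout expires.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

class CalculationFlowTest(unittest.TestCase):
    def setUp(self):
        subprocess.run(COMPOSE + ["up", "-d", "--build"], check=True)

    def tearDown(self):
        # "down -v" also drops volumes, so no queue/DB state leaks into the next test.
        subprocess.run(COMPOSE + ["down", "-v"], check=True)

    def test_message_produces_output_file(self):
        publish_test_message()  # hypothetical helper that puts a message on the queue
        self.assertTrue(wait_for(lambda: OUTBOX.exists() and any(OUTBOX.iterdir())),
                        "Process 2 never wrote its output file")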
Am I missing another option? Is there some kind of existing framework or pattern used for this sort of testing?
The situation right now:
Every Monday morning I manually check the JUnit results of Jenkins jobs that ran over the weekend; using the Project Health plugin I can filter on the timeboxed runs. I then copy-paste this table into Excel and go over each test case's output log to see what failed and note down the failure cause. Every weekend gets another tab in Excel. All this makes traceability a nightmare and means time-consuming manual labor.
What I am looking for (and hoping that already exists to some degree):
A database that stores all failed tests for all jobs I specify. It parses the output log of a failed test case and, based on some regex, applies a 'tag', e.g. 'Audio' if a test regarding audio is failing. Since everything is in a database, I could make or use a frontend that can apply filters at will.
For example, if I want to see all tests regarding audio failing over the weekend (over multiple jobs and multiple runs) I could run a query that returns all entries with the Audio tag.
I'm OK with manually tagging failed tests and the cause, as well as writing my own frontend. Is there a way (the Jenkins API perhaps?) to grab the failed tests (JUnit format and Jenkins plugin) and create such a system myself if it does not exist?
A good question. Unfortunately, it is very difficult in Jenkins to get such "meta statistics" that span several jobs. There is no existing solution for that.
Basically, I see two options for getting what you want:
Post-processing Jenkins-internal data to get the statistics that you need.
Feeding a database on-the-fly with build execution data.
The first option basically means automating the tasks that you do manually right now.
you can use external scripting (Python, Perl, ...) to process Jenkins-internal data (via the REST or CLI APIs, or by directly reading on-disk data); a minimal REST sketch follows this list
or you run Groovy scripts internally (which will be faster and more powerful)
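For instance, the JUnit results of a build are exposed under its testReport REST endpoint. A minimal sketch of the external-scripting route, with the URL, job name, and credentials as placeholders:

import requests

JENKINS_URL = "https://jenkins.example.com"  # placeholder
JOB, BUILD = "nightly-tests", 123            # placeholders

resp = requests.get(
    f"{JENKINS_URL}/job/{JOB}/{BUILD}/testReport/api/json",
    auth=("user", "api-token"),
)
resp.raise_for_status()
for suite in resp.json().get("suites", []):
    for case in suite["cases"]:
        if case["status"] in ("FAILED", "REGRESSION"):
            print(case["className"], case["name"], case.get("errorDetails"))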
The first option is the most direct way to go. However, depending on the statistics that you need and depending on your requirements regarding data persistence, you may want to go for...
The second option: more flexible and completely decoupled from Jenkins' internal data storage. You could implement it by
introducing a Groovy post-build step for all your jobs
that script parses job results and puts data of interest in a custom, external database
Statistics you'd get from querying that database.
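The Jenkins-side post-build step would be Groovy, but the tagging and storage logic you described could look roughly like this (a sketch with hypothetical tag rules and schema):

import re, sqlite3

TAG_RULES = {
    "Audio": re.compile(r"audio|alsa|codec", re.I),            # hypothetical patterns
    "Network": re.compile(r"timeout|connection refused", re.I),
}

db = sqlite3.connect("test_failures.db")
db.execute("CREATE TABLE IF NOT EXISTS failures (job TEXT, build INTEGER, test TEXT, tags TEXT, log TEXT)")

def store_failure(job, build, test_name, log_text):
    # Tag the failure based on its output log, then persist it for later querying.
    tags = [tag for tag, rx in TAG_RULES.items() if rx.search(log_text)] or ["Untagged"]
    with db:
        db.execute("INSERT INTO failures VALUES (?, ?, ?, ?, ?)",
                   (job, build, test_name, ",".join(tags), log_text))

Querying "all Audio failures over the weekend" then becomes a simple SELECT with a filter on the tags column.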
Typically, you'd start with the first option. Once requirements grow, you'd slowly migrate to the second one (e.g., by collecting internal data via explicit post-processing scripts, putting that into a database, and then running queries on it). You'll want to keep this migration phase as short as possible, as it eventually requires the effort of implementing both options.
You may want to have a look at couchdb-statistics. It is far from a perfect fit, but at least seems to do partially what you want to achieve.
I am developing a live update for my application. So far, I have created almost all unit tests, but I have no idea how to test a specific class that connects to an FTP server and downloads new versions.
To test this class, should I create an FTP test server and use it in my unit tests? If so, how can I make sure this FTP server is always consistent with my tests? Should I manually create every file I will need before the tests begin, or should I automate this in my test class (setup and teardown methods)?
This question also applies to unit testing classes that connect to any kind of server.
EDIT
I am already mocking my FTP class so I don't always need to connect to the FTP server in other tests.
Let me see if I got this right about what Warren said in his comment:
I would argue that once you're talking to a separate app over TCP/IP
we should call that "integration tests". One is no longer testing a
unit or a method, but a system.
When a unit test needs to communicate with another app (which can be an HTTP server or FTP server), is this no longer a unit test but an integration test? If so, am I doing it wrong by trying to use unit testing techniques to create this test? Is it correct to say that I should not unit test this class? It does make sense to me, because it seems to be a lot of work for a unit test.
In testing, the first purpose is always to answer the question: what is being tested, i.e. the scope of the test.
So if you are testing an FTP server implementation, you'll have to create an FTP client.
If you are testing an FTP client, you'll have to create an FTP server.
You'll therefore have to downsize the test's extent until you reach a unitary level.
For your purpose, that may be, e.g.:
Getting a list of the current files installed for the application;
Getting a list of the files available remotely;
Getting a file update;
Checking that a file is correct (checksum?);
and so on...
Each tested item will need some mocks and stubs. See this article about the difference between the two. In short (AFAIK), a stub is just an emulation object which always works, while a mock (which should be unique in each test) is the element that may change the test result (pass or fail).
For the exact purpose of an FTP connection, you may e.g. (when testing the client side) have some stubs which return a list of files, and a mock which will simulate several possible issues of the FTP server (timeout, connection lost, wrong content). Then your client side shall react as expected. Your mock may be a true FTP server instance, but one which behaves as needed to trigger all potential errors. Typically, each error shall raise an exception, which is to be tracked by the test units in order to pass/fail each test.
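The question is about Delphi, but the stub/mock split is language-agnostic. As a rough illustration in Python (UpdateChecker, its ftp collaborator, and the asserted behaviour are all hypothetical):

import socket
from unittest import TestCase
from unittest.mock import Mock

class UpdateCheckerTest(TestCase):
    def test_lists_remote_files(self):
        ftp_stub = Mock()  # stub: always "works"
        ftp_stub.list_files.return_value = ["app.exe", "app.exe.sha256"]
        checker = UpdateChecker(ftp_stub)  # hypothetical class under test
        self.assertIn("app.exe", checker.available_updates())

    def test_timeout_surfaces_as_empty_update_list(self):
        ftp_mock = Mock()  # mock: injects the failure this test is about
        ftp_mock.list_files.side_effect = socket.timeout
        checker = UpdateChecker(ftp_mock)
        self.assertEqual([], checker.available_updates())  # assumed behaviour: degrade gracefully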
It is a bit difficult to write good testing code. A test-driven approach is a bit time-consuming at first, but it is always better in the long term. A good book is mandatory here, or at least some reference articles (like Martin Fowler's, as linked above). In Delphi, using interfaces and the SOLID principles may help you write such code and create stubs/mocks for your tests.
From my experience, every programmer can sometimes get lost writing tests... good test writing can be more time-consuming than feature writing in some circumstances... you have been warned! Each test shall be seen as a feature, and its cost shall be evaluated: is it worth it? Is another test not more suitable here? Is my test decoupled from the feature it is testing? Is it not already tested? Am I testing my code, or a third-party/library feature?
Out of the subject, but my two cents: HTTP/1.1 may be a better candidate nowadays than FTP, even for file updates. You can resume an HTTP connection, load HTTP content in chunks in parallel, and this protocol is more proxy-friendly than FTP. It is also much easier to host some HTTP content than FTP (and some FTP servers have known security issues). Most software updates are performed via HTTP/1.1 these days, not FTP (e.g. Microsoft products or most Linux repositories).
EDIT:
You may argue that you are doing integration tests when you use a remote protocol. It could make sense, but IMHO this is not the same.
To my understanding, integration tests take place when you let all your components work together as in the real application, then check that they are working as expected. My proposal about FTP testing is that you mock an FTP server in order to explicitly test all potential issues (timeout, connection or transmission error...). This is something other than integration testing: code coverage is much higher, and you are only testing one part of the code, not the whole code integration. Using a remote connection does not by itself make it an integration test: this is still unitary testing.
And, of course, integration and system tests shall be performed after the unitary tests. But FTP client unitary tests can mock an FTP server, running it locally while testing all potential issues which may occur out in the real big world wide web.
If you are using Indy 10's TIdFTP component, then you can utilize Indy's TIdIOHandlerStream class to fake an FTP connection without actually making a physical connection to a real FTP server.
Create a TStream object, such as TMemoryStream or TStringStream, that contains the FTP responses you expect TIdFTP to receive for all of the commands it sends (use a packet sniffer to capture those beforehand to give you an idea of what you need to include), and place a copy of your update file in the local folder where you would normally download to. Create a TIdIOHandlerStream object and assign the TStream as its ReceiveStream, then assign that IOHandler to the TIdFTP.IOHandler property before calling Connect().
For example:
ResponseData := TStringStream.Create(
  '220 Welcome' + EOL +
  ... + // login responses here, etc...
  '150 Opening BINARY mode data connection for filename.ext' + EOL +
  '226 Transfer finished' + EOL +
  '221 Goodbye' + EOL);

IO := TIdIOHandlerStream.Create(FTP, ResponseData); // TIdIOHandlerStream takes ownership of ResponseData by default

FTP.IOHandler := IO;
FTP.Passive := False; // Passive=True does not work under this setup
FTP.Connect;
try
  FTP.Get('filename.ext', 'c:\path\filename.ext');
  // copy your test update file to 'c:\path\filename.ext'...
finally
  FTP.Disconnect;
end;
Unit tests are supposed to be fast, lightning fast. Anything that slows them down discourages you from wanting to run them.
They are also supposed to be consistent from one run to another. Testing an actual file transfer would introduce the possibility for random failures in your unit tests.
If the class you are testing does nothing more than wrap the API of the FTP library you are using, then you've reached one of the boundaries of your application and you don't need to unit test it. (Well, sometimes you do. It's called exploratory testing, but those tests are usually thrown away once you get your answer.)
If, however, there is any logic in the class, you should try to test it in isolation from the actual API. You do this by creating a wrapper for the FTP API. Then in your unit tests you create a test double that can stand in as a replacement for the wrapper (a small sketch follows). There are lots of variations that go by different names: stub, fake, mock object. The bottom line is you want to make sure your unit tests are isolated from any external influence. A unit test with sporadic behavior is less than useless.
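As a sketch of that boundary (in Python for brevity, with hypothetical names; the same shape applies in any language): only the wrapper touches the real FTP library, and tests depend on a deterministic in-memory fake.

from abc import ABC, abstractmethod

class FtpGateway(ABC):
    """The wrapper interface your own code depends on; the real implementation calls the FTP library."""
    @abstractmethod
    def download(self, remote_path: str) -> bytes: ...

class FakeFtpGateway(FtpGateway):
    """Test double: no sockets, fully deterministic."""
    def __init__(self, files: dict):
        self.files = files
    def download(self, remote_path: str) -> bytes:
        return self.files[remote_path]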
Testing the actual file transfer mechanism should be done in integration testing, which is usually run less frequently because it's slower. Even in integration testing you'll want to control the test environment as much as possible (i.e. testing with an FTP server on the local network that is configured to mimic the production server).
And remember, you'll never catch everything up front. Errors will slip through no matter how good the tests are. Just make sure when they do that you add another test to catch them the next time.
I would recommend either buying or checking out a copy of xUnit Test Patterns by Gerard Meszaros. It's a treasure trove of useful information on what/when/how to unit test.
Just borrow the FTP or HTTP Server demo that comes with whatever socket component set you prefer (Indy, ICS, or whatever). Instant test server.
I would put it into a tools folder to go with my unit tests. I might write some code that checks if TestFtpServer.exe is already live, and if not, launch it.
I would keep it out of my unit test app's process memory space, thus the separate process.
Note that by the time you get to FTP server operations, unit testing should really be called "integration testing".
I would not manually create files from my unit test. I would expect my code to check out from version control and build, as-is, from a batch file that runs my test program, which knows about a sub-folder called Tools containing EXEs, and maybe folders called ServerData and LocalData that could hold the data that starts out on the server and gets transferred down to my local unit test app. Maybe you can hack your demo server to have it terminate a session partway through (when you want to test failures), but I still don't think you're going to get good coverage.
Note: if you're doing automatic updates, I think that no amount of unit testing is going to cut it. You need to deal with a lot of potential issues that are internet-related. What happens when your hostname doesn't resolve? What happens when a download gets partway through and fails? Automatic updating is not a great match for the capabilities of unit testing.
Write a couple of focused integration tests for the one component which knows how to communicate with an FTP server. For those tests you will need to start an FTP server before each test, put any files needed by the test on it, and shut the server down after the test.
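If the test harness were in Python, one way to get such a throwaway server is pyftpdlib (a Delphi project would use something like the Indy server demo mentioned in another answer); this is only a sketch:

import threading
from pyftpdlib.authorizers import DummyAuthorizer
from pyftpdlib.handlers import FTPHandler
from pyftpdlib.servers import FTPServer

def start_test_ftp_server(root_dir: str) -> FTPServer:
    # Serve root_dir anonymously on a local, non-privileged port.
    authorizer = DummyAuthorizer()
    authorizer.add_anonymous(root_dir)
    FTPHandler.authorizer = authorizer
    server = FTPServer(("127.0.0.1", 2121), FTPHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server  # call server.close_all() in the test teardown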
With that done, in all other tests you won't use the component which really connects to an FTP server, but you will use a fake or mock version of it (which is backed by some in-memory data structure instead of real files and network sockets). That way you can write unit tests, which don't need an FTP server or network connection, for everything else except the FTP client component.
In addition to those tests, it might be desirable to also have some end-to-end tests which launch the whole program (unlike the component-level focused integration tests) and connect to a real FTP server. End-to-end tests can't cover all corner cases (unlike unit tests), but they can help to catch integration issues.