How to configure CUPS to reject bad print jobs (<5k)?

I have an application team that doesn't know how to prevent their application from sending empty print jobs to the queue. This is happening randomly, but at least once a minute.
Is there a way I can configure CUPS to reject empty print jobs, or print jobs that are below, say, 5k? All of the bad jobs are 3k and look like header info (time, date, sender, name of file and so on), just no content.

It is not possible with the standard tools or settings provided by CUPS itself to require a minimum print job file size.
Only the opposite is available: setting a maximum print job file size. This is controlled by the LimitRequestBody setting in the cupsd.conf configuration file. By default, CUPS silently uses LimitRequestBody 0, which allows job files of unlimited size.
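For reference, that maximum-size cap is a single directive in cupsd.conf; a 10 MB limit, purely as an illustration, would look like this:
LimitRequestBody 10485760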
What you want can still be achieved via various methods. Each of these, however, would require some "hacking": writing your own filters for CUPS. (CUPS filters can be written in any language, even Bash/Shell.)
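As a rough illustration only (the 5k threshold, the temp file path, and the exact place in your queue's filter chain are assumptions about your setup), a minimal Bash filter that aborts undersized jobs could look something like this:
#!/bin/bash
# Sketch of a CUPS filter that rejects tiny jobs; adapt to your own queue.
# CUPS invokes filters as: job-id user title copies options [file]
INPUT="${6:-}"
TMPFILE=""
if [ -z "$INPUT" ]; then
    # No filename argument: the job data arrives on stdin, so spool it first.
    TMPFILE=$(mktemp /tmp/jobcheck.XXXXXX)
    cat > "$TMPFILE"
    INPUT="$TMPFILE"
fi
SIZE=$(stat -c %s "$INPUT")
if [ "$SIZE" -lt 5120 ]; then
    # A non-zero exit status makes CUPS treat the job as failed.
    echo "ERROR: rejecting job of only $SIZE bytes" >&2
    [ -n "$TMPFILE" ] && rm -f "$TMPFILE"
    exit 1
fi
# Pass the data through unchanged for the rest of the filter chain.
cat "$INPUT"
[ -n "$TMPFILE" ] && rm -f "$TMPFILE"
exit 0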
Since not enough details are available about your specific setup, it is not possible to say which of the many options would be best for your situation or to spell out the details. It would be best to hire somebody with enough CUPS expertise to advise you; I would guess it is a job of only a few hours.

Locust - how to delay collection of RPS data until all threads have started

Scenario
A Locust test with a gradual spawn rate, so the chart ramps up like a 45-degree line.
I would like to know the RPS of the system while all threads are running.
The out-of-the-box RPS value from locust will include RPS values from the beginning of the run when there were fewer threads.
How can I customize my locust script to start calculating RPS from when all threads are running?
Is this a reasonable load-test practice?
An alternative option would be to "simulate reality" as much as possible (and in the real world there is ramp-up when the system starts up). To get a more representative RPS value, run the test longer.
There are many reasons why you want to pay attention to what your system can handle while new load is being added. There can be performance problems accepting the connection, for example, if you have improper or older SSL/TLS settings or libraries. In some instances having new load come up can affect users already connected to and using your system. You might even have additional server logic that happens when a new connection is accepted. In short, you should go with 3) above.
However, enough people like to ignore or gloss over what things look like during ramp up that Locust does have a configuration option --reset-stats that will automatically reset all collected stats once all spawning has completed so it appears as if the load test started with all users connected instantaneously. That should give you what you were asking for.
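For completeness, running with that flag looks like this (user count, spawn rate, run time, and file name are just placeholders):
locust -f locustfile.py --headless -u 200 -r 10 -t 10m --reset-stats
And if you would rather do it from the script itself, a rough sketch that should work on recent Locust versions (it assumes the documented spawning_complete event and the stats.reset_all() method) is:
from locust import HttpUser, task, between, events

@events.init.add_listener
def on_locust_init(environment, **kwargs):
    # Fires once per run when the test environment is created.
    @environment.events.spawning_complete.add_listener
    def on_spawning_complete(user_count, **kwargs):
        # Throw away everything recorded during ramp-up so the reported
        # RPS only covers the period when all users are running.
        environment.stats.reset_all()

class WebsiteUser(HttpUser):
    host = "https://example.com"  # placeholder target
    wait_time = between(1, 2)

    @task
    def index(self):
        self.client.get("/")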

Adobe Analytics | Merge data from multiple report suites

We are capturing information for consumer sites in multiple different report suites.
Is it possible to merge all this data into a parent report suite without adding that parent report suite's account id to the s_account variable?
For example
Site 1 uses report-suite1
s_account = "report-suite1";
Site 2 uses report-suite2
s_account = "report-suite2"
Instead of using
s_account = "report-suite1,report-suite2"
is it possible to merge the data to a 3rd virtual account from the Reports console itself?
The only ways you can route data to a separate fully fledged report suite are via JavaScript (e.g. setting s_account as you have shown in your post) or by asking Adobe to create a VISTA rule.
You didn't state your reasons for not wanting to throw a "global" rsid into your js code. Is it because you don't have the technical resources/ability to do it? If so, and if you want a full 3rd rsid for all the data to go to, then you can ask Adobe to create a VISTA rule. It should be fairly easy for them to set up, but they will charge you for it, and I think they will create one for each report suite. I don't generally recommend going this route unless you really have to, though, mostly because of the cost, but also because you don't have personal visibility into it.
Alternatively, if you do have the tech resources to update the js code, but the cost of throwing another rsid into the mix is an issue (from extra server hits), then you may want to consider replacing all of your report suites with a single global report suite, e.g.
s_account='report-global';
Then, create a Virtual Report Suite for each site. You can go to Components > Virtual Report Suites to set them up. The TL;DR is you create them by pointing at your report-global rsid as the source and then creating a segment based off something unique to the site (e.g. the domain, or maybe some eVar with a site-specific value).
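For example, if the segment is going to key off an eVar rather than the domain, the page code on each site could stamp every hit with where it came from; a tiny sketch (the eVar slot is arbitrary, use whichever one you have free):
s_account = "report-global";
s.eVar1 = window.location.hostname; // e.g. "www.site1.com"; the Virtual Report Suite segment filters on this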
The major downside to going the virtual report suite route is that historical data from your previous report suites will not be available in the same place as this new global report suite and its virtual report suites. But it's a "one time migration" thing, and the historical data won't be lost; you'll just have some extra work on your end referencing it in the old rsids, especially if you want to compare historical to current in the new (virtual) rsids.
The second major thing to consider is unique value limits. I'm not sure how much traffic or how many unique values your vars get on your sites, but there is a monthly unique value limit you may have to consider with all of the sites going to the same report suite. Beyond looking at tricks to make values less unique on a case-by-case basis (e.g. removing the query string from URLs), there isn't a good way to solve for this except to stick with separate rsids. Well... Adobe will increase the unique value limit on certain vars if you ask them, but it will cost you.
Another alternative to consider is a Rollup report suite. Go to Admin > Report Suites, where your current report suites are listed; to the left you should see Rollups and an Add link next to it. This will create a Rollup report suite made up of data from one or more report suites.
Note though that a Rollup report suite is not the same as a full fledged report suite. Please refer to the link above for full details/limitations, but the main benefit is it won't cost you anything except the couple of minutes to set it up in the interface. As for its limitations, the main points of note are that you only get aggregated data, data is not deduped between the rsids, and many reports are limited or not available. In practice, I rarely ever see anybody actually go this route because it's too limited. But hey, maybe it's good enough for you.

Debugging Amazon SQS consumers

I'm working with a PHP frontend which connects to a distributed back end, using Amazon SQS and a variety of message types and message consumers. I'm trying to come up with a way to safely debug those consumers, as we don't want message handlers with new, untested code consuming end-user messages, risking the messages being lost or incorrectly processed.
The actual message queue names are hardcoded as PHP constants in a class, so my first tactic was to create two different sets of queues, one for production and another for debugging, and to externalise the queue name constants into two different files. Depending on whether our debug condition is true or not, I wanted to include one or the other of those constant definitions and assign the constants in the included file to the class constants which currently have the names hardcoded.
This doesn't seem to work though, because constants seem to act like class variables in PHP, whereas I am trying to assign the values like instance variables. The next tactic was to see if there was anything on Amazon's side that would allow us to debug our message consumers transparently without adding lots of hacks to our code, but I couldn't see anything there that facilitated this. I'd love to know if anyone else has experienced (and ideally, solved) this problem.
SQS doesn't provide a way to inspect the contents of messages in the queue, or for the sender to see if any consumers are failing to process messages.
A common approach to this problem would be to set up two sets of queues as you suggest and have the producer post the same message onto both queues. That way you can debug your code against a stream of production messages without affecting the actual production queue.
I'd recommend moving the decision of which queue to use out of your code and into config, and then deploying different config files to your development boxes vs your production boxes. The risk is always that a development box ends up talking to production systems, so having a single consistent approach to configuring those endpoints across all your code is much less risky than doing it on an ad-hoc basis each time you call out to a service.
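A minimal sketch of that idea in PHP (the file layout, environment variable name, and queue keys are all made up for illustration):
<?php
// bootstrap: load queue URLs for the current environment
$env = getenv('APP_ENV') ?: 'production';                 // e.g. "production" or "development"
$queues = require __DIR__ . "/config/queues.{$env}.php";  // each file simply returns an array

// config/queues.production.php might contain:
// return ['orders' => 'https://sqs.us-east-1.amazonaws.com/111111111111/orders-prod'];

$orderQueueUrl = $queues['orders'];  // consumers read from config, never from hardcoded constants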
I'd also recommend putting your production and development queues in different AWS accounts with different access credentials. That way you can give your production account permission to publish to the development account's queue, but you can guarantee that your development systems can't read from the production queue.
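For reference, the cross-account part is just a queue policy attached to the development account's queue; something along these lines (account ids, region, and queue name are placeholders):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowProductionAccountToSend",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111111111111:root" },
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:us-east-1:222222222222:debug-queue"
    }
  ]
}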

Best way to run rails with long delays

I'm writing a Rails web service that interacts with various pieces of hardware scattered throughout the country.
When a call is made to the web service, the Rails app then attempts to contact the appropriate piece of hardware, get the needed information, and reply to the web client. The time between the client's call and the reply may be up to 10 seconds, depending upon lots of factors.
I do not want to split the web service call in two (ask for information, answer immediately with a pending reply, then force another api call to get the actual results).
I basically see two options: either run JRuby and use multithreading, or run several regular Ruby instances and hope that not many people try to use the service at a time. JRuby seems like the much better solution, but it still doesn't seem to be mainstream or to have out-of-the-box support at Heroku and EngineYard. The multiple-instance solution seems like a total kludge.
1) Am I right about my two options? Is there a better one I'm missing?
2) Is there an easy deployment option for JRuby?
I do not want to split the web service call in two (ask for information, answer immediately with a pending reply, then force another api call to get the actual results).
From an engineering perspective, this seems like it would be the best alternative.
Why don't you want to do it?
There's a third option: If you host your Rails app with Passenger and enable global queueing, you can do this transparently. I have some actions that take several minutes, with no issues (caveat: some browsers may time out, but that may not be a concern for you).
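(For reference, in the older Passenger-for-Apache releases this was a one-line directive in the vhost config; newer releases made global queuing the default:)
PassengerUseGlobalQueue on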
If you're worried about browser timeout, or you cannot control the deployment environment, you may want to process it in the background (a code sketch follows the steps below):
User requests data
You enter request into a queue
Your web service returns a "ticket" identifier to check the progress
A background process processes the jobs in the queue
The user polls back, referencing the "ticket" id
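A rough Rails sketch of that flow (the model, routes, and job plumbing here are invented, and it assumes something like delayed_job for the background queue):
class HardwareRequestsController < ApplicationController
  # POST /hardware_requests -- enqueue the slow hardware call and reply immediately
  def create
    req = HardwareRequest.create!(:device_id => params[:device_id], :status => "pending")
    req.delay.fetch_from_device          # delayed_job's .delay proxy runs this in the background
    render :json => { :ticket => req.id }
  end

  # GET /hardware_requests/:id -- the client polls with the ticket id
  def show
    req = HardwareRequest.find(params[:id])
    render :json => { :status => req.status, :result => req.result }
  end
end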
As far as hosting in JRuby, I've deployed a couple of small internal applications using the glassfish gem, but I'm not sure how much I would trust it for customer-facing apps. Just make sure you run config.threadsafe! in production.rb. I've heard good things about Trinidad, too.
You can also run the web service call in a delayed background job so that it's not hogging up a web server, and it can even be run on a separate physical box. This is also a much more scalable approach. If you make the web call using AJAX, then you can ping the server every second or two to see if your results are ready; that way your client is not held in limbo while the results are being calculated, and the request does not time out.

Using Erlang, how should I distribute load amongst a cluster?

I was looking at the slave/pool modules and it seems similar to what I want, but it also seems like I have a single point of failure in my application (if the master node goes down).
The client has a list of gateways (for the sake of fallback - all do the same thing) which accept connections, and one is chosen at random by the client. When the client connects, all nodes are examined to see which has the least load, and then the IP of the least-loaded server is forwarded back to the client. The client then connects to this server and everything is executed there.
In summary, I want all nodes to act as both gateways and to actually process client requests. The load balancing is only done when the client initially connects - all of the actual packets are processed on the client's "home" node.
How would I do this?
I don't know whether such a module has been implemented yet, but what I can say is that load balancing is overrated. What I would argue is that random placement of jobs is the best bet unless you know far more about how load will arrive in the future, and in most cases you really don't. You wrote:
When the client connects all nodes are examined to see which has the least load and then the IP of the least-loaded server is forwarded back to the client.
How do you know that the least loaded node won't be the most heavily loaded one just a millisecond later? How do you know that the heavily loaded nodes you left off the list won't shed their load a millisecond later? You really can't know it unless you have a very rare case.
Just measure (or compute) each node's performance and set the probability of a node being chosen in proportion to it. Choose a node randomly regardless of current load. Use this as the initial approach; once you have it set up, you can try to come up with some more sophisticated algorithm. I bet that it will be very hard work to beat this initial approach. Trust me, very hard.
Edit: To be more clear on one subtle detail: I strongly argue that you can't predict future load from current and historical load; to do better you would need knowledge about the probability distribution of task durations and the current decomposition of each task's lifetime, and that is very hard work to achieve.
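To make the suggestion concrete, here is a toy sketch of weighted random node selection in Erlang (the weights would come from your own performance measurements; module, function, and node names are made up):
-module(node_picker).
-export([pick/1]).

%% Nodes is a list of {Node, Weight} pairs, e.g. [{a@host, 2.0}, {b@host, 1.0}].
pick(Nodes) ->
    Total = lists:sum([W || {_N, W} <- Nodes]),
    pick(Nodes, rand:uniform() * Total).

pick([{Node, W} | _], R) when R =< W ->
    Node;
pick([{_Node, W} | Rest], R) ->
    pick(Rest, R - W).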
The purpose of a supervision tree is to manage the processes, not necessarily to forward requests. There is no reason you couldn't use different code to send requests directly to members of the list of available processes. See the pool:get_nodes or pool:get_node() functions for one way to get those lists.
You can let the pool module handle the management of the processes (restarting, monitoring, and killing processes) and use some other module to transparently redirect requests to the pool of processes. Maybe you were looking for distributed pools though? It'll be hard to get away from the master process in Erlang without going to distributed nodes. The whole running system is pretty much one large supervision tree.
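For instance (illustrative only; the run queue length is a very crude load measure and the module name is made up), a gateway could probe the nodes returned by pool:get_nodes/0 and hand the least busy one back to the client:
-module(gateway).
-export([least_loaded_node/0]).

least_loaded_node() ->
    Nodes = pool:get_nodes(),
    %% statistics(run_queue) returns the number of processes ready to run on that node
    Loads = [{rpc:call(N, erlang, statistics, [run_queue]), N} || N <- Nodes],
    {_Load, Node} = lists:min(Loads),
    Node.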
I recently remembered the pg module, which allows you to set up process groups. Messages sent to the group go to every process in the group. It might get you part way toward what you want. You would have to write the code to decide which process handles the request for real, but you would get a pool without a master by using it.
