SSRS2005 timeout error - stored-procedures

I've been running in circles for the last 2 days, trying to figure out a problem in our customer's live environment. I figured I might as well post it here, since Google gave me very limited information on the error message (5 results, to be exact).
The error boils down to a timeout when requesting a certain report in SSRS2005, when a certain parameter is used.
The deployment scenario is:
Machine #1 Running reporting services (SQL2005, W2K3, IIS6)
Machine #2 Running data warehouse database (SQL2005, W2K3), which is the data source for #1
Both machines are running on the same VM cluster and LAN.
The report calls a fairly simple SP - let's call it sp(param $a, param $b).
When requested with param $a filled, it executes correctly. When using param $b, it times out after the global timeout period has passed.
If I run the stored procedure with param $b directly from sql management studio on #2, it returns the results perfectly fine (within 3-4s).
I've profiled the data warehouse database on #2, and when param $b is used, the query from Reporting Services never reaches #2.
The error message that I get upon timeout when using param $b and invoking the report directly from the SSRS web interface is:
"An error has occurred during report processing.
Cannot read the next data row for the data set DataSet.
A severe error occurred on the current command. The results, if any, should be discarded. Operation cancelled by user."
The ExecutionLog for SSRS doesn't give me much information besides the error message rsProcessingAborted.
I'm running out of ideas of how to nail this problem. So I would greatly appreciate any comments, suggestions or ideas.
Thanks in advance!

The first thing you need to do is to ensure your statistics are up to date.
(It sounds like a case of an incorrect query plan being used due to parameter sniffing, as described in this SO answer: Parameter Sniffing (or Spoofing) in SQL Server).
One way to fix this in SQL Server 2005 is the OPTIMIZE FOR query hint. See also OPTIMIZE FOR query hint in SQL Server 2005.
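As an illustration only, here is a minimal sketch of the hint applied inside a stored procedure (procedure, table and column names are made up, and 42 stands for a representative value of the slow parameter):
ALTER PROCEDURE dbo.usp_GetReportData
    @a INT = NULL,
    @b INT = NULL
AS
BEGIN
    SELECT OrderId, OrderDate, Amount
    FROM dbo.FactOrders
    WHERE (@a IS NULL OR RegionId = @a)
      AND (@b IS NULL OR ProductGroupId = @b)
    -- Compile the plan for a representative value of @b instead of the sniffed one
    OPTION (OPTIMIZE FOR (@b = 42));
END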
Also, do you have a regular scheduled index rebuild job for some or all of your indexes?
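If statistics or index fragmentation turn out to be the culprit, a basic maintenance snippet (the table name is a placeholder) would look like:
UPDATE STATISTICS dbo.FactOrders WITH FULLSCAN;
ALTER INDEX ALL ON dbo.FactOrders REBUILD;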

Related

PubsubIO does not output custom timestamp attribute as context.timestamp when running with DataflowRunner and Dataflow service

I am working on an Apache Beam project that ran into an issue with the Dataflow service and PubsubIO related to the custom timestamp attribute. The current version of the Beam SDK is 2.7.0.
In the project, we have 2 Dataflow jobs communicating via a PubSub topic and subscription:
The first pipeline (sinking data to PubSub)
This pipeline works on a per-message basis, so it has no custom window strategy applied besides the default GlobalWindows. At the end of this pipeline, we sank (wrote) all the messages, which had already been assigned a map of attributes including their event timestamp (e.g. "published_at"), to a PubSub topic using PubsubIO.writeMessages().
Note: if we use PubsubIO.writeMessages().withTimestampAttribute(), this method will tell PubsubIO.ShardFn, PubsubIO.WriteFn and PubsubClient to write/overwrite the sinking pipeline's processing time to this attribute in the map.
The second pipeline (reading data from PubSub)
In the second pipeline (reading pipeline), we have tried PubsubIO.readMessagesWithAttributes().withTimestampAttribute("published_at") and PubsubIO.readStrings().withTimestampAttribute("published_at") for the source.
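For reference, a condensed sketch of the two configurations described above (topic and subscription names are placeholders, and `messages` / `pipeline` are assumed to be defined elsewhere):
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.values.PCollection;

// First pipeline: write messages whose attribute map already contains "published_at".
// (Per the note above, adding .withTimestampAttribute("published_at") here would make
// PubsubIO overwrite that attribute on write.)
messages.apply("WriteToPubsub",
    PubsubIO.writeMessages().to("projects/my-project/topics/my-topic"));

// Second pipeline: read the messages and use "published_at" as the element timestamp.
PCollection<PubsubMessage> input = pipeline.apply("ReadFromPubsub",
    PubsubIO.readMessagesWithAttributes()
        .fromSubscription("projects/my-project/subscriptions/my-subscription")
        .withTimestampAttribute("published_at"));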
When running with DirectRunner, everything worked as expected. The messages were read from the PubSub subscription and output to the downstream stages with a ProcessContext.timestamp() equal to their event timestamp "published_at".
But when running with DataflowRunner, the ProcessContext.timestamp() was always set to near real time, which is close to the sinking pipeline's processing time. We checked and can confirm that those timestamps were not PubSub's publishing times. All the data were then assigned to the wrong windows relative to their event-domain timestamp. We expected late data to be dropped, not assigned to invalid windows.
Note: We had left the Pubsub topic populated with a considerable amount of data before we turned on the second pipeline to have some kind of historical/late data.
(Screenshot: Pubsub messages with invalid context timestamp)
Assumed root cause
Looking deeper into the source code of DataflowRunner, we can see that the Dataflow service uses a completely different Pubsub implementation (overriding PubsubIO.Read at the pipeline's construction time) to read from and sink to Pubsub.
So if we want to use the Beam SDK's PubsubIO, we have to use the experimental option "enable_custom_pubsub_source". But so far no luck, as we have run into this issue https://jira.apache.org/jira/browse/BEAM-5674 and have not been able to test the Beam SDK's Pubsub code.
Workaround solution
Our current workaround is that, after the step that assigns windows to the messages, we implemented a DoFn to check their event timestamp against their IntervalWindow. If the window is invalid, we simply drop the message and later run a weekly or twice-weekly job to correct the data from a historical source. It is better to have some missing data than improperly calculated data.
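A minimal sketch of what such a filtering DoFn could look like (the class name is our own, and it assumes the upstream windowing produces IntervalWindows and that "published_at" parses as an ISO-8601 instant):
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.joda.time.Instant;

public class DropMisassignedMessagesFn extends DoFn<PubsubMessage, PubsubMessage> {
  @ProcessElement
  public void processElement(ProcessContext c, IntervalWindow window) {
    // The event timestamp the sinking pipeline stored in the attribute map.
    Instant eventTime = Instant.parse(c.element().getAttribute("published_at"));
    // Output the element only if its event timestamp actually lies inside the
    // IntervalWindow it was assigned to; otherwise drop it.
    if (!eventTime.isBefore(window.start()) && eventTime.isBefore(window.end())) {
      c.output(c.element());
    }
  }
}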
(Screenshot: messages dropped due to invalid windows)
Please share with us your experiences on this case. We know that, from the perspective of Dataflow's watermark management, the watermark is said to adjust itself to the current real time if the ingested data is sparse (not dense enough over time).
We also believe that we are misunderstanding something about the way the Dataflow service maintains the PubsubUnboundedSource's output timestamp, as we are still new to Apache Beam and Google's Dataflow, so there are things we have not come to know of yet.
Many Thanks!
I found the fix for this issue. In my sinking pipeline, the timestamp attribute was set with a date format that does not conform to the RFC 3339 standard: the formatted dates were missing the 'Z' character. We either added the 'Z' character or switched to milliseconds since the epoch. Both worked well.
One thing to note, though: when the Dataflow service could not parse the wrongly formatted dates, it did not warn or throw an error but instead took the processing time for all the elements, so they were assigned to the wrong event-time windows.
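For illustration, a small sketch (the attribute name is from our pipelines; the helper class is made up) of the two formats that worked:
import java.util.HashMap;
import java.util.Map;
import org.joda.time.Instant;

public class TimestampAttributeExample {
  public static Map<String, String> buildAttributes(Instant eventTime) {
    Map<String, String> attributes = new HashMap<>();
    // RFC 3339 with the trailing 'Z', e.g. 2018-10-12T08:30:00.000Z
    attributes.put("published_at", eventTime.toString());
    // Alternatively, milliseconds since the Unix epoch also parses correctly:
    // attributes.put("published_at", String.valueOf(eventTime.getMillis()));
    return attributes;
  }
}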

Neo4j query monitoring / profiling for long running queries

I have some really long running queries. Just as background information: I am crawling my graph for all instances of a specific meta path - for example, counting all instances of a specific meta path found in the graph:
MATCH (a:Content)-[:isTaggedWith]->(t:Term)<-[:isTaggedWith]-(b:Content) RETURN count(*)
First of all, I want to measure the runtimes. Is there any way to do so, especially in the Community Edition?
Furthermore, I have the problem that I do not know whether a query is still running in Neo4j or whether it has already been terminated. I issue the query from a REST client, but I am open to other options if necessary. For example, I queried Neo4j with a REST client and set the read timeout (client side) to 2 days. The problem is that I can't verify whether the query is still running or whether the client is simply waiting for the Neo4j answer, which will never arrive because the query might already have been killed in the backend. Is there really no way to check from the browser or another client which queries are currently running - ideally with an option to terminate them as well?
Thanks in advance!
Measuring Query Performance
To answer your first question, there are two main options for measuring the performance of a query. The first is to use PROFILE; put it in front of a query (like PROFILE MATCH (a:Content)-[:isTaggedWith]->(t:Term)...), and it will execute the query and display the execution plan used, including the native API calls, the number of results from each operation, the number of total database hits, and the total execution time.
The downside is that PROFILE will execute the query, so if it is an operation that writes to the database, the changes are persisted. To profile a query without actually executing it, EXPLAIN can be used instead of PROFILE. This will show the query plan and native operations that will be used to execute the query, as well as the estimated total database hits, but it will not actually run the query, so it is only an estimate.
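Applied to the query from the question, that would look like this (a sketch; adjust labels and relationship types to your model):
PROFILE MATCH (a:Content)-[:isTaggedWith]->(t:Term)<-[:isTaggedWith]-(b:Content) RETURN count(*);
EXPLAIN MATCH (a:Content)-[:isTaggedWith]->(t:Term)<-[:isTaggedWith]-(b:Content) RETURN count(*);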
Checking Long Running Queries (Enterprise only)
Checking for running queries can be accomplished using Cypher in Enterprise Edition: CALL dbms.listQueries;. You must be logged in as an admin user to perform the query. If you want to stop a long-running query, use CALL dbms.killQuery() and pass in the ID of the query you wish to terminate.
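For example (the query ID below is an illustrative value taken from the listQueries output):
CALL dbms.listQueries();
CALL dbms.killQuery('query-123');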
Note that aside from manually killing a query, or it timing out based on the configured query timeout, queries should in general not be getting killed on the backend unless you have something else set up to kill long-runners; with the above method, however, you can double-check your assumption that the queries are indeed still executing after being sent.
These are available only in Enterprise Edition; there is no way that I am aware of to use these functions or replicate their behavior in Community.
For measuring long running queries I figured out the following approach:
Use a tmux (tmux crash course) terminal session, which is really easy. This way, you can execute your query, close the terminal, and later get back to the session.
New session: tmux new -s *sessionName*
Detach from current session (within session): tmux detach
List sessions: tmux ls
Re-attach to session: tmux a -t *sessionName*
Within the tmux session, execute the query via the Cypher shell, either directly in the shell or by piping the command into the shell. The latter approach is preferable because you can use the Unix command time to measure the actual runtime, as follows:
time cat query.cypher | cypher-shell -u neo4j -p n > result.txt
The file query.cypher simply contains the regular query, including the terminating semicolon at the end. The result of the query is piped into result.txt, and the runtime of the execution is displayed in the terminal.
Moreover, it is possible to list the running queries only in the Enterprise Edition, as correctly stated by @rebecca.

MySQL Error 2013: Lost connection to MySQL server during query

I've read all posts with the same or a very similar headline, but still can't find a proper solution or explanation for my problem.
I'm working with MySQL Workbench 6.3 CE. I have been able to create a database with several tables, and to create a connection from Python to write data to it. Still, I had a problem with a VARCHAR field that needed to hold more than 45 characters. When I try to increase the limit, e.g. to VARCHAR(70), I get the 2013 error saying my connection was closed during the query, no matter how many times I try or how high I set the timeout limits.
I'm using the above version of Workbench on Windows 10, and I'm trying to modify that field from Workbench. After that first attempt, I can't drop the table either, nor can I connect from Python.
What is happening?
OK, apparently what was happening is that I had a lock, and there were a lot of queries waiting in the "Waiting for table metadata lock" state.
I did the following in the Workbench console:
SELECT CONCAT('KILL ', id, ';') FROM information_schema.processlist WHERE user = 'root';
That generates a list of KILL statements for all those processes. I copied that list into a new tab and executed it to kill all of them. After that, it worked again.
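As a quick check before killing anything (a generic diagnostic, not specific to this setup), you can see which sessions are stuck on the metadata lock:
SHOW FULL PROCESSLIST;
-- or, filtering by state:
SELECT id, user, state, info
FROM information_schema.processlist
WHERE state = 'Waiting for table metadata lock';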
Can anybody explain to me how I got into that situation, and what precautions to take in my Python scripts to avoid it?
Thank you

Mysterious time out on .NET MVC application with SQL Server

I have a very peculiar problem and I'm looking for suggestions that might help me get to the bottom of it.
I have an application in .NET 3.5 (MVC3) on a SQL Server 2008 R2 database.
Locally and on two other servers, it runs fine. But on the live server there is a stored procedure that always times out after 30 seconds.
If I run the stored procedure directly on the database, it takes a couple of seconds. But if the stored procedure is called from the application, Profiler says it took over 30 seconds.
The exact same query that Profiler captures runs immediately if we run it directly on the DB.
Furthermore, the same problem doesn't occur on any of the other 3 local servers.
As you can understand, it's driving me nuts and I don't even have a clue how to diagnose this.
The event logs just show the timeout as a warning.
Has anyone had anything like this before and where could I start looking for a fix?
Many thanks
You probably have some locking taking place in your application that doesn't occur when running the query on the server.
To test this, run your query from the application using READ UNCOMMITTED or the NOLOCK hint. If it works, you need to check your sequence of calls, or check whether your isolation level is too aggressive.
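For a quick diagnostic only (procedure and table names are placeholders; don't leave this in production code), that could look like:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
EXEC dbo.usp_MySlowProcedure;

-- or, per table inside the procedure:
SELECT OrderId, Amount
FROM dbo.Orders WITH (NOLOCK)
WHERE OrderDate >= '2012-01-01';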
These can be tricky to nail down.

Talend - Lock wait timeout exceeded

I use the ETL tool Talend Open Studio (TOS). I want to transfer database A into database B, using a tMap component. When I use a tLogRow to look at the results, it's OK: TOS shows the data correctly. But when I perform the actual transfer, TOS reports "Lock wait timeout exceeded; try restarting transaction".
I don't understand this problem... Reading the data is fine, but writing it fails.
Can you help me, please?
Try running your job using a single connection to MySQL (I assume you are using MySQL, as the error is a MySQL error).
The error above can occur when you attempt to insert/update/delete from two or more connections concurrently.
To create a single connection and have all components share it, you will need a pair of components: "tMysqlConnection" and "tMysqlCommit".
The Connection component should be placed before you attempt to query the database. Once you have it in the job, you can link the tMysqlInput components to it by selecting "Use existing connection".
The Commit component will issue the commit command and close the transaction.
You will need Connection components for each separate DB server you are working with.
Database A contains 300 articles. I think this problem is caused by Talend Open Studio: TOS can't process more than 100 articles at once. I tried splitting database A into three smaller databases and then ran TOS again. The error was gone. It's strange... but it works.
