We recently moved to SSDT as our database management and deployment tool, and we use SqlPackage.exe to deploy the package. We occasionally get timeout errors during deployment. After looking at the errors, I added /p:CommandTimeout=900 to the SqlPackage command-line parameters, but it still fails on some occasions, and when it fails, it fails within a few seconds, so I'm guessing it is not hitting that CommandTimeout.
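For reference, the deployment command looks roughly like this (the dacpac name, server, and database below are placeholders, not our actual values):

SqlPackage.exe /Action:Publish /SourceFile:"PhoenixDB.dacpac" /TargetServerName:"MyServer" /TargetDatabaseName:"PhoenixDB" /p:CommandTimeout=900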
I couldn't find documentation on any other timeout.
Here is the detailed error message -
Error SQL72014: .Net SqlClient Data Provider: Msg -2, Level 11, State 0, Line 0 Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Error SQL72045: Script execution error. The executed script:
CREATE DATABASE [$(DatabaseName)]
ON
PRIMARY(NAME = [PhoenixDB], FILENAME = '$(DefaultDataPath)PhoenixDB_Data.mdf', SIZE = 8000 MB, FILEGROWTH = 10 %)
LOG ON (NAME = [PhoenixDB_log], FILENAME = '$(DefaultLogPath)PhoenixDB_Log.ldf', SIZE = 2000 MB, FILEGROWTH = 10 %) COLLATE SQL_Latin1_General_CP1_CI_AS;
Error SQL72014: .Net SqlClient Data Provider: Msg 1802, Level 16, State 4, Line 1 CREATE DATABASE failed. Some file names listed could not be created. Check related errors.
Error SQL72045: Script execution error. The executed script:
CREATE DATABASE [$(DatabaseName)]
ON
PRIMARY(NAME = [PhoenixDB], FILENAME = '$(DefaultDataPath)PhoenixDB_Data.mdf', SIZE = 8000 MB, FILEGROWTH = 10 %)
LOG ON (NAME = [PhoenixDB_log], FILENAME = '$(DefaultLogPath)PhoenixDB_Log.ldf', SIZE = 2000 MB, FILEGROWTH = 10 %) COLLATE SQL_Latin1_General_CP1_CI_AS;
I am trying to aggregate various columns on a 450 million row data set. When I use Dask's built-in aggregations like 'min', 'max', 'std', and 'mean', they keep crashing a worker in the process.
The file that I am using can be found here: https://www.kaggle.com/c/PLAsTiCC-2018/data (look for test_set.csv).
I have a Google Kubernetes cluster that consists of three 8-core machines with a total of 22 GB of RAM.
Since these are just the built-in aggregation functions, I haven't tried much else.
It's not using that much RAM either; it stays steady at around 6 GB total, and I haven't seen any errors that would indicate an out-of-memory condition.
Below is my basic code and the error log on the evicted worker:
from dask.distributed import Client, progress
import dask.dataframe as dd
from timeit import default_timer as timer

client = Client('google kubernetes cluster address')

# Read the 450-million-row CSV from Google Cloud Storage in ~10 MB partitions
test_df = dd.read_csv('gs://filepath/test_set.csv', blocksize=10000000)

def process_flux(df):
    # Derive two extra columns from flux and flux_err
    flux_ratio_sq = df.flux / df.flux_err
    flux_by_flux_ratio_sq = df.flux * flux_ratio_sq
    df_flux = dd.concat([df, flux_ratio_sq, flux_by_flux_ratio_sq], axis=1)
    df_flux.columns = ['object_id', 'mjd', 'passband', 'flux', 'flux_err', 'detected',
                       'flux_ratio_sq', 'flux_by_flux_ratio_sq']
    return df_flux

aggs = {
    'flux': ['min', 'max', 'mean', 'std'],
    'detected': ['mean'],
    'flux_ratio_sq': ['sum'],
    'flux_by_flux_ratio_sq': ['sum'],
    'mjd': ['max', 'min'],
}

def featurize(df):
    # Add the derived columns, then aggregate per object_id
    start_df = process_flux(df)
    agg_df = start_df.groupby(['object_id']).agg(aggs)
    return agg_df

overall_start = timer()
final_df = featurize(test_df).compute()
overall_end = timer()
Error logs:
distributed.core - INFO - Event loop was unresponsive in Worker for 74.42s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - INFO - Event loop was unresponsive in Worker for 3.30s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - INFO - Event loop was unresponsive in Worker for 3.75s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
A number of these occur, then:
distributed.core - INFO - Event loop was unresponsive in Worker for 65.16s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.worker - ERROR - Worker stream died during communication: tcp://hidden address
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/distributed/comm/tcp.py", line 180, in read
n_frames = yield stream.read_bytes(8)
File "/opt/conda/lib/python3.6/site-packages/tornado/iostream.py", line 441, in read_bytes
self._try_inline_read()
File "/opt/conda/lib/python3.6/site-packages/tornado/iostream.py", line 911, in _try_inline_read
self._check_closed()
File "/opt/conda/lib/python3.6/site-packages/tornado/iostream.py", line 1112, in _check_closed
raise StreamClosedError(real_error=self.error)
tornado.iostream.StreamClosedError: Stream is closed
response = yield comm.read(deserializers=deserializers)
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/opt/conda/lib/python3.6/site-packages/distributed/comm/tcp.py", line 201, in read
convert_stream_closed_error(self, e)
File "/opt/conda/lib/python3.6/site-packages/distributed/comm/tcp.py", line 127, in convert_stream_closed_error
raise CommClosedError("in %s: %s: %s" % (obj, exc.__class__.__name__, exc))
distributed.comm.core.CommClosedError: in <closed TCP>: TimeoutError: [Errno 110] Connection timed out
It runs fairly quickly and I'm just looking to get consistent performance without my workers crashing.
Thanks!
I have a Watir script that occasionally and unpredictably returns this error:
Net::ReadTimeout
I searched for this error and found this question already asked. I followed the top answer and implemented this:
require 'watir'

attempts = 0
url = "https://www.google.com/"
begin
  doc = Watir::Browser.start url
rescue Net::ReadTimeout
  retry
end
but I'm still getting the same timeout error.
I've never had any connection issues with my network. I get the error on both an Ubuntu and a Windows 10 machine. My code goes through an average of around 30 iterations before this error manifests itself. I'm using Chrome.
Any suggestions?
The above error is thrown when the page load time exceeds 60 seconds, so set a longer read timeout for page loads with the following code:
require 'watir'

client = Selenium::WebDriver::Remote::Http::Default.new
client.read_timeout = 120 # seconds
driver = Selenium::WebDriver.for :firefox, http_client: client
browser = Watir::Browser.new driver
browser.goto "https://www.google.com"
Now your code will wait up to 120 seconds for any page load triggered by #click, and it will likewise wait for pages opened with the goto method.
We're using Team Foundation Server 2015 Update 2 on-premises. The Visual Studio Test task takes about 30 seconds to publish the test results after the tests have run.
Small unit test project:
2016-05-02T01:02:56.9641774Z Attachments:
2016-05-02T01:02:56.9641774Z C:\Agent1\_work\9\TestResults\eb650e78-ddfa-4116-af15-9847b5cc2632\TFSBUILD_BuildAgent 2016-05-02 03_02_23.coverage
2016-05-02T01:02:56.9641774Z Total tests: 316. Passed: 316. Failed: 0. Skipped: 0.
2016-05-02T01:02:56.9641774Z Test Run Successful.
2016-05-02T01:02:56.9641774Z Test execution time: 35,1251 Seconds
2016-05-02T01:02:57.1048030Z Results File: C:\Agent1\_work\9\TestResults\TFSBUILD_BuildAgent 2016-05-02 03_02_31.trx
2016-05-02T01:03:26.6662691Z Publishing Test Results...
2016-05-02T01:03:31.2109274Z Test results remaining: 316
2016-05-02T01:03:37.6228586Z Published Test Run : http://<tfs server>:8080/tfs/DefaultCollection/Project/_TestManagement/Runs#runId=52024&_a=runCharts
As you can see, after all tests finish and the results file is written, there is a 30-second pause before "Publishing Test Results..." even appears. Then it takes another 11 seconds to upload a few kB over the local network.
In the _diag folder I find the following entries in the corresponding log file (of a newer build, but everything else is identical):
06:48:13.171983 BaseLogger.LogConsoleMessage(scope.JobId = 5f7ff256-ef21-4150-86fc-678cdef40792, message = Results File: C:\Agent1\_work\9\TestResults\TFSBUILD_BuildAgent 2016-05-12 08_47_49.trx)
06:48:45.798627 FindFiles.FindMatchingFiles(rootFolder = C:\Agent1\_work\9\TestResults, matchPattern = *.trx, includeFiles = True, includeFolders = False
I assume this is not working as intended, but how do I best debug such a problem?
To quote the TFS documentation:
"When you use these predefined reports or create your own reports, there is a time delay between the time that you save the test results and the time that the data is available in the warehouse database or the analysis services database in Team Foundation Server."
I think this might explain the problem you seem to have.
In our unit testing suite, we create and destroy a large number of SQLite databases that use the path of ":memory:". Occasionally, and only when running on the iOS simulator, the creation of those databases fails with the rather enigmatic message:
Database ":memory:": unable to open database file
99% of the time, these requests succeed. (Subsequent tests within the same test run typically succeed after this failure occurs.) But when you're using this in an automated build-acceptance test, you want 100%.
We've instrumented for memory consumption (it's within normal limits) and disk-space availability (20GB+ available).
Any ideas?
UPDATE: Captured this happening with extra logging per Richard's suggestion below. Here's the log output:
SQLITE ERROR: (28) attempt to open "/Users/xxx/Library/Developer/CoreSimulator/Devices/CF762060-7D23-4C79-A466-7F20AB6233E7/data/Containers/Data/Application/582E1ED0-81E0-4CC7-A6F6-DBEBC101BBE8/tmp/etilqs_1ghbf1MSTa8ilSj" as
SQLITE ERROR: (14) cannot open file at line 30595 of [f66f7a17b7]
SQLITE ERROR: (14) os_unix.c:30595: (17) open(/Users/xxx/Library/Developer/CoreSimulator/Devices/CF762060-7D23-4C79-A466-7F20AB6233E7/data/Containers/Data/Application/582E1ED0-81E0-4CC7-A6F6-DBEBC101BBE8/tmp/etilqs_1ghbf1MST
I've noticed that even a :memory: database will create files on disk if you create a temporary table. The temporary file names on Unix systems are generated by a PRNG, so there is a non-zero chance of a name collision if lots and lots of temporary files are created simultaneously. The create could also fail if the disk is full, or if the Unix temp directory is not accessible, either because it has been deleted or because its permissions are invalid.
For example, I turned on several loggers in the sqlite3 command-line shell by adding these compile-time arguments to llvm-gcc: -DSQLITE_DEBUG_OS_TRACE=1 -DSQLITE_TEST=1 -DSQLITE_DEBUG=1. I then observed a temp file being created from the command line using this SQL:
$ ./sqlite3
SQLite version 3.8.8.2 2015-01-30 14:30:45
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create temporary table t( x );
OPENX 3 /var/folders/nf/l1cw8sn1707b73zy5nqycrpw0000gn/T//etilqs_fvwR6KbMm518S4w 01002
OPEN 3
WRITE 3 512 0 0
OPENX 4 /var/folders/nf/l1cw8sn1707b73zy5nqycrpw0000gn/T//etilqs_OJJJ1lrTtQIFnUO 05402
OPEN 4
WRITE 4 1024 0 0
WRITE 4 1024 1024 0
WRITE 3 28 0 0
sqlite>
No ideas. But perhaps if you turn on the error and warning log, it will give some clues.
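For the app/simulator case, a minimal sketch of turning that log on through the C API (the callback name and output format are just illustrative; sqlite3_config must be called before any other SQLite calls):

#include <sqlite3.h>
#include <stdio.h>

/* Receives every error and warning SQLite logs internally. */
static void sqliteLogCallback(void *pArg, int iErrCode, const char *zMsg) {
    (void)pArg;  /* unused context pointer */
    fprintf(stderr, "SQLITE ERROR: (%d) %s\n", iErrCode, zMsg);
}

int main(void) {
    /* Register the log hook before SQLite is initialized or used,
       e.g. early in the test-suite bootstrap. */
    sqlite3_config(SQLITE_CONFIG_LOG, sqliteLogCallback, NULL);
    /* ... open the :memory: databases and run the tests as usual ... */
    return 0;
}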
I am using Flume to write to Google Cloud Storage. Flume listens on HTTP port 9000. It took me some time to make it work (adding the GCS libraries, using a credentials file, ...), but now it seems to communicate over the network.
I am sending very small HTTP requests for my tests, and I have plenty of RAM available:
curl -X POST -d '[{ "headers" : { timestamp=1417444588182, env=dev, tenant=myTenant, type=myType }, "body" : "some body ONE" }]' localhost:9000
I encounter this memory exception on the first request (then, of course, it stops working):
2014-11-28 16:59:47,748 (hdfs-hdfs_sink-call-runner-0) [INFO - com.google.cloud.hadoop.util.LogUtil.info(LogUtil.java:142)] GHFS version: 1.3.0-hadoop2
2014-11-28 16:59:50,014 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:467)] process failed
java.lang.OutOfMemoryError: Java heap space
at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:76)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:79)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.create(GoogleHadoopFileSystemBase.java:820)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
(see complete stack trace as a gist for full details)
The strange part is that the folders and files are created the way I want, but the files are empty.
gs://my_bucket/dev/myTenant/myType/2014-12-01/14-36-28.1417445234193.json.tmp
Is something wrong with the way I configured Flume + GCS, or is it a bug in GCS.jar?
Where should I look to gather more data?
PS: I am running flume-ng inside Docker.
My flume.conf file:
# Name the components on this agent
a1.sources = http
a1.sinks = hdfs_sink
a1.channels = mem
# Describe/configure the source
a1.sources.http.type = org.apache.flume.source.http.HTTPSource
a1.sources.http.port = 9000
# Describe the sink
a1.sinks.hdfs_sink.type = hdfs
a1.sinks.hdfs_sink.hdfs.path = gs://my_bucket/%{env}/%{tenant}/%{type}/%Y-%m-%d
a1.sinks.hdfs_sink.hdfs.filePrefix = %H-%M-%S
a1.sinks.hdfs_sink.hdfs.fileSuffix = .json
a1.sinks.hdfs_sink.hdfs.round = true
a1.sinks.hdfs_sink.hdfs.roundValue = 10
a1.sinks.hdfs_sink.hdfs.roundUnit = minute
# Use a channel which buffers events in memory
a1.channels.mem.type = memory
a1.channels.mem.capacity = 10000
a1.channels.mem.transactionCapacity = 1000
# Bind the source and sink to the channel
a1.sources.http.channels = mem
a1.sinks.hdfs_sink.channel = mem
A related question from my Flume/GCS journey: What is the minimal setup needed to write to HDFS/GS on Google Cloud Storage with Flume?
When uploading files, the GCS Hadoop FileSystem implementation sets aside a fairly large (64MB) write buffer per FSDataOutputStream (file open for write). This can be changed by setting "fs.gs.io.buffersize.write" to a smaller value, in bytes, in core-site.xml. I imagine 1MB would suffice for low-volume log collection.
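Something along these lines in core-site.xml, inside the <configuration> element (1048576 bytes, i.e. 1 MB, is just an illustrative value):

<!-- Shrink the GCS connector's per-stream write buffer -->
<property>
  <name>fs.gs.io.buffersize.write</name>
  <value>1048576</value>
</property>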
In addition, check what the maximum heap size is set to when launching the JVM for flume. The flume-ng script sets a default JAVA_OPTS value of -Xmx20m to limit the heap to 20MB. This can be set to a larger value in flume-env.sh (see conf/flume-env.sh.template in the flume tarball distribution for details).
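For example, something like this in conf/flume-env.sh (512 MB is just an illustrative figure; size it to your actual load):

# Raise the Flume agent's JVM heap above the 20 MB default
export JAVA_OPTS="-Xmx512m"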