So, I've been syncing databases from Azure to a local SQL Server. I've already synced the Azure database in 45 minutes, which is not bad at all considering I have 200+ tables and countless rows in my database.
The problem is that syncing to local is taking a lot of time. I've been sitting on this for three hours and the sqlServerProv.Apply() call still hasn't finished.
Any ideas on how long this takes?
Here's my sample code, or a part of it.
SqlSyncScopeProvisioning sqlServerProv = new SqlSyncScopeProvisioning(sqlServerConn, myScope);
// Apply the scope provisioning.
sqlServerProv.Apply();
Also, it's a Windows service, so I can't really debug it other than by writing to a log file when an error is thrown. So far, no errors have been logged.
That code is provisioning, not synchronizing. It's just creating the objects needed by Sync Framework (tables, triggers, stored procedures, etc.).
When provisioning, Sync Fx populates the tracking tables, so if you have a table in your sync scope that has 10M rows, it will insert 10M rows into the tracking tables.
How big is your database?
Related
We experience intermittent, seemingly random brownouts of a Firebase Realtime Database. We are beginning to shard our data into multiple databases, but we are not sure this will solve our problem. It appears to us that Firebase cannot scale to meet our needs for frequent writes to a specific data set.
We sync data from a third-party data source in cycles (every 4-10 minutes, 1000 active jobs). Each update has the potential to change a few thousand nodes in firebase, most of which lie pretty low. However, most of the time the number of low-level nodes changed is much lower. We do differential updates on the sync'd data in order to allow very small writes to the lower-level nodes. This helps prevent our users from downloading a ton of additional data. We also batch all of our updates per cycle into only a handful of writes, between 10-20 (not sure of the performance impact of a batched write to multiple nodes vs. a write to a single node).
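For illustration, a differential update of the kind described above could be computed roughly like this. This is a minimal sketch in plain Python, not our production code: the function name diff_paths and the sample data are made up, and it assumes the synced data is a nested dict with string keys. The result maps slash-separated paths to new values (None for removed nodes), which is the shape a multi-path update expects.

```python
def diff_paths(old, new, prefix=""):
    """Return {path: value} for every node that differs between two nested dicts.

    Removed nodes map to None. Unchanged subtrees produce no entries, so the
    resulting write only touches the low-level nodes that actually changed.
    """
    changes = {}
    for key in old.keys() | new.keys():
        path = f"{prefix}/{key}" if prefix else key
        if key not in new:
            changes[path] = None                      # node was removed
        elif key not in old:
            changes[path] = new[key]                  # node (or subtree) was added
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            changes.update(diff_paths(old[key], new[key], path))  # recurse
        elif old[key] != new[key]:
            changes[path] = new[key]                  # leaf value changed
    return changes


# Hypothetical example data for one sync cycle.
previous = {"jobs": {"a": {"status": "open", "n": 1}, "b": {"status": "open"}}}
current = {"jobs": {"a": {"status": "closed", "n": 1}, "c": {"status": "open"}}}
update = diff_paths(previous, current)
```

Several such dicts can be merged into one before writing, which is roughly what batching a cycle into a handful of writes amounts to.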
Here is an image of the database load graph, which includes some sharding:
[Image: Database Load]
The "blue" line is our "main" database. The "orange" line is a database containing only the data that requires many writes, as described above. Currently, the main (blue) database is supporting normal operations, including reads and writes. The shard (orange) database is only handling writes. The way the two lines mirror each other is pretty indicative of a "write" load issue, given that a large percentage of writes occurs in the morning.
At times, the database load reaches 100% and remains in this state for 30+ minutes.
Please let me know if I can expand on anything or explain anything in more detail. Would appreciate any suggestions on debugging strategies or explanations as to why this may be occurring.
We are actively refactoring a lot of code to mitigate this issue; however, it is not obvious what the main driver is.
I have been attempting to implement iCloud with my Core Data based small business apps. I have been using a GitHub project called Ubiquity Store Manager (USM) as well as more generic Apple example code. It almost seems to work, but there are 2 major issues that I can't seem to address consistently:
Timing - When the context is saved to the Ubiquity container, it is beyond your control to determine when it is uploaded to iCloud. If two transactions are saved less than 3-5 seconds apart, they will often be uploaded to iCloud in the reverse of the order in which they were entered/saved. For example: with trans1 at 8:01:01 and trans2 at 8:01:04, trans2 will often upload and download onto other devices BEFORE trans1. If these are simple records like appointments or contacts, it's probably not a big deal. With parent-child related records it's a very big deal, as the child records arrive before their parents and are effectively "lost" in iCloud. I have tried a timer between transactions; a 5-7 second delay eliminates the problem, but is there a better way to handle this?
Reliability - When testing on 2 devices after a pause of as little as 2 minutes, if 2 successive transactions are saved, the first transaction will frequently not be displayed on the 2nd device. If a "wake up" transaction is created before the real transaction is entered, reliability can be restored. Again, this is a kludgy solution; does anyone have a better way to handle this?
Key-value iCloud transactions are almost instantaneous, error free and bulletproof. How can this be achieved using Core Data, or is Core Data just not appropriate for complex (multiple relationship) business transactions?
Thanks for any help or ideas!
I am running an ASP.NET MVC 3 web application and would like to gather statistics such as:
How often is a specific product viewed
Which search phrases typically return specific products in their result list
How often (for specific products) does a search result convert to a view
I would like to aggregate this data and break it down:
By product
By product by week
etc.
I'm wondering what are the cleanest and most efficient strategies for aggregating the data. I can think of a couple but I'm sure there are many more:
Insert the data into a staging table, then run a job to aggregate the data and push it into permanent tables.
Use a queuing system (MSMQ/Rhino/etc.) and create a service to aggregate this data before it ever gets pushed to the database.
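To make the first strategy concrete, here is a minimal sketch of a staging table plus an aggregation job. It uses Python's built-in sqlite3 module purely as a stand-in for the real database, and the table names (view_staging, view_weekly) and function names are made up for illustration.

```python
import sqlite3
from datetime import date, timedelta


def create_tables(conn):
    """Create a raw staging table and a per-product, per-week aggregate table."""
    conn.execute(
        "CREATE TABLE view_staging (id INTEGER PRIMARY KEY, product_id INTEGER, viewed_on TEXT)"
    )
    conn.execute(
        "CREATE TABLE view_weekly (product_id INTEGER, week TEXT, views INTEGER, "
        "PRIMARY KEY (product_id, week))"
    )


def week_start(day):
    """Return the ISO date of the Monday of the week containing `day` (YYYY-MM-DD)."""
    d = date.fromisoformat(day)
    return (d - timedelta(days=d.weekday())).isoformat()


def aggregate_views(conn):
    """Roll staged view events into weekly counts, then purge the rows that were processed."""
    with conn:  # aggregate and purge in one transaction
        last_id = conn.execute("SELECT MAX(id) FROM view_staging").fetchone()[0]
        if last_id is None:
            return  # nothing staged
        counts = {}
        rows = conn.execute(
            "SELECT product_id, viewed_on FROM view_staging WHERE id <= ?", (last_id,)
        )
        for product_id, viewed_on in rows:
            key = (product_id, week_start(viewed_on))
            counts[key] = counts.get(key, 0) + 1
        for (product_id, week), n in counts.items():
            cur = conn.execute(
                "UPDATE view_weekly SET views = views + ? WHERE product_id = ? AND week = ?",
                (n, product_id, week),
            )
            if cur.rowcount == 0:  # no aggregate row yet for this product and week
                conn.execute(
                    "INSERT INTO view_weekly (product_id, week, views) VALUES (?, ?, ?)",
                    (product_id, week, n),
                )
        # Only delete what was aggregated; rows staged later are left for the next run.
        conn.execute("DELETE FROM view_staging WHERE id <= ?", (last_id,))
```

The page would insert one row into view_staging per view, and a scheduled job would call aggregate_views as often as the data needs to be fresh (hourly, for example).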
My concerns are:
I would like to limit the number of moving parts.
I would like to reduce the impact on the database. The fewer round trips and the less extraneous data stored, the better.
In certain scenarios (not listed) I would like the data to be somewhat close to real-time (accurate to the hour may be appropriate)
Does anyone have real-world experience with this, and if so, which approach would you suggest and what are the positives and negatives? If there is a better solution that I am not thinking of, I'd love to hear it...
Thanks
JP
I needed to do something similar in a recent project. We've implemented a full audit system in a secondary database that tracks changes to every record in the live db. Essentially, every insert, update and delete actually writes 2 records, one in the live db and one in the audit db.
Since we have this data in real time in the audit db, we use this second database to fill any reports we might need. One of the tricks I've found when working with a reporting DB is to forget about normalisation. Just create a table for each report you want, and have it carry just the data you want for that report. It's duplicating data, but the performance gains are worth it.
As to filling the actual data in the reports, we use a mixture. Daily reports are generated by a scheduled task at around 3am, ditto for the weekly and monthly reports, normally over weekends or late at night.
Other reports are generated on demand, using mostly the data since the last daily run, so it's not that many records, once again all from the secondary database.
I agree that you should create a separate database for your statistics; it will reduce the impact on your main database.
You can go with your idea of having "Staging" tables and "Aggregate" tables; that way, if you want to access the near-real-time data you go to the staging table, and when you want historical data you go to the aggregates.
Finally, I would recommend you use an asynchronous call to save your statistics; that way, saving them will not affect your pages' response time.
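As a rough illustration of the asynchronous idea (sketched in Python rather than ASP.NET, with made-up names): the request code only puts an event on an in-memory queue and returns, and a background thread writes the events out in batches. The store argument is a hypothetical callable that persists a list of events, for example by inserting them into the staging table.

```python
import queue
import threading


class StatsRecorder:
    """Accept events without blocking the caller; a background thread stores them."""

    def __init__(self, store, batch_size=100):
        self._queue = queue.Queue()
        self._store = store            # hypothetical callable: persists a list of events
        self._batch_size = batch_size
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, event):
        """Called from the request path; returns immediately."""
        self._queue.put(event)

    def close(self):
        """Flush everything queued so far and stop the worker."""
        self._queue.put(None)          # sentinel tells the worker to stop
        self._worker.join()

    def _run(self):
        while True:
            event = self._queue.get()  # block until something arrives
            if event is None:
                return
            batch = [event]
            stop = False
            # Drain whatever else is already waiting, up to the batch size.
            while len(batch) < self._batch_size:
                try:
                    nxt = self._queue.get_nowait()
                except queue.Empty:
                    break
                if nxt is None:
                    stop = True
                    break
                batch.append(nxt)
            self._store(batch)         # one round trip per batch
            if stop:
                return
```

Note that events still in the queue are lost if the process dies, which is the usual trade-off against a durable queue such as MSMQ.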
I suggest that you create a separate database for this. The best way is to use BI techniques; SQL Server has separate services for BI.
Is it ever a good idea to let a large number of people connect to your website while it is using SQLite?
Edit: I am using it in a critical Ruby on Rails application that may have hundreds of concurrent users.
There are two important properties unique to SQLite that I know of that are relevant:
When doing multiple inserts, you will get better performance if you wrap them all in a single transaction. If the inserts are done individually, each one is its own transaction, and SQLite waits for the data to be safely written to disk before the transaction completes; on a spinning disk that means waiting for the platters to rotate on every insert.
When writing to a SQLite file, the entire file is locked, which can cause writer starvation. This situation improved in SQLite 3.
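A minimal sketch of the first point, using Python's built-in sqlite3 module rather than Rails; the items table and the function names are made up for illustration. Both functions store the same rows; the difference is only in how many commits (and therefore disk syncs on a file-backed database) take place.

```python
import sqlite3


def create_table(conn):
    """Create a small example table."""
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")


def insert_individually(conn, rows):
    """Commit after every row, so each INSERT is its own transaction."""
    for row in rows:
        conn.execute("INSERT INTO items (name) VALUES (?)", row)
        conn.commit()


def insert_in_one_transaction(conn, rows):
    """Wrap all inserts in a single transaction that is committed once at the end."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany("INSERT INTO items (name) VALUES (?)", rows)
```

Each element of rows is a one-element tuple such as ("widget",). Against a database file on disk, the second function is the one to prefer for bulk inserts.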
The SQLite website says that SQLite is suitable for small to medium traffic websites, with low OLTP capability. This accounts for about 95% of all websites.
I’m using SSIS to synchronize data between two databases. I’ve used SSIS and DTS in the past, but I generally write an application for things of this nature (I’m a coder and it just comes easier to me).
In my package I use a SQL Task that returns about 15,000 rows. I’ve hooked that up to a Foreach Container, and within that I assign the resultset column values to variables, and then map those variables to parameters that are fed to another SQL Task.
The problem I’m having is with debugging, and not just more complicated debugging like breakpoints and evaluating values at runtime. I simply mean that if I run this with debugging rather than without, it takes hours to complete.
I ended up rewriting the process in Delphi, and the following is what I came up with:
Full Push of Data:
This pulls 15,000 rows, updates a destination table for each row, then pulls 11,000 rows and updates a destination table for each row.
Debugging:
Delphi App: 139s
SSIS: 4 hours, 46 minutes
Not Debugging:
Delphi App: 132s
SSIS: 384s
Update of Data:
This pulls 3,000 rows, but no updates are needed or made to the destination table. It then pulls 11,000 rows but, again, no updates are needed or made to the destination table.
Debugging:
Delphi App: 42s
SSIS: 1 hour, 10 minutes
Not Debugging:
Delphi App: 34s
SSIS: 205s
The odd thing is, I get the feeling that most of this time spent debugging is just updating UI elements in Visual Studio. If I watch the progress tab, a node is added to a tree for each iteration (thousands total), and this gets slower and slower as the process goes on. Trying to stop debugging usually doesn’t work, as Visual Studio seems caught in a loop updating the UI. If I check the profiler for SQL Server, no actual work is being done. I'm not sure if the machine matters, but it should be more than up to the job (quad core, 4 GB of RAM, 512 MB video card).
Is this sort of behavior normal? As I’ve said I’m a coder by trade, so I have no problem writing an app for this sort of thing (in fact it takes much less time for me to code an application than “draw” it in SSIS, but I figure that margin will shrink with more work done in SSIS), but I’m trying to figure out where something like SSIS and DTS would fit into my toolbox. So far nothing about it has really impressed me. Maybe I’m misusing or abusing SSIS in some way?
Any help would be greatly appreciated, thanks in advance!
SSIS control flow and loops are not very high performance and are not designed for processing this amount of data. This is especially true during debugging: before and after each task execution, the debugger sends notifications to the designer process, which updates the colors of the shapes, and this can be slow.
You could get much better performance using a data flow. A data flow does not operate on single rows; it works with buffers of rows, which is much faster, and the debugger is only notified at the beginning and end of each buffer, so its impact is less noticeable.
SSIS is not designed to do a foreach like that. If you are doing something for each row coming in, you probably want to read the rows into a data flow and then use a lookup or merge join to decide what to do with each one: an INSERT for new rows (these happen in bulk), or a database command object that issues SQL UPDATE commands for existing rows (a better-performing option is to batch these into a staging table and do a single UPDATE).
In another typical sync situation, you read all the data into a staging table, and do a SQL Server UPDATE on the existing rows (INNER JOIN) and INSERT on the new rows (LEFT JOIN, rhs IS NULL). There is also the possibility of using linked servers, but joins over that can be slow, since all (or a lot of) the data may have to come across the network.
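The staging-table pattern can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, the dest and staging tables are made up, and the UPDATE is written with a correlated subquery because that form is portable (on SQL Server you would typically write it as UPDATE ... FROM with an INNER JOIN).

```python
import sqlite3


def create_tables(conn):
    """Create a destination table and a staging table with the same shape."""
    conn.execute("CREATE TABLE dest (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, name TEXT)")


def sync_from_staging(conn):
    """Apply staged rows to dest with two set-based statements instead of a per-row loop."""
    with conn:  # run the whole sync as one transaction
        # Rows that already exist in dest (the INNER JOIN case): update in place.
        conn.execute(
            """
            UPDATE dest
               SET name = (SELECT s.name FROM staging AS s WHERE s.id = dest.id)
             WHERE id IN (SELECT id FROM staging)
            """
        )
        # Rows that are new (LEFT JOIN where the right-hand side IS NULL): insert.
        conn.execute(
            """
            INSERT INTO dest (id, name)
            SELECT s.id, s.name
              FROM staging AS s
              LEFT JOIN dest AS d ON d.id = s.id
             WHERE d.id IS NULL
            """
        )
        conn.execute("DELETE FROM staging")  # staging is empty again for the next run
```

The source rows are bulk-loaded into staging first; after that, the number of statements stays the same no matter how many rows are involved.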
I have SSIS packages that regularly import 24 million rows, including handling data conversion and validation and slowly changing dimensions using the TableDifference component, and they perform relatively quickly for that amount of data compared with a separate client program.
I have noticed this behavior too. I had an SSIS package for moves that handled somewhere in the neighborhood of 3 million entries; it was not possible to debug, as it would have run for about 3-4 days.
SSIS is still the way I did it; I just don't "debug" in SSIS when working with the full datasets, I simply run the packages. If I must debug, I use very small datasets.