Batch Move Data from TADOTable to MySQL TMyTable - delphi

I am trying to import a table from an old database (MS Access) into a MySQL server using TCRBatchMove, in Delphi 2007.
The program fetches data from the legacy database over an ODBC connection and stores it on the local hard drive using TADOTable.SaveToFile(). The second part of the program reads this file into another TADOTable and uses TCRBatchMove to transfer it to a MySQL server (via DevArt's TMyTable). In this process the batch move appears to be extremely slow for some reason.
The amount of data in the following trial is about 100,000 records, each with about 120 fields. Most of the fields are integers or VARCHAR (each VARCHAR under 32 characters).
The performance figures I obtained are:
Time taken to bring data to local file over ODBC connection: 17 seconds
Time taken to load data from local file into TADOTable: 3 seconds
Time taken by TCRBatchMove to move data from TADOTable to TMyTable: > 30 minutes
The MySQL server is running locally on the development machine (an i7 at 2.8 GHz) and the database is otherwise very snappy.
Why is the batch move so slow at pushing data to the MySQL server? Is there a way to speed up this task, or is there a better way to accomplish it?

Not really an answer, but I'm running out of space in the comments.
MySQL has a statement called LOAD DATA INFILE.
see: http://dev.mysql.com/doc/refman/5.1/en/load-data.html
You can use that to measure the fastest possible insert time. This will give you a baseline for the insert time into MySQL and allow you to pinpoint whether the delay is in MySQL or in Delphi. If you have the source for TMyTable, you can use a profiler as well.
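For example, a minimal sketch of such a baseline run might look like this (the file path, table name, and CSV layout are placeholders, not something from the original question):
-- time this as a rough baseline for raw insert speed into MySQL
LOAD DATA INFILE '/tmp/legacy_export.csv'
INTO TABLE imported_records
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;  -- skip a header row, if the export has one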
Another option is to download ZEOS data access components at:
http://sourceforge.net/projects/zeoslib/
If there is some snafu in the component you're using, a change of toolset might fix the problem. (Devart's components are usually excellent, though.)
On the MySQL side you can disable index updates before the bulk insert and re-enable them afterwards (see the sketch after the snippet below). If you have a lot of inserts, that usually works out faster.
See: https://stackoverflow.com/a/9524988/650492
SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;
-- your inserts here
SET autocommit=1;
SET unique_checks=1;
SET foreign_key_checks=1;
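If the target happens to be a MyISAM table, one more option along the same lines is to defer non-unique index maintenance around the bulk insert; a minimal sketch (the table name is a placeholder):
ALTER TABLE imported_records DISABLE KEYS;  -- stop updating non-unique indexes per row
-- ... run the bulk inserts here ...
ALTER TABLE imported_records ENABLE KEYS;   -- rebuild the indexes in one pass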

Related

MySQL Error 2013: Lost connection to MySQL server during query

I've read all the posts with the same or a very similar headline, but I still can't find a proper solution or explanation for my problem.
I'm working with MySQL Workbench 6.3 CE. I have been able to create a database with several tables, and to create a connection with Python to write data to it. Still, I had a problem with a VARCHAR field that needed to hold more than 45 characters. When I try to set it to a bigger limit, like VARCHAR(70), I get the 2013 error saying my connection was closed during the query, no matter how many times I try or how high I set the timeout limits.
I'm using the above version of Workbench on Windows 10, and I'm trying to modify that field from Workbench. After that first attempt, I can't drop a table either, nor can I connect from Python.
What is happening?
OK, apparently what was happening is that I had a lock, and there were a lot of queries stuck in the "Waiting for table metadata lock" state.
I ran the following in the Workbench console:
SELECT CONCAT('KILL ', id, ';') FROM information_schema.processlist WHERE user = 'root';
That generates a list of KILL statements for all those processes. I copied the list into a new tab and executed the mass kill. After that it worked again.
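For reference, before killing anything you can check which sessions are actually stuck in that state; a small sketch against the same information_schema view:
SELECT id, user, time, state, info
FROM information_schema.processlist
WHERE state LIKE 'Waiting for table metadata lock%';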
Can anybody explain how I got into that situation, and what precautions I should take in my Python scripts to avoid it?
Thank you

SQL Server Express: How to determine SP memory usage

I am developing with Microsoft SQL Server 2008 R2 Express.
I have a stored procedure that uses temp tables and outputs some processed data usually within 1 second.
Over a few months, my DB has gathered a lot of data, almost reaching the 10 GB limit. At that point, this particular stored procedure started taking as much as 5 minutes for the same input parameters. After I emptied some of the huge tables in the DB, it went back to normal.
After this incident, I am worried that my stored procedure needs more space in the DB than necessary. How can I be sure? Any leads?
Thanks already
Jyotsna
Follow this article
Another old-school way is to run sp_who2, find the SPID related to your database, and check its CPU and IO usage.
To validate what that SPID is actually running, use DBCC INPUTBUFFER(spid).
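A minimal sketch of those two steps (the SPID value 53 is only an illustration):
EXEC sp_who2;          -- lists sessions with their CPUTime and DiskIO columns
DBCC INPUTBUFFER(53);  -- shows the last statement sent by SPID 53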
Also check the IO statistics of the SP in the original scenario, without purging data from the tables:
SET STATISTICS IO ON
EXEC [YourSPName]
Look at the logical reads; also refer to the article.

Migrate Data from Neo4j to SQL

Hi, I am using Neo4j in my application and my setup is as follows:
I am using the embedded Graph API.
I have several databases that I point to using a pool I maintain in my application, e.g. db1, db2, db3, ..., db100.
When I want to access a particular database, I point to it using new EmbeddedGraphDatabase("Path to db(n)").
The problem is that as the pool count increases, the RAM consumed by the application keeps increasing, and at some point this breaks the application.
So I am thinking of migrating from Neo4j to some other database.
Additionally only a small part of my database is utilizing the graph structure.
One way to migrate is to write a script for it. Is there a better option?
My other question is: what would be the best database for preserving my structure?
Another option I am considering is to keep part of my data in Neo4j and move the rest to some other database.
If anything is unclear I can clarify.
Thanks in advance.
An EmbeddedGraphDatabase instance is not the equivalent of a "connection" in SQL. It's designed to run a long time (days, months). Hence starting/stopping is costly.
What is the use case for having hundreds of separate databases in the same JVM?
Lots of small databases will perform poorly, as the graph database is designed to hold the whole data model on a single host.
Do you run a single JVM per database?
You can control the amount of memory used by Neo4j by providing the correct properties for memory mapping, and you can also use the GCR cache from neo4j-enterprise and tune the cache-size properties.
I think it still makes sense to keep the graph part in Neo4j and only move the non-graphy part.

Updating large amounts of data in Rails App

I have a Rails app with a table of about 30 million rows that I build from a text document my data provider gives me quarterly. From there I do some manipulation and comparison with some other tables and create an additional table with more customized data.
My first time doing this, I ran a ruby script through Rails console. This was slow and obviously not the best way.
What is the best way to streamline this process and update it on my production server without any, or at least very limited downtime?
This is the process I'm thinking is best for now:
Create rake tasks for reading in the data. Use the activerecord-import plugin to do batch writing and to turn off ActiveRecord validations. Load this data into brand-new duplicate tables.
Build indexes on newly created tables.
Rename newly created tables to the names the rails app is looking for.
Delete the old.
All of this I'm planning on doing right on the production server.
Is there a better way to do this?
Other notes from comments:
Tables already exist
Old tables and data are disposable
Tables can be locked for select only
Must minimize downtime
Our current server situation is 2 High-CPU Amazon EC2 instances. I believe they have 1.7 GB of RAM, so holding the entire import in memory temporarily is probably not an option.
New data is raw text file, line delimited. I have the script for parsing it already written in Ruby.
1) create "my_table_new" as an empty clone of "my_table"
2) import the file (in batches of x lines) into my_table_new - indexes are built as you go.
3) Run: RENAME TABLE my_table TO my_table_old, my_table_new TO my_table;
Doing this as one command makes it effectively instant, so there is virtually no downtime. I've done this with large data sets, and since the rename is the 'switch', you should retain uptime.
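A minimal sketch of that sequence, using the table names from the steps above (the old data is disposable, per the notes):
CREATE TABLE my_table_new LIKE my_table;  -- empty clone with the same schema and indexes
-- ... bulk-import the parsed rows into my_table_new in batches ...
RENAME TABLE my_table TO my_table_old, my_table_new TO my_table;  -- atomic swap
DROP TABLE my_table_old;  -- old table is disposable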
Depending on your logic, I would seriously consider processing the data in the database using SQL. This keeps the work close to the data, and 30 million rows is typically not something you want to pull out of the database just to compare against other data you have also pulled out of the database.
So think outside the Ruby on Rails box.
SQL has built-in capabilities to join, compare, insert, and update data. Those capabilities can be very powerful and fast, and they let the data be processed right where it lives.
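As a rough illustration of the set-based approach (all table and column names here are hypothetical, not taken from the question):
-- build the customized table directly in the database instead of row by row in Ruby
INSERT INTO customized_data (provider_id, category, score)
SELECT i.provider_id, r.category, i.raw_value * r.weight
FROM imported_rows i
JOIN reference_data r ON r.provider_id = i.provider_id
WHERE i.raw_value IS NOT NULL;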

patterns to feed a remote MySQL with data

I would like to hear from the community about a nice pattern for the following problem.
I had a "do-everything" server that was web server, MySQL server, and crawler server all at once. For the last two or three weeks, using monitoring tools, I have seen that whenever my crawlers are running, my load average goes over 5 (on a 4-core server, a load of up to 4.00 would be acceptable). So I got another server and I want to move my crawlers there. My question is: as soon as the data is crawled on my crawler server, I have to insert it into my database, and I would rather not open a remote connection and insert it directly, since I prefer to use the Rails framework (I'm using Rails) to make it easier to create all the relationships, etc.
Problem to be solved:
One server has the crawled data (a bunch of CSV files), and I want to move it to a remote server and insert it into my DB using Rails.
Restriction: I don't want to run MySQL replication (master + slave), since it would require a deeper analysis of where most of the write operations happen.
Ideas:
move the CSVs from the crawler server to the remote server (using ssh or rsync) and import them during the day
expose an API on the crawler server that the remote server can poll (many times a day) to pull and import the data
Any other ideas or good patterns around this theme?
With a slight variation on the second pattern you noted, you could have an API on your web-app/DB server which the crawler uses to report its data. It could do this in batches, in real time, or only within a specific window of time (daytime/nighttime, etc.).
This pattern lets the crawler decide when to report the data, rather than having the web app do the polling.
