Why does it take so long to run: bitcoind -reindex -txindex -debug=net -printtoconsole

If I run the following command from my bitcoin console client:
bitcoind -reindex -txindex -debug=net -printtoconsole
it takes extremely long to run. Does this reindex all previous Bitcoin transactions?

Here are the details about the options you are using:
-txindex: Maintain a full transaction index (default: 0)
-reindex: Rebuild blockchain index from current blk000??.dat files
-debug: Output extra debugging information. Implies all other -debug* options
It's normal that this operation takes time: txindex represents a huge amount of data, and -reindex forces Bitcoin Core to rebuild the blockchain index from your local block files each time you run it (which, from my experience, is not necessary). My suggestion is to remove -reindex and to figure out whether you really need -txindex.
If you want to check on all the transactions related to your wallet, I think this option is more appropriate:
-rescan: Rescan the block chain for missing wallet transactions
Note: this will also be time-consuming.
Information from: http://we.lovebitco.in/bitcoin-qt/command-line-options/

Tips for faster reindexing:
Use -printtoconsole=0 (suppresses console output entirely).
Increase -dbcache from its default of 450, for example to 1000: -dbcache=1000.
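Putting those tips together, a one-off full reindex might look like the sketch below (keep -txindex only if you really need the full transaction index, and size -dbcache to your available RAM; 1000 is just an example):
# full reindex with the transaction index, quiet console, and a larger database cache
bitcoind -reindex -txindex -dbcache=1000 -printtoconsole=0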


select from system$stream_has_data returns error - parameter must be a valid stream name... hmm?

I'm trying to see if there is data in a stream, and I provided the exact stream name as follows:
Select SYSTEM$STREAM_HAS_DATA('STRM_EXACT_STREAM_NAME_GIVEN');
But I get an error:
SQL compilation error: Invalid value ['STRM_EXACT_STREAM_NAME_GIVEN'] for function 'SYSTEM$STREAM_HAS_DATA', parameter 1: must be a valid stream name
1) Any idea why? How can this error be resolved?
2) Would it hurt to resume a set of tasks (alter task resume;) without knowing whether the corresponding stream has data in it or not? I believe that if there is (delta) data in the stream, the task will load it; if not, the task won't do anything.
3) Any idea how to modify/update a stream that shows up as 'STALE'? Or should simply loading fresh data into the table associated with the stream set the stream back to 'NOT STALE', i.e. stale = false? What if loading the associated table does not update the stream's state? (That is what currently appears to be happening in my case.)
1) It doesn't look like you have a stream by that name. Try running SHOW STREAMS; to see what streams you have active in the database/schema that you are currently using.
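For example (the database, schema, and stream names below are placeholders), you can list the streams visible in your current context and then call the function with a fully qualified name:
-- list streams in the current database/schema, or in a specific schema
SHOW STREAMS;
SHOW STREAMS IN SCHEMA MY_DB.MY_SCHEMA;
-- then reference the stream by its fully qualified name
SELECT SYSTEM$STREAM_HAS_DATA('MY_DB.MY_SCHEMA.STRM_EXACT_STREAM_NAME_GIVEN');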
2) If your task has a WHEN clause that validates against the SYSTEM$STREAM_HAS_DATA result, then resuming a task and letting it run on schedule only hits against your global services layer (no warehouse credits), so there is no harm there.
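A sketch of such a task (all object names are placeholders): with the WHEN guard in place, the task only spins up the warehouse when the stream actually has data.
CREATE OR REPLACE TASK MY_TASK
  WAREHOUSE = MY_WH
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('MY_DB.MY_SCHEMA.MY_STREAM')
AS
  INSERT INTO MY_TARGET_TABLE SELECT * FROM MY_DB.MY_SCHEMA.MY_STREAM;

ALTER TASK MY_TASK RESUME;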
3) STALE means that the stream's data wasn't consumed by a DML statement for a long time (I think it's 14 days by default, or, if the table's data retention is longer than 14 days, the longer of the two). Loading more data into the stream's source table doesn't help with that. Running a DML statement that consumes the stream will, but since the stream is already stale, doing so may have bad consequences. Streams are meant to be used for frequent DML, so not running DML against a stream for longer than 14 days is very uncommon.
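For reference, consuming a healthy (non-stale) stream just means using it as the source of a DML statement against some target table; a minimal sketch with placeholder names:
-- reading the stream inside a DML statement consumes its delta and advances the offset
INSERT INTO MY_TARGET_TABLE
SELECT * FROM MY_DB.MY_SCHEMA.MY_STREAM;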

HANA: xsodata: Huge performance gap between first and 2nd request execution

If I expose a VIEW
CREATE VIEW myView AS
SELECT ...
FROM ...
via xsodata
service namespace "oData" {
entity "mySchema"."myView" as "myView";
}
and GET /myView for the first time after view creation, the performance is very low.
However, after performing the same request again (and every time after that), the performance is what I want it to be.
Questions:
Why?
How to avoid the first long-running request?
Already tried:
Executing the SQL profiler output (without statement preparation) in HANA Studio's SQL console always gives good performance.
Table hotloading (LOAD myTable ALL;) had no effect
Update
We found out the "Why" part: the xs-engine runs the query as a prepared statement even if there are no parameters in the request. On first execution (within the user's context) the query gets prepared, resulting in an entry in M_SQL_PLAN_CACHE (SELECT * FROM M_SQL_PLAN_CACHE WHERE USER_NAME = 'myUser'). Clearing the plan cache (ALTER SYSTEM CLEAR SQL PLAN CACHE) makes the oData request slow again, leading to the assumption that the performance gap lies in the re-preparation of the query.
We are now stuck with the 2nd question: How to avoid that? Our approach to mark certain plan cache entries for recompilation (ALTER SYSTEM RECOMPILE SQL PLAN CACHE ENTRY 123) just invalidated the entry and did not update it automatically...
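One workaround sketch (an assumption on our side, not a verified fix): warm the cache by issuing a throwaway GET /myView, or by executing the statement once, right after deployment or after the cache is cleared, so the first real user no longer pays the preparation cost. Whether a plan is already cached can be checked like this (the user and view names are placeholders):
-- check whether a prepared plan for the view is already in the cache
SELECT STATEMENT_STRING, PREPARATION_COUNT, EXECUTION_COUNT
FROM M_SQL_PLAN_CACHE
WHERE USER_NAME = 'myUser'
  AND STATEMENT_STRING LIKE '%myView%';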
I'm not too sure you can remove the long first execution, but you can try changing the view to a Calculation View executed in the SQL Engine.
HANA has been heavily optimized for its Calculation Views, and the plan cache should perform better with them, possibly reducing the first execution time significantly. Also, the plan cache of Calculation Views should be shared between users (since _SYS_REPO is the user that generates them).
If you use the scripted version, I believe you could reuse a lot of your current SQL, but you can also try the graphical approach; see the sketch below.
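As a rough illustration only (schema, view, and column names are placeholders, and the exact artifact format depends on your HANA version), the body of a scripted calculation view could reuse the original SELECT almost unchanged:
/********* Begin Procedure Script (SQLScript) *********/
BEGIN
  -- reuse the SELECT that currently backs myView
  var_out = SELECT "COL1", "COL2"
            FROM "mySchema"."myTable";
END
/********* End Procedure Script *********/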
Let us know if you had any luck. Modeling with Big Data is always a surprise.

How does Erlang access huge shared data structures like the BTree in CouchDB

In CouchDB there's a huge BTree data structure and multiple processes (one for each request).
Erlang processes can't share state, so it seems there should be a dedicated process responsible for accessing the BTree and communicating with the other processes via messages. But that would be inefficient, because only one process could access the data.
So how are such cases handled in Erlang, and how is it handled in this specific case in CouchDB?
Good question, this. If you want an authoritative answer, the best place to ask about CouchDB internals is the CouchDB mailing list: they are very quick, and one of the core devs can probably give you a better answer. I will try to answer this as best as I can; just keep in mind that I may be wrong :)
The first clue is provided by the CouchDB config file. Start couchdb in interactive shell mode:
couchdb -i
point your browser to
http://localhost:5984/_utils/config.html
You will find that under the daemons section there is a key-value pair:
index_server {couch_index_server, start_link, []}
Aha! So the index is served by a server. What kind of server? We will have to dive into the code:
It is a gen_server. All the operations on the CouchDB view are handled by this gen_server.
A gen_server is the Erlang generic implementation of the client-server model. It is concurrent by default. So your observation is correct: all the requests to the view are distinct processes managed with the help of the gen_server.
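For readers unfamiliar with the behaviour, here is a minimal, hypothetical gen_server skeleton (not CouchDB's actual code) showing the synchronous call/reply pattern the index server is built on:
-module(my_index_server).
-behaviour(gen_server).
-export([start_link/0, get_index/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% start one registered server process that owns the (toy) index state
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% public API: any process can call this; the gen_server serialises the requests
get_index(Sig) ->
    gen_server:call(?MODULE, {get_index, Sig}).

init([]) ->
    {ok, #{}}.   %% state: a map of Sig -> index location (toy example)

handle_call({get_index, Sig}, _From, State) ->
    {reply, maps:get(Sig, State, not_found), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.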
index_server defines three ets tables. You can verify this by typing ets:i() in the Erlang shell we started earlier, and you should see:
couchdb_indexes_by_db couchdb_indexes_by_db bag 1 320 couch_index_server
couchdb_indexes_by_pid couchdb_indexes_by_pid set 1 316 couch_index_server
couchdb_indexes_by_sig couchdb_indexes_by_sig set 1 316 couch_index_server
When the index_server gets a get_index call, it adds the caller to a list of Waiters in the couchdb_indexes_by_sig ets table; or, if the index is already available, it simply sends a reply with the location of the index.
When the index_server gets an async_open call, it simply iterates over the list of Waiters and sends each of them a reply with the location of the index.
Similarly there are calls to reset_indexes and other operations on indexes, which again send a reply with the location of the index.
When the index is created for the first time, CouchDB calls async_open to serve the index to all the waiting processes. Afterwards every process is given access to the index.
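A simplified, hypothetical sketch of that waiter pattern (not the real couch_index_server code; waiters are kept in the server state rather than in ets here, for brevity): the server defers its reply while the index is opening, then answers every waiter once the index is ready.
%% callers asking for an index that is still opening are queued, not answered yet
handle_call({get_index, Sig}, From, Waiting) ->
    case ets:lookup(couchdb_indexes_by_sig, Sig) of
        [{Sig, IndexPid}] ->
            {reply, {ok, IndexPid}, Waiting};   %% index known: reply at once
        [] ->
            %% remember the caller; no reply is sent yet
            Waiters = maps:get(Sig, Waiting, []),
            {noreply, Waiting#{Sig => [From | Waiters]}}
    end;

%% invoked when the index has finished opening (the async_open step)
handle_call({async_open, Sig, IndexPid}, _From, Waiting) ->
    ets:insert(couchdb_indexes_by_sig, {Sig, IndexPid}),
    lists:foreach(fun(W) -> gen_server:reply(W, {ok, IndexPid}) end,
                  maps:get(Sig, Waiting, [])),
    {reply, ok, maps:remove(Sig, Waiting)}.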
An important point to note here is that the index server does not do anything special except for making the index available to other processes (for example to couch_mr_view_util.erl). In that respect it acts as a gateway. Index write operations are handled by couch_index.erl, couch_index_updater.erl and couch_index_compactor.erl, which (unsurprisingly) are all gen_servers!
When a view is being created for the first time, only one process can access it: the query_server process (couchjs by default). After the view has been built, it can be read and updated in a concurrent manner. The actual querying of views is handled by couch_mr_view, which is exposed to us as the familiar HTTP API.

Updating core data performance

I'm creating an app that uses Core Data to store information from a web server. When there's an internet connection, the app will check whether there are any changes in the entries and update them. Now, I'm wondering which is the best way to go about it. Each entry in my database has a last-updated timestamp. Which of these two will be more efficient:
Go through all entries and check the timestamp to see which entry needs to be updated.
Delete the whole entity and re-download everything again.
Sorry if this seems like an obvious question and thanks!
I'd say option 1 would be most efficient, as there is rarely a case where downloading everything (especially in a large database with large amounts of data) is more efficient than only downloading the parts that you need.
I recently did something similar.
I solved the problem by assigning a unique ID and a global 'updated' timestamp, and by thinking in terms of 'delta' changes.
To explain better: I have a global 'latest update' variable stored in the user preferences, with a default value of 01/01/2010.
This is roughly my JSON service:
response: {
metadata: {latestUpdate: 2013...etc}
entities: {....}
}
Then, this is what's going on:
pass the 'latest update' to the web service and retrieve a list of entities
update the core data store
if everything went fine with Core Data, the 'latestUpdate' from the service metadata becomes my new 'latest update' variable stored in user preferences
That's it. I am only retrieving the needed changes, and of course the web service is structured to deliver a proper list. In other words: a web service backed by a database can deal with this matter quite well, and it leaves the iPhone as a 'simple client' only.
But I have to say that for small amounts of data, it is still quite performant (and less bug-prone) to download the whole list on each request.
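A rough Swift sketch of that flow (the endpoint URL, the DeltaResponse type, and the defaults key are all hypothetical; error handling and the actual Core Data writes are trimmed):
import Foundation

// Hypothetical shape of the delta response described above.
struct DeltaResponse: Decodable {
    struct Metadata: Decodable { let latestUpdate: String }
    struct Entity: Decodable { let id: String; let name: String }
    let metadata: Metadata
    let entities: [Entity]
}

let latestUpdateKey = "latestUpdate"

func syncDeltas() {
    // 1. Pass the stored 'latest update' watermark to the web service.
    let since = UserDefaults.standard.string(forKey: latestUpdateKey) ?? "2010-01-01"
    var components = URLComponents(string: "https://example.com/api/entities")!
    components.queryItems = [URLQueryItem(name: "since", value: since)]

    URLSession.shared.dataTask(with: components.url!) { data, _, _ in
        guard let data = data,
              let response = try? JSONDecoder().decode(DeltaResponse.self, from: data)
        else { return }

        // 2. Update the Core Data store with response.entities here
        //    (insert/update the managed objects and save the context).

        // 3. Only after the save succeeds, persist the new watermark.
        UserDefaults.standard.set(response.metadata.latestUpdate, forKey: latestUpdateKey)
    }.resume()
}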
As per our discussion in the comments above, you can model your Core Data entity with version control like this:
CoreDataEntityPerson:
name : String
name_version : int
image : BinaryData
image_version : int
You can now model the server xml in the following way:
<person>
<name>michael</name>
<name_version>1</name_version>
<image>string_converted_imageData</image>
<image_version>1</image_version>
</person>
Now, you can follow these steps:
When the response arrives and you parse it, you initially create a new object from the entity and fill in the data directly.
Next time, when you perform an update on the server, you increase the version count of an entry by 1 and store it.
E.g., let's say the name michael is now changed to abraham; the version count of name_version on the server will then be 2.
This updated version count will come in the response data.
Now, while storing the data in the same object, if you find the version count to be the same, the update of that entry can be skipped; but if you find the version count has changed, the update of that entry needs to be done.
This way you can efficiently perform a check on each entry and update only the changed entries, as in the sketch below.
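A minimal Swift sketch of that comparison while parsing (the attribute names come from the hypothetical model above; everything else is illustrative):
import CoreData

// Update only the fields whose server-side version differs from the stored one.
func applyName(_ serverName: String, version serverVersion: Int32, to person: NSManagedObject) {
    let storedVersion = person.value(forKey: "name_version") as? Int32 ?? 0
    if serverVersion != storedVersion {
        person.setValue(serverName, forKey: "name")
        person.setValue(serverVersion, forKey: "name_version")
    }
    // Repeat the same pattern for image / image_version.
}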
Advice:
The above approach works best when you're dealing with a large amount of data to update.
In the case of simple text entries for an object, simply overwriting the data on all entries is efficient enough, and it also keeps the response model simple.

Parsing a CSV for Database Insertion when Formatted Incorrectly

I recently wrote a mailing platform for one of our employees to use. The system runs great, scales great, and is fun to use. However, it is currently inoperable due to a bug that I can't figure out how to fix (fairly inexperienced developer).
The process goes something like this...
Upload a CSV file to a specific FTP directory.
Go to the import_mailing_list page.
Choose a CSV file within the FTP directory.
Name and describe what the list contains.
Associate file headings with database columns.
Then, the back-end loops over each line of the file, associating the values with a heading, and importing these values into a database.
This all works wonderfully, except in a specific case, when a raw CSV is not correctly formatted. For example...
fname, lname, email
Bob, Schlumberger, bob#bob.com
Bobbette, Schlumberger
Another, Record, goeshere#email.com
As you can see, there is a missing comma on line two. This would cause an error when attempting to pull "valArray[3]" (or valArray[2], in the case of every language but mine).
I am looking for the most efficient solution to keep this error from happening. Perhaps I should check the array length, and compare it to the index we're going to attempt to pull, before pulling it. But to do this for each and every value seems inefficient. Anybody have another idea?
Our stack is ColdFusion 8/9 and MySQL 5.1. This is why I refer to the array index as [3].
There's ArrayIsDefined(array, elementIndex), or ArrayLen(array)
seems inefficient?
You gotta code what you need to code, forget about inefficiency. Get it right before you get it fast (when needed).
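For instance, using ArrayLen as suggested (a sketch; valArray and the column positions come from the question, everything else is hypothetical):
<!--- only read column 3 if the row actually has that many values --->
<cfif ArrayLen(valArray) GTE 3>
    <cfset email = valArray[3]>
<cfelse>
    <cfset email = "">
</cfif>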
I suppose if you are looking for another way of doing this (instead of checking the array length each time, although that really doesn't sound that bad to me), you could wrap each line insert attempt in a try/catch block. If it fails, then stuff the failed row in a buffer (including the line number and error message) that you could then display to the user after the batch has completed, so they could see each of the failed lines and why they failed. This has the advantages of 1) not having to explicitly check the array length each time and 2) catching other errors that you might not have anticipated beforehand (maybe a value is too long for your field, for example).
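A sketch of that try/catch approach (the datasource, table, and variable names are illustrative only):
<!--- collect failed rows instead of aborting the whole import --->
<cfset failedRows = []>
<cfloop from="1" to="#ArrayLen(csvLines)#" index="i">
    <cftry>
        <cfset valArray = ListToArray(csvLines[i], ",")>
        <cfquery datasource="myDSN">
            INSERT INTO mailing_list (fname, lname, email)
            VALUES (
                <cfqueryparam value="#valArray[1]#" cfsqltype="cf_sql_varchar">,
                <cfqueryparam value="#valArray[2]#" cfsqltype="cf_sql_varchar">,
                <cfqueryparam value="#valArray[3]#" cfsqltype="cf_sql_varchar">
            )
        </cfquery>
        <cfcatch type="any">
            <!--- remember the line number and error so the user can fix the CSV --->
            <cfset ArrayAppend(failedRows, {line = i, message = cfcatch.message})>
        </cfcatch>
    </cftry>
</cfloop>
After the loop, failedRows can be shown to the user with the offending line numbers and messages, as described above.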
