YapDatabase memory consumption on fts index update

I've made some changes to my YapDatabase FTS index: I added some columns and updated the versionTag.
Now, when starting up the database, Yap runs through all relevant rows in order to rebuild the index, as expected.
But while it does this, memory allocation keeps increasing until the app crashes, before the FTS index rebuild completes.
My dataset is about 100k rows, where each row has 3 columns containing about 2-3 short words.
I verified the memory usage with a simple handler block:
let block = YapDatabaseFullTextSearchHandler.withObjectBlock { (dict: NSMutableDictionary, collection: String, key: String, object: AnyObject) -> Void in
    dict.setObject("123 234 345", forKey: "1")
    dict.setObject("456 567 678", forKey: "2")
}
Is this a bug in Yap? Is adding fts-columns not supported? Should I consider adding a new index for the additional fts-columns?

Related

How to properly use apoc.periodic.iterate to reduce heap usage for large transactions?

I am trying to use apoc.periodic.iterate to reduce heap usage when doing very large transactions in a Neo4j database.
I've been following the advice given in this presentation.
But my results differ from those shown in the slides.
First, some notes on my setup:
Using Neo4j Desktop, graph version 4.0.3 Enterprise, with APOC 4.0.0.10
I'm calling queries using the .NET Neo4j Driver, version 4.0.1.
neo4j.conf values:
dbms.memory.heap.initial_size=2g
dbms.memory.heap.max_size=4g
dbms.memory.pagecache.size=2g
Here is the cypher query I'm running:
CALL apoc.periodic.iterate(
"UNWIND $nodes AS newNodeObj RETURN newNodeObj",
"CREATE(n:MyNode)
SET n = newNodeObj",
{batchSize:2000, iterateList:true, parallel:false, params: { nodes: $nodes_in } }
)
And the line of C#:
var createNodesResCursor = await session.RunAsync(createNodesQueryString, new { nodes_in = nodeData });
where createNodesQueryString is the query above, and nodeData is a List<Dictionary<string, object>> where each Dictionary has just three entries: 2 strings, 1 long.
When I run this to create 1.3 million nodes, I observe the heap usage (via JConsole) going all the way up to the 4 GB available and bouncing back and forth between ~2.5 GB and 4 GB. Reducing the batch size makes no discernible difference, and raising heap.max_size causes the heap usage to shoot up to almost that value. It's also really slow, taking 30+ minutes to create those 1.3 million nodes.
Does anyone have any idea what I may be doing wrong/differently to the linked presentation? I understand my query is doing a CREATE whereas in the presentation they are only updating an already loaded dataset, but I can't imagine that's the reason my heap usage is so high.
Thanks

My issue was that, although I was using apoc.periodic.iterate, I was still uploading the entire 1.3 million node data set to the database as a single parameter for the query!
Modifying my code to do the batching myself, as follows, fixed both the heap usage problem and the slowness problem:
const int batchSize = 2000;
// The query is the same for every batch, so build it once.
const string createNodesQueryString = @"
    UNWIND $nodes_in AS newNodeObj
    CREATE (n:MyNode)
    SET n = newNodeObj";
for (int count = 0; count < nodeData.Count; count += batchSize)
{
    // Send only the current slice of nodeData as the query parameter.
    int length = Math.Min(batchSize, nodeData.Count - count);
    var createNodesResCursor = await session.RunAsync(createNodesQueryString,
        new { nodes_in = nodeData.GetRange(count, length) });
    // Consume the result so this batch completes before the next one starts.
    var createNodesResSummary = await createNodesResCursor.ConsumeAsync();
}
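The batching step on its own is independent of the driver. Below is a minimal Swift sketch of the slicing logic only. The helper name batches(of:size:) and the sample data are hypothetical, and no database call is made.

```swift
// Split a large array into fixed-size batches so that each request
// carries only a small parameter payload.
func batches<T>(of items: [T], size: Int) -> [[T]] {
    precondition(size > 0, "batch size must be positive")
    return stride(from: 0, to: items.count, by: size).map { start in
        let end = min(start + size, items.count)
        return Array(items[start..<end])
    }
}

// Hypothetical stand-in for the large node list.
let nodeIds = Array(0..<4500)
let nodeBatches = batches(of: nodeIds, size: 2000)
for batch in nodeBatches {
    // A real client would run the UNWIND/CREATE query here with `batch` as its parameter.
    print("sending batch of \(batch.count) nodes")
}
```

The last batch is simply shorter than the others, which mirrors the Math.Min call in the C# loop.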

Firebase query in swift not working properly as usual

I have a Firebase database structure like this
All I want is the data that starts with the string '48DqysTKV0cMGf8orGlfhNaFLEw2' at the location "request/waiting"
My code is like below
But this code returns every child of the waiting node, not just the matching nodes

The reason is that you're using .childAdded. When first attached, it retrieves every element at the given path; thereafter it only monitors for newly added nodes.
Use .value instead in your query.
queryStarting(atValue) doesn't work the way you think it does. To achieve what you want, you need to limit your queries to a certain range/boundary. Read the different query types and decide for yourself which combination makes sense.

Try this (I think the problem is .childAdded; change it to .value):
let ref = FIRDatabase.database()
    .reference(withPath: "request/waiting")
    .queryStarting(atValue: "48DqysTKV0cMGf8orGlfhNaFLEw2")
    .queryEnding(atValue: "48DqysTKV0cMGf8orGlfhNaFLEw2" + "\u{f8ff}")
ref.observeSingleEvent(of: .value, with: { snapshot in
    if let data = snapshot.value as? [String: Any] {
        // code..
    }
})
The \u{f8ff} character used in the query above is a very high code point in the Unicode range. Because it sorts after most regular characters in Unicode, the query matches all values that start with 48DqysTKV0cMGf8orGlfhNaFLEw2.
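To see why the two bounds act as a "starts with" filter, here is a minimal local sketch in plain Swift. It uses no Firebase calls, and it assumes Swift's own string ordering as a stand-in for the database's ordering. The function name valuesMatching(prefix:in:) and the sample values are hypothetical; only the prefix comes from the question.

```swift
// Local simulation of a "starts with" range query: keep every value v
// with prefix <= v <= prefix + "\u{f8ff}".
func valuesMatching(prefix: String, in values: [String]) -> [String] {
    let lower = prefix
    let upper = prefix + "\u{f8ff}"
    return values.sorted().filter { $0 >= lower && $0 <= upper }
}

// Hypothetical sample values.
let waiting = [
    "48DqysTKV0cMGf8orGlfhNaFLEw2_a",
    "48DqysTKV0cMGf8orGlfhNaFLEw2_b",
    "48DqysTKV0cMGf8orGlfhNaFLEw1_x",
    "9zOtherUser_c",
]
let matches = valuesMatching(prefix: "48DqysTKV0cMGf8orGlfhNaFLEw2", in: waiting)
print(matches)
```

Values that differ from the prefix anywhere before its end fall outside the range, so only the two entries that begin with the full prefix remain.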
Hope it helps

query mid-section of firebase database in swift

I'm using firebase for a large-ish database, with each entry using an autoid key. To get, say, the last ten entries, I can use:
ref.queryLimitedToLast(10).observeSingleEventOfType(.Value, withBlock: { snapshot in
    for item in snapshot.children {
        //do some code to each item
    }
})
However, I can't for the life of me work out how to then get just the ten entries before that. Eg. if the database had 100 entries, my code would return 90-100, but how would I then get entries 80-90 (without, for example, querying the last 20 and throwing half away, as it seems inefficient)?
Edit:
I ended up using
ref.queryOrderedByChild("timecode").queryEndingAtValue(final).queryLimitedToLast(10).observeSingleEventOfType(.Value, withBlock: { snapshot in
    for item in snapshot.children {
        //do some code to each item, including saving a new value of 'final'
    }
})
and saving the value 'final' as the timecode of the oldest entry returned. That is, first I would get results 90-100, say, and save the timecode of 90 (minus one second) as final, then use this as the ending value to find results 80-89, and so on.
This is just as Jay describes below, but using a timestamp instead of an index number (as it was already in there).
Edit 2:
Also, to get it working better, I also added ".indexOn": "timecode" to the firebase rules for the database
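The paging logic from the edit above can be sketched locally in plain Swift, without Firebase. The function page(of:endingAt:limit:), the variable cursor (playing the role of 'final'), and the sample timecodes are all hypothetical. The sketch assumes integer timecodes that are unique.

```swift
// Local simulation of ordering by timecode, ending at a value, and limiting to the last N:
// return up to `limit` of the largest timecodes that are <= `endingAt`.
func page(of timecodes: [Int], endingAt: Int, limit: Int) -> [Int] {
    let eligible = timecodes.sorted().filter { $0 <= endingAt }
    return Array(eligible.suffix(limit))
}

// Hypothetical data: 100 entries with timecodes 1...100.
let timecodes = Array(1...100)

var cursor = Int.max          // plays the role of 'final'
var pages: [[Int]] = []
for _ in 0..<3 {
    let current = page(of: timecodes, endingAt: cursor, limit: 10)
    guard let oldest = current.first else { break }
    pages.append(current)
    cursor = oldest - 1       // step just below the oldest entry already shown
}
print(pages.map { "\($0.first!)-\($0.last!)" })
```

Each pass fetches only the ten entries it needs, so nothing is queried and thrown away.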

There's a couple of ways to do this but an easy solution is to keep a total_count in another node and an index within each node.
Then use queryStartingAtValue and queryEndingAtValue to query the range of child nodes you are interested in.
When you add a child to your 'posts' node, for example, add one to the total_count node and save it. Over time you'll have 100 posts and the total_count node will have a value of 100. You can then query for any range of posts: .queryStartingAtValue(80) and .queryEndingAtValue(89), or .queryStartingAtValue(20) and .queryEndingAtValue(30)
For example, assume there's 45 posts (showing just 4 of them here)
posts
  ...
  post_1024
    text: "my post!"
    index: 42
  post_1025
    text: "another post"
    index: 43
  post_1026
    text: "yippee"
    index: 44
  post_1027
    text: "Stuff and Things"
    index: 45
and then a node to track them
post_info
  total_count: 45
and the code to query for the middle two nodes
let ref = myRootRef.childByAppendingPath("posts")
ref.queryOrderedByChild("index").queryStartingAtValue(43).queryEndingAtValue(44)
    .observeEventType(.Value, withBlock: { snapshot in
        print(snapshot.value) // the matching child nodes
    })
and the output would be
post_1025
  text: "another post"
  index: 43
post_1026
  text: "yippee"
  index: 44
That being said, this may be slightly redundant depending on what happens to your data. If you never delete posts, then you're set. However, if you delete posts then obviously there's a gap in your indexes (42, 43, .. 45) so other factors need to be taken into consideration.
You may not even need a total_count - it just depends on how your app works.
You could also leverage the priority variable on your nodes to store the index instead of having it be a child node.
Transactions and .observeSingleEvent with .Value and .numChildren can also be used to obtain a live node count.

Have you tried stacking queries?
ref.queryLimitedToLast(20).queryLimitedToFirst(10).observeSingleEventOfType(.Value, withBlock: { snapshot in
    for item in snapshot.children {
        //do some code to each item
    }
})
Just a thought, haha.

Need to arrange the displaying pattern in table view controller Xcode 7 using Firebase

I have created a table view controller which populates its cells from Firebase. The table loads perfectly, with one minor problem: whenever a new item is added, it gets displayed at the bottom of the screen instead of at the top. How do I arrange my items so that a newly added item goes into the top cell and is displayed at the top of the app, so I don't have to scroll down to see what has been added?

Firebase doesn't have a natural order - you need to decide how you want your data ordered and set up your structure accordingly. Then retrieve the data by that order. You can order them by their keys, values or priority.
For example:
You have a blog app and you always want the posts listed in time order:
post_0
  timeStamp: 1
  post: "posted at 1am"
post_1
  timeStamp: 13
  post: "posted at 1pm"
post_2
  timeStamp: 9
  post: "posted at 9am"
When you read the posts node, order by timeStamp and it will be in that order: 1, 9, 13.
ref.queryOrderedByChild("timeStamp").observeEventType(.ChildAdded, withBlock: { snapshot in
    print("\(snapshot.value)")
})
If you want just the latest two posts, use a limit query.
let scoresRef = Firebase(url:"https://dinosaur-facts.firebaseio.com/scores")
scoresRef.queryOrderedByValue().queryLimitedToLast(2).observeEventType(.ChildAdded, withBlock: { snapshot in
    print("post time \(snapshot.value)")
})
Of course there are many other options:
Read the firebase data into an array, then use NSSortDescriptors to sort descending.
Read the firebase data in via order by (so they are 1, 2, 3, 4) with 4 being the most current post, and insert each one into an array at position 0.
4 Most current post stored position 0 in the array
3 <- read after 2
2 <- read after 1
1 <- first read in
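Both options can be sketched in plain Swift without Firebase. This is only an illustration: the variable names are hypothetical, and the timestamps are the ones from the blog example above.

```swift
// Option 1: read everything into an array, then sort descending.
let unordered = [1, 13, 9]                 // timeStamp values in stored order
let sortedDescending = unordered.sorted(by: >)

// Option 2: items arrive in ascending timeStamp order (as from an ordered query);
// inserting each one at index 0 leaves the newest item first.
var displayed: [Int] = []
for timeStamp in [1, 9, 13] {
    displayed.insert(timeStamp, at: 0)
}

print(sortedDescending)
print(displayed)
```

In a table view, the array built by either option would back the rows, so the newest post ends up in the top cell.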
The Firebase "Reading Data" guide is a really good reference for understanding queries and the ordering of data.

Query to get rows with unique objects parse ios

I have a table called Table1 with a field called parentfield, which contains the objectId of an object in Table2. I don't want to get duplicate objectIds, and I want them arranged in ascending order. Here is the data in the table.
Table1
parentfield(id)
790112
790000
790001
790112
790000
790001
The result should be the first three elements, but I don't know in advance how many ids will match. Is there a way to do that?

Unfortunately, there is no SELECT DISTINCT / GROUP BY operation in Parse.
See this thread: https://parse.com/questions/retrieving-unique-values
Suggested team solution:
There's no built-in query constraint that would return distinct values
based on a column.
As a workaround, you can query for all the rows, then iterate through
them and track the distinct values for the desired column
So the sad, bad, horrible idea is to write a Cloud Function that fetches all the possible elements (keep in mind that Parse allows you to fetch at most 1000 elements per query) and then removes the duplicates from the resulting list. You can do all of this in the Cloud Code function and return the cleaned list to the client, or you can do it directly on your client devices. Either way, to get a real equivalent of SELECT DISTINCT under these conditions, you first have to fetch all the elements (a loop of queries, retrieving 1000 items at a time) and then apply your own algorithm for removing the duplicates. I know, it's really long and frustrating, considering that Parse Cloud Function execution has a timeout limit of 7-10 seconds. Moving to Parse background jobs may help: you could populate a temporary table of distinct values, since you should have up to 15 minutes of execution before the timeout.
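The workaround described above (page through all rows, track the distinct values, then sort) can be sketched in plain Swift. No Parse calls are made here: the function distinctAscending(_:pageSize:) is hypothetical, each array slice stands in for the results of one query, and the sample ids are the ones from the question.

```swift
// Walk the rows one "page" at a time, collect the distinct values of the
// column of interest, and return them in ascending order.
func distinctAscending(_ rows: [Int], pageSize: Int = 1000) -> [Int] {
    var seen = Set<Int>()
    var start = 0
    while start < rows.count {
        let end = min(start + pageSize, rows.count)
        seen.formUnion(rows[start..<end])   // stand-in for one query's results
        start = end
    }
    return seen.sorted()
}

let parentIds = [790112, 790000, 790001, 790112, 790000, 790001]
let unique = distinctAscending(parentIds)
print(unique)
```

Using a Set means the number of distinct ids does not need to be known in advance.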
Another drastic solution is to move your data to another server that supports an ER database (for example on OpenShift, which keeps a free tier) and use a Parse background job to synchronize the elements from Parse to the ER db, so that you redirect client requests to the ER db instead of Parse.
Hope it helps

I have an app with similar requirements: display each company name only once, along with the number of items it has.
Here is how I implemented the solution on the client side (Swift 3); my database is relatively small:
var companyItemsCount = [Int: Int]() // maps each unique companyId to its item count
query.findObjectsInBackground(block: { (objects: [PFObject]?, error: Error?) in
    if let error = error {
        // Log details of the failure
        print("Error: \(error) \(error.localizedDescription)")
        return
    }
    // The find succeeded; count the items per company.
    for object in objects ?? [] {
        guard let companyId = object["companyId"] as? Int,
              let companyName = object["companyName"] as? String else { continue }
        if self.companyItemsCount[companyId] == nil {
            // First time this company is seen: record it once.
            self.companyNames.append(companyName)
            self.companyIds.append(companyId)
            self.companyItemsCount[companyId] = 1
        } else {
            self.companyItemsCount[companyId]! += 1
        }
    }
    // Reload once, after all objects have been processed.
    self.tableView.reloadData()
})
