I want to ask how I should handle a large list in Flutter. My app gets extremely slow whenever a search result sits deep in the list I'm searching. The list holds 70,000+ objects of a fairly large data structure.
This is how I am "searching" the list:
Future<Iterable<SomeDataStruct>> _getAllData() async {
  return allData.where((a) =>
      a.dataTitle.toLowerCase().contains(querySearch.toLowerCase().trim()));
}
I build the list using a ListView.builder inside a FutureBuilder.
When I search and the results come from deep inside the list, the app becomes extremely slow: I tap a list item and it takes a few seconds before its onTap fires. And if I need to change the search query, it takes a while for the soft keyboard to come back up after I tap the TextField.
Where am I making a mistake or handling this wrong? What should I do to keep my huge list searchable without making the app unbearable?
EDIT:
This is how I stopped it from slowing down the app after changing the code. Is this correct?
String tempQuery;
List<SomeDataStruct> searchResults = [];

Future<List<SomeDataStruct>> _getAllData() async {
  if (querySearch != tempQuery) {
    tempQuery = querySearch;
    searchResults = allData
        .where((a) =>
            a.dataTitle.toLowerCase().contains(querySearch.toLowerCase().trim()))
        .toList();
  }
  return searchResults;
}
Contains is expensive
Contains-queries are expensive because every entry needs to be checked at every position (up to value.length - searchTerm.length) to see whether the search term occurs there.
Limiting search support to the beginning of the string would already improve performance a lot. In addition, you could create helper data structures where the whole list of values is split into parts that share the same first character. If the chunks are still too big, another level could be added for the 2nd character. The lookup would be fast because there is only a limited number of characters.
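A rough sketch of that helper structure in Dart (SomeDataStruct and dataTitle are taken from the question; the rest of the names are made up):

class SomeDataStruct {
  final String dataTitle;
  SomeDataStruct(this.dataTitle);
}

// First-character index: each bucket holds only the items whose (lowercased)
// title starts with that character, so a prefix search scans one bucket.
final Map<String, List<SomeDataStruct>> byFirstChar = {};

void buildIndex(List<SomeDataStruct> allData) {
  for (final item in allData) {
    final title = item.dataTitle.toLowerCase();
    if (title.isEmpty) continue;
    byFirstChar.putIfAbsent(title[0], () => []).add(item);
  }
}

List<SomeDataStruct> prefixSearch(String query) {
  final q = query.toLowerCase().trim();
  if (q.isEmpty) return [];
  final bucket = byFirstChar[q[0]] ?? [];
  return bucket.where((a) => a.dataTitle.toLowerCase().startsWith(q)).toList();
}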
Using a database might take some of that programming work (maintaining indexes) off your hands. A database like SQLite could be used with indexes specialized to your kind of queries.
Split into smaller chunks of work to allow the framework to do its work
If you can't limit yourself to a "beginning of string" search, you could still split the data structure into smaller chunks and invoke the search for each chunk asynchronously. This way the UI gets "some air to breathe" and can re-render before the next chunk is searched, and the search result is updated incrementally.
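A hedged sketch of that chunked approach (the chunk size and the onPartial callback are made up; allData and SomeDataStruct come from the question):

Future<List<SomeDataStruct>> searchInChunks(
    String query, void Function(List<SomeDataStruct>) onPartial) async {
  const chunkSize = 5000; // tune by measuring
  final q = query.toLowerCase().trim();
  final results = <SomeDataStruct>[];
  for (var start = 0; start < allData.length; start += chunkSize) {
    final end =
        start + chunkSize < allData.length ? start + chunkSize : allData.length;
    results.addAll(allData
        .getRange(start, end)
        .where((a) => a.dataTitle.toLowerCase().contains(q)));
    onPartial(List.of(results)); // incremental UI update
    await Future<void>.delayed(Duration.zero); // yield so the UI can repaint
  }
  return results;
}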
Move work off the UI thread
Another way would be to start up another isolate and do the search there. Another isolate can run on another CPU (core) and therefore would not block the UI thread when searching. This way it wouldn't be necessary to split into chunks. It might still be advantageous though to incrementally update the UI instead of keeping the user waiting until the whole search result becomes available.
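For example, with Flutter's compute() helper (a convenience wrapper around isolate spawning; the callback must be a top-level or static function, and here only plain strings are sent between isolates, since isolates copy messages rather than share memory):

import 'package:flutter/foundation.dart';

// Runs in a separate isolate; receives a [titles, query] pair as one message.
List<String> searchTitles(List<Object> args) {
  final titles = args[0] as List<String>;
  final q = (args[1] as String).toLowerCase().trim();
  return titles.where((t) => t.toLowerCase().contains(q)).toList();
}

Future<List<String>> searchOffUiThread(List<String> titles, String query) {
  return compute(searchTitles, <Object>[titles, query]);
}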
See also
https://api.dartlang.org/stable/2.2.0/dart-isolate/dart-isolate-library.html
https://pub.dartlang.org/packages/isolate
https://codingwithjoe.com/dart-fundamentals-isolates/
Caching
It might also help performance to keep search results in memory. For example, if the user enters foo and then presses backspace, you could reuse the previously calculated result for fo; that only helps in some cases, though.
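A minimal sketch of such a result cache, keyed by the normalized query (note it shares the weakness discussed in the memory-exhaustion sense: it grows with every distinct query ever entered, so you may want to cap it):

final Map<String, List<SomeDataStruct>> resultCache = {};

List<SomeDataStruct> searchCached(String query) {
  final q = query.toLowerCase().trim();
  // Reuse the result if this exact query was searched before (e.g. after
  // the user presses backspace); otherwise compute and remember it.
  return resultCache.putIfAbsent(
      q,
      () => allData
          .where((a) => a.dataTitle.toLowerCase().contains(q))
          .toList());
}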
Measuring
Another important point, of course, is benchmarking. Whatever you try in order to improve performance, create benchmarks to learn which measures have what effect and whether they're worth it. You'll learn a lot about your scenario, your data, Dart, ..., and this will allow you to make good decisions.
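In Dart that can be as simple as a Stopwatch around the search (allData and querySearch are the question's names; run in release/AOT mode, since debug-mode numbers are misleading):

final q = querySearch.toLowerCase().trim();
final sw = Stopwatch()..start();
final hits = allData
    .where((a) => a.dataTitle.toLowerCase().contains(q))
    .toList();
sw.stop();
print('found ${hits.length} matches in ${sw.elapsedMilliseconds} ms');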
Searching a list will crash your app if the list is large enough. I tested searching through a list of 150,000 items and my app crashed. It seems to work if what you're searching for is near the start of the list, but if you search for something deep in the list, for example around index 100,000, things slow down considerably and the app freezes.
To solve this I moved to a database solution using the moor library, which is built on top of SQLite. You simply follow the docs to set up the boilerplate; it doesn't take long.
You can create a simple database table with just two fields: an id and another to hold your list values. For this example I call the second field itemName.
In your database class you can have a function to do the filtering e.g.
Future<List<Item>> getFilteredItems(String search) => (select(items)..where((t) => t.itemName.like(search))).get();
Then your code will look something like:
String tempQuery;
List<SomeDataStruct> searchResults = [];

Future<List<SomeDataStruct>> _getAllData() async {
  if (querySearch != tempQuery) {
    tempQuery = querySearch;
    searchResults = []; // drop the previous query's results
    final rows = await MyDatabaseClass().getFilteredItems("%$querySearch%");
    for (int i = 0; i < rows.length; i++) {
      // Wrap the row's itemName back into the app's model class
      // (constructor assumed; adapt to your SomeDataStruct).
      searchResults.add(SomeDataStruct(rows[i].itemName));
    }
  }
  return searchResults;
}
No more crashes or freezes after this.
Maybe this way:
ListView.builder(
  itemCount: data.length, // required so the builder knows where the list ends
  itemBuilder: (BuildContext context, int index) {
    return Text(data[index]);
  },
)
Related
I'm designing the UI for a Lua program, one element of which requires the user to either select an existing value from a master table or create a new value in that table.
I would normally use an IUP list with EDITBOX = "YES".
However, the number of items that the user can select may run into many hundreds or possibly thousands, and the performance when populating the list in IUP (and also when selecting from it) is unacceptably slow. I cannot control the number of items in the table.
My current thinking is to create a list with an editbox, but without any values. As the user types into the editbox (after perhaps 2-3 characters) the list would populate with the subset of table items that start with the characters typed. The user could then select an item from the list or keep typing to narrow the options or create a new item.
For this to work, I need to be able to create a new table with the items from the master table that start with the entered characters.
One option would be to iterate through the master table using the Penlight 'startswith' function to create the new table:
require "pl.init"
local subtable = {} --empty result table
local startstring = "xyz" -- will actually be set by the iup control
for _, v in ipairs (mastertable) do
if stringx.startswith(v, startstring) then
table.insert(subtable,v)
end
end
However, I'm worried about the performance of doing that if the master table is huge. Is there a more efficient way to code this, or a different way I could implement the UI?
There are various approaches you can take to improve the big-O performance of your prefix search, at the cost of increased code complexity; that said, given the size of your dataset (thousands of items) and the intended use (triggered by user interaction, rather than e.g. game logic that needs to run every frame), I think a simple linear search over the options is almost certainly going to be fast enough.
To test this theory, I timed the following code:
local dict = {}
for word in io.lines('/usr/share/dict/words') do
  table.insert(dict, word)
end

local matched = {}
local search = "^" .. (...)
for _, word in ipairs(dict) do
  if word:match(search) then
    table.insert(matched, word)
  end
end
print('Found '..#matched..' words.')
I used /usr/bin/time -v and tried it with both lua 5.2 and luaJIT.
Note that this is fairly pessimistic compared to your code:
- no attempt is made to localize library functions that are repeatedly called, or to use # instead of table.insert
- the timing includes not just the search but also the cost of loading the dictionary into memory in the first place
- string.match is almost certainly slower than stringx.startswith
- the dictionary contains ~100k entries rather than the "hundreds to thousands" you expect in your application
Even with all those caveats, it costs 50-100ms in lua5.2 and 30-50ms in luaJIT, over 50 runs.
If I use os.clock() to time the actual search, it consistently costs about 10ms in Lua 5.2 and 3-4ms in LuaJIT.
Now, this is on a fairly fast laptop (Core i7), but also non-optimized code running on a dataset 10-100x larger than you expect to process; given that, I suspect that the naïve approach of just looping over the entries calling startswith will be plenty fast for your purposes, and result in code that's significantly simpler and easier to debug.
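If the linear scan ever does become too slow, here is a sketch of the kind of big-O improvement mentioned at the top, bucketing by first character (plain Lua, no extra libraries; names are illustrative):

-- Build once: bucket the master table by first character.
local buckets = {}
for _, v in ipairs(mastertable) do
  local c = v:sub(1, 1)
  buckets[c] = buckets[c] or {}
  table.insert(buckets[c], v)
end

-- Search only the bucket matching the prefix's first character.
local function prefix_search(prefix)
  local results = {}
  for _, v in ipairs(buckets[prefix:sub(1, 1)] or {}) do
    if v:sub(1, #prefix) == prefix then
      table.insert(results, v)
    end
  end
  return results
end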
I'm scanning through the order list using the standard OrderSelect() function. Since there is a handy function to get the current symbol for an order (_Symbol), I expected to find the equivalent for the timeframe (_Period). However, there is no such function.
Here's my code snippet.
...
for (int i = orderCount() - 1; i >= 0; i--) {
   if (OrderSelect(i, SELECT_BY_POS, MODE_TRADES)) {
      if (OrderMagicNumber() == magic && OrderSymbol() == _Symbol) j++;
      // Get the timeframe here
   }
}
...
Q: How can I get an open order's timeframe given its ticket number?
In other words, how can I roll my own OrderPeriod() or something like it?
There is no such function. Two approaches might be helpful here.
First, and most reasonable, is to have a unique magic number for each timeframe. This usually helps to avoid unexpected behavior and errors. You can update the input magic number so that the timeframe is automatically added to it: if your input magic is 123 and the timeframe is M5, the new magic number will be 1235 or something similar, and you will use this new magic when sending orders and when checking whether a particular order belongs to your timeframe. Or make it depend on both the input magic and the timeframe, if you need that.
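A hypothetical sketch of that encoding (the multiplier and names are illustrative; note MN1 charts have _Period == 43200, so leave enough digits for the timeframe):

input int InpMagic = 123;

// Encode the chart timeframe into the magic number,
// e.g. 123 on M5 (_Period == 5) -> 12300005.
int MagicForCurrentTimeframe()
  {
   return(InpMagic * 100000 + _Period);
  }

// Use the timeframe-aware magic when filtering selected orders.
bool IsMyOrder()
  {
   return(OrderMagicNumber() == MagicForCurrentTimeframe()
          && OrderSymbol() == _Symbol);
  }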
The second approach is to create a comment for each order that includes the timeframe, e.g. "myRobot_5", and parse OrderComment() to get the timeframe value. I doubt it makes sense, as you'd have to do useless string parsing many times per tick. Another problem here is that the comment can usually be changed by the broker, e.g. if a stop loss or take profit is executed (and you need to analyze history), or if an order was partially closed.
One more way is to have instances of some structure, or of a class inherited from CObject, and keep them in a CArrayObj or an array. You can add as much data as needed to such structures, and even change the timeframe when required (e.g., you opened a deal at M5, trail it at M5, it performs well, so you close part of it and virtually move the deal to M15 and trail it on the M15 chart). That is probably the most convenient approach for complex systems, even though it requires some coding (do not forget to serialize the list of existing deals into a file in OnDeinit() and deserialize it back in OnInit()).
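A rough sketch of such a record, using the standard library's CObject/CArrayObj (the fields are illustrative, not a full implementation):

#include <Object.mqh>
#include <Arrays\ArrayObj.mqh>

// Per-deal record carrying metadata the order itself cannot hold.
class CDealInfo : public CObject
  {
public:
   int               ticket;
   ENUM_TIMEFRAMES   timeframe;   // can be changed later, e.g. M5 -> M15
                     CDealInfo(const int t, const ENUM_TIMEFRAMES tf)
                        : ticket(t), timeframe(tf) {}
  };

CArrayObj deals;   // owns the CDealInfo instances; persist it in OnDeinit()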
The following exercise comes from p. 234 of Ierusalimschy's Programming in Lua (4th edition). (NB: Earlier in the book, the author explicitly rejects the word memoization, and insists on using memorization instead. Keep this in mind as you read the excerpt below.)
Exercise 23.3: Imagine you have to implement a memorizing table for a function from strings to strings. Making the table weak will not do the removal of entries, because weak tables do not consider strings as collectable objects. How can you implement memorization in that case?
I am stumped!
Part of my problem is that I have not been able to devise a way to bring about the (garbage) collection of a string.
In contrast, with a table, I can equip it with finalizer that will report when the table is about to be collected. Is there a way to confirm that a given string (and only that string) has been garbage-collected?
Another difficulty is simply figuring out what the desired function's specification is. The best I can do is to figure out what it isn't. Earlier in the book (p. 225), the author gave the following example of a "memorizing" function:
Imagine a generic server that takes requests in the form of strings with Lua code. Each time it gets a request, it runs load on the string, and then calls the resulting function. However, load is an expensive function, and some commands to the server may be quite frequent. Instead of calling load repeatedly each time it receives a common command like "closeconnection()", the server can memorize the results from load using an auxiliary table. Before calling load, the server checks in the table whether the given string already has a translation. If it cannot find a match then (and only then) the server calls load and stores the result into the table. We can pack this behavior in a new function:
[standard memo(r)ized implementation omitted; see variant using a weak-value table below]
The savings with this scheme can be huge. However, it may also cause unsuspected waste. Although some commands repeat over and over, many other commands happen only once. Gradually, the ["memorizing"] table results accumulates all commands the server has ever received plus their respective codes; after enough time, this behavior will exhaust the server's memory.
A weak table provides a simple solution to this problem. If the results table has weak values, each garbage-collection cycle will remove all translations not in use at that moment (which means virtually all of them)1:
local results = {}
setmetatable(results, {__mode = "v"})  -- make values weak

function mem_loadstring (s)
  local res = results[s]
  if res == nil then        -- results not available?
    res = assert(load(s))   -- compute new results
    results[s] = res        -- save for later reuse
  end
  return res
end
As the original problem statement notes, this scheme won't work when the function to be memo(r)ized returns strings, because the garbage collector does not treat strings as "collectable".
Of course, if one is allowed to change the desired function's interface so that instead of returning a string, it returns a singleton table whose sole item is the real result string, then the problem becomes almost trivial, but I find it hard to believe that the author had such a crude "solution" in mind2.
In case it matters, I am using Lua 5.3.
1 As an aside, if the rationale for memo(r)ization is to avoid invoking load more often than necessary, the scheme proposed by the author does not make sense to me. It seems to me that this scheme is based on the assumption (a heuristic, really) that a translation that is used frequently, and thus would pay to memo(r)ize, is also one that is always reachable (and hence not collectable). I don't see why this should necessarily, or even likely, be the case.
2 One may be able to put lipstick on this pig in the form of a __tostring method that would allow the table (the one returned by the memo(r)ized function) to masquerade as a string in certain contexts; it's still a pig, though.
Your idea is correct: wrap the string in a table (because a table is collectable).
function memorize (func_from_string_to_string)
  local cached = {}
  setmetatable(cached, {__mode = "v"})
  return function(s)
    local c = cached[s] or {func_from_string_to_string(s)}
    cached[s] = c
    return c[1]
  end
end
And I see no pigs in this solution :-)
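A quick usage sketch (slow_upper is a made-up stand-in for the expensive string-to-string function):

local function slow_upper(s)
  return s:upper()   -- stand-in for an expensive computation
end

local fast_upper = memorize(slow_upper)
print(fast_upper("hello"))   -- computed and cached
print(fast_upper("hello"))   -- served from the weak cache (until a GC cycle)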
one that is always reachable (and hence not collectable). I don't see why this should necessarily, or even likely, be the case.
There will be no "always reachable" items in a weak table.
But most frequent items will be recalculated only once per GC cycle.
The ideal solution (never collect frequently used items) would require more complex implementation.
For example, you could move items from a normal cache to a weak cache once an item's "inactivity timer" reaches some threshold.
So, for instance, given I have the following class
class Foo {
  String id;
  Foo(this.id);
}
I want to have some sort of collection of Foos, and then be able to find any Foo by its id. I want to compare these two ways of accomplishing this:
With a Map:
var foosMap = <String, Foo>{"foo1": new Foo("foo1"), "foo2": new Foo("foo2")};
var foo2 = foosMap["foo2"];
With a List:
var foosList = <Foo>[new Foo("foo1"), new Foo("foo2")];
var foo2 = foosList.singleWhere((i) => i.id == "foo2");
Is the first way (with a Map) preferable in terms of performance? Are there any other considerations to take into account?
It really depends on the number of items you're searching through. If you know big-O notation, retrieving a value from a map is O(1), or constant time, while searching linearly through a list is O(n), or linear time. That means the lookup time for a map is the same no matter how many elements are in it, but the lookup time for a list grows with the number of elements.
Because of this, a lot of programmers use hash maps for everything, when for very small sets lists are often faster. If you ever look at performance-critical code that does lookups, you'll sometimes see special cases that switch to lists instead of maps for small sets. The only way to know whether this is a good strategy is to do performance testing.
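A minimal sketch of such a test in Dart (the size and ids are made up; vary n to see where the crossover lands on your machine, and run in AOT/release mode):

class Foo {
  final String id;
  Foo(this.id);
}

void main() {
  const n = 100000;
  final foosMap = <String, Foo>{};
  final foosList = <Foo>[];
  for (var i = 0; i < n; i++) {
    final foo = Foo('foo$i');
    foosMap[foo.id] = foo;
    foosList.add(foo);
  }

  // Worst case for the list: the target sits at the very end.
  final sw = Stopwatch()..start();
  final fromMap = foosMap['foo${n - 1}'];
  print('map lookup:  ${sw.elapsedMicroseconds} us (${fromMap?.id})');

  sw..reset()..start();
  final fromList = foosList.firstWhere((f) => f.id == 'foo${n - 1}');
  print('list search: ${sw.elapsedMicroseconds} us (${fromList.id})');
}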
Speed isn't everything though, and I prefer the clarity of the map syntax in many situations, assuming you have a map already. If you would have to build a map just to perform a lookup, then singleWhere() or firstWhere() are great.
I'm pretty sure this is a silly newbie question but I didn't know it so I had to ask...
Why do we use data structures, like Linked List, Binary Search Tree, etc? (when no dynamic allocation is needed)
I mean: wouldn't it be faster if we kept a single variable for a single object? Wouldn't that speed up access time? Eg: BST possibly has to run through some pointers first before it gets to the actual data.
Except for when dynamic allocation is needed, is there a reason to use them?
Eg: using linked list/ BST / std::vector in a situation where a simple (non-dynamic) array could be used.
Each thing you are storing is kept in its own variable (or storage location). Data structures apply organization to your data. Imagine you had 10,000 things to track. You could store them in 10,000 separate variables, but if you did, you'd always be limited to 10,000 different things. If you wanted more, you'd have to modify your program and recompile it each time you wanted to increase the number. You might also have to modify the way the calculations are done whenever the order of the items changes because a new one is introduced in the middle.
Using data structures, from simple arrays to more complex trees, hash tables, or custom data structures, allows your code to be both more organized and more extensible. Using an array, which can either be created to hold the required number of elements or extended to hold more after it's first created, keeps you from having to rewrite your code each time the number of data items changes. Using an appropriate data structure allows you to design algorithms based on the relationships between the data elements rather than some fixed ordering, giving you more flexibility.
A simple analogy might help. You could, for example, organize all of your important papers by putting each of them into a separate filing cabinet. If you did that, you'd have to memorize (i.e., hard-code) the cabinet in which each item can be found in order to use them effectively. Alternatively, you could store them all in the same filing cabinet (like a generic array). This is better in that they're all in one place, but still not optimal, since you have to search through them all each time you want to find one. Better yet would be to organize them by subject, putting like subjects in the same file folder (separate arrays, different structures). That way you can look for the file folder for the correct subject, then find the item you're looking for in it. Depending on your needs, you can use different filing methods (data structures/algorithms) to better organize your information for its intended use.
I'll also note that there are times when it does make sense to use individual variables for each data item you are using. Frequently there is a mixture of individual variables and more complex structures, using the appropriate method depending on the use of the particular item. For example, you might store the sum of a collection of integers in a variable while the integers themselves are stored in an array. A program would need to be pretty simple though before the introduction of data structures wouldn't be appropriate.
Sorry, but you didn't just find a great new way of doing things ;) There are several huge problems with this approach.
How could this be done without requiring programmers to massively (and nontrivially) rewrite tons of code as soon as the number of allowed items changes? Even when you have to fix your data structure sizes at compile time (e.g. arrays in C), you can use a constant. Then, changing a single constant and recompiling is sufficient to change that size (if the code was written with this in mind). With your approach, we'd have to retype hundreds or even thousands of lines every time some size changes. Not to mention that all this code would be incredibly hard to read, write, maintain and verify. The old truism "more lines of code = more space for bugs" is taken up to eleven in such a setting.
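For instance, in C# (a hypothetical sketch; the point is that the size lives behind one constant):

using System;

class Program
{
    const int MaxItems = 10000; // change here and recompile; nothing else moves

    static void Main()
    {
        int[] items = new int[MaxItems];
        for (int i = 0; i < MaxItems; i++)
            items[i] = i * 2;
        Console.WriteLine($"last item: {items[MaxItems - 1]}");
    }
}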
Then there's the fact that the number is almost never set in stone. Even when it is a compile time constant, changes are still likely. Writing hundreds of lines of code for a minor (if it exists at all) performance gain is hardly ever worth it. This goes thrice if you'd have to do the same amount of work again every time you want to change something. Not to mention that it isn't possible at all once there is any remotely dynamic component in the size of the data structures. That is to say, it's very rarely possible.
Also consider the concept of implicit and succinct data structures. If you use a set of hard-coded variables instead of abstracting over the size, you still got a data structure. You merely made it implicit, unrolled the algorithms operating on it, and set its size in stone. Philosophically, you changed nothing.
But surely it has a performance benefit? Well, possible, although it will be tiny. But it isn't guaranteed to be there. You'd save some space on data, but code size would explode. And as everyone informed about inlining should know, small code sizes are very useful for performance to allow the code to be in the cache. Also, argument passing would result in excessive copying unless you'd figure out a trick to derive the location of most variables from a few pointers. Needless to say, this would be nonportable, very tricky to get right even on a single platform, and liable to being broken by any change to the code or the compiler invocation.
Finally, note that a weaker form is sometimes done. The Wikipedia page on implicit and succinct data structures has some examples. On a smaller scale, some data structures store much data in one place, such that it can be accessed with less pointer chasing and is more likely to be in the cache (e.g. cache-aware and cache-oblivious data structures). It's just not viable for 99% of all code and taking it to the extreme adds only a tiny, if any, benefit.
The main benefit of data structures, in my opinion, is that you are grouping your data relationally. For instance, instead of having 10 separate variables of class MyClass, you can have one data structure that groups them all. This grouping allows certain operations to be performed because the items are structured together.
Not to mention, data structures can help enforce type safety, which is powerful and necessary in many cases.
And last but not least, what would you rather do?
string string1 = "string1";
string string2 = "string2";
string string3 = "string3";
string string4 = "string4";
string string5 = "string5";
Console.WriteLine(string1);
Console.WriteLine(string2);
Console.WriteLine(string3);
Console.WriteLine(string4);
Console.WriteLine(string5);
Or...
List<string> myStringList = new List<string>() { "string1", "string2", "string3", "string4", "string5" };
foreach (string s in myStringList)
    Console.WriteLine(s);