To optimize my application, is a Dictionary a better option than an ArrayCollection when both satisfy my needs? Please let me know. Thanks in advance.
In fact, ArrayCollection, Array, and Dictionary all act on the same kind of internal data structure, i.e. a hash map (or something similar).
It's just that an ArrayCollection has some pre-initialized memory slots, unlike a Dictionary.
Also, ArrayCollections are easier to deal with in case you need to iterate or do operations like shift, unshift, etc.
So the question you should be asking is: what operations do I need to do on the data? Then decide which one you need to choose.
I have a function that predicts the word being typed and returns the possibilities in an array. Unfortunately those aren't sorted by frequency of use. So I have a list of 10K words ordered from most frequent to least frequent. What would be an efficient way to compare the words in the array against the ordered list and return the most frequent one (i.e. the one the list encounters first)?
I was tipped off by a friend to use a binary search tree, but I really don't see how that helps me. From what I understood from the following website, only numerical values can be used. Am I wrong in thinking so? Is there a better way of doing the aforementioned task?
Thanks in advance
You could create a dictionary with words as keys and frequencies as values. Then iterate over your result array, use the dictionary to obtain the frequency value for each item, and predict the item with the highest frequency.
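The lookup is the same in any language; here is a minimal sketch in Delphi (the WordRank map and the MostFrequent name are mine), assuming the 10K list has been loaded into a dictionary mapping each word to its position in the frequency-ordered list:

uses
  System.Generics.Collections;

// Hedged sketch: WordRank maps each word to its position in the
// frequency-ordered list (0 = most frequent), so the most frequent
// candidate is the one with the lowest rank.
function MostFrequent(const Candidates: array of string;
  WordRank: TDictionary<string, Integer>): string;
var
  W: string;
  Best, Rank: Integer;
begin
  Result := '';
  Best := MaxInt;
  for W in Candidates do
    if WordRank.TryGetValue(W, Rank) and (Rank < Best) then
    begin
      Best := Rank;
      Result := W;
    end;
end;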
I wouldn't use a vanilla binary search tree here. It would be possible - as Taylor Kirkpatrick says, you could just create a tree with words as keys and frequencies as values, and use that to find the frequency for each result word, in much the same way as the dictionary solution.
The problem is that you cannot guarantee that a simple binary tree will be balanced. From the sound of it your data would probably be OK, since your words are in frequency order. The worst case would be if the words were in alphabetical order - then your binary tree would end up being identical to a linked list: it would never branch, since every node would attach to the right of the previous one. So the computational complexity of a search would be the same as iterating over the array of words - O(n) instead of O(log n), which is the best case for binary trees.
Of course, you could guard against this by randomising the list of words before doing the insert. But to my mind it's just easier to use a dictionary. I don't know what the actual implementation of Swift dictionaries is (and we won't know until they open-source it in a couple of months), but you can take it as read that it will outperform a vanilla BST for value retrieval.
I don't know what the background to this problem is - if you are learning CS it might be worth implementing the BST just for intellectual growth - in this case, with only 10,000 items you might find the performance differences are ultimately quite small. But if you are a working programmer trying to solve a problem, go with the dictionary approach.
You put all your words into a dictionary or a set. That's it. Dictionary if you have data associated with the words, set if you have no data and just want to know if the word is in the list or not.
You might want to use a Trie.
Put your word list into it. For every character entered, you traverse the Trie as deeply as you can and then show all paths to leaf nodes as possible completions.
Since the word list you have is likely static, you can precompute the Trie and load it from disk/network/whatever at startup if performance is a concern.
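For illustration, a hedged sketch of such a Trie in Delphi (all names are mine): Insert adds one node per character, and Completions walks every path below the node reached by the prefix:

uses
  System.Classes, System.Generics.Collections;

type
  TTrieNode = class
  public
    Children: TDictionary<Char, TTrieNode>;
    IsWord: Boolean;
    constructor Create;
    destructor Destroy; override;
  end;

constructor TTrieNode.Create;
begin
  Children := TDictionary<Char, TTrieNode>.Create;
end;

destructor TTrieNode.Destroy;
var
  Child: TTrieNode;
begin
  for Child in Children.Values do
    Child.Free;
  Children.Free;
  inherited;
end;

// One node per character; mark the node that ends a word.
procedure Insert(Root: TTrieNode; const Word: string);
var
  Node, Next: TTrieNode;
  C: Char;
begin
  Node := Root;
  for C in Word do
  begin
    if not Node.Children.TryGetValue(C, Next) then
    begin
      Next := TTrieNode.Create;
      Node.Children.Add(C, Next);
    end;
    Node := Next;
  end;
  Node.IsWord := True;
end;

// Walk to the node reached by Prefix, then collect every word below it.
procedure Completions(Root: TTrieNode; const Prefix: string; Results: TStrings);

  procedure Walk(Node: TTrieNode; const SoFar: string);
  var
    Pair: TPair<Char, TTrieNode>;
  begin
    if Node.IsWord then
      Results.Add(SoFar);
    for Pair in Node.Children do
      Walk(Pair.Value, SoFar + Pair.Key);
  end;

var
  Node, Next: TTrieNode;
  C: Char;
begin
  Node := Root;
  for C in Prefix do
  begin
    if not Node.Children.TryGetValue(C, Next) then
      Exit; // prefix not in the trie
    Node := Next;
  end;
  Walk(Node, Prefix);
end;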
You can use a binary search tree with anything as the actual value. To make use of the tree here, use the frequency of the words as the numeric value. This is a pretty good solution to your problem: each node of the tree will contain a word and a numeric value that represents the frequency of the word.
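A minimal sketch of that idea in Delphi (the names are mine), with the word as the key and its frequency as the payload; note that without balancing this has the worst-case behaviour described above:

uses
  System.SysUtils;

type
  // Hedged sketch: a plain (unbalanced) BST keyed on the word,
  // carrying the word's frequency as payload.
  PWordNode = ^TWordNode;
  TWordNode = record
    Word: string;
    Freq: Integer;
    Left, Right: PWordNode;
  end;

function Insert(Node: PWordNode; const Word: string; Freq: Integer): PWordNode;
begin
  if Node = nil then
  begin
    New(Node);
    Node^.Word := Word;
    Node^.Freq := Freq;
    Node^.Left := nil;
    Node^.Right := nil;
  end
  else if CompareStr(Word, Node^.Word) < 0 then
    Node^.Left := Insert(Node^.Left, Word, Freq)
  else if CompareStr(Word, Node^.Word) > 0 then
    Node^.Right := Insert(Node^.Right, Word, Freq);
  Result := Node;
end;

// Returns the frequency of Word, or -1 if it is not in the tree.
function Find(Node: PWordNode; const Word: string): Integer;
var
  C: Integer;
begin
  Result := -1;
  while Node <> nil do
  begin
    C := CompareStr(Word, Node^.Word);
    if C = 0 then
      Exit(Node^.Freq)
    else if C < 0 then
      Node := Node^.Left
    else
      Node := Node^.Right;
  end;
end;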
Here are a few links to help you out with making it.
Hope that helps.
I have an application that stores many strings in a TStringList. The strings will be largely similar to one another, and it occurs to me that one could compress them on the fly - i.e. store a given string as a mixture of unique text fragments plus references to previously stored fragments. String lists such as lists of fully-qualified paths and file names should be highly compressible.
Does anyone know of a TStringList descendant that implements this - i.e. provides read and write access to the uncompressed strings but stores them internally compressed, so that TStringList.SaveToFile produces a compressed file?
While you could implement this by decompressing the entire string list before each access and re-compressing it afterwards, it would be unnecessarily slow. I'm after something that is efficient for incremental operations and random "seeks" and reads.
TIA
Ross
I don't think there's any freely available implementation around for this (not that I know of anyway, although I've written at least 3 similar constructs in commercial code), so you'd have to roll your own.
The remark Marcelo made about adding items in order is very relevant, as I suppose you'll probably want to compress the data at addition time - having quick access to entries already similar to the one being added gives much better performance than having to look up a 'best fit' entry (needed for similarity compression) over the entire set.
Another thing you might want to read up on is 'ropes' - a conceptually different type from strings, which I suggested to Marco Cantu a while back. At the cost of a next-pointer per 'twine' (for lack of a better word), you can concatenate parts of a string without keeping any duplicate data around. The main problem is how to retrieve the parts that can be combined into a new 'rope' representing your original string. Once that problem is solved, you can reconstruct the data as a string at any time, while still having compact storage.
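For illustration, a minimal sketch of the 'twine' idea in Delphi (the names are mine): each twine holds a fragment plus a pointer to the next one, and fragments can be shared rather than duplicated:

type
  // Hedged sketch; 'twine' as described above.
  PTwine = ^TTwine;
  TTwine = record
    Piece: string; // a fragment, possibly shared between ropes
    Next: PTwine;
  end;

// Reconstruct the original string by walking the chain of pieces.
function RopeToString(Rope: PTwine): string;
begin
  Result := '';
  while Rope <> nil do
  begin
    Result := Result + Rope^.Piece;
    Rope := Rope^.Next;
  end;
end;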
If you don't want to go the 'rope' route, you could also try something called 'prefix reduction', which is a simple form of compression: just start each string with the index of a previous string and the number of characters that should be treated as a prefix for the new string. Be aware that you should not let this recurse too far back, or access speed will suffer greatly. In one simple implementation, I did a mod 16 on the index to establish the entry at which prefix reduction started, which gave me on average about 40% memory savings (this number is completely data-dependent, of course).
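A hedged sketch of that encoding in Delphi (all names are mine): each entry records which earlier entry it borrows a prefix from, how long that prefix is, and the suffix that differs:

type
  TPrefixEntry = record
    BaseIndex: Integer; // earlier entry we borrow a prefix from; -1 = none
    PrefixLen: Integer; // number of characters shared with the base entry
    Suffix: string;     // the remainder that differs
  end;

function CommonPrefixLen(const A, B: string): Integer;
begin
  Result := 0;
  while (Result < Length(A)) and (Result < Length(B)) and
        (A[Result + 1] = B[Result + 1]) do
    Inc(Result);
end;

// Encode NewStr against an earlier, fully expanded string.
function Encode(BaseIndex: Integer; const BaseStr, NewStr: string): TPrefixEntry;
begin
  Result.BaseIndex := BaseIndex;
  Result.PrefixLen := CommonPrefixLen(BaseStr, NewStr);
  Result.Suffix := Copy(NewStr, Result.PrefixLen + 1, MaxInt);
end;

// Decoding needs the expanded base string back; to bound that chain,
// store a full string every N entries (the "mod 16" trick above).
function Decode(const Entry: TPrefixEntry; const BaseStr: string): string;
begin
  Result := Copy(BaseStr, 1, Entry.PrefixLen) + Entry.Suffix;
end;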
You could try to wrap a Delphi or COM API around Judy arrays. The JudySL type would do the trick, and has a fairly simple interface.
EDIT: I assume you are storing unique strings and want to (or are happy to) store them in lexicographical order. If these constraints aren't acceptable, then Judy arrays are not for you. Mind you, any compression system will suffer if you don't sort your strings.
I suppose you expect general flexibility from the list (including the delete operation); in that case I don't know of any out-of-the-box solution, but I'd suggest one of two approaches:

1. Split each string into words and keep a separate, growing dictionary to reference the words, storing each string internally as a list of word indexes (sketched after this list).

2. Implement something based on the zlib stream available in Delphi, but operating on blocks that each contain, for example, 10-100 strings. In this case you still have to decompress/recompress a complete block, but the "price" you pay is lower.
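A hedged sketch of the first approach (all names are mine): every distinct word is stored once in a growing word list, and each string is kept as an array of word indexes. For path names you would split on '\' rather than spaces.

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

type
  TWordCodec = class
  private
    FWords: TStringList;                  // index -> word
    FIndex: TDictionary<string, Integer>; // word -> index
  public
    constructor Create;
    destructor Destroy; override;
    function Encode(const S: string): TArray<Integer>;
    function Decode(const Codes: TArray<Integer>): string;
  end;

constructor TWordCodec.Create;
begin
  FWords := TStringList.Create;
  FIndex := TDictionary<string, Integer>.Create;
end;

destructor TWordCodec.Destroy;
begin
  FIndex.Free;
  FWords.Free;
  inherited;
end;

function TWordCodec.Encode(const S: string): TArray<Integer>;
var
  Parts: TArray<string>;
  I, Idx: Integer;
begin
  Parts := S.Split([' ']);
  SetLength(Result, Length(Parts));
  for I := 0 to High(Parts) do
  begin
    if not FIndex.TryGetValue(Parts[I], Idx) then
    begin
      Idx := FWords.Add(Parts[I]); // new word, store it once
      FIndex.Add(Parts[I], Idx);
    end;
    Result[I] := Idx;
  end;
end;

function TWordCodec.Decode(const Codes: TArray<Integer>): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(Codes) do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + FWords[Codes[I]];
  end;
end;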
I don't think you really want to compress TStrings items in memory, because it is terribly inefficient. I suggest you look at the TStream implementations in the ZLib unit. Just wrap a regular stream in a TDecompressionStream on load and a TCompressionStream on save (you can even emit a gzip header there).
Hint: you will want to override LoadFromStream/SaveToStream instead of LoadFromFile/SaveToFile
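A minimal sketch of that hint (the class name is mine); TCompressionStream and TDecompressionStream live in the ZLib unit (System.ZLib in newer Delphi versions). Note that LoadFromStream decompresses into a memory stream first, because TStrings.LoadFromStream needs to know the stream size, which a TDecompressionStream cannot report:

uses
  System.Classes, System.ZLib;

type
  TCompressedStringList = class(TStringList)
  public
    procedure SaveToStream(Stream: TStream); override;
    procedure LoadFromStream(Stream: TStream); override;
  end;

procedure TCompressedStringList.SaveToStream(Stream: TStream);
var
  Z: TCompressionStream;
begin
  Z := TCompressionStream.Create(clDefault, Stream);
  try
    inherited SaveToStream(Z);
  finally
    Z.Free; // flushes the remaining compressed data
  end;
end;

procedure TCompressedStringList.LoadFromStream(Stream: TStream);
var
  Z: TDecompressionStream;
  Buf: TMemoryStream;
  Chunk: array[0..65535] of Byte;
  Count: Integer;
begin
  Buf := TMemoryStream.Create;
  try
    Z := TDecompressionStream.Create(Stream);
    try
      repeat // decompress in chunks into a seekable memory stream
        Count := Z.Read(Chunk, SizeOf(Chunk));
        if Count > 0 then
          Buf.WriteBuffer(Chunk, Count);
      until Count = 0;
    finally
      Z.Free;
    end;
    Buf.Position := 0;
    inherited LoadFromStream(Buf);
  finally
    Buf.Free;
  end;
end;

With this in place, SaveToFile and LoadFromFile produce and consume compressed files, since they route through the stream methods.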
I have an application that searches for files on the computer (configurable path, type, etc.). Currently it adds information to a database as soon as a matching file is found. Rather than that, I want to hold the information in memory for further manipulation before inserting it into the database. The list may contain a lot of items, so I consider performance an important factor. I may need to iterate through the items, so a structure that can be coded easily is another key issue. And how can I achieve PHP-style associative arrays for this job?
If you're using Delphi 2009, you can use a TDictionary. It takes two generic parameters. The first should be a string, for the file name, and the second would be whatever data type you're associating with it. It also has three built-in enumerators, one for key-value pairs, one for keys only and one for values only, which makes iterating easy.
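A minimal sketch (the TFileInfo record and all other names are hypothetical):

program FileDict;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

type
  TFileInfo = record // whatever you track per file
    Size: Int64;
    Modified: TDateTime;
  end;

var
  Files: TDictionary<string, TFileInfo>;
  Info: TFileInfo;
  Pair: TPair<string, TFileInfo>;
begin
  Files := TDictionary<string, TFileInfo>.Create;
  try
    Info.Size := 1024;
    Info.Modified := Now;
    Files.AddOrSetValue('C:\data\readme.txt', Info);

    // One of the three enumerators: key-value pairs.
    for Pair in Files do
      Writeln(Pair.Key, ': ', Pair.Value.Size, ' bytes');
    // Files.Keys and Files.Values enumerate keys or values alone.
  finally
    Files.Free;
  end;
end.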
Another solution would be to use just a standard TStringList.
As long as it's sorted and its Duplicates setting is something other than dupAccept, you can use IndexOf or IndexOfName to find items in the list quickly.
It also has the Objects property, which allows you to store object data attached to each name. Starting with D2009, TStringList has the OwnsObjects property, which allows you to delegate object cleanup to the TStringList. Prior to D2009 you have to handle that yourself.
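Putting those pieces together, a hedged sketch (TFileData and the demo procedure are my names):

uses
  System.Classes;

type
  TFileData = class // hypothetical per-file payload
  public
    Size: Int64;
  end;

procedure StringListDemo;
var
  List: TStringList;
  Data: TFileData;
  I: Integer;
begin
  List := TStringList.Create;
  try
    List.Sorted := True;
    List.Duplicates := dupIgnore;
    List.OwnsObjects := True; // D2009+: List.Free then frees the objects too
    Data := TFileData.Create;
    Data.Size := 2048;
    List.AddObject('C:\data\readme.txt', Data);

    I := List.IndexOf('C:\data\readme.txt'); // binary search while Sorted
    if I >= 0 then
      Writeln(TFileData(List.Objects[I]).Size);
  finally
    List.Free;
  end;
end;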
Much of this will depend on how you are going to use the list and at what scale. If you are going to use it as a stack or queue, then a TList would work fine. If you need to search the list for a specific item, then you will need something that allows faster retrieval: TDictionary (D2009) or TStringList (pre-D2009) would be the most likely choices.
Dynamic arrays are also a possibility, but if you use them you will want to minimize the use of SetLength, as each time it is called it may re-allocate memory. TList manages this for you, which is why I suggested using a TList. If you KNOW how many items you will deal with in advance, then use a dynamic array and set its length at the onset.
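For example (a hedged sketch; the names are mine):

uses
  System.SysUtils;

// Size the array once up front instead of growing it item by item.
procedure BuildNames;
const
  Count = 100000; // known in advance
var
  Names: TArray<string>;
  I: Integer;
begin
  SetLength(Names, Count); // one allocation instead of Count reallocations
  for I := 0 to Count - 1 do
    Names[I] := 'file' + IntToStr(I); // hypothetical payload
end;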
If you have more items than will fit in memory then your choices also change. At that point I would either use a database table or a TFileStream to store the records to be processed, then seek to the beginning of the table/stream for processing.
Try using the AVL tree from http://sourceforge.net/projects/alcinoe/ as your associative array. It has an iterate method for fast iteration. You may need to derive from its base class and implement your own comparator, but it's easy to use.
Examples are included.
I am refactoring some existing Delphi code into a class.
The current code uses a global variable defined as a dynamic array (array of Byte). At initialization time the code figures out the size of the array and uses SetLength to allocate it. It is convenient both as the buffer for obtaining the data and as the runtime container for later processing.
I want to move this variable into the class as one of the object's attributes.
But I am not sure if it is OK to keep its type. Is it considered good practice?
The alternative I am considering is to transform it into a dynamic container like a TList. I would keep the very same code for obtaining the data, with a local dynamic array, but then move the data into the container for the rest of its lifespan. Is it worth the effort? I know that elegance always pays off in the end, but I don't really see the value of the effort at this moment. Any thoughts?
Dynamic arrays are great, but really only for fixed sizes. If they have to grow, especially in single-record increments, this can eventually cause errors from the memory manager (and possible performance issues), since the array has to be reallocated and copied to the new, bigger destination. TList at least has a 'growing' mechanism that is called less frequently.
"I know that elegance always pays off in the end."
Is that so? Note that changing working code always carries the risk of breaking something. IMHO it must be decided in each situation whether the gained elegance is worth the risk.
In your case, if you add and remove items during runtime, I would use a TList, since it is much easier for these operations. If you just initialize the length once and the array is constant after initialization, you can simply keep the dynamic array. There's definitely no "good practice" saying that you shouldn't use dynamic arrays.
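For the 'keep the dynamic array' case, a minimal sketch (the class and member names are mine):

uses
  System.SysUtils;

type
  // The former global becomes a field of the class, still a plain
  // dynamic array, allocated once at initialization.
  TDataProcessor = class
  private
    FBuffer: TBytes; // i.e. array of Byte
  public
    procedure Init(Size: Integer);
    property Buffer: TBytes read FBuffer;
  end;

procedure TDataProcessor.Init(Size: Integer);
begin
  SetLength(FBuffer, Size); // the same SetLength call as before
end;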
Is there any condition in which merge sort can be done without extra memory?
My prof said there is, and he will give bonus points for it.
You want to google "in-place merge sort".
Here is one of the results:
http://thomas.baudel.name/Visualisation/VisuTri/inplacestablesort.html
Use a linked list. This avoids the O(n) extra space needed when merging two lists. However, you cannot do anything about the space taken by the recursive calls, i.e. O(lg n).
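A hedged sketch of that in Delphi (type and function names are mine). The merge itself is iterative and only relinks existing nodes, so the O(lg n) recursion of the splitting is the only extra space:

type
  PNode = ^TNode;
  TNode = record
    Value: Integer;
    Next: PNode;
  end;

function Merge(A, B: PNode): PNode;
var
  Head: TNode; // dummy head, lives on the stack
  Tail: PNode;
begin
  Tail := @Head;
  while (A <> nil) and (B <> nil) do
  begin
    if A^.Value <= B^.Value then
    begin
      Tail^.Next := A;
      A := A^.Next;
    end
    else
    begin
      Tail^.Next := B;
      B := B^.Next;
    end;
    Tail := Tail^.Next;
  end;
  if A <> nil then
    Tail^.Next := A
  else
    Tail^.Next := B;
  Result := Head.Next;
end;

function MergeSort(Head: PNode): PNode;
var
  Slow, Fast, Right: PNode;
begin
  if (Head = nil) or (Head^.Next = nil) then
    Exit(Head);
  // Find the middle with slow/fast pointers, then cut the list.
  Slow := Head;
  Fast := Head^.Next;
  while (Fast <> nil) and (Fast^.Next <> nil) do
  begin
    Slow := Slow^.Next;
    Fast := Fast^.Next^.Next;
  end;
  Right := Slow^.Next;
  Slow^.Next := nil;
  Result := Merge(MergeSort(Head), MergeSort(Right));
end;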
Yes, the answer to this is to use an in-place merge sort.
Given that this is a homework question, I can only point you to The Art of Computer Programming. A good programmer should be able to use the standard references in our field to research a question such as this.