I have a class that have multiple instance variables. I want to achieve two purposes with the class. It's possible that I may only use some variables for one purpose and sometime use both.
Here's a more concrete example. I want to create a class that every time the user tap the screen, a dog sprite and cat sprite appear with an animation. If tapped again, they continue to perform different animation. However, sometime I only want the dog sprite to appear and update. And some other rare times, I want the cat sprite to appear after a couple of taps after the dog sprite appeared.
The question is: does instance variable allocate too much memory? I'm highly concerned with performance, because I'm planning to make a memory-intensive game. Since it's hard to predict when I actually use all the instance variable, should I divide them into two classes? Let's divide the possible scenarios to get a better idea.
Only the Dog Sprite is used and the cat sprite never appears : The cat's instance variable is left untouched if left in one class.
The dog sprite appear first, then the cat sprite appear later : Both sprite will eventually appear. It's possible to divide it into two classes, but some methods are duplicated since methods such as the touch advance logic and animation are similar. But if we leave it in once class, scenario 1 could occur, which could possibly be solve without a lot of duplicate code being reproduced.
Other things could occur, but the problems is already discussed above. These are the pro and con from my point of view:
One Class Approach
Pro
Avoid some duplicate logic
No need to import multiple header that leads to some similar instance variable
Con
Possibly leave half of instance variables unused (including NSString, CCSprite, a lot of integers and floats, CCAnimation, CCLabelBMFont)
Two Class Approach
Pro
Less instance variables
Possibly inherit from the class without inheriting some unnecessary variables in the future
Con
Some logic are reproduced
It's difficult to decide which option I should use. Any suggestions would be great! Thank you in advance!
if (didHelp)
for (int x = 0; x < 100; x++)
NSLog(#"Thanks!");
I'm highly concerned with performance
You and thousands of other inexperienced developers. Seriously, there are two things you're most likely going to experience:
your idea is way out of proportion and no amount of performance optimization will make it work -> change your idea
performance won't matter the least bit and you simply wasted time
Performance is among the least important things a game developer needs to consider at the start of a project.
Why?
Case #2 is self evident.
Assessing case #1 with reasonable accuracy before you even get started requires experience. Even then it's difficult. Simply have a backup plan if feature X proves to be too technically challenging (or impossible). If you can't assess performance, and your idea won't work with any backup plan, again you have two options:
implement a different idea
create a quick prototype to find out the peak performance parameters: memory usage, CPU & GPU utilization, loading times, and whatever other fitness tests seem appropriate to find out if your idea is feasible within a few days, if not hours.
does instance variable allocate too much memory?
No, and allocated memory has very little to do with performance.
You can use class_getInstanceSize to see how much memory a class instance uses. Rarely ever will a class instance use more than 500 Bytes. However, this only counts memory allocated for instance variables - not the memory the instance variables may point to. In a cocos2d app it's fair to say that 95% of your memory usage will come from textures.
It's difficult to decide which option I should use
Always strive to:
write readable code
write maintainable code
write less code
write safer code
write code only once (avoid duplication)
EmbodiedD,
You are certainly worried about too much here. The heap is going to get quite large in most applications. One simple class will be irrelevant. When you have 1000 instances of a data intensive class then you might have to start thinking about profiling.
If you are worried about organization, that's another thing altogether.
If you are loading classA with var1 and var2 or loading classA with var1 and class2 with var2, its more a matter of how you were taught to do abstraction.
This is a somewhat open-ended question, and there are indefinitely many ways to approach this question. Therefore, this is my approach and may or may not fit in every scenarios.
There are cases when an instance variable could be replace -- however this should not affect your decision if necessarily needed. Instance variable should be used when needed. Do not perform endless calculation just to substitute a single instance variable. Do try to limit your instance variables into variables when it is not needed outside a certain scope. Thanks to the informative users that posted on here, instance variable left unused impact performance at such a microscopic scale that you should not worry.
From my point of view, a class should only have one focus -- on function and and should pass on any other information to other class that need it. Information should remain encapsulated -- with one function to maintain reusability in other projects.
One should focus on the relationship of the function. IT-IS is the relationship that say one object should inherit another. In reality, it's like a Sienna-IS a car. A boat-IS a vehicle. Therefore, these objects should inherit any information from it's superclass. On contrast, IT-HAS say that these class contain something, usually of a quality or component, that cannot be inherited. A sienna-IS a car, but a tire-IS-NOT a sienna. Rather, a sienna-HAS a tire.
Another important relationship is delegation. The fancy definition say it perform a task on behalf of another, much like how delegates in the US represent the people of their states. Basically, it pass a certain information saying to the other class, who should in good practice, not affect the other former class. The class should not know exactly who it pass on to, but know enough to pass on certain information. This process of not knowing the exact identity of the delegate is called coupling.
In my case, of cats and dogs, delegation along with IT-IS is subjectively the best answer. Your opinion may differ. A base class should contain all the information that the Cat and Dog share. And any other information that is needed, such as the sprite's position, should be passed on as a delegate to the other class. And based on what I wrote, a class should not, in normal circumstances, programmed to do two function; for a class do one function and pass on all other dutiful needs to another.
Related
This one's kind of an open ended design question I'm afraid.
Anyway: I have a big two-dimensional array of stuff. This array is mutable, and is accessed by a bunch of threads. For now I've just been dealing with this as a Arc<Mutex<Vec<Vec<--owned stuff-->>>>, which has been fine.
The problem is that stuff is about to grow considerably in size, and I'll want to start holding references rather than complete structures. I could do this by inverting everything and going to Vec<Vec<Arc<Mutex>>, but I feel like that would be a ton of overhead, especially because each thread would need a complete copy of the grid rather than a single Arc/Mutex.
What I want to do is have this be an array of references, but somehow communicate that the items being referenced all live long enough according to a single top-level Arc or something similar. Is that possible?
As an aside, is Vec even the correct data type for this? For the grid in particular I really want a large, fixed-size block of memory that will live for the entire length of the program once it's initialized, and has a lot of reference locality (along either dimension.) Is there something else/more specialized I should be using?
EDIT:Giving some more specifics on my code (away from home so this is rough):
What I want:
Outer scope initializes a bunch of Ts and somehow collectively ensures they live long enough (that's the hard part)
Outer scope initializes a grid :Something<Vec<Vec<&T>>> that stores references to the Ts
Outer scope creates a bunch of threads and passes grid to them
Threads dive in and out of some sort of (problable RW) lock on grid, reading the Tsand changing the &Ts in the process.
What I have:
Outer thread creates a grid: Arc<RwLock<Vector<Vector<T>>>>
Arc::clone(& grid)s are passed to individual threads
Read-heavy threads mostly share the lock and sometimes kick each other out for the writes.
The only problem with this is that the grid is storing actual Ts which might be problematically large. (Don't worry too much about the RwLock/thread exclusivity stuff, I think it's perpendicular to the question unless something about it jumps out at you.)
What I don't want to do:
Top level creates a bunch of Arc<Mutex<T>> for individual T
Top level creates a `grid : Vec<Vec<Arc<Mutex>>> and passes it to threads
The problem with that is that I worry about the size of Arc/Mutex on every grid element (I've been going up to 2000x2000 so far and may go larger). Also while the threads would lock each other out less (only if they're actually looking at the same square), they'd have to pick up and drop locks way more as they explore the array, and I think that would be worse than my current RwLock implementation.
Let me start of by your "aside" question, as I feel it's the one that can be answered:
As an aside, is Vec even the correct data type for this? For the grid in particular I really want a large, fixed-size block of memory that will live for the entire length of the program once it's initialized, and has a lot of reference locality (along either dimension.) Is there something else/more specialized I should be using?
The documenation of std::vec::Vec specifies that the layout is essentially a pointer with size information. That means that any Vec<Vec<T>> is a pointer to a densely packed array of pointers to densely packed arrays of Ts. So if block of memory means a contiguous block to you, then no, Vec<Vec<T>> cannot give that you. If that is part of your requirements, you'd have to deal with a datatype (let's call it Grid) that is basically a (pointer, n_rows, n_columns) and define for yourself if the layout should be row-first or column-first.
The next part is that if you want different threads to mutate e.g. columns/rows of your grid at the same time, Arc<Mutex<Grid>> won't cut it, but you already figured that out. You should get clarity whether you can split your problem such that each thread can only operate on rows OR columns. Remember that if any thread holds a &mut Row, no other thread must hold a &mut Column: There will be an overlapping element, and it will be very easy for you to create a data races. If you can assign a static range of of rows to a thread (e.g. thread 1 processes rows 1-3, thread 2 processes row 3-6, etc.), that should make your life considerably easier. To get into "row-wise" processing if it doesn't arise naturally from the problem, you might consider breaking it into e.g. a row-wise step, where all threads operate on rows only, and then a column-wise step, possibly repeating those.
Speculative starting point
I would suggest that your main thread holds the Grid struct which will almost inevitably be implemented with some unsafe methods, e.g. get_row(usize), get_row_mut(usize) if you can split your problem into rows/colmns or get(usize, usize) and get(usize, usize) if you can't. I cannot tell you what exactly these should return, but they might even be custom references to Grid, which:
can only be obtained when the usual borrowing rules are fulfilled (e.g. by blocking the thread until any other GridRefMut is dropped)
implement Drop such that you don't create a deadlock
Every thread holds a Arc<Grid>, and can draw cells/rows/columns for reading/mutating out of the grid as needed, while the grid itself keeps book of references being created and dropped.
The downside of this approach is that you basically implement a runtime borrow-checker yourself. It's tedious and probably error-prone. You should browse crates.io before you do that, but your problem sounds specific enough that you might not find a fitting solution, let alone one that's sufficiently documented.
I am trying to squeeze every bit of efficiency out of my application I am working on.
I have a couple arrays that follow the following conditions:
They are NEVER appended to, I always calculate the index myself
The are allocated once and never change size
It would be nice if they were thread safe as long as it doesn't cost performance
Some hold primitives like floats, or unsigned ints. One of them does hold a class.
Most of these arrays at some point are passed into a glBuffer
Never cleared just overwritten
Some of the arrays individual elements are changed entirely by = others are changed by +=
I currently am using swift native arrays and am allocating them like var arr = [GLfloat](count: 999, repeatedValue: 0) however I have been reading a lot of documentation and it sounds like Swift arrays are much more abstract then a traditional C-style array. I am not even sure if they are allocated in a block or more like a linked list with bits and pieces thrown all over the place. I believe by doing the code above you cause it to allocate in a continuous block but i'm not sure.
I worry that the abstract nature of Swift arrays is something that is wasting a lot of precious processing time. As you can see by my above conditions I dont need any of the fancy appending, or safety features of Swift arrays. I just need it simple and fast.
My question is: In this scenario should I be using some other form of array? NSArray, somehow get a C-style array going, create my own data type?
Im looking into thread safety, would a different array type that was more thread safe such as NSArray be any slower?
Note that your requirements are contradictory, particularly #2 and #7. You can't operate on them with += and also say they will never change size. "I always calculate the index myself" also doesn't make sense. What else would calculate it? The requirements for things you will hand to glBuffer are radically different than the requirements for things that will hold objects.
If you construct the Array the way you say, you'll get contiguous memory. If you want to be absolutely certain that you have contiguous memory, use a ContiguousArray (but in the vast majority of cases this will give you little to no benefit while costing you complexity; there appear to be some corner cases in the current compiler that give a small advantage to ContinguousArray, but you must benchmark before assuming that's true). It's not clear what kind of "abstractness" you have in mind, but there's no secrets about how Array works. All of stdlib is open source. Go look and see if it does things you want to avoid.
For certain kinds of operations, it is possible for other types of data structures to be faster. For instance, there are cases where a dispatch_data is better and cases where a regular Data would be better and cases where you should use a ManagedBuffer to gain more control. But in general, unless you deeply know what you're doing, you can easily make things dramatically worse. There is no "is always faster" data structure that works correctly for all the kinds of uses you describe. If there were, that would just be the implementation of Array.
None of this makes sense to pursue until you've built some code and started profiling it in optimized builds to understand what's going on. It is very likely that different uses would be optimized by different kinds of data structures.
It's very strange that you ask whether you should use NSArray, since that would be wildly (orders of magnitude) slower than Array for dealing with very large collections of numbers. You definitely need to experiment with these types a bit to get a sense of their characteristics. NSArray is brilliant and extremely fast for certain problems, but not for that one.
But again, write a little code. Profile it. Look at the generated assembler. See what's happening. Watch particularly for any undesired copying or retain counting. If you see that in a specific case, then you have something to think about changing data structures over. But there's no "use this to go fast." All the trade-offs to achieve that in the general case are already in Array.
First, I understand the difference between value and reference types -this isn't that question. I am rewriting some of my code in Swift, and decided to also refactor some of the classes. Therefore, I thought I would see if some of the classes make sense as structs.
Memory: I have some model classes that hold very large arrays, that are constantly growing in size (unknown final size), and could exist for hours. First, are there any guidelines about a suggested or absolute size for a struct, since it lives on the stack?
Refactoring Use: Since I'm refactoring what right now is a mess with too much dependency, I wonder how I could improve on that. The views and view controllers are mostly easily, it's my model, and what it does, that's always left me wishing for better examples to follow.
WorkerManager: Singleton that holds one or two Workers at a time. One will always be recording new data from a sensor, and the other would be reviewing stored data. The view controllers get the Worker reference from the WorkerManager, and ask the Worker for the data to be displayed.
Worker: Does everything on a queue, to prevent memory access issues (C array pointers are constantly changing as they grow). Listening: The listening Worker listens for new data, sends it to a Processor object (that it created) that cleans up the data and stores it in C arrays held by the Worker. Then, if there is valid data, the Worker tells the Analyzer (also owned by the worker) to analyze the data and stores it in other C arrays to be fed to views. Both the Processor and Analyzer need state to know what has happened in the past and what to process and analyze next. The pure raw data is stored in a separate Record NSManaged object. Reviewer Takes a Record and uses the pure raw data to recreate all of the analyzed data so that it can be reviewed. (analyzed data is massive, and I don't want to store it to disk)
Now, my second question is, could/should Processor and Analyzer be replaced with structs? Or maybe protocols for the Worker? They aren't really "objects" in the normal sense, just convenient groups of related methods and the necessary state. And since the code is nearly a thousand lines for each, and I don't want to put it all in one class, or even the same file.
I just don't have a good sense of how to remove all of my state, use pure functions for all of the complex mathematical operations that are performed on the arrays, and where to put them.
While the struct itself lives on the stack, the array data lives on the heap so that array can grow in size dynamically. So even if you have an array with million items in it and pass it somewhere, none of the items are copied until you change the new array due to the copy-on-write implementation. This is described in details in 2015 WWDC Session 414.
As for the second question, I think that 2015 WWDC Session 414 again has the answer. The basic check that Apple engineers recommend for value types are:
Use a value type when:
Comparing instance data with == makes sense
You want copies to have independent state
The data will be used in code across multiple threads
Use a reference type (e.g. use a class) when:
Comparing instance identity with === makes sense
You want to create shared, mutable state
So from what you've described, I think that reference types fit Processor and Analyzer much better. It doesn't seem that copies of Processor and Analyzer are valid objects if you've not created new Producers and Analyzers explicitly. Would you not want the changes to these objects to be shared?
To make this language agnostic let's pseudo code something along the lines of:
for(int i=0;i<=N;i++){
double d=0;
userDefinedObject o=new userDefinedObject();
//effectively do something useful
o.destroy();
}
Now, this may get into deeper details between Java/C++/Python etc, but:
1 - Is doing this with primitives wrong or just sort of ugly/overkill (d could be defined above, and set to 0 in each iteration if need be).
2 - Is doing this with an object actually wrong? Now,I know Java will take care of the memory but for C++ let's assume we have a proper destructor that we call.
Now - the question is quite succinct - is this wrong or just a matter of taste?
Thank you.
Java Garbage Collector will take care of any memory allocation without a reference, which means that if you instantiate on each iteration, you will allocate new memory and lose the reference to the previous one. Said this, you can conclude that the GC will take care of the non-referenced memory, BUT you also have to consider the fact that memory allocation, specifically object initialization takes time and process. So, if you do this on a small program, you're probably not going to feel anything wrong. But let say you're working with something like Bitmap, the allocation will totally own your memory.
For both cases, I'd say it is a matter of taste, but in a real life project, you should be totally sure that you need to initialize within a loop
I'm pretty sure this is a silly newbie question but I didn't know it so I had to ask...
Why do we use data structures, like Linked List, Binary Search Tree, etc? (when no dynamic allocation is needed)
I mean: wouldn't it be faster if we kept a single variable for a single object? Wouldn't that speed up access time? Eg: BST possibly has to run through some pointers first before it gets to the actual data.
Except for when dynamic allocation is needed, is there a reason to use them?
Eg: using linked list/ BST / std::vector in a situation where a simple (non-dynamic) array could be used.
Each thing you are storing is being kept in it's own variable (or storage location). Data structures apply organization to your data. Imagine if you had 10,000 things you were trying to track. You could store them in 10,000 separate variables. If you did that, then you'd always be limited to 10,000 different things. If you wanted more, you'd have to modify your program and recompile it each time you wanted to increase the number. You might also have to modify the code to change the way in which the calculations are done if the order of the items changes because the new one is introduced in the middle.
Using data structures, from simple arrays to more complex trees, hash tables, or custom data structures, allows your code to both be more organized and extensible. Using an array, which can either be created to hold the required number of elements or extended to hold more after it's first created keeps you from having to rewrite your code each time the number of data items changes. Using an appropriate data structure allows you to design algorithms based on the relationships between the data elements rather than some fixed ordering, giving you more flexibility.
A simple analogy might help to understand. You could, for example, organize all of your important papers by putting each of them into separate filing cabinet. If you did that you'd have to memorize (i.e., hard-code) the cabinet in which each item can be found in order to use them effectively. Alternatively, you could store each in the same filing cabinet (like a generic array). This is better in that they're all in one place, but still not optimum, since you have to search through them all each time you want to find one. Better yet would be to organize them by subject, putting like subjects in the same file folder (separate arrays, different structures). That way you can look for the file folder for the correct subject, then find the item you're looking for in it. Depending on your needs you can use different filing methods (data structures/algorithms) to better organize your information for it's intended use.
I'll also note that there are times when it does make sense to use individual variables for each data item you are using. Frequently there is a mixture of individual variables and more complex structures, using the appropriate method depending on the use of the particular item. For example, you might store the sum of a collection of integers in a variable while the integers themselves are stored in an array. A program would need to be pretty simple though before the introduction of data structures wouldn't be appropriate.
Sorry, but you didn't just find a great new way of doing things ;) There are several huge problems with this approach.
How could this be done without requring programmers to massively (and nontrivially) rewrite tons of code as soon as the number of allowed items changes? Even when you have to fix your data structure sizes at compile time (e.g. arrays in C), you can use a constant. Then, changing a single constant and recompiling is sufficent for changes to that size (if the code was written with this in mind). With your approach, we'd have to type hundreds or even thousands of lines every time some size changes. Not to mention that all this code would be incredibly hard to read, write, maintain and verify. The old truism "more lines of code = more space for bugs" is taken up to eleven in such a setting.
Then there's the fact that the number is almost never set in stone. Even when it is a compile time constant, changes are still likely. Writing hundreds of lines of code for a minor (if it exists at all) performance gain is hardly ever worth it. This goes thrice if you'd have to do the same amount of work again every time you want to change something. Not to mention that it isn't possible at all once there is any remotely dynamic component in the size of the data structures. That is to say, it's very rarely possible.
Also consider the concept of implicit and succinct data structures. If you use a set of hard-coded variables instead of abstracting over the size, you still got a data structure. You merely made it implicit, unrolled the algorithms operating on it, and set its size in stone. Philosophically, you changed nothing.
But surely it has a performance benefit? Well, possible, although it will be tiny. But it isn't guaranteed to be there. You'd save some space on data, but code size would explode. And as everyone informed about inlining should know, small code sizes are very useful for performance to allow the code to be in the cache. Also, argument passing would result in excessive copying unless you'd figure out a trick to derive the location of most variables from a few pointers. Needless to say, this would be nonportable, very tricky to get right even on a single platform, and liable to being broken by any change to the code or the compiler invocation.
Finally, note that a weaker form is sometimes done. The Wikipedia page on implicit and succinct data structures has some examples. On a smaller scale, some data structures store much data in one place, such that it can be accessed with less pointer chasing and is more likely to be in the cache (e.g. cache-aware and cache-oblivious data structures). It's just not viable for 99% of all code and taking it to the extreme adds only a tiny, if any, benefit.
The main benefit to datastructures, in my opinion, is that you are relationally grouping them. For instance, instead of having 10 separate variables of class MyClass, you can have a datastructure that groups them all. This grouping allows for certain operations to be performed because they are structured together.
Not to mention, having datastructures can potentially enforce type security, which is powerful and necessary in many cases.
And last but not least, what would you rather do?
string string1 = "string1";
string string2 = "string2";
string string3 = "string3";
string string4 = "string4";
string string5 = "string5";
Console.WriteLine(string1);
Console.WriteLine(string2);
Console.WriteLine(string3);
Console.WriteLine(string4);
Console.WriteLine(string5);
Or...
List<string> myStringList = new List<string>() { "string1", "string2", "string3", "string4", "string5" };
foreach (string s in myStringList)
Console.WriteLine(s);