Manually set node size and colour ranking limits in Gephi

Is there a way to manually set the minimum and maximum node size and colour ranking based on attributes such as node degree or weighted degree? When I use this ranking in Gephi, it automatically takes the minimum and maximum of the parameter (in my case, degree or weighted degree). I have multiple network files that I want to compare, but every network has a different minimum and maximum for degree and weighted degree, so each plot is scaled only relative to its own network and the networks cannot be compared. Is there a way to manually enter minimum and maximum attribute values for node ranking? I am using Gephi 0.9.2 on Mac.

Hope this helps. This is for In-Degree, but it generalises to all attributes.

I managed to get it done. The trick is to add two dummy nodes carrying the global minimum and maximum, so that Gephi scales every network against the same range:

1. Import each network file into Gephi and compute the Degree or Weighted Degree statistics. In the Data tab, note the minimum and maximum values of these attributes for each network.
2. Across all your networks, take the lowest minimum and the highest maximum; these are the values used in the next step.
3. Import a single network, compute the Degree and Weighted Degree statistics, then in the Data tab manually add two nodes (with any labels of your choice). Assign the global minimum values of the required attributes to one node and the global maximum values to the other.
4. When ranking nodes, or assigning colour or size based on these attributes, Gephi now takes the two dummy nodes into account and plots the network accordingly.
5. In the Filter tab, apply a range filter on the attribute and exclude the two dummy nodes by the min and max values you entered manually. They disappear from the network, leaving the final plot.

This way I was able to compare different networks together visually.
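If you script your figures instead of using the GUI, the same idea can be applied directly. Below is a minimal sketch, assuming networkx and matplotlib; the file names and the size range are hypothetical, not part of the original workflow.

import networkx as nx
import matplotlib.pyplot as plt

# Hypothetical input files; replace with your own networks.
files = ["net_a.gexf", "net_b.gexf", "net_c.gexf"]
graphs = [nx.read_gexf(f) for f in files]

# Global min/max degree across ALL networks (same role as the dummy nodes).
all_degrees = [d for g in graphs for _, d in g.degree()]
lo, hi = min(all_degrees), max(all_degrees)

def node_sizes(g, smallest=10, largest=300):
    # Map each node's degree onto a shared size range using the global limits.
    span = (hi - lo) or 1  # avoid division by zero if all degrees are equal
    return [smallest + (d - lo) / span * (largest - smallest) for _, d in g.degree()]

# Every network is drawn on the same size scale, so the plots are comparable.
for g, f in zip(graphs, files):
    nx.draw(g, node_size=node_sizes(g))
    plt.title(f)
    plt.show()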

Related

Changing Clustering Coefficient in Cytoscape

If I change a parameter such as the clustering coefficient or the average shortest path length, it has no effect on the visualization of the network.
That is correct. If you want the network to reflect a parameter like that, you need to use an edge-weighted layout and then select that column as the weight. Note that if you manually change the value, you will need to re-run the layout on the network.
-- scooter
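For readers scripting this instead, here is a minimal networkx sketch of the same point; this is an assumption using networkx rather than Cytoscape's own interface, with a made-up "score" edge column. The layout only reflects the column once it is recomputed with that column selected as the weight.

import networkx as nx

G = nx.karate_club_graph()

# Attach a hypothetical per-edge metric column called "score".
for u, v in G.edges():
    G[u][v]["score"] = abs(u - v)

# Editing "score" by itself changes nothing about an existing layout...
pos_ignoring_score = nx.spring_layout(G, seed=42)

# ...the layout must be recomputed with the column selected as the weight.
pos_using_score = nx.spring_layout(G, weight="score", seed=42)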

Do you "count" the dataset size in number of images or number of ground truth bounding boxes?

I'm currently making a custom dataset with 1 class. The images I am labeling contain several of these objects each (between 30 and 70). I therefore wonder if I should count each object in each image as "1 datapoint" when evaluating the size of the dataset.
I.e., do more objects per image mean fewer images are required?
Since this is a detection problem, the size of the dataset is given by both the number of images and the number of objects. There is no reason to choose one of the two, because they are both equally important numbers.
If you really want to define "size", you probably have to start from the error metric. For object detection, mIoU (Mean Intersection over Union) is usually used. This metric is at the object level, so it doesn't care whether you have 10 or 1 million images.
Finally, it could be that having many objects per image allows you to use a smaller total number of images, but this can only be confirmed experimentally.
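As a practical illustration, here is a minimal sketch that reports both counts from a COCO-style annotation file; the file name and the COCO format are assumptions, not something the question specifies.

import json
from collections import Counter

# Hypothetical COCO-style annotation file with "images" and "annotations" lists.
with open("annotations.json") as f:
    coco = json.load(f)

n_images = len(coco["images"])
n_objects = len(coco["annotations"])  # one entry per ground-truth bounding box

# Objects per image, relevant when each image carries 30-70 boxes.
per_image = Counter(a["image_id"] for a in coco["annotations"])

print(f"{n_images} images, {n_objects} objects "
      f"({n_objects / n_images:.1f} objects/image on average, "
      f"max {max(per_image.values())} in a single image)")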

Normalization of Graph

I have a radial distribution graph (x-axis: r; y-axis: g(r)).
One of the values in the graph has a high g(r), around 45, while the rest are below about 5. This makes the graphs unclear and hard to differentiate.
Since the values are so uneven, I need to normalize them.
How can I normalize the values (the y-axis, g(r))?
I have attached the graph for visualization.
Any suggestions?
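No answer is recorded here, but as a sketch of two common options: min-max scaling or a logarithmic y-axis both keep a dominant peak from drowning out the rest. The snippet below assumes NumPy/matplotlib and uses synthetic data in place of the attached graph.

import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the attached data: one dominant g(r) peak near 45.
r = np.linspace(0.1, 10, 200)
g = 45 * np.exp(-((r - 2) ** 2) / 0.01) + 3 * np.exp(-((r - 5) ** 2) / 0.5)

# Option 1: min-max normalization, g_norm = (g - min g) / (max g - min g).
g_norm = (g - g.min()) / (g.max() - g.min())

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))
ax1.plot(r, g_norm)
ax1.set(xlabel="r", ylabel="normalized g(r)", title="Min-max normalized")

# Option 2: log scale on the y-axis preserves the original units.
ax2.plot(r, g)
ax2.set_yscale("log")
ax2.set(xlabel="r", ylabel="g(r)", title="Log scale")
plt.tight_layout()
plt.show()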

How to compute histograms using weka

Given a dataset with 23 points spread over 6 dimensions, the first part of this exercise asks us to do the following, and I am stuck on the second half of it:
Compute the first step of the CLIQUE algorithm (detection of all dense cells). Use three equal intervals per dimension in the domain 0..100, and consider a cell as dense if it contains at least five objects.
Now this is trivial and simply a matter of counting. The next part asks the following though:
Identify a way to compute the above CLIQUE result by only using the functions of Weka provided in the tabs Preprocess, Classify, Cluster, or Associate.
Hint: Just two tabs are needed.
I've been trying this for over an hour now, but I can't seem to get anywhere near a solution. If anyone has a hint, or perhaps a useful tutorial that gives me a little more insight into Weka, it would be very much appreciated!
I am assuming you have 23 instances (rows) and 6 attributes (dimensions).
Use three equal intervals per dimension
Use the Preprocess tab to discretize your data into 3 equal bins (see the command line below); the 3 bins correspond to the intervals. You may also try toggling useEqualFrequency between false and true; I think true may give better results.
weka.filters.unsupervised.attribute.Discretize -B 3 -M -1.0 -R first-last
After that, cluster your data. This will show you which instances are near each other. Since you want to find dense cells, I think SOM may be appropriate.
a cell as dense if it contains at least five objects.
You have 23 instances, so try 2x2 = 4 cluster centers, then 2x3 = 6, 2x4 = 8, and 3x3 = 9. If your data points are close together, some of the clusters should always hold at least 5 instances, no matter how many cluster centers you choose.
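For comparison, the "trivial counting" step itself is easy to script. Here is a minimal sketch under the exercise's stated assumptions (domain 0..100, 3 intervals per dimension, density threshold 5), with random data standing in for the actual 23 points and dense cells counted per dimension, as in CLIQUE's bottom-up first pass.

import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(23, 6))  # stand-in for the exercise's dataset

BINS, THRESHOLD = 3, 5

# Map each value to interval 0, 1, or 2 over the domain 0..100.
cells = np.minimum((X / (100 / BINS)).astype(int), BINS - 1)

# Count points per interval in each dimension; keep intervals holding
# at least THRESHOLD objects (the 1-D dense units CLIQUE starts from).
for dim in range(X.shape[1]):
    counts = Counter(cells[:, dim])
    dense = sorted(i for i, c in counts.items() if c >= THRESHOLD)
    print(f"dimension {dim}: dense intervals {dense}")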

Algorithm for variability analysis

I work with a lot of histograms. In particular, these histograms are of basecalls along segments on the human genome.
Each point along the x-axis is one of the four nitrogenous bases (A, C, T, G) that compose DNA, and the y-axis represents how many times a base could be "called" (i.e., recognized by a sequencing machine while determining the identity of each base along the genome).
Many of these histograms display roughly linear dropoffs (when the machines can't get sufficient read depth) that fall to 0 (or almost 0) from plateau-like regions. When the score drops to zero, the sequencer isn't able to determine the identity of the base; if you've seen the double helix, it means the sequencer can't figure out the identity of one half of a rung of the helix. Certain regions of the genome are more difficult to characterize than others.
Bases (x data points) with high numbers of basecalls, on the order of >=100, can be definitively identified. For example, if there were a total of 250 calls for one base, with 248 T's, 1 G, and 1 A called, we would call it a T. Regions with 0 basecalls are of concern because we then have to infer the identity of the low-read region from neighboring regions.
Is there a straightforward algorithm for assigning these plots a score that reflects this tendency? See box.net/shared/nbygq2x03u for an example histogram.
You could just use the count of base positions where the read depth was 0. The slope of that line could also be a useful indicator (a steep negative slope means a drop from a plateau).
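A minimal sketch of such a score, assuming NumPy; the zero-count and slope components follow the suggestion above, while the window size and the synthetic profile are made up for illustration.

import numpy as np

def dropoff_score(depth, window=10):
    # Two indicators for a read-depth profile: the fraction of zero-depth
    # positions, and the steepest average decline over a sliding window
    # (approximating the "drop from plateau" slope).
    depth = np.asarray(depth, dtype=float)
    zero_fraction = np.mean(depth == 0)
    slopes = (depth[window:] - depth[:-window]) / window
    steepest_drop = max(0.0, -slopes.min()) if len(slopes) else 0.0
    return zero_fraction, steepest_drop

# Synthetic profile: a plateau around depth 250 that falls linearly to zero.
profile = np.concatenate([np.full(80, 250), np.linspace(250, 0, 20), np.zeros(30)])
print(dropoff_score(profile))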
