Error trying to create count labels for geom_bar - plotnine

I am trying to create a bar plot with labels using plotnine. According to the documentation, you can use label="stat(count)" in the aesthetic for the geom_text to print the position count for each bar. This is the equivalent of using the ..count.. keyword in ggplot2 in R.
python version is 3.6.7
plotnine version is 0.5.1
According to the documentation, this code should work:
import numpy as np
import pandas as pd
from plotnine import *
from plotnine.stats import *
from plotnine.data import mtcars
(ggplot(mtcars, aes('factor(cyl)', fill='factor(am)'))
+ geom_bar( position='fill')
+ geom_text(aes(label='stat(count)'), stat='count', position='fill')
)
When I try this I get this message:
PlotnineError: "Could not evaluate the 'label' mapping: 'stat(count)' (original error: name 'stat' is not defined)
If I replace the expression label='stat(count)' with label='99' the code runs and displays a correct plot, except of course all the labels are the constant value 99 not the actual counts.

I refreshed all my libraries and restarted the notebook server and now it works. I must have had a bad install somewhere.

Related

GluonTS example airpassengers dataset not found

I am trying to run the GluonTS example code. After some struggle installing the libraries, I now get the following error:
FileNotFoundError: C:\Users\abcde\.mxnet\gluon-ts\datasets\airpassengers\test
The directory C:\Users\abcde\.mxnet\gluon-ts\datasets\airpassengers\ does exist, but it contains only a train folder. I have tried reinstalling, to no avail. Any ideas on how to fix this and run the example, even if that means finding the dataset in the correct format elsewhere?
EDIT: To clarify, I was referring to an example on https://ts.gluon.ai/stable/
import matplotlib.pyplot as plt
from gluonts.dataset.util import to_pandas
from gluonts.dataset.pandas import PandasDataset
from gluonts.dataset.repository.datasets import get_dataset
from gluonts.mx import DeepAREstimator, Trainer
dataset = get_dataset("airpassengers")
deepar = DeepAREstimator(prediction_length=12, freq="M", trainer=Trainer(epochs=5))
model = deepar.train(dataset.train)
# Make predictions
true_values = to_pandas(list(dataset.test)[0])
true_values.to_timestamp().plot(color="k")
prediction_input = PandasDataset([true_values[:-36], true_values[:-24], true_values[:-12]])
predictions = model.predict(prediction_input)
for color, prediction in zip(["green", "blue", "purple"], predictions):
    prediction.plot(color=f"tab:{color}")
plt.legend(["True values"], loc="upper left", fontsize="xx-large")
The earlier version of the example had an incorrect import, which has since been corrected. I also needed to specify regenerate=True when getting the dataset:
dataset = get_dataset("airpassengers", regenerate=True)

Pandas time series index attribute error when using TsTables & PyTables in creating a table class

I am trying to define a table structure with a tb.IsDescription class, then create an .h5 file and populate it with a pandas DataFrame that has a DatetimeIndex, using the TsTables package. I have already tested the DataFrame and the datetime index, and both seem fine. I believe the issue is with the TsTables package, as its import is flagged as an 'Unused import statement'. The error I get is: "AttributeError: module 'pandas.tseries' has no attribute 'index'". I am using TsTables because I have heard it is faster than other modules. Any suggestions on how to resolve this issue, or an alternative method?
import numpy as np
import pandas as pd
import tables as tb
import datetime as dt
path = r'C:\Users\--------\PycharmProjects\pythonProject2'
no = 5000000 # number of time steps
co = 3 # number of time series
interval = 1. / (12 * 30 * 24 * 60) # the time interval as a year fraction
vol = 0.2 # volatility
rn = np.random.standard_normal((no, co))
rn[0] = 0.0 # sets the initial random numbers to zero
paths = 100 * np.exp(np.cumsum(-0.5 * vol ** 2 * interval + vol * np.sqrt(interval) * rn, axis=0))
# simulation based on an Euler discretization
paths[0] = 100 # Sets the initial values of the paths to 100
dr = pd.date_range('2019-1-1', periods=no, freq='1s')
print(dr[-6:]) # the date range appears fine
df = pd.DataFrame(paths, index=dr, columns=['ts1', 'ts2', 'ts3'])
print(df.info(verbose=True)) # df is pandas Dataframe and appears fine
print(df.head()) # tested a fraction of the data, it is fine
import tstables as tstab # I get Unused import statement
class ts_desc(tb.IsDescription):
    timestamp = tb.Int64Col(pos=0) # The column for the timestamps
    ts1 = tb.Float64Col(pos=1) # The column to store numerical data
    ts2 = tb.Float64Col(pos=2)
    ts3 = tb.Float64Col(pos=3)
h5 = tb.open_file(path + 'tstab.h5', 'w')
ts = h5.create_ts('/', 'ts', ts_desc)
ts.append(df) # !!!!! the error I get is from this code line !!!!
# the error is raised at: if rows.index.__class__ != pandas.tseries.index.DatetimeIndex:
AttributeError: module 'pandas.tseries' has no attribute 'index'
I suspect you have run into a version compatibility issue between tstables and your pandas version (assuming you are running a recent pandas release). According to the tstables PyPI page, the last release of tstables was in 2015. The tstables GitHub project page shows an issue with Pandas 0.20.3 and its use of datetime, with the same error message as yours: module 'pandas.tseries' has no attribute 'index'. See this: tstables breaks down with Pandas 20.3
The issue has a link to another build that works with Pandas 0.20.3. Development notes state "Removed 'convert_datetime64' parameter on line 245". Not sure if it will work with more recent versions, but worth a try. See this: schwed2 tstables build
If that doesn't solve the problem, I suggest running the simple examples provided or the setup tests. (Note: I could not find the bpi_2014_01.csv file to test the bitcoin/bpi example.)
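One quick way to confirm the mismatch is to ask Python whether the module path named in the error message can still be found in the installed pandas. This is only a minimal sketch using the standard library; the helper name has_submodule is made up for illustration and is not part of pandas or tstables.

```python
import importlib.util


def has_submodule(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located as an importable module."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package (for example pandas itself) is missing.
        return False


if __name__ == "__main__":
    # 'pandas.tseries.index' is the module path named in the error message.
    print("pandas.tseries.index found:", has_submodule("pandas.tseries.index"))
```

If this prints False, the installed pandas does not provide the module path that tstables refers to, which would be consistent with the version issue described above.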
Good luck.

scv.pl.proportions(): numpy.AxisError in `Cellrank` workflow

I am new to using Python to analyze scRNA-seq data. When I run the CellRank workflow, I always get this error.
Here is my code for CellRank:
import scvelo as scv
import scanpy as sc
import cellrank
import numpy as np
scv.settings.verbosity = 3
scv.settings.set_figure_params("scvelo")
cellrank.settings.verbosity = 2
import warnings
warnings.simplefilter("ignore", category=UserWarning)
warnings.simplefilter("ignore", category=FutureWarning)
warnings.simplefilter("ignore", category=DeprecationWarning)
adata = sc.read_h5ad('./my.h5ad') # my data
scv.pl.proportions(adata)
The error:
Traceback (most recent call last):
  File "test_cellrank.py", line 25, in <module>
    scv.pl.proportions(adata)
...........
numpy.AxisError: axis 1 is out of bounds for array of dimension 1
I used SeuratDisk or loom to get the h5ad file from a Seurat object. I think there must be some problem in this process.
Here is the AnnData object from the tutorial:
>>> adata
AnnData object with n_obs × n_vars = 2531 × 27998
obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'palantir_pseudotime'
var: 'highly_variable_genes'
uns: 'clusters_colors', 'clusters_fine_colors', 'day_colors', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca'
obsm: 'X_pca', 'X_umap'
layers: 'spliced', 'unspliced'
obsp: 'connectivities', 'distances'
Here is mine:
>>> adata
AnnData object with n_obs × n_vars = 5443 × 18489
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'nCount_SCT', 'nFeature_SCT', 'SCT_snn_res.0.8', 'seurat_clusters', 'SCT_snn_res.0.5', 'SCT_snn_res.0.6',
'SCT_snn_res.0.7', 'S.Score', 'G2M.Score', 'Phase', 'old.ident', 'new.ident', 'nCount_MAGIC_RNA', 'nFeature_MAGIC_RNA'
var: 'SCT_features', '_index', 'features'
obsm: 'X_tsne', 'X_umap'
layers: 'SCT'
So, which packages or protocols should I follow to convert a Seurat object into an h5ad file?
Thank you for your help!
scv.pl.proportions gives the proportion of spliced and unspliced reads in your dataset. These count tables must be added to your adata layers before you can call this function.
Your adata object does not have these layers. I think that is why you are seeing this error.
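The missing-layer check can be sketched before calling the plotting function. The helper below is hypothetical (it is not part of scvelo or anndata) and only assumes that the object exposes a dict-like .layers attribute, as the two printed objects in the question do. SimpleNamespace stand-ins are used here so the sketch runs without any data files.

```python
from types import SimpleNamespace

# Layer names taken from the tutorial object shown in the question.
REQUIRED_LAYERS = ("spliced", "unspliced")


def missing_velocity_layers(adata, required=REQUIRED_LAYERS):
    """Return the names in `required` that are absent from adata.layers."""
    return [name for name in required if name not in adata.layers]


# Stand-ins mirroring the layers of the two objects printed in the question.
converted = SimpleNamespace(layers={"SCT": None})
tutorial = SimpleNamespace(layers={"spliced": None, "unspliced": None})

print(missing_velocity_layers(converted))  # ['spliced', 'unspliced']
print(missing_velocity_layers(tutorial))   # []
```

With a real object, a non-empty result would indicate that the spliced and unspliced count tables still need to be added before plotting proportions.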
Conversion from Seurat to h5ad can be accomplished using the two-step process given here

Python Dask Apply Function and Store Result in Same Column

Hello, I am a bit new to Dask and I am trying to do the following.
I have a CSV file, and reading it works fine:
import pandas
import os
import json
import math
import numpy as np
import dask
from dask.distributed import Client
import dask.dataframe as dd  # alias the module as dd so the DataFrame named df does not shadow it
import dask.multiprocessing
client = Client(n_workers=3, threads_per_worker=4, processes=False, memory_limit='2GB')
df = dd.read_csv("netflix_titles.csv")
Now I have a function:
def toupper(x):
    return x.upper()
I would like to apply this to a column, but I want to save the result in the same column, and it seems I cannot do that:
df["title"].map(toupper).compute()
The line above works, but what I want is:
df["title"] = df["title"].map(toupper).compute()
ValueError: Not all divisions are known, can't align partitions. Please use set_index to set the index.
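Maybe try this after read_csv:
df.title = df.title.map(toupper)
df.to_csv("netflix_titles.csv", index=False, single_file=True)
The original error most likely arises because compute() returns a pandas Series, which Dask cannot align with a DataFrame whose divisions are unknown. Assigning the lazy result of map keeps everything as a Dask collection. to_csv has an optional argument compute, which defaults to True, so you don't need to call compute() explicitly.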

Why does my Python code raise "TypeError: 'dict' object is not callable" when loading a list of dictionaries into a Tokenizer object?

I am trying to build a sarcasm detection model using the sarcasm dataset from Kaggle in a Jupyter notebook. I have downloaded the dataset to my PC and converted it into a list of dictionaries. Each dictionary has three keys: article_link, is_sarcastic, and headline.
My code below gives the following error:
TypeError Traceback (most recent call last)
in
7 tokenizer.fit_on_texts(sentences)
8
----> 9 my_word_index=tokenizer.word_index()
10
11 print(len(word_index))
TypeError: 'dict' object is not callable
import os
import pandas
os.getcwd()
import json
os.chdir('C:/Users/IMALSHA/Desktop/AI content writing/Cousera Deep Neural Networks course/NLP lectures')
#loading data
with open('Sarcasm_Headlines_Dataset.json','r') as json_file:
    data_set=json.load(json_file)
#defining lists
sentences=[]
labels=[]
urls=[]
for item in data_set:
    sentences.append(item['headline'])
    labels.append(item['is_sarcastic'])
    urls.append(item['article_link'])
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
tokenizer=Tokenizer(oov_token="<oov>")
tokenizer.fit_on_texts(sentences)
word_index=tokenizer.word_index()
print(len(word_index))
print(word_index)
sequences=tokenizer.texts_to_sequences(sentences)
paded=pad_sequences(sequences)
print(paded[2])
The problem is the following:
word_index=tokenizer.word_index()
You probably want to store the tokenizer's word_index in the word_index variable. Instead, you are calling tokenizer.word_index as if it were a method or function, but it is a dictionary.
So, I think you need to apply the following correction:
word_index=tokenizer.word_index
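The difference between reading an attribute and calling it can be reproduced without TensorFlow. The class below is a hypothetical stand-in, not the Keras Tokenizer: it only mimics the fact that word_index is a plain dict attribute, and its indexing scheme is a simplification chosen for illustration.

```python
class FakeTokenizer:
    """Hypothetical stand-in that exposes word_index as a dict attribute."""

    def __init__(self):
        self.word_index = {}

    def fit_on_texts(self, texts):
        # Assign each new word the next integer, in order of first appearance.
        for text in texts:
            for word in text.lower().split():
                self.word_index.setdefault(word, len(self.word_index) + 1)


tokenizer = FakeTokenizer()
tokenizer.fit_on_texts(["the cat sat", "the dog"])

# Reading the attribute (no parentheses) returns the dictionary.
word_index = tokenizer.word_index
print(word_index)

# Adding parentheses tries to call the dictionary and fails.
try:
    tokenizer.word_index()
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The second print shows the same kind of message as in the question, because the parentheses ask Python to call a dict object.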
