Batch processing of MicaSense images to extract vegetation indices into an Excel file

import numpy as np
import rasterio
from rasterio.plot import show
b3 = rasterio.open(r'D:\Rice\Day_1\Micasense\T1\S1\IMG_0370_3.tif')
b4 = rasterio.open(r'D:\Rice\Day_1\Micasense\T1\S1\IMG_0370_4.tif')
red = b3.read(1).astype('float64')
nir = b4.read(1).astype('float64')
np.seterr(divide='ignore', invalid='ignore')
check = np.logical_or ( red > 0, nir > 0 )
ndvi = np.where ( check, (nir - red ) / ( nir + red ), -999 )
array([[0.52380952, 0.56526611, 0.53551913, ..., 0.44277822, 0.4546773 ,
[0.51214361, 0.57994723, 0.5954023 , ..., 0.4851632 , 0.48997135,
[0.48920086, 0.54266958, 0.56402621, ..., 0.48092745, 0.45281639,
[0.15959253, 0.40617935, 0.42018072, ..., 0.30028599, 0.22739726,
[0.30884808, 0.40158616, 0.41691395, ..., 0.23501427, 0.25222552,
[0.40364188, 0.45576208, 0.45708304, ..., 0.32093023, 0.32592593,
import datetime
import pyexcel as pe
data = [[0.52380952], [0.56526611], [0.53551913], [0.44277822], [0.4546773]]
pe.save_as(array=data, dest_file_name="P (21).csv")
import pyexcel
pyexcel.save_as(array=ndvi, dest_file_name=r"D:\Rice\Day_1\T (1)\S_1\P (21).csv", dest_delimiter=':')
[First, I should mention that the MicaSense images were taken from a ground-based structure, not from a UAV. I used the code above to extract the vegetation index (NDVI) from a single image set (one red and one NIR image). I have 21 such sets in a folder, and now I need to:
process the image sets in batch;
extract the vegetation values for each set separately and save them in an Excel file; and, if possible, keep only the mean of the filtered values,
for example only the mean of the values above 0.3 or so.
Thanks and regards,
Haque Md Asrakul]
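The batch step described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested pipeline: it assumes every image set consists of a red file ending in `_3.tif` and a NIR file ending in `_4.tif` with the same prefix (as in `IMG_0370_3.tif` / `IMG_0370_4.tif`), and it writes a CSV file, which Excel can open. The helper names `filtered_ndvi_mean` and `process_folder` are hypothetical.

```python
import csv
from pathlib import Path

import numpy as np


def filtered_ndvi_mean(red, nir, threshold=0.3):
    """Return the mean of the NDVI values above `threshold` (NaN if none)."""
    red = np.asarray(red, dtype="float64")
    nir = np.asarray(nir, dtype="float64")
    denom = nir + red
    ndvi = np.full(red.shape, np.nan)
    # Divide only where the denominator is non-zero; other pixels stay NaN.
    np.divide(nir - red, denom, out=ndvi, where=denom != 0)
    selected = ndvi[ndvi > threshold]  # NaN compares False, so it is dropped
    return float(selected.mean()) if selected.size else float("nan")


def process_folder(folder, out_csv, threshold=0.3):
    """Pair each *_3.tif (red) with its *_4.tif (NIR); write one mean per set."""
    import rasterio  # imported here so the NDVI helper works without it

    rows = []
    for red_path in sorted(Path(folder).rglob("*_3.tif")):
        # Assumed naming convention: same file name, band suffix 4 instead of 3.
        nir_path = red_path.with_name(red_path.name.replace("_3.tif", "_4.tif"))
        if not nir_path.exists():
            continue
        with rasterio.open(red_path) as r, rasterio.open(nir_path) as n:
            mean = filtered_ndvi_mean(r.read(1), n.read(1), threshold)
        rows.append([red_path.name, nir_path.name, mean])

    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["red_file", "nir_file", f"mean_ndvi_above_{threshold}"])
        writer.writerows(rows)
    return rows
```

Calling `process_folder` on the folder that holds the 21 sets would then produce one row per image set; the threshold of 0.3 is only the example value from the question.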


Error with Specs parameter of Plotly Subplots

I am getting a ValueError:
The 'specs' argument to make_subplots must be a 2D list of dictionaries with dimensions (1 x 1).
Received value of type <class 'list'>: [[{'secondary_y': False}], [{'secondary_y': True}], [{'colspan': 1}, None]]
I referred to the existing post "plotly subplots issue with specs, value error" and followed it, but the error still persists.
Below is the code snippet:
import talib as ta
import yfinance as yf
import pandas as pd
import plotly.io as pio
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# Extracting the data
VIP = yf.Ticker('VIPIND.NS')
df = VIP.history(period="max")
df.reset_index(inplace = True)
df['Date'] = pd.to_datetime(df['Date'])
# Creating the technical indicators
df['EMA_Close'] = ta.EMA(df.Close,100)
df['MA_Close'] = ta.MA(df.Close,60)
# Creating Plots
# Declaring subplots
fig = make_subplots(rows=2, cols=1)#, shared_xaxes=True,print_grid=True)
fig = make_subplots(specs=[[{"secondary_y": False}],[{"secondary_y": True}],[{"colspan": 1}, None]])
# Plotting the first row with OHLC, EMA and MA lines
fig.add_trace(go.Candlestick(x=df["Date"], open=df["Open"], high=df["High"],
                             low=df["Low"], close=df["Close"], name="OHLC", showlegend=True),
              row=1, col=1, secondary_y=False)
fig.add_trace(go.Scatter(x=df['Date'], y=df['EMA_Close'], showlegend=True,
                         name="EMA Close", line=dict(color="MediumPurple")
                         ), row=1, col=1, secondary_y=False)
fig.add_trace(go.Scatter(x=df['Date'], y=df['MA_Close'], showlegend=True,
                         name="MA Close", line=dict(color="Orange")
                         ), row=1, col=1, secondary_y=False)
# Plotting the second row with MACD & MACDSig lines and MACDHist as histogram/bar
fig.add_trace(go.Bar(x=df['Date'],
                     y=df['MACDhist'], showlegend=True, name="MACD Hist", marker=dict(color='black')
                     ), row=2, col=1, secondary_y=False)
fig.add_trace(go.Scatter(x=df['Date'], y=df['MACDsig'], showlegend=True,
                         name="MACD Signal", line=dict(color="MediumPurple")
                         ), row=2, col=1, secondary_y=True)
fig.add_trace(go.Scatter(x=df['Date'], y=df['MACD'], showlegend=True,
                         ), row=2, col=1, secondary_y=True)
# Updating the layout of the plot
fig.update_layout(height=600, width=1250,
                  title='OHLC and Volume',
                  yaxis_title='Prices (Rs)',
                  margin=dict(l=20, r=20, t=40, b=20))
# Providing desired fonts for the plots
fig.update_layout(font_family="Courier New",
                  title_font_family="Times New Roman")
Please advise where I am going wrong.
You are getting the error because the dimensions of your specs do not match the number of rows and cols defined for your subplots. You want 2 rows and 1 col, so your specs must be a list with 2x1 shape, i.e. a list of two lists, each holding one dictionary. The message mentions (1 x 1) because the second make_subplots call passes no rows or cols, so both default to 1; pass rows, cols and specs in the same call. Here is an example:
specs=[[{"secondary_y": True, "colspan": X, "rowspan": X, "b": 0.05, etc}],
[{"secondary_y": False}]]
Also, keep in mind that the maximum value colspan can take is the value you pass for the cols parameter. Finally, if you need more settings for a subplot, simply add them inside its dictionary.
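The shape rule can be illustrated without plotly itself. The sketch below builds a specs grid whose dimensions match rows and cols, and checks that a grid is rectangular; `build_specs` and `specs_shape` are hypothetical helper names, not part of plotly.

```python
def build_specs(rows, cols, secondary_y_rows=()):
    """Return a rows x cols grid of spec dicts for make_subplots.

    Rows listed in `secondary_y_rows` (1-based, like plotly's row numbers)
    get {"secondary_y": True}; every other cell gets {"secondary_y": False}.
    """
    wanted = set(secondary_y_rows)
    return [
        [{"secondary_y": (r in wanted)} for _ in range(cols)]
        for r in range(1, rows + 1)
    ]


def specs_shape(specs):
    """Return (rows, cols) of a specs grid, or raise if the rows are ragged."""
    widths = {len(row) for row in specs}
    if len(widths) != 1:
        raise ValueError(f"ragged specs: row lengths {sorted(widths)}")
    return len(specs), widths.pop()


specs = build_specs(rows=2, cols=1, secondary_y_rows=[2])
print(specs)               # [[{'secondary_y': False}], [{'secondary_y': True}]]
print(specs_shape(specs))  # (2, 1)
```

With plotly installed, the grid would be passed in a single call such as `make_subplots(rows=2, cols=1, specs=specs)`. The specs from the question has rows of length 1, 1 and 2, so `specs_shape` would reject it as ragged.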

"LightGBMError: Do not support special JSON characters in feature name"

My 'X' data is a pandas DataFrame of time series. I extracted features from X using tsfresh and tried to apply the LightGBM algorithm to classify the data into 0 (Bad) and 1 (Good), but it shows an error. The columns of my X data are:
'0__cwt_coefficients__coeff_1__w_20__widths(2, 5, 10, 20)',
'0__cwt_coefficients__coeff_1__w_10__widths(2, 5, 10, 20)',
'0__quantile__q_0.4', '0__fft_coefficient__attr"imag"coeff_39',
'0__cwt_coefficients__coeff_13__w_10__widths(2, 5, 10, 20)',
'0__fft_coefficient__attr"angle"_coeff_92', '0__maximum',
dtype='object', length=225)
My code is
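import lightgbm as lgb
import seaborn as sns
from sklearn.metrics import confusion_matrix
d_train = lgb.Dataset(X_train, label=y_train)
lgbm_params = {'learning_rate':0.05, 'boosting_type':'dart',
               'metric':['auc', 'binary_logloss']}
clf = lgb.train(lgbm_params, d_train, 50)
y_pred_lgbm = clf.predict(X_test)
# Turn the predicted scores into class labels with a 0.5 threshold
for i in range(0, X_test.shape[0]):
    if y_pred_lgbm[i]>=.5:
        y_pred_lgbm[i] = 1
    else:
        y_pred_lgbm[i] = 0
cm_lgbm = confusion_matrix(y_test, y_pred_lgbm)
sns.heatmap(cm_lgbm, annot=True)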
I tried the code below to rename my columns, but it does not work.
import re
X = X.rename(columns = lambda u:re.sub('[^A-Za-z0-9_]+', '', u))
After applying that rename function, the columns look as below:
'0__quantile__q_04', '0__fft_coefficient__attr_imag__coeff_39',
'0__fft_coefficient__attr_angle__coeff_92', '0__maximum',
dtype='object', length=225)
What should I do to get rid of this error?
You can't put special JSON characters such as ", ',', ':', '[', ']', '{' or '}' in column names, or LightGBM will report this kind of error; a plain '_' is fine.
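A minimal sketch of one way to clean the names, assuming the goal is to keep only letters, digits and underscores. `sanitize_feature_names` is a hypothetical helper; it replaces each run of other characters with a single underscore and adds a numeric suffix when two cleaned names would collide.

```python
import re


def sanitize_feature_names(names):
    """Replace runs of characters outside [A-Za-z0-9_] with '_' and keep
    the resulting names unique by appending a numeric suffix on collision."""
    used = set()
    cleaned = []
    for name in names:
        base = re.sub(r"[^A-Za-z0-9_]+", "_", str(name))
        candidate, n = base, 1
        while candidate in used:
            candidate = f"{base}_{n}"
            n += 1
        used.add(candidate)
        cleaned.append(candidate)
    return cleaned


cols = [
    '0__cwt_coefficients__coeff_1__w_20__widths(2, 5, 10, 20)',
    '0__quantile__q_0.4',
    '0__maximum',
]
print(sanitize_feature_names(cols))
```

The cleaned list would have to be assigned to the frames that are actually passed to LightGBM, for example `X_train.columns = sanitize_feature_names(X_train.columns)` and the same for `X_test`. If `X_train` and `X_test` were split off before `X` was renamed, they would still carry the old names, which is one possible reason the rename appears not to work.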

How to fine tune a masked language model?

I'm trying to follow the huggingface tutorial on fine tuning a masked language model (masking a set of words randomly and predicting them). But they assume that the dataset is in their system (can load it with from datasets import load_dataset; load_dataset("dataset_name")). However, my input dataset is a long string:
text = "This is an attempt of a great example. "
dataset = text * 3000
I followed their approach and tokenized it:
from transformers import AutoTokenizer
from transformers import AutoModelForMaskedLM
import torch
from transformers import DataCollatorForLanguageModeling
model_checkpoint = "distilbert-base-uncased"
model = AutoModelForMaskedLM.from_pretrained(model_checkpoint)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
def tokenize_long_text(tokenizer, long_text):
    individual_sentences = long_text.split('.')
    tokenized_sentences_list = tokenizer(individual_sentences)['input_ids']
    tokenized_sequence = [x for xs in tokenized_sentences_list for x in xs]
    return tokenized_sequence
tokenized_sequence = tokenize_long_text(tokenizer, dataset)
Followed by chunking it into equal-length segments:
def chunk_long_tokenized_text(tokenizer_text, chunk_size):
    # Compute length of long tokenized texts
    total_length = len(tokenizer_text)
    # We drop the last chunk if it's smaller than chunk_size
    total_length = (total_length // chunk_size) * chunk_size
    return [tokenizer_text[i : i + chunk_size] for i in range(0, total_length, chunk_size)]
chunked_sequence = chunk_long_tokenized_text(tokenized_sequence, 30)
Created a data collator for random masking:
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15) # expects a list of dicts, where each dict represents a single chunk of contiguous text
Example of how it works:
d = {}
d['input_ids'] = chunked_sequence[0]
>>>{'input_ids': [101,
for chunk in data_collator([ d ])["input_ids"]:
    print(f"\n'>>> {tokenizer.decode(chunk)}'")
>>>'>>> [CLS] this is a great [MASK] [SEP] [CLS] this is a great [MASK] [SEP] [CLS] this is a great [MASK] [SEP] [CLS] this is a great [MASK] [SEP] [CLS] this'
However, the remaining steps (which I believe is just the training component) seem to only work using their trainer method, which can only take their dataset.
How can this work with a dataset in the form of a string?
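One possible way to bridge the gap, sketched in plain Python: arrange the chunks as named columns, which is the layout an in-memory dataset can be built from. `chunks_to_columns` is a hypothetical helper, and the example token ids are made up for illustration.

```python
def chunks_to_columns(chunks):
    """Turn equal-length token-id chunks into column-style features."""
    return {
        "input_ids": [list(chunk) for chunk in chunks],
        # No padding was added, so every position counts as a real token.
        "attention_mask": [[1] * len(chunk) for chunk in chunks],
    }


example_chunks = [[101, 2023, 2003], [1037, 2307, 102]]  # made-up ids
columns = chunks_to_columns(example_chunks)
print(columns["input_ids"])       # [[101, 2023, 2003], [1037, 2307, 102]]
print(columns["attention_mask"])  # [[1, 1, 1], [1, 1, 1]]
```

Assuming the `datasets` library is installed, `Dataset.from_dict(columns)` should give a dataset object that can be passed as `train_dataset` to `Trainer`, together with the `data_collator` defined above. This is a sketch of the idea rather than a verified training setup.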

question for dask output when using dask.array.map_overlap

I would like to use dask.array.map_overlap with a scipy interpolation function. However, I keep running into errors that I cannot understand, and I hope someone can explain them to me.
Here is the error message I receive when I run .compute():
ValueError: could not broadcast input array from shape (1070,0) into shape (1045,0)
To investigate, I used .to_delayed() to check the output of each partition, and this is what I found.
My Python code follows.
Step 1. Load netCDF file through Xarray, and then output to dask.array with chunk size (400,400)
import xarray as xr
import dask.array as da
df = xr.open_dataset('./Brazil Sentinal2 Tile/' + data_file +'.nc')
lon, lat = df['lon'].data, df['lat'].data
slon = da.from_array(df['lon'], chunks=(400,400))
slat = da.from_array(df['lat'], chunks=(400,400))
data = da.from_array(df.isel(band=0), chunks=(400,400))
Step 2. declare a function for da.map_overlap use
def sumsum2(lon,lat,data, hex_res=10):
    hex_col = 'hex' + str(hex_res)
    lon_max, lon_min = lon.max(), lon.min()
    lat_max, lat_min = lat.max(), lat.min()
    b = box(lon_min, lat_min, lon_max, lat_max, ccw=True)
    b = transform(lambda x, y: (y, x), b)
    b = mapping(b)
    target_df = pd.DataFrame(h3.polyfill( b, hex_res), columns=[hex_col])
    target_df['lat'] = target_df[hex_col].apply(lambda x: h3.h3_to_geo(x)[0])
    target_df['lon'] = target_df[hex_col].apply(lambda x: h3.h3_to_geo(x)[1])
    tlon, tlat = target_df[['lon','lat']].values.T
    abc = lNDI(points=(lon.ravel(), lat.ravel()),
               values= data.ravel())(tlon,tlat)
    target_df['out'] = abc
    print(np.stack([tlon, tlat, abc],axis=1).shape)
    return np.stack([tlon, tlat, abc],axis=1)
Step 3. Apply the da.map_overlap
b = da.map_overlap(sumsum2, slon[:1200,:1200], slat[:1200,:1200], data[:1200,:1200], depth=10, trim=True, boundary=None, align_arrays=False, dtype='float64',
Step 4. Using to_delayed() to test output shape
print(b.to_delayed().flatten()[0].compute().shape, )
(1065, 3)
(1045, 0)
(1090, 3)
(1070, 0)
This shows that the output from da.map_overlap has no columns (shapes (1045,0) and (1070,0)), while the output my function returns is 2-D with three columns (shapes (1065,3) and (1090,3)).
In addition, if I turn off the trim argument, that is
c = da.map_overlap(sumsum2,
print(c.to_delayed().flatten()[0].compute().shape, )
The output becomes
(1065, 3)
(1065, 3)
(1090, 3)
(1090, 3)
Does this mean that with trim=True I cut out everything?
#-- print out the values
(1065, 3)
array([], shape=(1045, 0), dtype=float64)
#-- print out the values
array([[ -47.83683837, -18.98359832, 1395.01848583],
[ -47.8482856 , -18.99038681, 2663.68391094],
[ -47.82800624, -18.99207069, 1465.56517187],
[ -47.81897323, -18.97919009, 2769.91556363],
[ -47.82066663, -19.00712956, 1607.85927095],
[ -47.82696896, -18.97167714, 2110.7516765 ],
[ -47.81562653, -18.98302933, 2662.72112163],
[ -47.82176881, -18.98594465, 2201.83205114],
[ -47.84567 , -18.97512514, 1283.20631652],
[ -47.84343568, -18.97270783, 1282.92117225]])
Any thoughts on this?
Thank You.
I think I found the answer. Please let me know if I am wrong.
I cannot use trim=True because I change the shape of the output array (after searching the internet, I noticed that the output array should have the same shape as the input array). Since I change the shape, dask does not know how to handle it and returns an empty array (weird).
With trim=False instead, the buffer zone is not cut out, so the returned values come through. (I still don't know why dask cannot concatenate the chunked arrays, but I believe this is also related to shape.)
The solution is to use the delayed function on da.concatenate:
delayed(da.concatenate)([e.to_delayed().flatten()[idx] for idx in range(len(e.to_delayed().flatten()))])
In this case we do not rely on the concatenation inside map_overlap but use our own to combine the outputs we want.
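The shapes reported in the question are consistent with trimming `depth` elements from both ends of every axis: 1065 - 2*10 = 1045 rows, and 3 - 2*10 columns, clamped at 0. A minimal pure-Python sketch of that arithmetic, under the assumption that trim=True removes depth=10 from each side of each axis (`trimmed_shape` is a hypothetical helper, not a dask function):

```python
def trimmed_shape(shape, depth):
    """Shape left after cutting `depth` elements from both ends of each axis."""
    return tuple(max(n - 2 * depth, 0) for n in shape)


for block_shape in [(1065, 3), (1090, 3)]:
    print(block_shape, "->", trimmed_shape(block_shape, depth=10))
# (1065, 3) -> (1045, 0)
# (1090, 3) -> (1070, 0)
```

Under that assumption the three-column result loses all of its columns, which matches the empty (1045, 0) and (1070, 0) arrays, while trim=False leaves the returned blocks untouched.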

Convert multi class classification to binary image classification

I would like to convert a multi-class image to binary, giving two classes, built-up / non-built-up. I started by writing this little code, but I don't know how to assign the value 0 to classes 2, 3, 4, 5 and 6 and the value 1 to class 1.
Here is my code:
import georasters as gr
import pandas as pd
img = gr.from_file('D:/Thèse/Partie 2/ZMU/filtre_classification_2018_7classes_fin.tif')
img =img.loc[:,['value','x','y']]
img = img.rename(columns = {'value':'class', 'x':'Latitude', 'y':"Longitude"})
#df['class'] = df['class'].map({'first_element':1, 'second_element':2,'third_element':3})
df = pd.DataFrame([img], columns=["1","3","5","7","9","11"])
You could simply use df.loc to replace your classes
This should set everything with class=1 to 0 and everything else to 1.
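A minimal sketch of the df.loc approach for the mapping asked for in the question (class 1 becomes 1, classes 2 to 6 become 0). The small DataFrame and the `binary` column name are made up for illustration; swapping the two assigned values would give the inverse mapping.

```python
import pandas as pd

# Made-up class values standing in for the raster's 'class' column.
df = pd.DataFrame({"class": [1, 2, 3, 4, 5, 6, 1]})

df["binary"] = 0                        # default: every class maps to 0
df.loc[df["class"] == 1, "binary"] = 1  # class 1 maps to 1
print(df["binary"].tolist())            # [1, 0, 0, 0, 0, 0, 1]
```

Writing into a separate column keeps the original class labels available for checking the result.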
