How to put a Scikit-image segmentation result into GeoJSON format with New Zealand Transverse Mercator coordinates

I am running SLIC superpixel segmentation using skimage. I would like to get the segmentation results as a GeoJSON file with the associated NZTM coordinate system.
My original imagery is in NZTM (New Zealand Transverse Mercator). How can I do this?
Below is my script:
import arcpy
import skimage
import skimage.io as skio
from skimage.segmentation import slic
import matplotlib.pyplot as plt
import numpy as np
from skimage.color import rgb2gray
from skimage.filters import sobel
from skimage.segmentation import felzenszwalb, slic, quickshift, watershed
from skimage.segmentation import mark_boundaries
from skimage.util import img_as_float
############# Load the image
im1 = skio.imread(r'C:\Data\SLICO\clip1.png')
segments_slic = slic(im1, n_segments=400, compactness=10, sigma=1,
                     start_label=1)
plt.imshow(mark_boundaries(im1, segments_slic))
plt.show()
Thank you for your help.
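One way to do this (a sketch, assuming a georeferenced GeoTIFF of the same scene is available and that rasterio is installed; the file paths are placeholders) is to read the raster with rasterio so the NZTM (EPSG:2193) transform is known, polygonise the SLIC label array with rasterio.features.shapes, and dump the polygons as GeoJSON:
import json
import numpy as np
import rasterio
from rasterio import features
from skimage.segmentation import slic

# Read a georeferenced raster so the NZTM (EPSG:2193) transform and CRS are known.
# A plain PNG carries no coordinate information, so this assumes a GeoTIFF of the same scene.
with rasterio.open(r'C:\Data\SLICO\clip1.tif') as src:
    im1 = src.read().transpose(1, 2, 0)   # bands-last for skimage
    transform = src.transform
    crs = src.crs

segments_slic = slic(im1, n_segments=400, compactness=10, sigma=1, start_label=1)

# Polygonise each superpixel label with the raster's affine transform,
# so coordinates come out in NZTM map units rather than pixel indices.
records = []
for geom, label in features.shapes(segments_slic.astype(np.int32), transform=transform):
    records.append({
        "type": "Feature",
        "properties": {"segment_id": int(label)},
        "geometry": geom,
    })

# The GeoJSON spec strictly expects WGS 84; the (deprecated) "crs" member is kept
# here simply to record that these coordinates are NZTM.
geojson = {
    "type": "FeatureCollection",
    "crs": {"type": "name", "properties": {"name": str(crs)}},
    "features": records,
}

with open(r'C:\Data\SLICO\segments_nztm.geojson', 'w') as f:
    json.dump(geojson, f)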

Related

Clustering on 97 features of categorical data

I am trying to apply unsupervised learning to a dataset with 97 features and around 6500 rows/samples. All features hold discrete values (mostly from 1-10), with some being binary (0/1). What are some of the best clustering algorithms to apply to this data? Thank you!
It's impossible to say which clustering algorithm will perform best on your dataset. You just have to try several approaches and inspect the results you get. Here are several clustering algorithms you can try.
https://github.com/ASH-WICUS/Notebooks/blob/master/Clustering%20Algorithms%20Compared.ipynb
Here is a small sample.
import statsmodels.api as sm
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from matplotlib import pyplot
import plotly.express as px

# load the mtcars dataset
mtcars = sm.datasets.get_rdataset("mtcars", "datasets", cache=True).data
df_cars = pd.DataFrame(mtcars)
df_cars.head()

# define the feature matrix
X = df_cars[['mpg', 'hp']].copy()
# define the model
model = KMeans(n_clusters=8)
# fit the model
model.fit(X)
# assign a cluster to each example
yhat = model.predict(X)
X['kmeans'] = yhat
df_cars['kmeans'] = yhat

# plot the two features and colour by cluster number
pyplot.scatter(X['mpg'], X['hp'], c=X['kmeans'], cmap='rainbow', s=50, alpha=0.8)

# interactive version of the same plot
fig = px.scatter(df_cars, x="hp", y="mpg", color="kmeans", size='mpg', hover_data=['kmeans'])
fig.show()

How does training happen in the LBPH recognizer in OpenCV?

import os
import cv2
import numpy as np
from PIL import Image

class training:
    def train():
        reconizer = cv2.face.LBPHFaceRecognizer_create()
        # ... some code part ...
        reconizer.train(faces, Id)  # <----- this line
        reconizer.save('reconizer/traindata.yml')
        cv2.destroyAllWindows()
What exactly is being trained in this step? Is some machine learning algorithm running behind the scenes, or something else?
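Roughly speaking, no iterative optimisation happens in reconizer.train(faces, Id): the LBPH recogniser computes a Local Binary Pattern histogram for each training face and stores those histograms with their labels; prediction compares a new face's histogram against the stored ones and returns the closest label. A hypothetical sketch of the idea using skimage (not OpenCV's actual implementation):
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_face, P=8, R=1, grid=(8, 8)):
    # Concatenate per-cell LBP histograms into a single descriptor for the face.
    lbp = local_binary_pattern(gray_face, P, R, method='uniform')
    h, w = lbp.shape
    ch, cw = h // grid[0], w // grid[1]
    hists = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            cell = lbp[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw]
            hist, _ = np.histogram(cell, bins=P + 2, range=(0, P + 2), density=True)
            hists.append(hist)
    return np.concatenate(hists)

# Dummy data standing in for the `faces` and `Id` variables from the question.
rng = np.random.default_rng(0)
faces = [rng.integers(0, 256, size=(100, 100), dtype=np.uint8) for _ in range(4)]
Id = [1, 1, 2, 2]

# "Training": compute and store one histogram per labelled face.
model = [(lbp_histogram(face), label) for face, label in zip(faces, Id)]

# "Prediction": return the label of the nearest stored histogram.
def predict(gray_face):
    query = lbp_histogram(gray_face)
    return min(model, key=lambda entry: np.linalg.norm(entry[0] - query))[1]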

How to convert results of Scikit-image Segmentation into a .shp format

I am running SLIC superpixel segmentation using skimage. I would like to get the segmentation results as a .shp file with the associated NZTM coordinate system. The segmentation result is a two-dimensional numpy.ndarray.
My original imagery is in NZTM (New Zealand Transverse Mercator). How can I do this?
Below is my script:
import arcpy
import skimage
import skimage.io as skio
from skimage.segmentation import slic
from PIL import Image
import matplotlib.pyplot as plt
from shapely.geometry import shape, Point, Polygon, LineString
import numpy as np
from skimage.color import rgb2gray
from skimage.filters import sobel
from skimage.segmentation import slic
from skimage.segmentation import mark_boundaries
from skimage.util import img_as_float
############# Load the image
im1 = skio.imread(r'C:\Data\Deep_Learning\SLICO\kang1019_clip1.png')
segments_slic = slic(im1, n_segments=400, compactness=10, sigma=1,
                     start_label=1)
Thank you for your help.
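Much like the GeoJSON case above, a minimal sketch (assuming a georeferenced GeoTIFF of the scene and that rasterio and geopandas are installed; the paths are placeholders) is to polygonise the label array in map coordinates and let geopandas write the shapefile in NZTM (EPSG:2193):
import numpy as np
import rasterio
from rasterio import features
import geopandas as gpd
from shapely.geometry import shape
from skimage.segmentation import slic

# Open a georeferenced version of the scene so the NZTM transform is known
# (a plain PNG has no coordinate information).
with rasterio.open(r'C:\Data\Deep_Learning\SLICO\kang1019_clip1.tif') as src:
    im1 = src.read().transpose(1, 2, 0)   # bands-last for skimage
    transform = src.transform

segments_slic = slic(im1, n_segments=400, compactness=10, sigma=1, start_label=1)

# Polygonise the 2-D label array using the affine transform, then build a GeoDataFrame.
geoms, labels = [], []
for geom, label in features.shapes(segments_slic.astype(np.int32), transform=transform):
    geoms.append(shape(geom))
    labels.append(int(label))

gdf = gpd.GeoDataFrame({'segment_id': labels}, geometry=geoms, crs="EPSG:2193")
gdf.to_file(r'C:\Data\Deep_Learning\SLICO\segments_nztm.shp')   # ESRI Shapefile by default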

How can I use dask_ml preprocessing in a dask distributed cluster

How can I do dask_ml preprocessing in a Dask distributed cluster? My dataset is about 200 GB, and every time I categorize the dataset in preparation for OneHotEncoding, it looks like Dask is ignoring the client and trying to load the dataset into the local machine's memory. Maybe I am missing something:
from dask_ml.preprocessing import Categorizer, DummyEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
import pandas as pd
import dask.dataframe as dd

df = dd.read_csv('s3://some-bucket/files*.csv', dtype={'column': 'category'})

pipe = make_pipeline(
    Categorizer(),
    DummyEncoder(),
    LogisticRegression(solver='lbfgs')
)

pipe.fit(df, y)
Two immediate things to address:
- You have not instantiated a distributed scheduler in your code.
- You should probably use the LogisticRegression instance from dask-ml rather than scikit-learn.
Working Code Example
Below is a minimal code example that works.
Note that the preprocessing functions accept only Dask Dataframes while the LogisticRegression estimator accepts only Dask arrays. You can split the pipeline or use a custom FunctionTransformer (from this answer). See this open Dask issue for more context.
from dask_ml.preprocessing import Categorizer, DummyEncoder
from dask_ml.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client

client = Client()

from dask_ml.datasets import make_classification
X, y = make_classification(chunks=50)

# define custom transformers to include in the pipeline
def trans_array(array):
    return dd.from_array(array)

transform_array = FunctionTransformer(trans_array)

def trans_df(dataframe):
    return dataframe.to_dask_array(lengths=True)

transform_df = FunctionTransformer(trans_df)

pipe = make_pipeline(
    transform_array,
    Categorizer(),
    DummyEncoder(),
    transform_df,
    LogisticRegression(solver='lbfgs')
)

pipe.fit(X, y)

How to know if my data has been scaled by StandardScaler?

"I have scaled my dataset by using Standard Scaler , Now how to know it has been scaled, I am sure it has been scaled but how to see it"
As @Coderji said, you can always check the mean and standard deviation, which should be 0 and 1 respectively (see the short numerical check after the plot below).
However, there is another method to visualize it.
from sklearn import datasets
import numpy as np
from sklearn.preprocessing import StandardScaler
I am using the iris dataset for this example.
iris = datasets.load_iris()
X = iris.data
sc = StandardScaler()
sc.fit(X)
x = sc.transform(X)
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(x[:,1])
The output shows the distribution of the standardised values for that column.
Similarly, you can check all the variables, or a simple pairplot will do the job.
This gives a visual indication that the data has been standardised.
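As mentioned at the start, the scaling can also be confirmed numerically; a minimal sketch, reusing the transformed array x from above:
import numpy as np

# Each column of the scaled array should have mean ~0 and standard deviation ~1.
print(x.mean(axis=0))   # values very close to 0
print(x.std(axis=0))    # values very close to 1
assert np.allclose(x.mean(axis=0), 0, atol=1e-8)
assert np.allclose(x.std(axis=0), 1, atol=1e-8)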
