RDKit: How to change the atom label fontsize?

When drawing structures with RDKit, the atom label font size and the ring size are not in good proportion: the labels are either too small, too large, or misaligned.
Unfortunately, the documentation about this is meager. I found this:
https://rdkit.org/docs/source/rdkit.Chem.Draw.MolDrawing.html
But I don't know whether this is related and how I would have to put it together. I'm missing simple practical code examples.
I also tried Draw.MolToQPixmap, but there I experienced that the atom labels are misaligned; so far I have learnt that the reason is the difficulty of making this cross-platform consistent, and furthermore that Draw.MolToQPixmap uses old drawing code. I should use e.g. Draw.MolToImage instead. But there, similar to Draw.MolToFile, the font size is simply too small. I'm not sure whether this is a cross-platform issue as well (I'm on Win10). So the solution would be to simply set the font size, but how?
I know that there is an RDKit mailing list, where I already asked this question without an answer so far. Here on SO there is perhaps a broader audience, and I can attach images for illustration.
Code:
from rdkit import Chem
from rdkit.Chem import Draw
smiles = 'FC1OC2N3C4[Si]5=C6B7C(C=CC6=CC4=CC2=CC1)C=CC=C7C=C5C=C3'
mol = Chem.MolFromSmiles(smiles)
Draw.MolToFile(mol, "Test.png", size=(300, 150))
Result: (using Draw.MolToFile, alignment ok, but too small atom labels)
Result: (using Draw.MolToQPixmap, misaligned and/or font too large for small pictures)
Edit: (with the suggestion of @Oliver Scott)
I get the same output with the same font size three times. It must be a stupid mistake or misunderstanding somewhere.
Code:
from rdkit.Chem.Draw import rdMolDraw2D
from rdkit import Chem
smiles = 'FC1OC2N3C4[Si]5=C6B7C(C=CC6=CC4=CC2=CC1)C=CC=C7C=C5C=C3'
mol = Chem.MolFromSmiles(smiles)
def drawMyMol(fname, myFontSize):
    d = rdMolDraw2D.MolDraw2DCairo(350, 300)
    d.SetFontSize(myFontSize)
    print(d.FontSize())
    d.DrawMolecule(mol)
    d.FinishDrawing()
    d.WriteDrawingText(fname)
drawMyMol("Test1.png", 6)
drawMyMol("Test2.png", 12)
drawMyMol("Test3.png", 24)
Result:
6.0
12.0
24.0

The newer RDKit drawing code is more flexible than these older functions, so try the rdMolDraw2D code. You can set the drawing options as below; the documentation has a list of the available options:
from rdkit.Chem.Draw import rdMolDraw2D
from rdkit import Chem
smiles = 'FC1OC2N3C4[Si]5=C6B7C(C=CC6=CC4=CC2=CC1)C=CC=C7C=C5C=C3'
mol = Chem.MolFromSmiles(smiles)
# Do the drawing.
d = rdMolDraw2D.MolDraw2DCairo(350, 300)
d.drawOptions().minFontSize = 22
d.DrawMolecule(mol)
d.FinishDrawing()
d.WriteDrawingText('test.png')
The default minimum font size is 12 and the max is 40.
Result:
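Note that drawOptions() also exposes maxFontSize (the 40 mentioned above). A minimal sketch, assuming a release where these fields are available (e.g. 2020.09) and that setting min and max to the same value effectively pins the label size:
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D
mol = Chem.MolFromSmiles('c1ccccc1O')  # any small example molecule
d = rdMolDraw2D.MolDraw2DCairo(350, 300)
d.drawOptions().minFontSize = 22  # lower bound for the auto-scaled label size
d.drawOptions().maxFontSize = 22  # min == max pins the font size
d.DrawMolecule(mol)
d.FinishDrawing()
d.WriteDrawingText('test_fixed_font.png')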
To get into a PIL image you could do it like this:
from PIL import Image
import io
# Change the last line of the above to get a byte string.
png = d.GetDrawingText()
# Now read into PIL.
img = Image.open(io.BytesIO(png))
# Now you can do whatever you need to do with the PIL image.

Thanks to the help of @Oliver Scott, I finally got what I was looking for:
Apparently, the font size is relative (default 0.5), not absolute in points, at least in RDKit 2020.03, which I am using. Maybe this has changed in RDKit 2020.09?
Code: (to get PNG files)
from rdkit.Chem.Draw import rdMolDraw2D
from rdkit import Chem
smiles = 'FC1OC2N3C4[Si]5=C6B7C(C=CC6=CC4=CC2=CC1)C=CC=C7C=C5C=C3'
mol = Chem.MolFromSmiles(smiles)
def myMolToPNG(fname, myFontSize):
    d = rdMolDraw2D.MolDraw2DCairo(350, 300)
    d.SetFontSize(myFontSize)
    d.DrawMolecule(mol)
    d.FinishDrawing()
    d.WriteDrawingText(fname)
myMolToPNG("Test1.png", 0.5)
myMolToPNG("Test2.png", 1.0)
myMolToPNG("Test3.png", 1.5)
Result:
Code: (to get a QPixmap, e.g. for a PyQt QTableWidget)
from rdkit.Chem.Draw import rdMolDraw2D
from rdkit import Chem
from PyQt5.QtGui import QPixmap
smiles = 'FC1OC2N3C4[Si]5=C6B7C(C=CC6=CC4=CC2=CC1)C=CC=C7C=C5C=C3'
mol = Chem.MolFromSmiles(smiles)
def myMolToQPixmap(myFontSize):
    d = rdMolDraw2D.MolDraw2DCairo(350, 300)
    d.SetFontSize(myFontSize)
    d.DrawMolecule(mol)
    d.FinishDrawing()
    png = d.GetDrawingText()
    pixmap = QPixmap()
    pixmap.loadFromData(png)
    return pixmap
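A hypothetical usage sketch for the QTableWidget case, reusing the helper above (the widget layout here is illustrative, not from the original code):
import sys
from PyQt5.QtWidgets import QApplication, QLabel, QTableWidget
app = QApplication(sys.argv)
table = QTableWidget(1, 1)
label = QLabel()
label.setPixmap(myMolToQPixmap(1.0))  # render the molecule into a pixmap
table.setCellWidget(0, 0, label)      # place it in a table cell
table.resizeColumnsToContents()
table.resizeRowsToContents()
table.show()
app.exec_()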

You can use SetPreferCoordGen and Compute2DCoords.
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import rdDepictor
rdDepictor.SetPreferCoordGen(True)
smiles = 'FC1OC2N3C4[Si]5=C6B7C(C=CC6=CC4=CC2=CC1)C=CC=C7C=C5C=C3'
mol = Chem.MolFromSmiles(smiles)
rdDepictor.Compute2DCoords(mol)
PILmol = Draw.MolToImage(mol, size=(300,150))
You get this PIL Image
Works in 2020.09, but I did not test it in 2020.03.
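Since Draw.MolToImage returns a PIL image, saving it to disk if you need a file is just:
PILmol.save("Test.png")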

Related

Image recognition difficulties with OCR - reading numbers from a picture

I am trying to develop a Python script which can read numbers from pictures; to be more exact, I am trying to get the gas consumption. The numbers' locations are always the same. There are two "types" of pics, bright and dark. (I am taking photos every 10 mins, so I have a lot of examples if needed.)
I would like to get as a result 8 digits. e.g. 10974748 (from the dark pic)
I am mainly using Pytesseract and OpenCV2.
So far the best solution seems to be to first crop the needed part of the picture and then use pytesseract.image_to_string() with config = --psm 7. But unfortunately it is really not a reliable solution; it cannot recognize the same digit combination consistently across photos taken while there was no consumption.
import cv2
import numpy as np
import os
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract"
directory = r"C:\Users\user\Desktop\test_pcs\test"
for image in os.listdir(directory):
    OriginalImagePath = os.path.join(directory, image)
    OriginalImage = cv2.imread(OriginalImagePath)
    x_start, y_start = int(1110), int(445)
    x_end, y_end = int(1690), int(520)
    cropped_image = OriginalImage[y_start:y_end, x_start:x_end]
    text = pytesseract.image_to_string(cropped_image, config="--psm 7 outputbase digits")
    cv2.imshow("Cropped", cropped_image)
    cv2.waitKey(0)
    print(text + " " + OriginalImagePath)
cv2.destroyAllWindows()
After that I tried thresholding, but sadly I get worse results than with the simple image_to_string. Adaptive thresholding gives an output image that doesn't look that bad, but Tesseract can't read it.
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract"
img = cv.imread(r"C:\Users\user\Desktop\test_pcs\new2\2022-10-30_14-49-30.jpg",0)
img = cv.medianBlur(img,5)
ret,th1 = cv.threshold(img,127,255,cv.THRESH_BINARY)
#'Adaptive Mean Thresholding'
th2 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_MEAN_C,\
cv.THRESH_BINARY,11,2)
#'Adaptive Gaussian Thresholding'
th3 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_GAUSSIAN_C,\
cv.THRESH_BINARY,11,2)
images = [img, th2, th3]
for i in range(3):
    plt.subplot(2, 2, i+1)
    plt.imshow(images[i], 'gray')
plt.show()
x_start, y_start = int(1110), int(450)
x_end, y_end = int(1690), int(520)
cropped_image = th2[y_start:y_end, x_start:x_end]
plt.imshow(cropped_image,'gray')
text = (pytesseract.image_to_string(cropped_image, config="--psm 7 outputbase digits"))
print("digits: " + text)
I also tried to read the digits character by character but it failed as well.
Now I am trying to get better pictures somehow but the options are quite limited.
I would be grateful for any suggestions, as I am doing this for my thesis.

Difference between example Acrobot plant A matrix and standard form

In section 3.4.1 of the Underactuated Robotics notes (https://underactuated.mit.edu/acrobot.html#section4), the manipulator equations are linearized around a fixed point and the matrix A_lin is derived.
While verifying the linearization of my own attempt at making an acrobot, I used the Python notebook provided in Example 3.5 (LQR for the Acrobot and Cart-pole) to obtain the A matrix of the linearized Acrobot (Plant from the Examples module). I did this by simply adding 'print(linearized_acrobot.A())' on line 21 of the LQR for Acrobot block. Interestingly, I noticed that the bottom-right 2x2 block is nonzero, which is different from the form derived in the notes. What is the reason behind the difference? For convenience I'll leave the code below:
import matplotlib.pyplot as plt
import mpld3
import numpy as np
from IPython.display import HTML, display
from pydrake.all import (AddMultibodyPlantSceneGraph, ControllabilityMatrix,
                         DiagramBuilder, Linearize, LinearQuadraticRegulator,
                         MeshcatVisualizerCpp, Parser, Saturation, SceneGraph,
                         Simulator, StartMeshcat, WrapToSystem)
from pydrake.examples.acrobot import (AcrobotGeometry, AcrobotInput,
                                      AcrobotPlant, AcrobotState)
from pydrake.solvers.mathematicalprogram import MathematicalProgram, Solve
from underactuated import FindResource, running_as_notebook
from underactuated.meshcat_cpp_utils import MeshcatSliders
from underactuated.quadrotor2d import Quadrotor2D, Quadrotor2DVisualizer
if running_as_notebook:
    mpld3.enable_notebook()
def UprightState():
    state = AcrobotState()
    state.set_theta1(np.pi)
    state.set_theta2(0.)
    state.set_theta1dot(0.)
    state.set_theta2dot(0.)
    return state
def acrobot_controllability():
    acrobot = AcrobotPlant()
    context = acrobot.CreateDefaultContext()
    input = AcrobotInput()
    input.set_tau(0.)
    acrobot.get_input_port(0).FixValue(context, input)
    context.get_mutable_continuous_state_vector()\
        .SetFromVector(UprightState().CopyToVector())
    linearized_acrobot = Linearize(acrobot, context)
    print(linearized_acrobot.A())
    print(
        f"The singular values of the controllability matrix are: {np.linalg.svd(ControllabilityMatrix(linearized_acrobot), compute_uv=False)}"
    )
acrobot_controllability()
Great question. The AcrobotPlant in Drake has default parameters which include some joint friction; this leads to the non-zero elements in the bottom-right corner. If you amend your code with
acrobot = AcrobotPlant()
context = acrobot.CreateDefaultContext()
params = acrobot.get_mutable_parameters(context)
print(params)
params.set_b1(0)
params.set_b2(0)
then the bottom-right 2x2 elements of the linearized A are zero as expected.
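For completeness, a minimal sketch of the full check with friction zeroed before linearizing, reusing UprightState() from the question's code:
acrobot = AcrobotPlant()
context = acrobot.CreateDefaultContext()
# Zero out the default joint friction before linearizing.
params = acrobot.get_mutable_parameters(context)
params.set_b1(0)
params.set_b2(0)
input = AcrobotInput()
input.set_tau(0.)
acrobot.get_input_port(0).FixValue(context, input)
context.get_mutable_continuous_state_vector()\
    .SetFromVector(UprightState().CopyToVector())
linearized_acrobot = Linearize(acrobot, context)
print(linearized_acrobot.A())  # bottom-right 2x2 block should now be zero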

pyqtgraph LUT histogram element: how to apply the same transform to a numpy array separately

I made a GUI to edit an image (16-bit grayscale); everything looks good in the GUI, but I need to repeat a step the GUI does for me on my own. I used pyqtgraph, whose ImageView widget provides a histogram feature.
If I move the yellow bars, I can change the maximum and minimum of the intensity range; in this case, a range from 1500 to 10000 would make the image visible.
I need to repeat that step of processing the image without using the GUI. I took a look at the source code, and it mentions a look-up table (LUT) to perform the calculation, yet I didn't comprehend the code enough to find where that step is being done, so I could implement it myself.
Any help on how to apply a look-up table transformation to a 16-bit image would be appreciated.
import sys
import cv2
import numpy as np
import pyqtgraph as pg
from PyQt5.QtCore import *
from PyQt5.QtGui import *
from PyQt5.QtWidgets import *
import pco
from PyQt5.QtWidgets import QScrollArea
import time
class MainWindow(QWidget):
    def __init__(self):
        super().__init__()
        self.initUI()
    def initUI(self):
        img_tif = cv2.imread("my_file.tif", cv2.IMREAD_ANYDEPTH)
        img_tifr = cv2.rotate(img_tif, cv2.ROTATE_90_COUNTERCLOCKWISE)
        img = np.asarray(img_tifr)
        self.image = pg.image()
        self.image.getHistogramWidget().setLevels(0, 50000)
        self.image.ui.menuBtn.hide()
        self.image.ui.roiBtn.hide()
        self.image.setImage(img)
def main():
    app = QApplication(sys.argv)
    main_window = MainWindow()
    app.exec_()
    sys.exit(0)
if __name__ == '__main__':
    main()
I ended up finding an answer by following this:
How to convert a 16 bit to an 8 bit image in OpenCV?
Hope it helps anyone else.
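For reference, a minimal sketch of that rescaling step in plain NumPy, assuming the levels 1500 and 10000 mentioned in the question (the same clip-and-rescale that the histogram's yellow level bars perform for display):
import numpy as np
def apply_levels(img16, level_min=1500, level_max=10000):
    # Clip to the selected intensity window and rescale to 8 bit,
    # mimicking what the histogram's level bars do for display.
    scaled = (img16.astype(np.float64) - level_min) / (level_max - level_min)
    return (np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)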

tesseract not able to read all digits accurately

I'm using Tesseract to recognize numbers from images of a screen taken with a phone camera. I've done some preprocessing of the image: processed image, and using Tesseract, I'm able to get some mixed results. Using the following code on the above images, I get the following output: "EOE". However, with this image, processed image, I get an exact match: "39:45.8"
import cv2
import pytesseract
from PIL import Image, ImageEnhance
from matplotlib import pyplot as plt
orig_name = "time3.jpg"
image_name = "time3_.jpg"
img = cv2.imread(orig_name, 0)
img = cv2.medianBlur(img, 5)
img_th = cv2.adaptiveThreshold(img, 255,\
cv2.ADAPTIVE_THRESH_MEAN_C,cv2.THRESH_BINARY, 11, 2)
cv2.imshow('image', img_th)
cv2.waitKey(0)
cv2.imwrite(image_name, img_th)
im = Image.open(image_name)
time = pytesseract.image_to_string(im, config = "-psm 7")
print(time)
Is there anything I can do to get more consistent results?
I did three additional things to get it correct for the first image.
1. You can set a whitelist for Tesseract. In your case we know that there will only be characters from the list 01234567890.:, which improves the accuracy significantly.
2. I resized the image to make it easier for Tesseract.
3. I switched from psm mode 7 to 11 (recognize as much text as possible).
Code:
import cv2
import pytesseract
from PIL import Image, ImageEnhance
orig_name = "./time1.jpg"
img = cv2.imread(orig_name)
height, width, channels = img.shape
imgResized = cv2.resize(img, ( width*3, height*3))
cv2.imshow("img",imgResized)
cv2.waitKey()
im = Image.fromarray(imgResized)
time = pytesseract.image_to_string(im, config ='--tessdata-dir "/home/rvq/github/tesseract/tessdata/" -c tessedit_char_whitelist=01234567890.: -psm 11 -oem 0')
print(time)
Note:
You can use Image.fromarray(imgResized) to convert an opencv image to a PIL Image. You don't have to write to disk and read it again.

Healpy plotting: How do I make a figure with subplots using the healpy.mollview projection?

I've just recently started trying to use healpy and I can't figure out how to make subplots to contain my maps. I have a thermal emission map of a planet as a function of time, and I need to look at it at several moments in time (let's say 9 different times) and superimpose some coordinates, to check that my planet is rotating the right way.
So far, I can do 2 things.
Make 9 different figures with the superimposed coordinates.
Make a figure with 9 subplots containing 9 different maps, but that superimposes all of my coordinates on all of my subplots, instead of just the time-appropriate ones.
I'm not sure if this is a very simple problem, but it's been driving me crazy and I can't find anything that works.
I'll show you what I mean:
OPTION 1:
import healpy as hp
import matplotlib.pyplot as plt
MAX = 10**(23)
MIN = 10**10
for i in range(9):
    t = 4000 + 10*i
    hp.visufunc.mollview(Fmap_wvpix[t,:],
                         title="Map at t="+str(t), min=MIN, max=MAX)
    hp.visufunc.projplot(d[t,np.where(np.abs(d[t,:,2]-SSP[t])<0.5),1],
                         d[t,np.where(np.abs(d[t,:,2]-SSP[t])<0.5),2],
                         'k*', markersize=6)
    hp.visufunc.projplot(d[t,np.where(np.abs(d[t,:,2]-(SOP[t]))<0.2),1],
                         d[t,np.where(np.abs(d[t,:,2]-(SOP[t]))<0.2),2],
                         'r*', markersize=6)
This makes 9 figures that look pretty much like this :
Flux map superimposed with some stars at time = t
But I need a lot of them, so I want to make an image that contains 9 subplots that look like the image.
OPTION 2:
fig = plt.figure(figsize=(10,8))
for i in range(9):
    t = 4000 + 10*i
    hp.visufunc.mollview(Fmap_wvpix[t,:],
                         title="Map at t="+str(t), min=MIN, max=MAX,
                         sub=int('33'+str(i+1)))
    hp.visufunc.projplot(d[t,np.where(np.abs(d[t,:,2]-SSP[t])<0.5),1],
                         d[t,np.where(np.abs(d[t,:,2]-SSP[t])<0.5),2],
                         'k*', markersize=6)
    hp.visufunc.projplot(d[t,np.where(np.abs(d[t,:,2]-(SOP[t]))<0.2),1],
                         d[t,np.where(np.abs(d[t,:,2]-(SOP[t]))<0.2),2],
                         'r*', markersize=6)
This gives me subplots but it draws all the projplot stars on all of my subplots! (see following image)
Subplots with too many stars
I know that I need a way to select the axes that has the time = t map and draw the stars for time = t on the appropriate map, but everything I've tried so far has failed. I've mostly tried to use projaxes, thinking I could define a matplotlib axes and draw the stars on it, but it doesn't work. Any advice?
Also, I would like to draw some lines on my map as well, but I can't figure out how to do that either. The documentation says projplot, but it won't draw anything if I don't tell it I want a marker.
PS: This code is probably useless to you, as it won't work if you don't have my arrays. Here's a simpler version that should run:
import numpy as np
import healpy as hp
import matplotlib.pyplot as plt
NSIDE = 8
m = np.arange(hp.nside2npix(NSIDE))*1
MAX = 900
MIN = 0
fig = plt.figure(figsize = (10,8))
for i in range(9):
    t = 4000 + 10*i
    hp.visufunc.mollview(m + 100*i, title="Map at t="+str(t), min=MIN, max=MAX,
                         sub=int('33'+str(i+1)))
    hp.visufunc.projplot(1.5, 0 + 30*i, 'k*', markersize=16)
So this is supposed to give me one star for each frame and the star is supposed to be moving. But instead it's drawing all the stars on all the frames.
What can I do? I don't understand the documentation.
If you want to have healpy plots in matplotlib subplots, the following would be the way to go. The key is to use plt.axes() to select the active subplot and to use the hold=True keyword in the healpy functions.
import healpy as hp
import numpy as np
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(ncols=2)
plt.axes(ax1)
hp.mollview(np.random.random(hp.nside2npix(32)), hold=True)
plt.axes(ax2)
hp.mollview(np.arange(hp.nside2npix(32)), hold=True)
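Building on this, a sketch of the questioner's simplified 9-panel example using the hold=True approach (with the dummy map m standing in for the real data), so that each star lands only on its own panel:
import healpy as hp
import numpy as np
import matplotlib.pyplot as plt
NSIDE = 8
m = np.arange(hp.nside2npix(NSIDE)) * 1.0
fig, axes = plt.subplots(3, 3, figsize=(10, 8))
for i, ax in enumerate(axes.flat):
    t = 4000 + 10*i
    plt.axes(ax)  # make this subplot the active axes
    hp.mollview(m + 100*i, title="Map at t="+str(t), hold=True)
    hp.projplot(1.5, 0 + 30*i, 'k*', markersize=16)  # drawn on this panel only
plt.show()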
I have just encountered this question while looking for a solution to the same problem, and managed to find the answer in the documentation of mollview (here).
As you can see there, 'sub' accepts the same syntax as matplotlib's subplot function, namely:
(# of rows, # of columns, # of current subplot)
E.g. to make your plot, the value sub wants to receive in each iteration is
sub=(3,3,i)
where i runs from 1 to 9 (3*3).
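Applied to the simplified loop in the question (where the loop index i starts at 0, so the subplot index is i+1), that would look like, for example:
hp.visufunc.mollview(m + 100*i, title="Map at t="+str(t),
                     min=MIN, max=MAX, sub=(3, 3, i + 1))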
This worked for me; I haven't tried it with your code, but it should work.
Hope this helps!
