This is my first time running a Python script with GNU Parallel.

I have the Python script below. I am trying to run it in parallel, reading its arguments from a text file (PairNames.txt) that has one variable per line.

I run the script with this command:
parallel --bar -a PairNames.txt python3 CreateDataTablePythonScriptv2.py


Here is the python script being executed:
import sqlite3
import sys

PairName = sys.argv[1]
print(PairName)
DTBLocation = '//mnt//c//Users//Jonathan//OneDrive - Mazars in Oman//Trading//Systems//FibMatrix//Testing Trade Analysis//SQLite//Trade Analysis.db'
connection = sqlite3.connect(DTBLocation)
cursor = connection.cursor()

TableName = PairName+'_DATA'
print(TableName)
cursor.execute("""CREATE TABLE IF NOT EXISTS {}
(
    Date_Time INTEGER,
    Open REAL,
    Max_60m_Box REAL

 )""".format(TableName))
connection.commit()
connection.close()



The first variable is processed correctly. For the remaining variables, print(PairName) shows the right value, but print(TableName) produces the output below:

GBPUSD
_DATAD

USDCHF
_DATAF

NZDJPY
_DATAY


It's weird to me that PairName prints correctly, but then does not show up in the concatenated TableName.

It's also weird that an extra letter gets added after _DATA each time. The extra letter appears to be the last letter of the input variable. I don't know why the first 5 letters are chopped off or how the last one ends up after _DATA.

Sorry if this has a simple answer, but I cannot figure it out, especially since there is not much documentation or many tutorials for a newbie using GNU Parallel in this situation. Thanks for the help!

What I have tried:

I printed the TableName.
I watched this video: https://www.youtube.com/watch?v=OpaiGYxkSuQ&ab_channel=OleTange
I tried moving the TableName concatenation to right under the PairName assignment.
I printed the type of PairName, and it is a string.
I tried separating the variables in the text file with tabs and commas instead of newlines.

I tried assigning '_DATA' to a variable and then concatenating the two objects, but it had the same result:
TableEnd = '_DATA'
TableName = PairName + TableEnd


If I remove the concatenation PairName+'_DATA' and use PairName alone as the TableName, it works correctly.
Updated 13-Feb-21 5:25am
Comments
Richard MacCutchan 12-Feb-21 9:25am    
It looks to me as though some of the print statements are overwriting other parts. That is something that is most likely caused by running these as parallel jobs. And since each one is trying to access the same database file you may well find a number of other problems (check the database). I suggest you run these jobs in sequence.
Member 15071189 12-Feb-21 9:30am    
Yes, something is happening with the concatenation when running in parallel. It's weird because it works correctly if I don't concatenate the strings; if I create a table name from the argument only, it works in parallel while accessing the same database.

Have you checked that your input file is not in DOS format (i.e. lines end in CRLF rather than just LF)? You can check this using the file command:
$ file test.txt 
test.txt: ASCII text, with CRLF line terminators
$

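If it is in DOS format, every name reaches the script with a trailing carriage return. Printing that character moves the cursor back to the start of the line, so _DATA overwrites the first five letters of the pair name and only the last letter remains visible. You can strip the carriage returns using tr: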
tr -d '\r' < input.file > output.file
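The effect described above can be reproduced in a few lines of Python. This is only a sketch: it assumes the argument arrives with a trailing carriage return, as it would from a CRLF file, and the helper name clean_pair_name is hypothetical rather than part of the original script.

```python
def clean_pair_name(raw: str) -> str:
    """Remove surrounding whitespace, including a stray carriage return."""
    return raw.strip()


# What the script would receive if PairNames.txt has CRLF line endings.
raw_name = "GBPUSD\r"

# The carriage return sits between the name and the suffix. A terminal
# jumps back to the start of the line when it prints that character,
# so '_DATA' is drawn over 'GBPUS' and only the final 'D' stays visible.
broken = raw_name + "_DATA"
print(repr(broken))  # 'GBPUSD\r_DATA'

fixed = clean_pair_name(raw_name) + "_DATA"
print(repr(fixed))   # 'GBPUSD_DATA'
```

In the script itself this would amount to PairName = sys.argv[1].strip(), which should make it tolerant of either line-ending style without converting the file first.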
 
Comments
Member 15071189 12-Feb-21 12:11pm    
That is the cause. Thank you! Your idea fixed it.
k5054 12-Feb-21 12:18pm    
You're welcome! Glad to be of assistance
I know this has been solved, but it occurred to me that there might be an issue with the code as presented. Namely, SQLite does not do concurrent writes to the database, so unless the full problem is compute-bound, it's probably quicker to do the table creation in a loop rather than trying to parallelize it.

I tested the code given and got a number of
sqlite3.OperationalError: database is locked
error messages when running this. I was able to get around this by adding a timeout to the connect statement, e.g.:
Python
connection = sqlite3.connect(DTBLocation, timeout=1)
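To illustrate what the timeout controls, here is a small sketch that simulates a second job holding the write lock. The helper try_create_table, the temporary database path, the single-column table and the 0.1 second timeout are all assumptions made for the demonstration; they are not taken from the original script.

```python
import os
import sqlite3
import tempfile


def try_create_table(db_path: str, table_name: str, timeout: float) -> bool:
    """Try to create a table; return False if sqlite3 raises an
    OperationalError (for example 'database is locked')."""
    conn = sqlite3.connect(db_path, timeout=timeout)
    try:
        conn.execute(
            'CREATE TABLE IF NOT EXISTS "{}" (Date_Time INTEGER)'.format(table_name)
        )
        conn.commit()
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Simulate another job holding the lock. isolation_level=None lets us
# issue BEGIN and COMMIT by hand.
holder = sqlite3.connect(db_path, isolation_level=None)
holder.execute("BEGIN EXCLUSIVE")

# While the lock is held, the attempt waits up to `timeout` seconds and fails.
blocked = try_create_table(db_path, "GBPUSD_DATA", timeout=0.1)

holder.execute("COMMIT")
holder.close()

# Once the lock is released, the same call goes through.
succeeded = try_create_table(db_path, "GBPUSD_DATA", timeout=0.1)
print(blocked, succeeded)
```

A longer timeout gives each job more time to wait for its turn, but the writes are still serialized by SQLite, so it hides the contention rather than removing it.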

Running the parallel version with a list of 100 six-character names took about 0.6 seconds on my system.
Using file.readlines() and a loop reduced that time to 0.2 seconds, e.g.:
Python
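# Assumes `cursor` and `connection` were created as in the original script.
with open('PairNames.txt', 'r') as pairs:
    Lines = pairs.readlines()

for line in Lines:
    # strip() removes the trailing newline (and any stray carriage return)
    TableName = line.strip() + '_DATA'
    cursor.execute("""CREATE TABLE ... """.format(TableName))

connection.commit()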
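For reference, a self-contained version of the serial loop might look like the sketch below. It reuses the column definitions from the question, but the function name create_tables, the in-memory database, the hard-coded sample names and the double quotes around the table name are assumptions added for illustration.

```python
import sqlite3

# Column definitions copied from the question; quoting the identifier
# is an addition, not part of the original statement.
CREATE_SQL = """CREATE TABLE IF NOT EXISTS "{}"
(
    Date_Time INTEGER,
    Open REAL,
    Max_60m_Box REAL
)"""


def create_tables(connection, pair_names):
    """Create one <pair>_DATA table per non-empty name; return the table names."""
    cursor = connection.cursor()
    created = []
    for name in pair_names:
        name = name.strip()  # drops '\n' and any '\r' left by CRLF files
        if not name:
            continue  # skip blank lines
        table_name = name + "_DATA"
        cursor.execute(CREATE_SQL.format(table_name))
        created.append(table_name)
    connection.commit()  # a single commit for the whole batch
    return created


# An in-memory database and sample lines stand in for the real file here.
connection = sqlite3.connect(":memory:")
tables = create_tables(connection, ["GBPUSD\r\n", "USDCHF\n", "\n"])
print(tables)  # ['GBPUSD_DATA', 'USDCHF_DATA']
```

With the real file, the list could come from open('PairNames.txt').readlines(), and the connection from sqlite3.connect(DTBLocation).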
 
Comments
Member 15071189 13-Feb-21 12:24pm    
I appreciate you sharing your insight. The first solution did fix the problem I was having. And yes, your solution of going serial is valid. Parallel matters not for creating the tables per se, but for all the scripts I have to run after creating them. Creating tables was a simple, quick script for me to practice with, so that I can understand this method and apply the lessons to the other, more complex scripts that have to be run in parallel. And yes, I have now learned that SQLite's concurrent-write limitation is a problem. As a result, I am switching to PostgreSQL tonight, so that I can solve the problem for the future scripts that I do need to run in parallel. Thank you for your contribution.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


