This is the input dataset:
| text     | term_index     |
| -------- | -------------- |
| liked aluminum body|[0, 1, 1]|
| lightweight screen beautiful|[0, 1, 0]|


This is my code
Python
import spacy
from torchtext.legacy.data import Field, TabularDataset, Iterator
from torchtext.vocab import GloVe

spacy_eg = spacy.load('en')

def tokenize_eng(text):
    return [tok.text for tok in spacy_eg.tokenizer(text)]

TEXT = Field(sequential=True, use_vocab=True,
             tokenize=tokenize_eng, lower=True,
             init_token='<s>', eos_token='</s>', fix_length=104)
INDEX = Field(sequential=False, use_vocab=False,
              init_token='<s>', eos_token='</s>', fix_length=104)
# Field names are matched to the CSV columns by position
fields1 = [('text', TEXT), ('term_index', INDEX)]
train = TabularDataset(path='/content/trial.csv', format='csv', fields=fields1)

TEXT.build_vocab(train, vectors=GloVe(name="6B", dim=300), max_size=10000, min_freq=2)
dataset_iter = Iterator(train, batch_size=10, train=True, shuffle=True)
for batch in dataset_iter:
    batch.text[0]
    batch.term_index[0]


Python
run> 
ValueError                                Traceback (most recent call last)
<ipython-input-35-51f333c9c2b1> in <module>()
---->1 for batch in dataset_iter: 
2     batch.text[0]
3     batch.term_index[0]

ValueError: invalid literal for int() with base 10: 'complete opposite ergonomic design'



How can I deal with it?

What I have tried:

I have checked Field function several times; still cannot solve it.
Updated 20-Jul-21 4:04am
Comments
Linlin Zeng 21-Jul-21 5:03am    
TEXT.vocab.vectors.size() # torch.Size([4, 300])
I suppose something goes wrong when building the vocab.
'''
import spacy
import pandas as pd
from torchtext.legacy.data import Field, TabularDataset, BucketIterator, LabelField
from sklearn.model_selection import train_test_split
from torchtext.vocab import GloVe
'''
These are all the packages I used; nothing else is imported.

Getting your code to run does not mean it is right! :laugh:
Think of the development process as writing an email: compiling successfully means that you wrote the email in the right language - English, rather than German for example - not that the email contained the message you wanted to send.

So now you enter the second stage of development (in reality it's the fourth or fifth, but you'll come to the earlier stages later): Testing and Debugging.

Start by looking at what it does do, and how that differs from what you wanted. This is important, because it gives you information as to why it's doing it. For example, if a program is intended to let the user enter a number and it doubles it and prints the answer, then if the input / output was like this:
Input   Expected output    Actual output
  1            2                 1
  2            4                 4
  3            6                 9
  4            8                16
Then it's fairly obvious that the problem is with the bit which doubles it - it's not adding itself to itself, or multiplying it by 2, it's multiplying it by itself and returning the square of the input.
So with that, you can look at the code and it's obvious that it's somewhere here:
C#
int Double(int value)
   {
   return value * value;
   }
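
The same comparison can be sketched in Python, the language of the question. This is only an illustration: `double_buggy` and `compare` are made-up names, and `double_buggy` deliberately repeats the mistake from the C# method above so that the table of expected and actual values can be reproduced.

```python
def double_buggy(value):
    # Bug reproduced on purpose: multiplies the value by itself instead of by 2.
    return value * value


def compare(func, cases):
    """Return (input, expected, actual) for every case where the two differ."""
    return [(x, expected, func(x)) for x, expected in cases if func(x) != expected]


# Inputs and expected outputs taken from the table above.
cases = [(1, 2), (2, 4), (3, 6), (4, 8)]
mismatches = compare(double_buggy, cases)
for row in mismatches:
    print(row)
```

Only the input 2 passes, which is the hint that the function squares rather than doubles.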

Once you have an idea what might be going wrong, start using the debugger* to find out why. Put a breakpoint on the first line of the method, and run your app. When it reaches the breakpoint, the debugger will stop, and hand control over to you. You can now run your code line-by-line (called "single stepping") and look at (or even change) variable contents as necessary (heck, you can even change the code and try again if you need to).
Think about what each line in the code should do before you execute it, and compare that to what it actually did when you use the "Step over" button to execute each line in turn. Did it do what you expected? If so, move on to the next line.
If not, why not? How does it differ?
Hopefully, that should help you locate which part of that code has a problem, and what the problem is.
This is a skill, and it's one which is well worth developing as it helps you in the real world as well as in development. And like all skills, it only improves by use!

* pdb — The Python Debugger — Python 3.9.6 documentation[^]
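
For Python, a rough sketch of the single-stepping workflow described above might look like this. `breakpoint()` is the built-in hook (Python 3.7 and later) that enters pdb by default; the function name `double` and the sample value are made up for illustration, and the `breakpoint()` call is left commented out so the snippet runs unattended.

```python
def double(value):
    # Uncomment the next line to pause here and inspect `value` in pdb.
    # breakpoint()
    result = value * value  # deliberate bug: should be value * 2
    return result


# At the (Pdb) prompt, the usual commands are:
#   n          execute the current line and stop at the next one ("step over")
#   s          step into a function call
#   p value    print the value of an expression
#   c          continue running until the next breakpoint
print(double(3))
```

Stepping over the `result = ...` line and then typing `p result` is where the expected value and the actual value can be compared.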
 
Comments
Linlin Zeng 21-Jul-21 3:50am    
Oh, actually it would be more helpful if you could tell me how to modify it.
OriginalGriff 21-Jul-21 4:48am    
Helpful for whom?
Not for me - that code is ... um ... different.
Not for you - that just gets you something you can hand in without learning anything from the process.
Linlin Zeng 21-Jul-21 5:25am    
Okay, I found that running
'''
TEXT.vocab.vectors.size()
'''
returns torch.Size([4, 300]).
Since the size should be 479 * 300, maybe I built the vocab incorrectly?
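
One hedged way to test that suspicion without torchtext: a vocabulary with only 4 entries is what one would expect if the TEXT field ended up with nothing beyond its special tokens, and the `int()` error shows a sentence arriving in the column that should hold indices. Both symptoms would be consistent with the CSV columns not lining up with the `fields` list, for example because of a header row or an extra leading index column. The real `/content/trial.csv` is not shown, so that is an assumption; the sample data and the helper names `rows_as_fields` and `looks_numeric` below are made up for illustration.

```python
import csv
import io

# Made-up sample mimicking a CSV saved with a header row and an extra
# leading index column; the real file is not shown in the question.
SAMPLE = "\n".join([
    ',text,term_index',
    '0,liked aluminum body,"[0, 1, 1]"',
    '1,lightweight screen beautiful,"[0, 1, 0]"',
])


def rows_as_fields(csv_text, field_names):
    """Pair each row's cells with the field names purely by position."""
    reader = csv.reader(io.StringIO(csv_text))
    return [dict(zip(field_names, row)) for row in reader]


def looks_numeric(cell):
    """True if the cell is a bracketed list of integers such as "[0, 1, 1]"."""
    inner = cell.strip()
    if not (inner.startswith('[') and inner.endswith(']')):
        return False
    parts = [p.strip() for p in inner[1:-1].split(',')]
    return all(p.lstrip('-').isdigit() for p in parts)


rows = rows_as_fields(SAMPLE, ['text', 'term_index'])
# Rows whose "term_index" cell is not a list of integers are misaligned.
bad = [r for r in rows if not looks_numeric(r['term_index'])]
for r in bad:
    print(r)
```

If the real file shows the same pattern, the `skip_header` argument of `TabularDataset` and a `(name, None)` entry in the `fields` list for a column that should be ignored are the options worth checking in the torchtext documentation.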
Python
for batch in dataset_iter:
    batch.text[0]
    batch.term_index[0]

What are these statements supposed to do? They do not make much sense as Python. Are you using some third-party library which you have not mentioned?
 
Comments
Linlin Zeng 21-Jul-21 3:48am    
dataset_iter = Iterator(
    train, batch_size=10,
    train=True, shuffle=True)
Here dataset_iter iterates over train in batches of the given batch size; train is a tabular dataset in which Field is used to convert the words into tensors.
Richard MacCutchan 21-Jul-21 4:17am    
Well, that is fairly obvious, but it does not answer my question. What is the statement batch.text[0] supposed to do? And what library are you using? I could not find any definition of Field; is it a class or a method, and if so, where does it come from?
Linlin Zeng 21-Jul-21 5:23am    
Sorry for misunderstanding your reply; I am using the following packages:

'''import spacy
import pandas as pd
from torchtext.legacy.data import Field, TabularDataset, BucketIterator, LabelField
from sklearn.model_selection import train_test_split
from torchtext.vocab import GloVe'''

for batch in dataset_iter:
    batch.text[0]
    batch.term_index[0]
It is supposed to print:
torchtext.data.batch.Batch of size 10
.text torch.cuda.LongTensor of size 104*10
.term_index torch.cuda.LongTensor of size 104*10
This is just to show that the batch can be run successfully.

Richard MacCutchan 21-Jul-21 5:56am    
OK, well you need to check the documentation for the class that creates that iterator, to find out exactly what sort of object is returned as batch. As it stands text[0] and term_index[0] do not look like methods that will print anything.
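
A small sketch of the point about printing: inside a loop, a bare expression such as `batch.text[0]` is evaluated and then discarded, so nothing appears unless it is passed to `print()`. The `Batch` class below is a hypothetical stand-in built from plain lists so that the snippet runs without torchtext; it is not the library's class, and `describe` is a made-up helper.

```python
from collections import namedtuple

# Hypothetical stand-in for one batch; real torchtext batches hold tensors.
Batch = namedtuple('Batch', ['text', 'term_index'])


def describe(batch):
    """Build a short, printable summary of the first element of each field."""
    return f"text[0]={batch.text[0]!r} term_index[0]={batch.term_index[0]!r}"


batches = [Batch(text=[[2, 5, 3]], term_index=[[0, 1, 1]])]
for batch in batches:
    batch.text[0]           # evaluated, then thrown away: prints nothing
    print(describe(batch))  # explicit print makes the contents visible
```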

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


