Click here to Skip to main content
15,884,298 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
Using early stopping my model stops training at around 7 epochs because of overfitting
MAX_SEQUENCE_LENGTH = 1000
MAX_NUM_WORDS = 20000
EMBEDDING_DIM = 100
VALIDATION_SPLIT = 0.2
output_nodes = 759

embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(embedding_matrix),
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)

print('Training model.')
output_nodes = y_train.shape[1]
# train a 1D convnet with global maxpooling
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
x = Conv1D(128, 5, activation='relu')(embedded_sequences)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = GlobalMaxPooling1D()(x)
x = Dense(128, activation='relu')(x)
preds = Dense(output_nodes, activation='softmax')(x)

model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])


What I have tried:

I tried increasing the input nodes, reducing the batch size and using the k-folds method to improve performance. However, can't get the accuracy to cross 50%. Any thoughts on how can I achieve higher accuracy? I am trying to predict authors based on text. My data has 98k rows
Posted
Comments
[no name] 24-Mar-20 12:44pm    
Maybe they just ripped each other off and there is nothing unique to distinguish one "author" from the other. You're assuming everyone actually has their own "unique" writing style. I think you'll have better luck matching "gendres" than authors.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900