Without your code this is a bit of a shot in the dark, but we have used something similar to the example below.
1) A lot of tips on reducing memory consumption can be found at Reducing memory consumption in Java
2) A great article from Oracle with tips on tuning your code to be more efficient: Tuning For a Small Memory Footprint
3) On using the right garbage collector: Minimize Java Memory Usage with the Right Garbage Collector
4) A tool to test and rectify your Java performance: Rapidly Optimize Java Performance
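On point 3, the collector is chosen with JVM flags when the application is launched. As a rough sketch (YourApp and the heap sizes are placeholders, not values from your setup):

```shell
# Serial GC has the smallest footprint; suits small heaps and single-CPU machines.
java -XX:+UseSerialGC -Xms64m -Xmx256m YourApp

# G1 is the default on modern JVMs; its pause-time goal is tunable.
java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 YourApp
```

Which collector wins depends on your heap size and latency requirements, so measure before settling on one.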
For the issue you describe, I would do the following:
a) Read the file line by line.
b) Split each line into columns.
c) Extract the value of the first column to use for the duplicate check.
d) Check whether that value already exists in a HashSet.
e) If the value is already in the HashSet, perform your action (deleting, logging, etc.).
f) If it is not, add it to the HashSet.
g) Repeat steps b-f for each line in the file.
h) For garbage collection, see below.
This approach keeps only the unique first-column values in memory, which keeps RAM usage low.
If you want to request garbage collection explicitly, you can call System.gc(); note that this is only a hint, and the JVM is free to ignore it.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;

public class CheckForDuplicatesInFirstColumn {
    public static void main(String[] args) {
        String filename = "your_file_to_read_from.csv";
        HashSet<String> uniqueValues = new HashSet<>();
        try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Split the line into columns and take the first as the key.
                String[] columns = line.split(",");
                String value = columns[0];
                if (uniqueValues.contains(value)) {
                    // Duplicate: perform your action here.
                    System.out.println("Duplicate found: " + line);
                } else {
                    uniqueValues.add(value);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Hint the JVM to run garbage collection (not guaranteed).
        System.gc();
    }
}
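As a usage note, HashSet.add already reports whether the element was present, so the contains/add pair in steps d-f can be collapsed into a single call, halving the hash lookups. A minimal sketch of that variant (DuplicateCheckSketch and findDuplicateLines are illustrative names, not from your code):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class DuplicateCheckSketch {
    // Returns the lines whose first column has been seen before.
    static List<String> findDuplicateLines(List<String> lines) {
        HashSet<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String line : lines) {
            String key = line.split(",")[0];
            // add() returns false when the key is already in the set,
            // so one call covers both the lookup and the insert.
            if (!seen.add(key)) {
                duplicates.add(line);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("id1,a", "id2,b", "id1,c");
        System.out.println(findDuplicateLines(rows)); // prints [id1,c]
    }
}
```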