I'm not sure whether there is any free software, but I've heard of people using the vector space model and other information retrieval (IR) techniques for that purpose.
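For anyone curious what the vector space model looks like in practice, here is a minimal sketch in plain Python. It assumes TF-IDF weighting with a smoothed IDF and cosine similarity as the comparison; the function names (`tokenize`, `tfidf_vectors`, `cosine`) are made up for illustration and do not come from any particular plagiarism tool.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (a dict of term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed IDF (an assumption) so terms present in every document keep some weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (count / len(tokens)) * idf[t] for t, count in tf.items()})
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

You would vectorize your draft together with the candidate sources and look at which pairs score closest to 1.0. A high score only says the vocabulary overlaps heavily, so it is a hint for a closer look, not proof of copying.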
|
|
|
|
|
One I know of is Copyscape.
I learnt that some article, blog, and eBook writers (!) use it when outsourcing to ghost writers.
Thanks,
Milind
|
|
|
|
|
I could be way off base as I've never written a dissertation myself, however...
Don't you usually have an advisor for these things? Could you not simply ask what the university's process is for checking for plagiarism?
Honesty is the best policy (usually), and it seems to me that if you are upfront and honest about your concerns, they should be willing to help.
|
|
|
|
|
HobbyProggy wrote: I do triple-check whether I have quoted all statements from websites and books, but I fear I could read past something and forget it.
Reference as you go. I find it strange (and I know several people who do this) when people quote something and then, months later, have to go back and find the references. That is particularly hard to do when you're quoting something found only on paper, so you can't even do a search. What a royal waste of time.
I quote something, I hit the footnote button and reference it immediately.
Marc
|
|
|
|
|
|
You're doing it wrong.[^]
Quote: Plagiarize
Let no one else's work evade your eyes
Remember why the good Lord made your eyes
So don't shade your eyes
But plagiarize, plagiarize, plagiarize
Only be sure always to call it please "research"
Did you ever see history portrayed as an old man with a wise brow and pulseless heart, weighing all things in the balance of reason?
Is not rather the genius of history like an eternal, imploring maiden, full of fire, with a burning heart and flaming soul, humanly warm and humanly beautiful?
--Zachris Topelius
Training a telescope on one’s own belly button will only reveal lint. You like that? You go right on staring at it. I prefer looking at galaxies.
-- Sarah Hoyt
|
|
|
|
|
Reminds me of a quote from my grad school days...
"Stealing from one is called plagiarism.
Stealing from many is called research."
I have always wished for my computer to be as easy to use as my telephone; my wish has come true because I can no longer figure out how to use my telephone - Bjarne Stroustrup
The world is going to laugh at you anyway, might as well crack the 1st joke!
My code has no bugs, it runs exactly as it was written.
|
|
|
|
|
I don't understand. You're writing it, right? Then I'd think you would know whether anything was plagiarized without putting it through a scanner.
We can program with only 1's, but if all you've got are zeros, you've got nothing.
|
|
|
|
|
The issue (as I understand it) is that since your brain is a fantastic sponge (but not always the best at recalling all the details), the OP could have recalled the exact wording of something read somewhere, but not exactly where it was read. Thus, you end up with passages in your dissertation that match the wording of someone else's previous work closely enough to trigger a match in a plagiarism checker, even though the intention was not to plagiarize.
Running your work through various engines can help to reveal such slip-ups.
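As a rough illustration of how such a slip-up could be caught, the sketch below looks for runs of consecutive words that appear in both your draft and a source you remember reading. It assumes simple word n-gram ("shingle") matching with a window of 5 words; commercial checkers are likely more sophisticated, and the names `word_ngrams` and `shared_ngrams` are hypothetical.

```python
import re


def word_ngrams(text, n):
    # Set of all runs of n consecutive words, lowercased.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngrams(draft, source, n=5):
    # Word runs of length n that occur in both texts.
    return word_ngrams(draft, n) & word_ngrams(source, n)
```

Any non-empty result points at a passage worth rereading and, if needed, quoting and referencing properly.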
|
|
|
|
|
Why go through all this trouble when there's this[^]?
/ravi
|
|
|
|
|
Use the one the University uses for project submissions. They can't fault it, can they?
RA
|
|
|
|
|
|
Impressive!
The signature is in building process.. Please wait...
|
|
|
|
|
Really beautiful; to my eye it has a kind of "surreal" quality with a top-note of the "supernatural." Thanks, Bill
«A man will be imprisoned in a room with a door that's unlocked and opens inwards ... as long as it does not occur to him to pull rather than push» Wittgenstein
|
|
|
|
|
Greetings Folks,
I think this is an unusual situation and I am concerned about a few parts, but especially the machine and environment they want me to use.
As a later, hopefully final note, the bottleneck is then the memory swapping at the L1 and L2 cache memory of the CPU, including the memory "Bridges". So CPU count does not matter.
The story is that the company I work for uses an in-house application that is fairly CPU intensive, file IO intensive and database intensive. It currently runs on 11 machines. Note that the databases seem to keep up with all 11 machines.
OK, So they wanted me to write a multi-threaded version of the application that would run "on one machine".
Can do (can did), but I'm concerned about resource limitations on processing. It's a question of where the bottleneck is. The database IO seemed to hold up OK when 11 machines were running. I told them that ideally I'd like a 2 CPU Athlon machine (with 16 cores each) and a number of SSD drives. This is my reasoning.
With that many cores, I wouldn't run out of threads. (The code does limit how many jobs are started at any time.) We'll assume that it has 64 GB of RAM, so there shouldn't be a bottleneck there. Now it does a lot of file IO, but testing shows the delay is at the file read. Writes are so fast that I assume the OS (Windows, by the way) is buffering to RAM, so I don't see it writing to disk. That doesn't work with reads, so my stats show that 1.5 seconds is common for, say, 5 file reads.
So there are two questions. The first: if I have all these cores, RAM, SSD drives, and no database limit, where is the bottleneck? Is it the bus, and does anyone have any thoughts on it?
The second question... the one that matters. I don't get the machine I want. I get the machine they give me and, worse yet, they want it to run on VMware. I don't think they understand the problem, and I understand why: this is an unusual model. I am replacing the current physical machines with a "virtual machine" of my own, in that each thread I create processes a "job". In effect, I am making my own specialized version of VMware. They say they can keep giving me more cores if I need them, which is why I have detailed performance statistics going into a JSON message to a database. (By the way, I assume it will be Intel, so there will be fewer, but faster, cores. I'd rather have more, slower cores.)
My real question is (especially if my bottleneck is the bus): how much of a penalty do I pay for VMware? If it controls or channels data throughput on the bus, I could take a huge hit. I'd appreciate any thoughts on this, especially if you know how VMware is going to affect it.
Thanks, Mike
*** *** *** *** ***
OK then. For the people saying "profile": what do I want to profile? I'm fairly familiar with Perfmon, but I don't think it is going to help here. This is why... Please show me the error of my ways. Remember, I am recording DateTime.Now.Ticks before and after 9 processes (db and file IO) to a JSON message (which is sent in a UDP message for another server to write to the database), so I have some good information to start with.
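For readers who want to try the same before-and-after timing approach, here is a minimal sketch. It is in Python rather than the .NET the post implies, and everything in it is an assumption for illustration: the `JobTimings` class, the stage names, the JSON layout and the `send_udp` helper are hypothetical, not the poster's actual code.

```python
import json
import socket
import time
from contextlib import contextmanager


class JobTimings:
    """Collects elapsed nanoseconds per named stage for one job."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.stages = {}

    @contextmanager
    def stage(self, name):
        # Record a monotonic timestamp before and after the wrapped block.
        start = time.perf_counter_ns()
        try:
            yield
        finally:
            elapsed = time.perf_counter_ns() - start
            self.stages[name] = self.stages.get(name, 0) + elapsed

    def to_json(self):
        return json.dumps({"job_id": self.job_id, "stages_ns": self.stages})


def send_udp(message, host, port):
    # Fire-and-forget datagram; host and port are placeholders for the stats server.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (host, port))
```

Usage would look like `with timings.stage("file_read"): ...` around each operation, followed by `send_udp(timings.to_json(), host, port)` once the job finishes. A monotonic clock is used here because wall-clock time can jump when the system clock is adjusted.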
"There are five major resource areas that can cause bottlenecks and affect server performance: physical disk, memory, process, CPU, and network."
1. Physical disk - I will see physical disk problems if my diagnostics show any sudden changes in the read operation statistics.
2. Memory - Yeah, I have to check this one and I will scream at the hardware people if it happens, but they swear it won't.
3. Process - At startup I record the threads in use and the application throttles how many jobs are being processed at once, so I know how many threads are being used. It can be changed dynamically from the web page that monitors the application by writing to the database which the application periodically polls. Again, this is a good one to look at so I can wretch at the hardware people if I don't have enough cores.
4. CPU - Same issue.
5. Network - Same issue, though file read times over the network will be recorded, and file write times seem to be limited by RAM since they seem to be buffered... They are very fast.
Yes, Perfmon will be helpful, but the original question still stands. If these are not the bottlenecks because of multiple cores, mucho RAM and multiple SSD drives, what bottleneck do I hit, and when? Is it the bus? Will VMware make it worse? ... Thanks much.
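The job throttling mentioned in item 3 can be sketched roughly as follows. This assumes a fixed-size thread pool as the limiting mechanism and also records the peak number of jobs running at once; `run_throttled` is a hypothetical name, and the dynamic limit change via database polling described in the post is not modelled here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_throttled(jobs, max_concurrent):
    """Run callables with at most max_concurrent at a time.

    Returns (results in submission order, peak concurrency observed).
    """
    active = 0
    peak = 0
    lock = threading.Lock()

    def wrapped(job):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return job()
        finally:
            with lock:
                active -= 1

    # The pool size is what enforces the limit on simultaneous jobs.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(wrapped, jobs))
    return results, peak
```

Logging the observed peak next to the configured limit makes it easy to tell whether the throttle, rather than the hardware, is what caps throughput.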
modified 12-Jan-15 10:51am.
|
|
|
|
|
It's the only way you can answer a question about where your bottleneck is.
|
|
|
|
|
My instinct is to say you need to instrument the application and measure it to identify where exactly the bottleneck is - anything else is really speculation. You could get a rough idea as to where the hold-ups are by watching perfmon but that will not tell you the underlying cause.
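Once the application is instrumented, a first pass at locating the hold-up can be as simple as summing the recorded time per stage and ranking the stages by share of the total. The sketch below assumes each record is a mapping of stage name to elapsed time in one consistent unit; `rank_stages` and the stage names are made up for illustration.

```python
from collections import defaultdict


def rank_stages(records):
    """Rank stages by total elapsed time.

    Returns a list of (stage, total, share_of_total), largest first.
    """
    totals = defaultdict(int)
    for record in records:
        for stage, elapsed in record.items():
            totals[stage] += elapsed
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(stage, total, total / grand_total) for stage, total in ranked]
```

The stage at the top tells you where the time goes, not why, which is where Perfmon counters or a profiler come in.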
|
|
|
|
|
Sounds like the file read is the real bottleneck, or is it just "the visible part" of your slow database? Are you really "only" reading, or is some processing done after reading, such as database work: fetching, indexing, or temp tables?
Multi-threading and multiple CPUs only make sense if they solve the problem. But they don't, so I think a bunch of RAM is good enough.
My guess is that the problem is in the database (design?) or on the DB server. The reads are so slow.
Press F1 for help or google it.
Greetings from Germany
|
|
|
|
|
Okay, whenever I hear someone say they've made a multi-threaded version of anything that wasn't multi-threaded before, alarm bells start to go off in my head. Multi-threading is hard to get right, and reading that post made me think that you have a lot of contention going on in there. You mentioned that it's taking a long time to read files but not to write to disk - that suggests to me that you have more than one process trying to get access to the same file at the same time. Looking at that would be the first thing I would prioritise if I were profiling the application.
|
|
|
|
|
I find this a very interesting scenario, and I would like to have a better sense of what's involved ... what the trade-offs are ... in using single-box lotsa-cores, lotsa-boxes few cores, lotsa RAM, server-client configurations (SSD where? spinning platters where?), network structure, server software, db choice and configuration, optimal storage strategies, etc.
I would very much enjoy reading an article here on how you went about profiling and how your implementation strategy was planned, how it evolved as you profiled and experimented, etc.
thanks, Bill
|
|
|
|
|
FYI Bob, the bottleneck is then the memory swapping at the L1 and L2 cache memory of the CPU, including the memory "Bridges". So CPU count does not matter.