Click here to Skip to main content
15,867,308 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
so i have these (as structured data)

test -> expr
expr -> ambig1 ambig2 ambig3
expr -> ambig1 ambig2 ambig4
expr -> ambig1 ambig2
expr2 -> ambig2 ambig3
expr2 -> ambig2

what I need is this

test -> expr

expr -> ambig1 ambig2 ambig3
expr -> ambig1 ambig2 ambig4
expr -> ambig1 ambig2

expr2 -> ambig2 ambig3
expr2 -> ambig2


where each section is in it's own container (array/dictionary whatever)

i have ideas but they're tough to implement and I'm not sure they will work

I already have a GetCommonPrefix method

I'll accept answers in any code, any language - but no LINQ

These data structures can be anything you like. Arrays, strings, whatever. The general algorithm will be the same, even if the particulars are different.

What I have tried:

I've tried several things. None worked. Pasting the code here would take too much space.
Posted
Updated 10-May-19 4:17am
v2
Comments
CHill60 10-May-19 8:48am    
What do you mean by "structured data"? Are these strings or objects or lists or what?
There's not enough here to even get started on helping you
honey the codewitch 10-May-19 8:49am    
you can consider them strings for the purpose of the question.

but they are arrays of System.Object - where the object is usually a string. Consider the objects to be comparable with equals

Consider the Left of a -> to be the first element, if that helps

The problem is independent of whether they are arrays or strings, the algorithm will be the same either way.

Paste and run code here:
https://www.onlinegdb.com/online_python_interpreter[^]

I would not call it exactly hard.
Python
lines = [   "test -> expr",
            "expr -> ambig1 ambig2 ambig3",
            "expr -> ambig1 ambig2 ambig4",
            "expr -> ambig1 ambig2",
            "expr2 -> ambig2 ambig3", 
            "expr2 -> ambig2"]

def get_matcher(s):
    return s[:s.find("->")+2]
    
def group_data(lines):
    groups = []
    new_group = []
    matcher = get_matcher(lines[0])
    for line in lines:
        if line.startswith(matcher):
            new_group.append(line)
        else:
            groups.append(new_group)
            new_group = [line]
            matcher = get_matcher(line)
            
    groups.append(new_group)
    return groups    
    
print(group_data(lines))
 
Share this answer
 
Comments
megaadam 10-May-19 10:30am    
The above assumes that data is "ordered" like the data you provided.

If it is unordered eg:
lines = [
"test -> expr",
"expr -> ambig1 ambig2 ambig3",
"test -> ambig1 ambig2 ambig4",

Then it is better to make a dictionary [Python] or map [C++] where the keys are "matcher" and values are "lines". So instead of making a list of lists it will be a dictionary of lists
honey the codewitch 10-May-19 10:34am    
yeah I'm using a dictionary in my code, but thanks for the heads up anyway. I'm basically adapting what you solved to see if it works. It looks too simple. I'm almost positive the solution will be recursive or at least use nested loops, but maybe I'm wrong. If you're interested, this is part of a larger problem called left factoring LL grammars

https://www.tutorialspoint.com/compiler_design/left_factoring.asp

megaadam 10-May-19 11:03am    
Well i would think that IF your data is recursive (i.e. if the "ambig" items contain new lists of "L -> R" expressions) then the code might need to be recursive, or at at least the parser needs to build a recursive data structure.
honey the codewitch 10-May-19 11:39am    
I don't think I'm explaining myself well. I mean recursive in the same way a sort() method might be. recursive comparisons, not comparisons over a recursive data structure, if that makes sense.

adding, it's possible to make those things recursive.

expr -> term expr

honey the codewitch 10-May-19 11:51am    
Aaand I solved it. Sure enough, an inner loop was required (could have been done recursively)

It's kind of involved. Your solution didn't quite do it, but it almost worked. I am using a variation of the concept in it anyway
I tend to implement this like

Create dictionary collection to store results with string as key, list of string as value
For each string
    Get prefix
    If dictionary contains prefix as a key
        Retrieve value from dictionary and add result to it
    Else
        Add new item to dictionary with prefix as key and result in value collection
Loop
 
Share this answer
 
Comments
honey the codewitch 10-May-19 9:08am    
but how do I know what the prefix is without other items to compare it to?

like in

expr -> ambig2 ambig3

is the prefix expr->ambig2 or expr-> ambig2 ambig3

the trouble is I only know the prefix when compared to other items.
F-ES Sitecore 10-May-19 9:21am    
In that case it might be better to simply store the data in a tree structure with each item a node. Similar to the above algorithm you'll split the item into its parts, then for each part look for a node in the tree and if it doesn't exist add it, then move down a level for each item. So you'll end up with something like

test
-- expr
expr
-- ambig1
---- ambig2
------ ambig3
------ ambig4
etc

you can store the original data as the value of each node and at the end of the process you'll have a representation of the data nested by prefix.
honey the codewitch 10-May-19 9:28am    
you might have something there but i have to wrap my head around it. MOAR coffee. =) Thanks.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900