Reimplement an algorithm to create a refine list

Time:02-04

I'm trying to reimplement an algorithm that creates a refined keywords list. I don't have the original source code, only the tool's .exe file, so all I have is the input and the expected output.

The problem is that the output of my function doesn't match the output of the original tool. Here's the code I'm using:

using System;
using System.Collections.Generic;
using System.IO;

string[] inputLines = File.ReadAllLines("Input.txt");
Dictionary<string, int> keywordsCount = new Dictionary<string, int>();
List<string> refineList = new List<string>();

//Get Keywords Count
foreach (string fileName in inputLines)
{
    string[] fileNameSplitted = fileName.Split('_');
    for (int i = 0; i < fileNameSplitted.Length; i++)
    {
        string currentKeyWord = fileNameSplitted[i];
        if (!string.Equals(currentKeyWord, "SFX", StringComparison.OrdinalIgnoreCase))
        {
            if (keywordsCount.ContainsKey(fileNameSplitted[i]))
            {
                keywordsCount[fileNameSplitted[i]] += 1;
            }
            else
            {
                keywordsCount.Add(fileNameSplitted[i], 1);
            }
        }
    }
}

//Get final keywords
foreach (KeyValuePair<string, int> keyword in keywordsCount)
{
    if (keyword.Value > 2 && keyword.Key.Length > 2)
    {
        refineList.Add(keyword.Key);
    }
}

The input file:

SFX_AMB_BIRDSONG
SFX_AMB_BIRDSONG_MISC
SFX_AMB_BIRDSONG_SEAGULL
SFX_AMB_BIRDSONG_SEAGULL_BUSY
SFX_AMB_BIRDSONG_VULTURE
SFX_AMB_CAVES_DRIP
SFX_AMB_CAVES_DRIP_AUTO
SFX_AMB_CAVES_LOOP
SFX_AMB_DESERT_CICADAS
SFX_AMB_EARTHQUAKE
SFX_AMB_EARTHQUAKE_SHORT
SFX_AMB_EARTHQUAKE_STREAMED
SFX_AMB_FIRE_BURNING
SFX_AMB_FIRE_CAMP_FIRE
SFX_AMB_FIRE_JET
SFX_AMB_FIRE_LAVA
SFX_AMB_FIRE_LAVA_DEEP
SFX_AMB_FIRE_LAVA_JET1
SFX_AMB_FIRE_LAVA_JET2
SFX_AMB_FIRE_LAVA_JET3
SFX_AMB_FIRE_LAVA_JET_STOP
SFX_AMB_UNDW_BUBBLE_RELEASE
SFX_AMB_UNDW_BUBBLE_RELEASE_AUTO
SFX_AMB_WATER_BEACH1
SFX_AMB_WATER_BEACH2
SFX_AMB_WATER_BEACH3
SFX_AMB_WATER_CANALS
SFX_AMB_WATER_FALL_HUGE
SFX_AMB_WATER_FALL_NORMAL
SFX_AMB_WATER_FALL_NORMAL2
SFX_AMB_WATER_FALL_NORMAL3
SFX_AMB_WATER_FOUNTAIN
SFX_CS_LUX_PORTAL_LIGHTNING
SFX_CS_LUX_PORTAL_LIGHTNING1
SFX_CS_LUX_PORTAL_LIGHTNING2
SFX_CS_LUX_PRIEST_COWER
SFX_CS_LUX_PRIEST_MEDAL
SFX_CS_LUX_PRIEST_MEDITATE
SFX_CS_LUX_PRIEST_SCREAM
SFX_CS_LUX_PRIEST_SNIFF1
SFX_CS_LUX_PRIEST_SNIFF2
SFX_CS_LUX_PRIEST_SPIRITS
SFX_CS_LUX_PRIEST_SPIRITS2
SFX_CS_LUX_PRIEST_SPIRITS3
SFX_CS_LUX_PRIEST_SURPRISE
SFX_MON_BM05_TOO_WALK1
SFX_MON_BM05_TOO_WALK2
SFX_MON_BM06_SQU_WALK1
SFX_MON_BM06_SQU_WALK2
SFX_MON_BR06_HAL_ATTACK1
SFX_MON_BR06_HAL_ATTACK2
SFX_MON_BR06_HAL_DIE
SFX_MON_BR06_HAL_HIT
SFX_MON_BR06_HAL_IDLE
SFX_MON_BR06_HAL_IDLE_EATING
SFX_MON_BR06_HAL_LAND1
SFX_MON_BR06_HAL_LAND2
SFX_MON_BR06_HAL_SCRAPE
SFX_MON_BR06_HAL_SLAM
SFX_MON_BR06_HAL_SURPRISE
SFX_MON_BR06_HAL_WALK1
SFX_MON_BR06_HAL_WALK2
SFX_MON_BU01_MUM_ATTACK1
SFX_MON_BU01_MUM_ATTACK2
SFX_MON_BU01_MUM_DIE
SFX_MON_BU01_MUM_HIT
SFX_MON_BU01_MUM_IDLE_RETRIEVE
SFX_MON_BU01_MUM_IDLE_RETRIEVE_GROW
SFX_MON_BU01_MUM_SURPRISE
SFX_MON_BU01_MUM_WALK1
SFX_MON_BU01_MUM_WALK2
SFX_WATER_SPLASH_BIG
SFX_WATER_SPLASH_BIG1
SFX_WATER_SPLASH_BIG2
SFX_WATER_SPLASH_BIG3
SFX_WATER_SPLASH_MED1
SFX_WATER_SPLASH_MED2
SFX_WATER_SPLASH_MED3
SFX_WATER_SPLASH_MEDIUM
SFX_WATER_SPLASH_OUT
SFX_WATER_SPLASH_OUT1
SFX_WATER_SPLASH_OUT2
SFX_WATER_SPLASH_SMALL

And the expected output (from the original tool):

AMB
MON
WATER
LUX
BR06
HAL
SPLASH
PRIEST
FIRE
BU01
MUM
LAVA
BIRDSONG
WALK1
WALK2
JET
IDLE
EARTHQUAKE
FALL
SURPRISE
BIG
CAVES

What should I change so that my method's output matches the original tool's output?

Thanks in advance!

CodePudding user response:

How about taking it as a block of text, splitting on line endings or underscores and getting the unique remnants:

File.ReadAllText(path)
  .Split(new[]{'\r','\n','_'},StringSplitOptions.RemoveEmptyEntries)
  .Distinct();

Hang on, maybe it's only words of three or more characters that appear three or more times:

File.ReadAllText(path)
  .Split(new[]{'\r','\n','_'},StringSplitOptions.RemoveEmptyEntries)
  .GroupBy(w => w)
  .Where(g => g.Key.Length > 2 && g.Count() > 2)
  .Select(g => g.Key);

If you have a fixed list of words to exclude, you can append e.g. .Except(new[]{ "SFX", "..." }) to the end of the chain.
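
A minimal sketch of the chain above with the exclusion step attached. The class and method names (RefineKeywords.Extract) are hypothetical, the thresholds are the ones from the question's code, and the case-insensitive comparer is an assumption borrowed from the question's "SFX" check. It is not confirmed to reproduce the original tool's list:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RefineKeywords
{
    // Hypothetical helper: keeps words longer than 2 characters that occur
    // more than twice, then drops every word found in the exclusion list.
    public static List<string> Extract(string text, IEnumerable<string> excluded)
    {
        return text
            .Split(new[] { '\r', '\n', '_' }, StringSplitOptions.RemoveEmptyEntries)
            .GroupBy(w => w)
            .Where(g => g.Key.Length > 2 && g.Count() > 2)
            .Select(g => g.Key)
            // Assumption: exclusions are matched case-insensitively.
            .Except(excluded, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

It could be called as RefineKeywords.Extract(File.ReadAllText(path), new[] { "SFX" }).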

CodePudding user response:

You can do it with plain LINQ: use GroupBy and convert the result to a dictionary. On that dictionary you can add further criteria, e.g. a minimum number of occurrences. This avoids the nested if-else conditions and keeps the code fairly readable:

using System.IO;
using System.Linq;

string[] inputLines = File.ReadAllLines("Input.txt");

var output = inputLines
    .SelectMany(s =>
        s.Split('_')
            .Where(w => w != "SFX")
        )
    .GroupBy(g => g)
    .ToDictionary(s => s.Key, s => s.Count())
    .Where(w => w.Key.Length > 2 && w.Value > 2);
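
Since the original tool's exact rule is unknown, it may help to diff any candidate result against the expected list rather than compare them by eye. The sketch below is only an illustration: RefineListDiff, Missing and Unexpected are hypothetical names, and the comparison ignores ordering:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RefineListDiff
{
    // Keywords the original tool emits but the candidate list lacks.
    public static List<string> Missing(IEnumerable<string> expected, IEnumerable<string> actual) =>
        expected.Except(actual).ToList();

    // Keywords the candidate list emits but the original tool does not.
    public static List<string> Unexpected(IEnumerable<string> expected, IEnumerable<string> actual) =>
        actual.Except(expected).ToList();
}
```

For example, RefineListDiff.Missing(expectedLines, output.Select(kv => kv.Key)) would list the keywords that the counting rule fails to produce, where expectedLines is assumed to hold the tool's output read from a file.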
