18 Replies Latest reply: May 15, 2014 10:06 AM by Grant Perkins RSS

    Removing "artifacts"

    Chickenman _

      We are exporting some Btrieve customer data to Access and then bringing the Access table into Monarch for cleanup, analysis etc.

       

      In the Access table, the Street Address field (column) has a '?' in a box in the first position of most, but not all, street address fields. When brought into Monarch, this is carried forth as a box (square).

       

      I am attempting to use Strip function but don't know how to define this character. :mad:

       

      Is there a way to clean up either in Access or Monarch?

       

      When stepping through the wizard to import into Monarch, the Sample Data looks like:

       

      \vHIGHWAY 57 should be HIGHWAY 57

      \x162001 Main St should be 2001 Main St

      \x0E39 Meadoow Lane should be 39 Meadow Lane

      etc

       

      all those preceded by '\n' come in fine :confused:

       

      CM

        • Removing "artifacts"
          Olly Bond

          Hello Chickenman,

           

          If you weren't working in Access it would be easy - a combination of the Options Input for treating non ASCII characters, ignoring print control characters etc, and if necessary, a run through MSRP or Utility to clean it up before taking the data into Monarch. Can you get the data from Btrieve into a format like CSV instead?

           

          Best wishes,

           

          Olly

            • Removing "artifacts"
              Chickenman _

              Hello Olly,

               

              Co-incidentally I am looking at export options now to see if that will be a possibility.

               

              Access was chosen as the entire database is being exported to one file with automatic generation of separate tables for each of the database tables similar to the V10 option to export summaries. Slick, but as necessary I can forego that convenience.

               

              Thanx for the quick reply.

               

              CM

                • Removing "artifacts"
                  Olly Bond

                  Hello Chickenman,

                   

                  I'm not familiar enough with Access to know the options, but I would have thought that it would be possible to treat the characters within Access before importing to Monarch as well. Perhaps I'm biased, but if I've ever got to handle data I'll try to use Monarch rather than mucking about in Microsoft Office...

                   

                  If it's just one field causing problems, perhaps a calculated field could be defined in the Access table that strips out the first n characters?

                   

                  Good luck!

                   

                  Olly

                    • Removing "artifacts"
                      Chickenman _

                      Olly,

                       

                      I have very limited Access experience so you may very well be correct.

                       

                      A quick web search revealed a number of low cost utilities that claim to strip the junk out of Access, but this is pretty much a one-shot deal so I hesitate to obtain yet another piece of software, go down the learning cure (albeit short), etc

                       

                      CM

                        • Removing "artifacts"
                          Olly Bond

                          Hello CM,

                           

                          If all the bad data starts with "\xyz123 " then a calculated field in Monarch to remove this would do a lot of the work for you.

                           

                          Something like:

                           

                          if(left(field;1)="\";lsplit(field;2;" ";2);field)

                           

                          might help?

                           

                          Best wishes,

                           

                          Olly

                            • Removing "artifacts"
                              Chickenman _

                              Olly,

                               

                              The beginning characters are all over the lot so that doesn't seem possible.

                               

                              Grant,

                               

                              Yes, it is running over Pervasive SQL. Didn't see anything in the forum that looked useful. I tried exporting from Monarch to Excel and using a variant of parsing carriage returns from address in Excel to no avail.

                               

                              I'm an operations guy and need quick easy solutions as these are just tools. If I were a DBA, this might be the kind of challenge that would break up a dull day...:rolleyes:

                               

                              CM

                                • Removing "artifacts"
                                  Grant Perkins

                                  Olly,

                                   

                                  The beginning characters are all over the lot so that doesn't seem possible.

                                   

                                  Grant,

                                   

                                  Yes, it is running over Pervasive SQL. Didn't see anything in the forum that looked useful. I tried exporting from Monarch to Excel and using a variant of parsing carriage returns from address in Excel to no avail.[/quote]

                                   

                                  Pretty sure they are 'print control' characters but once passed through the Access part they become ascii. No time ot check this at the moemnt but maybe you could export from Access and run the file through Monarch Utility in one form or another and them re-import to access?

                                   

                                  How are you (or whoever) getting the the data from Btrieve to Access? SQL? Is there a setting in there somewhere to deal with the line start characters? Or maybe a wrong ddf (data dafinition file)?

                                   

                                   

                                   

                                  I'm an operations guy and need quick easy solutions as these are just tools. If I were a DBA, this might be the kind of challenge that would break up a dull day...:rolleyes:

                                   

                                  CM[/quote]

                                   

                                  I totally agree with that! No friendly DBA available to you today?

                                   

                                  Grant

                                    • Removing "artifacts"
                                      Chickenman _

                                      It's sort of an on-the-fly DDF and that's the rub - it can pull the fields but cannot determine format and translation for all of them.

                                       

                                      I now have the export as csv and will fiddle with the utility as this is now a burr under my saddle.

                                       

                                      At the end of the day it would be no big trick to just have someone brute force hit all the cells and delete the offending character - just over 2000 rows would occupy someone for a couple hours. But we like the "sufficient level of technology indistinguishable from magic" approach over here

                        • Removing "artifacts"
                          Grant Perkins

                          CM,

                           

                          As I understand it the ? in Access is the equivalent of the 'box' in 'text' files and basically indicates some sort of generic spurious character that the application has no information about in order to help it decide what to do.

                           

                          In the sample you posted the characters are inconsistent and so may take some considerable work to successfully and reliably remove them on a line by line basis. Therefore I would be tempted to go back to fixing in the Access import.

                           

                          It looks like an interpretation/compatibility issue to me leaving an incomplete or partial conversion into Access. It's a while since I was involved in anything like that so I doubt I can offer anything useful directly. However the Btrieve software is a product of Pervasive Software these days and there is a support forum at cs.pervasive.com that might contain the information that solves your issue. Or a pointer to it. There are so many potential variables in the components that I think only someone totally familiar with the conversion implementation and the tool versions in use will be able to dig through the potential solutions.

                           

                          This Wikipedia link may be of interest.

                           

                          http://en.wikipedia.org/wiki/Newline[/URL]

                           

                          in particular the "In programming languages" section which will likely explain the "\n" puzzle.

                           

                          I think of the input was a print file rather than an Access table you could robably uise the 'Ignore Unused Print Control characters' option to strip the line starts but that's not really something one can do with a database input.

                           

                          HTH.

                           

                           

                          Grant

                           

                          Edit:  Looks like Olly had similar thoughts while I was creating this entry!

                          • Removing "artifacts"
                            Chickenman _

                            I appreciate the help offered by Olly and Grant.

                             

                            While it would have been nice to simply work through Monarch, in the end an Excel was obtained from the export and the offending character in column D was got rid of by =RIGHT(D2,LEN(D2)-1) so we can go on with life

                             

                            CM

                              • Removing "artifacts"
                                Grant Perkins

                                I appreciate the help offered by Olly and Grant.

                                 

                                While it would have been nice to simply work through Monarch, in the end an Excel was obtained from the export and the offending character in column D was got rid of by =RIGHT(D2,LEN(D2)-1) so we can go on with life

                                 

                                CM[/quote]

                                 

                                So is this a one-off exercise (assume it is based on the previous post to the above)? In which case a pragmatic resolution trumps any other.

                                 

                                 

                                Grant

                                  • Removing "artifacts"
                                    Chickenman _

                                    Yes, as much as we like to play around and learn new things we chose the route that got us off the dime as this is a tiny part of a much larger project.

                                     

                                    CM

                                      • Removing "artifacts"
                                        Data Kruncher

                                        I'm a little late to this party, but it might be useful to know that you can determine precisely what a mystery character is using the Asc() function.

                                         

                                        If the problem child is the first character in a string, then

                                        Asc(Left(MyField,1))[/CODE]

                                         

                                        will return the numeric ASCII code of the character. The first "normal" code is the space character (ASCII 32), and the last is the tilde ~ character (ASCII 126).

                                         

                                        Additionally, under the Options -> Input menu, you can elect to "ignore unused print control characters (0-31)", where 0 to 31 refers to ASCII values.

                                         

                                        HTH,

                                        Kruncher