Integrating and Blending Data – inModeler
How to Format Source Data Files for Loading
Text-based data files (CSV, TSV, pipe-delimited) can be formatted in multiple ways. To ensure a seamless experience loading your text-based data file into Inzata, we recommend following the guidelines below:
- Text files delimited by one of the following characters are supported:
- Comma
- Semicolon
- Tab
- Tilde
- Pipe
- A text file represents one table (referred to as a cluster in an Inzata Project) with the following column types:
- attributes (identifiers, codes or descriptions for categorical variables). The values of an attribute column can be any alphanumeric text.
- labels (alternative display forms of an attribute). A label is part of an attribute object, and an attribute can have several of them, e.g. the “Person” attribute has the “Last Name”, “SSN” and “Email Address” labels.
- facts (quantitative measures defined as numbers). Use numeric values only.
- The first row of the file contains the column names, separated by the delimiter. These names are pre-selected as attribute and fact identifiers (names) during the process described in the “Creating a New Load” article. We recommend using a file header with the business names of attributes and facts.
- One column is a primary key whose values are unique within the text file, or across all the files loaded incrementally into one cluster.
- As a first step in data integration, we recommend using a set of smaller files to create a logical data model (LDM) in inModeler. A file with 100 to 1,000 records is sufficient for this. To then load large data volumes into the enterprise LDM created this way, use the IM ETLp system, which is designed for such data processing. This option also supports automated, scheduled data loading on a regular basis.
- The NULL value is represented as an empty string (without quotes) in a text file.
- String values in a data file can be enclosed in double quotes or apostrophes, which mark the beginning and end of a string.
- The double quote character cannot be used in data except as the string delimiter described above.
- The special characters [%’/#@$<>*] are allowed in data, but not in the first row as part of a column header.
- The following date and time formats are available:
- Date:
YYYYMMDD
DD/MM/YYYY
D/M/YYYY
MM/DD/YYYY
M/D/YYYY
MM-DD-YYYY
DD-MM-YYYY
YYYY-MM-DD
YYYY-DD-MM
YYYY/DD/MM
- Date Time:
Any of the date formats above, combined with a time in the format HH:MM
- Time:
HH:MM
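
As an illustration, several of the guidelines above can be checked before a file is loaded. The following is a minimal Python sketch that uses only the standard library and is not part of Inzata. The names `check_file` and `matches_date_format` are hypothetical, and the script assumes that the apostrophe in the special-character list covers both the straight and the curly form. It checks header names, primary key uniqueness, numeric facts and date values, and treats an empty string as NULL. Date Time and Time columns are not checked.

```python
import csv
import io
from datetime import datetime

# Characters the guidelines disallow in column headers.
# Assumption: both the straight and the curly apostrophe are meant.
FORBIDDEN_HEADER_CHARS = set("%'’/#@$<>*")

# strptime patterns for the date formats listed above.
# %d and %m also accept single-digit values, so D/M/YYYY and M/D/YYYY
# are covered by the DD/MM/YYYY and MM/DD/YYYY patterns.
DATE_PATTERNS = [
    "%Y%m%d", "%d/%m/%Y", "%m/%d/%Y", "%m-%d-%Y", "%d-%m-%Y",
    "%Y-%m-%d", "%Y-%d-%m", "%Y/%d/%m",
]


def matches_date_format(value):
    """Return True if value parses under at least one listed date format."""
    for pattern in DATE_PATTERNS:
        try:
            datetime.strptime(value, pattern)
            return True
        except ValueError:
            continue
    return False


def check_file(text, delimiter, key_column, fact_columns=(), date_columns=()):
    """Return a list of human-readable problems found in the file text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = reader.fieldnames or []

    for name in header:
        if FORBIDDEN_HEADER_CHARS & set(name):
            problems.append(f"header '{name}' contains a special character")

    if key_column not in header:
        problems.append(f"primary key column '{key_column}' not in header")
        return problems

    seen_keys = set()
    for row in reader:
        line = reader.line_num
        key = row.get(key_column) or ""
        if key in seen_keys:
            problems.append(f"line {line}: duplicate primary key '{key}'")
        seen_keys.add(key)

        for column in fact_columns:
            value = row.get(column) or ""
            if value == "":  # empty string represents NULL
                continue
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line}: fact '{column}' is not numeric")

        for column in date_columns:
            value = row.get(column) or ""
            if value != "" and not matches_date_format(value):
                problems.append(f"line {line}: '{column}' is not a listed date format")

    return problems


sample = (
    "id|name|amount|created\n"
    "1|Ann|10.5|2024-01-31\n"
    "1|Bob|abc|31/31/2024\n"
)
for problem in check_file(sample, "|", "id", ["amount"], ["created"]):
    print(problem)
```

Because several of the listed formats overlap (for example, 03/04/2024 is valid as both DD/MM/YYYY and MM/DD/YYYY), a check like this can only confirm that a value fits some supported format, not which one was intended.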