Tokenizer Module
Source: /ellie_engine/tokenizer
Creating Tokenizer
Example Implementation
Creating ProgramRepository
To create a tokenizer, you first need a struct that implements the ProgramRepository trait. ProgramRepository is the file-reading bridge between the tokenizer and user-defined code.
#[derive(Clone)]
struct Repository {
    used_modules: Vec<String>,
    main_hash: usize,
    module_name: String,
    target_path: String,
}
What is ProgramRepository::read_main?
Signature of ProgramRepository::read_main:
impl ProgramRepository for Repository {
    // `&mut self` because the implementation below updates `self.main_hash`.
    fn read_main(&mut self) -> MainProgram {
        //...
    }
    //...
}
What is ellie_engine::utils::MainProgram, and what does it contain?
Every file must have a unique file_hash, its contents in file_content, and location information in file_name and start_directory. file_hash must be unique, but it doesn't need to be a hash of the file content; any unique value works.
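To make the four fields concrete, here is a minimal sketch of a local mirror of MainProgram. The field types are inferred from how the fields are used on this page rather than copied from the crate, and main_program_with_id is a hypothetical helper that shows file_hash being any unique value instead of a content hash.

```rust
/// Local mirror of the fields this page attributes to `MainProgram`.
/// Assumption: field types are inferred from the examples on this page.
#[derive(Debug, Clone, PartialEq)]
pub struct MainProgram {
    pub file_content: String,
    pub file_name: String,
    pub file_hash: usize,
    pub start_directory: String,
}

/// Hypothetical helper: builds a `MainProgram` whose hash is a caller-chosen unique id.
pub fn main_program_with_id(content: &str, name: &str, id: usize, module: &str) -> MainProgram {
    MainProgram {
        file_content: content.to_string(),
        file_name: name.to_string(),
        // Any unique value works here; it does not have to be a content hash.
        file_hash: id,
        start_directory: format!("<ellie_module_{}>", module),
    }
}
```

Calling main_program_with_id with the module name "main" yields the start directory <ellie_module_main>, matching the paths used later on this page.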
Implementing ProgramRepository::read_main
In this example we use a hash of the file's content as file_hash, but that is not required.
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};

impl ProgramRepository for Repository {
    // `&mut self` is needed because `main_hash` is updated below.
    fn read_main(&mut self) -> MainProgram {
        let main_file_content = fs::read_to_string(&self.target_path).unwrap();
        // The file name is the last component of the target path.
        let file_name = self.target_path.split("/").collect::<Vec<&str>>();
        let file_name = file_name[file_name.len() - 1];
        // Hash the content to get a unique id for the main file.
        let mut main_file_hasher = DefaultHasher::new();
        main_file_content.hash(&mut main_file_hasher);
        let first_page_hash = main_file_hasher.finish();
        self.main_hash = first_page_hash as usize;
        MainProgram {
            file_content: main_file_content,
            file_name: file_name.to_string(),
            file_hash: first_page_hash as usize,
            start_directory: format!("<ellie_module_{}>", self.module_name),
        }
    }
    //...
}
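The hashing and file-name steps in read_main can also be factored into small helpers. The sketch below is one possible way to do that with the standard library only; content_hash and file_name_of are hypothetical names, not part of ellie_engine.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Hashes file content into a `usize`, as the example above does.
pub fn content_hash(content: &str) -> usize {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish() as usize
}

/// Returns the last path component, or the whole input if there is none.
pub fn file_name_of(path: &str) -> String {
    Path::new(path)
        .file_name()
        .and_then(|name| name.to_str())
        .unwrap_or(path)
        .to_string()
}
```

Using Path::file_name instead of splitting on "/" lets the standard library handle the platform's path separator.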
The ProgramRepository::read_main function is now implemented. Next, let's implement ProgramRepository::read_module.
What is ProgramRepository::read_module?
Signature of ProgramRepository::read_module:
impl ProgramRepository for Repository {
    fn read_module(
        &mut self,
        link_module: bool,
        current_path: String,
        requested_path: String,
    ) -> ResolvedImport {
        //...
    }
    //...
}
So what is ResolvedImport? ResolvedImport is the message we send back to the tokenizer.
#[derive(Default, Debug)]
pub struct ResolvedImport {
    pub found: bool,
    pub resolve_error: String,
    pub hash: usize,
    pub path: String,
    pub matched: ImportType,
}
- ResolvedImport::found is a boolean that tells the tokenizer whether the file was found. If the file was not found, the tokenizer throws an error. The error message depends on ResolvedImport::resolve_error: if it is provided, that custom message is displayed; otherwise a default error message is displayed.
- ResolvedImport::hash is a hash of the file. It can be anything that is unique.
- ResolvedImport::path is the path of the file. It can be anything that is unique.
- ResolvedImport::matched is where things get more complicated.

    pub enum ImportType {
        Code(String),
        Module(Module),
    }

  This enum has two variants, Code and Module. The Code variant is for importing code. The Module variant is for importing pre-compiled modules; we will talk about this later.
- What are the parameters of ProgramRepository::read_module?
  - link_module is a boolean that tells the tokenizer whether the file is a module. Think of it as importing a module from a package manager. An import usually resolves from a path, which is why the link_module parameter is needed.
  - current_path is the path of the file that is importing the module.
  - requested_path is the path of the module that is being imported.
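The fields above can be illustrated with a small self-contained sketch. The types below are local mirrors so the snippet compiles without ellie_engine. Module is an empty placeholder because its real fields are not shown on this page, the manual Default for ImportType is an assumption made so that ..Default::default() works, and found_code and not_found are hypothetical helpers.

```rust
/// Placeholder for the pre-compiled module type (real fields not shown on this page).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Module;

#[derive(Debug, Clone, PartialEq)]
pub enum ImportType {
    Code(String),
    Module(Module),
}

// Assumption: an empty `Code` serves as the default value.
impl Default for ImportType {
    fn default() -> Self {
        ImportType::Code(String::new())
    }
}

#[derive(Default, Debug)]
pub struct ResolvedImport {
    pub found: bool,
    pub resolve_error: String,
    pub hash: usize,
    pub path: String,
    pub matched: ImportType,
}

/// Hypothetical helper: a successful import that carries source text.
pub fn found_code(path: &str, hash: usize, source: &str) -> ResolvedImport {
    ResolvedImport {
        found: true,
        hash,
        path: path.to_string(),
        matched: ImportType::Code(source.to_string()),
        ..Default::default()
    }
}

/// Hypothetical helper: a failed import with a custom error message.
pub fn not_found(reason: &str) -> ResolvedImport {
    ResolvedImport {
        found: false,
        resolve_error: reason.to_string(),
        ..Default::default()
    }
}
```

Leaving resolve_error empty in the success case and filling only found and resolve_error in the failure case follows the field descriptions in the list above.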
Implementing ProgramRepository::read_module
Providing Code ImportType::Code
In this example we will provide ImportType::Code (text) to the tokenizer. We will read the file and return it as ImportType::Code.
First, let's look at what the parameters might be:
Link module: false
Current path: <ellie_module_main>/main.ei
Requested path: ./deb.ei
What is this <ellie_module_main>?
Ellie has no built-in directory manager, for maximum flexibility, so we need to provide one. In this example we use <ellie_module_main> as our main directory. We already know the path of our main.ei file, so it is not hard to work out where ./deb.ei is.
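To show how a relative request such as ./deb.ei can be resolved against a virtual path like <ellie_module_main>/main.ei, here is a simplified stand-in. It is not the engine's own path resolver and may differ from it in detail; resolve_virtual is a hypothetical name, and the error text is borrowed from the example later on this page.

```rust
/// Resolves `requested` relative to the directory of `current` (both virtual paths).
/// Returns an error if the result would leave the module root.
pub fn resolve_virtual(current: &str, requested: &str) -> Result<String, String> {
    // Directory parts of the importing file: drop the file name.
    let mut parts: Vec<&str> = current.split('/').collect();
    parts.pop();
    for segment in requested.split('/') {
        match segment {
            // Empty and `.` segments do not change the directory.
            "" | "." => {}
            ".." => {
                // `parts[0]` is the module root, e.g. `<ellie_module_main>`.
                if parts.len() <= 1 {
                    return Err("Cannot access outside of workspace".to_string());
                }
                parts.pop();
            }
            name => parts.push(name),
        }
    }
    Ok(parts.join("/"))
}
```

With the sample parameters above, the current path <ellie_module_main>/main.ei and the requested path ./deb.ei resolve to <ellie_module_main>/deb.ei, while a request such as ../secret.ei is rejected.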
impl ProgramRepository for Repository {
    fn read_module(
        &mut self,
        link_module: bool,
        current_path: String,
        requested_path: String,
    ) -> ResolvedImport {
        let starter_name = format!("<ellie_module_{}>", self.module_name);
        match parse_module_import(&current_path, &requested_path) {
            Ok(path) => {
                // ...
            }
            Err(err) => {
                // ...
            }
        }
    }
}
What happened here? Remember the module_name field from the Repository struct: it is used to build starter_name. The parse_module_import function comes from the ellie_engine::ellie_core::module_path module. It takes two parameters, current_path and requested_path, and returns Ok(path) if the path is valid or Err(err) if it is not. Now, how do we handle errors?
impl ProgramRepository for Repository {
    fn read_module(
        &mut self,
        link_module: bool,
        current_path: String,
        requested_path: String,
    ) -> ResolvedImport {
        let starter_name = format!("<ellie_module_{}>", self.module_name);
        match parse_module_import(&current_path, &requested_path) {
            Ok(path) => {
                // ...
            }
            Err(err) => {
                // An invalid path is reported back to the tokenizer.
                ResolvedImport {
                    found: false,
                    resolve_error: "Cannot access outside of workspace".to_string(),
                    ..Default::default()
                }
            }
        }
    }
}
If parse_module_import returns an error, it is because the path is not valid. So we return a ResolvedImport with found set to false and resolve_error set to "Cannot access outside of workspace". Now let's handle Ok(path).
use std::collections::hash_map::DefaultHasher;
use std::fs::File;
use std::hash::{Hash, Hasher};
use std::io::Read;
use std::path::Path;
// `absolutize` comes from the `path_absolutize` crate.
use path_absolutize::Absolutize;

impl ProgramRepository for Repository {
    fn read_module(
        &mut self,
        link_module: bool,
        current_path: String,
        requested_path: String,
    ) -> ResolvedImport {
        let starter_name = format!("<ellie_module_{}>", self.module_name);
        match parse_module_import(&current_path, &requested_path) {
            Ok(path) => {
                // Swap the virtual module root for the real directory of the main file.
                let real_path = path.replace(
                    &starter_name,
                    Path::new(&self.target_path)
                        .absolutize()
                        .unwrap()
                        .parent()
                        .unwrap()
                        .to_str()
                        .unwrap(),
                );
                if Path::new(&real_path).exists() {
                    let mut file = File::open(&real_path).unwrap();
                    let mut contents = String::new();
                    file.read_to_string(&mut contents).unwrap();
                    let mut hasher = DefaultHasher::new();
                    contents.hash(&mut hasher);
                    ResolvedImport {
                        found: true,
                        hash: hasher.finish().try_into().unwrap(),
                        path,
                        matched: ImportType::Code(contents),
                        // `resolve_error` stays empty on success.
                        ..Default::default()
                    }
                } else {
                    ResolvedImport {
                        found: false,
                        resolve_error: "Path does not exist".to_string(),
                        ..Default::default()
                    }
                }
            }
            Err(err) => {
                // ...
            }
        }
    }
}
In this part we take target_path from our struct. It is the absolute path of the main.ei file, which in this case is /home/ellie/main.ei. path is <ellie_module_main>/deb.ei, and we replace <ellie_module_main> with the parent directory of main.ei, which is /home/ellie. So the real path is now /home/ellie/deb.ei. Next we check whether the file exists. If it exists, we read the file and return it as ImportType::Code. If it does not exist, we return a ResolvedImport with found set to false and resolve_error set to "Path does not exist".
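The path substitution described above can be sketched in isolation with the standard library. to_real_path is a hypothetical helper, and unlike the example it assumes main_file is already absolute, so it skips the absolutize step.

```rust
use std::path::Path;

/// Replaces the virtual module root in `virtual_path` with the directory of the main file.
/// Returns `None` if the main file has no parent directory or is not valid UTF-8.
pub fn to_real_path(virtual_path: &str, module_name: &str, main_file: &str) -> Option<String> {
    let starter_name = format!("<ellie_module_{}>", module_name);
    // Assumption: `main_file` is already an absolute path.
    let workspace = Path::new(main_file).parent()?.to_str()?;
    Some(virtual_path.replace(&starter_name, workspace))
}
```

Returning Option instead of calling unwrap lets the caller turn a bad path into a ResolvedImport with found set to false rather than a panic.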
Providing Module ImportType::Module
This functionality will be removed in the future. So we will not talk about it.
Building Pager
Signature:
pub fn new(
    main: String,
    main_file_name: String,
    path: String,
    import_resolver: E,
    initial_hash: usize,
) -> Self {
    ...
}
- main is the main file's content.
- main_file_name is the main file's name.
- path is the main file's path.
- import_resolver is a closure for importing modules: |link_module: bool, path: String, current_path: String| -> ResolvedImport
Pager is the interface for tokenizing a workspace. Pager::new takes the main file's content, main_file_name, path, import_resolver and initial_hash, where initial_hash is the hash of the main file's content, and returns a Pager struct.
What is Pager::new?
let main_program = program_repository.read_main();
let mut pager = Pager::new(
    main_program.file_content,
    main_program.file_name,
    format!("{}/", main_program.start_directory),
    move |link_module, path, module_identifier| {
        program_repository.read_module(link_module, path.clone(), module_identifier)
    },
    main_program.file_hash,
);
match pager.run() {
    Ok(_) => Ok(pager.pages),
    Err(errors) => Err(errors),
}
Here we create a Pager from main_program.file_content, main_program.file_name, format!("{}/", main_program.start_directory), a closure that resolves imports through the repository, and main_program.file_hash. Then we run the pager: if it returns Ok, we return pager.pages; if it returns Err, we return Err(errors).
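The closure-based resolver pattern can be tried without the engine. The sketch below is an illustration only: Resolved, MemoryRepository and resolve_all are hypothetical stand-ins, and the FnMut(bool, String, String) bound is an assumption based on the closure shape shown above, not the actual bound on Pager.

```rust
use std::collections::HashMap;

/// Minimal stand-in for the value a resolver returns.
#[derive(Debug, Default, PartialEq)]
pub struct Resolved {
    pub found: bool,
    pub content: String,
}

/// Hypothetical in-memory repository keyed by requested path.
pub struct MemoryRepository {
    pub files: HashMap<String, String>,
}

impl MemoryRepository {
    pub fn read_module(
        &mut self,
        _link_module: bool,
        _current_path: String,
        requested_path: String,
    ) -> Resolved {
        match self.files.get(&requested_path) {
            Some(content) => Resolved { found: true, content: content.clone() },
            None => Resolved::default(),
        }
    }
}

/// Hypothetical consumer that accepts the resolver as a closure.
pub fn resolve_all<E>(mut import_resolver: E, current: &str, requests: &[&str]) -> Vec<Resolved>
where
    E: FnMut(bool, String, String) -> Resolved,
{
    requests
        .iter()
        .map(|requested| import_resolver(false, current.to_string(), requested.to_string()))
        .collect()
}
```

Moving the repository into the closure, as the Pager example does with program_repository, means the consumer only needs to know the closure's signature and never the repository type itself.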
In the next stage, we will talk about Parser and how to use pager.pages.