Puchaczov / Musoq
- воскресенье, 22 июля 2018 г. в 00:15:11
C#
Use SQL on various data sources
Musoq is handy tool that allows using SQL on various data sources.
Musoq exposes raw data sets as queryable sources. This allows you to write queries to those sources. It uses concepts of schemas and tables to logically group your tables. What would be used as query source? Virtually anything! Those below are just ideas but some of them had been already implemented.
You can also mix sources between each other.
select * from #os.files('path/to/folder', false) where Extension = '.exe' or Extension = '.png'
To run it, you need .NET Core 2.1
runtime. You can find it . If you've got it, then download
. You can find latest releases here.
Never tried it (I definitively have to). It was written to run everywhere, there is no any OS Specific code, however I'm almost sure that the current version will need some tweaks to sucessfully run on different platforms.
*
to select all columns.column.Name
.select * from #schema.table(@Arg2, ...) where ColumnName = @Arg1
FROM ... WHERE ... SELECT...
with p as (select 1 from #source) select 2 from #source.method(p)
LibraryBase
works in mixed sources context.You can easily plug-in your own data source. There is fairly simple plugin api that all sources uses. To read in details how to do it, jump into wiki section of this repo .
#Os | Exposes operating system tables. One of them are disk and files sources |
#Zip | Exposes compressed (.zip) files from the hard disk so that you can decompress files that fits sophisticated conditions. |
#Json | Exposes json file as queryable source. |
#Csv | Exposes csv file as queryable source. |
#FlatFile | Exposes FlatFile file as queryable source. |
#Time | Exposes time as queryable source. |
Gets all files from folder that has extension .exe
or .png
select * from #os.files('path/to/foder', false) where Extension = '.exe' or Extension = '.png'
Gets all hours from 7 to 12 (excludingly) for all saturday and sundays from 01.04.2018 00:00:00
to 30.04.2018 00:00:00
select DateTime, DayOfWeek + 1 from #time.interval('01.04.2018 00:00:00', '30.04.2018 00:00:00', 'hours') where Hour >= 7 and Hour < 12 and (DayOfWeek + 1 = 6 or DayOfWeek + 1 = 7)
Shows .cs
files from folders some_path_to_dir_1
, some_path_to_dir_2
, some_path_to_dir
and their subfolders (uses disk plugin).
select Name, Sha256File(), CreationTime, Length from #os.directory('some_path_to_dir_1', true)
where Extension = '.cs' take 3
union all (Name)
select Name, Sha256File(), CreationTime, Length from #os.directory('some_path_to_dir_2', true)
where Extension = '.cs' take 4
union all (Name)
select Name, Sha256File(), CreationTime, Length from #os.directory('some_path_to_dir', true)
where Extension = '.cs' take 5
Groups by Country
and City
.
select Country, City, Count(City) from #A.Entities() group by Country, City
Accessing complex objects and passing it to method.
select Inc(Self.Array[2]) from #A.Entities()
Compressing files from folder (uses AggregateFiles
grouping method)
select Compress(AggregateFiles(), './Results/some_out_name.zip', 'fastest') from #os.directory('./Files', false)
Decompresses only those files that fits the condition. Files are extracted to directory ./Results/DecompressWithFilterTest
select Decompress(AggregateFiles(File), './Results/DecompressWithFilterTest') from #zip.file('./Files.zip')
where Level = 1
Querying .json
file.
select Name, Age from #json.file('./JsonTestFile_First.json', './JsonTestFile_First.schema.json', ' ')
where schema is defined as:
{
"Age": "int",
"Name": "string",
"Books": []
}
and file to be queried is:
[
{
"Name": "Aleksander",
"Age": 24,
"Books": [
{
"Name": "A"
},
{
"Name" : "B"
}
]
},
{
"Name": "Mikolaj",
"Age": 11,
"Books": []
},
{
"Name": "Marek",
"Age": 45,
"Books": []
}
]
You can easily check it by typing a query that asks the source about columns it has. It's super easy and looks like desc #git.commits('path/to/repo')
. All plugins supports it out of the box!
On the one hand, I needed something that allows me performing queries on my own bank account file, on the other hand something that simultaneously filters with respect to file names and their content. For some reason, I would like it to be a single tool rather than a set of tools. That's how the musoq was born in my mind, with extensible plugins system and user defined grouping operators. All that Musoq does, you can achieve by "hand writting" all scripts manually however I found it usefull to automate this process and as a result avoid wasting time to create it. Fast querying was my goal. At a second glance, you might see that Musoq transpiles SQL code into C# code and then compiles it with Roslyn. In that case, writing script is redundant and all you have to do is to write a query and it will do the magic with your data source.
As the language looks like sql, it doesn't mean it is fully SQL compliant. It uses SQL syntax and repeats some of it's behaviour hoverwer, some differences may appear. It will also implement some experimental syntax and behaviours that are not used by database engines.
Hopefully, I will list the incompatibilities here:
Parent group aggregations
Non standard set operators based on keys rather than rows.
There is no support for huge sources exceeds memory
This project is licensed under the GPL-2.0 License - see the LICENSE file for details