Data Types
Column Types
Timestamps
Timestamps are interpreted to be timezoneless and stored as an offset from the UNIX epoch.
Supported conversions:
- Integer numeric types: Interpreted as UNIX timestamp in seconds
- Floating point numeric types: Interpreted as UNIX timestamp in seconds with decimal precision
- Strings: JDBC compliant java.sql.Timestamp format
YYYY-MM-DD HH:MM:SS.fffffffff
(9 decimal place precision)
Dates
Date values describe a particular year/month/day, in the form YYYY-MM-DD
. The range of values supported for the Date type is 0000-01-01 to 9999-12-31.
- cast(timestamp as date): The year/month/day of the timestamp is determined, based on the local timezone, and returned as a date value.
- cast(string as date): If the string is in the form
YYYY-MM-DD
, then a date value corresponding to that year/month/day is returned. If the string value does not match this formate, then NULL is returned.
Union Types
The first part in the deserialized union is the tag which lets us know which part of the union is being used. In this example 0
means the first data_type from the definition which is an int
and so on.
|
|
To create a union you have to provide this tag to the create_union
UDF:
|
|
Literals
Floating Point Types
Decimal literals provide precise values and greater range for floating point numbers than the DOUBLE type. Decimal types are needed for use cases in which the (very close) approximation of a DOUBLE is insufficient, such as financial applications, equality and inequality checks, and rounding operations.
Date Definition Statements
Create/Drop/Truncate Table
Managed and External Tables
Use managed tables when Hive should manage the lifecycle of the table, or when generating temporary tables.
External table files can be accessed and managed by processes outside of Hive.
If a managed table or partition is dropped, the data and metadata associated with that table or partition are deleted. If the PURGE option is not specified, the data is moved to a trash folder for a defined duration.
CTSAS
CTAS(create table as select) has these restrictions:
- The target table cannot be a partitioned table.
- The target table cannot be an external table.
- The target table cannot be a list bucketing table.
##Macro and Function
Create/Drop Temporary Macro
|
|
Temporary Functions
|
|
Data Retrieval: Queries
HAVING Clause
The
LIMIT Clause
|
|
REGEX Column Specification
|
|
GROUP BY
Multiple aggregations can be done at the same time, however, no two aggregations can have different DISTINCT columns.
|
|
When using group by clause, the select statement can only include columns included in the group by clause. Of course, you can have as many aggregation functions in the select statement as well.
|
|
hive.map.aggr controls how we do aggregations. The default is false. If it is set to true, Hive will do the first-level aggregation directly in the map task. This usually provides better efficiency, but may require more memory to run successfully.
|
|
Order/Sort/Distribute/Cluster By
There are some limitations in the
The difference between
cluster By
is a short-cut for both distribute By
and sort By
.